Message ID | 1511803118-2552-6-git-send-email-tixxdz@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote: > This uses the new request_module_cap() facility to directly propagate > CAP_NET_ADMIN capability and the 'netdev' module prefix to the > capability subsystem as it was suggested. This is the kind of complexity that I wonder if it's worth it at all. Nobody sane actually uses those stupid capability bits. Have you ever actually seen it used in real life? They were a mistake, and we should never have done them - another case of security people who think that complexity == security, when in reality nobody actually wants the complexity or is willing to set it up and manage it. Linus
Hi Linus, On Mon, Nov 27, 2017 at 7:44 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote: >> This uses the new request_module_cap() facility to directly propagate >> CAP_NET_ADMIN capability and the 'netdev' module prefix to the >> capability subsystem as it was suggested. > > This is the kind of complexity that I wonder if it's worth it at all. > > Nobody sane actually uses those stupid capability bits. Have you ever > actually seen it used in real life? Yes they are complicated even for developers, and normal users do not understand them, however yes every sandbox and container is exposing them to endusers directly, they are documented! so yes CAP_SYS_MODULE is exposed but it does not cover autoloading. However, we are trying hard to abstract some semantics that are easy to grasp, we are mutating capabilities and seccomp to have an abstracted "yes/no" options for our endusers. Now, if you are referring to kernel code, the networking subsystem is using them and I don't want to break any assumption here. There is still the request_module(), the request_module_cap() was suggested so networking code later won't have to do the checks on its own, and maybe it can be consistent in the long term. The phonet sockets even needs CAP_SYS_ADMIN... > > They were a mistake, and we should never have done them - another case > of security people who think that complexity == security, when in > reality nobody actually wants the complexity or is willing to set it > up and manage it. Alright, but I guess we are stuck, is there something better on how we can do this or describe this ? Please note in these patches, the mode is specifically described as: * allowed: for backward compatibility (I would have done without it) * privileged: which includes capabilities (backward compatibility too) or we can add what ever in the future * disabled: even for privileged. So I would have preferred if it is something like "yes/no" but... However in userspace we will try hard to hide this complexity and the capability bits. Now I can see that the code comments and doc refer to privileged with capabilities a lot, where we can maybe update that doc and code to less state that privileged means capabilities ? Suggestions ? Thanks! > Linus
On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote: > > However, we are trying hard to abstract some semantics that are easy > to grasp, we are mutating capabilities and seccomp to have an > abstracted "yes/no" options for our endusers. Yes. Sadly, it looks like we actually do have users that just expect to load modules dynamically without any capabilities at all. So we can't actually disallow it by default at all, which imho makes this security option essentially useless. A security option that people can't use without breaking their system is pointless. We saw that with SELinux - people ended up just disabling it for _years_, simply because it ended up breaking so much in practice. And yes, it got fixed eventually, but at an incredibly high maintenance cost of all the crazy rules databases. > Alright, but I guess we are stuck, is there something better on how we > can do this or describe this ? So I wonder if we can perhaps look at the places that actually do "requerst_module()", and start filtering them on that basis. Some of them will already have checked for capabilities. Others clearly expect to juist work even _without_ capabilities (ie the bluetoothd case). So the whole "let's add a global config option" model is broken. There is no possible global rule. It will break things, which in turn mean that people won't turn it on (and we can't turn it on by default), which in turn makes this pointless. In other words, I really think that anything that just adds a mode flag cannot work. So instead of having one "modules_autoload_mode" thing, maybe the individual requerst_module() cases need to simply be audited. Put another way: I think the part of your patch series that does that "request_module_cap()" and makes the netdev modules use it is a good addition. It's the "mode" part I really don't agree with, because apparently we really need to default it to permissive. So how about instead: - add that "request_module_cap()" and make the networking code that already uses CAP_ADMIN_NET use it. - make "request_module()" itself default to being "request_module_cap(CAP_SYS_MODULE,..)" - make sure that when the capability check fails, we print an error message, and then for the ones that trigger, we will audit them and see if it's ok. Because that "mode" flag defaulting to off will just mean that the default case will remain the existing unsafe one, and that's bad. Opt-in really doesn't work. We've done it. Global flags for varied behavior really doesn't work. We've done that too. Different cases want different behavior, the global flag is just fundamentally broken. Linus
On Mon, Nov 27, 2017 at 2:04 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote: >> >> However, we are trying hard to abstract some semantics that are easy >> to grasp, we are mutating capabilities and seccomp to have an >> abstracted "yes/no" options for our endusers. > > Yes. > > Sadly, it looks like we actually do have users that just expect to > load modules dynamically without any capabilities at all. > > So we can't actually disallow it by default at all, which imho makes > this security option essentially useless. The good news here is that this series comes with an expected user: systemd, for which I suspect this will option will get wider and wider use. > A security option that people can't use without breaking their system > is pointless. Another bit of good news is that systems _depending_ cap-less module loading are uncommon enough that many system builders can enable this protection and nothing will break. (Even you thought it was rare enough that we could just make this default-enabled.) Besides, distros understand that they have to keep an eye out for these things since the kernel has a long history of not allowing default-enabled protections that change userspace behavior, etc. > So I wonder if we can perhaps look at the places that actually do > "requerst_module()", and start filtering them on that basis. > > Some of them will already have checked for capabilities. > > Others clearly expect to just work even _without_ capabilities (ie > the bluetoothd case). > > So the whole "let's add a global config option" model is broken. There > is no possible global rule. It will break things, which in turn mean > that people won't turn it on (and we can't turn it on by default), > which in turn makes this pointless. > > In other words, I really think that anything that just adds a mode > flag cannot work. > > So instead of having one "modules_autoload_mode" thing, maybe the > individual requerst_module() cases need to simply be audited. > > Put another way: I think the part of your patch series that does that > "request_module_cap()" and makes the netdev modules use it is a good > addition. > > It's the "mode" part I really don't agree with, because apparently we > really need to default it to permissive. > > So how about instead: > > - add that "request_module_cap()" and make the networking code that > already uses CAP_ADMIN_NET use it. > > - make "request_module()" itself default to being > "request_module_cap(CAP_SYS_MODULE,..)" > > - make sure that when the capability check fails, we print an error > message, and then for the ones that trigger, we will audit them and > see if it's ok. The issue isn't "is it always okay to cap-less-load this module?" it's "should the kernel's attack surface be allowed to grown by an unprivileged user?" Nearly all the request_module() sites have already been audited and adjusted to avoid _arbitrary_ module loading (usually by adding module prefixes: netdev-, crypto-, etc), and that greatly helped reduce the ability for a unprivileged user to load a known-buggy module, but there will remain many subsystems that continue to expect to grow the kernel's features on demand within their subsystem, and sometimes those modules will have bugs that an unprivileged user can exploit (this history of this happening is quite real; it's not a theoretical concern). This is especially problematic for distro kernels where they build tons of modules. There isn't a way for an admin to say "I only want root to be able to load modules". They can't delete all the "unused" kernel modules, because maybe root will want the module, they can't change module file permissions because the modules are read with init's permissions (and it would be terrible to repeatedly change all the file perms each time a new kernel got installed), modules can only be blacklisted (not whitelisted) in /etc/modprobe.d/, etc. > Because that "mode" flag defaulting to off will just mean that the > default case will remain the existing unsafe one, and that's bad. > > Opt-in really doesn't work. We've done it. > > Global flags for varied behavior really doesn't work. We've done that > too. Different cases want different behavior, the global flag is just > fundamentally broken. I don't disagree that a global should be avoided, but I'm struggling to see another option here. We can't break userspace by default so we can't restrict cap-less loading by default. But we can allow userspace to _choose_ to break itself, especially within a container. This isn't uncommon, especially for modules, where we even have the global "modules_disabled" sysctl already. The level of granularity of control here is the issue, and it's what this series solves. The options I see for module loading control are: 1) monolithic kernel (no modules) 2) modular kernel that flips on modules_disabled after boot (no modules after boot) 3) modular kernel that allows per-subsystem unpriv module loading (all modules loadable) There is a demand for something between 2 and 3 where only root can load modules. (And as pointed out in the series, this is _especially_ true for containers where the admin may want to leave module loading alone in the init namespace, but stop any module loading in the container.) -Kees
On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: > > I don't disagree that a global should be avoided, but I'm struggling > to see another option here. We can't break userspace by default so we > can't restrict cap-less loading by default. But we can allow userspace > to _choose_ to break itself, especially within a container. This isn't > uncommon, especially for modules, where we even have the global > "modules_disabled" sysctl already. The level of granularity of control > here is the issue, and it's what this series solves. So there's two "global" here - if a container were to choose to break itself, it should damn well be container-specific, not some global option This part seems to be ok in the patch series, since the "global" is really per-task. So it's not global in the "system-wide" sense. - if _one_ request_module() caller were to say "I don't want to be loaded by a normal user", that doesn't mean that _other_ request_module() cases shouldn't. This is the part I'm objecting to, because it means that we can't enable this stricter policy by default. And the thing is, the patch series seems to already introduce largely the better model of just making it site-specific. Introducing that request_module_cap() thing and then using it for networking is a good step. But I also suspect that we _could_ just make the stricter rules actually be default, if we just fixed the thing up to not be "every request_module() is the same". For example, several request_module() calls come from device node opens, and it makes sense that we can just say: "if you have access to the device node, then you have the right to request the module". But that would need to be not a global "request_module()" behavior, but a behavior that is tied to the particular call-site. IOW, extend on that request_module_cap() model, and introduce (perhaps) a "request_module_dev()" call that basically means "the user opened the device node for the requested module". Because those kinds of permissions aren't necessarily about capabilities, but about things like "I'm in the dialout group, I get to open tty devices and by implication request their modules". And that _really_ isn't global behavior. The fact that I might be able to load a serial; module has *nothing* to do with whether I can load some other kind of module at all. That global mode is just wrong. Linus
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >> >> I don't disagree that a global should be avoided, but I'm struggling >> to see another option here. We can't break userspace by default so we >> can't restrict cap-less loading by default. But we can allow userspace >> to _choose_ to break itself, especially within a container. This isn't >> uncommon, especially for modules, where we even have the global >> "modules_disabled" sysctl already. The level of granularity of control >> here is the issue, and it's what this series solves. > > So there's two "global" here > > - if a container were to choose to break itself, it should damn well > be container-specific, not some global option > > This part seems to be ok in the patch series, since the "global" is > really per-task. So it's not global in the "system-wide" sense. > > - if _one_ request_module() caller were to say "I don't want to be > loaded by a normal user", that doesn't mean that _other_ > request_module() cases shouldn't. > > This is the part I'm objecting to, because it means that we can't > enable this stricter policy by default. > > And the thing is, the patch series seems to already introduce largely > the better model of just making it site-specific. Introducing that > request_module_cap() thing and then using it for networking is a good > step. > > But I also suspect that we _could_ just make the stricter rules > actually be default, if we just fixed the thing up to not be "every > request_module() is the same". > > For example, several request_module() calls come from device node > opens, and it makes sense that we can just say: "if you have access to > the device node, then you have the right to request the module". > > But that would need to be not a global "request_module()" behavior, > but a behavior that is tied to the particular call-site. > > IOW, extend on that request_module_cap() model, and introduce > (perhaps) a "request_module_dev()" call that basically means "the user > opened the device node for the requested module". > > Because those kinds of permissions aren't necessarily about > capabilities, but about things like "I'm in the dialout group, I get > to open tty devices and by implication request their modules". > > And that _really_ isn't global behavior. The fact that I might be > able to load a serial; module has *nothing* to do with whether I can > load some other kind of module at all. > > That global mode is just wrong. What about exporting this entirely to userspace, giving it as much context as possible? i.e. inform modprobe about the user doing it, maybe the subsystem, etc? -Kees
On Mon, Nov 27, 2017 at 3:19 PM, Kees Cook <keescook@chromium.org> wrote: > > What about exporting this entirely to userspace, giving it as much > context as possible? i.e. inform modprobe about the user doing it, > maybe the subsystem, etc? Yeah, except for the fact that we don't trust user-mode? We used to do that exact thing. It was a nasty disaster, and caused version skew and other horrible problems. So no. Th e"let's just let user mode sort it out" doesn't work. User mode doesn't sort anything out, it just makes it worse. It's not some made-up example when I say that user-mode has decided that kernel requests have to be completely serialized, and recusive invocations will just hang. So no. We do not go down that particular rat-hole. It's just a bigger chance of getting things wrong. Linus
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >> >> I don't disagree that a global should be avoided, but I'm struggling >> to see another option here. We can't break userspace by default so we >> can't restrict cap-less loading by default. But we can allow userspace >> to _choose_ to break itself, especially within a container. This isn't >> uncommon, especially for modules, where we even have the global >> "modules_disabled" sysctl already. The level of granularity of control >> here is the issue, and it's what this series solves. > > So there's two "global" here > > - if a container were to choose to break itself, it should damn well > be container-specific, not some global option > > This part seems to be ok in the patch series, since the "global" is > really per-task. So it's not global in the "system-wide" sense. Right, though in the case of init, it could flip that toggle for itself and it would then effectively be system-wide. > - if _one_ request_module() caller were to say "I don't want to be > loaded by a normal user", that doesn't mean that _other_ > request_module() cases shouldn't. > > This is the part I'm objecting to, because it means that we can't > enable this stricter policy by default. Okay, I see what you mean here. You want to clearly distinguish between unprivileged and privileged-only. I'm unconvinced that's going to change much, as the bulk of the exposed request_module() users are already expecting to be unprivileged (and that's why they were all converted to requiring a named prefix). > And the thing is, the patch series seems to already introduce largely > the better model of just making it site-specific. Introducing that > request_module_cap() thing and then using it for networking is a good > step. > > But I also suspect that we _could_ just make the stricter rules > actually be default, if we just fixed the thing up to not be "every > request_module() is the same". When doing some of the older module name prefix work, I did consider introducing a new request_module() API that included the prefix name as an explicit argument (instead of embedding it in the format string). We could easily start there, and then have "plain" request_module() require privs. But we'll still need a way to say "admin doesn't want unpriv module auto-loading". > For example, several request_module() calls come from device node > opens, and it makes sense that we can just say: "if you have access to > the device node, then you have the right to request the module". Many of these callers are using network interfaces to do this work, so there isn't as clean a permission model associated with those like there might be with a filesystem open(). But that doesn't matter (see below). > But that would need to be not a global "request_module()" behavior, > but a behavior that is tied to the particular call-site. > > IOW, extend on that request_module_cap() model, and introduce > (perhaps) a "request_module_dev()" call that basically means "the user > opened the device node for the requested module". > > Because those kinds of permissions aren't necessarily about > capabilities, but about things like "I'm in the dialout group, I get > to open tty devices and by implication request their modules". This really doesn't address the main concern that is the problem: whitelisting vs blacklisting. In your example, the dialout group gives access to specific ttys or serial ports, etc, but an admin may want a way to make sure the users don't load some buggy line discipline. Now, that admin could blacklist all those modules one at a time, but new stuff might get introduced, it doesn't handle other subsystems, etc. We need to provide a way to whitelist autoloaded modules. The demonstrated need (to whitelist _no_ modules, addressed by this series) provides that level of control on a task basis (effectively making it container-specific). > And that _really_ isn't global behavior. The fact that I might be > able to load a serial; module has *nothing* to do with whether I can > load some other kind of module at all. > > That global mode is just wrong. If the per-task thing stays and the global sysctl goes, that would be fine by me. That still gives admins a way to control the autoload behavior, assuming their init knows how to set the flag. The global sysctl, in my mind, is really more of a way for an admin to do this after the fact without rebooting, etc. But I don't have a strong opinion about the global sysctl. -Kees
> From: Linus Torvalds <torvalds@linux-foundation.org> > Sent: Mon Nov 27 23:04:58 CET 2017 > To: Djalal Harouni <tixxdz@gmail.com> > Cc: Kees Cook <keescook@chromium.org>, Andy Lutomirski <luto@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Luis R. Rodriguez <mcgrof@kernel.org>, James Morris <james.l.morris@oracle.com>, Ben Hutchings <ben.hutchings@codethink.co.uk>, Solar Designer <solar@openwall.com>, Serge Hallyn <serge@hallyn.com>, Jessica Yu <jeyu@kernel.org>, Rusty Russell <rusty@rustcorp.com.au>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, LSM List <linux-security-module@vger.kernel.org>, kernel-hardening@lists.openwall.com <kernel-hardening@lists.openwall.com>, Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@kernel.org>, David S. Miller <davem@davemloft.net>, Network Development <netdev@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org> > Subject: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules > > > On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote: > > > > However, we are trying hard to abstract some semantics that are easy > > to grasp, we are mutating capabilities and seccomp to have an > > abstracted "yes/no" options for our endusers. > > Yes. > > Sadly, it looks like we actually do have users that just expect to > load modules dynamically without any capabilities at all. > > So we can't actually disallow it by default at all, which imho makes > this security option essentially useless. > > A security option that people can't use without breaking their system > is pointless. > > We saw that with SELinux - people ended up just disabling it for > _years_, simply because it ended up breaking so much in practice. And > yes, it got fixed eventually, but at an incredibly high maintenance > cost of all the crazy rules databases. > > > Alright, but I guess we are stuck, is there something better on how we > > can do this or describe this ? > > So I wonder if we can perhaps look at the places that actually do > "requerst_module()", and start filtering them on that basis. > > Some of them will already have checked for capabilities. > > Others clearly expect to juist work even _without_ capabilities (ie > the bluetoothd case). > > So the whole "let's add a global config option" model is broken. There > is no possible global rule. It will break things, which in turn mean > that people won't turn it on (and we can't turn it on by default), > which in turn makes this pointless. > > In other words, I really think that anything that just adds a mode > flag cannot work. > > So instead of having one "modules_autoload_mode" thing, maybe the > individual requerst_module() cases need to simply be audited. > > Put another way: I think the part of your patch series that does that > "request_module_cap()" and makes the netdev modules use it is a good > addition. > > It's the "mode" part I really don't agree with, because apparently we > really need to default it to permissive. > > So how about instead: > > - add that "request_module_cap()" and make the networking code that > already uses CAP_ADMIN_NET use it. > > - make "request_module()" itself default to being > "request_module_cap(CAP_SYS_MODULE,..)" > > - make sure that when the capability check fails, we print an error > message, and then for the ones that trigger, we will audit them and > see if it's ok. > > Because that "mode" flag defaulting to off will just mean that the > default case will remain the existing unsafe one, and that's bad. > > Opt-in really doesn't work. We've done it. > > Global flags for varied behavior really doesn't work. We've done that > too. Different cases want different behavior, the global flag is just > fundamentally broken. > > Linus > ---------------------------------------- The only reason this is off by default is kernel rule No 1. Most of new security features are off or WARN_ON because of it otherwise bad things happen on mailinglists :) . It seems to me that you demand an impossible job here. Don't break userspace and don't disable by default. It can't happen due to years of technical debt. I mean everyone prefers default-on but we can choose best alternative instead of maintaining status quo. Userspace can be configured in a way which is compatible with those changes being on the same as it can be configured to work with selinux. That means on distro level or sysadmin level it's a valuable tool. It's better than nothing and it's better than using some out-of-tree patches instead. Switching one sysctl would make their life easier. There is demand for kernel hardening stuff and default-off behavior is good compromise between breaking changes and nothing at all. The kernel has very diverse users group with different needs and capabilities so there is no one size fits all. Targeting all of them will often end in nothing useful to anyone. Yours sincerely G. K.
On Tue, Nov 28, 2017 at 01:16:59PM +0100, Geo Kozey wrote: > > Userspace can be configured in a way which is compatible with those > changes being on the same as it can be configured to work with > selinux. That means on distro level or sysadmin level it's a > valuable tool. It's better than nothing and it's better than using > some out-of-tree patches instead. Switching one sysctl would make > their life easier. If *selinux* can opt-in to something more stringent, such that when you upgrade to a new version of selinux which enables something which breaks all modules unless you set up the rules corretly, I don't see a problem with it. It might force distributions not to go to the latest version of SELinux because users get cranky when their systems get broken, but for people like me, who *still* don't use SELinux because every few years, i try to enable on my development laptop running Debian, watch ***far*** too much stuff break. and then turn it off again. So tieing it to SELinux (as far as I am concerned) reduces it to a previously unsolved problem. :-) But that's different from opting it on by default for non-SELinux users. To which I can only say, "Please, No." - Ted
On Tue, Nov 28, 2017 at 11:32 AM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Nov 28, 2017 at 01:16:59PM +0100, Geo Kozey wrote: >> >> Userspace can be configured in a way which is compatible with those >> changes being on the same as it can be configured to work with >> selinux. That means on distro level or sysadmin level it's a >> valuable tool. It's better than nothing and it's better than using >> some out-of-tree patches instead. Switching one sysctl would make >> their life easier. > > If *selinux* can opt-in to something more stringent, such that when > you upgrade to a new version of selinux which enables something which > breaks all modules unless you set up the rules corretly, I don't see a > problem with it. It might force distributions not to go to the latest > version of SELinux because users get cranky when their systems get > broken, but for people like me, who *still* don't use SELinux because > every few years, i try to enable on my development laptop running > Debian, watch ***far*** too much stuff break. and then turn it off > again. So tieing it to SELinux (as far as I am concerned) reduces it to > a previously unsolved problem. :-) > > But that's different from opting it on by default for non-SELinux > users. To which I can only say, "Please, No." I don't want to see this tied to SELinux because it narrows the audience, and SELinux still hasn't solved their issues in containers. I think the per-task setting is sufficient. Linus, are you okay with this series if the global sysctl gets dropped? -Kees
On Tue, Nov 28, 2017 at 12:08 PM, Kees Cook <keescook@chromium.org> wrote: > > Linus, are you okay with this series if the global sysctl gets dropped? So really, it's not the "global sysctl" as much as the "global request_module()" that annoys me. I'll happily take the request_module_cap() part and the thing that makes networking use that. But the flag that we have to default to off because it breaks every single box otherwise? No. It doesn't matter if it's one single global or just a "global behavior for request_module() for this process" at that point, it's still a pointless security flag that is opt-in. Linus
On Tue, Nov 28, 2017 at 12:12 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Nov 28, 2017 at 12:08 PM, Kees Cook <keescook@chromium.org> wrote: >> >> Linus, are you okay with this series if the global sysctl gets dropped? > > So really, it's not the "global sysctl" as much as the "global > request_module()" that annoys me. > > I'll happily take the request_module_cap() part and the thing that > makes networking use that. > > But the flag that we have to default to off because it breaks every > single box otherwise? No. It doesn't matter if it's one single global > or just a "global behavior for request_module() for this process" at > that point, it's still a pointless security flag that is opt-in. To be clear: such a flag wouldn't doesn't break every system, but I understand your concern. So what's the right path forward for allowing a way to block autoloading? Separate existing request_module() calls into "must be privileged" and "can be unpriv" first, then rework the series to deal with the "unpriv okay" subset? -Kees
On Tue, Nov 28, 2017 at 12:20 PM, Kees Cook <keescook@chromium.org> wrote: > > So what's the right path forward for allowing a way to block > autoloading? Separate existing request_module() calls into "must be > privileged" and "can be unpriv" first, then rework the series to deal > with the "unpriv okay" subset? So once we've taken care of the networking ones that check their own different capability bit, maybe we can then make the regular request_module() do a rate-limited warning for non-CAP_SYS_MODULE uses that prints which module it's loading. And then just see what people report. Because maybe it's just a very small handful that matters, and we can say "those are ok". And maybe that is too optimistic, and we have a lot of device driver ones because people still have a static /dev and don't have udev populating modules and device nodes, and then maybe we need to introduce a "request_module_dev()" where the rule is that you had to at least have privileges to open the device node. Because I really am *not* interested in these security flags that are off by default and then get turned on by special cases. I think it's completely unacceptable to say "we're insecure by default but then you can do X and be secure". It doesn't work. It doesn't fix anything. Linus
On Tue, Nov 28, 2017 at 9:33 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Nov 28, 2017 at 12:20 PM, Kees Cook <keescook@chromium.org> wrote: >> >> So what's the right path forward for allowing a way to block >> autoloading? Separate existing request_module() calls into "must be >> privileged" and "can be unpriv" first, then rework the series to deal >> with the "unpriv okay" subset? > > So once we've taken care of the networking ones that check their own > different capability bit, maybe we can then make the regular > request_module() do a rate-limited warning for non-CAP_SYS_MODULE uses > that prints which module it's loading. Alright, I can start by those. > And then just see what people report. > > Because maybe it's just a very small handful that matters, and we can > say "those are ok". The ones that are in the cover letter, etc may not have the appropriate context, the request_module_dev() sure can be made since if you can open you already have the context. > And maybe that is too optimistic, and we have a lot of device driver > ones because people still have a static /dev and don't have udev > populating modules and device nodes, and then maybe we need to > introduce a "request_module_dev()" where the rule is that you had to > at least have privileges to open the device node. > > Because I really am *not* interested in these security flags that are > off by default and then get turned on by special cases. I think it's > completely unacceptable to say "we're insecure by default but then you > can do X and be secure". It doesn't work. It doesn't fix anything. this still leaves all the cases where we don't have the appropriate context and other implicit loads that are triggered by another implicit load, etc. Also the simple local flag is easy to grasp, with real users for it, and we can abstract on top with load "newfeatures" of course this does not mean that we should say "we-re insecure by default". I want it to be more allow newfeatures or not for apps and users... requiring caps may give users the idea to pass CAP_SYS_MODULE or other caps for something that used to work, they may start giving it if we break lot of usecases, and yeh the caps are much broader and do much more harm... Ok, so beside updating with request_module_cap() I will investigate request_module_dev() and we can see Thanks! > Linus
On Tue, Nov 28, 2017 at 12:33 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Nov 28, 2017 at 12:20 PM, Kees Cook <keescook@chromium.org> wrote: >> >> So what's the right path forward for allowing a way to block >> autoloading? Separate existing request_module() calls into "must be >> privileged" and "can be unpriv" first, then rework the series to deal >> with the "unpriv okay" subset? > > So once we've taken care of the networking ones that check their own > different capability bit, maybe we can then make the regular > request_module() do a rate-limited warning for non-CAP_SYS_MODULE uses > that prints which module it's loading. > > And then just see what people report. I'm fine to try this experiment. I just think it'll be very noisy. :) > Because maybe it's just a very small handful that matters, and we can > say "those are ok". See, that's the problem: it's not a matter of "those are ok". Those modules will either have bugs discovered in them or will gain bugs over time. The point is to provide a mechanism to block an entire attack surface on systems that do not need auto-loading (which is a lot of systems). As I've said before, this isn't a theoretical attack surface. This year alone there have been three known-exploitable flaws exposed by autoloading: The exploit for CVE-2017-2636 uses int n_hdlc = N_HDLC; ioctl(fd, TIOCSETD, &n_hdlc) [1]. This is using the existing "tty-ldisc-" prefix, and is intentionally unprivileged. The exploit for CVE-2017-6074 uses socket(PF_INET6, SOCK_DCCP, IPPROTO_IP) [2]. This is using the existing proto prefix, and is intentionally unprivileged. The exploit for CVE-2017-7184 uses socket(PF_NETLINK, SOCK_RAW, NETLINK_XFRM) [3]. This is using existing proto prefix and became unprivileged by way of user namespaces (CAP_NET_ADMIN in a user-namespace). Concerned admins have already disabled user namespaces, so they were protected from the last one, but they couldn't stop the first two without explicit blacklists ahead of time. (Though anyone paying attention to vulnerability histories would have already blacklisted DCCP.) > And maybe that is too optimistic, and we have a lot of device driver > ones because people still have a static /dev and don't have udev > populating modules and device nodes, and then maybe we need to > introduce a "request_module_dev()" where the rule is that you had to > at least have privileges to open the device node. > > Because I really am *not* interested in these security flags that are > off by default and then get turned on by special cases. I think it's > completely unacceptable to say "we're insecure by default but then you > can do X and be secure". It doesn't work. It doesn't fix anything. I empathize with this stance, but the last many years of hardening has been doing this just fine, and we've progressively moved defenses from "off by default" to "on by default", by way of distros, etc, over time. But sometimes we have things that must stay off by default to avoid breaking rule #1, even though every distro has turned it on by default (e.g. link restrictions). So, saying "it must be on by default and not break userspace" puts us in a bit of a catch-22. It sounds like the proposed path forward is: 1) convert existing privileged request_module() usage to use explicit permission and/or capability checks 2) mark the remaining with WARN_RATELIMIT 3) wait 4) ??? It's steps 3 and 4 that worry me because we have a standing need for a solution here, and "wait" isn't a very convincing solution and step 4 is unknown. We know we're going to have unprivileged module loading (e.g. AF_ALG), even if we reduce the places where it's used. In the end, we will still need a way for a process to say "I don't want to auto-load modules from here on out". That's the part I want to figure out, otherwise Djalal doesn't have a way to address the attack surface. -Kees [1] https://github.com/snorez/exploits/blob/master/cve-2017-2636/cve-2017-2636.c [2] https://github.com/xairy/kernel-exploits/blob/master/CVE-2017-6074/poc.c [3] https://github.com/snorez/exploits/blob/master/cve-2017-7184/exp.c
> From: Linus Torvalds <torvalds@linux-foundation.org> > Sent: Tue Nov 28 21:33:22 CET 2017 > To: Kees Cook <keescook@chromium.org> > Subject: Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules > Because I really am *not* interested in these security flags that are > off by default and then get turned on by special cases. I think it's > completely unacceptable to say "we're insecure by default but then you > can do X and be secure". It doesn't work. It doesn't fix anything. > > Linus > ---------------------------------------- What about "we're insecure by default but you can't do anything to change this"? It describes current situation. For last 20 years linux allowed for insecure behavior and tons of tools were built depending on it. It's recurring theme of kernel security development. I'll be glad if some genius propose perfect idea solving this problem but I'm afraid things go nowhere instead. Yours sincerely G. K.
On Tue, Nov 28, 2017 at 01:33:40PM -0800, Kees Cook wrote: > As I've said before, this isn't a theoretical attack surface. This > year alone there have been three known-exploitable flaws exposed by > autoloading: > > The exploit for CVE-2017-2636 uses int n_hdlc = N_HDLC; ioctl(fd, > TIOCSETD, &n_hdlc) [1]. This is using the existing "tty-ldisc-" > prefix, and is intentionally unprivileged. > > The exploit for CVE-2017-6074 uses socket(PF_INET6, SOCK_DCCP, > IPPROTO_IP) [2]. This is using the existing proto prefix, and is > intentionally unprivileged. So in these two cases, if the kernel was built w/o modules, and HDLC and DCCP was built-in, you'd be screwed, then? Is the goal here to protect people using distro kernels which build the world as modules, including dodgy pieces of kernel code that are bug-ridden? If so, then presumably 90% of the problem you've cited can be done by creating a script which takes a look of the modules that are normally in use once the machine is in production, and then deleting everything else? Correct? And yes, this will potentially break some users, but the security folks who are advocating for the more aggressive version of this change seem to be OK with breaking users, so they can do this without making kernel changes. Good luck getting Red Hat and SuSE to accept such a change, though.... - Ted
On Tue, Nov 28, 2017 at 3:23 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Nov 28, 2017 at 01:33:40PM -0800, Kees Cook wrote: >> As I've said before, this isn't a theoretical attack surface. This >> year alone there have been three known-exploitable flaws exposed by >> autoloading: >> >> The exploit for CVE-2017-2636 uses int n_hdlc = N_HDLC; ioctl(fd, >> TIOCSETD, &n_hdlc) [1]. This is using the existing "tty-ldisc-" >> prefix, and is intentionally unprivileged. >> >> The exploit for CVE-2017-6074 uses socket(PF_INET6, SOCK_DCCP, >> IPPROTO_IP) [2]. This is using the existing proto prefix, and is >> intentionally unprivileged. > > So in these two cases, if the kernel was built w/o modules, and HDLC > and DCCP was built-in, you'd be screwed, then? Sure, but that's not the common situation. > Is the goal here to protect people using distro kernels which build > the world as modules, including dodgy pieces of kernel code that are > bug-ridden? The bulk of the risk comes from distro kernels, yes. (Though "bug ridden" is a strong statement. There are and will be bugs, scattered across a wide portion of the kernel, it's just that modules tend to cover most of that attack surface.) > If so, then presumably 90% of the problem you've cited can be done by > creating a script which takes a look of the modules that are normally > in use once the machine is in production, and then deleting everything > else? Correct? This isn't always obvious nor is it easy: a kernel upgrade will reinstall everything, etc. This is effectively a blacklist, and we need an easy whitelist approach. And an admin may not want to make it so hard for themselves to load some special filesystem module that they have to hunt it down and reinstall it. Or they'll want some containers to be able to trigger module loading and others not to. They just want to know their users aren't triggering module loading that expands the kernel attack surface. > And yes, this will potentially break some users, but the security > folks who are advocating for the more aggressive version of this > change seem to be OK with breaking users, so they can do this without > making kernel changes. Good luck getting Red Hat and SuSE to accept > such a change, though.... We should strive to make the kernel easy to protect. Expecting admins to jump through these hoops isn't a sensible way forward. -Kees
On Tue, Nov 28, 2017 at 03:29:01PM -0800, Kees Cook wrote: > > So in these two cases, if the kernel was built w/o modules, and HDLC > > and DCCP was built-in, you'd be screwed, then? > > Sure, but that's not the common situation. > > > Is the goal here to protect people using distro kernels which build > > the world as modules, including dodgy pieces of kernel code that are > > bug-ridden? > > The bulk of the risk comes from distro kernels, yes. (Though "bug > ridden" is a strong statement. There are and will be bugs, scattered > across a wide portion of the kernel, it's just that modules tend to > cover most of that attack surface.) OK, but if the goal is to protect users who are running distro kernels, then a kernel config that breaks some percentage of users is ****highly**** unlikely to be enabled by Red Hat and SuSE, right? And many of these users either can't (from a skills perspective) or won't (because they lose all support from the enterprise distro's help desk) recompile their kernel to enable an "might break 3% of all users --- oh, well" config option. Which argues for being extremely conservative about making something that has an extremely low probability of breaking users, and it points out why Geo Kozey's "who cares about breaking users; security is IMPORTANT" argument is so wrong-headed. If the goal is to protect distro kernels, but a sloppy approach guarantees that distro kernels will never enable it, is it really worth it? - Ted P.S. This is where it might be useful to get some input from the Red Hat and SuSE support teams. How many angry user calls to their help desk are they willing to field before they'll just turn off the kernel config option for their kernels?
On Tue, Nov 28, 2017 at 1:51 PM, Geo Kozey <geokozey@mailfence.com> wrote: > > What about "we're insecure by default but you can't do anything to change this"? It describes current situation. Go away, and don't send me patches until you have dug your head out of whatever hole you have put it in.. If this is the kind of shit-headed responses I get from the "hardening" list, then I don't want to have anything to do with you guys. Seriously. I sent out a long explanation of what's wrong with the hardening people last week. It made the news. If you still don't understand, you're simply not worth working with. If you cannot help improve kernel security for the default case, and you can't even be bothered to try, and only want to fix some special case that doesn't then improve anything at all for most people, I really _really_ suggest you go play in your own sandbox. Because clearly, if you're not interested in improving things for anybody else, why the hell should you care about the upstream kernel anyway? That's what this boils down to: if you send me patches, you had better strive to improve security for everybody, not just for some little locked-down special case. We're not grsecurity. We never have been. We're not interested in the crazy people. We're interested in the kind of security that is generally applicable. To the mainline kernel, not breaking existing users matters, but it also matters that the patches make sense for everybody, because otherwise, why be mainline? So a patch that avoids breaking existing users, but also doesn't actually improve anything for existing users, simply shouldn't be part of the mainline kernel. Comprende? Linus
On Wed, Nov 29, 2017 at 12:23 AM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Nov 28, 2017 at 01:33:40PM -0800, Kees Cook wrote: >> As I've said before, this isn't a theoretical attack surface. This >> year alone there have been three known-exploitable flaws exposed by >> autoloading: >> >> The exploit for CVE-2017-2636 uses int n_hdlc = N_HDLC; ioctl(fd, >> TIOCSETD, &n_hdlc) [1]. This is using the existing "tty-ldisc-" >> prefix, and is intentionally unprivileged. >> >> The exploit for CVE-2017-6074 uses socket(PF_INET6, SOCK_DCCP, >> IPPROTO_IP) [2]. This is using the existing proto prefix, and is >> intentionally unprivileged. > > So in these two cases, if the kernel was built w/o modules, and HDLC > and DCCP was built-in, you'd be screwed, then? > > Is the goal here to protect people using distro kernels which build > the world as modules, including dodgy pieces of kernel code that are > bug-ridden? > > If so, then presumably 90% of the problem you've cited can be done by > creating a script which takes a look of the modules that are normally > in use once the machine is in production, and then deleting everything > else? Correct? > > And yes, this will potentially break some users, but the security > folks who are advocating for the more aggressive version of this > change seem to be OK with breaking users, so they can do this without > making kernel changes. Good luck getting Red Hat and SuSE to accept > such a change, though.... The patches does not change default and make it easy for users and we have request for this, not all world is Red Hat / SuSE , I build embedded Linux for clients when I manage to have some, and I clearly would have set this to my clients since most of them won't be able to afford all the signing and complexity, now how I should allow modules load/unload replace with newer versions, but restrict some of their apps from triggering it, "modules_disabled=1" is not practical. I can't build a perfect version for every usecase, and they started to ship apps for IoT, even containers for IoT, yes they do it and they use the same os for various use cases! so it is not about those, even embedded vendors have one single shared layer that they use for all their products. For distros, the target is also containers and sandboxes, and they are already interested in it. P.S. please the cover letter already mentions that this is for Embedded and IoT. > - Ted
On Tue, Nov 28, 2017 at 3:51 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So a patch that avoids breaking existing users, but also doesn't > actually improve anything for existing users, simply shouldn't be part > of the mainline kernel. Just to clarify: maybe it ends up being truly impossible to make the default be more restrictive. The default certainly won't be the most restrictive option. But at the same time, if people can't even be bothered to try to improve the general case, and only do things that you have to opt in for, it really isn't much of an improvement. We had this whole "opt-in" discussion for another thread entirely, and it basically didn't improve anything for anybody for the better half of a decade. Hardening that only works for special cases isn't hardening at all. It will just mean that 99+% of all kernel developers won't see the fallout at all. Yes, something like Android may be 99% of the devices, but it's a very small portion of the core developer base because the hardware is all locked down, and it's an even smaller portion of the usage patterns. So I can see some people say "We can use this for android and protect the 99%". But if it then is basically invisible to the rest of the user base, it means that all those servers etc aren't getting the kind of protection they should have. Just to take that DCCP thing as an example: being a module wasn't what made it vulnerable. It would have been vulnerable compiled in too. What made it vulnerable was that the DCCP code simply isn't widely enough used and tested, and basically barely on life support. And it was available much too widely despite that. So all this is about is to make for a smaller attack surface. But if it turns out that we can make the attack surface smaller by simply white-listing a few modules that we know are actively used and feel better about the quality of, that makes for a much smaller attack surface _too_. And it does so in general, without having to set some flag that 99% of all MIS people won't even really know about. So that's why I want people to look at a different approach. Yes, the opt-in model means that by default nothing changes. That protects against the whole "oops, we don't break user space". But it has a huge downside. The model that I am a proponent of is to take a softer approach initially: don't forbid module loading (because that breaks users), but instead _warn_ about non-root module loading. And then we can start fixing the cases that we find. See? This is *exactly* the same thing that the user-mode access thing was about. Hardening people need to get over their "hard rules" mindset. We don't kill processes or forbid them from doing things that might be bad. We start by warning about them, to see what "might be bad" cases are actually normal, and not actually bad at all. And then we use that information to guide our notion of what should actually trigger a stronger response. Linus
On Tue, Nov 28, 2017 at 3:49 PM, Theodore Ts'o <tytso@mit.edu> wrote: > OK, but if the goal is to protect users who are running distro > kernels, then a kernel config that breaks some percentage of users is > ****highly**** unlikely to be enabled by Red Hat and SuSE, right? And > many of these users either can't (from a skills perspective) or won't > (because they lose all support from the enterprise distro's help desk) > recompile their kernel to enable an "might break 3% of all users --- > oh, well" config option. There's a spectrum across "insane not to be done everywhere" (STRICT_KERNEL_RWX), "this is a good idea for nearly all Linux systems" (link restrictions), "this can break some common use-cases, but protects systems without that use-case" (user-namespace disabling), "this breaks most systems, but specialized deployments really need it" (modules_disabled). There's also a difference between immutable CONFIG options that cannot be disabled at runtime, those that can, global sysctls, per-namespace controls, etc etc. The kernel is all about providing admins with knobs to tweak their performance and security. Suddenly being told that we can't create optional improvements is very odd. Now, being told "make it easier to audit all the module loading we're already doing so we can further reduce needless exposures for everyone even without this feature enabled", THAT makes sense. My point there is that we've already done those kinds of things; see commits like: 7f78e0351394 ("fs: Limit sys_mount to only request filesystem modules.") 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") 4943ba16bbc2 ("crypto: include crypto- module prefix in template") > Which argues for being extremely conservative about making something > that has an extremely low probability of breaking users, and it points > out why Geo Kozey's "who cares about breaking users; security is > IMPORTANT" argument is so wrong-headed. I don't want to break users either. I want to provide meaningful ways for admins to reduce their kernel attack surface. Djalal and I aren't advocating for on-by-default removal of module auto-loading (that would have been a very small patch!). The idea was to provide a dynamic control over it, and make that available to distros. As shown in the patch series, it would be immediately put to use with systemd for process-tree isolation and for container restriction. > If the goal is to protect distro kernels, but a sloppy approach > guarantees that distro kernels will never enable it, is it really > worth it? I don't think this is sloppy and of course distros would see use, since systemd would be using it. > P.S. This is where it might be useful to get some input from the Red > Hat and SuSE support teams. How many angry user calls to their help > desk are they willing to field before they'll just turn off the kernel > config option for their kernels? This isn't about giant-hammer CONFIGs -- this is like more like PR_SET_DUMPABLE or seccomp. -Kees
On Tue, Nov 28, 2017 at 4:17 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > But at the same time, if people can't even be bothered to try to > improve the general case, and only do things that you have to opt in > for, it really isn't much of an improvement. We had this whole Right, and I'm fine to help with those general improvements. As I mentioned in another reply, we've been steadily narrowing the exposures by keeping request_module() from being exposed to "all possible modules" in many of the places it's been exposed to userspace: 7f78e0351394 ("fs: Limit sys_mount to only request filesystem modules.") 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") 4943ba16bbc2 ("crypto: include crypto- module prefix in template") I don't want you to think this is an area where work hasn't already been done to try to improve things. > The model that I am a proponent of is to take a softer approach > initially: don't forbid module loading (because that breaks users), > but instead _warn_ about non-root module loading. And then we can > start fixing the cases that we find. I am totally fine with this. The question I'm hoping to have answered is, "then what?" We already have concrete examples of module autoloading that will still be need to stay unprivileged and as-is in the kernel (even if we remove others). What do you see as the way to allow an admin to turn those off? -Kees
On Tue, Nov 28, 2017 at 4:26 PM, Kees Cook <keescook@chromium.org> wrote: > >> The model that I am a proponent of is to take a softer approach >> initially: don't forbid module loading (because that breaks users), >> but instead _warn_ about non-root module loading. And then we can >> start fixing the cases that we find. > > I am totally fine with this. The question I'm hoping to have answered > is, "then what?" We already have concrete examples of module > autoloading that will still be need to stay unprivileged and as-is in > the kernel (even if we remove others). What do you see as the way to > allow an admin to turn those off? Just thinking about the DCCP case, where networking people actually knew it was pretty deprecated and had no real maintainer, I think one thing to look at would be simply a per-module flag. That kind of thing should be fairly easy to implement, along the same lines as the module license - it just sets a flag in the ELF section headers. With something like that, we literally could make the default be "no autoloading except for root", and then just mark the modules that we think are ok and well maintained. Sure, if you then do a lock-down mode that makes that flag parsing stricter, then that's a separate thing. But I suspect we definitely could be a lot stricter on a per-module basis, and do it in a way where a normal user wouldn't even notice that we've limited the autoloading. But the first step would be to just add some noise. And even with the per-module flag, at first it would only suppress the noise (ie we'd still _allow_ loading other modules, they'd just be noisy). Then, if nobody hollers, maybe the next kernel release we'll make it actually enforce the flag. Does that sound reasonable? Linus
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, Nov 28, 2017 at 4:26 PM, Kees Cook <keescook@chromium.org> wrote: >> >>> The model that I am a proponent of is to take a softer approach >>> initially: don't forbid module loading (because that breaks users), >>> but instead _warn_ about non-root module loading. And then we can >>> start fixing the cases that we find. >> >> I am totally fine with this. The question I'm hoping to have answered >> is, "then what?" We already have concrete examples of module >> autoloading that will still be need to stay unprivileged and as-is in >> the kernel (even if we remove others). What do you see as the way to >> allow an admin to turn those off? > > Just thinking about the DCCP case, where networking people actually > knew it was pretty deprecated and had no real maintainer, I think one > thing to look at would be simply a per-module flag. > > That kind of thing should be fairly easy to implement, along the same > lines as the module license - it just sets a flag in the ELF section > headers. > > With something like that, we literally could make the default be "no > autoloading except for root", and then just mark the modules that we > think are ok and well maintained. > > Sure, if you then do a lock-down mode that makes that flag parsing > stricter, then that's a separate thing. But I suspect we definitely > could be a lot stricter on a per-module basis, and do it in a way > where a normal user wouldn't even notice that we've limited the > autoloading. > > But the first step would be to just add some noise. And even with the > per-module flag, at first it would only suppress the noise (ie we'd > still _allow_ loading other modules, they'd just be noisy). Then, if > nobody hollers, maybe the next kernel release we'll make it actually > enforce the flag. > On a slight tangent to all of this. The issue of reducing attack surface has also come up in another thread and it was suggested there that we make some ns_capable calls conditionally capable calls so certain pieces of code won't be available in user namespaces, when we know there is a bug but don't yet have a good fix rolled out yet. Which again brings us to attack surface reduction. I was wondering if perhaps a better way to do some of that, might be to have places in the kernel where we could use something like ftrace to add a permission check at a well known functions boundaries and fail the functions when we want to reduce the attack surface. It is not as elegant as adding a maintenance status to a module and only allowing actively maintained modules to be auto-loaded. But perhaps with a few more eyes the idea can be fleshed out to something generally useful. Eric
On Tue, Nov 28, 2017 at 04:18:59PM -0800, Kees Cook wrote: > There's also a difference between immutable CONFIG options that cannot > be disabled at runtime, those that can, global sysctls, per-namespace > controls, etc etc. The kernel is all about providing admins with knobs > to tweak their performance and security. Suddenly being told that we > can't create optional improvements is very odd. I just think that tweakable knobs are mostly pointless. From my experience the number of sysadmins that adjust knobs is ***tiny***[1]. Put another way, the effort to determine whether tweaking a knob will result in breakages or will be safe is as much work as creating a white list of modules that are allowed to be loaded. [1] And I say that having providing a lot of knobs for ext4. :-) This is why some on the kernel-hardening list have argued for making the default to be opt-out, which means some users will be breaken (and their answer to that seems to be, "oh well --- gotta break some eggs to make an omlette". Sucks if you're one of the eggs, though.) And I don't see how systemd magically means no one will be broken. If you have a non-root process trying to invoke a line discpline which has to be loaded as a module, if you flip the switch, that process will be broken. How does using systemd make the problem go away? - Ted
> From: Theodore Ts'o <tytso@mit.edu> > Sent: Wed Nov 29 00:49:20 CET 2017 > Subject: Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules > > OK, but if the goal is to protect users who are running distro > kernels, then a kernel config that breaks some percentage of users is > ****highly**** unlikely to be enabled by Red Hat and SuSE, right? And > many of these users either can't (from a skills perspective) or won't > (because they lose all support from the enterprise distro's help desk) > recompile their kernel to enable an "might break 3% of all users --- > oh, well" config option. > > Which argues for being extremely conservative about making something > that has an extremely low probability of breaking users, and it points > out why Geo Kozey's "who cares about breaking users; security is > IMPORTANT" argument is so wrong-headed. > ---------------------------------------- My argument was the opposite - don't break anything by default. Nobody here wanted to break any users. Some of them may choose stricter option because it actually doesn't break anything relevant for them. That's all. It's the same as perf_event_paranoid, kptr_restrict, yama.ptrace_scope sysctls are working currently. We're talking about runtime, not compile time option. I doubt RH or SUSE prevent users from changing anything at runtime. G. K.
> From: Linus Torvalds <torvalds@linux-foundation.org> > Sent: Wed Nov 29 01:17:05 CET 2017 > To: Geo Kozey <geokozey@mailfence.com> > Subject: Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules > > > On Tue, Nov 28, 2017 at 3:51 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > So a patch that avoids breaking existing users, but also doesn't > > actually improve anything for existing users, simply shouldn't be part > > of the mainline kernel. > > Just to clarify: maybe it ends up being truly impossible to make the > default be more restrictive. The default certainly won't be the most > restrictive option. > > But at the same time, if people can't even be bothered to try to > improve the general case, and only do things that you have to opt in > for, it really isn't much of an improvement. We had this whole > "opt-in" discussion for another thread entirely, and it basically > didn't improve anything for anybody for the better half of a decade. > > Hardening that only works for special cases isn't hardening at all. It > will just mean that 99+% of all kernel developers won't see the > fallout at all. > > Yes, something like Android may be 99% of the devices, but it's a very > small portion of the core developer base because the hardware is all > locked down, and it's an even smaller portion of the usage patterns. > > So I can see some people say "We can use this for android and protect > the 99%". But if it then is basically invisible to the rest of the > user base, it means that all those servers etc aren't getting the kind > of protection they should have. > > Just to take that DCCP thing as an example: being a module wasn't what > made it vulnerable. It would have been vulnerable compiled in too. > What made it vulnerable was that the DCCP code simply isn't widely > enough used and tested, and basically barely on life support. And it > was available much too widely despite that. > > So all this is about is to make for a smaller attack surface. > > But if it turns out that we can make the attack surface smaller by > simply white-listing a few modules that we know are actively used and > feel better about the quality of, that makes for a much smaller attack > surface _too_. And it does so in general, without having to set some > flag that 99% of all MIS people won't even really know about. > > So that's why I want people to look at a different approach. Yes, the > opt-in model means that by default nothing changes. That protects > against the whole "oops, we don't break user space". But it has a huge > downside. > > The model that I am a proponent of is to take a softer approach > initially: don't forbid module loading (because that breaks users), > but instead _warn_ about non-root module loading. And then we can > start fixing the cases that we find. > > See? This is *exactly* the same thing that the user-mode access thing > was about. Hardening people need to get over their "hard rules" > mindset. We don't kill processes or forbid them from doing things that > might be bad. We start by warning about them, to see what "might be > bad" cases are actually normal, and not actually bad at all. And then > we use that information to guide our notion of what should actually > trigger a stronger response. > > Linus I followed your last week discussion and I agreed with your arguments so it's unfortunate for me that You put me on the opposite side. I agree that improving kernel code by finding and fixing existing bugs is the ultimate goal for security features. On the other hand the whole debugging process (find, report, wait for fix) takes a lot of time and work and is basically never ending story. That's why some kind of prevention which protects users in-between is important for users which priorities maximal security on their systems. I like your idea about printing warnings. What about: modules_autoload_mode =0 (default) _warn_ about non-root module loading and allow modules_autoload_mode =1 _warn_ about non-root module loading and deny This way we can start fixing found cases and also give users option for proactive defense at the same time without breaking general public. The end game (after a lot of fixes) would be to make 0 and 1 mostly indistinguishable from security perspective. Yours sincerely G. K.
On Tue, Nov 28, 2017 at 4:50 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Nov 28, 2017 at 4:26 PM, Kees Cook <keescook@chromium.org> wrote: >> >>> The model that I am a proponent of is to take a softer approach >>> initially: don't forbid module loading (because that breaks users), >>> but instead _warn_ about non-root module loading. And then we can >>> start fixing the cases that we find. >> >> I am totally fine with this. The question I'm hoping to have answered >> is, "then what?" We already have concrete examples of module >> autoloading that will still be need to stay unprivileged and as-is in >> the kernel (even if we remove others). What do you see as the way to >> allow an admin to turn those off? > > Just thinking about the DCCP case, where networking people actually > knew it was pretty deprecated and had no real maintainer, I think one > thing to look at would be simply a per-module flag. > > That kind of thing should be fairly easy to implement, along the same > lines as the module license - it just sets a flag in the ELF section > headers. > > With something like that, we literally could make the default be "no > autoloading except for root", and then just mark the modules that we > think are ok and well maintained. > > Sure, if you then do a lock-down mode that makes that flag parsing > stricter, then that's a separate thing. But I suspect we definitely > could be a lot stricter on a per-module basis, and do it in a way > where a normal user wouldn't even notice that we've limited the > autoloading. > > But the first step would be to just add some noise. And even with the > per-module flag, at first it would only suppress the noise (ie we'd > still _allow_ loading other modules, they'd just be noisy). Then, if > nobody hollers, maybe the next kernel release we'll make it actually > enforce the flag. > > Does that sound reasonable? Yeah, I think I see the way forward here; thanks for the discussion! -Kees
On Wed, Nov 29, 2017 at 10:30 AM, Kees Cook <keescook@chromium.org> wrote: > On Tue, Nov 28, 2017 at 4:50 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> Just thinking about the DCCP case, where networking people actually >> knew it was pretty deprecated and had no real maintainer, I think one >> thing to look at would be simply a per-module flag. >> [ ... ] >> Does that sound reasonable? > > Yeah, I think I see the way forward here; thanks for the discussion! Note: I don't want to really force that per-module flag if it ends up being painful, I was really just thinking that considering the DCCP case, it would be (I think) a really nice solution. In particular, request_module() ends up having that indirection through usermode-helper, which makes it potentially very inconvenient to store the "did the original caller have proper capabilities" and then check it at actual module load time. So the module flag is technically easy to add, and it's technically easy to read at module loading time, but I suspect that it's actually annoyingly hard to pass the original request_module() capability information around to where we actually read the module. That's why it might be _much_ easier to try to do it per call-site instead. It's not quite as fine-grained (because several call-sites do things like request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol); that can load a large number of different modules), but if we can get away with just saying "this particular callsite is ok, because it only loads the bluetooth module that we thing is trustworthy" or "this call-site is ok, because you already had to have access to the device node", that is going to be much simpler and more straightforward. In other words: I think there are at least two different models to go after, and there may be practical reasons to prefer one over the other. Linus
On Wed, Nov 29, 2017 at 10:46 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So the module flag is technically easy to add, and it's technically > easy to read at module loading time, but I suspect that it's actually > annoyingly hard to pass the original request_module() capability > information around to where we actually read the module. One possibly interesting approach would be to run the usermode helper not as root, but with the credentials of the request_module() caller. That's arguably the right thing to do (in that request_module() would never do anything that the user wouldn't be able to do on their own) and probably what we should have done originally, but while it feels like a nice solution I suspect it would break pretty much every distro out there. Because they all expect modprobe/kmod to be called as root in the original init-namespace. Linus
On Wed, Nov 29, 2017 at 10:46 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Nov 29, 2017 at 10:30 AM, Kees Cook <keescook@chromium.org> wrote: >> On Tue, Nov 28, 2017 at 4:50 PM, Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >>> >>> Just thinking about the DCCP case, where networking people actually >>> knew it was pretty deprecated and had no real maintainer, I think one >>> thing to look at would be simply a per-module flag. >>> [ ... ] >>> Does that sound reasonable? >> >> Yeah, I think I see the way forward here; thanks for the discussion! > > Note: I don't want to really force that per-module flag if it ends up > being painful, I was really just thinking that considering the DCCP > case, it would be (I think) a really nice solution. > > In particular, request_module() ends up having that indirection > through usermode-helper, which makes it potentially very inconvenient > to store the "did the original caller have proper capabilities" and > then check it at actual module load time. Yeah, for this it seems like the permission _evaluation_ would happen at request_module() time, and it could pass that as a module argument to the usermode-helper, so that during load_module(), it would have all the needed information to take an action: - was this load considered privileged? - is this module marked privileged-only? - what are we configured to do in the case of mismatches? > So the module flag is technically easy to add, and it's technically > easy to read at module loading time, but I suspect that it's actually > annoyingly hard to pass the original request_module() capability > information around to where we actually read the module. The clever solution grsecurity uses for this is to pass information as module loading arguments (mentioned above) in call_modprobe(), and then read them back in load_module(). This seemed like overkill to me, but it does allow for delaying the decision on the action. I think this would only be needed to deal with per-module markings, though, since all other cases the evaluation and action could happen at request_module() time. (More on this below.) > That's why it might be _much_ easier to try to do it per call-site > instead. It's not quite as fine-grained (because several call-sites do > things like > > request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol); > > that can load a large number of different modules), but if we can get > away with just saying "this particular callsite is ok, because it only > loads the bluetooth module that we thing is trustworthy" or "this > call-site is ok, because you already had to have access to the device > node", that is going to be much simpler and more straightforward. This is basically what we've historically already tried to do: limit request_module() calls to either one or a set of const strings, or to load with a format template. Removing all the request_module(whatever_the_user_says) calls has (hopefully) been completed, though sometimes weird stuff sneaks by. The difficulty I see here is that it's not always clear which are expected to be privileged loads or not. Even an analysis[1] of whether or not request_module() is using a format template doesn't tell us much. Is it an unprivileged net module like your example above or is it some privileged device hotplug event like request_module("ulpi:v%04xp%04x", ulpi->id.vendor, ulpi->id.product)? > In other words: I think there are at least two different models to go > after, and there may be practical reasons to prefer one over the > other. I would agree, and it's why I thought the original series was good: I feel like we've already done the harder work of narrowing request_module() calls by subsystem prefixes over the years, and separating privilege with a new API didn't make sense to me. The privileges are a state to check in the existing API, etc. But as I said, if there's a way forward you're happy with, let's do it. It just needs to get to the point where an admin can explicitly disable unprivileged module autoloading with some appropriate level of granularity. [email merging your later reply...] > One possibly interesting approach would be to run the usermode helper > not as root, but with the credentials of the request_module() caller. > > That's arguably the right thing to do (in that request_module() would > never do anything that the user wouldn't be able to do on their own) > and probably what we should have done originally, but while it feels > like a nice solution I suspect it would break pretty much every distro > out there. > > Because they all expect modprobe/kmod to be called as root in the > original init-namespace. Dropping to the request_module() privileges for the usermode helper would absolutely break the world. It would actually be much more disruptive than the proposed series. :) One of the (many) problems with module loading is that it changes the global features available to the kernel: suddenly all existing processes have access to whatever feature just got loaded (modulo a syscall path to touch the code). This is a rather ugly from the perspective of containers: unprivileged module loading can change the kernel globally for all containers, the load gets triggered to run in the init userspace, not the container, etc, etc. (Though loading from the container would be nonsense too.) So, what we have now is that the permission verification already happens at and around the existing request_module() calls. Callers are already checking caps within containers, for example, or isolating which things can be loaded with subsystem format prefixes, etc. All the loading logic is being done around the request_module() calls, and once this is deemed "safe", we explicitly elevate privileges to init-ns CAP_SYS_MODULE and load the module by way of the usermode helper's call to finit_module(). As in, we have an intentional CAP_SYS_MODULE privilege elevation for doing module loading. What I like about the proposed request_modue_cap() is that is codifies the specific privilege change "having this cap allows you to gain CAP_SYS_MODULE to load this possible prefixed subset of modules", instead of having that logic open-coded near the request_module() call site. And we can add more of these explicit kinds of calls, but it won't change the need for a way to say "no privilege elevations should be allowed". It still sounds like you'd like to see an explicit change, similar to the proposed request_module_cap(), that identifies the privilege expectations on a per-call-site basis. How about this plan: 1) Add request_module_cap(required_cap, module_name_prefix, fmt, fmt_args...) 2) Convert known privileged-but-not-CAP_SYS_MODULE request_module() callers to request_module_cap(the_needed_cap, prefix, ...) 2) Convert known unprivileged callers to use request_module_cap(0, ...) 3) Add WARN_RATELIMIT for request_module() calls without CAP_SYS_MODULE to shake out other places where request_module_cap() is needed. 4) Adapt the original patch series to add the per-process flag that can block privilege elevations. This will get us the improvement of removing any unexpectedly unprivileged request_module() call sites for everyone, and still provide the optional process flag to mark a process as not allowed to make change its effective privilege to load a module. Does this sound workable? -Kees [1] Just some greps to see the extent of request_module() use... We've got a bit over 400 call sites: $ git grep request_module'\b' | wc -l 438 About a third aren't using a format string, so they're loading either a specific module or a specific alias: $ git grep request_module'\b' | grep -v % | grep '"' | wc -l 160 Another third use format strings to load some aliased subset: $ git grep request_module'\b' | grep % | wc -l 154 The rest construct their strings manually, making them less easy to trivially examine.
On Wed, Nov 29, 2017 at 1:17 PM, Kees Cook <keescook@chromium.org> wrote: > > So, what we have now is that the permission verification already > happens at and around the existing request_module() calls. Usually, yes. I liked the "request_module_cap()" interface partly because that made the net/core/dev_ioctl.c ones more explicit, and maybe it could be convenient if we make other places do similar things. I was hoping some other users could be converted, but grepping around, there's no obvious cases. There is tcp_cong.c and tcp_ulp.c, but they want some extra locking in between the checking.. > It still sounds like you'd like to see an explicit change, similar to > the proposed request_module_cap(), that identifies the privilege > expectations on a per-call-site basis. How about this plan: Yes. I'd be perfectly happy to have a long-range plan where the existing "request_module()" ends up requiring more capabilities. I just don't think it's a good first step, exactly because *if* it's a first step, it basically has to be disabled by default. And once you disable it by default, and it becomes purely opt-in, that means that nothing will change for most cases. Some embedded people that do their own thing (ie Android) might change, but normal distributions probably won't. Yes, Android may be 99% of the users, and yes, the embedded world in general needs to be secure, but I'd still like this to be something that helps _everybody_. So: > 1) Add request_module_cap(required_cap, module_name_prefix, fmt, fmt_args...) > > 2) Convert known privileged-but-not-CAP_SYS_MODULE request_module() > callers to request_module_cap(the_needed_cap, prefix, ...) Yes. The upside seems to be very limited here, but at least it makes the users that use CAP_NET_ADMIN instead of CAP_SYS_MODULE able to specify so. > 2) Convert known unprivileged callers to use request_module_cap(0, ...) 0 is CAP_CHOWN, so it would have to be -1. And I wouldn't actually want to see that as-is. Not only would I not want to see people have that "-1" in random driver subsystems, I'd much prefer to have actual helper naming that descibes why something is ok Because as mentioned, I think there are valid permission reasons that are _not_ about capabilities that make you able to load a module. If you can open a character device node, then "misc_open()" will do request_module("char-major-%d-%d", MISC_MAJOR, minor); and there is nothing about capabilities in the CAP_SYS_MODULE sense about the user. But the user _did_ have the privileges to open that character device file. That's why I suggested something like request_module_dev(): it's not at all the same thing as request_module_cap(-1, ...), saying "I don't need/have a capability". It's saying something else entirely, it's basically saying "I have the right based on device permissions". And something like request_module_dev() might even have real semantic meaning, exactly because it says "this module request comes from trying to open a device node". Why would that be? If we know we're on a system where /dev is auto-populated through udev, then the device nodes should have been created by the drivers, not the other way around. So we might even have a rule that notices that automatically, and simply disables request_module_dev() entirely. Anyway, I'm not saying that is necessarily something we should do, but I do suspect that we could adapt to modern systems without having to have tons of magic settings, and try to be as strict as possible without breaking them. Because I dislike "system tuning" in general. I hate knobs that do kernel performance tuning - we try very hard to just DTRT wrt sizing hashes etc instead of expecting the system admin to set flags. And I think we can try to avoid some system tuning in this area too. I suspect that for a lot of our existing request_module() cases, they really are pretty trivial. In most cases, it's probably really about whether you have the hardware or not. So for the hardware driver cases, either the hardware enumerates itself, or it is presumably set up by the system scripts anyway, and CAP_SYS_MODULE is all fine. The "open device node" case is one special case, though. That mainly leaves the protocol ones we need to look out for, I suspect. > 3) Add WARN_RATELIMIT for request_module() calls without > CAP_SYS_MODULE to shake out other places where request_module_cap() is > needed. Yes. And this is where I hope that there really aren't actually all that many cases that will warn, and that it's hopefully easy to simply just look at a handful of reports and say "ok, that case is obviously fine". And I may be wrong. > 4) Adapt the original patch series to add the per-process flag that > can block privilege elevations. Yes. Linus
On Wed, Nov 29, 2017 at 2:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Nov 29, 2017 at 1:17 PM, Kees Cook <keescook@chromium.org> wrote: >> 1) Add request_module_cap(required_cap, module_name_prefix, fmt, fmt_args...) >> >> 2) Convert known privileged-but-not-CAP_SYS_MODULE request_module() >> callers to request_module_cap(the_needed_cap, prefix, ...) > > Yes. The upside seems to be very limited here, but at least it makes > the users that use CAP_NET_ADMIN instead of CAP_SYS_MODULE able to > specify so. Yeah, agreed; it's limited in scope. >> 2) Convert known unprivileged callers to use request_module_cap(0, ...) > > 0 is CAP_CHOWN, so it would have to be -1. Oops, yes, think-o. > And I wouldn't actually want to see that as-is. Not only would I not > want to see people have that "-1" in random driver subsystems, I'd > much prefer to have actual helper naming that descibes why something > is ok So, if some catch-all is needed, maybe request_module_unpriv(...), but I think it should be possible to identify SOME kind of defining privilege to check. > Because as mentioned, I think there are valid permission reasons that > are _not_ about capabilities that make you able to load a module. > > If you can open a character device node, then "misc_open()" will do > > request_module("char-major-%d-%d", MISC_MAJOR, minor); > > and there is nothing about capabilities in the CAP_SYS_MODULE sense > about the user. But the user _did_ have the privileges to open that > character device file. > > That's why I suggested something like request_module_dev(): it's not > at all the same thing as request_module_cap(-1, ...), saying "I don't > need/have a capability". It's saying something else entirely, it's > basically saying "I have the right based on device permissions". > > And something like request_module_dev() might even have real semantic > meaning, exactly because it says "this module request comes from > trying to open a device node". I just want to make sure I'm visualizing this correctly. Using the misc_open() example, you're thinking something like: request_module_dev(file, "char-major-%d-%d", MISC_MAJOR, minor); which would then do a verification that file->f_cred matched current_cred()? (For misc_open(), this is obviously going to always be okay, but other cases may be further from the fops open handler...) > Why would that be? If we know we're on a system where /dev is > auto-populated through udev, then the device nodes should have been > created by the drivers, not the other way around. So we might even > have a rule that notices that automatically, and simply disables > request_module_dev() entirely. Right, we have both directions -- hotplug module loading ("I saw this PCI alias, load something! Oh! It needs some fancy crypto too!") and on-use module loading. It seems like on-use module loading would get covered by request_module_dev(), and the hotplug case would always be privileged. Anyway... exploring this with real code is clearly warranted. > I suspect that for a lot of our existing request_module() cases, they > really are pretty trivial. In most cases, it's probably really about > whether you have the hardware or not. Right, and those would all be privileged, etc. > So for the hardware driver cases, either the hardware enumerates > itself, or it is presumably set up by the system scripts anyway, and > CAP_SYS_MODULE is all fine. The "open device node" case is one special > case, though. Yeah, going through the call sites and asking "where does the privilege for loading this module derive from?" for each one will help shape the resulting API changes. > That mainly leaves the protocol ones we need to look out for, I suspect. This is where a lot of the exposure really comes from. socket() triggers a bunch of stuff, but doesn't have an obvious privilege associated with it... while it already does the name templates, maybe add request_module_socket() just to explicitly mark it? >> 3) Add WARN_RATELIMIT for request_module() calls without >> CAP_SYS_MODULE to shake out other places where request_module_cap() is >> needed. > > Yes. > > And this is where I hope that there really aren't actually all that > many cases that will warn, and that it's hopefully easy to simply just > look at a handful of reports and say "ok, that case is obviously > fine". > > And I may be wrong. > >> 4) Adapt the original patch series to add the per-process flag that >> can block privilege elevations. > > Yes. Okay, excellent. Hopefully we haven't scared off Djalal, and hopefully Jessica doesn't think this is all insane. :) Thanks! -Kees
On Wed, Nov 29, 2017 at 4:44 PM, Kees Cook <keescook@chromium.org> wrote: > >> That mainly leaves the protocol ones we need to look out for, I suspect. > > This is where a lot of the exposure really comes from. socket() > triggers a bunch of stuff, but doesn't have an obvious privilege > associated with it... while it already does the name templates, maybe > add request_module_socket() just to explicitly mark it? .. and this is where I'd expect that maybe we'd need some hackery. Even including some ad-hoc rules like "this module is actually maintained", possibly even with some /sys interface to extend/reduce that set. But maybe it's not even that bad. Linus
> And once you disable it by default, and it becomes purely opt-in, that > means that nothing will change for most cases. Some embedded people > that do their own thing (ie Android) might change, but normal > distributions probably won't. > > Yes, Android may be 99% of the users, and yes, the embedded world in > general needs to be secure, but I'd still like this to be something > that helps _everybody_. Android devices won't get much benefit since they ship a tiny set of modules chosen for the device. The kernels already get very stripped down to the bare minimum vs. enabling every feature and driver available and shipping it all by default on a traditional distribution. Lots of potential module attack surface also gets eliminated by default via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl commands, etc. The global seccomp whitelist might be relevant in some cases too. Android devices like to build everything into the kernel too, so even if they weren't using a module this feature wouldn't usually help them. It would need to work like this existing sysctl: net.ipv4.tcp_available_congestion_control = cubic reno lp i.e. whitelists for functionality offered by the modules, not just whether they can be loaded.
On Thu, Nov 30, 2017 at 7:51 AM, Daniel Micay <danielmicay@gmail.com> wrote: [...] > Lots of potential module attack surface also gets eliminated by default > via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl > commands, etc. The global seccomp whitelist might be relevant in some > cases too. In embedded systems we can't maintain a SELinux policy, distro man power hardly manage. We have abstracted seccomp etc, but the kernel inherited the difficult multiplex things, plus all other paths that trigger this. > Android devices like to build everything into the kernel too, so even if > they weren't using a module this feature wouldn't usually help them. It > would need to work like this existing sysctl: > > net.ipv4.tcp_available_congestion_control = cubic reno lp > > i.e. whitelists for functionality offered by the modules, not just > whether they can be loaded. Yes, but it is hard to maintain a whitelist policy, the code is hardly maintained... if you include everything you should have an LSM policy or something like that, and compiling kernels is expert thing. Otherwise IMHO the kernel should provide default secure behaviour on how to load or add new functionality to the running one. From a user perspective, a switch "yes/no" that a privileged entity will *understand* and assume is what should be there, and the switch or flag as discussed here is local to processes, the sysctl will be removed. IMO it should come from userspace point of view, cause as an example the sysctl: net.ipv4.tcp_available_congestion_control = cubic reno lp Is kernel thing, too technical, userspace developers, admins or privileged entity will not understand what cubic or reno mean. Doing the same per functionality directly like this seems to much of a burden compared to the use case. The kernel maybe can do this to advance the art of the networking stack and for advanced cases, but in IMHO a sane default behaviour + an abstracted process/sandbox flag "yes/no" for most others, userspace developers and humans is what should be provided and we need the kernel to help here. It seems that Linus and kees agreed on this direction which allows me to follow up. Thanks!
On Thu, Nov 30, 2017 at 09:50:27AM +0100, Djalal Harouni wrote: > In embedded systems we can't maintain a SELinux policy, distro man > power hardly manage. We have abstracted seccomp etc, but the kernel > inherited the difficult multiplex things, plus all other paths that > trigger this..... > Yes, but it is hard to maintain a whitelist policy, the code is hardly > maintained... So this is the part that scares me to death about IOT, and why I tell everyone to ***never*** trust an IOT device on their home network, and ***never*** trust it with anything you don't mind splattered all over the front page of NY Times and RT / Sputnick news. You're saying that you want to use modules (as opposed to compile everything tightly down to just what you need for the embedded system); that the code is "hardly maintained". And yet we're supposed to consider it trustworthy? If that's the case, turning off implicit module loading sounds and thinking that this will somehow be a magic wand sounds.... crazy. - Ted
On Thu, Nov 30, 2017 at 3:16 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Thu, Nov 30, 2017 at 09:50:27AM +0100, Djalal Harouni wrote: >> In embedded systems we can't maintain a SELinux policy, distro man >> power hardly manage. We have abstracted seccomp etc, but the kernel >> inherited the difficult multiplex things, plus all other paths that >> trigger this..... > >> Yes, but it is hard to maintain a whitelist policy, the code is hardly >> maintained... > > So this is the part that scares me to death about IOT, and why I tell > everyone to ***never*** trust an IOT device on their home network, and > ***never*** trust it with anything you don't mind splattered all over > the front page of NY Times and RT / Sputnick news. Yes. For your pleasure: https://techcrunch.com/2017/04/25/brickerbot-is-a-vigilante-worm-that-destroys-insecure-iot-devices/ bricked million of devices to stupid busybox remote port. https://en.wikipedia.org/wiki/Mirai_(malware) an other million bots used to disturb netflix, twitter and others I don't know the details. ... > You're saying that you want to use modules (as opposed to compile > everything tightly down to just what you need for the embedded > system); that the code is "hardly maintained". And yet we're supposed > to consider it trustworthy? I didn't say that. > If that's the case, turning off implicit module loading sounds and > thinking that this will somehow be a magic wand sounds.... crazy. The product costs decide, web developers, javascript, big data analysis, electronic engineers all want to use Linux for IoT prototype and sell in some months, they will get any kernel+userspace add their value on top and sell. It will be non-sense to think that if a web developer wants to sell a node.js app as an IoT he has to compile a kernel and do all the other stuff, they all re-use the same layer the same config for everything. Requiring for everyone to compile its own kernel does not make much sense. Default safe behaviour is what we should do. Thanks! > - Ted
It was suggested that the feature would only be adopted in niches like Android and I pointed out that it's not really relevant to Android. It's a waste of time to try convincing me that it's useful elsewhere. I never said or implied that it wasn't. On Thu, 2017-11-30 at 09:50 +0100, Djalal Harouni wrote: > On Thu, Nov 30, 2017 at 7:51 AM, Daniel Micay <danielmicay@gmail.com> > wrote: > [...] > > Lots of potential module attack surface also gets eliminated by > > default > > via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl > > commands, etc. The global seccomp whitelist might be relevant in > > some > > cases too. > > In embedded systems we can't maintain a SELinux policy, distro man > power hardly manage. We have abstracted seccomp etc, but the kernel > inherited the difficult multiplex things, plus all other paths that > trigger this. It's cheaper to use an existing system like Android Things where device makers only need to make their apps and perhaps some userspace hardware drivers for cases not covered by mainline kernel drivers. I don't think it makes sense for every device vendor to manage an OS and I seriously doubt that's how the ecosystem is going to end up as it matures. > > Android devices like to build everything into the kernel too, so > > even if > > they weren't using a module this feature wouldn't usually help them. > > It > > would need to work like this existing sysctl: > > > > net.ipv4.tcp_available_congestion_control = cubic reno lp > > > > i.e. whitelists for functionality offered by the modules, not just > > whether they can be loaded. > > Yes, but it is hard to maintain a whitelist policy, the code is hardly > maintained... if you include everything you should have an LSM policy > or something like that, and compiling kernels is expert thing. I'm not talking about whitelist vs. blacklist, compiling kernels or anything like that. > Otherwise IMHO the kernel should provide default secure behaviour on > how to load or add new functionality to the running one. From a user > perspective, a switch "yes/no" that a privileged entity will > *understand* and assume is what should be there, and the switch or > flag as discussed here is local to processes, the sysctl will be > removed. IMO it should come from userspace point of view, cause as an > example the sysctl: > > net.ipv4.tcp_available_congestion_control = cubic reno lp > > Is kernel thing, too technical, userspace developers, admins or > privileged entity will not understand what cubic or reno mean. Congestion control algorithms are being used as an example in terms of the mechanism being used to control which are available to unprivileged users. The obscurity of congestion control algorithms is irrelevant. > Doing > the same per functionality directly like this seems to much of a > burden compared to the use case. The kernel maybe can do this to > advance the art of the networking stack and for advanced cases, but in > IMHO a sane default behaviour + an abstracted process/sandbox flag > "yes/no" for most others, userspace developers and humans is what > should be provided and we need the kernel to help here. There are cases where unprivileged module auto-loading is relied upon like network protocols. Having configuration for which protocols can be used by unprivileged users is superior to limiting only which ones can be auto-loaded. That's why I bought up the existing congestion control knob. It works well in terms of having a whitelist of the sane, widely used cases with exposing anything obscure requiring configuration. They happen to be implemented as modules too. Killing off unprivileged module loading other than a few cases like that makes sense, and then those can provide similar control with similarly sane defaults.
On Tue, Nov 28, 2017 at 06:49:20PM -0500, Theodore Ts'o wrote: > On Tue, Nov 28, 2017 at 03:29:01PM -0800, Kees Cook wrote: > > > So in these two cases, if the kernel was built w/o modules, and HDLC > > > and DCCP was built-in, you'd be screwed, then? > > > > Sure, but that's not the common situation. > > > > > Is the goal here to protect people using distro kernels which build > > > the world as modules, including dodgy pieces of kernel code that are > > > bug-ridden? > > > > The bulk of the risk comes from distro kernels, yes. (Though "bug > > ridden" is a strong statement. There are and will be bugs, scattered > > across a wide portion of the kernel, it's just that modules tend to > > cover most of that attack surface.) > > OK, but if the goal is to protect users who are running distro > kernels, then a kernel config that breaks some percentage of users is > ****highly**** unlikely to be enabled by Red Hat and SuSE, right? And > many of these users either can't (from a skills perspective) or won't > (because they lose all support from the enterprise distro's help desk) > recompile their kernel to enable an "might break 3% of all users --- > oh, well" config option. Yes, breaking customers is not seen lightly. I also (not related to this thread here, more to SLAB hardening et.al) have a hard time getting performance losses caused by hardening features approved. > Which argues for being extremely conservative about making something > that has an extremely low probability of breaking users, and it points > out why Geo Kozey's "who cares about breaking users; security is > IMPORTANT" argument is so wrong-headed. > > If the goal is to protect distro kernels, but a sloppy approach > guarantees that distro kernels will never enable it, is it really > worth it? > > - Ted > > P.S. This is where it might be useful to get some input from the Red > Hat and SuSE support teams. How many angry user calls to their help > desk are they willing to field before they'll just turn off the kernel > config option for their kernels? Speaking for SUSE ... If something that worked for people before and it breaks, we do get feedback. If no one used it however, we won't. For our last major product we went over the network module list and disabled some for building. e.g. DCCP is no longer built. We did not receive any complaints about missing DCCP to my knowledge. We also seperate our modules into "regular supported" and "unsupported" in different RPMs. The "unsupported" module packages are not shipped on the Server product. They were shipped on the desktop as some of the WiFi drivers were requested by customers but were considered not supportable. We do review this supportable list between kernel version jumps. Ciao, Marcus
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 7e690d0..fdd8560 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -382,8 +382,10 @@ void dev_load(struct net *net, const char *name) rcu_read_unlock(); no_module = !dev; + /* "netdev-%s" modules are allowed if CAP_NET_ADMIN is set */ if (no_module && capable(CAP_NET_ADMIN)) - no_module = request_module("netdev-%s", name); + no_module = request_module_cap(CAP_NET_ADMIN, "netdev", + "%s", name); if (no_module && capable(CAP_SYS_MODULE)) request_module("%s", name); }
This uses the new request_module_cap() facility to directly propagate CAP_NET_ADMIN capability and the 'netdev' module prefix to the capability subsystem as it was suggested. We do not remove the explicit capable(CAP_NET_ADMIN) check here, but we may remove it in future versions since it is also performed by the capability subsystem. This allows to have a better interface where other subsystems will just use this call and let the capability subsystem handles the permission checks, if the modules should be loaded or not. This is also an infrastructure fix since historically Linux always allowed to auto-load modules without privileges, and later the net code started to check capabilities and prefixes, adapted the CAP_NET_ADMIN check with the 'netdev' prefix to prevent abusing the capability by loading non-netdev modules. However from a bigger picture we want to continue to support automatic module loading as non privileged but also implement easy policy solutions like: User=djalal DenyNewFeatures=no Which will translate to allow the interactive user djalal to load extra Linux features. Others, volatile accounts or guests can be easily blocked from doing so. We have introduced in previous patches the necessary infrastructure and now with this change we start to use the new request_module_cap() function to explicitly tell the capability subsystem that we want to auto-load modules with CAP_NET_ADMIN if they are prefixed. This is also based on suggestions from Rusty Russel and Kees Cook [1] [1] https://lkml.org/lkml/2017/4/26/735 Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: James Morris <james.l.morris@oracle.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Solar Designer <solar@openwall.com> Cc: Andy Lutomirski <luto@kernel.org> Suggested-by: Rusty Russell <rusty@rustcorp.com.au> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Djalal Harouni <tixxdz@gmail.com> --- net/core/dev_ioctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)