Message ID | CAHC9VhQcxm=Zhe2XEesx3UsBgr8H6H=BtJc92roqeF8o+DK+XQ@mail.gmail.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Paul Moore |
Headers | show |
Series | [GIT,PULL] SELinux fixes for v5.15 (#1) | expand |
On Thu, Sep 16, 2021 at 9:14 PM Paul Moore <paul@paul-moore.com> wrote: > > Hi Linus, > > A single patch to address some issues with the incorrect subject being > used in some of the SELinux lockdown access controls. You saw, and > joined the discussion, earlier versions of this patch that included > the related BPF changes; the BPF changes have already been merged, > this patch has all the remainders. Beyond that, the commit > description is pretty good so if you are interested in more detail I > would suggest reading that first. > > Please merge for the next v5.15-rcX release, thank you. > -Paul I wanted to check in on this PR to see if you were planning on merging it for v5.15-rcX, kicking it back for -next instead, or simply glaring at it with quiet disgust? > -- > The following changes since commit 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f: > > Linux 5.15-rc1 (2021-09-12 16:28:37 -0700) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git > tags/selinux-pr-20210916 > > for you to fetch changes up to fdc9cbff7a764513a5e72a03b796087fcadb2fa3: > > lockdown,selinux: fix wrong subject in some SELinux lockdown checks > (2021-09-16 21:04:44 -0400) > > ---------------------------------------------------------------- > selinux/stable-5.15 PR 20210916 > > ---------------------------------------------------------------- > Ondrej Mosnacek (1): > lockdown,selinux: fix wrong subject in some SELinux lockdown checks > > arch/powerpc/xmon/xmon.c | 4 ++-- > arch/x86/kernel/ioport.c | 4 ++-- > arch/x86/kernel/msr.c | 4 ++-- > arch/x86/mm/testmmiotrace.c | 2 +- > drivers/acpi/acpi_configfs.c | 2 +- > drivers/acpi/custom_method.c | 2 +- > drivers/acpi/osl.c | 3 ++- > drivers/acpi/tables.c | 2 +- > drivers/char/mem.c | 2 +- > drivers/cxl/pci.c | 2 +- > drivers/firmware/efi/efi.c | 2 +- > drivers/firmware/efi/test/efi_test.c | 2 +- > drivers/pci/pci-sysfs.c | 6 +++--- > drivers/pci/proc.c | 6 +++--- > drivers/pci/syscall.c | 2 +- > drivers/pcmcia/cistpl.c | 2 +- > drivers/tty/serial/serial_core.c | 2 +- > fs/debugfs/file.c | 2 +- > fs/debugfs/inode.c | 2 +- > fs/proc/kcore.c | 2 +- > fs/tracefs/inode.c | 2 +- > include/linux/lsm_hook_defs.h | 2 +- > include/linux/lsm_hooks.h | 1 + > include/linux/security.h | 5 +++-- > kernel/bpf/helpers.c | 10 ++++++---- > kernel/events/core.c | 2 +- > kernel/kexec.c | 2 +- > kernel/kexec_file.c | 2 +- > kernel/module.c | 2 +- > kernel/params.c | 2 +- > kernel/power/hibernate.c | 2 +- > kernel/trace/bpf_trace.c | 25 +++++++++++++++---------- > kernel/trace/ftrace.c | 4 ++-- > kernel/trace/ring_buffer.c | 2 +- > kernel/trace/trace.c | 10 +++++----- > kernel/trace/trace_events.c | 2 +- > kernel/trace/trace_events_hist.c | 4 ++-- > kernel/trace/trace_events_synth.c | 2 +- > kernel/trace/trace_events_trigger.c | 2 +- > kernel/trace/trace_kprobe.c | 6 +++--- > kernel/trace/trace_printk.c | 2 +- > kernel/trace/trace_stack.c | 2 +- > kernel/trace/trace_stat.c | 2 +- > kernel/trace/trace_uprobe.c | 4 ++-- > net/xfrm/xfrm_user.c | 11 +++++++++-- > security/lockdown/lockdown.c | 3 ++- > security/security.c | 4 ++-- > security/selinux/hooks.c | 7 +++++-- > 48 files changed, 100 insertions(+), 79 deletions(-)
On Wed, Sep 22, 2021 at 10:57 AM Paul Moore <paul@paul-moore.com> wrote: > > I wanted to check in on this PR to see if you were planning on merging > it for v5.15-rcX, kicking it back for -next instead, or simply glaring > at it with quiet disgust? Heh. I glanced at it with quiet disgust when it came in, and then it just got lost. But now I'm looking at it again since you reminded me, and I don't understand why it has to be done in such an ugly manner. And I also don't think the thing makes a whit of sense in the first place. Honestly, that whole "lockdown depends on current creds" makes NO SENSE at all to me. The whole and only point about lockdown is that it's locked down. Not that "oh, root can do X". That's against the point. The whole and only point of lockdown was as a global thing. Now somebody seems to have noticed that they violated that basic rule, and that it's about permissions after all, and it's just a complete mess. But why the heck would a normal lockdown user have to care about this fundamental design mistake? Why should a normal lockdown user pass in a credential that doesn't make sense? Since 99% of all users DON'T want special rules, why isn't the normal security_locked_down() kept the way it is? Make the few special cases do special things, in other words. This is *literally* why we have all those wrapper functions in security/security.c - so that people can do sane interfaces and not call down to the raw hooks. (Yeah, yeah, a lot of them do nothing but pass it down, but others do other sanity stuff so that callers don't have to do pointless boilerplate) IOW, why is this changing all the normal users that *really* don't want it, and that really have absolutely no business looking up the current creds? Make the regular security_locked_down() function do that, and add a if (WARN_ON_ONCE(!in_task())) return -EPERM; so that any bad cases get flagged and refuse to continue. And then the couple of special users (whether due to interrupt context or whatever), could get their own wrapper function. Note how that would (a) make the patch smaller (b) not pollute normal users pointlessly (c) actually be a kind of documentation too (d) not make the default lockdown testing function be senseless because right now you have absolutely no explanation in the crazy cases, so y9ou have code like this: lockdown = !!security_locked_down(NULL, LOCKDOWN_XMON_RW); in xmon_is_locked_down(), and it makes no sense to anybody. Why the NULL? What is going on? Yes, yes, I can tell why the NULL, and what is going on, because I read the commit message and it's in my context. But look at that line, and tell me that it makes sense as code to anybody who has paged out that context (or never had it in the first place). If it said "security_globally_locked_down()" maybe it would at least give a hint about what's going on. But the other side of the argument is that the *common* lockdown functions are insane too after the patch. In kernel/module.c, this line: return security_locked_down(current_cred(), LOCKDOWN_MODULE_SIGNATURE); really screams "lockdown is fundamentally broken and mis-designed". Seriously. If lockdown needs "current creds" then lockdown is wrong. I was unhappy about lockdown from before, so maybe I'm more likely to just reject this kind of garbage, but this really looks completely broken to me. Hmm? This lockdown stuff is some ugly random code to begin with, I think the interfaces need to make SENSE. Passing in credentials fundamentally does not make sense to me. Linus
On Wed, Sep 22, 2021 at 1:55 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Make the regular security_locked_down() function do that, and add a > > if (WARN_ON_ONCE(!in_task())) > return -EPERM; > > so that any bad cases get flagged and refuse to continue. Actually, no, I take that back. It's not the "!in_task()" case that is the problem. That's just the symptom. The real problem is that we clearly have some lock-down rule that seems to care about credentials and who it is that does the lockdown query. That seems to be the real issue here. Doing lockdown checks from interrupts should be fine. The security layer really seems to be doing odd and random things. Maybe we should have some debug switch with the above WARN_ON() inside current_cred() (and a number of other 'current' users too). Just to see if there are other cases where people look up basically random credentials. When the security layer is confused, that's not a good sign. Linus
On Wed, Sep 22, 2021 at 5:10 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, Sep 22, 2021 at 1:55 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Make the regular security_locked_down() function do that, and add a > > > > if (WARN_ON_ONCE(!in_task())) > > return -EPERM; > > > > so that any bad cases get flagged and refuse to continue. > > Actually, no, I take that back. > > It's not the "!in_task()" case that is the problem. That's just the symptom. > > The real problem is that we clearly have some lock-down rule that > seems to care about credentials and who it is that does the lockdown > query. That seems to be the real issue here. Doing lockdown checks > from interrupts should be fine. The basic idea, or problem from a LSM point of view, is that in some cases you have a user task which is doing the lockdown access check and in others you have the kernel itself; the creds parameter to security_locked_down() hook was intended to be used to indicate if it was a user task (param == current_cred()) or the kernel (param == NULL). There was a discussion about using two different hooks/funcs, e.g. security_locked_down() and security_locked_down_kern(), instead of the creds parameter, but there were more votes for the param variant. As I type this I'm trying to muster something other than indifference towards this patch, but the reality is I just want to be done with it. If you'll merge a revision of this patch that does away with the cred parameter and goes with the two hooks I'm not going to argue against it. During the review of the latest draft of this patch I half-jokingly said it was cursed, perhaps it's time to honestly consider it cursed.
On Wed, Sep 22, 2021 at 2:40 PM Paul Moore <paul@paul-moore.com> wrote: > > The basic idea, or problem from a LSM point of view, is that in some > cases you have a user task which is doing the lockdown access check > and in others you have the kernel itself I don't understand. In that case, it would be a boolean for "kernel vs user". But that's not what it is. It literally seems to care about _which_ user, and looks at cred_sid(). This is what makes no sense to me. If it's about lockdown,. then the user is immaterial. Either it's locked down, or it's not. Yeah, yeah, clearly that isn't how it works. Something is very rotten in the state of lockdown. But that rottenness shouldn't then be exposed as a horrible interface. Why has selinux allowed the SID to be an issue for SECCLASS_LOCKDOWN at all? And why is selinux foceing it's very odd and arguably completely misguided "lockdown" logic onto the security layer? Yes, using "current_sid()" in selinux_lockdown() is clearly wrong, since it's not sensible in an interrupt, but lockdown questions are. But why isn't that just considered a selinux bug, and that u32 sid = current_sid(); just replaced with something silly like // lockdown is lockdown, user labeling isn't relevant u32 sid = SECINITSID_UNLABELED; and solve that issue that way? Just say that lockdown cannot depend on who is asking for it. Linus
On Wed, Sep 22, 2021 at 7:43 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Sep 22, 2021 at 2:40 PM Paul Moore <paul@paul-moore.com> wrote: > > > > The basic idea, or problem from a LSM point of view, is that in some > > cases you have a user task which is doing the lockdown access check > > and in others you have the kernel itself > > I don't understand. In that case, it would be a boolean for "kernel vs user". > > But that's not what it is. It literally seems to care about _which_ > user, and looks at cred_sid(). Well, yes, it does look at the credential if it is passed; I guess I wrongly assumed that was understood. If it was just a simple user/kernel decision then yes, it would be a boolean (or similar). > This is what makes no sense to me. If it's about lockdown,. then the > user is immaterial. Either it's locked down, or it's not. If all you have is the lockdown LSM, then yes, lockdown doesn't take into account the context of the request, it is simply a test of the lockdown threshold: only disclosures on the proper side of the lockdown value are allowed. However, we have the LSM framework because there is never one way to solve a problem, and the LSM hooks have always changed to support these different approaches to access control. While the lockdown LSM takes a context-free approach to enforcing the lockdown setting, the SELinux LSM takes a different enforcement approach which not only better integrates with the SELinux policy, but it offers new functionality beyond the lockdown LSM: * Access based on the integrity and confidentiality reasons can be specified independently with SELinux. * Provide the ability to define the lockdown level within the context of individual security domains. It's also worth noting that with LSM stacking and the combination of the lockdown and SELinux LSMs, the SELinux lockdown controls would not grant any additional disclosures beyond what the lockdown LSM would allow, the SELinux controls would only further restrict the disclosure of specific security domains as specified in the SELinux policy. -- paul moore www.paul-moore.com
On Thu, Sep 23, 2021 at 8:43 AM Paul Moore <paul@paul-moore.com> wrote: > > However, we have the LSM framework because there is never one way to > solve a problem, The thing is, the lockdown patches were merged because they were allegedly sane. As far as I can tell, this is purely a SELinux internal bug. SELinux did something wrong. Stop doing it. Stop sending patches to then screw up the generic security layer, and violate the rules under which these patches were accepted. We have now this week have two discussions about the selinux doing completely invalid and incorrect things, and both were related to just thinking that it's ok to just randomly access thread data. At some point, you just have to look at the SELinux code and say :"this does something wrong". Instead of this kind of "no, everybody else is wrong, I will modify them to do what I mistakenly did". IOW, just make the patch be diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 6517f221d52c..4e93bf5dc8ef 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -7016,7 +7016,8 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) static int selinux_lockdown(enum lockdown_reason what) { struct common_audit_data ad; - u32 sid = current_sid(); + /* Lockdown requests come in non-thread context, can't use 'current_sid()' */ + u32 sid = SECINITSID_UNLABELED; int invalid_reason = (what <= LOCKDOWN_NONE) || (what == LOCKDOWN_INTEGRITY_MAX) || (what >= LOCKDOWN_CONFIDENTIALITY_MAX); and stop accessing random security ID's from random contexts. And stop thinking it's ok for SELinux to just do bad things. Linus
On Thu, Sep 23, 2021 at 11:53 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Sep 23, 2021 at 8:43 AM Paul Moore <paul@paul-moore.com> wrote: > > > > However, we have the LSM framework because there is never one way to > > solve a problem, > > The thing is, the lockdown patches were merged because they were allegedly sane. > > As far as I can tell, this is purely a SELinux internal bug. I'm not sure why that matters, but sure. There is a problem, we are working to fix it. > SELinux did something wrong. Stop doing it. We *are* trying to fix the problem. Please stop pretending we are not. You can certainly disagree with our approach, but I'm getting tired of the chastising where you imply we are actively trying to screw things up or do bad things. We care about the kernel. We care a lot. I care more than I probably should, and probably more than is healthy at times. You can disagree with the design and/or implementation, but making claims that we are "thinking it's ok for SELinux to just do bad things." is just plain wrong not to mention insulting. > Stop sending patches to > then screw up the generic security layer, and violate the rules under > which these patches were accepted. *Sigh* There was plenty of discussion about this patch and the previous drafts, no one was overly upset about adding the caller context/cred info. The other LSM folks were okay with it. We've got plenty of historical examples of the LSM hook evolving over time to adapt to LSMs adding new functionality, new LSMs, and new kernel modules. So let's look at why you are shouting about "screwing up the generic security layer", but let's try to keep the focus at the LSM interface level. Prior to this patch there was one relevant LSM hook for lockdown: int security_locked_down(enum lockdown_reason what); ... the patch in this PR changed it to this: int security_locked_down(const struct cred *cred, enum lockdown_reason what); It's become clear you *really* don't like passing the cred pointer here, presumably based on a very specific security model for lockdown. At this point there are two thoughts that spring to mind: 1) how else can we enable the SELinux model that we want to implement and 2) why is the LSM forcing a single security model on LSMs for the lockdown hook? Let's deal with the first point first. If you aren't going to merge a change to the LSM framework that allows for the context credentials, would you be willing to merge a new LSM hook that is used in place of the existing lockdown hook for callers that are not associated with a user task? Both hooks would take a single lockdown_reason as the only argument and would look something like this: int security_locked_down(enum lockdown_reason what); int security_locked_down_kern(enum lockdown_reason what); There is already precedence in the kernel as a whole for LSM hooks that exist solely for kernel (non-user tasks) operations so this wouldn't be a big stretch. LSMs that don't care to make a distinction between the two, e.g. the existing lockdown LSM, could set the LSM hook to point to the same function (in the case of lockdown this would be lockdown_is_locked_down()). However, if the above doesn't fly, let's move on to the second thought I mentioned above: why is the LSM forcing a single security model on LSMs for the lockdown hook? If the lockdown functionality is really going to be restricted to just a single security model, why is it implemented as a LSM and not as core kernel functionality? The original motivation for the LSM was that the kernel needed an abstraction layer to support multiple security models and we've seen it do just that over the years; SELinux may have been the first, but the number has certainly grown over the years and the LSM framework has evolved right along with it. Putting restrictions on the LSM framework so that only specific security models could be implemented is not something we have really done, and at this point I think it would be a major mistake. -- paul moore www.paul-moore.com
On Thu, Sep 23, 2021 at 10:13 AM Paul Moore <paul@paul-moore.com> wrote: > > It's become clear you *really* don't like passing the cred pointer > here, presumably based on a very specific security model for lockdown. Not just that. I'm sensitive about the cred pointer because of the timing, but I'm also sensitive about just looking at code and saying "that makes no sense". Having callers do pointless things that make no sense to the callers is a bad thing. Having a patch where 59 out of 63 callers just mindlessly pass in "current_cred()", and then remaining three callers pass in NULL for NO DISCERNIBLE REASON is a sign of just a bad interface. And I realize that to you, these interfaces are what you do. To everybody else, it's completely mindless noise that makes no sense at all, because there's no _logic_ to it. It's random LSM calls in random contexts that have no clear reason for them, because the "reason" is hidden in some odd SELinux rule that no kernel developer even knows about. You can't even grep for the use of it, because there is none. It's literally randomness. Which means that it _has_ to make conceptual sense to be at all maintainable. So when I see a diffstat where basically 90% of the patch has ABSOLUTELY NOTHING to do with SElinux or anything you maintain, and that 90% makes absolutely no sense to begin with, guess what? I think that patch is bad. When I then look closer, and see that the _reason_ for the patch is that SElinux - yet again - incorrectly accessed process credentials in random contexts, I just do "this is not just a bad patch, this is actually a fundamental problem with SELinux". See where I'm coming from? The "specific security model for lockdown" is just the high-level view of what made sense. You violated that view for bad reasons, with a bad patch, and with bad timing. This isn't _just_ a SElinux problem. I think it's a LSM problem in general. Lots of those random callbacks are completely incomprehensible, have odd rules, and the LSM people aren't even trying to have them make sense. As long as the oddity is contained, that's one thing. But when it is made this obvious and affects random code in strange places in the kernel, and it's for a feature that had a lot of discussion even when it got merged, I'm putting my foot down. Linus