Message ID | CAJfZ7==rsOexenE+76YX0fbgX9_eXGjygmNzaHL1G8azKk8EMw@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote: > On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c > om> wrote: > > On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote: > > > On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote: > > > > On 04/14/2017 11:33 AM, Stephen Smalley wrote: > > > > > On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote: > > > > > > Bear with me please, because i might not fully grasp the > > > > > > issue (i > > > > > > received help with diagnosing this issue): > > > > > > > > > > > > This commit causes issues (and is, i think, a lousy hack): > > > > > > e3cab998b48ab293a9962faf9779d70ca339c65d > > > > > > > > > > > > The commit causes entities to "think" that SELinux is > > > > > > disabled > > > > > > after > > > > > > "mount -o remount,ro /sys/fs/selinux > > > > > > > > > > > > It is "neat" to be able to make processes "think" that > > > > > > selinux is > > > > > > disabled on a selinux enabled system but not if it break > > > > > > anything > > > > > > > > > > > > The above results in the following: > > > > > > > > > > > > Systemd services that have ProtectKernelTunables=yes set in > > > > > > their > > > > > > respective service units, think that SELinux is disabled. > > > > > > > > > > > > However we have found that some of these services actually > > > > > > rely > > > > > > on > > > > > > SELinux to ensure proper labeling. > > > > > > > > > > > > So we have the option to make people aware that if you set > > > > > > ProtectKernelTunables=yes that then the process cannot be > > > > > > SELinux- > > > > > > aware properly, or we can just get rid of the commit above > > > > > > and > > > > > > just > > > > > > accept that process know that SELinux is enabled. > > > > > > > > > > > > Actual bug that caused me to look into this: systemd- > > > > > > localed > > > > > > selinux > > > > > > awareness is broken due it having ProtectKernelTunables=yes > > > > > > in > > > > > > its > > > > > > service unit > > > > > > > > > > If selinuxfs is mounted read-only, then they can't use most > > > > > of the > > > > > selinuxfs interfaces, including even the ability to validate > > > > > or > > > > > canonicalize security contexts. That will break most > > > > > SELinux-aware > > > > > services if we tell them that SELinux is enabled. Are you > > > > > sure > > > > > systemd-localed would actually work if you told it SELinux > > > > > was > > > > > enabled > > > > > when selinuxfs was mounted read-only? What SELinux > > > > > interfaces is > > > > > it > > > > > using? > > > > > > > > > > The other question is whether ProtectKernelTunables ought to > > > > > be > > > > > mounting selinuxfs read-only. SELinux already controls the > > > > > ability > > > > > to > > > > > use its interfaces, including limiting even root, so it is > > > > > unclear > > > > > what > > > > > benefit we derive from having systemd add a further > > > > > restriction on > > > > > top. > > > > > > > > > > > > > Why is selinuxfs mounted readonly in this case? > > > > > > I don't actually see this in upstream systemd unless I am just > > > missing > > > it. > > > > > > systemd/src/core/namespace.c: > > > /* ProtectKernelTunables= option and the related filesystem APIs > > > */ > > > static const MountEntry protect_kernel_tunables_table[] = { > > > { "/proc/sys", READONLY, false }, > > > { "/proc/sysrq-trigger", READONLY, true }, > > > { "/proc/latency_stats", READONLY, true }, > > > { "/proc/mtrr", READONLY, true }, > > > { "/proc/apm", READONLY, true }, /* > > > Obsolete > > > API, there's no point in permitting access to this, ever */ > > > { "/proc/acpi", READONLY, true }, > > > { "/proc/timer_stats", READONLY, true }, > > > { "/proc/asound", READONLY, true }, > > > { "/proc/bus", READONLY, true }, > > > { "/proc/fs", READONLY, true }, > > > { "/proc/irq", READONLY, true }, > > > { "/sys", READONLY, false }, > > > { "/sys/kernel/debug", READONLY, true }, > > > { "/sys/kernel/tracing", READONLY, true }, > > > { "/sys/fs/cgroup", READWRITE, false }, /* > > > READONLY is > > > set by ProtectControlGroups= option */ > > > }; > > > > > > No mention of selinuxfs at all. Maybe it is a Fedora patch? > > > > > > > The reason we want this is so that processes inside of > > > > containers do > > > > not > > > > attempt to do SELinux stuff. > > > > > > > > http://danwalsh.livejournal.com/73099.html > > > > Before one dismisses my concern (8 minute proof): > > > > https://www.youtube.com/watch?v=YqiM1MlOG0w > > Hello, > I see this on Arch Linux as well, where there is no > distribution-specific patch which is applied to systemd (the only > patches which are applied are backported commits). A simple way to > see > that the selinuxfs is mounted read-only is the following command: > "localectl && findmnt --task $(pgrep systemd-localed)". It will > display the mountpoints of systemd-localed.service, which (with > systemd 232 [1]) contains: > > ├─/sys sys sysfs > ro,nosuid,nodev,noexec,relatime,seclabel > │ ├─/sys/firmware/efi/efivars efivarfs efivarfs > ro,nosuid,nodev,noexec,relatime > │ ├─/sys/kernel/security securityfs securityfs > ro,nosuid,nodev,noexec,relatime > │ ├─/sys/fs/selinux selinuxfs selinuxfs > ro,relatime > │ ├─/sys/fs/cgroup tmpfs tmpfs > ro,nosuid,nodev,noexec,seclabel,mode=755 > │ │ ├─/sys/fs/cgroup/systemd cgroup cgroup > ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/ > systemd-cgroups-agent,name= > │ │ ├─/sys/fs/cgroup/net_cls cgroup cgroup > ro,nosuid,nodev,noexec,relatime,net_cls > │ │ ├─/sys/fs/cgroup/perf_event cgroup cgroup > ro,nosuid,nodev,noexec,relatime,perf_event > │ │ ├─/sys/fs/cgroup/pids cgroup cgroup > ro,nosuid,nodev,noexec,relatime,pids > │ │ ├─/sys/fs/cgroup/blkio cgroup cgroup > ro,nosuid,nodev,noexec,relatime,blkio > │ │ ├─/sys/fs/cgroup/freezer cgroup cgroup > ro,nosuid,nodev,noexec,relatime,freezer > │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup cgroup > ro,nosuid,nodev,noexec,relatime,cpu,cpuacct > │ │ ├─/sys/fs/cgroup/cpuset cgroup cgroup > ro,nosuid,nodev,noexec,relatime,cpuset > │ │ ├─/sys/fs/cgroup/devices cgroup cgroup > ro,nosuid,nodev,noexec,relatime,devices > │ │ └─/sys/fs/cgroup/memory cgroup cgroup > ro,nosuid,nodev,noexec,relatime,memory > │ ├─/sys/fs/pstore pstore pstore > ro,nosuid,nodev,noexec,relatime,seclabel > │ ├─/sys/kernel/debug debugfs debugfs > ro,relatime,seclabel > │ ├─/sys/kernel/config configfs configfs > ro,relatime > │ └─/sys/fs/fuse/connections fusectl fusectl > ro,relatime > > /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f > -p 1 -e mount" while starting systemd-localed.service, I get: > > 3401 mount(NULL, "/sys/fs/cgroup/perf_event", NULL, > MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 > 3401 mount(NULL, "/sys/fs/cgroup/blkio", NULL, > MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 > 3401 mount(NULL, "/sys/fs/cgroup/pids", NULL, > MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 > 3401 mount(NULL, "/sys/fs/selinux", NULL, > MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 > 3401 mount(NULL, "/sys/fs/cgroup", NULL, > MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 > 3401 mount(NULL, "/sys/kernel/debug", NULL, > MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 > ... > > So /sys/fs/selinux *is* remounted read-only by systemd. When I remove > "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is > not > remounted and is kept RW in the namespace of the service. Hmmm...so is systemd just recursively bind mounting everything under /sys as read-only? If so, why does it separately list /sys/kernel/debug and /sys/kernel/tracing in protect_kernel_tunables_table[]? > About containers, in http://danwalsh.livejournal.com/73099.html there > is: "In containers we don't mount these file systems by default or we > mount it read/only causing libselinux to report that it is > disabled.". > Why does /sys/fs/selinux need to be mounted read-only instead of not > been mounted at all? I'll defer that one to Dan. > About systemd-localed, its use of namespaces makes it "look like" a > container, but it needs to be SELinux-aware in order to use > /proc/thread-self/attr/fscreate. The use-case is to atomically create > files like /etc/vconsole.conf with the right context. In order to do > so, the service: > * loads the file context database, > * requests the expected context of /etc/vconsole.conf > (selabel_lookup_raw), > * configures the fscreate context (setfscreatecon_raw) > * creates a temporary file with this context named for example > "/etc/.#vconsole.confiYiPml", > * writes data to it and closes it, > * and finally renames it to /etc/vconsole.conf (with the rename > syscall) > > I am not aware of a way of making /etc/vconsole.conf have the right > file context in the end without making the program use libselinux's > API (named type_transition does not support patterns suitable for > temporary files). Did I miss something? Hmm...this is fragile. Suppose for instance that systemd were to start passing SELABEL_OPT_VALIDATE to selabel_open(). That would trigger failures because it wouldn't be able to write the context to /sys/fs/selinux/context to validate them. Or if it were using matchpathcon(), which writes the context to /sys/fs/selinux/context and reads back the canonicalized context for use (not sure why we stopped doing that in selabel_lookup; maybe that's a bug). > Anyway, there is a bug in vanilla code (it is not specific to Fedora) > and it is not clear whether it is a bug in libselinux code or in > systemd's one. Is it's libselinux, I have prepared a patch for it > (attached). Otherwise, what does systemd did wrong in its use of the > SELinux API? With regard to the patch, Dan or others would have to assess the compatibility implications, since there are userspace components now relying on is_selinux_enabled() to return 0 if selinuxfs is read-only. With regard to use of the SELinux API, we've never guaranteed that a subset of the API will work if selinuxfs is not available or is read- only. Obviously parts of it are usable, but that seems fragile. I don't really think systemd ought to be remounting it read-only, but maybe that's just me. > Nicolas > > [1] ProtectKernelTunables=yes has actually been introduced in systemd > 232 with https://github.com/systemd/systemd/commit/0c28d51ac84973904e > 5f780b024adf8108e69fa1
On 04/14/2017 04:41 PM, Stephen Smalley wrote: > On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote: >> On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c >> om> wrote: >>> On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote: >>>> On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote: >>>>> On 04/14/2017 11:33 AM, Stephen Smalley wrote: >>>>>> On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote: >>>>>>> Bear with me please, because i might not fully grasp the >>>>>>> issue (i >>>>>>> received help with diagnosing this issue): >>>>>>> >>>>>>> This commit causes issues (and is, i think, a lousy hack): >>>>>>> e3cab998b48ab293a9962faf9779d70ca339c65d >>>>>>> >>>>>>> The commit causes entities to "think" that SELinux is >>>>>>> disabled >>>>>>> after >>>>>>> "mount -o remount,ro /sys/fs/selinux >>>>>>> >>>>>>> It is "neat" to be able to make processes "think" that >>>>>>> selinux is >>>>>>> disabled on a selinux enabled system but not if it break >>>>>>> anything >>>>>>> >>>>>>> The above results in the following: >>>>>>> >>>>>>> Systemd services that have ProtectKernelTunables=yes set in >>>>>>> their >>>>>>> respective service units, think that SELinux is disabled. >>>>>>> >>>>>>> However we have found that some of these services actually >>>>>>> rely >>>>>>> on >>>>>>> SELinux to ensure proper labeling. >>>>>>> >>>>>>> So we have the option to make people aware that if you set >>>>>>> ProtectKernelTunables=yes that then the process cannot be >>>>>>> SELinux- >>>>>>> aware properly, or we can just get rid of the commit above >>>>>>> and >>>>>>> just >>>>>>> accept that process know that SELinux is enabled. >>>>>>> >>>>>>> Actual bug that caused me to look into this: systemd- >>>>>>> localed >>>>>>> selinux >>>>>>> awareness is broken due it having ProtectKernelTunables=yes >>>>>>> in >>>>>>> its >>>>>>> service unit >>>>>> If selinuxfs is mounted read-only, then they can't use most >>>>>> of the >>>>>> selinuxfs interfaces, including even the ability to validate >>>>>> or >>>>>> canonicalize security contexts. That will break most >>>>>> SELinux-aware >>>>>> services if we tell them that SELinux is enabled. Are you >>>>>> sure >>>>>> systemd-localed would actually work if you told it SELinux >>>>>> was >>>>>> enabled >>>>>> when selinuxfs was mounted read-only? What SELinux >>>>>> interfaces is >>>>>> it >>>>>> using? >>>>>> >>>>>> The other question is whether ProtectKernelTunables ought to >>>>>> be >>>>>> mounting selinuxfs read-only. SELinux already controls the >>>>>> ability >>>>>> to >>>>>> use its interfaces, including limiting even root, so it is >>>>>> unclear >>>>>> what >>>>>> benefit we derive from having systemd add a further >>>>>> restriction on >>>>>> top. >>>>>> >>>>> Why is selinuxfs mounted readonly in this case? >>>> I don't actually see this in upstream systemd unless I am just >>>> missing >>>> it. >>>> >>>> systemd/src/core/namespace.c: >>>> /* ProtectKernelTunables= option and the related filesystem APIs >>>> */ >>>> static const MountEntry protect_kernel_tunables_table[] = { >>>> { "/proc/sys", READONLY, false }, >>>> { "/proc/sysrq-trigger", READONLY, true }, >>>> { "/proc/latency_stats", READONLY, true }, >>>> { "/proc/mtrr", READONLY, true }, >>>> { "/proc/apm", READONLY, true }, /* >>>> Obsolete >>>> API, there's no point in permitting access to this, ever */ >>>> { "/proc/acpi", READONLY, true }, >>>> { "/proc/timer_stats", READONLY, true }, >>>> { "/proc/asound", READONLY, true }, >>>> { "/proc/bus", READONLY, true }, >>>> { "/proc/fs", READONLY, true }, >>>> { "/proc/irq", READONLY, true }, >>>> { "/sys", READONLY, false }, >>>> { "/sys/kernel/debug", READONLY, true }, >>>> { "/sys/kernel/tracing", READONLY, true }, >>>> { "/sys/fs/cgroup", READWRITE, false }, /* >>>> READONLY is >>>> set by ProtectControlGroups= option */ >>>> }; >>>> >>>> No mention of selinuxfs at all. Maybe it is a Fedora patch? >>>> >>>>> The reason we want this is so that processes inside of >>>>> containers do >>>>> not >>>>> attempt to do SELinux stuff. >>>>> >>>>> http://danwalsh.livejournal.com/73099.html >>> Before one dismisses my concern (8 minute proof): >>> >>> https://www.youtube.com/watch?v=YqiM1MlOG0w >> Hello, >> I see this on Arch Linux as well, where there is no >> distribution-specific patch which is applied to systemd (the only >> patches which are applied are backported commits). A simple way to >> see >> that the selinuxfs is mounted read-only is the following command: >> "localectl && findmnt --task $(pgrep systemd-localed)". It will >> display the mountpoints of systemd-localed.service, which (with >> systemd 232 [1]) contains: >> >> ├─/sys sys sysfs >> ro,nosuid,nodev,noexec,relatime,seclabel >> │ ├─/sys/firmware/efi/efivars efivarfs efivarfs >> ro,nosuid,nodev,noexec,relatime >> │ ├─/sys/kernel/security securityfs securityfs >> ro,nosuid,nodev,noexec,relatime >> │ ├─/sys/fs/selinux selinuxfs selinuxfs >> ro,relatime >> │ ├─/sys/fs/cgroup tmpfs tmpfs >> ro,nosuid,nodev,noexec,seclabel,mode=755 >> │ │ ├─/sys/fs/cgroup/systemd cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/ >> systemd-cgroups-agent,name= >> │ │ ├─/sys/fs/cgroup/net_cls cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,net_cls >> │ │ ├─/sys/fs/cgroup/perf_event cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,perf_event >> │ │ ├─/sys/fs/cgroup/pids cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,pids >> │ │ ├─/sys/fs/cgroup/blkio cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,blkio >> │ │ ├─/sys/fs/cgroup/freezer cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,freezer >> │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,cpu,cpuacct >> │ │ ├─/sys/fs/cgroup/cpuset cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,cpuset >> │ │ ├─/sys/fs/cgroup/devices cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,devices >> │ │ └─/sys/fs/cgroup/memory cgroup cgroup >> ro,nosuid,nodev,noexec,relatime,memory >> │ ├─/sys/fs/pstore pstore pstore >> ro,nosuid,nodev,noexec,relatime,seclabel >> │ ├─/sys/kernel/debug debugfs debugfs >> ro,relatime,seclabel >> │ ├─/sys/kernel/config configfs configfs >> ro,relatime >> │ └─/sys/fs/fuse/connections fusectl fusectl >> ro,relatime >> >> /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f >> -p 1 -e mount" while starting systemd-localed.service, I get: >> >> 3401 mount(NULL, "/sys/fs/cgroup/perf_event", NULL, >> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >> 3401 mount(NULL, "/sys/fs/cgroup/blkio", NULL, >> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >> 3401 mount(NULL, "/sys/fs/cgroup/pids", NULL, >> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >> 3401 mount(NULL, "/sys/fs/selinux", NULL, >> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 >> 3401 mount(NULL, "/sys/fs/cgroup", NULL, >> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >> 3401 mount(NULL, "/sys/kernel/debug", NULL, >> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 >> ... >> >> So /sys/fs/selinux *is* remounted read-only by systemd. When I remove >> "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is >> not >> remounted and is kept RW in the namespace of the service. > Hmmm...so is systemd just recursively bind mounting everything under > /sys as read-only? If so, why does it separately list /sys/kernel/debug > and /sys/kernel/tracing in protect_kernel_tunables_table[]? > >> About containers, in http://danwalsh.livejournal.com/73099.html there >> is: "In containers we don't mount these file systems by default or we >> mount it read/only causing libselinux to report that it is >> disabled.". >> Why does /sys/fs/selinux need to be mounted read-only instead of not >> been mounted at all? > I'll defer that one to Dan. I believe that libselinux still reports that the system is running with SELinux, if the selinuxfs is not mounted inside of the container at all. >> About systemd-localed, its use of namespaces makes it "look like" a >> container, but it needs to be SELinux-aware in order to use >> /proc/thread-self/attr/fscreate. The use-case is to atomically create >> files like /etc/vconsole.conf with the right context. In order to do >> so, the service: >> * loads the file context database, >> * requests the expected context of /etc/vconsole.conf >> (selabel_lookup_raw), >> * configures the fscreate context (setfscreatecon_raw) >> * creates a temporary file with this context named for example >> "/etc/.#vconsole.confiYiPml", >> * writes data to it and closes it, >> * and finally renames it to /etc/vconsole.conf (with the rename >> syscall) >> >> I am not aware of a way of making /etc/vconsole.conf have the right >> file context in the end without making the program use libselinux's >> API (named type_transition does not support patterns suitable for >> temporary files). Did I miss something? > Hmm...this is fragile. Suppose for instance that systemd were to start > passing SELABEL_OPT_VALIDATE to selabel_open(). That would trigger > failures because it wouldn't be able to write the context to > /sys/fs/selinux/context to validate them. Or if it were using > matchpathcon(), which writes the context to /sys/fs/selinux/context and > reads back the canonicalized context for use (not sure why we stopped > doing that in selabel_lookup; maybe that's a bug). > >> Anyway, there is a bug in vanilla code (it is not specific to Fedora) >> and it is not clear whether it is a bug in libselinux code or in >> systemd's one. Is it's libselinux, I have prepared a patch for it >> (attached). Otherwise, what does systemd did wrong in its use of the >> SELinux API? > With regard to the patch, Dan or others would have to assess the > compatibility implications, since there are userspace components now > relying on is_selinux_enabled() to return 0 if selinuxfs is read-only. > > With regard to use of the SELinux API, we've never guaranteed that a > subset of the API will work if selinuxfs is not available or is read- > only. Obviously parts of it are usable, but that seems fragile. I > don't really think systemd ought to be remounting it read-only, but > maybe that's just me. > >> Nicolas >> >> [1] ProtectKernelTunables=yes has actually been introduced in systemd >> 232 with https://github.com/systemd/systemd/commit/0c28d51ac84973904e >> 5f780b024adf8108e69fa1 > _______________________________________________ > Selinux mailing list > Selinux@tycho.nsa.gov > To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. > To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov.
On Sat, Apr 15, 2017 at 12:23 PM, Daniel Walsh <dwalsh@redhat.com> wrote: > On 04/14/2017 04:41 PM, Stephen Smalley wrote: >> On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote: >>> On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c >>> om> wrote: >>>> On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote: >>>>> On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote: >>>>>> On 04/14/2017 11:33 AM, Stephen Smalley wrote: >>>>>>> On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote: >>>>>>>> Bear with me please, because i might not fully grasp the >>>>>>>> issue (i >>>>>>>> received help with diagnosing this issue): >>>>>>>> >>>>>>>> This commit causes issues (and is, i think, a lousy hack): >>>>>>>> e3cab998b48ab293a9962faf9779d70ca339c65d >>>>>>>> >>>>>>>> The commit causes entities to "think" that SELinux is >>>>>>>> disabled >>>>>>>> after >>>>>>>> "mount -o remount,ro /sys/fs/selinux >>>>>>>> >>>>>>>> It is "neat" to be able to make processes "think" that >>>>>>>> selinux is >>>>>>>> disabled on a selinux enabled system but not if it break >>>>>>>> anything >>>>>>>> >>>>>>>> The above results in the following: >>>>>>>> >>>>>>>> Systemd services that have ProtectKernelTunables=yes set in >>>>>>>> their >>>>>>>> respective service units, think that SELinux is disabled. >>>>>>>> >>>>>>>> However we have found that some of these services actually >>>>>>>> rely >>>>>>>> on >>>>>>>> SELinux to ensure proper labeling. >>>>>>>> >>>>>>>> So we have the option to make people aware that if you set >>>>>>>> ProtectKernelTunables=yes that then the process cannot be >>>>>>>> SELinux- >>>>>>>> aware properly, or we can just get rid of the commit above >>>>>>>> and >>>>>>>> just >>>>>>>> accept that process know that SELinux is enabled. >>>>>>>> >>>>>>>> Actual bug that caused me to look into this: systemd- >>>>>>>> localed >>>>>>>> selinux >>>>>>>> awareness is broken due it having ProtectKernelTunables=yes >>>>>>>> in >>>>>>>> its >>>>>>>> service unit >>>>>>> If selinuxfs is mounted read-only, then they can't use most >>>>>>> of the >>>>>>> selinuxfs interfaces, including even the ability to validate >>>>>>> or >>>>>>> canonicalize security contexts. That will break most >>>>>>> SELinux-aware >>>>>>> services if we tell them that SELinux is enabled. Are you >>>>>>> sure >>>>>>> systemd-localed would actually work if you told it SELinux >>>>>>> was >>>>>>> enabled >>>>>>> when selinuxfs was mounted read-only? What SELinux >>>>>>> interfaces is >>>>>>> it >>>>>>> using? >>>>>>> >>>>>>> The other question is whether ProtectKernelTunables ought to >>>>>>> be >>>>>>> mounting selinuxfs read-only. SELinux already controls the >>>>>>> ability >>>>>>> to >>>>>>> use its interfaces, including limiting even root, so it is >>>>>>> unclear >>>>>>> what >>>>>>> benefit we derive from having systemd add a further >>>>>>> restriction on >>>>>>> top. >>>>>>> >>>>>> Why is selinuxfs mounted readonly in this case? >>>>> I don't actually see this in upstream systemd unless I am just >>>>> missing >>>>> it. >>>>> >>>>> systemd/src/core/namespace.c: >>>>> /* ProtectKernelTunables= option and the related filesystem APIs >>>>> */ >>>>> static const MountEntry protect_kernel_tunables_table[] = { >>>>> { "/proc/sys", READONLY, false }, >>>>> { "/proc/sysrq-trigger", READONLY, true }, >>>>> { "/proc/latency_stats", READONLY, true }, >>>>> { "/proc/mtrr", READONLY, true }, >>>>> { "/proc/apm", READONLY, true }, /* >>>>> Obsolete >>>>> API, there's no point in permitting access to this, ever */ >>>>> { "/proc/acpi", READONLY, true }, >>>>> { "/proc/timer_stats", READONLY, true }, >>>>> { "/proc/asound", READONLY, true }, >>>>> { "/proc/bus", READONLY, true }, >>>>> { "/proc/fs", READONLY, true }, >>>>> { "/proc/irq", READONLY, true }, >>>>> { "/sys", READONLY, false }, >>>>> { "/sys/kernel/debug", READONLY, true }, >>>>> { "/sys/kernel/tracing", READONLY, true }, >>>>> { "/sys/fs/cgroup", READWRITE, false }, /* >>>>> READONLY is >>>>> set by ProtectControlGroups= option */ >>>>> }; >>>>> >>>>> No mention of selinuxfs at all. Maybe it is a Fedora patch? >>>>> >>>>>> The reason we want this is so that processes inside of >>>>>> containers do >>>>>> not >>>>>> attempt to do SELinux stuff. >>>>>> >>>>>> http://danwalsh.livejournal.com/73099.html >>>> Before one dismisses my concern (8 minute proof): >>>> >>>> https://www.youtube.com/watch?v=YqiM1MlOG0w >>> Hello, >>> I see this on Arch Linux as well, where there is no >>> distribution-specific patch which is applied to systemd (the only >>> patches which are applied are backported commits). A simple way to >>> see >>> that the selinuxfs is mounted read-only is the following command: >>> "localectl && findmnt --task $(pgrep systemd-localed)". It will >>> display the mountpoints of systemd-localed.service, which (with >>> systemd 232 [1]) contains: >>> >>> ├─/sys sys sysfs >>> ro,nosuid,nodev,noexec,relatime,seclabel >>> │ ├─/sys/firmware/efi/efivars efivarfs efivarfs >>> ro,nosuid,nodev,noexec,relatime >>> │ ├─/sys/kernel/security securityfs securityfs >>> ro,nosuid,nodev,noexec,relatime >>> │ ├─/sys/fs/selinux selinuxfs selinuxfs >>> ro,relatime >>> │ ├─/sys/fs/cgroup tmpfs tmpfs >>> ro,nosuid,nodev,noexec,seclabel,mode=755 >>> │ │ ├─/sys/fs/cgroup/systemd cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/ >>> systemd-cgroups-agent,name= >>> │ │ ├─/sys/fs/cgroup/net_cls cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,net_cls >>> │ │ ├─/sys/fs/cgroup/perf_event cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,perf_event >>> │ │ ├─/sys/fs/cgroup/pids cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,pids >>> │ │ ├─/sys/fs/cgroup/blkio cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,blkio >>> │ │ ├─/sys/fs/cgroup/freezer cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,freezer >>> │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,cpu,cpuacct >>> │ │ ├─/sys/fs/cgroup/cpuset cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,cpuset >>> │ │ ├─/sys/fs/cgroup/devices cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,devices >>> │ │ └─/sys/fs/cgroup/memory cgroup cgroup >>> ro,nosuid,nodev,noexec,relatime,memory >>> │ ├─/sys/fs/pstore pstore pstore >>> ro,nosuid,nodev,noexec,relatime,seclabel >>> │ ├─/sys/kernel/debug debugfs debugfs >>> ro,relatime,seclabel >>> │ ├─/sys/kernel/config configfs configfs >>> ro,relatime >>> │ └─/sys/fs/fuse/connections fusectl fusectl >>> ro,relatime >>> >>> /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f >>> -p 1 -e mount" while starting systemd-localed.service, I get: >>> >>> 3401 mount(NULL, "/sys/fs/cgroup/perf_event", NULL, >>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >>> 3401 mount(NULL, "/sys/fs/cgroup/blkio", NULL, >>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >>> 3401 mount(NULL, "/sys/fs/cgroup/pids", NULL, >>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >>> 3401 mount(NULL, "/sys/fs/selinux", NULL, >>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 >>> 3401 mount(NULL, "/sys/fs/cgroup", NULL, >>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 >>> 3401 mount(NULL, "/sys/kernel/debug", NULL, >>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0 >>> ... >>> >>> So /sys/fs/selinux *is* remounted read-only by systemd. When I remove >>> "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is >>> not >>> remounted and is kept RW in the namespace of the service. >> Hmmm...so is systemd just recursively bind mounting everything under >> /sys as read-only? If so, why does it separately list /sys/kernel/debug >> and /sys/kernel/tracing in protect_kernel_tunables_table[]? systemd/src/core/namespace.c also contains the definition of make_read_only(), which calls bind_remount_recursive(mount_entry_path(m), true, blacklist). So the RO-remount is recursive. >> >>> About containers, in http://danwalsh.livejournal.com/73099.html there >>> is: "In containers we don't mount these file systems by default or we >>> mount it read/only causing libselinux to report that it is >>> disabled.". >>> Why does /sys/fs/selinux need to be mounted read-only instead of not >>> been mounted at all? >> I'll defer that one to Dan. > I believe that libselinux still reports that the system is running with > SELinux, if the selinuxfs is not mounted > inside of the container at all. On my system, I get: # unshare -m # sestatus SELinux status: enabled SELinuxfs mount: /sys/fs/selinux SELinux root directory: /etc/selinux Loaded policy name: refpolicy-patched Current mode: permissive Mode from config file: permissive Policy MLS status: disabled Policy deny_unknown status: denied Max kernel policy version: 30 # mount -o remount,ro /sys/fs/selinux mount: selinuxfs mounted on /sys/fs/selinux. # sestatus SELinux status: disabled # umount /sys/fs/selinux umount: /sys/fs/selinux unmounted # sestatus SELinux status: disabled The last line seems to disagree with your belief. >>> About systemd-localed, its use of namespaces makes it "look like" a >>> container, but it needs to be SELinux-aware in order to use >>> /proc/thread-self/attr/fscreate. The use-case is to atomically create >>> files like /etc/vconsole.conf with the right context. In order to do >>> so, the service: >>> * loads the file context database, >>> * requests the expected context of /etc/vconsole.conf >>> (selabel_lookup_raw), >>> * configures the fscreate context (setfscreatecon_raw) >>> * creates a temporary file with this context named for example >>> "/etc/.#vconsole.confiYiPml", >>> * writes data to it and closes it, >>> * and finally renames it to /etc/vconsole.conf (with the rename >>> syscall) >>> >>> I am not aware of a way of making /etc/vconsole.conf have the right >>> file context in the end without making the program use libselinux's >>> API (named type_transition does not support patterns suitable for >>> temporary files). Did I miss something? >> Hmm...this is fragile. Suppose for instance that systemd were to start >> passing SELABEL_OPT_VALIDATE to selabel_open(). That would trigger >> failures because it wouldn't be able to write the context to >> /sys/fs/selinux/context to validate them. Or if it were using >> matchpathcon(), which writes the context to /sys/fs/selinux/context and >> reads back the canonicalized context for use (not sure why we stopped >> doing that in selabel_lookup; maybe that's a bug). >> >>> Anyway, there is a bug in vanilla code (it is not specific to Fedora) >>> and it is not clear whether it is a bug in libselinux code or in >>> systemd's one. Is it's libselinux, I have prepared a patch for it >>> (attached). Otherwise, what does systemd did wrong in its use of the >>> SELinux API? >> With regard to the patch, Dan or others would have to assess the >> compatibility implications, since there are userspace components now >> relying on is_selinux_enabled() to return 0 if selinuxfs is read-only. >> >> With regard to use of the SELinux API, we've never guaranteed that a >> subset of the API will work if selinuxfs is not available or is read- >> only. Obviously parts of it are usable, but that seems fragile. I >> don't really think systemd ought to be remounting it read-only, but >> maybe that's just me. All right. I was not aware of /sys/fs/selinux/context and, indeed, it makes sense to only support RW selinuxfs. I have opened a Pull Request for systemd on https://github.com/systemd/systemd/pull/5741 in order to keep /sys/fs/selinux writable. Thanks, Nicolas
On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote: > I believe that libselinux still reports that the system is running > with > SELinux, if the selinuxfs is not mounted > inside of the container at all. Not after the commit referenced in the subject line; you removed the fallback code to check /proc/filesystems for selinuxfs from is_selinux_enabled(), so if selinuxfs is not mounted at all, it will return 0 (not enabled). On non-Android, you can also cause is_selinux_enabled() to return 0 by not providing an /etc/selinux/config file in your container's root directory (see commit c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do not install selinux-policy in your container root, then it will return disabled.
On 04/17/2017 09:34 AM, Stephen Smalley wrote: > On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote: >> I believe that libselinux still reports that the system is running >> with >> SELinux, if the selinuxfs is not mounted >> inside of the container at all. > Not after the commit referenced in the subject line; you removed the > fallback code to check /proc/filesystems for selinuxfs from > is_selinux_enabled(), so if selinuxfs is not mounted at all, it will > return 0 (not enabled). On non-Android, you can also cause > is_selinux_enabled() to return 0 by not providing an > /etc/selinux/config file in your container's root directory (see commit > > c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do not > install selinux-policy in your container root, then it will return > disabled. > _______________________________________________ > Selinux mailing list > Selinux@tycho.nsa.gov > To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. > To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov. > > That seems to a chancy way of handling this. Since I can see it as pretty easy to accidently pull in selinux-policy package into a container and then the container gets /etc/selinux/config and stuff starts blowing up. Not sure why the availability of this file should indicate selinux is enabled.
On Mon, 2017-04-17 at 10:40 -0400, Daniel Walsh wrote: > On 04/17/2017 09:34 AM, Stephen Smalley wrote: > > On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote: > > > I believe that libselinux still reports that the system is > > > running > > > with > > > SELinux, if the selinuxfs is not mounted > > > inside of the container at all. > > > > Not after the commit referenced in the subject line; you removed > > the > > fallback code to check /proc/filesystems for selinuxfs from > > is_selinux_enabled(), so if selinuxfs is not mounted at all, it > > will > > return 0 (not enabled). On non-Android, you can also cause > > is_selinux_enabled() to return 0 by not providing an > > /etc/selinux/config file in your container's root directory (see > > commit > > > > c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do > > not > > install selinux-policy in your container root, then it will return > > disabled. > > That seems to a chancy way of handling this. Since I can see it as > pretty easy to accidently pull in selinux-policy package into a > container and then the container gets /etc/selinux/config and stuff > starts blowing up. Not sure why the availability of this file should > indicate selinux is enabled. The existence of /etc/selinux/config is necessary but not sufficient; is_selinux_enabled() only returns 1 if selinuxfs is mounted (read-write with the current logic) _and_ (on non-Android) if /etc/selinux/config exists. The /etc/selinux/config test was added to avoid a regression when we dropped the old no-policy-loaded test. In any event, not mounting selinuxfs within the container would suffice to cause is_selinux_enabled() to return 0.
On 04/17/2017 10:49 AM, Stephen Smalley wrote: > On Mon, 2017-04-17 at 10:40 -0400, Daniel Walsh wrote: >> On 04/17/2017 09:34 AM, Stephen Smalley wrote: >>> On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote: >>>> I believe that libselinux still reports that the system is >>>> running >>>> with >>>> SELinux, if the selinuxfs is not mounted >>>> inside of the container at all. >>> Not after the commit referenced in the subject line; you removed >>> the >>> fallback code to check /proc/filesystems for selinuxfs from >>> is_selinux_enabled(), so if selinuxfs is not mounted at all, it >>> will >>> return 0 (not enabled). On non-Android, you can also cause >>> is_selinux_enabled() to return 0 by not providing an >>> /etc/selinux/config file in your container's root directory (see >>> commit >>> >>> c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do >>> not >>> install selinux-policy in your container root, then it will return >>> disabled. >> That seems to a chancy way of handling this. Since I can see it as >> pretty easy to accidently pull in selinux-policy package into a >> container and then the container gets /etc/selinux/config and stuff >> starts blowing up. Not sure why the availability of this file should >> indicate selinux is enabled. > The existence of /etc/selinux/config is necessary but not sufficient; > is_selinux_enabled() only returns 1 if selinuxfs is mounted (read-write > with the current logic) _and_ (on non-Android) if /etc/selinux/config > exists. The /etc/selinux/config test was added to avoid a regression > when we dropped the old no-policy-loaded test. > > In any event, not mounting selinuxfs within the container would suffice > to cause is_selinux_enabled() to return 0. > > _______________________________________________ > Selinux mailing list > Selinux@tycho.nsa.gov > To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. > To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov. If that is the case, then I have no problem removing the read/only check. We can make sure /sys/fs/selinux is not mounted into the container.
From 92688d49f16a28875034c95827fbe0d20e221d7b Mon Sep 17 00:00:00 2001 From: Nicolas Iooss <nicolas.iooss@m4x.org> Date: Fri, 14 Apr 2017 21:27:42 +0200 Subject: [PATCH 1/1] libselinux: detect that SELinux is enabled when /sys/fs/selinux is mounted read-only systemd service units can use "ProtectKernelTunables=yes" in order to mount some file systems read-only (the documentation is available at https://www.freedesktop.org/software/systemd/man/systemd.exec.html). This makes /sys/fs/selinux read-only too. Services using such a configuration option sees SELinux as disabled because of the behavior described in commit e3cab998b48a ("libselinux mountpoint changing patch."): NOTE: We added the check for RO, to allow tools like mock to be able to tell a chroot that SELinux is disabled while enforcing it outside the chroot. However this changes the behavior of some systemd services in unexpected ways. For example systemd-localed uses libselinux in order to create /etc/locale.conf, /etc/vconsole.conf... with the right file context while using a temporary file (using setfscreatecon_raw() with the label of /etc/vconsole.conf). With ProtectKernelTunables=yes, systemd-localed sees SELinux as being disabled (because /sys/fs/selinux is mounted read-only) and creates files in /etc with a wrong label. Fix this issue by making verify_selinuxmnt() use read-only mount-points too. Reported-by: Dominick Grift <dac.override@gmail.com> Signed-off-by: Nicolas Iooss <nicolas.iooss@m4x.org> --- libselinux/src/init.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/libselinux/src/init.c b/libselinux/src/init.c index 814c7e6a9964..4fca2b9c6ecd 100644 --- a/libselinux/src/init.c +++ b/libselinux/src/init.c @@ -42,9 +42,7 @@ static int verify_selinuxmnt(const char *mnt) struct statvfs vfsbuf; rc = statvfs(mnt, &vfsbuf); if (rc == 0) { - if (!(vfsbuf.f_flag & ST_RDONLY)) { - set_selinuxmnt(mnt); - } + set_selinuxmnt(mnt); return 0; } } -- 2.12.0