diff mbox

let's revert e3cab998b48ab293a9962faf9779d70ca339c65d

Message ID CAJfZ7==rsOexenE+76YX0fbgX9_eXGjygmNzaHL1G8azKk8EMw@mail.gmail.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Nicolas Iooss April 14, 2017, 7:43 p.m. UTC
On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.com> wrote:
> On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote:
>> On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote:
>> > On 04/14/2017 11:33 AM, Stephen Smalley wrote:
>> > > On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote:
>> > > > Bear with me please, because i might not fully grasp the issue (i
>> > > > received help with diagnosing this issue):
>> > > >
>> > > > This commit causes issues (and is, i think, a lousy hack):
>> > > > e3cab998b48ab293a9962faf9779d70ca339c65d
>> > > >
>> > > > The commit causes entities to "think" that SELinux is disabled
>> > > > after
>> > > > "mount -o remount,ro /sys/fs/selinux
>> > > >
>> > > > It is "neat" to be able to make processes "think" that selinux is
>> > > > disabled on a selinux enabled system but not if it break anything
>> > > >
>> > > > The above results in the following:
>> > > >
>> > > > Systemd services that have ProtectKernelTunables=yes set in their
>> > > > respective service units, think that SELinux is disabled.
>> > > >
>> > > > However we have found that some of these services actually rely
>> > > > on
>> > > > SELinux to ensure proper labeling.
>> > > >
>> > > > So we have the option to make people aware that if you set
>> > > > ProtectKernelTunables=yes that then the process cannot be
>> > > > SELinux-
>> > > > aware properly, or we can just get rid of the commit above and
>> > > > just
>> > > > accept that process know that SELinux is enabled.
>> > > >
>> > > > Actual bug that caused me to look into this: systemd-localed
>> > > > selinux
>> > > > awareness is broken due it having ProtectKernelTunables=yes in
>> > > > its
>> > > > service unit
>> > >
>> > > If selinuxfs is mounted read-only, then they can't use most of the
>> > > selinuxfs interfaces, including even the ability to validate or
>> > > canonicalize security contexts.  That will break most SELinux-aware
>> > > services if we tell them that SELinux is enabled.  Are you sure
>> > > systemd-localed would actually work if you told it SELinux was
>> > > enabled
>> > > when selinuxfs was mounted read-only?  What SELinux interfaces is
>> > > it
>> > > using?
>> > >
>> > > The other question is whether ProtectKernelTunables ought to be
>> > > mounting selinuxfs read-only.  SELinux already controls the ability
>> > > to
>> > > use its interfaces, including limiting even root, so it is unclear
>> > > what
>> > > benefit we derive from having systemd add a further restriction on
>> > > top.
>> > >
>> >
>> > Why is selinuxfs mounted readonly in this case?
>>
>> I don't actually see this in upstream systemd unless I am just missing
>> it.
>>
>> systemd/src/core/namespace.c:
>> /* ProtectKernelTunables= option and the related filesystem APIs */
>> static const MountEntry protect_kernel_tunables_table[] = {
>>         { "/proc/sys",           READONLY,     false },
>>         { "/proc/sysrq-trigger", READONLY,     true  },
>>         { "/proc/latency_stats", READONLY,     true  },
>>         { "/proc/mtrr",          READONLY,     true  },
>>         { "/proc/apm",           READONLY,     true  }, /* Obsolete
>> API, there's no point in permitting access to this, ever */
>>         { "/proc/acpi",          READONLY,     true  },
>>         { "/proc/timer_stats",   READONLY,     true  },
>>         { "/proc/asound",        READONLY,     true  },
>>         { "/proc/bus",           READONLY,     true  },
>>         { "/proc/fs",            READONLY,     true  },
>>         { "/proc/irq",           READONLY,     true  },
>>         { "/sys",                READONLY,     false },
>>         { "/sys/kernel/debug",   READONLY,     true  },
>>         { "/sys/kernel/tracing", READONLY,     true  },
>>         { "/sys/fs/cgroup",      READWRITE,    false }, /* READONLY is
>> set by ProtectControlGroups= option */
>> };
>>
>> No mention of selinuxfs at all.  Maybe it is a Fedora patch?
>>
>> > The reason we want this is so that processes inside of containers do
>> > not
>> > attempt to do SELinux stuff.
>> >
>> > http://danwalsh.livejournal.com/73099.html
>
> Before one dismisses my concern (8 minute proof):
>
> https://www.youtube.com/watch?v=YqiM1MlOG0w

Hello,
I see this on Arch Linux as well, where there is no
distribution-specific patch which is applied to systemd (the only
patches which are applied are backported commits). A simple way to see
that the selinuxfs is mounted read-only is the following command:
"localectl && findmnt --task $(pgrep systemd-localed)". It will
display the mountpoints of systemd-localed.service, which (with
systemd 232 [1]) contains:

├─/sys                           sys                      sysfs
ro,nosuid,nodev,noexec,relatime,seclabel
│ ├─/sys/firmware/efi/efivars    efivarfs                 efivarfs
ro,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security         securityfs               securityfs
ro,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/selinux              selinuxfs                selinuxfs
ro,relatime
│ ├─/sys/fs/cgroup               tmpfs                    tmpfs
ro,nosuid,nodev,noexec,seclabel,mode=755
│ │ ├─/sys/fs/cgroup/systemd     cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=
│ │ ├─/sys/fs/cgroup/net_cls     cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,net_cls
│ │ ├─/sys/fs/cgroup/perf_event  cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,perf_event
│ │ ├─/sys/fs/cgroup/pids        cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,pids
│ │ ├─/sys/fs/cgroup/blkio       cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,blkio
│ │ ├─/sys/fs/cgroup/freezer     cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,freezer
│ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/cpuset      cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/devices     cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,devices
│ │ └─/sys/fs/cgroup/memory      cgroup                   cgroup
ro,nosuid,nodev,noexec,relatime,memory
│ ├─/sys/fs/pstore               pstore                   pstore
ro,nosuid,nodev,noexec,relatime,seclabel
│ ├─/sys/kernel/debug            debugfs                  debugfs
ro,relatime,seclabel
│ ├─/sys/kernel/config           configfs                 configfs
ro,relatime
│ └─/sys/fs/fuse/connections     fusectl                  fusectl
ro,relatime

/sys/fs/selinux is mounted read-only. Moreover when I run "strace -f
-p 1 -e mount" while starting systemd-localed.service, I get:

3401  mount(NULL, "/sys/fs/cgroup/perf_event", NULL,
MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
3401  mount(NULL, "/sys/fs/cgroup/blkio", NULL,
MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
3401  mount(NULL, "/sys/fs/cgroup/pids", NULL,
MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
3401  mount(NULL, "/sys/fs/selinux", NULL,
MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
3401  mount(NULL, "/sys/fs/cgroup", NULL,
MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
3401  mount(NULL, "/sys/kernel/debug", NULL,
MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
...

So /sys/fs/selinux *is* remounted read-only by systemd. When I remove
"ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is not
remounted and is kept RW in the namespace of the service.

About containers, in http://danwalsh.livejournal.com/73099.html there
is: "In containers we don't mount these file systems by default or we
mount it read/only causing libselinux to report that it is disabled.".
Why does /sys/fs/selinux need to be mounted read-only instead of not
been mounted at all?


About systemd-localed, its use of namespaces makes it "look like" a
container, but it needs to be SELinux-aware in order to use
/proc/thread-self/attr/fscreate. The use-case is to atomically create
files like /etc/vconsole.conf with the right context. In order to do
so, the service:
* loads the file context database,
* requests the expected context of /etc/vconsole.conf (selabel_lookup_raw),
* configures the fscreate context (setfscreatecon_raw)
* creates a temporary file with this context named for example
"/etc/.#vconsole.confiYiPml",
* writes data to it and closes it,
* and finally renames it to /etc/vconsole.conf (with the rename syscall)

I am not aware of a way of making /etc/vconsole.conf have the right
file context in the end without making the program use libselinux's
API (named type_transition does not support patterns suitable for
temporary files). Did I miss something?


Anyway, there is a bug in vanilla code (it is not specific to Fedora)
and it is not clear whether it is a bug in libselinux code or in
systemd's one. Is it's libselinux, I have prepared a patch for it
(attached). Otherwise, what does systemd did wrong in its use of the
SELinux API?

Nicolas

[1] ProtectKernelTunables=yes has actually been introduced in systemd
232 with https://github.com/systemd/systemd/commit/0c28d51ac84973904e5f780b024adf8108e69fa1

Comments

Stephen Smalley April 14, 2017, 8:41 p.m. UTC | #1
On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote:
> On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c
> om> wrote:
> > On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote:
> > > On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote:
> > > > On 04/14/2017 11:33 AM, Stephen Smalley wrote:
> > > > > On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote:
> > > > > > Bear with me please, because i might not fully grasp the
> > > > > > issue (i
> > > > > > received help with diagnosing this issue):
> > > > > > 
> > > > > > This commit causes issues (and is, i think, a lousy hack):
> > > > > > e3cab998b48ab293a9962faf9779d70ca339c65d
> > > > > > 
> > > > > > The commit causes entities to "think" that SELinux is
> > > > > > disabled
> > > > > > after
> > > > > > "mount -o remount,ro /sys/fs/selinux
> > > > > > 
> > > > > > It is "neat" to be able to make processes "think" that
> > > > > > selinux is
> > > > > > disabled on a selinux enabled system but not if it break
> > > > > > anything
> > > > > > 
> > > > > > The above results in the following:
> > > > > > 
> > > > > > Systemd services that have ProtectKernelTunables=yes set in
> > > > > > their
> > > > > > respective service units, think that SELinux is disabled.
> > > > > > 
> > > > > > However we have found that some of these services actually
> > > > > > rely
> > > > > > on
> > > > > > SELinux to ensure proper labeling.
> > > > > > 
> > > > > > So we have the option to make people aware that if you set
> > > > > > ProtectKernelTunables=yes that then the process cannot be
> > > > > > SELinux-
> > > > > > aware properly, or we can just get rid of the commit above
> > > > > > and
> > > > > > just
> > > > > > accept that process know that SELinux is enabled.
> > > > > > 
> > > > > > Actual bug that caused me to look into this: systemd-
> > > > > > localed
> > > > > > selinux
> > > > > > awareness is broken due it having ProtectKernelTunables=yes
> > > > > > in
> > > > > > its
> > > > > > service unit
> > > > > 
> > > > > If selinuxfs is mounted read-only, then they can't use most
> > > > > of the
> > > > > selinuxfs interfaces, including even the ability to validate
> > > > > or
> > > > > canonicalize security contexts.  That will break most
> > > > > SELinux-aware
> > > > > services if we tell them that SELinux is enabled.  Are you
> > > > > sure
> > > > > systemd-localed would actually work if you told it SELinux
> > > > > was
> > > > > enabled
> > > > > when selinuxfs was mounted read-only?  What SELinux
> > > > > interfaces is
> > > > > it
> > > > > using?
> > > > > 
> > > > > The other question is whether ProtectKernelTunables ought to
> > > > > be
> > > > > mounting selinuxfs read-only.  SELinux already controls the
> > > > > ability
> > > > > to
> > > > > use its interfaces, including limiting even root, so it is
> > > > > unclear
> > > > > what
> > > > > benefit we derive from having systemd add a further
> > > > > restriction on
> > > > > top.
> > > > > 
> > > > 
> > > > Why is selinuxfs mounted readonly in this case?
> > > 
> > > I don't actually see this in upstream systemd unless I am just
> > > missing
> > > it.
> > > 
> > > systemd/src/core/namespace.c:
> > > /* ProtectKernelTunables= option and the related filesystem APIs
> > > */
> > > static const MountEntry protect_kernel_tunables_table[] = {
> > >         { "/proc/sys",           READONLY,     false },
> > >         { "/proc/sysrq-trigger", READONLY,     true  },
> > >         { "/proc/latency_stats", READONLY,     true  },
> > >         { "/proc/mtrr",          READONLY,     true  },
> > >         { "/proc/apm",           READONLY,     true  }, /*
> > > Obsolete
> > > API, there's no point in permitting access to this, ever */
> > >         { "/proc/acpi",          READONLY,     true  },
> > >         { "/proc/timer_stats",   READONLY,     true  },
> > >         { "/proc/asound",        READONLY,     true  },
> > >         { "/proc/bus",           READONLY,     true  },
> > >         { "/proc/fs",            READONLY,     true  },
> > >         { "/proc/irq",           READONLY,     true  },
> > >         { "/sys",                READONLY,     false },
> > >         { "/sys/kernel/debug",   READONLY,     true  },
> > >         { "/sys/kernel/tracing", READONLY,     true  },
> > >         { "/sys/fs/cgroup",      READWRITE,    false }, /*
> > > READONLY is
> > > set by ProtectControlGroups= option */
> > > };
> > > 
> > > No mention of selinuxfs at all.  Maybe it is a Fedora patch?
> > > 
> > > > The reason we want this is so that processes inside of
> > > > containers do
> > > > not
> > > > attempt to do SELinux stuff.
> > > > 
> > > > http://danwalsh.livejournal.com/73099.html
> > 
> > Before one dismisses my concern (8 minute proof):
> > 
> > https://www.youtube.com/watch?v=YqiM1MlOG0w
> 
> Hello,
> I see this on Arch Linux as well, where there is no
> distribution-specific patch which is applied to systemd (the only
> patches which are applied are backported commits). A simple way to
> see
> that the selinuxfs is mounted read-only is the following command:
> "localectl && findmnt --task $(pgrep systemd-localed)". It will
> display the mountpoints of systemd-localed.service, which (with
> systemd 232 [1]) contains:
> 
> ├─/sys                           sys                      sysfs
> ro,nosuid,nodev,noexec,relatime,seclabel
> │ ├─/sys/firmware/efi/efivars    efivarfs                 efivarfs
> ro,nosuid,nodev,noexec,relatime
> │ ├─/sys/kernel/security         securityfs               securityfs
> ro,nosuid,nodev,noexec,relatime
> │ ├─/sys/fs/selinux              selinuxfs                selinuxfs
> ro,relatime
> │ ├─/sys/fs/cgroup               tmpfs                    tmpfs
> ro,nosuid,nodev,noexec,seclabel,mode=755
> │ │ ├─/sys/fs/cgroup/systemd     cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/
> systemd-cgroups-agent,name=
> │ │ ├─/sys/fs/cgroup/net_cls     cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,net_cls
> │ │ ├─/sys/fs/cgroup/perf_event  cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,perf_event
> │ │ ├─/sys/fs/cgroup/pids        cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,pids
> │ │ ├─/sys/fs/cgroup/blkio       cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,blkio
> │ │ ├─/sys/fs/cgroup/freezer     cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,freezer
> │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,cpu,cpuacct
> │ │ ├─/sys/fs/cgroup/cpuset      cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,cpuset
> │ │ ├─/sys/fs/cgroup/devices     cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,devices
> │ │ └─/sys/fs/cgroup/memory      cgroup                   cgroup
> ro,nosuid,nodev,noexec,relatime,memory
> │ ├─/sys/fs/pstore               pstore                   pstore
> ro,nosuid,nodev,noexec,relatime,seclabel
> │ ├─/sys/kernel/debug            debugfs                  debugfs
> ro,relatime,seclabel
> │ ├─/sys/kernel/config           configfs                 configfs
> ro,relatime
> │ └─/sys/fs/fuse/connections     fusectl                  fusectl
> ro,relatime
> 
> /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f
> -p 1 -e mount" while starting systemd-localed.service, I get:
> 
> 3401  mount(NULL, "/sys/fs/cgroup/perf_event", NULL,
> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
> 3401  mount(NULL, "/sys/fs/cgroup/blkio", NULL,
> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
> 3401  mount(NULL, "/sys/fs/cgroup/pids", NULL,
> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
> 3401  mount(NULL, "/sys/fs/selinux", NULL,
> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
> 3401  mount(NULL, "/sys/fs/cgroup", NULL,
> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
> 3401  mount(NULL, "/sys/kernel/debug", NULL,
> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
> ...
> 
> So /sys/fs/selinux *is* remounted read-only by systemd. When I remove
> "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is
> not
> remounted and is kept RW in the namespace of the service.

Hmmm...so is systemd just recursively bind mounting everything under
/sys as read-only? If so, why does it separately list /sys/kernel/debug
and /sys/kernel/tracing in protect_kernel_tunables_table[]? 

> About containers, in http://danwalsh.livejournal.com/73099.html there
> is: "In containers we don't mount these file systems by default or we
> mount it read/only causing libselinux to report that it is
> disabled.".
> Why does /sys/fs/selinux need to be mounted read-only instead of not
> been mounted at all?

I'll defer that one to Dan.

> About systemd-localed, its use of namespaces makes it "look like" a
> container, but it needs to be SELinux-aware in order to use
> /proc/thread-self/attr/fscreate. The use-case is to atomically create
> files like /etc/vconsole.conf with the right context. In order to do
> so, the service:
> * loads the file context database,
> * requests the expected context of /etc/vconsole.conf
> (selabel_lookup_raw),
> * configures the fscreate context (setfscreatecon_raw)
> * creates a temporary file with this context named for example
> "/etc/.#vconsole.confiYiPml",
> * writes data to it and closes it,
> * and finally renames it to /etc/vconsole.conf (with the rename
> syscall)
> 
> I am not aware of a way of making /etc/vconsole.conf have the right
> file context in the end without making the program use libselinux's
> API (named type_transition does not support patterns suitable for
> temporary files). Did I miss something?

Hmm...this is fragile.  Suppose for instance that systemd were to start
passing SELABEL_OPT_VALIDATE to selabel_open().  That would trigger
failures because it wouldn't be able to write the context to
/sys/fs/selinux/context to validate them.  Or if it were using
matchpathcon(), which writes the context to /sys/fs/selinux/context and
reads back the canonicalized context for use (not sure why we stopped
doing that in selabel_lookup; maybe that's a bug). 

> Anyway, there is a bug in vanilla code (it is not specific to Fedora)
> and it is not clear whether it is a bug in libselinux code or in
> systemd's one. Is it's libselinux, I have prepared a patch for it
> (attached). Otherwise, what does systemd did wrong in its use of the
> SELinux API?

With regard to the patch, Dan or others would have to assess the
compatibility implications, since there are userspace components now
relying on is_selinux_enabled() to return 0 if selinuxfs is read-only.

With regard to use of the SELinux API, we've never guaranteed that a
subset of the API will work if selinuxfs is not available or is read-
only.  Obviously parts of it are usable, but that seems fragile.  I
don't really think systemd ought to be remounting it read-only, but
maybe that's just me.

> Nicolas
> 
> [1] ProtectKernelTunables=yes has actually been introduced in systemd
> 232 with https://github.com/systemd/systemd/commit/0c28d51ac84973904e
> 5f780b024adf8108e69fa1
Daniel Walsh April 15, 2017, 10:23 a.m. UTC | #2
On 04/14/2017 04:41 PM, Stephen Smalley wrote:
> On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote:
>> On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c
>> om> wrote:
>>> On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote:
>>>> On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote:
>>>>> On 04/14/2017 11:33 AM, Stephen Smalley wrote:
>>>>>> On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote:
>>>>>>> Bear with me please, because i might not fully grasp the
>>>>>>> issue (i
>>>>>>> received help with diagnosing this issue):
>>>>>>>
>>>>>>> This commit causes issues (and is, i think, a lousy hack):
>>>>>>> e3cab998b48ab293a9962faf9779d70ca339c65d
>>>>>>>
>>>>>>> The commit causes entities to "think" that SELinux is
>>>>>>> disabled
>>>>>>> after
>>>>>>> "mount -o remount,ro /sys/fs/selinux
>>>>>>>
>>>>>>> It is "neat" to be able to make processes "think" that
>>>>>>> selinux is
>>>>>>> disabled on a selinux enabled system but not if it break
>>>>>>> anything
>>>>>>>
>>>>>>> The above results in the following:
>>>>>>>
>>>>>>> Systemd services that have ProtectKernelTunables=yes set in
>>>>>>> their
>>>>>>> respective service units, think that SELinux is disabled.
>>>>>>>
>>>>>>> However we have found that some of these services actually
>>>>>>> rely
>>>>>>> on
>>>>>>> SELinux to ensure proper labeling.
>>>>>>>
>>>>>>> So we have the option to make people aware that if you set
>>>>>>> ProtectKernelTunables=yes that then the process cannot be
>>>>>>> SELinux-
>>>>>>> aware properly, or we can just get rid of the commit above
>>>>>>> and
>>>>>>> just
>>>>>>> accept that process know that SELinux is enabled.
>>>>>>>
>>>>>>> Actual bug that caused me to look into this: systemd-
>>>>>>> localed
>>>>>>> selinux
>>>>>>> awareness is broken due it having ProtectKernelTunables=yes
>>>>>>> in
>>>>>>> its
>>>>>>> service unit
>>>>>> If selinuxfs is mounted read-only, then they can't use most
>>>>>> of the
>>>>>> selinuxfs interfaces, including even the ability to validate
>>>>>> or
>>>>>> canonicalize security contexts.  That will break most
>>>>>> SELinux-aware
>>>>>> services if we tell them that SELinux is enabled.  Are you
>>>>>> sure
>>>>>> systemd-localed would actually work if you told it SELinux
>>>>>> was
>>>>>> enabled
>>>>>> when selinuxfs was mounted read-only?  What SELinux
>>>>>> interfaces is
>>>>>> it
>>>>>> using?
>>>>>>
>>>>>> The other question is whether ProtectKernelTunables ought to
>>>>>> be
>>>>>> mounting selinuxfs read-only.  SELinux already controls the
>>>>>> ability
>>>>>> to
>>>>>> use its interfaces, including limiting even root, so it is
>>>>>> unclear
>>>>>> what
>>>>>> benefit we derive from having systemd add a further
>>>>>> restriction on
>>>>>> top.
>>>>>>
>>>>> Why is selinuxfs mounted readonly in this case?
>>>> I don't actually see this in upstream systemd unless I am just
>>>> missing
>>>> it.
>>>>
>>>> systemd/src/core/namespace.c:
>>>> /* ProtectKernelTunables= option and the related filesystem APIs
>>>> */
>>>> static const MountEntry protect_kernel_tunables_table[] = {
>>>>         { "/proc/sys",           READONLY,     false },
>>>>         { "/proc/sysrq-trigger", READONLY,     true  },
>>>>         { "/proc/latency_stats", READONLY,     true  },
>>>>         { "/proc/mtrr",          READONLY,     true  },
>>>>         { "/proc/apm",           READONLY,     true  }, /*
>>>> Obsolete
>>>> API, there's no point in permitting access to this, ever */
>>>>         { "/proc/acpi",          READONLY,     true  },
>>>>         { "/proc/timer_stats",   READONLY,     true  },
>>>>         { "/proc/asound",        READONLY,     true  },
>>>>         { "/proc/bus",           READONLY,     true  },
>>>>         { "/proc/fs",            READONLY,     true  },
>>>>         { "/proc/irq",           READONLY,     true  },
>>>>         { "/sys",                READONLY,     false },
>>>>         { "/sys/kernel/debug",   READONLY,     true  },
>>>>         { "/sys/kernel/tracing", READONLY,     true  },
>>>>         { "/sys/fs/cgroup",      READWRITE,    false }, /*
>>>> READONLY is
>>>> set by ProtectControlGroups= option */
>>>> };
>>>>
>>>> No mention of selinuxfs at all.  Maybe it is a Fedora patch?
>>>>
>>>>> The reason we want this is so that processes inside of
>>>>> containers do
>>>>> not
>>>>> attempt to do SELinux stuff.
>>>>>
>>>>> http://danwalsh.livejournal.com/73099.html
>>> Before one dismisses my concern (8 minute proof):
>>>
>>> https://www.youtube.com/watch?v=YqiM1MlOG0w
>> Hello,
>> I see this on Arch Linux as well, where there is no
>> distribution-specific patch which is applied to systemd (the only
>> patches which are applied are backported commits). A simple way to
>> see
>> that the selinuxfs is mounted read-only is the following command:
>> "localectl && findmnt --task $(pgrep systemd-localed)". It will
>> display the mountpoints of systemd-localed.service, which (with
>> systemd 232 [1]) contains:
>>
>> ├─/sys                           sys                      sysfs
>> ro,nosuid,nodev,noexec,relatime,seclabel
>> │ ├─/sys/firmware/efi/efivars    efivarfs                 efivarfs
>> ro,nosuid,nodev,noexec,relatime
>> │ ├─/sys/kernel/security         securityfs               securityfs
>> ro,nosuid,nodev,noexec,relatime
>> │ ├─/sys/fs/selinux              selinuxfs                selinuxfs
>> ro,relatime
>> │ ├─/sys/fs/cgroup               tmpfs                    tmpfs
>> ro,nosuid,nodev,noexec,seclabel,mode=755
>> │ │ ├─/sys/fs/cgroup/systemd     cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/
>> systemd-cgroups-agent,name=
>> │ │ ├─/sys/fs/cgroup/net_cls     cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,net_cls
>> │ │ ├─/sys/fs/cgroup/perf_event  cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,perf_event
>> │ │ ├─/sys/fs/cgroup/pids        cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,pids
>> │ │ ├─/sys/fs/cgroup/blkio       cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,blkio
>> │ │ ├─/sys/fs/cgroup/freezer     cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,freezer
>> │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,cpu,cpuacct
>> │ │ ├─/sys/fs/cgroup/cpuset      cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,cpuset
>> │ │ ├─/sys/fs/cgroup/devices     cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,devices
>> │ │ └─/sys/fs/cgroup/memory      cgroup                   cgroup
>> ro,nosuid,nodev,noexec,relatime,memory
>> │ ├─/sys/fs/pstore               pstore                   pstore
>> ro,nosuid,nodev,noexec,relatime,seclabel
>> │ ├─/sys/kernel/debug            debugfs                  debugfs
>> ro,relatime,seclabel
>> │ ├─/sys/kernel/config           configfs                 configfs
>> ro,relatime
>> │ └─/sys/fs/fuse/connections     fusectl                  fusectl
>> ro,relatime
>>
>> /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f
>> -p 1 -e mount" while starting systemd-localed.service, I get:
>>
>> 3401  mount(NULL, "/sys/fs/cgroup/perf_event", NULL,
>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>> 3401  mount(NULL, "/sys/fs/cgroup/blkio", NULL,
>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>> 3401  mount(NULL, "/sys/fs/cgroup/pids", NULL,
>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>> 3401  mount(NULL, "/sys/fs/selinux", NULL,
>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
>> 3401  mount(NULL, "/sys/fs/cgroup", NULL,
>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>> 3401  mount(NULL, "/sys/kernel/debug", NULL,
>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
>> ...
>>
>> So /sys/fs/selinux *is* remounted read-only by systemd. When I remove
>> "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is
>> not
>> remounted and is kept RW in the namespace of the service.
> Hmmm...so is systemd just recursively bind mounting everything under
> /sys as read-only? If so, why does it separately list /sys/kernel/debug
> and /sys/kernel/tracing in protect_kernel_tunables_table[]? 
>
>> About containers, in http://danwalsh.livejournal.com/73099.html there
>> is: "In containers we don't mount these file systems by default or we
>> mount it read/only causing libselinux to report that it is
>> disabled.".
>> Why does /sys/fs/selinux need to be mounted read-only instead of not
>> been mounted at all?
> I'll defer that one to Dan.
I believe that libselinux still reports that the system is running with
SELinux, if the selinuxfs is not mounted
inside of the container at all.
>> About systemd-localed, its use of namespaces makes it "look like" a
>> container, but it needs to be SELinux-aware in order to use
>> /proc/thread-self/attr/fscreate. The use-case is to atomically create
>> files like /etc/vconsole.conf with the right context. In order to do
>> so, the service:
>> * loads the file context database,
>> * requests the expected context of /etc/vconsole.conf
>> (selabel_lookup_raw),
>> * configures the fscreate context (setfscreatecon_raw)
>> * creates a temporary file with this context named for example
>> "/etc/.#vconsole.confiYiPml",
>> * writes data to it and closes it,
>> * and finally renames it to /etc/vconsole.conf (with the rename
>> syscall)
>>
>> I am not aware of a way of making /etc/vconsole.conf have the right
>> file context in the end without making the program use libselinux's
>> API (named type_transition does not support patterns suitable for
>> temporary files). Did I miss something?
> Hmm...this is fragile.  Suppose for instance that systemd were to start
> passing SELABEL_OPT_VALIDATE to selabel_open().  That would trigger
> failures because it wouldn't be able to write the context to
> /sys/fs/selinux/context to validate them.  Or if it were using
> matchpathcon(), which writes the context to /sys/fs/selinux/context and
> reads back the canonicalized context for use (not sure why we stopped
> doing that in selabel_lookup; maybe that's a bug). 
>
>> Anyway, there is a bug in vanilla code (it is not specific to Fedora)
>> and it is not clear whether it is a bug in libselinux code or in
>> systemd's one. Is it's libselinux, I have prepared a patch for it
>> (attached). Otherwise, what does systemd did wrong in its use of the
>> SELinux API?
> With regard to the patch, Dan or others would have to assess the
> compatibility implications, since there are userspace components now
> relying on is_selinux_enabled() to return 0 if selinuxfs is read-only.
>
> With regard to use of the SELinux API, we've never guaranteed that a
> subset of the API will work if selinuxfs is not available or is read-
> only.  Obviously parts of it are usable, but that seems fragile.  I
> don't really think systemd ought to be remounting it read-only, but
> maybe that's just me.
>
>> Nicolas
>>
>> [1] ProtectKernelTunables=yes has actually been introduced in systemd
>> 232 with https://github.com/systemd/systemd/commit/0c28d51ac84973904e
>> 5f780b024adf8108e69fa1
> _______________________________________________
> Selinux mailing list
> Selinux@tycho.nsa.gov
> To unsubscribe, send email to Selinux-leave@tycho.nsa.gov.
> To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov.
Nicolas Iooss April 15, 2017, 2:10 p.m. UTC | #3
On Sat, Apr 15, 2017 at 12:23 PM, Daniel Walsh <dwalsh@redhat.com> wrote:
> On 04/14/2017 04:41 PM, Stephen Smalley wrote:
>> On Fri, 2017-04-14 at 21:43 +0200, Nicolas Iooss wrote:
>>> On Fri, Apr 14, 2017 at 8:49 PM, Dominick Grift <dac.override@gmail.c
>>> om> wrote:
>>>> On Fri, Apr 14, 2017 at 01:56:30PM -0400, Stephen Smalley wrote:
>>>>> On Fri, 2017-04-14 at 13:47 -0400, Daniel Walsh wrote:
>>>>>> On 04/14/2017 11:33 AM, Stephen Smalley wrote:
>>>>>>> On Fri, 2017-04-14 at 16:57 +0200, Dominick Grift wrote:
>>>>>>>> Bear with me please, because i might not fully grasp the
>>>>>>>> issue (i
>>>>>>>> received help with diagnosing this issue):
>>>>>>>>
>>>>>>>> This commit causes issues (and is, i think, a lousy hack):
>>>>>>>> e3cab998b48ab293a9962faf9779d70ca339c65d
>>>>>>>>
>>>>>>>> The commit causes entities to "think" that SELinux is
>>>>>>>> disabled
>>>>>>>> after
>>>>>>>> "mount -o remount,ro /sys/fs/selinux
>>>>>>>>
>>>>>>>> It is "neat" to be able to make processes "think" that
>>>>>>>> selinux is
>>>>>>>> disabled on a selinux enabled system but not if it break
>>>>>>>> anything
>>>>>>>>
>>>>>>>> The above results in the following:
>>>>>>>>
>>>>>>>> Systemd services that have ProtectKernelTunables=yes set in
>>>>>>>> their
>>>>>>>> respective service units, think that SELinux is disabled.
>>>>>>>>
>>>>>>>> However we have found that some of these services actually
>>>>>>>> rely
>>>>>>>> on
>>>>>>>> SELinux to ensure proper labeling.
>>>>>>>>
>>>>>>>> So we have the option to make people aware that if you set
>>>>>>>> ProtectKernelTunables=yes that then the process cannot be
>>>>>>>> SELinux-
>>>>>>>> aware properly, or we can just get rid of the commit above
>>>>>>>> and
>>>>>>>> just
>>>>>>>> accept that process know that SELinux is enabled.
>>>>>>>>
>>>>>>>> Actual bug that caused me to look into this: systemd-
>>>>>>>> localed
>>>>>>>> selinux
>>>>>>>> awareness is broken due it having ProtectKernelTunables=yes
>>>>>>>> in
>>>>>>>> its
>>>>>>>> service unit
>>>>>>> If selinuxfs is mounted read-only, then they can't use most
>>>>>>> of the
>>>>>>> selinuxfs interfaces, including even the ability to validate
>>>>>>> or
>>>>>>> canonicalize security contexts.  That will break most
>>>>>>> SELinux-aware
>>>>>>> services if we tell them that SELinux is enabled.  Are you
>>>>>>> sure
>>>>>>> systemd-localed would actually work if you told it SELinux
>>>>>>> was
>>>>>>> enabled
>>>>>>> when selinuxfs was mounted read-only?  What SELinux
>>>>>>> interfaces is
>>>>>>> it
>>>>>>> using?
>>>>>>>
>>>>>>> The other question is whether ProtectKernelTunables ought to
>>>>>>> be
>>>>>>> mounting selinuxfs read-only.  SELinux already controls the
>>>>>>> ability
>>>>>>> to
>>>>>>> use its interfaces, including limiting even root, so it is
>>>>>>> unclear
>>>>>>> what
>>>>>>> benefit we derive from having systemd add a further
>>>>>>> restriction on
>>>>>>> top.
>>>>>>>
>>>>>> Why is selinuxfs mounted readonly in this case?
>>>>> I don't actually see this in upstream systemd unless I am just
>>>>> missing
>>>>> it.
>>>>>
>>>>> systemd/src/core/namespace.c:
>>>>> /* ProtectKernelTunables= option and the related filesystem APIs
>>>>> */
>>>>> static const MountEntry protect_kernel_tunables_table[] = {
>>>>>         { "/proc/sys",           READONLY,     false },
>>>>>         { "/proc/sysrq-trigger", READONLY,     true  },
>>>>>         { "/proc/latency_stats", READONLY,     true  },
>>>>>         { "/proc/mtrr",          READONLY,     true  },
>>>>>         { "/proc/apm",           READONLY,     true  }, /*
>>>>> Obsolete
>>>>> API, there's no point in permitting access to this, ever */
>>>>>         { "/proc/acpi",          READONLY,     true  },
>>>>>         { "/proc/timer_stats",   READONLY,     true  },
>>>>>         { "/proc/asound",        READONLY,     true  },
>>>>>         { "/proc/bus",           READONLY,     true  },
>>>>>         { "/proc/fs",            READONLY,     true  },
>>>>>         { "/proc/irq",           READONLY,     true  },
>>>>>         { "/sys",                READONLY,     false },
>>>>>         { "/sys/kernel/debug",   READONLY,     true  },
>>>>>         { "/sys/kernel/tracing", READONLY,     true  },
>>>>>         { "/sys/fs/cgroup",      READWRITE,    false }, /*
>>>>> READONLY is
>>>>> set by ProtectControlGroups= option */
>>>>> };
>>>>>
>>>>> No mention of selinuxfs at all.  Maybe it is a Fedora patch?
>>>>>
>>>>>> The reason we want this is so that processes inside of
>>>>>> containers do
>>>>>> not
>>>>>> attempt to do SELinux stuff.
>>>>>>
>>>>>> http://danwalsh.livejournal.com/73099.html
>>>> Before one dismisses my concern (8 minute proof):
>>>>
>>>> https://www.youtube.com/watch?v=YqiM1MlOG0w
>>> Hello,
>>> I see this on Arch Linux as well, where there is no
>>> distribution-specific patch which is applied to systemd (the only
>>> patches which are applied are backported commits). A simple way to
>>> see
>>> that the selinuxfs is mounted read-only is the following command:
>>> "localectl && findmnt --task $(pgrep systemd-localed)". It will
>>> display the mountpoints of systemd-localed.service, which (with
>>> systemd 232 [1]) contains:
>>>
>>> ├─/sys                           sys                      sysfs
>>> ro,nosuid,nodev,noexec,relatime,seclabel
>>> │ ├─/sys/firmware/efi/efivars    efivarfs                 efivarfs
>>> ro,nosuid,nodev,noexec,relatime
>>> │ ├─/sys/kernel/security         securityfs               securityfs
>>> ro,nosuid,nodev,noexec,relatime
>>> │ ├─/sys/fs/selinux              selinuxfs                selinuxfs
>>> ro,relatime
>>> │ ├─/sys/fs/cgroup               tmpfs                    tmpfs
>>> ro,nosuid,nodev,noexec,seclabel,mode=755
>>> │ │ ├─/sys/fs/cgroup/systemd     cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/
>>> systemd-cgroups-agent,name=
>>> │ │ ├─/sys/fs/cgroup/net_cls     cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,net_cls
>>> │ │ ├─/sys/fs/cgroup/perf_event  cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,perf_event
>>> │ │ ├─/sys/fs/cgroup/pids        cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,pids
>>> │ │ ├─/sys/fs/cgroup/blkio       cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,blkio
>>> │ │ ├─/sys/fs/cgroup/freezer     cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,freezer
>>> │ │ ├─/sys/fs/cgroup/cpu,cpuacct cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,cpu,cpuacct
>>> │ │ ├─/sys/fs/cgroup/cpuset      cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,cpuset
>>> │ │ ├─/sys/fs/cgroup/devices     cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,devices
>>> │ │ └─/sys/fs/cgroup/memory      cgroup                   cgroup
>>> ro,nosuid,nodev,noexec,relatime,memory
>>> │ ├─/sys/fs/pstore               pstore                   pstore
>>> ro,nosuid,nodev,noexec,relatime,seclabel
>>> │ ├─/sys/kernel/debug            debugfs                  debugfs
>>> ro,relatime,seclabel
>>> │ ├─/sys/kernel/config           configfs                 configfs
>>> ro,relatime
>>> │ └─/sys/fs/fuse/connections     fusectl                  fusectl
>>> ro,relatime
>>>
>>> /sys/fs/selinux is mounted read-only. Moreover when I run "strace -f
>>> -p 1 -e mount" while starting systemd-localed.service, I get:
>>>
>>> 3401  mount(NULL, "/sys/fs/cgroup/perf_event", NULL,
>>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>>> 3401  mount(NULL, "/sys/fs/cgroup/blkio", NULL,
>>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>>> 3401  mount(NULL, "/sys/fs/cgroup/pids", NULL,
>>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>>> 3401  mount(NULL, "/sys/fs/selinux", NULL,
>>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
>>> 3401  mount(NULL, "/sys/fs/cgroup", NULL,
>>> MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
>>> 3401  mount(NULL, "/sys/kernel/debug", NULL,
>>> MS_RDONLY|MS_REMOUNT|MS_BIND, NULL) = 0
>>> ...
>>>
>>> So /sys/fs/selinux *is* remounted read-only by systemd. When I remove
>>> "ProtectKernelTunables=yes" from the unit file, /sys/fs/selinux is
>>> not
>>> remounted and is kept RW in the namespace of the service.
>> Hmmm...so is systemd just recursively bind mounting everything under
>> /sys as read-only? If so, why does it separately list /sys/kernel/debug
>> and /sys/kernel/tracing in protect_kernel_tunables_table[]?

systemd/src/core/namespace.c also contains the definition of
make_read_only(), which calls
bind_remount_recursive(mount_entry_path(m), true, blacklist). So the
RO-remount is recursive.

>>
>>> About containers, in http://danwalsh.livejournal.com/73099.html there
>>> is: "In containers we don't mount these file systems by default or we
>>> mount it read/only causing libselinux to report that it is
>>> disabled.".
>>> Why does /sys/fs/selinux need to be mounted read-only instead of not
>>> been mounted at all?
>> I'll defer that one to Dan.
> I believe that libselinux still reports that the system is running with
> SELinux, if the selinuxfs is not mounted
> inside of the container at all.

On my system, I get:

  # unshare -m
  # sestatus
  SELinux status:                 enabled
  SELinuxfs mount:                /sys/fs/selinux
  SELinux root directory:         /etc/selinux
  Loaded policy name:             refpolicy-patched
  Current mode:                   permissive
  Mode from config file:          permissive
  Policy MLS status:              disabled
  Policy deny_unknown status:     denied
  Max kernel policy version:      30
  # mount -o remount,ro /sys/fs/selinux
  mount: selinuxfs mounted on /sys/fs/selinux.
  # sestatus
  SELinux status:                 disabled
  # umount /sys/fs/selinux
  umount: /sys/fs/selinux unmounted
  # sestatus
  SELinux status:                 disabled

The last line seems to disagree with your belief.

>>> About systemd-localed, its use of namespaces makes it "look like" a
>>> container, but it needs to be SELinux-aware in order to use
>>> /proc/thread-self/attr/fscreate. The use-case is to atomically create
>>> files like /etc/vconsole.conf with the right context. In order to do
>>> so, the service:
>>> * loads the file context database,
>>> * requests the expected context of /etc/vconsole.conf
>>> (selabel_lookup_raw),
>>> * configures the fscreate context (setfscreatecon_raw)
>>> * creates a temporary file with this context named for example
>>> "/etc/.#vconsole.confiYiPml",
>>> * writes data to it and closes it,
>>> * and finally renames it to /etc/vconsole.conf (with the rename
>>> syscall)
>>>
>>> I am not aware of a way of making /etc/vconsole.conf have the right
>>> file context in the end without making the program use libselinux's
>>> API (named type_transition does not support patterns suitable for
>>> temporary files). Did I miss something?
>> Hmm...this is fragile.  Suppose for instance that systemd were to start
>> passing SELABEL_OPT_VALIDATE to selabel_open().  That would trigger
>> failures because it wouldn't be able to write the context to
>> /sys/fs/selinux/context to validate them.  Or if it were using
>> matchpathcon(), which writes the context to /sys/fs/selinux/context and
>> reads back the canonicalized context for use (not sure why we stopped
>> doing that in selabel_lookup; maybe that's a bug).
>>
>>> Anyway, there is a bug in vanilla code (it is not specific to Fedora)
>>> and it is not clear whether it is a bug in libselinux code or in
>>> systemd's one. Is it's libselinux, I have prepared a patch for it
>>> (attached). Otherwise, what does systemd did wrong in its use of the
>>> SELinux API?
>> With regard to the patch, Dan or others would have to assess the
>> compatibility implications, since there are userspace components now
>> relying on is_selinux_enabled() to return 0 if selinuxfs is read-only.
>>
>> With regard to use of the SELinux API, we've never guaranteed that a
>> subset of the API will work if selinuxfs is not available or is read-
>> only.  Obviously parts of it are usable, but that seems fragile.  I
>> don't really think systemd ought to be remounting it read-only, but
>> maybe that's just me.

All right. I was not aware of /sys/fs/selinux/context and, indeed, it
makes sense to only support RW selinuxfs. I have opened a Pull Request
for systemd on https://github.com/systemd/systemd/pull/5741 in order
to keep /sys/fs/selinux writable.

Thanks,
Nicolas
Stephen Smalley April 17, 2017, 1:34 p.m. UTC | #4
On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote:
> I believe that libselinux still reports that the system is running
> with
> SELinux, if the selinuxfs is not mounted
> inside of the container at all.

Not after the commit referenced in the subject line; you removed the
fallback code to check /proc/filesystems for selinuxfs from
is_selinux_enabled(), so if selinuxfs is not mounted at all, it will
return 0 (not enabled).  On non-Android, you can also cause
is_selinux_enabled() to return 0 by not providing an
/etc/selinux/config file in your container's root directory (see commit
 
c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do not
install selinux-policy in your container root, then it will return
disabled.
Daniel Walsh April 17, 2017, 2:40 p.m. UTC | #5
On 04/17/2017 09:34 AM, Stephen Smalley wrote:
> On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote:
>> I believe that libselinux still reports that the system is running
>> with
>> SELinux, if the selinuxfs is not mounted
>> inside of the container at all.
> Not after the commit referenced in the subject line; you removed the
> fallback code to check /proc/filesystems for selinuxfs from
> is_selinux_enabled(), so if selinuxfs is not mounted at all, it will
> return 0 (not enabled).  On non-Android, you can also cause
> is_selinux_enabled() to return 0 by not providing an
> /etc/selinux/config file in your container's root directory (see commit
>  
> c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do not
> install selinux-policy in your container root, then it will return
> disabled.
> _______________________________________________
> Selinux mailing list
> Selinux@tycho.nsa.gov
> To unsubscribe, send email to Selinux-leave@tycho.nsa.gov.
> To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov.
>
>
That seems to a chancy way of handling this.  Since I can see it as
pretty easy to accidently pull in selinux-policy package into a
container and then the container gets /etc/selinux/config and stuff
starts blowing up.  Not sure why the availability of this file should
indicate selinux is enabled.
Stephen Smalley April 17, 2017, 2:49 p.m. UTC | #6
On Mon, 2017-04-17 at 10:40 -0400, Daniel Walsh wrote:
> On 04/17/2017 09:34 AM, Stephen Smalley wrote:
> > On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote:
> > > I believe that libselinux still reports that the system is
> > > running
> > > with
> > > SELinux, if the selinuxfs is not mounted
> > > inside of the container at all.
> > 
> > Not after the commit referenced in the subject line; you removed
> > the
> > fallback code to check /proc/filesystems for selinuxfs from
> > is_selinux_enabled(), so if selinuxfs is not mounted at all, it
> > will
> > return 0 (not enabled).  On non-Android, you can also cause
> > is_selinux_enabled() to return 0 by not providing an
> > /etc/selinux/config file in your container's root directory (see
> > commit
> >  
> > c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do
> > not
> > install selinux-policy in your container root, then it will return
> > disabled.
> 
> That seems to a chancy way of handling this.  Since I can see it as
> pretty easy to accidently pull in selinux-policy package into a
> container and then the container gets /etc/selinux/config and stuff
> starts blowing up.  Not sure why the availability of this file should
> indicate selinux is enabled.

The existence of /etc/selinux/config is necessary but not sufficient;
is_selinux_enabled() only returns 1 if selinuxfs is mounted (read-write 
with the current logic) _and_ (on non-Android) if /etc/selinux/config
exists.  The /etc/selinux/config test was added to avoid a regression
when we dropped the old no-policy-loaded test.

In any event, not mounting selinuxfs within the container would suffice
to cause is_selinux_enabled() to return 0.
Daniel Walsh April 17, 2017, 3:08 p.m. UTC | #7
On 04/17/2017 10:49 AM, Stephen Smalley wrote:
> On Mon, 2017-04-17 at 10:40 -0400, Daniel Walsh wrote:
>> On 04/17/2017 09:34 AM, Stephen Smalley wrote:
>>> On Sat, 2017-04-15 at 06:23 -0400, Daniel Walsh wrote:
>>>> I believe that libselinux still reports that the system is
>>>> running
>>>> with
>>>> SELinux, if the selinuxfs is not mounted
>>>> inside of the container at all.
>>> Not after the commit referenced in the subject line; you removed
>>> the
>>> fallback code to check /proc/filesystems for selinuxfs from
>>> is_selinux_enabled(), so if selinuxfs is not mounted at all, it
>>> will
>>> return 0 (not enabled).  On non-Android, you can also cause
>>> is_selinux_enabled() to return 0 by not providing an
>>> /etc/selinux/config file in your container's root directory (see
>>> commit
>>>  
>>> c08c4eacab8d55598b9e5caaef8a871a7a476cab), i.e. as long as you do
>>> not
>>> install selinux-policy in your container root, then it will return
>>> disabled.
>> That seems to a chancy way of handling this.  Since I can see it as
>> pretty easy to accidently pull in selinux-policy package into a
>> container and then the container gets /etc/selinux/config and stuff
>> starts blowing up.  Not sure why the availability of this file should
>> indicate selinux is enabled.
> The existence of /etc/selinux/config is necessary but not sufficient;
> is_selinux_enabled() only returns 1 if selinuxfs is mounted (read-write 
> with the current logic) _and_ (on non-Android) if /etc/selinux/config
> exists.  The /etc/selinux/config test was added to avoid a regression
> when we dropped the old no-policy-loaded test.
>
> In any event, not mounting selinuxfs within the container would suffice
> to cause is_selinux_enabled() to return 0.
>
> _______________________________________________
> Selinux mailing list
> Selinux@tycho.nsa.gov
> To unsubscribe, send email to Selinux-leave@tycho.nsa.gov.
> To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov.

If that is the case, then I have no problem removing the read/only
check.  We can

make sure /sys/fs/selinux is not mounted into the container.
diff mbox

Patch

From 92688d49f16a28875034c95827fbe0d20e221d7b Mon Sep 17 00:00:00 2001
From: Nicolas Iooss <nicolas.iooss@m4x.org>
Date: Fri, 14 Apr 2017 21:27:42 +0200
Subject: [PATCH 1/1] libselinux: detect that SELinux is enabled when
 /sys/fs/selinux is mounted read-only

systemd service units can use "ProtectKernelTunables=yes" in order to
mount some file systems read-only (the documentation is available at
https://www.freedesktop.org/software/systemd/man/systemd.exec.html).
This makes /sys/fs/selinux read-only too.

Services using such a configuration option sees SELinux as disabled
because of the behavior described in commit e3cab998b48a ("libselinux
mountpoint changing patch."):

    NOTE:  We added the check for RO, to allow tools like mock to be
    able to tell a chroot that SELinux is disabled while enforcing it
    outside the chroot.

However this changes the behavior of some systemd services in unexpected
ways. For example systemd-localed uses libselinux in order to create
/etc/locale.conf, /etc/vconsole.conf... with the right file context
while using a temporary file (using setfscreatecon_raw() with the label
of /etc/vconsole.conf). With ProtectKernelTunables=yes, systemd-localed
sees SELinux as being disabled (because /sys/fs/selinux is mounted
read-only) and creates files in /etc with a wrong label.

Fix this issue by making verify_selinuxmnt() use read-only mount-points
too.

Reported-by: Dominick Grift <dac.override@gmail.com>
Signed-off-by: Nicolas Iooss <nicolas.iooss@m4x.org>
---
 libselinux/src/init.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libselinux/src/init.c b/libselinux/src/init.c
index 814c7e6a9964..4fca2b9c6ecd 100644
--- a/libselinux/src/init.c
+++ b/libselinux/src/init.c
@@ -42,9 +42,7 @@  static int verify_selinuxmnt(const char *mnt)
 			struct statvfs vfsbuf;
 			rc = statvfs(mnt, &vfsbuf);
 			if (rc == 0) {
-				if (!(vfsbuf.f_flag & ST_RDONLY)) {
-					set_selinuxmnt(mnt);
-				}
+				set_selinuxmnt(mnt);
 				return 0;
 			}
 		}
-- 
2.12.0