Message ID | 20190109091028.24485-1-omosnace@redhat.com (mailing list archive) |
---|---|
Series | Allow initializing the kernfs node's secctx based on its parent |
Hello,

On Wed, Jan 09, 2019 at 10:10:25AM +0100, Ondrej Mosnacek wrote:
> The main motivation for this change is that the userspace users of cgroupfs
> (which is built on kernfs) expect the usual security context inheritance
> to work under SELinux (see [1] and [2]). This functionality is required for
> better confinement of containers under SELinux.

Can you please go into details on what the expected use cases are like
for cgroupfs? It shows up as a filesystem but isn't a real one and
has its own permission scheme for delegation and stuff. If sysfs
hasn't needed selinux support, I'm having a bit of difficulty seeing
why cgroupfs would.

Thanks.

On Fri, Jan 11, 2019 at 9:51 PM Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Wed, Jan 09, 2019 at 10:10:25AM +0100, Ondrej Mosnacek wrote:
> > The main motivation for this change is that the userspace users of cgroupfs
> > (which is built on kernfs) expect the usual security context inheritance
> > to work under SELinux (see [1] and [2]). This functionality is required for
> > better confinement of containers under SELinux.
>
> Can you please go into details on what the expected use cases are like
> for cgroupfs? It shows up as a filesystem but isn't a real one and
> has its own permission scheme for delegation and stuff. If sysfs
> hasn't needed selinux support, I'm having a bit of difficulty seeing
> why cgroupfs would.

I'm not sure what the exact needs of the container people are, but
IIUC the goal is to make it possible to have a subtree labeled with a
specific label (that gets inherited by newly created cgroups in that
subtree by default) so that container processes do not need to be
given permissions for the whole cgroupfs tree.

I'm cc'ing Dan Walsh, who should be able to explain the use cases in
more detail. Dan, this is related to the cgroupfs labeling problem
([1] and [2]). See [3] for the root of this discussion.

[1] https://github.com/SELinuxProject/selinux-kernel/issues/39
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1553803
[3] https://lore.kernel.org/selinux/CAFqZXNsxfjwDaCWDrqxP736y_3Jm-r=twaHtkkTDtMuym774Jw@mail.gmail.com/T/

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.

On Mon, Jan 14, 2019 at 10:14 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> [...]
> [3] https://lore.kernel.org/selinux/CAFqZXNsxfjwDaCWDrqxP736y_3Jm-r=twaHtkkTDtMuym774Jw@mail.gmail.com/T/

Actually, this thread belongs to v1 of the patch series, which is
archived here:

https://lore.kernel.org/selinux/CAFqZXNu-bHGmUi80UiyW3djcbedycC+0KUyiQuv9-8b+WmrYuA@mail.gmail.com/T/

Hello,

On Mon, Jan 14, 2019 at 10:14:32AM +0100, Ondrej Mosnacek wrote:
> I'm not sure what the exact needs of the container people are, but
> IIUC the goal is to make it possible to have a subtree labeled with a
> specific label (that gets inherited by newly created cgroups in that
> subtree by default) so that container processes do not need to be
> given permissions for the whole cgroupfs tree.
>
> I'm cc'ing Dan Walsh, who should be able to explain the use cases in
> more detail. Dan, this is related to the cgroupfs labeling problem
> ([1] and [2]). See [3] for the root of this discussion.

Let's wait for Dan to respond but I'm pretty skeptical that this is a
good direction.

Thanks.

On 1/11/19 3:50 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Jan 09, 2019 at 10:10:25AM +0100, Ondrej Mosnacek wrote:
>> The main motivation for this change is that the userspace users of cgroupfs
>> (which is built on kernfs) expect the usual security context inheritance
>> to work under SELinux (see [1] and [2]). This functionality is required for
>> better confinement of containers under SELinux.
>
> Can you please go into details on what the expected use cases are like
> for cgroupfs? It shows up as a filesystem but isn't a real one and
> has its own permission scheme for delegation and stuff. If sysfs
> hasn't needed selinux support, I'm having a bit of difficulty seeing
> why cgroupfs would.

Just to clarify with respect to your last point about sysfs, sysfs
selinux support was first introduced in commit ddd29ec6597125c830f7
("sysfs: Add labeling support for sysfs") for use by libvirt, and this
support was carried over into kernfs, and is extensively used
particularly in Android for controlling access to sysfs files.

The patch set in this series is extending that support to enable
inheritance of security labels set via setxattr from parent to child
when appropriate, which has particularly been requested for cgroup but
would also be useful for sysfs.

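
For readers unfamiliar with the setxattr-based labeling mentioned above, the following is a minimal sketch of how userspace can read and set an SELinux label through the `security.selinux` extended attribute. The helper names (`decode_context`, `get_label`, `set_label`, `child_inherited`) and the paths in the comments are made up for illustration; they are not part of the patch set. Setting a label requires an SELinux-enabled system and suitable privileges, so treat this as a sketch under those assumptions.

```python
import os

# Name of the extended attribute SELinux uses to store a file's label.
SELINUX_XATTR = "security.selinux"


def decode_context(raw: bytes) -> str:
    """Turn a raw xattr value into a context string.

    The value may carry a trailing NUL byte, so strip it before decoding.
    """
    return raw.rstrip(b"\x00").decode("ascii")


def get_label(path: str) -> str:
    """Read the SELinux label of `path` (Linux only)."""
    return decode_context(os.getxattr(path, SELINUX_XATTR))


def set_label(path: str, context: str) -> None:
    """Set the SELinux label of `path` (needs SELinux and privileges)."""
    os.setxattr(path, SELINUX_XATTR, context.encode("ascii"))


def child_inherited(parent: str, child: str) -> bool:
    """Report whether `child` carries the same label as `parent`."""
    return get_label(parent) == get_label(child)


# Hypothetical usage on an SELinux system (paths are illustrative only):
#   set_label("/sys/fs/cgroup/container1",
#             "system_u:object_r:container_file_t:s0")
#   os.mkdir("/sys/fs/cgroup/container1/sub")
#   child_inherited("/sys/fs/cgroup/container1",
#                   "/sys/fs/cgroup/container1/sub")
```

Whether the final check reports `True` is exactly what this thread is about: with the behaviour described in the thread the child gets the default cgroup label, and the series proposes making it follow the parent.
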
Hello,

On Thu, Jan 17, 2019 at 10:01:23AM -0500, Daniel Walsh wrote:
> The above comment is correct. We want to be able to run a container
> where we hand it control over a limited subdir of the cgroups hierarchy.
> We can currently do this and label the content correctly, but when
> subdirs of the directory get created by processes inside the container
> they do not get the correct label. For example we add a label like
> system_u:object_r:container_file_t:s0 to a directory but when the
> process inside of the container creates a fd within this directory the
> kernel says the label is the default label for cgroups
> system_u:object_r:cgroup_t:s0. This forces us to write looser policy
> that from an SELinux point of view allows a process within the container
> to write anywhere on the cgroup file system, rather than just the
> designated directories.

Can you please go into a bit more detail on why the existing
cgroup delegation model isn't enough?

Thanks.

On 1/17/19 11:15 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 17, 2019 at 10:01:23AM -0500, Daniel Walsh wrote:
>> The above comment is correct. We want to be able to run a container
>> where we hand it control over a limited subdir of the cgroups hierarchy.
>> We can currently do this and label the content correctly, but when
>> subdirs of the directory get created by processes inside the container
>> they do not get the correct label. For example we add a label like
>> system_u:object_r:container_file_t:s0 to a directory but when the
>> process inside of the container creates a fd within this directory the
>> kernel says the label is the default label for cgroups
>> system_u:object_r:cgroup_t:s0. This forces us to write looser policy
>> that from an SELinux point of view allows a process within the container
>> to write anywhere on the cgroup file system, rather than just the
>> designated directories.
>
> Can you please go into a bit more detail on why the existing
> cgroup delegation model isn't enough?

I would hazard a guess that it is because the existing cgroup
delegation model is based on user IDs and discretionary access control
(DAC), whereas they are using per-container SELinux security contexts
and mandatory access control (MAC) to enforce the separation of
containers irrespective of UID and DAC. Optimally both would be
supported by cgroup, as DAC and MAC have different properties and use
cases.

On 1/17/19 11:39 AM, Stephen Smalley wrote:
> On 1/17/19 11:15 AM, Tejun Heo wrote:
>> Hello,
>>
>> On Thu, Jan 17, 2019 at 10:01:23AM -0500, Daniel Walsh wrote:
>>> The above comment is correct. We want to be able to run a container
>>> where we hand it control over a limited subdir of the cgroups hierarchy.
>>> We can currently do this and label the content correctly, but when
>>> subdirs of the directory get created by processes inside the container
>>> they do not get the correct label. For example we add a label like
>>> system_u:object_r:container_file_t:s0 to a directory but when the
>>> process inside of the container creates a fd within this directory the
>>> kernel says the label is the default label for cgroups
>>> system_u:object_r:cgroup_t:s0. This forces us to write looser policy
>>> that from an SELinux point of view allows a process within the container
>>> to write anywhere on the cgroup file system, rather than just the
>>> designated directories.
>>
>> Can you please go into a bit more detail on why the existing
>> cgroup delegation model isn't enough?
>
> I would hazard a guess that it is because the existing cgroup
> delegation model is based on user IDs and discretionary access control
> (DAC), whereas they are using per-container SELinux security contexts
> and mandatory access control (MAC) to enforce the separation of
> containers irrespective of UID and DAC. Optimally both would be
> supported by cgroup, as DAC and MAC have different properties and use
> cases.

As Stephen said, the existing model is DAC. We have the situation where
we have a "root" process running within a container that is not using
User Namespace. I want to ensure, based on SELinux controls, that this
root process cannot write just anywhere within the cgroup hierarchy.
This is security in depth. If other mechanisms prevent the process from
writing to other places in cgroups that is great, but I want it also
secured from a MAC point of view.

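
The DAC versus MAC distinction drawn above can be illustrated with a toy model. This is only a sketch: the types `Proc` and `Target`, the `ALLOW_WRITE` set, and the simplified checks are assumptions made for illustration. Real DAC involves capabilities, groups and user namespaces, and real SELinux policy is far richer than a set of type pairs. The point shown is just that a write must pass both layers, so an in-container root that passes DAC can still be stopped by MAC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    uid: int
    domain: str  # SELinux type of the process, e.g. "container_t"


@dataclass(frozen=True)
class Target:
    owner_uid: int
    owner_writable: bool
    file_type: str  # SELinux type of the object, e.g. "cgroup_t"


# Toy MAC policy: (process type, object type) pairs allowed to write.
ALLOW_WRITE = {("container_t", "container_file_t")}


def dac_allows_write(p: Proc, t: Target) -> bool:
    # Simplification: uid 0 bypasses ownership checks entirely.
    return p.uid == 0 or (p.uid == t.owner_uid and t.owner_writable)


def mac_allows_write(p: Proc, t: Target) -> bool:
    return (p.domain, t.file_type) in ALLOW_WRITE


def may_write(p: Proc, t: Target) -> bool:
    # Both layers must agree before the write goes ahead.
    return dac_allows_write(p, t) and mac_allows_write(p, t)


container_root = Proc(uid=0, domain="container_t")
delegated_dir = Target(owner_uid=0, owner_writable=True,
                       file_type="container_file_t")
other_cgroup = Target(owner_uid=0, owner_writable=True,
                      file_type="cgroup_t")

print(may_write(container_root, delegated_dir))  # both layers allow
print(may_write(container_root, other_cgroup))   # DAC allows, MAC denies
```

In this model the second check only fails because the rest of the tree keeps a different type than the delegated directory, which is why correct labels on newly created nodes matter.
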
On 1/17/19 11:15 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 17, 2019 at 10:01:23AM -0500, Daniel Walsh wrote:
>> The above comment is correct. We want to be able to run a container
>> where we hand it control over a limited subdir of the cgroups hierarchy.
>> We can currently do this and label the content correctly, but when
>> subdirs of the directory get created by processes inside the container
>> they do not get the correct label. For example we add a label like
>> system_u:object_r:container_file_t:s0 to a directory but when the
>> process inside of the container creates a fd within this directory the
>> kernel says the label is the default label for cgroups
>> system_u:object_r:cgroup_t:s0. This forces us to write looser policy
>> that from an SELinux point of view allows a process within the container
>> to write anywhere on the cgroup file system, rather than just the
>> designated directories.
> Can you please go into a bit more detail on why the existing
> cgroup delegation model isn't enough?
>
> Thanks.
>

If I label a container container_t:s0:c1,c2, by policy it can only
write to container_file_t:s0:c1,c2. So the container engine sets up
files and directories within the cgroup hierarchy with labels of
container_file_t:s0:c1,c2. When the container writes to one of these
directories, the kernel says the file is labeled cgroup_t:s0, and the
write is denied by policy.

In most/all other file systems that support labeling, the content of a
directory gets the same label as the containing directory. So from an
SELinux point of view, I would have expected the kernel to label the
new file as container_file_t:s0:c1,c2 and everything would work
securely.

But cgroups does not work correctly, so we need to add a rule that says
container_t:s0:c1,c2 can write files labeled cgroup_t:s0, which means
it can write anywhere on /sys/fs/cgroup.

This is from a MAC point of view. I don't care if other security
measures might control this; I want to have security in depth and have
MAC control it.
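
The labeling behaviour described above can be sketched as a small in-memory model. It is deliberately simplified and is not how the kernel or the patch series implements it: `Node`, `mkdir`, `label_of` and the `inherit` flag are hypothetical names, and real SELinux computes a new object's label from policy, the creating process and the parent rather than by plain copying. The sketch only contrasts "new child gets the default cgroup label" with "new child follows its parent".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Default label reported for cgroup nodes, as quoted in the thread.
DEFAULT_LABEL = "system_u:object_r:cgroup_t:s0"


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    explicit_label: Optional[str] = None  # set via setxattr in a real system
    children: Dict[str, "Node"] = field(default_factory=dict)


def mkdir(parent: Node, name: str, inherit: bool) -> Node:
    """Create a child; copy the parent's explicit label only if `inherit`."""
    child = Node(name, parent)
    if inherit and parent.explicit_label is not None:
        child.explicit_label = parent.explicit_label
    parent.children[name] = child
    return child


def label_of(node: Node) -> str:
    """Effective label: the explicit one if set, else the default."""
    return node.explicit_label or DEFAULT_LABEL


root = Node("cgroup")
box = mkdir(root, "container1", inherit=False)
# The container engine labels the delegated directory explicitly.
box.explicit_label = "system_u:object_r:container_file_t:s0:c1,c2"

without = mkdir(box, "a", inherit=False)  # behaviour described in the thread
with_inh = mkdir(box, "b", inherit=True)  # behaviour the series asks for

print(label_of(without))
print(label_of(with_inh))
```

Under the first behaviour the policy has to allow writes to the default type for the container to function; under the second, the rule for container_file_t:s0:c1,c2 alone is enough.
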