Message ID | 155991702981.15579.6007568669839441045.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Series | Mount, FS, Block and Keyrings notifications [ver #4] |
On 6/7/19 10:17 AM, David Howells wrote:
>
> Hi Al,
>
> Here's a set of patches to add a general variable-length notification queue
> concept and to add sources of events for:
>
>  (1) Mount topology events, such as mounting, unmounting, mount expiry,
>      mount reconfiguration.
>
>  (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
>      errors (not complete yet).
>
>  (3) Key/keyring events, such as creating, linking and removal of keys.
>
>  (4) General device events (single common queue) including:
>
>      - Block layer events, such as device errors
>
>      - USB subsystem events, such as device/bus attach/remove, device
>        reset, device errors.
>
> One of the reasons for this is so that we can remove the issue of processes
> having to repeatedly and regularly scan /proc/mounts, which has proven to
> be a system performance problem. To further aid this, the fsinfo() syscall
> on which this patch series depends, provides a way to access superblock and
> mount information in binary form without the need to parse /proc/mounts.
>
>
> LSM support is included, but controversial:
>
>  (1) The creds of the process that did the fput() that reduced the refcount
>      to zero are cached in the file struct.
>
>  (2) __fput() overrides the current creds with the creds from (1) whilst
>      doing the cleanup, thereby making sure that the creds seen by the
>      destruction notification generated by mntput() appears to come from
>      the last fputter.
>
>  (3) security_post_notification() is called for each queue that we might
>      want to post a notification into, thereby allowing the LSM to prevent
>      covert communications.
>
>  (?) Do I need to add security_set_watch(), say, to rule on whether a watch
>      may be set in the first place? I might need to add a variant per
>      watch-type.
>
>  (?) Do I really need to keep track of the process creds in which an
>      implicit object destruction happened? For example, imagine you create
>      an fd with fsopen()/fsmount(). It is marked to dissolve the mount it
>      refers to on close unless move_mount() clears that flag. Now, imagine
>      someone looking at that fd through procfs at the same time as you exit
>      due to an error. The LSM sees the destruction notification come from
>      the looker if they happen to do their fput() after yours.

I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.

For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.

I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.

>
>
> Design decisions:
>
>  (1) A misc chardev is used to create and open a ring buffer:
>
>         fd = open("/dev/watch_queue", O_RDWR);
>
>      which is then configured and mmap'd into userspace:
>
>         ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
>         ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
>         buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
>                    MAP_SHARED, fd, 0);
>
>      The fd cannot be read or written (though there is a facility to use
>      write to inject records for debugging) and userspace just pulls data
>      directly out of the buffer.
>
>  (2) The ring index pointers are stored inside the ring and are thus
>      accessible to userspace. Userspace should only update the tail
>      pointer and never the head pointer or risk breaking the buffer. The
>      kernel checks that the pointers appear valid before trying to use
>      them. A 'skip' record is maintained around the pointers.
>
>  (3) poll() can be used to wait for data to appear in the buffer.
>
>  (4) Records in the buffer are binary, typed and have a length so that they
>      can be of varying size.
>
>      This means that multiple heterogeneous sources can share a common
>      buffer. Tags may be specified when a watchpoint is created to help
>      distinguish the sources.
>
>  (5) The queue is reusable as there are 16 million types available, of
>      which I've used 4, so there is scope for others to be used.
>
>  (6) Records are filterable as types have up to 256 subtypes that can be
>      individually filtered. Other filtration is also available.
>
>  (7) Each time the buffer is opened, a new buffer is created - this means
>      that there's no interference between watchers.
>
>  (8) When recording a notification, the kernel will not sleep, but will
>      rather mark a queue as overrun if there's insufficient space, thereby
>      avoiding userspace causing the kernel to hang.
>
>  (9) The 'watchpoint' should be specific where possible, meaning that you
>      specify the object that you want to watch.
>
> (10) The buffer is created and then watchpoints are attached to it, using
>      one of:
>
>         keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
>         mount_notify(AT_FDCWD, "/", 0, fd, 0x02);
>         sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03);
>
>      where in all three cases, fd indicates the queue and the number after
>      is a tag between 0 and 255.
>
> (11) The watch must be removed if either the watch buffer is destroyed or
>      the watched object is destroyed.
>
>
> Things I want to avoid:
>
>  (1) Introducing features that make the core VFS dependent on the network
>      stack or networking namespaces (ie. usage of netlink).
>
>  (2) Dumping all this stuff into dmesg and having a daemon that sits there
>      parsing the output and distributing it as this then puts the
>      responsibility for security into userspace and makes handling
>      namespaces tricky. Further, dmesg might not exist or might be
>      inaccessible inside a container.
>
>  (3) Letting users see events they shouldn't be able to see.
>
>
> Further things that could be considered:
>
>  (1) Adding a keyctl call to allow a watch on a keyring to be extended to
>      "children" of that keyring, such that the watch is removed from the
>      child if it is unlinked from the keyring.
>
>  (2) Adding global superblock event queue.
>
>  (3) Propagating watches to child superblock over automounts.
>
>
> The patches can be found here also:
>
>     http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications
>
> Changes:
>
>  v4: Split the basic UAPI bits out into their own patch and then split the
>      LSM hooks out into an intermediate patch. Add LSM hooks for setting
>      watches.
>
>      Rename the *_notify() system calls to watch_*() for consistency.
>
>  v3: I've added a USB notification source and reformulated the block
>      notification source so that there's now a common watch list, for which
>      the system call is now device_notify().
>
>      I've assigned a pair of unused ioctl numbers in the 'W' series to the
>      ioctls added by this series.
>
>      I've also added a description of the kernel API to the documentation.
>
>  v2: I've fixed various issues raised by Jann Horn and GregKH and moved to
>      krefs for refcounting. I've added some security features to try and
>      give Casey Schaufler the LSM control he wants.
>
> David
> ---
> David Howells (13):
>       security: Override creds in __fput() with last fputter's creds
>       uapi: General notification ring definitions
>       security: Add hooks to rule on setting a watch
>       security: Add a hook for the point of notification insertion
>       General notification queue with user mmap()'able ring buffer
>       keys: Add a notification facility
>       vfs: Add a mount-notification facility
>       vfs: Add superblock notifications
>       fsinfo: Export superblock notification counter
>       Add a general, global device notification watch list
>       block: Add block layer notifications
>       usb: Add USB subsystem notifications
>       Add sample notification program
>
>
>  Documentation/ioctl/ioctl-number.txt   |    1
>  Documentation/security/keys/core.rst   |   58 ++
>  Documentation/watch_queue.rst          |  492 ++++++++++++++++++
>  arch/x86/entry/syscalls/syscall_32.tbl |    3
>  arch/x86/entry/syscalls/syscall_64.tbl |    3
>  block/Kconfig                          |    9
>  block/blk-core.c                       |   29 +
>  drivers/base/Kconfig                   |    9
>  drivers/base/Makefile                  |    1
>  drivers/base/watch.c                   |   89 +++
>  drivers/misc/Kconfig                   |   13
>  drivers/misc/Makefile                  |    1
>  drivers/misc/watch_queue.c             |  889 ++++++++++++++++++++++++++++++++
>  drivers/usb/core/Kconfig               |   10
>  drivers/usb/core/devio.c               |   55 ++
>  drivers/usb/core/hub.c                 |    3
>  fs/Kconfig                             |   21 +
>  fs/Makefile                            |    1
>  fs/file_table.c                        |   12
>  fs/fsinfo.c                            |   12
>  fs/mount.h                             |   33 +
>  fs/mount_notify.c                      |  187 +++++++
>  fs/namespace.c                         |    9
>  fs/super.c                             |  122 ++++
>  include/linux/blkdev.h                 |   15 +
>  include/linux/dcache.h                 |    1
>  include/linux/device.h                 |    7
>  include/linux/fs.h                     |   79 +++
>  include/linux/key.h                    |    4
>  include/linux/lsm_hooks.h              |   48 ++
>  include/linux/security.h               |   35 +
>  include/linux/syscalls.h               |    5
>  include/linux/usb.h                    |   19 +
>  include/linux/watch_queue.h            |   87 +++
>  include/uapi/linux/fsinfo.h            |   10
>  include/uapi/linux/keyctl.h            |    1
>  include/uapi/linux/watch_queue.h       |  213 ++++++++
>  kernel/sys_ni.c                        |    7
>  samples/Kconfig                        |    6
>  samples/Makefile                       |    1
>  samples/vfs/test-fsinfo.c              |   13
>  samples/watch_queue/Makefile           |    9
>  samples/watch_queue/watch_test.c       |  308 +++++++++++
>  security/keys/Kconfig                  |   10
>  security/keys/compat.c                 |    2
>  security/keys/gc.c                     |    5
>  security/keys/internal.h               |   30 +
>  security/keys/key.c                    |   37 +
>  security/keys/keyctl.c                 |   95 +++
>  security/keys/keyring.c                |   17 -
>  security/keys/request_key.c            |    4
>  security/security.c                    |   29 +
>  52 files changed, 3121 insertions(+), 38 deletions(-)
>  create mode 100644 Documentation/watch_queue.rst
>  create mode 100644 drivers/base/watch.c
>  create mode 100644 drivers/misc/watch_queue.c
>  create mode 100644 fs/mount_notify.c
>  create mode 100644 include/linux/watch_queue.h
>  create mode 100644 include/uapi/linux/watch_queue.h
>  create mode 100644 samples/watch_queue/Makefile
>  create mode 100644 samples/watch_queue/watch_test.c
>
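Editorial sketch, not part of the thread: design decisions (2), (4) and (8) in the quoted cover letter describe a ring whose head and tail indices are visible to the consumer, records that are typed and carry their own length, and an overrun flag that is set instead of blocking. The C model below illustrates that consumer loop in a single thread. Every name in it (`toy_record`, `toy_ring`, `toy_post`, `toy_drain`) and the 8-byte header layout are assumptions made for illustration; they are not the definitions from the patch set's include/uapi/linux/watch_queue.h, and a consumer of a real mmap'd ring would also need memory barriers around the index accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; NOT the layout used by the patch set. */
struct toy_record {
    uint32_t type;    /* event source */
    uint8_t  subtype; /* one of up to 256 subtypes per type */
    uint8_t  tag;     /* tag chosen when the watch was set (0..255) */
    uint16_t len;     /* record length in slots, header slot included */
};

#define RING_SLOTS 16u /* must be a power of two */

struct toy_ring {
    uint32_t head;    /* advanced by the producer only */
    uint32_t tail;    /* advanced by the consumer only */
    int overrun;      /* set when a record had to be dropped */
    struct toy_record slot[RING_SLOTS];
};

/* Producer side (stands in for the kernel): never waits for space. */
static int toy_post(struct toy_ring *r, uint32_t type, uint8_t subtype,
                    uint8_t tag, uint16_t len)
{
    uint32_t used = r->head - r->tail;

    assert(len >= 1);
    if (len > RING_SLOTS - used) {
        r->overrun = 1;   /* mark the queue and drop the record */
        return -1;
    }
    struct toy_record *rec = &r->slot[r->head & (RING_SLOTS - 1)];
    rec->type = type;
    rec->subtype = subtype;
    rec->tag = tag;
    rec->len = len;
    r->head += len;       /* payload slots, if any, follow the header */
    return 0;
}

/* Consumer side: copies record headers out and moves only the tail. */
static size_t toy_drain(struct toy_ring *r, struct toy_record *out, size_t max)
{
    size_t n = 0;
    uint32_t head = r->head; /* snapshot; the consumer never writes head */

    while (r->tail != head && n < max) {
        const struct toy_record *rec = &r->slot[r->tail & (RING_SLOTS - 1)];
        uint32_t avail = head - r->tail;

        if (rec->len == 0 || rec->len > avail)
            break;        /* implausible length: stop rather than overshoot */
        out[n++] = *rec;
        r->tail += rec->len;
    }
    return n;
}
```

In this model, posting two records and then draining returns both headers and leaves `tail == head`, while posting a record longer than the free space sets `overrun` and leaves both indices untouched.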
On 6/10/2019 8:21 AM, Stephen Smalley wrote:
> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>
> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>
> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.

I cannot agree. There is an explicit action by a subject that results
in information being delivered to an object. Just like a signal or a
UDP packet delivery. Smack handles this kind of thing just fine. The
internal mechanism that results in the access is irrelevant from
this viewpoint. I can understand how a mechanism like SELinux that
works on finer granularity might view it differently.
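Editorial sketch, not part of the thread: design decision (6) of the cover letter says each record type has up to 256 subtypes that can be filtered individually. One common way to express such a filter is a 256-bit bitmap with one bit per subtype, as in the C fragment below. The structure and function names (`toy_filter`, `toy_filter_allow`, `toy_filter_match`) are hypothetical, and the cover letter does not say that the series' filter passed to IOC_WATCH_QUEUE_SET_FILTER is laid out this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical filter for one record type: one bit per subtype. */
struct toy_filter {
    uint32_t type;            /* record type this filter applies to */
    uint32_t subtype_bits[8]; /* 8 * 32 = 256 subtype bits */
};

/* Start with every subtype of `type` filtered out. */
static void toy_filter_init(struct toy_filter *f, uint32_t type)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->type = type;
}

/* Let records with this subtype through. */
static void toy_filter_allow(struct toy_filter *f, uint8_t subtype)
{
    f->subtype_bits[subtype >> 5] |= UINT32_C(1) << (subtype & 31);
}

/* Return 1 if a record of (type, subtype) passes the filter, else 0. */
static int toy_filter_match(const struct toy_filter *f, uint32_t type,
                            uint8_t subtype)
{
    if (type != f->type)
        return 0;
    return (int)((f->subtype_bits[subtype >> 5] >> (subtype & 31)) & 1u);
}
```

Because a subtype is at most 255, `subtype >> 5` always indexes one of the eight words, so no separate bounds check is needed.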
On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
> > I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
> >
> > For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
> >
> > I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>
> I cannot agree. There is an explicit action by a subject that results
> in information being delivered to an object. Just like a signal or a
> UDP packet delivery. Smack handles this kind of thing just fine. The
> internal mechanism that results in the access is irrelevant from
> this viewpoint. I can understand how a mechanism like SELinux that
> works on finer granularity might view it differently.

I think you really need to give an example of a coherent policy that
needs this. As it stands, your analogy seems confusing. If someone
changes the system clock, we don't restrict who is allowed to be
notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
was changed based on who changed the clock. Similarly, if someone
tries to receive a packet on a socket, we check whether they have the
right to receive on that socket (from the endpoint in question) and,
if the sender is local, whether the sender can send to that socket.
We do not check whether the sender can send to the receiver.

The signal example is inapplicable. Sending a signal to a process is
an explicit action done to that process, and it can easily adversely
affect the target. Of course it requires permission.

--Andy
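Editorial sketch, not part of the thread: the disagreement so far is about where the access check sits. One position checks once, when the watch is set, whether the watcher may watch the object. The other also treats each delivery as the triggering subject writing to the watcher and checks that too. The toy C model below makes the difference concrete. It is not SELinux or Smack code; the labels, the rule tables and the function names are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented labels for the three parties involved in one notification. */
enum toy_label { LBL_WATCHER, LBL_OBJECT, LBL_TRIGGER, LBL_COUNT };

/* Invented rule tables; a real LSM policy is far richer than this. */
struct toy_policy {
    bool may_watch[LBL_COUNT][LBL_COUNT]; /* [watcher][object]  */
    bool may_write[LBL_COUNT][LBL_COUNT]; /* [trigger][watcher] */
};

static bool toy_label_ok(enum toy_label l)
{
    return (int)l >= 0 && (int)l < (int)LBL_COUNT;
}

/* Model 1: only the permission granted when the watch was set matters. */
static bool deliver_watch_time(const struct toy_policy *p,
                               enum toy_label watcher,
                               enum toy_label object,
                               enum toy_label trigger)
{
    assert(toy_label_ok(watcher) && toy_label_ok(object));
    (void)trigger; /* who caused the event plays no part in this model */
    return p->may_watch[watcher][object];
}

/* Model 2: each post is also checked as trigger writing to watcher. */
static bool deliver_post_time(const struct toy_policy *p,
                              enum toy_label watcher,
                              enum toy_label object,
                              enum toy_label trigger)
{
    assert(toy_label_ok(watcher) && toy_label_ok(object) &&
           toy_label_ok(trigger));
    return p->may_watch[watcher][object] && p->may_write[trigger][watcher];
}
```

With a policy that lets the watcher watch the object but does not let the trigger write to the watcher, the first model delivers the notification and the second drops it. That case is the one the participants are arguing over.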
On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>
>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>
>>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>> I cannot agree. There is an explicit action by a subject that results
>> in information being delivered to an object. Just like a signal or a
>> UDP packet delivery. Smack handles this kind of thing just fine. The
>> internal mechanism that results in the access is irrelevant from
>> this viewpoint. I can understand how a mechanism like SELinux that
>> works on finer granularity might view it differently.
> I think you really need to give an example of a coherent policy that
> needs this.

I keep telling you, and you keep ignoring what I say.

> As it stands, your analogy seems confusing.

It's pretty simple. I have given both the abstract
and examples.

> If someone
> changes the system clock, we don't restrict who is allowed to be
> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
> was changed based on who changed the clock.

That's right. The system clock is not an object that
unprivileged processes can modify. In fact, it is not
an object at all. If you care to look, you will see that
Smack does nothing with the clock.

> Similarly, if someone
> tries to receive a packet on a socket, we check whether they have the
> right to receive on that socket (from the endpoint in question) and,
> if the sender is local, whether the sender can send to that socket.
> We do not check whether the sender can send to the receiver.

Bzzzt! Smack sure does.

> The signal example is inapplicable.

From a modeling viewpoint the actions are identical.

> Sending a signal to a process is
> an explicit action done to that process, and it can easily adversely
> affect the target. Of course it requires permission.
>
> --Andy
> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
>>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>>
>>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>>
>>>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>>> I cannot agree. There is an explicit action by a subject that results
>>> in information being delivered to an object. Just like a signal or a
>>> UDP packet delivery. Smack handles this kind of thing just fine. The
>>> internal mechanism that results in the access is irrelevant from
>>> this viewpoint.
>>> I can understand how a mechanism like SELinux that
>>> works on finer granularity might view it differently.
>> I think you really need to give an example of a coherent policy that
>> needs this.
>
> I keep telling you, and you keep ignoring what I say.
>
>> As it stands, your analogy seems confusing.
>
> It's pretty simple. I have given both the abstract
> and examples.

You gave the /dev/null example, which is inapplicable to this patchset.

>
>> If someone
>> changes the system clock, we don't restrict who is allowed to be
>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>> was changed based on who changed the clock.
>
> That's right. The system clock is not an object that
> unprivileged processes can modify. In fact, it is not
> an object at all. If you care to look, you will see that
> Smack does nothing with the clock.

And this is different from the mount tree how?

>
>> Similarly, if someone
>> tries to receive a packet on a socket, we check whether they have the
>> right to receive on that socket (from the endpoint in question) and,
>> if the sender is local, whether the sender can send to that socket.
>> We do not check whether the sender can send to the receiver.
>
> Bzzzt! Smack sure does.

This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.

>
>> The signal example is inapplicable.
>
> From a modeling viewpoint the actions are identical.

This seems incorrect to me and, I think, to most everyone else reading this. Can you explain?

In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
On 6/10/2019 11:22 AM, Andy Lutomirski wrote:
>> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>
>>> On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
>>>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>>>
>>>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>>>
>>>>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>>>> I cannot agree. There is an explicit action by a subject that results
>>>> in information being delivered to an object. Just like a signal or a
>>>> UDP packet delivery. Smack handles this kind of thing just fine. The
>>>> internal mechanism that results in the access is irrelevant from
>>>> this viewpoint.
>>>> I can understand how a mechanism like SELinux that
>>>> works on finer granularity might view it differently.
>>> I think you really need to give an example of a coherent policy that
>>> needs this.
>> I keep telling you, and you keep ignoring what I say.
>>
>>> As it stands, your analogy seems confusing.
>> It's pretty simple. I have given both the abstract
>> and examples.
> You gave the /dev/null example, which is inapplicable to this patchset.

That addressed an explicit objection, and pointed out
an exception to a generality you had asserted, which was
not true. It's also a red herring regarding the current
discussion.

>>> If someone
>>> changes the system clock, we don't restrict who is allowed to be
>>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>>> was changed based on who changed the clock.
>> That's right. The system clock is not an object that
>> unprivileged processes can modify. In fact, it is not
>> an object at all. If you care to look, you will see that
>> Smack does nothing with the clock.
> And this is different from the mount tree how?

The mount tree can be modified by unprivileged users.
If nothing that unprivileged users can do to the mount
tree can trigger a notification you are correct, the
mount tree is very like the system clock. Is that the
case?

>>> Similarly, if someone
>>> tries to receive a packet on a socket, we check whether they have the
>>> right to receive on that socket (from the endpoint in question) and,
>>> if the sender is local, whether the sender can send to that socket.
>>> We do not check whether the sender can send to the receiver.
>> Bzzzt! Smack sure does.
> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.

Process A sends a packet to process B.
If A has access to TopSecret data and B is not
allowed to see TopSecret data, the delivery should
be prevented. Is that nonsensical?

>>> The signal example is inapplicable.
>> From a modeling viewpoint the actions are identical.
> This seems incorrect to me

What would be correct then? Some convoluted combination
of system entities that aren't owned or controlled by
any mechanism?

> and, I think, to most everyone else reading this.

That's quite the assertion. You may even be correct.

> Can you explain?
>
> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.

YES!!!!!!!!!!!!

And when a process triggers a notification it is the subject
and the watching process is the object!

Subject == active entity
Object == passive entity

Triggering an event is, like calling kill(), an action!
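
The delivery-time check argued for in this message can be sketched as a small label lookup in which the process that triggers the event is the subject and the watcher is the object. This is only an illustration: the label names, the `RULES` table and `may_deliver()` are hypothetical and do not reflect Smack's real rule format or any kernel interface.

```python
# Hypothetical label-based delivery check; NOT the Smack implementation.
# RULES maps (subject_label, object_label) to the access modes granted.
RULES = {
    ("Unclassified", "Unclassified"): {"r", "w"},
    ("TopSecret", "TopSecret"): {"r", "w"},
    ("Unclassified", "TopSecret"): {"w"},  # information may flow upward only
}


def may_deliver(trigger_label: str, watcher_label: str) -> bool:
    """Treat delivering an event as a write from the trigger to the watcher."""
    return "w" in RULES.get((trigger_label, watcher_label), set())


if __name__ == "__main__":
    # A TopSecret process triggering an event watched by an Unclassified one.
    print(may_deliver("TopSecret", "Unclassified"))
    # The reverse direction, which this toy rule table permits.
    print(may_deliver("Unclassified", "TopSecret"))
```

Under this toy table the check depends only on the pair (trigger, watcher); the object on which the watch was set plays no part, which is exactly the property being debated.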
On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>> I think you really need to give an example of a coherent policy that
> >>> needs this.
> >> I keep telling you, and you keep ignoring what I say.
> >>
> >>> As it stands, your analogy seems confusing.
> >> It's pretty simple. I have given both the abstract
> >> and examples.
> > You gave the /dev/null example, which is inapplicable to this patchset.
>
> That addressed an explicit objection, and pointed out
> an exception to a generality you had asserted, which was
> not true. It's also a red herring regarding the current
> discussion.

This argument is pointless.

Please humor me and just give me an example. If you think you have
already done so, feel free to repeat yourself. If you have no
example, then please just say so.

>
> >>> If someone
> >>> changes the system clock, we don't restrict who is allowed to be
> >>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
> >>> was changed based on who changed the clock.
> >> That's right. The system clock is not an object that
> >> unprivileged processes can modify. In fact, it is not
> >> an object at all. If you care to look, you will see that
> >> Smack does nothing with the clock.
> > And this is different from the mount tree how?
>
> The mount tree can be modified by unprivileged users.
> If nothing that unprivileged users can do to the mount
> tree can trigger a notification you are correct, the
> mount tree is very like the system clock. Is that the
> case?

The mount tree can't be modified by unprivileged users, unless a
privileged user very carefully configured it as such. An unprivileged
user can create a new userns and a new mount ns, but then they're
modifying a whole different mount tree.
>
> >>> Similarly, if someone
> >>> tries to receive a packet on a socket, we check whether they have the
> >>> right to receive on that socket (from the endpoint in question) and,
> >>> if the sender is local, whether the sender can send to that socket.
> >>> We do not check whether the sender can send to the receiver.
> >> Bzzzt! Smack sure does.
> > This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>
> Process A sends a packet to process B.
> If A has access to TopSecret data and B is not
> allowed to see TopSecret data, the delivery should
> be prevented. Is that nonsensical?

It makes sense. As I see it, the way that a sensible policy should do
this is by making sure that there are no sockets, pipes, etc that
Process A can write and that Process B can read.

If you really want to prevent a malicious process with TopSecret data
from sending it to a different process, then you can't use Linux on
x86 or ARM. Maybe that will be fixed some day, but you're going to
need to use an extremely tight sandbox to make this work.

>
> >>> The signal example is inapplicable.
> >> From a modeling viewpoint the actions are identical.
> > This seems incorrect to me
>
> What would be correct then? Some convoluted combination
> of system entities that aren't owned or controlled by
> any mechanism?
>

POSIX signal restrictions aren't there to prevent two processes from
communicating. They're there to prevent the sender from manipulating
or crashing the receiver without appropriate privilege.

> > and, I think, to most everyone else reading this.
>
> That's quite the assertion. You may even be correct.
>
> > Can you explain?
> >
> > In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
>
> YES!!!!!!!!!!!!
>
> And when a process triggers a notification it is the subject
> and the watching process is the object!
>
> Subject == active entity
> Object == passive entity
>
> Triggering an event is, like calling kill(), an action!
>

And here is where I disagree with your interpretation. Triggering an
event is a side effect of writing to the file. There are *two*
security relevant actions, not one, and they are:

First, the write:

Subject == the writer
Action == write
Object == the file

Then the event, which could be modeled in a couple of ways:

Subject == the file
Action == notify
Object == the recipient

or

Subject == the recipient
Action == watch
Object == the file

By conflating these two actions into one, you've made the modeling
very hard, and you start running into all these nasty questions like
"who actually closed this open file"
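
The two-action decomposition above (using the "Action == watch" variant) can be sketched as two independent permission checks, neither of which looks at the (writer, recipient) pair. The `Policy` class, the label strings and the `write`/`watch` permission names below are hypothetical and are not an existing LSM hook or SELinux permission.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy policy: allowed[(subject_label, object_label)] -> set of actions."""
    allowed: dict = field(default_factory=dict)

    def permits(self, subject: str, action: str, obj: str) -> bool:
        return action in self.allowed.get((subject, obj), set())


def event_reaches(policy: Policy, writer: str, recipient: str, file_label: str) -> bool:
    """An event flows only if the write and the watch are each permitted."""
    can_write = policy.permits(writer, "write", file_label)     # writer -> file
    can_watch = policy.permits(recipient, "watch", file_label)  # recipient -> file
    return can_write and can_watch


if __name__ == "__main__":
    policy = Policy({
        ("proc_a", "shared_file"): {"write"},
        ("proc_b", "shared_file"): {"watch"},
    })
    print(event_reaches(policy, "proc_a", "proc_b", "shared_file"))
```

Note what the sketch leaves out on purpose: nothing records who performed the triggering action at delivery time, so the "who actually closed this open file" question never arises.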
On 6/10/2019 12:53 PM, Andy Lutomirski wrote:
> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> I think you really need to give an example of a coherent policy that
>>>>> needs this.
>>>> I keep telling you, and you keep ignoring what I say.
>>>>
>>>>> As it stands, your analogy seems confusing.
>>>> It's pretty simple. I have given both the abstract
>>>> and examples.
>>> You gave the /dev/null example, which is inapplicable to this patchset.
>> That addressed an explicit objection, and pointed out
>> an exception to a generality you had asserted, which was
>> not true. It's also a red herring regarding the current
>> discussion.
> This argument is pointless.
>
> Please humor me and just give me an example. If you think you have
> already done so, feel free to repeat yourself. If you have no
> example, then please just say so.

To repeat the /dev/null example:

Process A and process B both open /dev/null.
A and B can write and read to their hearts' content
to/from /dev/null without ever once communicating.
The mutual accessibility of /dev/null in no way implies that
A and B can communicate. If A can set a watch on /dev/null,
and B triggers an event, there still has to be an access
check on the delivery of the event because delivering an event
to A is not an action on /dev/null, but on A.

>
>>>>> If someone
>>>>> changes the system clock, we don't restrict who is allowed to be
>>>>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>>>>> was changed based on who changed the clock.
>>>> That's right. The system clock is not an object that
>>>> unprivileged processes can modify. In fact, it is not
>>>> an object at all. If you care to look, you will see that
>>>> Smack does nothing with the clock.
>>> And this is different from the mount tree how?
>> The mount tree can be modified by unprivileged users.
>> If nothing that unprivileged users can do to the mount
>> tree can trigger a notification you are correct, the
>> mount tree is very like the system clock. Is that the
>> case?
> The mount tree can't be modified by unprivileged users, unless a
> privileged user very carefully configured it as such.

"Unless" means *is* possible. In which case access control is
required. I will admit to being less than expert on the extent
to which mounts can be done without privilege.

> An unprivileged
> user can create a new userns and a new mount ns, but then they're
> modifying a whole different mount tree.

Within those namespaces you can still have multiple users,
constrained by system access control policy.

>
>>>>> Similarly, if someone
>>>>> tries to receive a packet on a socket, we check whether they have the
>>>>> right to receive on that socket (from the endpoint in question) and,
>>>>> if the sender is local, whether the sender can send to that socket.
>>>>> We do not check whether the sender can send to the receiver.
>>>> Bzzzt! Smack sure does.
>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>> Process A sends a packet to process B.
>> If A has access to TopSecret data and B is not
>> allowed to see TopSecret data, the delivery should
>> be prevented. Is that nonsensical?
> It makes sense. As I see it, the way that a sensible policy should do
> this is by making sure that there are no sockets, pipes, etc that
> Process A can write and that Process B can read.

You can't explain UDP controls without doing the access check
on packet delivery. The sendmsg() succeeds when the packet leaves
the sender. There doesn't even have to be a socket bound to the
port. The only opportunity you have for control is on packet
delivery, which is the only point at which you can have the
information required.
> If you really want to prevent a malicious process with TopSecret data
> from sending it to a different process, then you can't use Linux on
> x86 or ARM. Maybe that will be fixed some day, but you're going to
> need to use an extremely tight sandbox to make this work.

I won't be commenting on that.

>
>>>>> The signal example is inapplicable.
>>>> From a modeling viewpoint the actions are identical.
>>> This seems incorrect to me
>> What would be correct then? Some convoluted combination
>> of system entities that aren't owned or controlled by
>> any mechanism?
>>
> POSIX signal restrictions aren't there to prevent two processes from
> communicating. They're there to prevent the sender from manipulating
> or crashing the receiver without appropriate privilege.

POSIX signal restrictions have a long history. In the P1003.1e/2c
debates both communication and manipulation were seriously
considered. I would say both are true.

>>> and, I think, to most everyone else reading this.
>> That's quite the assertion. You may even be correct.
>>
>>> Can you explain?
>>>
>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
>> YES!!!!!!!!!!!!
>>
>> And when a process triggers a notification it is the subject
>> and the watching process is the object!
>>
>> Subject == active entity
>> Object == passive entity
>>
>> Triggering an event is, like calling kill(), an action!
>>
> And here is where I disagree with your interpretation. Triggering an
> event is a side effect of writing to the file. There are *two*
> security relevant actions, not one, and they are:
>
> First, the write:
>
> Subject == the writer
> Action == write
> Object == the file
>
> Then the event, which could be modeled in a couple of ways:
>
> Subject == the file

Files are not subjects. They are passive entities.
> Action == notify
> Object == the recipient
>
> or
>
> Subject == the recipient
> Action == watch
> Object == the file
>
> By conflating these two actions into one, you've made the modeling
> very hard, and you start running into all these nasty questions like
> "who actually closed this open file"

No, I've made the code more difficult.
You can not call
the file a subject. That is just wrong. It's not a valid
model.
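
The UDP point made in this message can be observed from user space: on Linux, sending a datagram from an unconnected socket to a local port with nothing bound to it still reports success to the sender. A minimal sketch, assuming IPv4 loopback is available; the helper name `send_to_unbound_port()` is made up for illustration.

```python
import socket


def send_to_unbound_port(payload: bytes = b"ping") -> int:
    """Send a datagram to a loopback port that no socket is bound to."""
    # Reserve an ephemeral port, then release it so nothing is listening there.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # sendto() returns the number of bytes handed to the stack; the sender
        # is not told here whether any receiver exists.
        return sender.sendto(payload, ("127.0.0.1", port))
    finally:
        sender.close()


if __name__ == "__main__":
    print(send_to_unbound_port())
```

Because the send completes before any receiver is known, a check that needs the receiver's identity has nowhere to run except at delivery.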
Casey Schaufler <casey@schaufler-ca.com> wrote:

> Process A and process B both open /dev/null.
> A and B can write and read to their hearts' content
> to/from /dev/null without ever once communicating.
> The mutual accessibility of /dev/null in no way implies that
> A and B can communicate. If A can set a watch on /dev/null,
> and B triggers an event, there still has to be an access
> check on the delivery of the event because delivering an event
> to A is not an action on /dev/null, but on A.

If a process has the privilege, it appears that fanotify() allows that process
to see others accessing /dev/null (FAN_ACCESS, FAN_ACCESS_PERM). There don't
seem to be any LSM checks there either.

On the other hand, the privilege required is CAP_SYS_ADMIN.

> > The mount tree can't be modified by unprivileged users, unless a
> > privileged user very carefully configured it as such.
>
> "Unless" means *is* possible. In which case access control is
> required. I will admit to being less than expert on the extent
> to which mounts can be done without privilege.

Automounts in network filesystems, for example.

The initial mount of the network filesystem requires local privilege, but then
mountpoints are managed with remote privilege as granted by things like
Kerberos tickets. The local kernel has no control.

If you have CONFIG_AFS_FS enabled in your kernel, for example, and you install
the keyutils package (dnf, rpm, apt, etc.), then you should be able to do:

	mount -t afs none /mnt -o dyn
	ls /afs/grand.central.org/software/

for example. That will go through a couple of automount points. Assuming you
don't have a Kerberos login on those servers, however, you shouldn't be able to
add new mountpoints.

Someone watching the mount topology can see events when an automount is
enacted and when it expires, the latter being an event with the system as the
subject since the expiry is done on a timeout set by the kernel.

David
> On Jun 10, 2019, at 2:25 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> On 6/10/2019 12:53 PM, Andy Lutomirski wrote:
>> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>>> I think you really need to give an example of a coherent policy that
>>>>>> needs this.
>>>>> I keep telling you, and you keep ignoring what I say.
>>>>>
>>>>>> As it stands, your analogy seems confusing.
>>>>> It's pretty simple. I have given both the abstract
>>>>> and examples.
>>>> You gave the /dev/null example, which is inapplicable to this patchset.
>>> That addressed an explicit objection, and pointed out
>>> an exception to a generality you had asserted, which was
>>> not true. It's also a red herring regarding the current
>>> discussion.
>> This argument is pointless.
>>
>> Please humor me and just give me an example. If you think you have
>> already done so, feel free to repeat yourself. If you have no
>> example, then please just say so.
>
> To repeat the /dev/null example:
>
> Process A and process B both open /dev/null.
> A and B can write and read to their hearts' content
> to/from /dev/null without ever once communicating.
> The mutual accessibility of /dev/null in no way implies that
> A and B can communicate. If A can set a watch on /dev/null,
> and B triggers an event, there still has to be an access
> check on the delivery of the event because delivering an event
> to A is not an action on /dev/null, but on A.
>

As discussed, this is an irrelevant straw man. This patch series does not produce events when this happens. I’m looking for a relevant example, please.

>
>
>> An unprivileged
>> user can create a new userns and a new mount ns, but then they're
>> modifying a whole different mount tree.
>
> Within those namespaces you can still have multiple users,
> constrained by system access control policy.

And the one doing the mounting will be constrained by MAC and DAC policy, as always.
The namespace creator is, from the perspective of those processes, admin.

>
>>
>>>>>> Similarly, if someone
>>>>>> tries to receive a packet on a socket, we check whether they have the
>>>>>> right to receive on that socket (from the endpoint in question) and,
>>>>>> if the sender is local, whether the sender can send to that socket.
>>>>>> We do not check whether the sender can send to the receiver.
>>>>> Bzzzt! Smack sure does.
>>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>>> Process A sends a packet to process B.
>>> If A has access to TopSecret data and B is not
>>> allowed to see TopSecret data, the delivery should
>>> be prevented. Is that nonsensical?
>> It makes sense. As I see it, the way that a sensible policy should do
>> this is by making sure that there are no sockets, pipes, etc that
>> Process A can write and that Process B can read.
>
> You can't explain UDP controls without doing the access check
> on packet delivery. The sendmsg() succeeds when the packet leaves
> the sender. There doesn't even have to be a socket bound to the
> port. The only opportunity you have for control is on packet
> delivery, which is the only point at which you can have the
> information required.

Huh? You sendmsg() from an address to an address. My point is that, for most purposes, that’s all the information that’s needed.

>
>> If you really want to prevent a malicious process with TopSecret data
>> from sending it to a different process, then you can't use Linux on
>> x86 or ARM. Maybe that will be fixed some day, but you're going to
>> need to use an extremely tight sandbox to make this work.
>
> I won't be commenting on that.

Then why is preventing this an absolute requirement? It’s unattainable.

>
>>
>>>>>> The signal example is inapplicable.
>>>>> From a modeling viewpoint the actions are identical.
>>>> This seems incorrect to me
>>> What would be correct then? Some convoluted combination
>>> of system entities that aren't owned or controlled by
>>> any mechanism?
>>>
>> POSIX signal restrictions aren't there to prevent two processes from
>> communicating. They're there to prevent the sender from manipulating
>> or crashing the receiver without appropriate privilege.
>
> POSIX signal restrictions have a long history. In the P1003.1e/2c
> debates both communication and manipulation were seriously
> considered. I would say both are true.
>
>>>> and, I think, to most everyone else reading this.
>>> That's quite the assertion. You may even be correct.
>>>
>>>> Can you explain?
>>>>
>>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
>>> YES!!!!!!!!!!!!
>>>
>>> And when a process triggers a notification it is the subject
>>> and the watching process is the object!
>>>
>>> Subject == active entity
>>> Object == passive entity
>>>
>>> Triggering an event is, like calling kill(), an action!
>>>
>> And here is where I disagree with your interpretation. Triggering an
>> event is a side effect of writing to the file. There are *two*
>> security relevant actions, not one, and they are:
>>
>> First, the write:
>>
>> Subject == the writer
>> Action == write
>> Object == the file
>>
>> Then the event, which could be modeled in a couple of ways:
>>
>> Subject == the file
>
> Files are not subjects. They are passive entities.
>
>> Action == notify
>> Object == the recipient

Great. Then use the variant below.

>>
>> or
>>
>> Subject == the recipient
>> Action == watch
>> Object == the file
>>
>> By conflating these two actions into one, you've made the modeling
>> very hard, and you start running into all these nasty questions like
>> "who actually closed this open file"
>
> No, I've made the code more difficult.
> You can not call
> the file a subject. That is just wrong. It's not a valid
> model.

You’ve ignored the “Action == watch” variant. Do you care to comment?
On 6/10/19 8:13 PM, Andy Lutomirski wrote:
>
>
>> On Jun 10, 2019, at 2:25 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>
>>> On 6/10/2019 12:53 PM, Andy Lutomirski wrote:
>>> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>>>> I think you really need to give an example of a coherent policy that
>>>>>>> needs this.
>>>>>> I keep telling you, and you keep ignoring what I say.
>>>>>>
>>>>>>> As it stands, your analogy seems confusing.
>>>>>> It's pretty simple. I have given both the abstract
>>>>>> and examples.
>>>>> You gave the /dev/null example, which is inapplicable to this patchset.
>>>> That addressed an explicit objection, and pointed out
>>>> an exception to a generality you had asserted, which was
>>>> not true. It's also a red herring regarding the current
>>>> discussion.
>>> This argument is pointless.
>>>
>>> Please humor me and just give me an example. If you think you have
>>> already done so, feel free to repeat yourself. If you have no
>>> example, then please just say so.
>>
>> To repeat the /dev/null example:
>>
>> Process A and process B both open /dev/null.
>> A and B can write and read to their hearts' content
>> to/from /dev/null without ever once communicating.
>> The mutual accessibility of /dev/null in no way implies that
>> A and B can communicate. If A can set a watch on /dev/null,
>> and B triggers an event, there still has to be an access
>> check on the delivery of the event because delivering an event
>> to A is not an action on /dev/null, but on A.
>>
>
> As discussed, this is an irrelevant straw man. This patch series does not produce events when this happens. I’m looking for a relevant example, please.
>>
>>
>>> An unprivileged
>>> user can create a new userns and a new mount ns, but then they're
>>> modifying a whole different mount tree.
>>
>> Within those namespaces you can still have multiple users,
>> constrained by system access control policy.
>
> And the one doing the mounting will be constrained by MAC and DAC policy, as always. The namespace creator is, from the perspective of those processes, admin.
>
>>
>>>
>>>>>>> Similarly, if someone
>>>>>>> tries to receive a packet on a socket, we check whether they have the
>>>>>>> right to receive on that socket (from the endpoint in question) and,
>>>>>>> if the sender is local, whether the sender can send to that socket.
>>>>>>> We do not check whether the sender can send to the receiver.
>>>>>> Bzzzt! Smack sure does.
>>>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>>>> Process A sends a packet to process B.
>>>> If A has access to TopSecret data and B is not
>>>> allowed to see TopSecret data, the delivery should
>>>> be prevented. Is that nonsensical?
>>> It makes sense. As I see it, the way that a sensible policy should do
>>> this is by making sure that there are no sockets, pipes, etc that
>>> Process A can write and that Process B can read.
>>
>> You can't explain UDP controls without doing the access check
>> on packet delivery. The sendmsg() succeeds when the packet leaves
>> the sender. There doesn't even have to be a socket bound to the
>> port. The only opportunity you have for control is on packet
>> delivery, which is the only point at which you can have the
>> information required.
>
> Huh? You sendmsg() from an address to an address. My point is that, for most purposes, that’s all the information that’s needed.
>
>>
>>> If you really want to prevent a malicious process with TopSecret data
>>> from sending it to a different process, then you can't use Linux on
>>> x86 or ARM. Maybe that will be fixed some day, but you're going to
>>> need to use an extremely tight sandbox to make this work.
>>
>> I won't be commenting on that.
>
> Then why is preventing this an absolute requirement? It’s unattainable.
>
>>
>>>
>>>>>>> The signal example is inapplicable.
>>>>>> From a modeling viewpoint the actions are identical.
>>>>> This seems incorrect to me
>>>> What would be correct then? Some convoluted combination
>>>> of system entities that aren't owned or controlled by
>>>> any mechanism?
>>>>
>>> POSIX signal restrictions aren't there to prevent two processes from
>>> communicating. They're there to prevent the sender from manipulating
>>> or crashing the receiver without appropriate privilege.
>>
>> POSIX signal restrictions have a long history. In the P1003.1e/2c
>> debates both communication and manipulation were seriously
>> considered. I would say both are true.
>>
>>>>> and, I think, to most everyone else reading this.
>>>> That's quite the assertion. You may even be correct.
>>>>
>>>>> Can you explain?
>>>>>
>>>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
>>>> YES!!!!!!!!!!!!
>>>>
>>>> And when a process triggers a notification it is the subject
>>>> and the watching process is the object!
>>>>
>>>> Subject == active entity
>>>> Object == passive entity
>>>>
>>>> Triggering an event is, like calling kill(), an action!
>>>>
>>> And here is where I disagree with your interpretation. Triggering an
>>> event is a side effect of writing to the file. There are *two*
>>> security relevant actions, not one, and they are:
>>>
>>> First, the write:
>>>
>>> Subject == the writer
>>> Action == write
>>> Object == the file
>>>
>>> Then the event, which could be modeled in a couple of ways:
>>>
>>> Subject == the file
>>
>> Files are not subjects. They are passive entities.
>>
>>> Action == notify
>>> Object == the recipient
>
> Great. Then use the variant below.
>
>>>
>>> or
>>>
>>> Subject == the recipient
>>> Action == watch
>>> Object == the file
>>>
>>> By conflating these two actions into one, you've made the modeling
>>> very hard, and you start running into all these nasty questions like
>>> "who actually closed this open file"
>>
>> No, I've made the code more difficult.
>> You can not call
>> the file a subject. That is just wrong. It's not a valid
>> model.
>
> You’ve ignored the “Action == watch” variant. Do you care to comment?

While I agree with this model in general, I will note two caveats when trying to apply this to watches/notifications:

1) The object on which the notification was triggered and the object on which the watch was placed are not necessarily the same, and access to one might not imply access to the other.

2) If notifications can be triggered by read-like operations (as in fanotify, for example), then a "read" can be turned into a "write" flow through a notification.

Whether or not these caveats apply to the notifications in this series is not clear to me.
Stephen Smalley <sds@tycho.nsa.gov> wrote:

> 2) If notifications can be triggered by read-like operations (as in fanotify,
> for example), then a "read" can be turned into a "write" flow through a
> notification.

I don't think any of the things can be classed as "read-like" operations. At the moment, there are the following groups:

 (1) Addition of objects (eg. key_link, mount).

 (2) Modifications to things (eg. keyctl_write, remount).

 (3) Removal of objects (eg. key_unlink, unmount, fput+FMODE_NEED_UNMOUNT).

 (4) I/O or hardware errors (eg. USB device add/remove, EDQUOT, ENOSPC).

I have not currently defined any access events.

I've been looking at the possibility of having epoll generate events this way, but that's not as straightforward as I'd hoped and fanotify could potentially use it also, but in both those cases, the process is already getting the events currently by watching for them using synchronous waiting syscalls. Instead this would generate an event to say it had happened.

David