Message ID | 155991702981.15579.6007568669839441045.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | Mount, FS, Block and Keyrings notifications [ver #4] | expand |
On 6/7/19 10:17 AM, David Howells wrote: > > Hi Al, > > Here's a set of patches to add a general variable-length notification queue > concept and to add sources of events for: > > (1) Mount topology events, such as mounting, unmounting, mount expiry, > mount reconfiguration. > > (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O > errors (not complete yet). > > (3) Key/keyring events, such as creating, linking and removal of keys. > > (4) General device events (single common queue) including: > > - Block layer events, such as device errors > > - USB subsystem events, such as device/bus attach/remove, device > reset, device errors. > > One of the reasons for this is so that we can remove the issue of processes > having to repeatedly and regularly scan /proc/mounts, which has proven to > be a system performance problem. To further aid this, the fsinfo() syscall > on which this patch series depends, provides a way to access superblock and > mount information in binary form without the need to parse /proc/mounts. > > > LSM support is included, but controversial: > > (1) The creds of the process that did the fput() that reduced the refcount > to zero are cached in the file struct. > > (2) __fput() overrides the current creds with the creds from (1) whilst > doing the cleanup, thereby making sure that the creds seen by the > destruction notification generated by mntput() appears to come from > the last fputter. > > (3) security_post_notification() is called for each queue that we might > want to post a notification into, thereby allowing the LSM to prevent > covert communications. > > (?) Do I need to add security_set_watch(), say, to rule on whether a watch > may be set in the first place? I might need to add a variant per > watch-type. > > (?) Do I really need to keep track of the process creds in which an > implicit object destruction happened? For example, imagine you create > an fd with fsopen()/fsmount(). It is marked to dissolve the mount it > refers to on close unless move_mount() clears that flag. Now, imagine > someone looking at that fd through procfs at the same time as you exit > due to an error. The LSM sees the destruction notification come from > the looker if they happen to do their fput() after yours. I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. > > > Design decisions: > > (1) A misc chardev is used to create and open a ring buffer: > > fd = open("/dev/watch_queue", O_RDWR); > > which is then configured and mmap'd into userspace: > > ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); > ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); > buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, > MAP_SHARED, fd, 0); > > The fd cannot be read or written (though there is a facility to use > write to inject records for debugging) and userspace just pulls data > directly out of the buffer. > > (2) The ring index pointers are stored inside the ring and are thus > accessible to userspace. Userspace should only update the tail > pointer and never the head pointer or risk breaking the buffer. The > kernel checks that the pointers appear valid before trying to use > them. A 'skip' record is maintained around the pointers. > > (3) poll() can be used to wait for data to appear in the buffer. > > (4) Records in the buffer are binary, typed and have a length so that they > can be of varying size. > > This means that multiple heterogeneous sources can share a common > buffer. Tags may be specified when a watchpoint is created to help > distinguish the sources. > > (5) The queue is reusable as there are 16 million types available, of > which I've used 4, so there is scope for others to be used. > > (6) Records are filterable as types have up to 256 subtypes that can be > individually filtered. Other filtration is also available. > > (7) Each time the buffer is opened, a new buffer is created - this means > that there's no interference between watchers. > > (8) When recording a notification, the kernel will not sleep, but will > rather mark a queue as overrun if there's insufficient space, thereby > avoiding userspace causing the kernel to hang. > > (9) The 'watchpoint' should be specific where possible, meaning that you > specify the object that you want to watch. > > (10) The buffer is created and then watchpoints are attached to it, using > one of: > > keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); > mount_notify(AT_FDCWD, "/", 0, fd, 0x02); > sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03); > > where in all three cases, fd indicates the queue and the number after > is a tag between 0 and 255. > > (11) The watch must be removed if either the watch buffer is destroyed or > the watched object is destroyed. > > > Things I want to avoid: > > (1) Introducing features that make the core VFS dependent on the network > stack or networking namespaces (ie. usage of netlink). > > (2) Dumping all this stuff into dmesg and having a daemon that sits there > parsing the output and distributing it as this then puts the > responsibility for security into userspace and makes handling > namespaces tricky. Further, dmesg might not exist or might be > inaccessible inside a container. > > (3) Letting users see events they shouldn't be able to see. > > > Further things that could be considered: > > (1) Adding a keyctl call to allow a watch on a keyring to be extended to > "children" of that keyring, such that the watch is removed from the > child if it is unlinked from the keyring. > > (2) Adding global superblock event queue. > > (3) Propagating watches to child superblock over automounts. > > > The patches can be found here also: > > http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications > > Changes: > > v4: Split the basic UAPI bits out into their own patch and then split the > LSM hooks out into an intermediate patch. Add LSM hooks for setting > watches. > > Rename the *_notify() system calls to watch_*() for consistency. > > v3: I've added a USB notification source and reformulated the block > notification source so that there's now a common watch list, for which > the system call is now device_notify(). > > I've assigned a pair of unused ioctl numbers in the 'W' series to the > ioctls added by this series. > > I've also added a description of the kernel API to the documentation. > > v2: I've fixed various issues raised by Jann Horn and GregKH and moved to > krefs for refcounting. I've added some security features to try and > give Casey Schaufler the LSM control he wants. > > David > --- > David Howells (13): > security: Override creds in __fput() with last fputter's creds > uapi: General notification ring definitions > security: Add hooks to rule on setting a watch > security: Add a hook for the point of notification insertion > General notification queue with user mmap()'able ring buffer > keys: Add a notification facility > vfs: Add a mount-notification facility > vfs: Add superblock notifications > fsinfo: Export superblock notification counter > Add a general, global device notification watch list > block: Add block layer notifications > usb: Add USB subsystem notifications > Add sample notification program > > > Documentation/ioctl/ioctl-number.txt | 1 > Documentation/security/keys/core.rst | 58 ++ > Documentation/watch_queue.rst | 492 ++++++++++++++++++ > arch/x86/entry/syscalls/syscall_32.tbl | 3 > arch/x86/entry/syscalls/syscall_64.tbl | 3 > block/Kconfig | 9 > block/blk-core.c | 29 + > drivers/base/Kconfig | 9 > drivers/base/Makefile | 1 > drivers/base/watch.c | 89 +++ > drivers/misc/Kconfig | 13 > drivers/misc/Makefile | 1 > drivers/misc/watch_queue.c | 889 ++++++++++++++++++++++++++++++++ > drivers/usb/core/Kconfig | 10 > drivers/usb/core/devio.c | 55 ++ > drivers/usb/core/hub.c | 3 > fs/Kconfig | 21 + > fs/Makefile | 1 > fs/file_table.c | 12 > fs/fsinfo.c | 12 > fs/mount.h | 33 + > fs/mount_notify.c | 187 +++++++ > fs/namespace.c | 9 > fs/super.c | 122 ++++ > include/linux/blkdev.h | 15 + > include/linux/dcache.h | 1 > include/linux/device.h | 7 > include/linux/fs.h | 79 +++ > include/linux/key.h | 4 > include/linux/lsm_hooks.h | 48 ++ > include/linux/security.h | 35 + > include/linux/syscalls.h | 5 > include/linux/usb.h | 19 + > include/linux/watch_queue.h | 87 +++ > include/uapi/linux/fsinfo.h | 10 > include/uapi/linux/keyctl.h | 1 > include/uapi/linux/watch_queue.h | 213 ++++++++ > kernel/sys_ni.c | 7 > samples/Kconfig | 6 > samples/Makefile | 1 > samples/vfs/test-fsinfo.c | 13 > samples/watch_queue/Makefile | 9 > samples/watch_queue/watch_test.c | 308 +++++++++++ > security/keys/Kconfig | 10 > security/keys/compat.c | 2 > security/keys/gc.c | 5 > security/keys/internal.h | 30 + > security/keys/key.c | 37 + > security/keys/keyctl.c | 95 +++ > security/keys/keyring.c | 17 - > security/keys/request_key.c | 4 > security/security.c | 29 + > 52 files changed, 3121 insertions(+), 38 deletions(-) > create mode 100644 Documentation/watch_queue.rst > create mode 100644 drivers/base/watch.c > create mode 100644 drivers/misc/watch_queue.c > create mode 100644 fs/mount_notify.c > create mode 100644 include/linux/watch_queue.h > create mode 100644 include/uapi/linux/watch_queue.h > create mode 100644 samples/watch_queue/Makefile > create mode 100644 samples/watch_queue/watch_test.c >
On 6/10/2019 8:21 AM, Stephen Smalley wrote: > On 6/7/19 10:17 AM, David Howells wrote: >> >> Hi Al, >> >> Here's a set of patches to add a general variable-length notification queue >> concept and to add sources of events for: >> >> (1) Mount topology events, such as mounting, unmounting, mount expiry, >> mount reconfiguration. >> >> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O >> errors (not complete yet). >> >> (3) Key/keyring events, such as creating, linking and removal of keys. >> >> (4) General device events (single common queue) including: >> >> - Block layer events, such as device errors >> >> - USB subsystem events, such as device/bus attach/remove, device >> reset, device errors. >> >> One of the reasons for this is so that we can remove the issue of processes >> having to repeatedly and regularly scan /proc/mounts, which has proven to >> be a system performance problem. To further aid this, the fsinfo() syscall >> on which this patch series depends, provides a way to access superblock and >> mount information in binary form without the need to parse /proc/mounts. >> >> >> LSM support is included, but controversial: >> >> (1) The creds of the process that did the fput() that reduced the refcount >> to zero are cached in the file struct. >> >> (2) __fput() overrides the current creds with the creds from (1) whilst >> doing the cleanup, thereby making sure that the creds seen by the >> destruction notification generated by mntput() appears to come from >> the last fputter. >> >> (3) security_post_notification() is called for each queue that we might >> want to post a notification into, thereby allowing the LSM to prevent >> covert communications. >> >> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >> may be set in the first place? I might need to add a variant per >> watch-type. >> >> (?) Do I really need to keep track of the process creds in which an >> implicit object destruction happened? For example, imagine you create >> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >> refers to on close unless move_mount() clears that flag. Now, imagine >> someone looking at that fd through procfs at the same time as you exit >> due to an error. The LSM sees the destruction notification come from >> the looker if they happen to do their fput() after yours. > > I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. > > For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. > > I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. I cannot agree. There is an explicit action by a subject that results in information being delivered to an object. Just like a signal or a UDP packet delivery. Smack handles this kind of thing just fine. The internal mechanism that results in the access is irrelevant from this viewpoint. I can understand how a mechanism like SELinux that works on finer granularity might view it differently. > >> >> >> Design decisions: >> >> (1) A misc chardev is used to create and open a ring buffer: >> >> fd = open("/dev/watch_queue", O_RDWR); >> >> which is then configured and mmap'd into userspace: >> >> ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); >> ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); >> buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, >> MAP_SHARED, fd, 0); >> >> The fd cannot be read or written (though there is a facility to use >> write to inject records for debugging) and userspace just pulls data >> directly out of the buffer. >> >> (2) The ring index pointers are stored inside the ring and are thus >> accessible to userspace. Userspace should only update the tail >> pointer and never the head pointer or risk breaking the buffer. The >> kernel checks that the pointers appear valid before trying to use >> them. A 'skip' record is maintained around the pointers. >> >> (3) poll() can be used to wait for data to appear in the buffer. >> >> (4) Records in the buffer are binary, typed and have a length so that they >> can be of varying size. >> >> This means that multiple heterogeneous sources can share a common >> buffer. Tags may be specified when a watchpoint is created to help >> distinguish the sources. >> >> (5) The queue is reusable as there are 16 million types available, of >> which I've used 4, so there is scope for others to be used. >> >> (6) Records are filterable as types have up to 256 subtypes that can be >> individually filtered. Other filtration is also available. >> >> (7) Each time the buffer is opened, a new buffer is created - this means >> that there's no interference between watchers. >> >> (8) When recording a notification, the kernel will not sleep, but will >> rather mark a queue as overrun if there's insufficient space, thereby >> avoiding userspace causing the kernel to hang. >> >> (9) The 'watchpoint' should be specific where possible, meaning that you >> specify the object that you want to watch. >> >> (10) The buffer is created and then watchpoints are attached to it, using >> one of: >> >> keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); >> mount_notify(AT_FDCWD, "/", 0, fd, 0x02); >> sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03); >> >> where in all three cases, fd indicates the queue and the number after >> is a tag between 0 and 255. >> >> (11) The watch must be removed if either the watch buffer is destroyed or >> the watched object is destroyed. >> >> >> Things I want to avoid: >> >> (1) Introducing features that make the core VFS dependent on the network >> stack or networking namespaces (ie. usage of netlink). >> >> (2) Dumping all this stuff into dmesg and having a daemon that sits there >> parsing the output and distributing it as this then puts the >> responsibility for security into userspace and makes handling >> namespaces tricky. Further, dmesg might not exist or might be >> inaccessible inside a container. >> >> (3) Letting users see events they shouldn't be able to see. >> >> >> Further things that could be considered: >> >> (1) Adding a keyctl call to allow a watch on a keyring to be extended to >> "children" of that keyring, such that the watch is removed from the >> child if it is unlinked from the keyring. >> >> (2) Adding global superblock event queue. >> >> (3) Propagating watches to child superblock over automounts. >> >> >> The patches can be found here also: >> >> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications >> >> Changes: >> >> v4: Split the basic UAPI bits out into their own patch and then split the >> LSM hooks out into an intermediate patch. Add LSM hooks for setting >> watches. >> >> Rename the *_notify() system calls to watch_*() for consistency. >> >> v3: I've added a USB notification source and reformulated the block >> notification source so that there's now a common watch list, for which >> the system call is now device_notify(). >> >> I've assigned a pair of unused ioctl numbers in the 'W' series to the >> ioctls added by this series. >> >> I've also added a description of the kernel API to the documentation. >> >> v2: I've fixed various issues raised by Jann Horn and GregKH and moved to >> krefs for refcounting. I've added some security features to try and >> give Casey Schaufler the LSM control he wants. >> >> David >> --- >> David Howells (13): >> security: Override creds in __fput() with last fputter's creds >> uapi: General notification ring definitions >> security: Add hooks to rule on setting a watch >> security: Add a hook for the point of notification insertion >> General notification queue with user mmap()'able ring buffer >> keys: Add a notification facility >> vfs: Add a mount-notification facility >> vfs: Add superblock notifications >> fsinfo: Export superblock notification counter >> Add a general, global device notification watch list >> block: Add block layer notifications >> usb: Add USB subsystem notifications >> Add sample notification program >> >> >> Documentation/ioctl/ioctl-number.txt | 1 >> Documentation/security/keys/core.rst | 58 ++ >> Documentation/watch_queue.rst | 492 ++++++++++++++++++ >> arch/x86/entry/syscalls/syscall_32.tbl | 3 >> arch/x86/entry/syscalls/syscall_64.tbl | 3 >> block/Kconfig | 9 >> block/blk-core.c | 29 + >> drivers/base/Kconfig | 9 >> drivers/base/Makefile | 1 >> drivers/base/watch.c | 89 +++ >> drivers/misc/Kconfig | 13 >> drivers/misc/Makefile | 1 >> drivers/misc/watch_queue.c | 889 ++++++++++++++++++++++++++++++++ >> drivers/usb/core/Kconfig | 10 >> drivers/usb/core/devio.c | 55 ++ >> drivers/usb/core/hub.c | 3 >> fs/Kconfig | 21 + >> fs/Makefile | 1 >> fs/file_table.c | 12 >> fs/fsinfo.c | 12 >> fs/mount.h | 33 + >> fs/mount_notify.c | 187 +++++++ >> fs/namespace.c | 9 >> fs/super.c | 122 ++++ >> include/linux/blkdev.h | 15 + >> include/linux/dcache.h | 1 >> include/linux/device.h | 7 >> include/linux/fs.h | 79 +++ >> include/linux/key.h | 4 >> include/linux/lsm_hooks.h | 48 ++ >> include/linux/security.h | 35 + >> include/linux/syscalls.h | 5 >> include/linux/usb.h | 19 + >> include/linux/watch_queue.h | 87 +++ >> include/uapi/linux/fsinfo.h | 10 >> include/uapi/linux/keyctl.h | 1 >> include/uapi/linux/watch_queue.h | 213 ++++++++ >> kernel/sys_ni.c | 7 >> samples/Kconfig | 6 >> samples/Makefile | 1 >> samples/vfs/test-fsinfo.c | 13 >> samples/watch_queue/Makefile | 9 >> samples/watch_queue/watch_test.c | 308 +++++++++++ >> security/keys/Kconfig | 10 >> security/keys/compat.c | 2 >> security/keys/gc.c | 5 >> security/keys/internal.h | 30 + >> security/keys/key.c | 37 + >> security/keys/keyctl.c | 95 +++ >> security/keys/keyring.c | 17 - >> security/keys/request_key.c | 4 >> security/security.c | 29 + >> 52 files changed, 3121 insertions(+), 38 deletions(-) >> create mode 100644 Documentation/watch_queue.rst >> create mode 100644 drivers/base/watch.c >> create mode 100644 drivers/misc/watch_queue.c >> create mode 100644 fs/mount_notify.c >> create mode 100644 include/linux/watch_queue.h >> create mode 100644 include/uapi/linux/watch_queue.h >> create mode 100644 samples/watch_queue/Makefile >> create mode 100644 samples/watch_queue/watch_test.c >> >
On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 6/10/2019 8:21 AM, Stephen Smalley wrote: > > On 6/7/19 10:17 AM, David Howells wrote: > >> > >> Hi Al, > >> > >> Here's a set of patches to add a general variable-length notification queue > >> concept and to add sources of events for: > >> > >> (1) Mount topology events, such as mounting, unmounting, mount expiry, > >> mount reconfiguration. > >> > >> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O > >> errors (not complete yet). > >> > >> (3) Key/keyring events, such as creating, linking and removal of keys. > >> > >> (4) General device events (single common queue) including: > >> > >> - Block layer events, such as device errors > >> > >> - USB subsystem events, such as device/bus attach/remove, device > >> reset, device errors. > >> > >> One of the reasons for this is so that we can remove the issue of processes > >> having to repeatedly and regularly scan /proc/mounts, which has proven to > >> be a system performance problem. To further aid this, the fsinfo() syscall > >> on which this patch series depends, provides a way to access superblock and > >> mount information in binary form without the need to parse /proc/mounts. > >> > >> > >> LSM support is included, but controversial: > >> > >> (1) The creds of the process that did the fput() that reduced the refcount > >> to zero are cached in the file struct. > >> > >> (2) __fput() overrides the current creds with the creds from (1) whilst > >> doing the cleanup, thereby making sure that the creds seen by the > >> destruction notification generated by mntput() appears to come from > >> the last fputter. > >> > >> (3) security_post_notification() is called for each queue that we might > >> want to post a notification into, thereby allowing the LSM to prevent > >> covert communications. > >> > >> (?) Do I need to add security_set_watch(), say, to rule on whether a watch > >> may be set in the first place? I might need to add a variant per > >> watch-type. > >> > >> (?) Do I really need to keep track of the process creds in which an > >> implicit object destruction happened? For example, imagine you create > >> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it > >> refers to on close unless move_mount() clears that flag. Now, imagine > >> someone looking at that fd through procfs at the same time as you exit > >> due to an error. The LSM sees the destruction notification come from > >> the looker if they happen to do their fput() after yours. > > > > I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. > > > > For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. > > > > I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. > > I cannot agree. There is an explicit action by a subject that results > in information being delivered to an object. Just like a signal or a > UDP packet delivery. Smack handles this kind of thing just fine. The > internal mechanism that results in the access is irrelevant from > this viewpoint. I can understand how a mechanism like SELinux that > works on finer granularity might view it differently. I think you really need to give an example of a coherent policy that needs this. As it stands, your analogy seems confusing. If someone changes the system clock, we don't restrict who is allowed to be notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock was changed based on who changed the clock. Similarly, if someone tries to receive a packet on a socket, we check whether they have the right to receive on that socket (from the endpoint in question) and, if the sender is local, whether the sender can send to that socket. We do not check whether the sender can send to the receiver. The signal example is inapplicable. Sending a signal to a process is an explicit action done to that process, and it can easily adversely affect the target. Of course it requires permission. --Andy
On 6/10/2019 9:42 AM, Andy Lutomirski wrote: > On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote: >> On 6/10/2019 8:21 AM, Stephen Smalley wrote: >>> On 6/7/19 10:17 AM, David Howells wrote: >>>> Hi Al, >>>> >>>> Here's a set of patches to add a general variable-length notification queue >>>> concept and to add sources of events for: >>>> >>>> (1) Mount topology events, such as mounting, unmounting, mount expiry, >>>> mount reconfiguration. >>>> >>>> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O >>>> errors (not complete yet). >>>> >>>> (3) Key/keyring events, such as creating, linking and removal of keys. >>>> >>>> (4) General device events (single common queue) including: >>>> >>>> - Block layer events, such as device errors >>>> >>>> - USB subsystem events, such as device/bus attach/remove, device >>>> reset, device errors. >>>> >>>> One of the reasons for this is so that we can remove the issue of processes >>>> having to repeatedly and regularly scan /proc/mounts, which has proven to >>>> be a system performance problem. To further aid this, the fsinfo() syscall >>>> on which this patch series depends, provides a way to access superblock and >>>> mount information in binary form without the need to parse /proc/mounts. >>>> >>>> >>>> LSM support is included, but controversial: >>>> >>>> (1) The creds of the process that did the fput() that reduced the refcount >>>> to zero are cached in the file struct. >>>> >>>> (2) __fput() overrides the current creds with the creds from (1) whilst >>>> doing the cleanup, thereby making sure that the creds seen by the >>>> destruction notification generated by mntput() appears to come from >>>> the last fputter. >>>> >>>> (3) security_post_notification() is called for each queue that we might >>>> want to post a notification into, thereby allowing the LSM to prevent >>>> covert communications. >>>> >>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>>> may be set in the first place? I might need to add a variant per >>>> watch-type. >>>> >>>> (?) Do I really need to keep track of the process creds in which an >>>> implicit object destruction happened? For example, imagine you create >>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>>> refers to on close unless move_mount() clears that flag. Now, imagine >>>> someone looking at that fd through procfs at the same time as you exit >>>> due to an error. The LSM sees the destruction notification come from >>>> the looker if they happen to do their fput() after yours. >>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. >>> >>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. >>> >>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. >> I cannot agree. There is an explicit action by a subject that results >> in information being delivered to an object. Just like a signal or a >> UDP packet delivery. Smack handles this kind of thing just fine. The >> internal mechanism that results in the access is irrelevant from >> this viewpoint. I can understand how a mechanism like SELinux that >> works on finer granularity might view it differently. > I think you really need to give an example of a coherent policy that > needs this. I keep telling you, and you keep ignoring what I say. > As it stands, your analogy seems confusing. It's pretty simple. I have given both the abstract and examples. > If someone > changes the system clock, we don't restrict who is allowed to be > notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock > was changed based on who changed the clock. That's right. The system clock is not an object that unprivileged processes can modify. In fact, it is not an object at all. If you care to look, you will see that Smack does nothing with the clock. > Similarly, if someone > tries to receive a packet on a socket, we check whether they have the > right to receive on that socket (from the endpoint in question) and, > if the sender is local, whether the sender can send to that socket. > We do not check whether the sender can send to the receiver. Bzzzt! Smack sure does. > The signal example is inapplicable. From a modeling viewpoint the actions are identical. > Sending a signal to a process is > an explicit action done to that process, and it can easily adversely > affect the target. Of course it requires permission. > > --Andy
> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >> On 6/10/2019 9:42 AM, Andy Lutomirski wrote: >>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote: >>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote: >>>>> On 6/7/19 10:17 AM, David Howells wrote: >>>>> Hi Al, >>>>> >>>>> Here's a set of patches to add a general variable-length notification queue >>>>> concept and to add sources of events for: >>>>> >>>>> (1) Mount topology events, such as mounting, unmounting, mount expiry, >>>>> mount reconfiguration. >>>>> >>>>> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O >>>>> errors (not complete yet). >>>>> >>>>> (3) Key/keyring events, such as creating, linking and removal of keys. >>>>> >>>>> (4) General device events (single common queue) including: >>>>> >>>>> - Block layer events, such as device errors >>>>> >>>>> - USB subsystem events, such as device/bus attach/remove, device >>>>> reset, device errors. >>>>> >>>>> One of the reasons for this is so that we can remove the issue of processes >>>>> having to repeatedly and regularly scan /proc/mounts, which has proven to >>>>> be a system performance problem. To further aid this, the fsinfo() syscall >>>>> on which this patch series depends, provides a way to access superblock and >>>>> mount information in binary form without the need to parse /proc/mounts. >>>>> >>>>> >>>>> LSM support is included, but controversial: >>>>> >>>>> (1) The creds of the process that did the fput() that reduced the refcount >>>>> to zero are cached in the file struct. >>>>> >>>>> (2) __fput() overrides the current creds with the creds from (1) whilst >>>>> doing the cleanup, thereby making sure that the creds seen by the >>>>> destruction notification generated by mntput() appears to come from >>>>> the last fputter. >>>>> >>>>> (3) security_post_notification() is called for each queue that we might >>>>> want to post a notification into, thereby allowing the LSM to prevent >>>>> covert communications. >>>>> >>>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>>>> may be set in the first place? I might need to add a variant per >>>>> watch-type. >>>>> >>>>> (?) Do I really need to keep track of the process creds in which an >>>>> implicit object destruction happened? For example, imagine you create >>>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>>>> refers to on close unless move_mount() clears that flag. Now, imagine >>>>> someone looking at that fd through procfs at the same time as you exit >>>>> due to an error. The LSM sees the destruction notification come from >>>>> the looker if they happen to do their fput() after yours. >>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. >>>> >>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. >>>> >>>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. >>> I cannot agree. There is an explicit action by a subject that results >>> in information being delivered to an object. Just like a signal or a >>> UDP packet delivery. Smack handles this kind of thing just fine. The >>> internal mechanism that results in the access is irrelevant from >>> this viewpoint. I can understand how a mechanism like SELinux that >>> works on finer granularity might view it differently. >> I think you really need to give an example of a coherent policy that >> needs this. > > I keep telling you, and you keep ignoring what I say. > >> As it stands, your analogy seems confusing. > > It's pretty simple. I have given both the abstract > and examples. You gave the /dev/null example, which is inapplicable to this patchset. > >> If someone >> changes the system clock, we don't restrict who is allowed to be >> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock >> was changed based on who changed the clock. > > That's right. The system clock is not an object that > unprivileged processes can modify. In fact, it is not > an object at all. If you care to look, you will see that > Smack does nothing with the clock. And this is different from the mount tree how? > >> Similarly, if someone >> tries to receive a packet on a socket, we check whether they have the >> right to receive on that socket (from the endpoint in question) and, >> if the sender is local, whether the sender can send to that socket. >> We do not check whether the sender can send to the receiver. > > Bzzzt! Smack sure does. This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. > >> The signal example is inapplicable. > > From a modeling viewpoint the actions are identical. This seems incorrect to me and, I think, to most everyone else reading this. Can you explain? In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process.
On 6/10/2019 11:22 AM, Andy Lutomirski wrote: >> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote: >> >>> On 6/10/2019 9:42 AM, Andy Lutomirski wrote: >>>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote: >>>>>> On 6/7/19 10:17 AM, David Howells wrote: >>>>>> Hi Al, >>>>>> >>>>>> Here's a set of patches to add a general variable-length notification queue >>>>>> concept and to add sources of events for: >>>>>> >>>>>> (1) Mount topology events, such as mounting, unmounting, mount expiry, >>>>>> mount reconfiguration. >>>>>> >>>>>> (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O >>>>>> errors (not complete yet). >>>>>> >>>>>> (3) Key/keyring events, such as creating, linking and removal of keys. >>>>>> >>>>>> (4) General device events (single common queue) including: >>>>>> >>>>>> - Block layer events, such as device errors >>>>>> >>>>>> - USB subsystem events, such as device/bus attach/remove, device >>>>>> reset, device errors. >>>>>> >>>>>> One of the reasons for this is so that we can remove the issue of processes >>>>>> having to repeatedly and regularly scan /proc/mounts, which has proven to >>>>>> be a system performance problem. To further aid this, the fsinfo() syscall >>>>>> on which this patch series depends, provides a way to access superblock and >>>>>> mount information in binary form without the need to parse /proc/mounts. >>>>>> >>>>>> >>>>>> LSM support is included, but controversial: >>>>>> >>>>>> (1) The creds of the process that did the fput() that reduced the refcount >>>>>> to zero are cached in the file struct. >>>>>> >>>>>> (2) __fput() overrides the current creds with the creds from (1) whilst >>>>>> doing the cleanup, thereby making sure that the creds seen by the >>>>>> destruction notification generated by mntput() appears to come from >>>>>> the last fputter. >>>>>> >>>>>> (3) security_post_notification() is called for each queue that we might >>>>>> want to post a notification into, thereby allowing the LSM to prevent >>>>>> covert communications. >>>>>> >>>>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>>>>> may be set in the first place? I might need to add a variant per >>>>>> watch-type. >>>>>> >>>>>> (?) Do I really need to keep track of the process creds in which an >>>>>> implicit object destruction happened? For example, imagine you create >>>>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>>>>> refers to on close unless move_mount() clears that flag. Now, imagine >>>>>> someone looking at that fd through procfs at the same time as you exit >>>>>> due to an error. The LSM sees the destruction notification come from >>>>>> the looker if they happen to do their fput() after yours. >>>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea. >>>>> >>>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type. >>>>> >>>>> I wouldn't expect SELinux to implement security_post_notification() at all. I can't see how one can construct a meaningful, stable policy for it. I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it. >>>> I cannot agree. There is an explicit action by a subject that results >>>> in information being delivered to an object. Just like a signal or a >>>> UDP packet delivery. Smack handles this kind of thing just fine. The >>>> internal mechanism that results in the access is irrelevant from >>>> this viewpoint. I can understand how a mechanism like SELinux that >>>> works on finer granularity might view it differently. >>> I think you really need to give an example of a coherent policy that >>> needs this. >> I keep telling you, and you keep ignoring what I say. >> >>> As it stands, your analogy seems confusing. >> It's pretty simple. I have given both the abstract >> and examples. > You gave the /dev/null example, which is inapplicable to this patchset. That addressed an explicit objection, and pointed out an exception to a generality you had asserted, which was not true. It's also a red herring regarding the current discussion. >>> If someone >>> changes the system clock, we don't restrict who is allowed to be >>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock >>> was changed based on who changed the clock. >> That's right. The system clock is not an object that >> unprivileged processes can modify. In fact, it is not >> an object at all. If you care to look, you will see that >> Smack does nothing with the clock. > And this is different from the mount tree how? The mount tree can be modified by unprivileged users. If nothing that unprivileged users can do to the mount tree can trigger a notification you are correct, the mount tree is very like the system clock. Is that the case? >>> Similarly, if someone >>> tries to receive a packet on a socket, we check whether they have the >>> right to receive on that socket (from the endpoint in question) and, >>> if the sender is local, whether the sender can send to that socket. >>> We do not check whether the sender can send to the receiver. >> Bzzzt! Smack sure does. > This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. Process A sends a packet to process B. If A has access to TopSecret data and B is not allowed to see TopSecret data, the delivery should be prevented. Is that nonsensical? >>> The signal example is inapplicable. >> From a modeling viewpoint the actions are identical. > This seems incorrect to me What would be correct then? Some convoluted combination of system entities that aren't owned or controlled by any mechanism? > and, I think, to most everyone else reading this. That's quite the assertion. You may even be correct. > Can you explain? > > In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process. YES!!!!!!!!!!!! And when a process triggers a notification it is the subject and the watching process is the object! Subject == active entity Object == passive entity Triggering an event is, like calling kill(), an action!
On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > >>> I think you really need to give an example of a coherent policy that > >>> needs this. > >> I keep telling you, and you keep ignoring what I say. > >> > >>> As it stands, your analogy seems confusing. > >> It's pretty simple. I have given both the abstract > >> and examples. > > You gave the /dev/null example, which is inapplicable to this patchset. > > That addressed an explicit objection, and pointed out > an exception to a generality you had asserted, which was > not true. It's also a red herring regarding the current > discussion. This argument is pointless. Please humor me and just give me an example. If you think you have already done so, feel free to repeat yourself. If you have no example, then please just say so. > > >>> If someone > >>> changes the system clock, we don't restrict who is allowed to be > >>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock > >>> was changed based on who changed the clock. > >> That's right. The system clock is not an object that > >> unprivileged processes can modify. In fact, it is not > >> an object at all. If you care to look, you will see that > >> Smack does nothing with the clock. > > And this is different from the mount tree how? > > The mount tree can be modified by unprivileged users. > If nothing that unprivileged users can do to the mount > tree can trigger a notification you are correct, the > mount tree is very like the system clock. Is that the > case? The mount tree can't be modified by unprivileged users, unless a privileged user very carefully configured it as such. An unprivileged user can create a new userns and a new mount ns, but then they're modifying a whole different mount tree. > > >>> Similarly, if someone > >>> tries to receive a packet on a socket, we check whether they have the > >>> right to receive on that socket (from the endpoint in question) and, > >>> if the sender is local, whether the sender can send to that socket. > >>> We do not check whether the sender can send to the receiver. > >> Bzzzt! Smack sure does. > > This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. > > Process A sends a packet to process B. > If A has access to TopSecret data and B is not > allowed to see TopSecret data, the delivery should > be prevented. Is that nonsensical? It makes sense. As I see it, the way that a sensible policy should do this is by making sure that there are no sockets, pipes, etc that Process A can write and that Process B can read. If you really want to prevent a malicious process with TopSecret data from sending it to a different process, then you can't use Linux on x86 or ARM. Maybe that will be fixed some day, but you're going to need to use an extremely tight sandbox to make this work. > > >>> The signal example is inapplicable. > >> From a modeling viewpoint the actions are identical. > > This seems incorrect to me > > What would be correct then? Some convoluted combination > of system entities that aren't owned or controlled by > any mechanism? > POSIX signal restrictions aren't there to prevent two processes from communicating. They're there to prevent the sender from manipulating or crashing the receiver without appropriate privilege. > > and, I think, to most everyone else reading this. > > That's quite the assertion. You may even be correct. > > > Can you explain? > > > > In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process. > > YES!!!!!!!!!!!! > > And when a process triggers a notification it is the subject > and the watching process is the object! > > Subject == active entity > Object == passive entity > > Triggering an event is, like calling kill(), an action! > And here is where I disagree with your interpretation. Triggering an event is a side effect of writing to the file. There are *two* security relevant actions, not one, and they are: First, the write: Subject == the writer Action == write Object == the file Then the event, which could be modeled in a couple of ways: Subject == the file Action == notify Object == the recipient or Subject == the recipient Action == watch Object == the file By conflating these two actions into one, you've made the modeling very hard, and you start running into all these nasty questions like "who actually closed this open file"
On 6/10/2019 12:53 PM, Andy Lutomirski wrote: > On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>> I think you really need to give an example of a coherent policy that >>>>> needs this. >>>> I keep telling you, and you keep ignoring what I say. >>>> >>>>> As it stands, your analogy seems confusing. >>>> It's pretty simple. I have given both the abstract >>>> and examples. >>> You gave the /dev/null example, which is inapplicable to this patchset. >> That addressed an explicit objection, and pointed out >> an exception to a generality you had asserted, which was >> not true. It's also a red herring regarding the current >> discussion. > This argument is pointless. > > Please humor me and just give me an example. If you think you have > already done so, feel free to repeat yourself. If you have no > example, then please just say so. To repeat the /dev/null example: Process A and process B both open /dev/null. A and B can write and read to their hearts content to/from /dev/null without ever once communicating. The mutual accessibility of /dev/null in no way implies that A and B can communicate. If A can set a watch on /dev/null, and B triggers an event, there still has to be an access check on the delivery of the event because delivering an event to A is not an action on /dev/null, but on A. > >>>>> If someone >>>>> changes the system clock, we don't restrict who is allowed to be >>>>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock >>>>> was changed based on who changed the clock. >>>> That's right. The system clock is not an object that >>>> unprivileged processes can modify. In fact, it is not >>>> an object at all. If you care to look, you will see that >>>> Smack does nothing with the clock. >>> And this is different from the mount tree how? >> The mount tree can be modified by unprivileged users. >> If nothing that unprivileged users can do to the mount >> tree can trigger a notification you are correct, the >> mount tree is very like the system clock. Is that the >> case? > The mount tree can't be modified by unprivileged users, unless a > privileged user very carefully configured it as such. "Unless" means *is* possible. In which case access control is required. I will admit to being less then expert on the extent to which mounts can be done without privilege. > An unprivileged > user can create a new userns and a new mount ns, but then they're > modifying a whole different mount tree. Within those namespaces you can still have multiple users, constrained be system access control policy. > >>>>> Similarly, if someone >>>>> tries to receive a packet on a socket, we check whether they have the >>>>> right to receive on that socket (from the endpoint in question) and, >>>>> if the sender is local, whether the sender can send to that socket. >>>>> We do not check whether the sender can send to the receiver. >>>> Bzzzt! Smack sure does. >>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. >> Process A sends a packet to process B. >> If A has access to TopSecret data and B is not >> allowed to see TopSecret data, the delivery should >> be prevented. Is that nonsensical? > It makes sense. As I see it, the way that a sensible policy should do > this is by making sure that there are no sockets, pipes, etc that > Process A can write and that Process B can read. You can't explain UDP controls without doing the access check on packet delivery. The sendmsg() succeeds when the packet leaves the sender. There doesn't even have to be a socket bound to the port. The only opportunity you have for control is on packet delivery, which is the only point at which you can have the information required. > If you really want to prevent a malicious process with TopSecret data > from sending it to a different process, then you can't use Linux on > x86 or ARM. Maybe that will be fixed some day, but you're going to > need to use an extremely tight sandbox to make this work. I won't be commenting on that. > >>>>> The signal example is inapplicable. >>>> From a modeling viewpoint the actions are identical. >>> This seems incorrect to me >> What would be correct then? Some convoluted combination >> of system entities that aren't owned or controlled by >> any mechanism? >> > POSIX signal restrictions aren't there to prevent two processes from > communicating. They're there to prevent the sender from manipulating > or crashing the receiver without appropriate privilege. POSIX signal restrictions have a long history. In the P10031e/2c debates both communication and manipulation where seriously considered. I would say both are true. >>> and, I think, to most everyone else reading this. >> That's quite the assertion. You may even be correct. >> >>> Can you explain? >>> >>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process. >> YES!!!!!!!!!!!! >> >> And when a process triggers a notification it is the subject >> and the watching process is the object! >> >> Subject == active entity >> Object == passive entity >> >> Triggering an event is, like calling kill(), an action! >> > And here is where I disagree with your interpretation. Triggering an > event is a side effect of writing to the file. There are *two* > security relevant actions, not one, and they are: > > First, the write: > > Subject == the writer > Action == write > Object == the file > > Then the event, which could be modeled in a couple of ways: > > Subject == the file Files are not subjects. They are passive entities. > Action == notify > Object == the recipient > > or > > Subject == the recipient > Action == watch > Object == the file > > By conflating these two actions into one, you've made the modeling > very hard, and you start running into all these nasty questions like > "who actually closed this open file" No, I've made the code more difficult. You can not call the file a subject. That is just wrong. It's not a valid model.
Casey Schaufler <casey@schaufler-ca.com> wrote: > Process A and process B both open /dev/null. > A and B can write and read to their hearts content > to/from /dev/null without ever once communicating. > The mutual accessibility of /dev/null in no way implies that > A and B can communicate. If A can set a watch on /dev/null, > and B triggers an event, there still has to be an access > check on the delivery of the event because delivering an event > to A is not an action on /dev/null, but on A. If a process has the privilege, it appears that fanotify() allows that process to see others accessing /dev/null (FAN_ACCESS, FAN_ACCESS_PERM). There don't seem to be any LSM checks there either. On the other hand, the privilege required is CAP_SYS_ADMIN, > > The mount tree can't be modified by unprivileged users, unless a > > privileged user very carefully configured it as such. > > "Unless" means *is* possible. In which case access control is > required. I will admit to being less then expert on the extent > to which mounts can be done without privilege. Automounts in network filesystems, for example. The initial mount of the network filesystem requires local privilege, but then mountpoints are managed with remote privilege as granted by things like kerberos tickets. The local kernel has no control. If you have CONFIG_AFS_FS enabled in your kernel, for example, and you install the keyutils package (dnf, rpm, apt, etc.), then you should be able to do: mount -t afs none /mnt -o dyn ls /afs/grand.central.org/software/ for example. That will go through a couple of automount points. Assuming you don't have a kerberos login on those servers, however, you shouldn't be able to add new mountpoints. Someone watching the mount topology can see events when an automount is enacted and when it expires, the latter being an event with the system as the subject since the expiry is done on a timeout set by the kernel. David
> On Jun 10, 2019, at 2:25 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >> On 6/10/2019 12:53 PM, Andy Lutomirski wrote: >> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>>> I think you really need to give an example of a coherent policy that >>>>>> needs this. >>>>> I keep telling you, and you keep ignoring what I say. >>>>> >>>>>> As it stands, your analogy seems confusing. >>>>> It's pretty simple. I have given both the abstract >>>>> and examples. >>>> You gave the /dev/null example, which is inapplicable to this patchset. >>> That addressed an explicit objection, and pointed out >>> an exception to a generality you had asserted, which was >>> not true. It's also a red herring regarding the current >>> discussion. >> This argument is pointless. >> >> Please humor me and just give me an example. If you think you have >> already done so, feel free to repeat yourself. If you have no >> example, then please just say so. > > To repeat the /dev/null example: > > Process A and process B both open /dev/null. > A and B can write and read to their hearts content > to/from /dev/null without ever once communicating. > The mutual accessibility of /dev/null in no way implies that > A and B can communicate. If A can set a watch on /dev/null, > and B triggers an event, there still has to be an access > check on the delivery of the event because delivering an event > to A is not an action on /dev/null, but on A. > At discussed, this is an irrelevant straw man. This patch series does not produce events when this happens. I’m looking for a relevant example, please. > > >> An unprivileged >> user can create a new userns and a new mount ns, but then they're >> modifying a whole different mount tree. > > Within those namespaces you can still have multiple users, > constrained be system access control policy. And the one doing the mounting will be constrained by MAC and DAC policy, as always. The namespace creator is, from the perspective of those processes, admin. > >> >>>>>> Similarly, if someone >>>>>> tries to receive a packet on a socket, we check whether they have the >>>>>> right to receive on that socket (from the endpoint in question) and, >>>>>> if the sender is local, whether the sender can send to that socket. >>>>>> We do not check whether the sender can send to the receiver. >>>>> Bzzzt! Smack sure does. >>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. >>> Process A sends a packet to process B. >>> If A has access to TopSecret data and B is not >>> allowed to see TopSecret data, the delivery should >>> be prevented. Is that nonsensical? >> It makes sense. As I see it, the way that a sensible policy should do >> this is by making sure that there are no sockets, pipes, etc that >> Process A can write and that Process B can read. > > You can't explain UDP controls without doing the access check > on packet delivery. The sendmsg() succeeds when the packet leaves > the sender. There doesn't even have to be a socket bound to the > port. The only opportunity you have for control is on packet > delivery, which is the only point at which you can have the > information required. Huh? You sendmsg() from an address to an address. My point is that, for most purposes, that’s all the information that’s needed. > >> If you really want to prevent a malicious process with TopSecret data >> from sending it to a different process, then you can't use Linux on >> x86 or ARM. Maybe that will be fixed some day, but you're going to >> need to use an extremely tight sandbox to make this work. > > I won't be commenting on that. Then why is preventing this is an absolute requirement? It’s unattainable. > >> >>>>>> The signal example is inapplicable. >>>>> From a modeling viewpoint the actions are identical. >>>> This seems incorrect to me >>> What would be correct then? Some convoluted combination >>> of system entities that aren't owned or controlled by >>> any mechanism? >>> >> POSIX signal restrictions aren't there to prevent two processes from >> communicating. They're there to prevent the sender from manipulating >> or crashing the receiver without appropriate privilege. > > POSIX signal restrictions have a long history. In the P10031e/2c > debates both communication and manipulation where seriously > considered. I would say both are true. > >>>> and, I think, to most everyone else reading this. >>> That's quite the assertion. You may even be correct. >>> >>>> Can you explain? >>>> >>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process. >>> YES!!!!!!!!!!!! >>> >>> And when a process triggers a notification it is the subject >>> and the watching process is the object! >>> >>> Subject == active entity >>> Object == passive entity >>> >>> Triggering an event is, like calling kill(), an action! >>> >> And here is where I disagree with your interpretation. Triggering an >> event is a side effect of writing to the file. There are *two* >> security relevant actions, not one, and they are: >> >> First, the write: >> >> Subject == the writer >> Action == write >> Object == the file >> >> Then the event, which could be modeled in a couple of ways: >> >> Subject == the file > > Files are not subjects. They are passive entities. > >> Action == notify >> Object == the recipient Great. Then use the variant below. >> >> or >> >> Subject == the recipient >> Action == watch >> Object == the file >> >> By conflating these two actions into one, you've made the modeling >> very hard, and you start running into all these nasty questions like >> "who actually closed this open file" > > No, I've made the code more difficult. > You can not call > the file a subject. That is just wrong. It's not a valid > model. You’ve ignored the “Action == watch” variant. Do you care to comment?
To see if we can try and make progress on this, can we try and come at this from another angle: what do LSMs *actually* need to do this? And I grant that each LSM might require different things. -~- [A] There are a bunch of things available, some of which may be coincident, depending on the context: (1) The creds of the process that created a watch_queue (ie. opened /dev/watch_queue). (2) The creds of the process that set a watch (ie. called watch_sb, KEYCTL_NOTIFY, ...); (3) The creds of the process that tripped the event (which might be the system). (4) The security attributes of the object on which the watch was set (uid, gid, mode, labels). (5) The security attributes of the object on which the event was tripped. (6) The security attributes of all the objects between the object in (5) and the object in (4), assuming we work from (5) towards (4) if the two aren't coincident (WATCH_INFO_RECURSIVE). At the moment, when post_one_notification() wants to write a notification into a queue, it calls security_post_notification() to ask if it should be allowed to do so. This is passed (1) and (3) above plus the notification record. [B] There are a number of places I can usefully potentially add hooks: (a) The point at which a watch queue is created (ie. /dev/watch_queue is opened). (b) The point at which a watch is set (ie. watch_sb). (c) The point at which a notification is generated (ie. an automount point is tripped). (d) The point at which a notification is delivered (ie. we write the message into the queue). (e) All the points at which we walk over an object in a chain from (c) to find the watch on which we can effect (d) (eg. we walk rootwards from a mountpoint to find watches on a branch in the mount topology). [C] Problems that need to be resolved: (x) Do I need to put a security pointer in struct watch for the active LSM to fill in? If so, I presume this would need passing to security_post_notification(). (y) What checks should be done on object destruction after final put and what contexts need to be supplied? This one is made all the harder because the creds that are in force when close(), exit(), exec(), dup2(), etc. close a file descriptor might need to be propagated to deferred-fput, which must in turn propagate them to af_unix-cleanup, and thence back to deferred-fput and thence to implicit unmount (dissolve_on_fput()[*]). [*] Though it should be noted that if this happens, the subtree cannot be attached to the root of a namespace. Further, if several processes are sharing a file object, it's not predictable as to which process the final notification will come from. (z) Do intermediate objects, say in a mount topology notification, actually need to be checked against the watcher's creds? For a mount topology notification, would this require calling inode_permission() for each intervening directory? Doing that might be impractical as it would probably have to be done outside of of the RCU read lock and the filesystem ->permission() hooks might want to sleep (to touch disk or talk to a server). David
On 6/10/19 8:13 PM, Andy Lutomirski wrote: > > >> On Jun 10, 2019, at 2:25 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >> >>> On 6/10/2019 12:53 PM, Andy Lutomirski wrote: >>> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>>>> I think you really need to give an example of a coherent policy that >>>>>>> needs this. >>>>>> I keep telling you, and you keep ignoring what I say. >>>>>> >>>>>>> As it stands, your analogy seems confusing. >>>>>> It's pretty simple. I have given both the abstract >>>>>> and examples. >>>>> You gave the /dev/null example, which is inapplicable to this patchset. >>>> That addressed an explicit objection, and pointed out >>>> an exception to a generality you had asserted, which was >>>> not true. It's also a red herring regarding the current >>>> discussion. >>> This argument is pointless. >>> >>> Please humor me and just give me an example. If you think you have >>> already done so, feel free to repeat yourself. If you have no >>> example, then please just say so. >> >> To repeat the /dev/null example: >> >> Process A and process B both open /dev/null. >> A and B can write and read to their hearts content >> to/from /dev/null without ever once communicating. >> The mutual accessibility of /dev/null in no way implies that >> A and B can communicate. If A can set a watch on /dev/null, >> and B triggers an event, there still has to be an access >> check on the delivery of the event because delivering an event >> to A is not an action on /dev/null, but on A. >> > > At discussed, this is an irrelevant straw man. This patch series does not produce events when this happens. I’m looking for a relevant example, please. >> >> >>> An unprivileged >>> user can create a new userns and a new mount ns, but then they're >>> modifying a whole different mount tree. >> >> Within those namespaces you can still have multiple users, >> constrained be system access control policy. > > And the one doing the mounting will be constrained by MAC and DAC policy, as always. The namespace creator is, from the perspective of those processes, admin. > >> >>> >>>>>>> Similarly, if someone >>>>>>> tries to receive a packet on a socket, we check whether they have the >>>>>>> right to receive on that socket (from the endpoint in question) and, >>>>>>> if the sender is local, whether the sender can send to that socket. >>>>>>> We do not check whether the sender can send to the receiver. >>>>>> Bzzzt! Smack sure does. >>>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense. >>>> Process A sends a packet to process B. >>>> If A has access to TopSecret data and B is not >>>> allowed to see TopSecret data, the delivery should >>>> be prevented. Is that nonsensical? >>> It makes sense. As I see it, the way that a sensible policy should do >>> this is by making sure that there are no sockets, pipes, etc that >>> Process A can write and that Process B can read. >> >> You can't explain UDP controls without doing the access check >> on packet delivery. The sendmsg() succeeds when the packet leaves >> the sender. There doesn't even have to be a socket bound to the >> port. The only opportunity you have for control is on packet >> delivery, which is the only point at which you can have the >> information required. > > Huh? You sendmsg() from an address to an address. My point is that, for most purposes, that’s all the information that’s needed. > >> >>> If you really want to prevent a malicious process with TopSecret data >>> from sending it to a different process, then you can't use Linux on >>> x86 or ARM. Maybe that will be fixed some day, but you're going to >>> need to use an extremely tight sandbox to make this work. >> >> I won't be commenting on that. > > Then why is preventing this is an absolute requirement? It’s unattainable. > >> >>> >>>>>>> The signal example is inapplicable. >>>>>> From a modeling viewpoint the actions are identical. >>>>> This seems incorrect to me >>>> What would be correct then? Some convoluted combination >>>> of system entities that aren't owned or controlled by >>>> any mechanism? >>>> >>> POSIX signal restrictions aren't there to prevent two processes from >>> communicating. They're there to prevent the sender from manipulating >>> or crashing the receiver without appropriate privilege. >> >> POSIX signal restrictions have a long history. In the P10031e/2c >> debates both communication and manipulation where seriously >> considered. I would say both are true. >> >>>>> and, I think, to most everyone else reading this. >>>> That's quite the assertion. You may even be correct. >>>> >>>>> Can you explain? >>>>> >>>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file. When you send a signal to a process, the object is the target process. >>>> YES!!!!!!!!!!!! >>>> >>>> And when a process triggers a notification it is the subject >>>> and the watching process is the object! >>>> >>>> Subject == active entity >>>> Object == passive entity >>>> >>>> Triggering an event is, like calling kill(), an action! >>>> >>> And here is where I disagree with your interpretation. Triggering an >>> event is a side effect of writing to the file. There are *two* >>> security relevant actions, not one, and they are: >>> >>> First, the write: >>> >>> Subject == the writer >>> Action == write >>> Object == the file >>> >>> Then the event, which could be modeled in a couple of ways: >>> >>> Subject == the file >> >> Files are not subjects. They are passive entities. >> >>> Action == notify >>> Object == the recipient > > Great. Then use the variant below. > >>> >>> or >>> >>> Subject == the recipient >>> Action == watch >>> Object == the file >>> >>> By conflating these two actions into one, you've made the modeling >>> very hard, and you start running into all these nasty questions like >>> "who actually closed this open file" >> >> No, I've made the code more difficult. >> You can not call >> the file a subject. That is just wrong. It's not a valid >> model. > > You’ve ignored the “Action == watch” variant. Do you care to comment? While I agree with this model in general, I will note two caveats when trying to apply this to watches/notifications: 1) The object on which the notification was triggered and the object on which the watch was placed are not necessarily the same and access to one might not imply access to the other, 2) If notifications can be triggered by read-like operations (as in fanotify, for example), then a "read" can be turned into a "write" flow through a notification. Whether or not these caveats are applicable to the notifications in this series I am not clear.
On 6/11/19 10:21 AM, David Howells wrote: > To see if we can try and make progress on this, can we try and come at this > from another angle: what do LSMs *actually* need to do this? And I grant that > each LSM might require different things. I think part of the problem here is that the discussion is too abstract and not dealing with the specifics of the notifications in question. Those details matter. > > -~- > > [A] There are a bunch of things available, some of which may be coincident, > depending on the context: > > (1) The creds of the process that created a watch_queue (ie. opened > /dev/watch_queue). These will be used when checking permissions to open /dev/watch_queue. > (2) The creds of the process that set a watch (ie. called watch_sb, > KEYCTL_NOTIFY, ...); These will be used when checking permissions to set a watch. > (3) The creds of the process that tripped the event (which might be the > system). These will be used when checking permission to perform whatever operation tripped the event (if the event is triggered by a userspace operation). > (4) The security attributes of the object on which the watch was set (uid, > gid, mode, labels). These will be used when checking permissions to set the watch. > (5) The security attributes of the object on which the event was tripped. These will be used when checking permission to perform whatever operation tripped the event. > (6) The security attributes of all the objects between the object in (5) and > the object in (4), assuming we work from (5) towards (4) if the two > aren't coincident (WATCH_INFO_RECURSIVE). Does this apply to anything other than mount notifications? And for mount notifications, isn't the notification actually for a change to the mount namespace, not a change to any file? Hence, the real "object" for events that trigger mount notifications is the mount namespace, right? The watched path is just a way of identifying a subtree of the mount namespace for notifications - it isn't the real object being watched. > At the moment, when post_one_notification() wants to write a notification into > a queue, it calls security_post_notification() to ask if it should be allowed > to do so. This is passed (1) and (3) above plus the notification record. Not convinced we need this. > [B] There are a number of places I can usefully potentially add hooks: > > (a) The point at which a watch queue is created (ie. /dev/watch_queue is > opened). Already covered by existing hooks on opening files. > (b) The point at which a watch is set (ie. watch_sb). Yes, this requires a hook and corresponding check. > (c) The point at which a notification is generated (ie. an automount point is > tripped). Preferably covered by existing hooks on object accesses that would generate notifications. > (d) The point at which a notification is delivered (ie. we write the message > into the queue). Preferably not needed. > (e) All the points at which we walk over an object in a chain from (c) to > find the watch on which we can effect (d) (eg. we walk rootwards from a > mountpoint to find watches on a branch in the mount topology). Not necessary if the real object of mount notifications is the mount namespace and if we do not support recursive notifications on e.g. directories or some other object where the two can truly diverge. > [C] Problems that need to be resolved: > > (x) Do I need to put a security pointer in struct watch for the active LSM to > fill in? If so, I presume this would need passing to > security_post_notification(). I don't see why or where it would get used. > (y) What checks should be done on object destruction after final put and what > contexts need to be supplied? IMHO, no. > > This one is made all the harder because the creds that are in force when > close(), exit(), exec(), dup2(), etc. close a file descriptor might need > to be propagated to deferred-fput, which must in turn propagate them to > af_unix-cleanup, and thence back to deferred-fput and thence to implicit > unmount (dissolve_on_fput()[*]). > > [*] Though it should be noted that if this happens, the subtree cannot be > attached to the root of a namespace. > > Further, if several processes are sharing a file object, it's not > predictable as to which process the final notification will come from. > > (z) Do intermediate objects, say in a mount topology notification, actually > need to be checked against the watcher's creds? For a mount topology > notification, would this require calling inode_permission() for each > intervening directory? I don't think so, because the real object is the mount namespace, not the individual directories. > > Doing that might be impractical as it would probably have to be done > outside of of the RCU read lock and the filesystem ->permission() hooks > might want to sleep (to touch disk or talk to a server).
On 6/11/2019 7:21 AM, David Howells wrote: > To see if we can try and make progress on this, can we try and come at this > from another angle: what do LSMs *actually* need to do this? And I grant that > each LSM might require different things. > > -~- > > [A] There are a bunch of things available, some of which may be coincident, > depending on the context: > > (1) The creds of the process that created a watch_queue (ie. opened > /dev/watch_queue). Smack needs this for the filesystem access required to open /dev/watch_queue. > (2) The creds of the process that set a watch (ie. called watch_sb, > KEYCTL_NOTIFY, ...); Smack needs this to set a watch on any named object (file, key, ...). Smack needs this as the object information for event delivery. > (3) The creds of the process that tripped the event (which might be the > system). Smack needs this as the subject information for the event delivery. > (4) The security attributes of the object on which the watch was set (uid, > gid, mode, labels). Smack needs this to set a watch on the named object (file, key, ...). I am going to say that if you can't access an object you can't watch it. I think that read access is sufficient provided that no one else can see what watches I've created. > (5) The security attributes of the object on which the event was tripped. Smack does not need these for the event mechanism as that object isn't involved in the event delivery, except as may be required by (4). > (6) The security attributes of all the objects between the object in (5) and > the object in (4), assuming we work from (5) towards (4) if the two > aren't coincident (WATCH_INFO_RECURSIVE). Smack needs these only as they would apply to (4). > At the moment, when post_one_notification() wants to write a notification into > a queue, it calls security_post_notification() to ask if it should be allowed > to do so. This is passed (1) and (3) above plus the notification record. Is "current" (2)? Smack needs (2) for the event delivery access check. > [B] There are a number of places I can usefully potentially add hooks: > > (a) The point at which a watch queue is created (ie. /dev/watch_queue is > opened). Smack would not need a new check as the filesystem checks should suffice. > (b) The point at which a watch is set (ie. watch_sb). Smack would need a check to ensure the watcher has access in cases where what is being watched is an object. > (c) The point at which a notification is generated (ie. an automount point is > tripped). Smack does not require an explicit check here. > (d) The point at which a notification is delivered (ie. we write the message > into the queue). Smack requires a check here. (2) as the object and (3) as the subject. > (e) All the points at which we walk over an object in a chain from (c) to > find the watch on which we can effect (d) (eg. we walk rootwards from a > mountpoint to find watches on a branch in the mount topology). Smack does not require anything beyond existing checks. > [C] Problems that need to be resolved: > > (x) Do I need to put a security pointer in struct watch for the active LSM to > fill in? If so, I presume this would need passing to > security_post_notification(). Smack does not need this. > (y) What checks should be done on object destruction after final put and what > contexts need to be supplied? Classically there is no such thing as filesystem object deletion. By making it possible to set a watch on that you've inadvertently added a security relevant action to the security model. :o > This one is made all the harder because the creds that are in force when > close(), exit(), exec(), dup2(), etc. close a file descriptor might need > to be propagated to deferred-fput, which must in turn propagate them to > af_unix-cleanup, and thence back to deferred-fput and thence to implicit > unmount (dissolve_on_fput()[*]). > > [*] Though it should be noted that if this happens, the subtree cannot be > attached to the root of a namespace. > > Further, if several processes are sharing a file object, it's not > predictable as to which process the final notification will come from. How about we don't add filesystem object deletion to the security model? If what you really care about is removal of last filesystem reference (last unlink) this shouldn't be any harder than any other watched change (e.g. chmod) to the object. If, on the other hand, you really want to watch for the last inode reference and actual destruction of the thing I will suggest an argument based on the model itself. If all of the directory entries are unlinked the object no longer exists in the filesystem namespace. If all of the fd's are closed (by whatever mechanism, we don't really care) it no longer exists in any process space. At that point it has no names, and is no longer a named object. No process (subject) deletes it. They can't. They don't have access to it without a name. The deletion is a system event, like setting the clock. There is no policy that says when or even if the destruction occurs. If and when the system gets around to cleaning up what has become nothing more than system resources any outstanding watches can be triggered using the system credential. > (z) Do intermediate objects, say in a mount topology notification, actually > need to be checked against the watcher's creds? For a mount topology > notification, would this require calling inode_permission() for each > intervening directory? Smack would not require this. Paths are an illusion. > Doing that might be impractical as it would probably have to be done > outside of of the RCU read lock and the filesystem ->permission() hooks > might want to sleep (to touch disk or talk to a server). > > David
Stephen Smalley <sds@tycho.nsa.gov> wrote: > 2) If notifications can be triggered by read-like operations (as in fanotify, > for example), then a "read" can be turned into a "write" flow through a > notification. I don't think any of the things can be classed as "read-like" operations. At the moment, there are the following groups: (1) Addition of objects (eg. key_link, mount). (2) Modifications to things (eg. keyctl_write, remount). (3) Removal of objects (eg. key_unlink, unmount, fput+FMODE_NEED_UNMOUNT). (4) I/O or hardware errors (eg. USB device add/remove, EDQUOT, ENOSPC). I have not currently defined any access events. I've been looking at the possibility of having epoll generate events this way, but that's not as straightforward as I'd hoped and fanotify could potentially use it also, but in both those cases, the process is already getting the events currently by watching for them using synchronous waiting syscalls. Instead this would generate an event to say it had happened. David
Stephen Smalley <sds@tycho.nsa.gov> wrote: > > (6) The security attributes of all the objects between the object in (5) > > and the object in (4), assuming we work from (5) towards (4) if the > > two aren't coincident (WATCH_INFO_RECURSIVE). > > Does this apply to anything other than mount notifications? Not at the moment. I'm considering making it such that you can make a watch on a keyring get automatically propagated to keys that get added to the keyring (and removed upon unlink) - the idea being that there is no 'single parent path' concept for a keyring as there is for a directory. I'm also pondering the idea of making it possible to have superblock watches automatically propagated to superblocks created by automount points on the watched superblock. > And for mount notifications, isn't the notification actually for a change to > the mount namespace, not a change to any file? Yes. > Hence, the real "object" for events that trigger mount notifications is the > mount namespace, right? Um... arguably. Would that mean that that would need a label from somewhere? > The watched path is just a way of identifying a subtree of the mount > namespace for notifications - it isn't the real object being watched. I like that argument. Thanks, David
Casey Schaufler <casey@schaufler-ca.com> wrote: > > (4) The security attributes of the object on which the watch was set (uid, > > gid, mode, labels). > > Smack needs this to set a watch on the named object (file, key, ...). > I am going to say that if you can't access an object you can't watch it. So for the things I've so far defined: (*) Keys/keyrings require View permission, but it could be Read permission instead - though that may get disabled if the key type does not support KEYCTL_READ. (*) Mount/superblock watches - I've made these require execute permission on the specified directory. Could be read permission instead. (*) Device events (block/usb) don't require any permissions, but currently only deliver hardware notifications. > I think that read access is sufficient provided that no one else can > see what watches I've created. You can't find out what watches exist. > > At the moment, when post_one_notification() wants to write a notification > > into a queue, it calls security_post_notification() to ask if it should be > > allowed to do so. This is passed (1) and (3) above plus the notification > > record. > > Is "current" (2)? Smack needs (2) for the event delivery access check. (2) was current_cred() when watch_sb(), KEYCTL_NOTIFY, etc. was called, but isn't necessarily current_cred() at the point post_one_notification() is called. The latter is called at the point the event is generated and current_cred() is the creds of whatever thread that is called in (which may be a work_queue thread if it got deferred). At the moment, I'm storing the creds of whoever opened the queue (ie. (1)) and using that, but I could cache the creds of whoever created each watch (ie. (2)) and pass that to post_one_notification() instead. However, it should be noted that (1) is the creds of the buffer owner. > > (e) All the points at which we walk over an object in a chain from (c) to > > find the watch on which we can effect (d) (eg. we walk rootwards from a > > mountpoint to find watches on a branch in the mount topology). > > Smack does not require anything beyond existing checks. I'm glad to hear that, as this might be sufficiently impractical as to render it unusable with Smack. Calling i_op->permissions() a lot would suck. > > (y) What checks should be done on object destruction after final put and > > what contexts need to be supplied? > > Classically there is no such thing as filesystem object deletion. > By making it possible to set a watch on that you've inadvertently > added a security relevant action to the security model. :o That wasn't my original intention - I intended fsmount(2) to mount directly as mount(2) does, but Al had other ideas. Now fsmount(2) produces a file descriptor referring to a new mount object that can be mounted into by move_mount(2) before being spliced into the mount topology by move_mount(2). However, if the fd is closed before the last step, close() will destroy the mount subtree attached to it (kind of quasi-unmount). That wouldn't be a problem, except that the fd from fsmount(2) can be used anywhere an O_PATH fd can be used - including watch_mount(2), fchdir(2), ... Further, FMODE_NEED_UNMOUNT gets cleared if the mount is spliced in at least once. Okay, having tried it you don't get an unmount event (since there's no actual unmount), but you do get an event to say that your watch got deleted (if the directory on which the watch was placed got removed from the system). So... does the "your watch got deleted" message need checking? In my opinion, it shouldn't get discarded because then the watcher doesn't know their watch went away. David
On 6/12/2019 10:41 AM, David Howells wrote: > Casey Schaufler <casey@schaufler-ca.com> wrote: > >>> (4) The security attributes of the object on which the watch was set (uid, >>> gid, mode, labels). >> Smack needs this to set a watch on the named object (file, key, ...). >> I am going to say that if you can't access an object you can't watch it. > So for the things I've so far defined: > > (*) Keys/keyrings require View permission, but it could be Read permission > instead - though that may get disabled if the key type does not support > KEYCTL_READ. View is good enough. > (*) Mount/superblock watches - I've made these require execute permission on > the specified directory. Could be read permission instead. Execute is good enough. > (*) Device events (block/usb) don't require any permissions, but currently > only deliver hardware notifications. How do you specify what device you want to watch? Don't you have to access a /dev/something? "currently" makes me nervous. >> I think that read access is sufficient provided that no one else can >> see what watches I've created. > You can't find out what watches exist. Not even your own? >>> At the moment, when post_one_notification() wants to write a notification >>> into a queue, it calls security_post_notification() to ask if it should be >>> allowed to do so. This is passed (1) and (3) above plus the notification >>> record. >> Is "current" (2)? Smack needs (2) for the event delivery access check. > (2) was current_cred() when watch_sb(), KEYCTL_NOTIFY, etc. was called, but > isn't necessarily current_cred() at the point post_one_notification() is > called. The latter is called at the point the event is generated and > current_cred() is the creds of whatever thread that is called in (which may be > a work_queue thread if it got deferred). > > At the moment, I'm storing the creds of whoever opened the queue (ie. (1)) and > using that, but I could cache the creds of whoever created each watch > (ie. (2)) and pass that to post_one_notification() instead. > > However, it should be noted that (1) is the creds of the buffer owner. How are buffers shared? Who besides the buffer creator can use it? >>> (e) All the points at which we walk over an object in a chain from (c) to >>> find the watch on which we can effect (d) (eg. we walk rootwards from a >>> mountpoint to find watches on a branch in the mount topology). >> Smack does not require anything beyond existing checks. > I'm glad to hear that, as this might be sufficiently impractical as to render > it unusable with Smack. Calling i_op->permissions() a lot would suck. > >>> (y) What checks should be done on object destruction after final put and >>> what contexts need to be supplied? >> Classically there is no such thing as filesystem object deletion. >> By making it possible to set a watch on that you've inadvertently >> added a security relevant action to the security model. :o > That wasn't my original intention - I intended fsmount(2) to mount directly as > mount(2) does, but Al had other ideas. > > Now fsmount(2) produces a file descriptor referring to a new mount object that > can be mounted into by move_mount(2) before being spliced into the mount > topology by move_mount(2). However, if the fd is closed before the last step, > close() will destroy the mount subtree attached to it (kind of quasi-unmount). > > That wouldn't be a problem, except that the fd from fsmount(2) can be used > anywhere an O_PATH fd can be used - including watch_mount(2), fchdir(2), ... > Further, FMODE_NEED_UNMOUNT gets cleared if the mount is spliced in at least > once. > > Okay, having tried it you don't get an unmount event (since there's no actual > unmount), but you do get an event to say that your watch got deleted (if the > directory on which the watch was placed got removed from the system). > > So... does the "your watch got deleted" message need checking? In my > opinion, it shouldn't get discarded because then the watcher doesn't know > their watch went away. Can you glean information from the watch being deleted? I wouldn't think so, and it seems like a one-time event from the system, so I don't think an access check would be required. > > David
Casey Schaufler <casey@schaufler-ca.com> wrote: > > (*) Device events (block/usb) don't require any permissions, but currently > > only deliver hardware notifications. > > How do you specify what device you want to watch? It's a general queue. > Don't you have to access a /dev/something? Not at the moment. One problem is that there may not be a /dev/something for a device (take a bridge for example), and even if it does, the device driver doesn't necessarily have access to the path. The messages contain the device name string as appears in dmesg ("3-7" for a USB device, for example). I think it would be wise to limit the general queue to hardware events that either get triggered by someone physically mucking around with the hardware or device errors, such as bad sectors or links going up and down. > > You can't find out what watches exist. > > Not even your own? No. > > However, it should be noted that (1) is the creds of the buffer owner. > > How are buffers shared? Who besides the buffer creator can use it? When you open /dev/watch_queue, you get buffers private to that file object; a second open of the device, even by the same process, will get different buffers. The buffers are 'attached' to that file and are accessed by calling mmap() on the fd; shareability is governed by how shareable the fd and a mapping are shareable. > Can you glean information from the watch being deleted? > I wouldn't think so, and it seems like a one-time event > from the system, so I don't think an access check would > be required. As you say, it's a one-time message per watch. The object that got deleted would need to be recreated, rewatched and made available to both parties. David
On 6/12/19 7:43 AM, David Howells wrote: > Stephen Smalley <sds@tycho.nsa.gov> wrote: > >>> (6) The security attributes of all the objects between the object in (5) >>> and the object in (4), assuming we work from (5) towards (4) if the >>> two aren't coincident (WATCH_INFO_RECURSIVE). >> >> Does this apply to anything other than mount notifications? > > Not at the moment. I'm considering making it such that you can make a watch > on a keyring get automatically propagated to keys that get added to the > keyring (and removed upon unlink) - the idea being that there is no 'single > parent path' concept for a keyring as there is for a directory. > > I'm also pondering the idea of making it possible to have superblock watches > automatically propagated to superblocks created by automount points on the > watched superblock. So at the point where you can set a watch on one object O1 with label X, and receive notifications triggered upon operations on another object O2 with label Y, we have to consider whether the relationship between X and Y is controlled in any way (possibly transitively through a series of checks performed earlier) and whether we can reasonably infer that the authorization to watch X implies the ability to be notified of operations on Y. Not a problem for the mount notifications AFAICS because there is only truly one object - the mount namespace itself, and it is always our own. > >> And for mount notifications, isn't the notification actually for a change to >> the mount namespace, not a change to any file? > > Yes. > >> Hence, the real "object" for events that trigger mount notifications is the >> mount namespace, right? > > Um... arguably. Would that mean that that would need a label from somewhere? That takes us into the whole question of whether namespaces should be labeled (presumably from their creator), and the association between processes and their namespaces should be controlled. I think when we originally looked at them, it wasn't much of a concern since the only means of creating a new namespace and associating with it was via clone() and then later also via unshare(). /proc/pid/ns and setns() changed that picture, but still requires ptrace read mode access, which at least provides some control over entering namespaces created by others. I suspect that ultimately we want namespaces to be labeled and controlled but that isn't your problem to solve here. For your purposes, a process is setting a watch on its own namespace, and it already inherently can observe changes to that namespace without needing watches/notifications, and it can modify that namespace iff privileged wrt to the namespace. One might argue that no check is required at all for setting the watch, and at most, it would be a check between the process and its own label to match the checking when accessing /proc/self/mounts. That presumes that no additional information is conveyed via the notification that isn't already available from /proc/self/mounts, particularly any information specific to the process that triggered the notification. Does that make sense? > >> The watched path is just a way of identifying a subtree of the mount >> namespace for notifications - it isn't the real object being watched. > > I like that argument. > > Thanks, > David >