Message ID | 155981411940.17513.7137844619951358374.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | Mount, FS, Block and Keyrings notifications [ver #3] | expand |
On 6/6/19 5:41 AM, David Howells wrote: > > Hi Al, > > Here's a set of patches to add a general variable-length notification queue > concept and to add sources of events for: > > (1) Mount topology events, such as mounting, unmounting, mount expiry, > mount reconfiguration. > > (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O > errors (not complete yet). > > (3) Key/keyring events, such as creating, linking and removal of keys. > > (4) General device events (single common queue) including: > > - Block layer events, such as device errors > > - USB subsystem events, such as device/bus attach/remove, device > reset, device errors. > > One of the reasons for this is so that we can remove the issue of processes > having to repeatedly and regularly scan /proc/mounts, which has proven to > be a system performance problem. To further aid this, the fsinfo() syscall > on which this patch series depends, provides a way to access superblock and > mount information in binary form without the need to parse /proc/mounts. > > > LSM support is included, but controversial: > > (1) The creds of the process that did the fput() that reduced the refcount > to zero are cached in the file struct. > > (2) __fput() overrides the current creds with the creds from (1) whilst > doing the cleanup, thereby making sure that the creds seen by the > destruction notification generated by mntput() appears to come from > the last fputter. > > (3) security_post_notification() is called for each queue that we might > want to post a notification into, thereby allowing the LSM to prevent > covert communications. > > (?) Do I need to add security_set_watch(), say, to rule on whether a watch > may be set in the first place? I might need to add a variant per > watch-type. > > (?) Do I really need to keep track of the process creds in which an > implicit object destruction happened? For example, imagine you create > an fd with fsopen()/fsmount(). It is marked to dissolve the mount it > refers to on close unless move_mount() clears that flag. Now, imagine > someone looking at that fd through procfs at the same time as you exit > due to an error. The LSM sees the destruction notification come from > the looker if they happen to do their fput() after yours. I'm not in favor of this approach. Can we check permission to the object being watched when a watch is set (read-like access), make sure every access that can trigger a notification requires a (write-like) permission to the accessed object, and make sure there is some sane way to control the relationship between the accessed object and the watched object (write-like)? For cases where we have no object per se or at least no security structure/label associated with it, we may have to fall back to a coarse-grained "Can the watcher get this kind of notification in general?". > > > Design decisions: > > (1) A misc chardev is used to create and open a ring buffer: > > fd = open("/dev/watch_queue", O_RDWR); > > which is then configured and mmap'd into userspace: > > ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); > ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); > buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, > MAP_SHARED, fd, 0); > > The fd cannot be read or written (though there is a facility to use > write to inject records for debugging) and userspace just pulls data > directly out of the buffer. > > (2) The ring index pointers are stored inside the ring and are thus > accessible to userspace. Userspace should only update the tail > pointer and never the head pointer or risk breaking the buffer. The > kernel checks that the pointers appear valid before trying to use > them. A 'skip' record is maintained around the pointers. > > (3) poll() can be used to wait for data to appear in the buffer. > > (4) Records in the buffer are binary, typed and have a length so that they > can be of varying size. > > This means that multiple heterogeneous sources can share a common > buffer. Tags may be specified when a watchpoint is created to help > distinguish the sources. > > (5) The queue is reusable as there are 16 million types available, of > which I've used 4, so there is scope for others to be used. > > (6) Records are filterable as types have up to 256 subtypes that can be > individually filtered. Other filtration is also available. > > (7) Each time the buffer is opened, a new buffer is created - this means > that there's no interference between watchers. > > (8) When recording a notification, the kernel will not sleep, but will > rather mark a queue as overrun if there's insufficient space, thereby > avoiding userspace causing the kernel to hang. > > (9) The 'watchpoint' should be specific where possible, meaning that you > specify the object that you want to watch. > > (10) The buffer is created and then watchpoints are attached to it, using > one of: > > keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); > mount_notify(AT_FDCWD, "/", 0, fd, 0x02); > sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03); > > where in all three cases, fd indicates the queue and the number after > is a tag between 0 and 255. > > (11) The watch must be removed if either the watch buffer is destroyed or > the watched object is destroyed. > > > Things I want to avoid: > > (1) Introducing features that make the core VFS dependent on the network > stack or networking namespaces (ie. usage of netlink). > > (2) Dumping all this stuff into dmesg and having a daemon that sits there > parsing the output and distributing it as this then puts the > responsibility for security into userspace and makes handling > namespaces tricky. Further, dmesg might not exist or might be > inaccessible inside a container. > > (3) Letting users see events they shouldn't be able to see. > > > Further things that could be considered: > > (1) Adding a keyctl call to allow a watch on a keyring to be extended to > "children" of that keyring, such that the watch is removed from the > child if it is unlinked from the keyring. > > (2) Adding global superblock event queue. > > (3) Propagating watches to child superblock over automounts. > > > The patches can be found here also: > > http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications > > Changes: > > v3: I've added a USB notification source and reformulated the block > notification source so that there's now a common watch list, for which > the system call is now device_notify(). > > I've assigned a pair of unused ioctl numbers in the 'W' series to the > ioctls added by this series. > > I've also added a description of the kernel API to the documentation. > > v2: I've fixed various issues raised by Jann Horn and GregKH and moved to > krefs for refcounting. I've added some security features to try and > give Casey Schaufler the LSM control he wants. > > David > --- > David Howells (10): > security: Override creds in __fput() with last fputter's creds > General notification queue with user mmap()'able ring buffer > keys: Add a notification facility > vfs: Add a mount-notification facility > vfs: Add superblock notifications > fsinfo: Export superblock notification counter > Add a general, global device notification watch list > block: Add block layer notifications > usb: Add USB subsystem notifications > Add sample notification program > > > Documentation/ioctl/ioctl-number.txt | 1 > Documentation/security/keys/core.rst | 58 ++ > Documentation/watch_queue.rst | 492 ++++++++++++++++++ > arch/x86/entry/syscalls/syscall_32.tbl | 3 > arch/x86/entry/syscalls/syscall_64.tbl | 3 > block/Kconfig | 9 > block/blk-core.c | 29 + > drivers/base/Kconfig | 9 > drivers/base/Makefile | 1 > drivers/base/notify.c | 82 +++ > drivers/misc/Kconfig | 13 > drivers/misc/Makefile | 1 > drivers/misc/watch_queue.c | 889 ++++++++++++++++++++++++++++++++ > drivers/usb/core/Kconfig | 10 > drivers/usb/core/devio.c | 55 ++ > drivers/usb/core/hub.c | 3 > fs/Kconfig | 21 + > fs/Makefile | 1 > fs/file_table.c | 12 > fs/fsinfo.c | 12 > fs/mount.h | 33 + > fs/mount_notify.c | 180 ++++++ > fs/namespace.c | 9 > fs/super.c | 116 ++++ > include/linux/blkdev.h | 15 + > include/linux/dcache.h | 1 > include/linux/device.h | 7 > include/linux/fs.h | 79 +++ > include/linux/key.h | 4 > include/linux/lsm_hooks.h | 15 + > include/linux/security.h | 14 + > include/linux/syscalls.h | 5 > include/linux/usb.h | 19 + > include/linux/watch_queue.h | 87 +++ > include/uapi/linux/fsinfo.h | 10 > include/uapi/linux/keyctl.h | 1 > include/uapi/linux/watch_queue.h | 213 ++++++++ > kernel/sys_ni.c | 7 > mm/interval_tree.c | 2 > mm/memory.c | 1 > samples/Kconfig | 6 > samples/Makefile | 1 > samples/vfs/test-fsinfo.c | 13 > samples/watch_queue/Makefile | 9 > samples/watch_queue/watch_test.c | 310 +++++++++++ > security/keys/Kconfig | 10 > security/keys/compat.c | 2 > security/keys/gc.c | 5 > security/keys/internal.h | 30 + > security/keys/key.c | 37 + > security/keys/keyctl.c | 88 +++ > security/keys/keyring.c | 17 - > security/keys/request_key.c | 4 > security/security.c | 9 > 54 files changed, 3025 insertions(+), 38 deletions(-) > create mode 100644 Documentation/watch_queue.rst > create mode 100644 drivers/base/notify.c > create mode 100644 drivers/misc/watch_queue.c > create mode 100644 fs/mount_notify.c > create mode 100644 include/linux/watch_queue.h > create mode 100644 include/uapi/linux/watch_queue.h > create mode 100644 samples/watch_queue/Makefile > create mode 100644 samples/watch_queue/watch_test.c >
Stephen Smalley <sds@tycho.nsa.gov> wrote: This might be easier to discuss if you can reply to: https://lore.kernel.org/lkml/5393.1559768763@warthog.procyon.org.uk/ which is on the ver #2 posting of this patchset. > > LSM support is included, but controversial: > > > > (1) The creds of the process that did the fput() that reduced the refcount > > to zero are cached in the file struct. > > > > (2) __fput() overrides the current creds with the creds from (1) whilst > > doing the cleanup, thereby making sure that the creds seen by the > > destruction notification generated by mntput() appears to come from > > the last fputter. > > > > (3) security_post_notification() is called for each queue that we might > > want to post a notification into, thereby allowing the LSM to prevent > > covert communications. > > > > (?) Do I need to add security_set_watch(), say, to rule on whether a watch > > may be set in the first place? I might need to add a variant per > > watch-type. > > > > (?) Do I really need to keep track of the process creds in which an > > implicit object destruction happened? For example, imagine you create > > an fd with fsopen()/fsmount(). It is marked to dissolve the mount it > > refers to on close unless move_mount() clears that flag. Now, imagine > > someone looking at that fd through procfs at the same time as you exit > > due to an error. The LSM sees the destruction notification come from > > the looker if they happen to do their fput() after yours. > > > I'm not in favor of this approach. Which bit? The last point? Keeping track of the process creds after an implicit object destruction. > Can we check permission to the object being watched when a watch is set > (read-like access), Yes, and I need to do that. I think it's likely to require an extra hook for each entry point added because the objects are different: int security_watch_key(struct watch *watch, struct key *key); int security_watch_sb(struct watch *watch, struct path *path); int security_watch_mount(struct watch *watch, struct path *path); int security_watch_devices(struct watch *watch); > make sure every access that can trigger a notification requires a > (write-like) permission to the accessed object, "write-like permssion" for whom? The triggerer or the watcher? There are various 'classes' of events: (1) System events (eg. hardware I/O errors, automount points expiring). (2) Direct events (eg. automounts, manual mounts, EDQUOT, key linkage). (3) Indirect events (eg. exit/close doing the last fput and causing an unmount). Class (1) are uncaused by a process, so I use init_cred for them. One could argue that the automount point expiry should perhaps take place under the creds of whoever triggered it in the first place, but we need to be careful about long-term cred pinning. Class (2) the causing process must've had permission to cause them - otherwise we wouldn't have got the event. Class (3) is interesting since it's currently entirely cleanup events and the process may have the right to do them (close, dup2, exit, but also execve) whether the LSM thinks it should be able to cause the object to be destroyed or not. It gets more complicated than that, though: multiple processes with different security attributes can all have fds pointing to a common file object - and the last one to close carries the can as far as the LSM is concerned. And yet more complicated when you throw in unix sockets with partially passed fds still in their queues. That's what patch 01 is designed to try and cope with. > and make sure there is some sane way to control the relationship between the > accessed object and the watched object (write-like)? This is the trick. Keys and superblocks have object labels of their own and don't - for now - propagate their watches. With these, the watch is on the object you initially assign it to and it goes no further than that. mount_notify() is the interesting case since we want to be able to detect mount topology change events from within the vfs subtree rooted at the watched directory without having to manually put a watch on every directory in that subtree - or even just every mount object. Or, maybe, that's what I'll have to do: make it mount_notify() can only apply to the subtree within its superblock, and the caller must call mount_notify() for every mount object it wants to monitor. That would at least ensure that the caller can, at that point, reach all those mount points. > For cases where we have no object per se or at least no security > structure/label associated with it, we may have to fall back to a > coarse-grained "Can the watcher get this kind of notification in general?". Agreed - and we should probably have that anyway. David
On 6/6/19 9:16 AM, David Howells wrote: > Stephen Smalley <sds@tycho.nsa.gov> wrote: > > This might be easier to discuss if you can reply to: > > https://lore.kernel.org/lkml/5393.1559768763@warthog.procyon.org.uk/ > > which is on the ver #2 posting of this patchset. Sorry for being late to the party. Not sure whether you're asking me to respond only there or both there and here to your comments below. I'll start here but can revisit if it's a problem. > >>> LSM support is included, but controversial: >>> >>> (1) The creds of the process that did the fput() that reduced the refcount >>> to zero are cached in the file struct. >>> >>> (2) __fput() overrides the current creds with the creds from (1) whilst >>> doing the cleanup, thereby making sure that the creds seen by the >>> destruction notification generated by mntput() appears to come from >>> the last fputter. >>> >>> (3) security_post_notification() is called for each queue that we might >>> want to post a notification into, thereby allowing the LSM to prevent >>> covert communications. >>> >>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>> may be set in the first place? I might need to add a variant per >>> watch-type. >>> >>> (?) Do I really need to keep track of the process creds in which an >>> implicit object destruction happened? For example, imagine you create >>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>> refers to on close unless move_mount() clears that flag. Now, imagine >>> someone looking at that fd through procfs at the same time as you exit >>> due to an error. The LSM sees the destruction notification come from >>> the looker if they happen to do their fput() after yours. >> >> >> I'm not in favor of this approach. > > Which bit? The last point? Keeping track of the process creds after an > implicit object destruction. (1), (2), (3), and the last point. >> Can we check permission to the object being watched when a watch is set >> (read-like access), > > Yes, and I need to do that. I think it's likely to require an extra hook for > each entry point added because the objects are different: > > int security_watch_key(struct watch *watch, struct key *key); > int security_watch_sb(struct watch *watch, struct path *path); > int security_watch_mount(struct watch *watch, struct path *path); > int security_watch_devices(struct watch *watch); > >> make sure every access that can trigger a notification requires a >> (write-like) permission to the accessed object, > > "write-like permssion" for whom? The triggerer or the watcher? The former, i.e. the process that performed the operation that triggered the notification. Think of it as a write from the process to the accessed object, which triggers a notification (another write) on some related object (the watched object), which is then read by the watcher. > There are various 'classes' of events: > > (1) System events (eg. hardware I/O errors, automount points expiring). > > (2) Direct events (eg. automounts, manual mounts, EDQUOT, key linkage). > > (3) Indirect events (eg. exit/close doing the last fput and causing an > unmount). > > Class (1) are uncaused by a process, so I use init_cred for them. One could > argue that the automount point expiry should perhaps take place under the > creds of whoever triggered it in the first place, but we need to be careful > about long-term cred pinning. This seems equivalent to just checking whether the watcher is allowed to get that kind of event, no other cred truly needed. > Class (2) the causing process must've had permission to cause them - otherwise > we wouldn't have got the event. So we've already done a check on the causing process, and we're going to check whether the watcher can set the watch. We just need to establish the connection between the accessed object and the watched object in some manner. > Class (3) is interesting since it's currently entirely cleanup events and the > process may have the right to do them (close, dup2, exit, but also execve) > whether the LSM thinks it should be able to cause the object to be destroyed > or not. > > It gets more complicated than that, though: multiple processes with different > security attributes can all have fds pointing to a common file object - and > the last one to close carries the can as far as the LSM is concerned. Yes, I'd prefer to avoid that. You can't write policy that is stable and meaningful that way. This may fall under a similar situation as class (1) - all we can meaningfully do is check whether the watcher is allowed to see all such events. > And yet more complicated when you throw in unix sockets with partially passed > fds still in their queues. That's what patch 01 is designed to try and cope > with. > >> and make sure there is some sane way to control the relationship between the >> accessed object and the watched object (write-like)? > > This is the trick. Keys and superblocks have object labels of their own and > don't - for now - propagate their watches. With these, the watch is on the > object you initially assign it to and it goes no further than that. > > mount_notify() is the interesting case since we want to be able to detect > mount topology change events from within the vfs subtree rooted at the watched > directory without having to manually put a watch on every directory in that > subtree - or even just every mount object. > > Or, maybe, that's what I'll have to do: make it mount_notify() can only apply > to the subtree within its superblock, and the caller must call mount_notify() > for every mount object it wants to monitor. That would at least ensure that > the caller can, at that point, reach all those mount points. Would that at least make it consistent with fanotify (not that it provides a great example)? >> For cases where we have no object per se or at least no security >> structure/label associated with it, we may have to fall back to a >> coarse-grained "Can the watcher get this kind of notification in general?". > > Agreed - and we should probably have that anyway. > > David
On Thu, Jun 06, 2019 at 10:41:59AM +0100, David Howells wrote: > > Hi Al, > > Here's a set of patches to add a general variable-length notification queue > concept and to add sources of events for: > > (1) Mount topology events, such as mounting, unmounting, mount expiry, > mount reconfiguration. > > (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O > errors (not complete yet). > > (3) Key/keyring events, such as creating, linking and removal of keys. > > (4) General device events (single common queue) including: > > - Block layer events, such as device errors > > - USB subsystem events, such as device/bus attach/remove, device > reset, device errors. > > One of the reasons for this is so that we can remove the issue of processes > having to repeatedly and regularly scan /proc/mounts, which has proven to > be a system performance problem. To further aid this, the fsinfo() syscall > on which this patch series depends, provides a way to access superblock and > mount information in binary form without the need to parse /proc/mounts. > > > LSM support is included, but controversial: Apart from the LSM/security controversy here the general direction of this patchset is pretty well received it seems. Imho, having a notification mechanism like this is a very good thing for userspace. So would be great if there'd be a consensus on the LSM bits. Christian
On 6/6/2019 7:05 AM, Stephen Smalley wrote: > On 6/6/19 9:16 AM, David Howells wrote: >> Stephen Smalley <sds@tycho.nsa.gov> wrote: >> >> This might be easier to discuss if you can reply to: >> >> https://lore.kernel.org/lkml/5393.1559768763@warthog.procyon.org.uk/ >> >> which is on the ver #2 posting of this patchset. > > Sorry for being late to the party. Not sure whether you're asking me to respond only there or both there and here to your comments below. I'll start here but can revisit if it's a problem. >> >>>> LSM support is included, but controversial: >>>> >>>> (1) The creds of the process that did the fput() that reduced the refcount >>>> to zero are cached in the file struct. >>>> >>>> (2) __fput() overrides the current creds with the creds from (1) whilst >>>> doing the cleanup, thereby making sure that the creds seen by the >>>> destruction notification generated by mntput() appears to come from >>>> the last fputter. >>>> >>>> (3) security_post_notification() is called for each queue that we might >>>> want to post a notification into, thereby allowing the LSM to prevent >>>> covert communications. >>>> >>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>>> may be set in the first place? I might need to add a variant per >>>> watch-type. >>>> >>>> (?) Do I really need to keep track of the process creds in which an >>>> implicit object destruction happened? For example, imagine you create >>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>>> refers to on close unless move_mount() clears that flag. Now, imagine >>>> someone looking at that fd through procfs at the same time as you exit >>>> due to an error. The LSM sees the destruction notification come from >>>> the looker if they happen to do their fput() after yours. >>> >>> >>> I'm not in favor of this approach. >> >> Which bit? The last point? Keeping track of the process creds after an >> implicit object destruction. > > (1), (2), (3), and the last point. > >>> Can we check permission to the object being watched when a watch is set >>> (read-like access), >> >> Yes, and I need to do that. I think it's likely to require an extra hook for >> each entry point added because the objects are different: >> >> int security_watch_key(struct watch *watch, struct key *key); >> int security_watch_sb(struct watch *watch, struct path *path); >> int security_watch_mount(struct watch *watch, struct path *path); >> int security_watch_devices(struct watch *watch); >> >>> make sure every access that can trigger a notification requires a >>> (write-like) permission to the accessed object, >> >> "write-like permssion" for whom? The triggerer or the watcher? > > The former, i.e. the process that performed the operation that triggered the notification. Think of it as a write from the process to the accessed object, which triggers a notification (another write) on some related object (the watched object), which is then read by the watcher. We agree that the process that performed the operation that triggered the notification is the initial subject. Smack will treat the process with the watch set (in particular, its ring buffer) as the object being written to. SELinux, with its finer grained controls, will involve other things as noted above. There are other place where we see this, notably IP packet delivery. The implication is that the information about the triggering process needs to be available throughout. > >> There are various 'classes' of events: >> >> (1) System events (eg. hardware I/O errors, automount points expiring). >> >> (2) Direct events (eg. automounts, manual mounts, EDQUOT, key linkage). >> >> (3) Indirect events (eg. exit/close doing the last fput and causing an >> unmount). >> >> Class (1) are uncaused by a process, so I use init_cred for them. One could >> argue that the automount point expiry should perhaps take place under the >> creds of whoever triggered it in the first place, but we need to be careful >> about long-term cred pinning. > > This seems equivalent to just checking whether the watcher is allowed to get that kind of event, no other cred truly needed. > >> Class (2) the causing process must've had permission to cause them - otherwise >> we wouldn't have got the event. > > So we've already done a check on the causing process, and we're going to check whether the watcher can set the watch. We just need to establish the connection between the accessed object and the watched object in some manner. I don't agree. That is, I don't believe it is sufficient. There is no guarantee that being able to set a watch on an object implies that every process that can trigger the event can send it to you. Watcher has Smack label W Triggerer has Smack label T Watched object has Smack label O Relevant Smack rules are W O rw T O rw The watcher will be able to set the watch, the triggerer will be able to trigger the event, but there is nothing that would allow the watcher to receive the event. This is not a case of watcher reading the watched object, as the event is delivered without any action by watcher. > >> Class (3) is interesting since it's currently entirely cleanup events and the >> process may have the right to do them (close, dup2, exit, but also execve) >> whether the LSM thinks it should be able to cause the object to be destroyed >> or not. >> >> It gets more complicated than that, though: multiple processes with different >> security attributes can all have fds pointing to a common file object - and >> the last one to close carries the can as far as the LSM is concerned. > > Yes, I'd prefer to avoid that. You can't write policy that is stable and meaningful that way. This may fall under a similar situation as class (1) - all we can meaningfully do is check whether the watcher is allowed to see all such events. Back in the day when we were doing security evaluations the implications of "deleting" an object gave the security evaluators fits. UNIX/Linux files don't get deleted, they simply fall off the filesystem namespace when no one cares about them anymore. The model we used back in the day was that "delete" wasn't an operation that occurs on filesystem objects. But now you want to do something with security implications when the object disappears. We could say that the event does nothing but signal that the system has removed the watch on your behalf because it is now meaningless. No reason to worry about who dropped the last reference. We don't care about that from a policy viewpoint anyway. > >> And yet more complicated when you throw in unix sockets with partially passed >> fds still in their queues. That's what patch 01 is designed to try and cope >> with. >> >>> and make sure there is some sane way to control the relationship between the >>> accessed object and the watched object (write-like)? >> >> This is the trick. Keys and superblocks have object labels of their own and >> don't - for now - propagate their watches. With these, the watch is on the >> object you initially assign it to and it goes no further than that. >> >> mount_notify() is the interesting case since we want to be able to detect >> mount topology change events from within the vfs subtree rooted at the watched >> directory without having to manually put a watch on every directory in that >> subtree - or even just every mount object. > >> Or, maybe, that's what I'll have to do: make it mount_notify() can only apply >> to the subtree within its superblock, and the caller must call mount_notify() >> for every mount object it wants to monitor. That would at least ensure that >> the caller can, at that point, reach all those mount points. > > Would that at least make it consistent with fanotify (not that it provides a great example)? > >>> For cases where we have no object per se or at least no security >>> structure/label associated with it, we may have to fall back to a >>> coarse-grained "Can the watcher get this kind of notification in general?". >> >> Agreed - and we should probably have that anyway. >> >> David
On Thu, Jun 6, 2019 at 9:43 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 6/6/2019 7:05 AM, Stephen Smalley wrote: > > On 6/6/19 9:16 AM, David Howells wrote: > >> Stephen Smalley <sds@tycho.nsa.gov> wrote: > >> > >> This might be easier to discuss if you can reply to: > >> > >> https://lore.kernel.org/lkml/5393.1559768763@warthog.procyon.org.uk/ > >> > >> which is on the ver #2 posting of this patchset. > > > > Sorry for being late to the party. Not sure whether you're asking me to respond only there or both there and here to your comments below. I'll start here but can revisit if it's a problem. > >> > >>>> LSM support is included, but controversial: > >>>> > >>>> (1) The creds of the process that did the fput() that reduced the refcount > >>>> to zero are cached in the file struct. > >>>> > >>>> (2) __fput() overrides the current creds with the creds from (1) whilst > >>>> doing the cleanup, thereby making sure that the creds seen by the > >>>> destruction notification generated by mntput() appears to come from > >>>> the last fputter. > >>>> > >>>> (3) security_post_notification() is called for each queue that we might > >>>> want to post a notification into, thereby allowing the LSM to prevent > >>>> covert communications. > >>>> > >>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch > >>>> may be set in the first place? I might need to add a variant per > >>>> watch-type. > >>>> > >>>> (?) Do I really need to keep track of the process creds in which an > >>>> implicit object destruction happened? For example, imagine you create > >>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it > >>>> refers to on close unless move_mount() clears that flag. Now, imagine > >>>> someone looking at that fd through procfs at the same time as you exit > >>>> due to an error. The LSM sees the destruction notification come from > >>>> the looker if they happen to do their fput() after yours. > >>> > >>> > >>> I'm not in favor of this approach. > >> > >> Which bit? The last point? Keeping track of the process creds after an > >> implicit object destruction. > > > > (1), (2), (3), and the last point. > > > >>> Can we check permission to the object being watched when a watch is set > >>> (read-like access), > >> > >> Yes, and I need to do that. I think it's likely to require an extra hook for > >> each entry point added because the objects are different: > >> > >> int security_watch_key(struct watch *watch, struct key *key); > >> int security_watch_sb(struct watch *watch, struct path *path); > >> int security_watch_mount(struct watch *watch, struct path *path); > >> int security_watch_devices(struct watch *watch); > >> > >>> make sure every access that can trigger a notification requires a > >>> (write-like) permission to the accessed object, > >> > >> "write-like permssion" for whom? The triggerer or the watcher? > > > > The former, i.e. the process that performed the operation that triggered the notification. Think of it as a write from the process to the accessed object, which triggers a notification (another write) on some related object (the watched object), which is then read by the watcher. > > We agree that the process that performed the operation that triggered > the notification is the initial subject. Smack will treat the process > with the watch set (in particular, its ring buffer) as the object > being written to. SELinux, with its finer grained controls, will > involve other things as noted above. There are other place where we > see this, notably IP packet delivery. > > The implication is that the information about the triggering > process needs to be available throughout. > > > > >> There are various 'classes' of events: > >> > >> (1) System events (eg. hardware I/O errors, automount points expiring). > >> > >> (2) Direct events (eg. automounts, manual mounts, EDQUOT, key linkage). > >> > >> (3) Indirect events (eg. exit/close doing the last fput and causing an > >> unmount). > >> > >> Class (1) are uncaused by a process, so I use init_cred for them. One could > >> argue that the automount point expiry should perhaps take place under the > >> creds of whoever triggered it in the first place, but we need to be careful > >> about long-term cred pinning. > > > > This seems equivalent to just checking whether the watcher is allowed to get that kind of event, no other cred truly needed. > > > >> Class (2) the causing process must've had permission to cause them - otherwise > >> we wouldn't have got the event. > > > > So we've already done a check on the causing process, and we're going to check whether the watcher can set the watch. We just need to establish the connection between the accessed object and the watched object in some manner. > > I don't agree. That is, I don't believe it is sufficient. > There is no guarantee that being able to set a watch on an > object implies that every process that can trigger the event > can send it to you. > > Watcher has Smack label W > Triggerer has Smack label T > Watched object has Smack label O > > Relevant Smack rules are > > W O rw > T O rw > > The watcher will be able to set the watch, > the triggerer will be able to trigger the event, > but there is nothing that would allow the watcher > to receive the event. This is not a case of watcher > reading the watched object, as the event is delivered > without any action by watcher. I think this is an example of a bogus policy that should not be supported by the kernel.
On 6/6/19 12:43 PM, Casey Schaufler wrote: > On 6/6/2019 7:05 AM, Stephen Smalley wrote: >> On 6/6/19 9:16 AM, David Howells wrote: >>> Stephen Smalley <sds@tycho.nsa.gov> wrote: >>> >>> This might be easier to discuss if you can reply to: >>> >>> https://lore.kernel.org/lkml/5393.1559768763@warthog.procyon.org.uk/ >>> >>> which is on the ver #2 posting of this patchset. >> >> Sorry for being late to the party. Not sure whether you're asking me to respond only there or both there and here to your comments below. I'll start here but can revisit if it's a problem. >>> >>>>> LSM support is included, but controversial: >>>>> >>>>> (1) The creds of the process that did the fput() that reduced the refcount >>>>> to zero are cached in the file struct. >>>>> >>>>> (2) __fput() overrides the current creds with the creds from (1) whilst >>>>> doing the cleanup, thereby making sure that the creds seen by the >>>>> destruction notification generated by mntput() appears to come from >>>>> the last fputter. >>>>> >>>>> (3) security_post_notification() is called for each queue that we might >>>>> want to post a notification into, thereby allowing the LSM to prevent >>>>> covert communications. >>>>> >>>>> (?) Do I need to add security_set_watch(), say, to rule on whether a watch >>>>> may be set in the first place? I might need to add a variant per >>>>> watch-type. >>>>> >>>>> (?) Do I really need to keep track of the process creds in which an >>>>> implicit object destruction happened? For example, imagine you create >>>>> an fd with fsopen()/fsmount(). It is marked to dissolve the mount it >>>>> refers to on close unless move_mount() clears that flag. Now, imagine >>>>> someone looking at that fd through procfs at the same time as you exit >>>>> due to an error. The LSM sees the destruction notification come from >>>>> the looker if they happen to do their fput() after yours. >>>> >>>> >>>> I'm not in favor of this approach. >>> >>> Which bit? The last point? Keeping track of the process creds after an >>> implicit object destruction. >> >> (1), (2), (3), and the last point. >> >>>> Can we check permission to the object being watched when a watch is set >>>> (read-like access), >>> >>> Yes, and I need to do that. I think it's likely to require an extra hook for >>> each entry point added because the objects are different: >>> >>> int security_watch_key(struct watch *watch, struct key *key); >>> int security_watch_sb(struct watch *watch, struct path *path); >>> int security_watch_mount(struct watch *watch, struct path *path); >>> int security_watch_devices(struct watch *watch); >>> >>>> make sure every access that can trigger a notification requires a >>>> (write-like) permission to the accessed object, >>> >>> "write-like permssion" for whom? The triggerer or the watcher? >> >> The former, i.e. the process that performed the operation that triggered the notification. Think of it as a write from the process to the accessed object, which triggers a notification (another write) on some related object (the watched object), which is then read by the watcher. > > We agree that the process that performed the operation that triggered > the notification is the initial subject. Smack will treat the process > with the watch set (in particular, its ring buffer) as the object > being written to. SELinux, with its finer grained controls, will > involve other things as noted above. There are other place where we > see this, notably IP packet delivery. > > The implication is that the information about the triggering > process needs to be available throughout. > >> >>> There are various 'classes' of events: >>> >>> (1) System events (eg. hardware I/O errors, automount points expiring). >>> >>> (2) Direct events (eg. automounts, manual mounts, EDQUOT, key linkage). >>> >>> (3) Indirect events (eg. exit/close doing the last fput and causing an >>> unmount). >>> >>> Class (1) are uncaused by a process, so I use init_cred for them. One could >>> argue that the automount point expiry should perhaps take place under the >>> creds of whoever triggered it in the first place, but we need to be careful >>> about long-term cred pinning. >> >> This seems equivalent to just checking whether the watcher is allowed to get that kind of event, no other cred truly needed. >> >>> Class (2) the causing process must've had permission to cause them - otherwise >>> we wouldn't have got the event. >> >> So we've already done a check on the causing process, and we're going to check whether the watcher can set the watch. We just need to establish the connection between the accessed object and the watched object in some manner. > > I don't agree. That is, I don't believe it is sufficient. > There is no guarantee that being able to set a watch on an > object implies that every process that can trigger the event > can send it to you. > > Watcher has Smack label W > Triggerer has Smack label T > Watched object has Smack label O > > Relevant Smack rules are > > W O rw > T O rw > > The watcher will be able to set the watch, > the triggerer will be able to trigger the event, > but there is nothing that would allow the watcher > to receive the event. This is not a case of watcher > reading the watched object, as the event is delivered > without any action by watcher. You are allowing arbitrary information flow between T and W above. Who cares about notifications? How is it different from W and T mapping the same file as a shared mapping and communicating by reading and writing the shared memory? You aren't performing a permission check directly between W and T there. > >> >>> Class (3) is interesting since it's currently entirely cleanup events and the >>> process may have the right to do them (close, dup2, exit, but also execve) >>> whether the LSM thinks it should be able to cause the object to be destroyed >>> or not. >>> >>> It gets more complicated than that, though: multiple processes with different >>> security attributes can all have fds pointing to a common file object - and >>> the last one to close carries the can as far as the LSM is concerned. >> >> Yes, I'd prefer to avoid that. You can't write policy that is stable and meaningful that way. This may fall under a similar situation as class (1) - all we can meaningfully do is check whether the watcher is allowed to see all such events. > > Back in the day when we were doing security evaluations > the implications of "deleting" an object gave the security > evaluators fits. UNIX/Linux files don't get deleted, they > simply fall off the filesystem namespace when no one cares > about them anymore. The model we used back in the day was > that "delete" wasn't an operation that occurs on filesystem > objects. > > But now you want to do something with security implications > when the object disappears. We could say that the event does > nothing but signal that the system has removed the watch on > your behalf because it is now meaningless. No reason to worry > about who dropped the last reference. We don't care about > that from a policy viewpoint anyway. > >> >>> And yet more complicated when you throw in unix sockets with partially passed >>> fds still in their queues. That's what patch 01 is designed to try and cope >>> with. >>> >>>> and make sure there is some sane way to control the relationship between the >>>> accessed object and the watched object (write-like)? >>> >>> This is the trick. Keys and superblocks have object labels of their own and >>> don't - for now - propagate their watches. With these, the watch is on the >>> object you initially assign it to and it goes no further than that. >>> >>> mount_notify() is the interesting case since we want to be able to detect >>> mount topology change events from within the vfs subtree rooted at the watched >>> directory without having to manually put a watch on every directory in that >>> subtree - or even just every mount object. > >>> Or, maybe, that's what I'll have to do: make it mount_notify() can only apply >>> to the subtree within its superblock, and the caller must call mount_notify() >>> for every mount object it wants to monitor. That would at least ensure that >>> the caller can, at that point, reach all those mount points. >> >> Would that at least make it consistent with fanotify (not that it provides a great example)? >> >>>> For cases where we have no object per se or at least no security >>>> structure/label associated with it, we may have to fall back to a >>>> coarse-grained "Can the watcher get this kind of notification in general?". >>> >>> Agreed - and we should probably have that anyway. >>> >>> David >
On 6/6/2019 10:11 AM, Andy Lutomirski wrote: > On Thu, Jun 6, 2019 at 9:43 AM Casey Schaufler <casey@schaufler-ca.com> wrote: >> ... >> I don't agree. That is, I don't believe it is sufficient. >> There is no guarantee that being able to set a watch on an >> object implies that every process that can trigger the event >> can send it to you. >> >> Watcher has Smack label W >> Triggerer has Smack label T >> Watched object has Smack label O >> >> Relevant Smack rules are >> >> W O rw >> T O rw >> >> The watcher will be able to set the watch, >> the triggerer will be able to trigger the event, >> but there is nothing that would allow the watcher >> to receive the event. This is not a case of watcher >> reading the watched object, as the event is delivered >> without any action by watcher. > I think this is an example of a bogus policy that should not be > supported by the kernel. At this point it's pretty hard for me to care much what you think. You don't seem to have any insight into the implications of the features you're advocating, or their potential consequences.
> On Jun 6, 2019, at 11:33 AM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >> On 6/6/2019 10:11 AM, Andy Lutomirski wrote: >>> On Thu, Jun 6, 2019 at 9:43 AM Casey Schaufler <casey@schaufler-ca.com> wrote: >>> ... >>> I don't agree. That is, I don't believe it is sufficient. >>> There is no guarantee that being able to set a watch on an >>> object implies that every process that can trigger the event >>> can send it to you. >>> >>> Watcher has Smack label W >>> Triggerer has Smack label T >>> Watched object has Smack label O >>> >>> Relevant Smack rules are >>> >>> W O rw >>> T O rw >>> >>> The watcher will be able to set the watch, >>> the triggerer will be able to trigger the event, >>> but there is nothing that would allow the watcher >>> to receive the event. This is not a case of watcher >>> reading the watched object, as the event is delivered >>> without any action by watcher. >> I think this is an example of a bogus policy that should not be >> supported by the kernel. > > At this point it's pretty hard for me to care much what > you think. You don't seem to have any insight into the > implications of the features you're advocating, or their > potential consequences. > > Can you try to spell it out, then? A mostly or fully worked out example might help. As Stephen said, it looks like you are considering cases where there is already a full communication channel between two processes, and you’re concerned that this new mechanism might add a side channel too. If this is wrong, can you explain how?
On 6/6/2019 10:16 AM, Stephen Smalley wrote: > On 6/6/19 12:43 PM, Casey Schaufler wrote: >> ... >> I don't agree. That is, I don't believe it is sufficient. >> There is no guarantee that being able to set a watch on an >> object implies that every process that can trigger the event >> can send it to you. >> >> Watcher has Smack label W >> Triggerer has Smack label T >> Watched object has Smack label O >> >> Relevant Smack rules are >> >> W O rw >> T O rw >> >> The watcher will be able to set the watch, >> the triggerer will be able to trigger the event, >> but there is nothing that would allow the watcher >> to receive the event. This is not a case of watcher >> reading the watched object, as the event is delivered >> without any action by watcher. > > You are allowing arbitrary information flow between T and W above. Who cares about notifications? I do. If Watched object is /dev/null no data flow is possible. There are many objects on a modern Linux system for which this is true. Even if it's "just a file" the existence of one path for data to flow does not justify ignoring the rules for other data paths. > > How is it different from W and T mapping the same file as a shared mapping and communicating by reading and writing the shared memory? You aren't performing a permission check directly between W and T there. In this case there is one object O, two subjects, W and T and two accesses. W open O T open O They fiddle about with the data in O. In the event case, there are two objects, O and W, two subjects W and T, and three accesses. W watch O T trigger O T write-event W You can't wave away the flow of data. Different objects are involved. An analogy is that two processes with different UIDs can open a file, but still can't signal each other. Different mechanisms have different policies. I'm not saying that's good, but it's the context we're in.
On Thu, Jun 6, 2019 at 11:56 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 6/6/2019 10:16 AM, Stephen Smalley wrote: > > On 6/6/19 12:43 PM, Casey Schaufler wrote: > >> ... > >> I don't agree. That is, I don't believe it is sufficient. > >> There is no guarantee that being able to set a watch on an > >> object implies that every process that can trigger the event > >> can send it to you. > >> > >> Watcher has Smack label W > >> Triggerer has Smack label T > >> Watched object has Smack label O > >> > >> Relevant Smack rules are > >> > >> W O rw > >> T O rw > >> > >> The watcher will be able to set the watch, > >> the triggerer will be able to trigger the event, > >> but there is nothing that would allow the watcher > >> to receive the event. This is not a case of watcher > >> reading the watched object, as the event is delivered > >> without any action by watcher. > > > > You are allowing arbitrary information flow between T and W above. Who cares about notifications? > > I do. If Watched object is /dev/null no data flow is possible. > There are many objects on a modern Linux system for which this > is true. Even if it's "just a file" the existence of one path > for data to flow does not justify ignoring the rules for other > data paths. Aha! Even ignoring security, writes to things like /dev/null should probably not trigger notifications to people who are watching /dev/null. (There are probably lots of things like this: /dev/zero, /dev/urandom, etc.) David, are there any notification types that have this issue in your patchset? If so, is there a straightforward way to fix it? Generically, it seems like maybe writes to device nodes shouldn't trigger notifications since, despite the fact that different openers of a device node share an inode, there isn't necessarily any connection between them. Casey, if this is fixed in general, do you have another case where the right to write and the right to read do not imply the right to communicate? > An analogy is that two processes with different UIDs can open a file, > but still can't signal each other. What do you mean "signal"? If two processes with different UIDs can open the same file for read and write, then they can communicate with each other in many ways. For example, one can write to the file and the other can read it. One can take locks and the other can read the lock state. They can both map it and use any number of memory access side channels to communicate. But, of course, they can't send each other signals with kill(). If, however, one of these processes is using some fancy mechanism (inotify, dnotify, kqueue, fanotify, whatever) to watch the file, and the other one writes it, then it seems inconsistent to lie to the watching process and say that the file wasn't written because some security policy has decided to allow the write, allow the read, but suppress this particular notification. Hence my request for a real example: when would it make sense to do this?
Andy Lutomirski <luto@kernel.org> wrote: > > > You are allowing arbitrary information flow between T and W above. Who > > > cares about notifications? > > > > I do. If Watched object is /dev/null no data flow is possible. > > There are many objects on a modern Linux system for which this > > is true. Even if it's "just a file" the existence of one path > > for data to flow does not justify ignoring the rules for other > > data paths. > > Aha! > > Even ignoring security, writes to things like /dev/null should > probably not trigger notifications to people who are watching > /dev/null. (There are probably lots of things like this: /dev/zero, > /dev/urandom, etc.) Even writes to /dev/null might generate access notifications; leastways, vfs_read() will call fsnotify_access() afterwards on success. Whether or not you can set marks on open device files is another matter. > David, are there any notification types that have this issue in your > patchset? If so, is there a straightforward way to fix it? I'm not sure what issue you're referring to specifically. Do you mean whether writes to device files generate notifications? > Generically, it seems like maybe writes to device nodes shouldn't trigger > notifications since, despite the fact that different openers of a device > node share an inode, there isn't necessarily any connection between them. With the notification types I have currently implemented, I don't even notice any accesses to a device file unless: (1) Someone mounts over the top of one. (2) The access triggers an I/O error or device reset or causes the device to be attached or detached. (3) Wangling the device causes some other superblock event. (4) The driver calls request_key() and that creates a new key. > Casey, if this is fixed in general, do you have another case where the > right to write and the right to read do not imply the right to > communicate? > > > An analogy is that two processes with different UIDs can open a file, > > but still can't signal each other. > > What do you mean "signal"? If two processes with different UIDs can > open the same file for read and write, then they can communicate with > each other in many ways. For example, one can write to the file and > the other can read it. One can take locks and the other can read the > lock state. They can both map it and use any number of memory access > side channels to communicate. But, of course, they can't send each > other signals with kill(). > > If, however, one of these processes is using some fancy mechanism > (inotify, dnotify, kqueue, fanotify, whatever) to watch the file, and > the other one writes it, then it seems inconsistent to lie to the > watching process and say that the file wasn't written because some > security policy has decided to allow the write, allow the read, but > suppress this particular notification. Hence my request for a real > example: when would it make sense to do this? Note that fanotify requires CAP_SYS_ADMIN, but inotify and dnotify do not. dnotify is applied to an open file, so it might be usable on a chardev such as /dev/null, say. David
> On Jun 6, 2019, at 2:17 PM, David Howells <dhowells@redhat.com> wrote: > > Andy Lutomirski <luto@kernel.org> wrote: > >>>> You are allowing arbitrary information flow between T and W above. Who >>>> cares about notifications? >>> >>> I do. If Watched object is /dev/null no data flow is possible. >>> There are many objects on a modern Linux system for which this >>> is true. Even if it's "just a file" the existence of one path >>> for data to flow does not justify ignoring the rules for other >>> data paths. >> >> Aha! >> >> Even ignoring security, writes to things like /dev/null should >> probably not trigger notifications to people who are watching >> /dev/null. (There are probably lots of things like this: /dev/zero, >> /dev/urandom, etc.) > > Even writes to /dev/null might generate access notifications; leastways, > vfs_read() will call fsnotify_access() afterwards on success. Hmm. I can see this being an issue, but I guess not with your patch set. > > Whether or not you can set marks on open device files is another matter. > >> David, are there any notification types that have this issue in your >> patchset? If so, is there a straightforward way to fix it? > > I'm not sure what issue you're referring to specifically. Do you mean whether > writes to device files generate notifications? I mean: are there cases where some action generates a notification but does not otherwise have an effect visible to the users who can receive the notification. It looks like the answer is probably “no”, which is good. Casey, is this good enough for you, or is there still an issue?
Andy Lutomirski <luto@amacapital.net> wrote: > I mean: are there cases where some action generates a notification but does > not otherwise have an effect visible to the users who can receive the > notification. It looks like the answer is probably “no”, which is good. mount_notify(). You can get a notification that someone altered the mount topology (eg. by mounting something). A process receiving a notification could then use fsinfo(), say, to reread the mount topology tree, find out where the new mount is and wander over there to have a look - assuming they have the permissions for pathwalk to succeed. David
> On Jun 6, 2019, at 3:38 PM, David Howells <dhowells@redhat.com> wrote: > > Andy Lutomirski <luto@amacapital.net> wrote: > >> I mean: are there cases where some action generates a notification but does >> not otherwise have an effect visible to the users who can receive the >> notification. It looks like the answer is probably “no”, which is good. > > mount_notify(). You can get a notification that someone altered the mount > topology (eg. by mounting something). A process receiving a notification > could then use fsinfo(), say, to reread the mount topology tree, find out > where the new mount is and wander over there to have a look - assuming they > have the permissions for pathwalk to succeed. > > They can call fsinfo() anyway, or just read /proc/self/mounts. As far as I’m concerned, if you have CAP_SYS_ADMIN over a mount namespace and LSM policy lets you mount things, the of course you can get information to basically anyone who can use that mount namespace.
Andy Lutomirski <luto@amacapital.net> wrote: > They can call fsinfo() anyway, or just read /proc/self/mounts. As far as I’m > concerned, if you have CAP_SYS_ADMIN over a mount namespace and LSM policy > lets you mount things, the of course you can get information to basically > anyone who can use that mount namespace. And automounts? You don't need CAP_SYS_ADMIN to trigger one of those, but they still generate events. On the other hand, you need CSA to mount something that has automounts in the first place, and if you're particularly concerned about security, you probably don't want the processes you might be suspicious of having access to things that contain automounts (typically network filesystems). David