Message ID | 156717343223.2204.15875738850129174524.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Series | Keyrings, Block and USB notifications [ver #7] |
.\"
.\" Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
.\" Written by David Howells (dhowells@redhat.com)
.\"
.\" This program is free software; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public Licence
.\" as published by the Free Software Foundation; either version
.\" 2 of the Licence, or (at your option) any later version.
.\"
.TH WATCH_QUEUE 7 "28 Aug 2019" Linux "General Kernel Notifications"
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH NAME
/dev/watch_queue \- General kernel notification queue
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SYNOPSIS
#include <linux/watch_queue.h>
.EX
int fd = open("/dev/watch_queue", O_RDWR);
ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, size / page_size);
ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
buf = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
.EE
.SH OVERVIEW
.PP
The general kernel notification queue is a general purpose transport for
kernel notification messages to userspace. Notification messages are marked
with type information so that events from multiple sources can be
distinguished. Messages are also of variable length to accommodate different
information for each type.
.PP
This queue is implemented as a misc device that can be opened multiple times,
each opening creating a fully independent queue. Queues are then configured
with the size and filtering, event sources are attached and the queue is mapped
into a process's VM.
.PP
Queues take the form of a ring buffer with shared index pointers, all of which
is accessed directly within the mapping. There are no read and write methods,
though poll is provided so that the buffer can be waited upon.
.PP
A queue pins a certain amount of locked kernel memory (so that the kernel can
write a notification into it from contexts where swapping cannot be performed),
and so is subject to resource limit restrictions on
.BR RLIMIT_MEMLOCK .
.PP
Sources must be attached to a queue manually; there's no single global event
source, but rather a variety of sources, each of which can be attached to by
multiple queues. Attachments can be set up by:
.TP
.BR keyctl_watch_key (3)
Monitor a key or keyring for changes.
.TP
.BR watch_devices (2)
Monitor a global source of device events from USB and block devices, such as
device detection, device removal and I/O errors.
.PP
Because a source can produce a lot of different events, not all of which may be
of interest to the watcher, a filter can be set on a queue to determine whether
a particular event will get inserted in a queue at the point of posting inside
the kernel.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RING STRUCTURE
.PP
The ring buffer is divided into 8-byte slots and each notification message
occupies between 1 and 63 of those slots. Each message begins with a header of
the form:
.PP
.in +4n
.EX
struct watch_notification {
	__u32	type:24;
	__u32	subtype:8;
	__u32	info;
};
.EE
.in
.PP
Where
.I type
indicates the general class of notification,
.I subtype
indicates the specific type of notification within that class and
.I info
includes the length (in slots), the watcher's ID and some type-specific
information.
.PP
Messages inserted into the buffer aren't allowed to split over the end of the
buffer; instead a
.I skip
notification will be inserted to pad to the end of the buffer. A skip
notification will have the type set to
.B WATCH_TYPE_META
and the subtype set to
.BR WATCH_META_SKIP_NOTIFICATION ,
with the length indicating how much should be skipped.
.PP
To avoid the need for an extra page dedicated solely to metadata pointers, the
first few slots are covered by a permanent skip notification and contain ring
metadata including the pointers.
The buffer has a 'header' of the form:
.PP
.in +4n
.EX
struct {
	struct watch_notification watch;
	__u32	head;
	__u32	tail;
	__u32	mask;
	__u32	__reserved;
};
.EE
.in
.PP
This includes the ring indices,
.IR head " and " tail ,
and a
.I mask
to mask them off with before use. When using the ring indices, the following
precautions should be observed:
.TP
.B (1)
.I head
indicates where the kernel will insert the next message into the buffer. Only
the kernel is allowed to change head.
.TP
.B (2)
.I tail
indicates where the next message for userspace to consume can be found; tail
will never be changed by the kernel.
.TP
.B (3)
An
.IR acquire -class
memory barrier must be used to read head. It is not necessary to use a memory
barrier to read tail.
.TP
.B (4)
The buffer is empty if tail == head.
.TP
.B (5)
head and tail should not be masked off after increment, but rather left to wrap
naturally; this means that the index must be masked off before being used to
access the buffer.
.TP
.B (6)
After consuming a message, the length (in slots) of the message should be added
to tail and tail must not then be masked off.
.TP
.B (7)
A
.IR release -class
memory barrier must be used to update
.IR tail .
.PP
If the head and tail values become too far separated or head points to a
forbidden area of the buffer, no further message insertion will take place and
.IR poll ()
will flag
.BR POLLERR .
Otherwise, poll() will flag
.BR POLLIN " and " POLLRDNORM
if tail != head.
.PP
The ring as a whole is described by the following structure:
.PP
.in +4n
.EX
struct watch_queue_buffer {
	union {
		struct {
			struct watch_notification watch;
			__u32	head;
			__u32	tail;
			__u32	mask;
			__u32	__reserved;
		} meta;
		struct watch_notification slots[0];
	};
};
.EE
.in
.PP
Where
.I meta
covers the slots holding the ring indices and other metadata. Note that the
metadata may be extended in future. Its size can be determined by checking the
length of the skip pseudo-message that covers it (see
.IR meta.watch ).
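The index rules (4) to (6) can be exercised without the device. What follows is a minimal sketch in plain C, not code from the driver: every `ex_` name is hypothetical, and a 512-slot ring (one 4096-byte page of 8-byte slots) is assumed in the usage note. It illustrates why the indices are left to run freely and are masked only at the point of access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; none of these names come from <linux/watch_queue.h>. */

/* Slot to access for a free-running index.  mask is (ring size in slots) - 1. */
static inline uint32_t ex_ring_slot(uint32_t index, uint32_t mask)
{
	assert((mask & (mask + 1)) == 0);	/* ring size must be a power of two */
	return index & mask;
}

/* Slots pending between tail and head; unsigned subtraction survives the wrap. */
static inline uint32_t ex_ring_used(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* Rule (4): the ring is empty when the unmasked indices are equal. */
static inline bool ex_ring_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Rule (6): add the message length to tail and leave the result unmasked. */
static inline uint32_t ex_ring_advance(uint32_t tail, uint32_t len)
{
	return tail + len;
}
```

With mask 511, a tail of 0xfffffffe and five pending slots, head has already wrapped to 3, yet `ex_ring_used()` still yields 5 and the next slot to read is 510. Masking the stored indices instead would make a completely full ring indistinguishable from an empty one.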
.PP
In the event that the ring is full when the kernel needs to write in a
notification, it will set
.B WATCH_INFO_NOTIFICATIONS_LOST
in
.IR meta.watch.info
to indicate an overrun. If the flag is noticed as being set, the entire word
can simply be cleared without bothering the kernel as the kernel doesn't ever
read it.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH IOCTL COMMANDS
The device has the following
.IR ioctl ()
commands:
.TP
.B IOC_WATCH_QUEUE_SET_SIZE
The ioctl argument indicates the size of the buffer in pages and must be a
power of two. This command allocates the memory to back the buffer.
.IP
This may only be done once and the buffer cannot be mmap'd until this command
has been done.
.TP
.B IOC_WATCH_QUEUE_SET_FILTER
This is used to set filters on the notifications that get written into the
buffer. The ioctl argument points to a structure of the following form:
.IP
.in +4n
.EX
struct watch_notification_filter {
	__u32	nr_filters;
	__u32	__reserved;
	struct watch_notification_type_filter filters[];
};
.EE
.in
.IP
Where
.I nr_filters
indicates the number of elements in the
.IR filters []
array. Each element in the filters array specifies a filter and is of the
following form:
.IP
.in +4n
.EX
struct watch_notification_type_filter {
	__u32	type;
	__u32	info_filter;
	__u32	info_mask;
	__u32	subtype_filter[8];
};
.EE
.in
.IP
Where
.I type
refers to the type field in a notification record header, info_filter and
info_mask refer to the info field and subtype_filter is a bit-mask of subtypes.
.IP
If no filters are installed, all notifications are allowed by default and if
one or more filters are installed, notifications are disallowed by default.
.IP
A notification matches a filter if, for notification N and filter F:
.IP
.in +4n
.EX
N->type == F->type &&
(F->subtype_filter[N->subtype >> 5] & (1U << (N->subtype & 31))) &&
(N->info & F->info_mask) == F->info_filter
.EE
.in
.IP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH EXAMPLE
To use the notification mechanism, first of all the device has to be opened,
the size must be set and the buffer mapped:
.PP
.in +4n
.EX
int wfd = open("/dev/watch_queue", O_RDWR);

ioctl(wfd, IOC_WATCH_QUEUE_SET_SIZE, 1);

struct watch_queue_buffer *buf =
	mmap(NULL, 1 * PAGE_SIZE, PROT_READ | PROT_WRITE,
	     MAP_SHARED, wfd, 0);
.EE
.in
.PP
From this point, the buffer is open for business. Filters can be set to
restrict the notifications that get inserted into the buffer from the sources
that are watched. For example:
.PP
.in +4n
.EX
static struct watch_notification_filter filter = {
	.nr_filters	= 2,
	.__reserved	= 0,
	.filters = {
		[0] = {
			.type			= WATCH_TYPE_KEY_NOTIFY,
			.subtype_filter[0]	= 1 << NOTIFY_KEY_LINKED,
			.info_filter		= 1 << WATCH_INFO_FLAG_2,
			.info_mask		= 1 << WATCH_INFO_FLAG_2,
		},
		[1] = {
			.type			= WATCH_TYPE_USB_NOTIFY,
			.subtype_filter[0]	= 1 << NOTIFY_USB_DEVICE_ADD,
		},
	},
};
ioctl(wfd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
.EE
.in
.PP
will only allow key-change notifications that indicate a key is linked into a
keyring and then only if type-specific flag WATCH_INFO_FLAG_2 is set on the
notification and will only allow USB device-add notifications, blocking other
USB notifications and all block device notifications.
.PP
Sources can then be watched, for example:
.PP
.in +4n
.EX
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, wfd, 0x33);
watch_devices(wfd, 0x55, 0);
.EE
.in
.PP
The first places a watch on the process's session keyring, directing the
notifications to the buffer we just created and specifying that they should be
tagged with 0x33 in the info ID field.
The second places a watch on the global device notifications queue, specifying
that notifications from that should be tagged with info ID 0x55.
.PP
The device file descriptor can then be polled to find out when the kernel
writes something into the buffer or if the ring indices become incoherent:
.PP
.in +4n
.EX
struct pollfd p[1];
p[0].fd = wfd;
p[0].events = POLLIN | POLLERR;
p[0].revents = 0;
poll(p, 1, -1);
.EE
.in
.PP
When it is determined that there is something in the buffer, messages can be
read out of the ring with something like the following:
.PP
.in +4n
.EX
struct watch_notification *n;
unsigned int len, head, tail, mask = buf->meta.mask;

/* Acquire head so the message data is read after the index. */
while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
       tail = buf->meta.tail,
       tail != head
       ) {
	n = &buf->slots[tail & mask];
	len = n->info & WATCH_INFO_LENGTH;
	len >>= WATCH_INFO_LENGTH__SHIFT;
	if (len == 0)
		abort();	/* a zero length would never advance tail */

	switch (n->type) {
	case WATCH_TYPE_META:
		switch (n->subtype) {
		case WATCH_META_REMOVAL_NOTIFICATION:
			saw_removal_notification(n);
			break;
		}
		break;
	case WATCH_TYPE_KEY_NOTIFY:
		saw_key_change(n);
		break;
	case WATCH_TYPE_USB_NOTIFY:
		saw_usb_event(n);
		break;
	}

	/* Leave tail unmasked; release it only once the message is done with. */
	tail += len;
	__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
}
.EE
.in
.PP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH VERSIONS
The notification queue driver first appeared in v??? of the Linux kernel.
.SH SEE ALSO
.ad l
.nh
.BR ioctl (2),
.BR keyctl (1),
.BR keyctl_watch_key (3),
.BR poll (2),
.BR setrlimit (2)
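The consumer loop in the EXAMPLE section of watch_queue(7) pulls the length out of the info word, and the tag passed when setting a watch (0x33 or 0x55 there) comes back in the same word. The sketch below decodes both. The bit positions are an assumption taken from the ver #5 changelog of this series (length in bits 0-5, watch ID in bits 8-15, type-specific bits 16-31), not from the UAPI header, and the `EX_` and `ex_` names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of watch_notification::info (see the lead-in above). */
#define EX_INFO_LENGTH		0x0000003fU	/* bits 0-5: length in slots */
#define EX_INFO_LENGTH_SHIFT	0
#define EX_INFO_ID		0x0000ff00U	/* bits 8-15: watch ID tag */
#define EX_INFO_ID_SHIFT	8

static_assert((EX_INFO_LENGTH >> EX_INFO_LENGTH_SHIFT) == 63,
	      "a message occupies at most 63 slots");

/* Message length in 8-byte slots. */
static inline unsigned int ex_info_length(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* The tag that was supplied when the watch was set. */
static inline unsigned int ex_info_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}

/* Size of the whole record in bytes, header included. */
static inline unsigned int ex_info_bytes(uint32_t info)
{
	return ex_info_length(info) * 8;
}
```

A program would normally use the real `WATCH_INFO_*` macros from the patched header; the local definitions only keep the sketch buildable on a system that lacks it.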
'\" t
.\" Copyright (c) 2019 David Howells <dhowells@redhat.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH WATCH_DEVICES 2 2019-08-29 "Linux" "Linux Programmer's Manual"
.SH NAME
watch_devices \- Watch for global device notifications
.SH SYNOPSIS
.nf
.B #include <linux/watch_queue.h>
.br
.B #include <unistd.h>
.br
.BI "int watch_devices(int " watch_fd ", int " watch_id ", unsigned int " flags );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call.
.SH DESCRIPTION
.PP
.BR watch_devices ()
attaches a watch on the global device notification source to a previously
opened and configured watch queue. See
.BR watch_queue (7)
for more information on how to set up and use those.
.PP
The global device notification source is provided with events from a number of
sources, including block device errors and USB events. Each notification type
has a specific format.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Block Device Notifications
Events on block devices, such as I/O errors, are posted to any watching queues.
The message format is:
.PP
.in +4n
.EX
struct block_notification {
	struct watch_notification watch;
	__u64	dev;
	__u64	sector;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.BR WATCH_TYPE_BLOCK_NOTIFY ,
the
.I watch.subtype
field will contain a constant that indicates the particular event that occurred
and the watch_id passed to watch_devices() will be placed in
.I watch.info
in the ID field.
.PP
.I dev
will contain the major and minor device numbers in
.B dev_t
form and
.I sector
will contain the first sector the notification pertains to.
.PP
The following events are defined:
.PP
.in +4n
.TS
lB l.
NOTIFY_BLOCK_ERROR_TIMEOUT
NOTIFY_BLOCK_ERROR_NO_SPACE
NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT
NOTIFY_BLOCK_ERROR_CRITICAL_TARGET
NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS
NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM
NOTIFY_BLOCK_ERROR_PROTECTION
NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE
NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE
NOTIFY_BLOCK_ERROR_IO
.TE
.in
.PP
All of which indicate error conditions.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS USB Device Notifications
Events on USB devices, such as I/O errors, are posted to any watching queues.
The message format is:
.PP
.in +4n
.EX
struct usb_notification {
	struct watch_notification watch;
	__u32	error;
	__u32	reserved;
	__u8	name_len;
	__u8	name[0];
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.BR WATCH_TYPE_USB_NOTIFY ,
the
.I watch.subtype
field will contain a constant that indicates the particular event that occurred
and the watch_id passed to watch_devices() will be placed in
.I watch.info
in the ID field.
.PP
.IR name " and " name_len
indicate the textual name of the USB device that originated the notification.
The name will be truncated to
.B USB_NOTIFICATION_MAX_NAME_LEN
if it is longer than that.
.PP
The following subtypes are currently defined:
.TP
.B NOTIFY_USB_DEVICE_ADD
A new USB device has been plugged in.
.TP
.B NOTIFY_USB_DEVICE_REMOVE
A USB device has been unplugged.
.TP
.B NOTIFY_USB_BUS_ADD
A new USB bus is now available.
.TP
.B NOTIFY_USB_BUS_REMOVE
A USB bus has been removed.
.TP
.B NOTIFY_USB_DEVICE_RESET
A USB device has been reset.
.TP
.B NOTIFY_USB_DEVICE_ERROR
A USB device has generated an error; a suitable error code will have been
placed in
.IR error .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RETURN VALUE
On success, the function returns 0. On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
The following errors may be returned:
.TP
.B EBADF
.I watch_fd
is an invalid file descriptor.
.TP
.B EBADSLT
The watch does not exist and so cannot be removed.
.TP
.B EBUSY
The source is already attached to the watch device instance specified by
.I watch_fd
and so cannot be added.
.TP
.B EINVAL
.I watch_fd
does not refer to a watch_queue device file.
.TP
.B EINVAL
.IR watch_fd " or " watch_id
is out of range.
.TP
.B EINVAL
Unsupported
.I flags
set.
.TP
.B ENOMEM
Insufficient memory available to allocate a watch record.
.TP
.B EPERM
The caller does not have the required privileges.
.SH CONFORMING TO
This system call is Linux-specific and should not be used in programs intended
to be portable.
.SH VERSIONS
The notification queue driver first appeared in v??? of the Linux kernel.
.SH NOTES
Glibc does not (yet) provide a wrapper for the
.BR watch_devices "()"
system call; call it using
.BR syscall (2).
.SH SEE ALSO
.BR watch_queue (7)
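As the NOTES section of watch_devices(2) says, the call has to go through syscall(2). A minimal sketch of such a wrapper follows. It assumes that kernel headers carrying this patch series define `__NR_watch_devices`; the series adds a syscall table entry but the number is not quoted in this thread, so on other systems the wrapper simply fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical userspace wrapper for watch_devices(2).
 * Assumption: __NR_watch_devices is provided by the patched kernel headers.
 * Without it, behave as an unimplemented system call would.
 */
static inline int watch_devices(int watch_fd, int watch_id, unsigned int flags)
{
#ifdef __NR_watch_devices
	return (int)syscall(__NR_watch_devices, watch_fd, watch_id, flags);
#else
	(void)watch_fd;
	(void)watch_id;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

Callers can then write `watch_devices(wfd, 0x55, 0)` as in the watch_queue(7) example and treat ENOSYS as "not supported by this kernel or these headers".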
.\"
.\" Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
.\" Written by David Howells (dhowells@redhat.com)
.\"
.\" This program is free software; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License
.\" as published by the Free Software Foundation; either version
.\" 2 of the License, or (at your option) any later version.
.\"
.TH KEYCTL_WATCH_KEY 3 "28 Aug 2019" Linux "Linux Key Management Calls"
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH NAME
keyctl_watch_key \- Watch for changes to a key
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SYNOPSIS
.nf
.B #include <keyutils.h>
.sp
.BI "long keyctl_watch_key(key_serial_t " key ,
.BI "                      int " watch_queue_fd ,
.BI "                      int " watch_id ");"
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH DESCRIPTION
.BR keyctl_watch_key ()
sets or removes a watch on
.IR key .
.PP
.I watch_id
specifies the ID for a watch that will be included in notification messages.
It can be between 0 and 255 to add a watch; it should be -1 to remove a watch.
.PP
.I watch_queue_fd
is a file descriptor attached to a watch_queue device instance. Multiple
openings of a device provide separate instances. Each device instance can
only have one watch on any particular key.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Notification Record
.PP
Key-specific notification messages that the kernel emits into the buffer have
the following format:
.PP
.in +4n
.EX
struct key_notification {
	struct watch_notification watch;
	__u32	key_id;
	__u32	aux;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.B WATCH_TYPE_KEY_NOTIFY
and the
.I watch.subtype
field will contain one of the following constants, indicating the event that
occurred. The watch_id passed to keyctl_watch_key() will be placed in
.I watch.info
in the ID field.
The following events are defined:
.TP
.B NOTIFY_KEY_INSTANTIATED
This indicates that a watched key got instantiated or negatively instantiated.
.I key_id
indicates the key that was instantiated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_UPDATED
This indicates that a watched key got updated or instantiated by update.
.I key_id
indicates the key that was updated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_LINKED
This indicates that a key got linked into a watched keyring.
.I key_id
indicates the keyring that was modified and
.I aux
indicates the key that was added.
.TP
.B NOTIFY_KEY_UNLINKED
This indicates that a key got unlinked from a watched keyring.
.I key_id
indicates the keyring that was modified and
.I aux
indicates the key that was removed.
.TP
.B NOTIFY_KEY_CLEARED
This indicates that a watched keyring got cleared.
.I key_id
indicates the keyring that was cleared and
.I aux
is unused.
.TP
.B NOTIFY_KEY_REVOKED
This indicates that a watched key got revoked.
.I key_id
indicates the key that was revoked and
.I aux
is unused.
.TP
.B NOTIFY_KEY_INVALIDATED
This indicates that a watched key got invalidated.
.I key_id
indicates the key that was invalidated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_SETATTR
This indicates that a watched key had its attributes (owner, group,
permissions, timeout) modified.
.I key_id
indicates the key that was modified and
.I aux
is unused.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Removal Notification
When a watched key is garbage collected, all of its watches are automatically
destroyed and a notification is delivered to each watcher. This will normally
be an extended notification of the form:
.PP
.in +4n
.EX
struct watch_notification_removal {
	struct watch_notification watch;
	__u64	id;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.B WATCH_TYPE_META
and the
.I watch.subtype
field will contain
.BR WATCH_META_REMOVAL_NOTIFICATION .
If the extended notification is given, then the length will be 2 units,
otherwise it will be 1 and only the header will be present.
.PP
The watch_id passed to
.IR keyctl_watch_key ()
will be placed in
.I watch.info
in the ID field.
.PP
If the extension is present,
.I id
will be set to the ID of the destroyed key.
.PP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RETURN VALUE
On success
.BR keyctl_watch_key ()
returns
.BR 0 .
On error, the value
.B -1
will be returned and
.I errno
will have been set to an appropriate error.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH ERRORS
.TP
.B ENOKEY
The specified key does not exist.
.TP
.B EKEYEXPIRED
The specified key has expired.
.TP
.B EKEYREVOKED
The specified key has been revoked.
.TP
.B EACCES
The named key exists, but does not grant
.B view
permission to the calling process.
.TP
.B EBUSY
The specified key already has a watch on it for that device instance (add
only).
.TP
.B EBADSLT
The specified key doesn't have a watch on it (removal only).
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH LINKING
This is a library function that can be found in
.IR libkeyutils .
When linking,
.B \-lkeyutils
should be specified to the linker.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SEE ALSO
.ad l
.nh
.BR keyctl (1),
.BR add_key (2),
.BR keyctl (2),
.BR request_key (2),
.BR keyctl (3),
.BR keyrings (7),
.BR keyutils (7)
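The Removal Notification section of keyctl_watch_key(3) describes two forms of the record: a 2-slot extended form carrying the destroyed key's ID and a 1-slot header-only form. The sketch below shows one way a consumer might tell them apart. The `ex_` structures are local stand-ins for the UAPI ones, the length mask (bits 0-5 of info) is an assumption taken from the ver #5 changelog, and the caller is assumed to have already checked for WATCH_TYPE_META and WATCH_META_REMOVAL_NOTIFICATION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for struct watch_notification and its removal extension. */
struct ex_watch_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

struct ex_watch_notification_removal {
	struct ex_watch_notification watch;
	uint64_t id;
};

static_assert(sizeof(struct ex_watch_notification) == 8,
	      "the header fills exactly one 8-byte slot");
static_assert(sizeof(struct ex_watch_notification_removal) == 16,
	      "the extended form fills exactly two slots");

/* Assumption: the length in slots occupies bits 0-5 of info. */
#define EX_INFO_LENGTH 0x3fU

/*
 * Fetch the ID of the destroyed key from a removal notification.
 * Returns false for the header-only form, leaving *id untouched.
 * n must point into the ring, so it is 8-byte aligned.
 */
static inline bool ex_removal_id(const struct ex_watch_notification *n,
				 uint64_t *id)
{
	unsigned int len = n->info & EX_INFO_LENGTH;

	if (len < 2)
		return false;
	*id = ((const struct ex_watch_notification_removal *)n)->id;
	return true;
}
```

Checking the length before touching `id` matters because, with the short form, the next slot already belongs to a different message.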
On 8/30/2019 6:57 AM, David Howells wrote:
> Here's a set of patches to add a general notification queue concept and to
> add sources of events for:
>
>  (1) Key/keyring events, such as creating, linking and removal of keys.
>
>  (2) General device events (single common queue) including:
>
>      - Block layer events, such as device errors
>
>      - USB subsystem events, such as device/bus attach/remove, device
>        reset, device errors.
>
> Tests for the key/keyring events can be found on the keyutils next branch:
>
>	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next

I'm having trouble with the "make install" on Fedora. Is there an
unusual dependency?

>
> Notifications are done automatically inside of the testing infrastructure
> on every change that every test makes to a key or keyring.
>
> Manual pages can be found there also, including pages for watch_queue(7)
> and the watch_devices(2) system call (these should be transferred to the
> manpages package if taken upstream).
>
> LSM hooks are included:
>
>  (1) A set of hooks are provided that allow an LSM to rule on whether or
>      not a watch may be set. Each of these hooks takes a different
>      "watched object" parameter, so they're not really shareable. The LSM
>      should use current's credentials. [Wanted by SELinux & Smack]
>
>  (2) A hook is provided to allow an LSM to rule on whether or not a
>      particular message may be posted to a particular queue. This is given
>      the credentials from the event generator (which may be the system) and
>      the watch setter. [Wanted by Smack]
>
> I've provided a preliminary attempt to provide SELinux and Smack with
> implementations of some of these hooks.
>
>
> Design decisions:
>
>  (1) A misc chardev is used to create and open a ring buffer:
>
>	fd = open("/dev/watch_queue", O_RDWR);
>
>      which is then configured and mmap'd into userspace:
>
>	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
>	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
>	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
>		   MAP_SHARED, fd, 0);
>
>      The fd cannot be read or written (though there is a facility to use
>      write to inject records for debugging) and userspace just pulls data
>      directly out of the buffer.
>
>  (2) The ring index pointers are stored inside the ring and are thus
>      accessible to userspace. Userspace should only update the tail
>      pointer and never the head pointer or risk breaking the buffer. The
>      kernel checks that the pointers appear valid before trying to use
>      them. A 'skip' record is maintained around the pointers.
>
>  (3) poll() can be used to wait for data to appear in the buffer.
>
>  (4) Records in the buffer are binary, typed and have a length so that they
>      can be of varying size.
>
>      This means that multiple heterogeneous sources can share a common
>      buffer. Tags may be specified when a watchpoint is created to help
>      distinguish the sources.
>
>  (5) The queue is reusable as there are 16 million types available, of
>      which I've used just a few, so there is scope for others to be used.
>
>  (6) Records are filterable as types have up to 256 subtypes that can be
>      individually filtered. Other filtration is also available.
>
>  (7) Each time the buffer is opened, a new buffer is created - this means
>      that there's no interference between watchers.
>
>  (8) When recording a notification, the kernel will not sleep, but will
>      rather mark a queue as overrun if there's insufficient space, thereby
>      avoiding userspace causing the kernel to hang.
>
>  (9) The 'watchpoint' should be specific where possible, meaning that you
>      specify the object that you want to watch.
>
> (10) The buffer is created and then watchpoints are attached to it, using
>      one of:
>
>	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
>	watch_devices(fd, 0x02, 0);
>
>      where in both cases, fd indicates the queue and the number after is a
>      tag between 0 and 255.
>
> (11) The watch must be removed if either the watch buffer is destroyed or
>      the watched object is destroyed.
>
>
> Things I want to avoid:
>
>  (1) Introducing features that make the core VFS dependent on the network
>      stack or networking namespaces (ie. usage of netlink).
>
>  (2) Dumping all this stuff into dmesg and having a daemon that sits there
>      parsing the output and distributing it as this then puts the
>      responsibility for security into userspace and makes handling
>      namespaces tricky. Further, dmesg might not exist or might be
>      inaccessible inside a container.
>
>  (3) Letting users see events they shouldn't be able to see.
>
>
> The patches can be found here also:
>
>	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core
>
> Changes:
>
>  ver #7:
>
>  (*) Removed the 'watch' argument from the security_watch_key() and
>      security_watch_devices() hooks as current_cred() can be used instead
>      of watch->cred.
>
>  ver #6:
>
>  (*) Fix mmap bug in watch_queue driver.
>
>  (*) Add an extended removal notification that can transmit an identifier
>      to userspace (such as a key ID).
>
>  (*) Don't produce an instantiation notification in mark_key_instantiated()
>      but rather do it in the caller to prevent key updates from producing
>      an instantiate notification as well as an update notification.
>
>  (*) Set the right number of filters in the sample program.
>
>  (*) Provide preliminary hook implementations for SELinux and Smack.
>
>  ver #5:
>
>  (*) Split the superblock watch and mount watch parts out into their own
>      branch (notifications-mount) as they really need certain fsinfo()
>      attributes.
>
>  (*) Rearrange the watch notification UAPI header to push the length down
>      to bits 0-5 and remove the lost-message bits. The userspace's watch
>      ID tag is moved to bits 8-15 and then the message type is allocated
>      all of bits 16-31 for its own purposes.
>
>      The lost-message bit is moved over to the header, rather than being
>      placed in the next message to be generated and given its own word so
>      it can be cleared with xchg(,0) for parisc.
>
>  (*) The security_post_notification() hook is no longer called with the
>      spinlock held and softirqs disabled - though the RCU readlock is still
>      held.
>
>  (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK
>      will skip the overuse check.
>
>  (*) The buffer is marked VM_DONTEXPAND.
>
>  (*) Save the watch-setter's creds in struct watch and give that to the LSM
>      hook for posting a message.
>
>  ver #4:
>
>  (*) Split the basic UAPI bits out into their own patch and then split the
>      LSM hooks out into an intermediate patch. Add LSM hooks for setting
>      watches.
>
>      Rename the *_notify() system calls to watch_*() for consistency.
>
>  ver #3:
>
>  (*) I've added a USB notification source and reformulated the block
>      notification source so that there's now a common watch list, for which
>      the system call is now device_notify().
>
>      I've assigned a pair of unused ioctl numbers in the 'W' series to the
>      ioctls added by this series.
>
>      I've also added a description of the kernel API to the documentation.
>
>  ver #2:
>
>  (*) I've fixed various issues raised by Jann Horn and GregKH and moved to
>      krefs for refcounting. I've added some security features to try and
>      give Casey Schaufler the LSM control he wants.
>
> David
> ---
> David Howells (11):
>       uapi: General notification ring definitions
>       security: Add hooks to rule on setting a watch
>       security: Add a hook for the point of notification insertion
>       General notification queue with user mmap()'able ring buffer
>       keys: Add a notification facility
>       Add a general, global device notification watch list
>       block: Add block layer notifications
>       usb: Add USB subsystem notifications
>       Add sample notification program
>       selinux: Implement the watch_key security hook
>       smack: Implement the watch_key and post_notification hooks [untested]
>
>
>  Documentation/ioctl/ioctl-number.rst | 1
>  Documentation/security/keys/core.rst | 58 ++
>  Documentation/watch_queue.rst | 460 ++++++++++++++
>  arch/alpha/kernel/syscalls/syscall.tbl | 1
>  arch/arm/tools/syscall.tbl | 1
>  arch/ia64/kernel/syscalls/syscall.tbl | 1
>  arch/m68k/kernel/syscalls/syscall.tbl | 1
>  arch/microblaze/kernel/syscalls/syscall.tbl | 1
>  arch/mips/kernel/syscalls/syscall_n32.tbl | 1
>  arch/mips/kernel/syscalls/syscall_n64.tbl | 1
>  arch/mips/kernel/syscalls/syscall_o32.tbl | 1
>  arch/parisc/kernel/syscalls/syscall.tbl | 1
>  arch/powerpc/kernel/syscalls/syscall.tbl | 1
>  arch/s390/kernel/syscalls/syscall.tbl | 1
>  arch/sh/kernel/syscalls/syscall.tbl | 1
>  arch/sparc/kernel/syscalls/syscall.tbl | 1
>  arch/x86/entry/syscalls/syscall_32.tbl | 1
>  arch/x86/entry/syscalls/syscall_64.tbl | 1
>  arch/xtensa/kernel/syscalls/syscall.tbl | 1
>  block/Kconfig | 9
>  block/blk-core.c | 29 +
>  drivers/base/Kconfig | 9
>  drivers/base/Makefile | 1
>  drivers/base/watch.c | 90 +++
>  drivers/misc/Kconfig | 13
>  drivers/misc/Makefile | 1
>  drivers/misc/watch_queue.c | 893 +++++++++++++++++++++++++++
>  drivers/usb/core/Kconfig | 9
>  drivers/usb/core/devio.c | 56 ++
>  drivers/usb/core/hub.c | 4
>  include/linux/blkdev.h | 15
>  include/linux/device.h | 7
>  include/linux/key.h | 3
>  include/linux/lsm_audit.h | 1
>  include/linux/lsm_hooks.h | 38 +
>  include/linux/security.h | 32 +
>  include/linux/syscalls.h | 1
>  include/linux/usb.h | 18 +
>  include/linux/watch_queue.h | 94 +++
>  include/uapi/asm-generic/unistd.h | 4
>  include/uapi/linux/keyctl.h | 2
>  include/uapi/linux/watch_queue.h | 183 ++++++
>  kernel/sys_ni.c | 1
>  samples/Kconfig | 6
>  samples/Makefile | 1
>  samples/watch_queue/Makefile | 8
>  samples/watch_queue/watch_test.c | 233 +++++++
>  security/keys/Kconfig | 9
>  security/keys/compat.c | 3
>  security/keys/gc.c | 5
>  security/keys/internal.h | 30 +
>  security/keys/key.c | 38 +
>  security/keys/keyctl.c | 99 +++
>  security/keys/keyring.c | 20 -
>  security/keys/request_key.c | 4
>  security/security.c | 23 +
>  security/selinux/hooks.c | 14
>  security/smack/smack_lsm.c | 82 ++
>  58 files changed, 2593 insertions(+), 30 deletions(-)
>  create mode 100644 Documentation/watch_queue.rst
>  create mode 100644 drivers/base/watch.c
>  create mode 100644 drivers/misc/watch_queue.c
>  create mode 100644 include/linux/watch_queue.h
>  create mode 100644 include/uapi/linux/watch_queue.h
>  create mode 100644 samples/watch_queue/Makefile
>  create mode 100644 samples/watch_queue/watch_test.c
>
Casey Schaufler <casey@schaufler-ca.com> wrote:

> > Tests for the key/keyring events can be found on the keyutils next branch:
> >
> >	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next
>
> I'm having trouble with the "make install" on Fedora. Is there an
> unusual dependency?

What's the symptom you're seeing? Is it this:

	install -D -m 0644 libkeyutils.a /tmp/opt/lib64 libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007f7dcbf6d000)/libkeyutils.a
	/bin/sh: -c: line 0: syntax error near unexpected token `('
	/bin/sh: -c: line 0: `install -D -m 0644 libkeyutils.a /tmp/opt/lib64 libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007f7dcbf6d000)/libkeyutils.a'

David
Casey Schaufler <casey@schaufler-ca.com> wrote:

> > Tests for the key/keyring events can be found on the keyutils next branch:
> >
> >	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next
>
> I'm having trouble with the "make install" on Fedora. Is there an
> unusual dependency?

I've pushed a couple of patches to my next branch. Do "make install" and
"make rpm" now work for you?

David
Hillf Danton <hdanton@sina.com> wrote:

> > +	smp_store_release(&buf->meta.head, head);
>
> Add a line of comment for the pairing smp_load_acquire().
> I did not find it in 04/11.

You won't find smp_load_acquire() - it's not in the kernel, though if you look
in the sample, you'll find the corresponding barrier in userspace. Note that
there's a further implicit barrier you don't see.

I've added the comments:

	/* Barrier against userspace, ordering data read before tail read */
	ring_tail = READ_ONCE(buf->meta.tail);

and:

	/* Barrier against userspace, ordering head update after data write. */
	smp_store_release(&buf->meta.head, head);

David
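To make the pairing discussed in this message concrete, here is a small standalone sketch of a single-producer, single-consumer ring using the GCC/Clang `__atomic` builtins that the userspace sample also uses. It is not the driver's code: `ex_ring`, `ex_post()` and `ex_take()` are hypothetical, the ring holds plain 64-bit values rather than notifications, and the producer reads tail with an explicit acquire load where the kernel relies on READ_ONCE() and the implicit barrier mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_RING_SLOTS 8U	/* assumed ring size for the sketch */

static_assert((EX_RING_SLOTS & (EX_RING_SLOTS - 1)) == 0,
	      "ring size must be a power of two");

struct ex_ring {
	uint32_t head;			/* written by the producer only */
	uint32_t tail;			/* written by the consumer only */
	uint64_t slots[EX_RING_SLOTS];
};

/* Producer side, standing in for the kernel: write the data, then publish. */
static inline bool ex_post(struct ex_ring *r, uint64_t value)
{
	uint32_t head = r->head;
	/* Pairs with the consumer's release of tail, so a slot is not
	 * overwritten until the consumer has finished reading it. */
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (head - tail >= EX_RING_SLOTS)
		return false;		/* full; the caller would note an overrun */
	r->slots[head & (EX_RING_SLOTS - 1)] = value;
	/* Orders the head update after the data write. */
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
	return true;
}

/* Consumer side, as in userspace: acquire head, read, then release tail. */
static inline bool ex_take(struct ex_ring *r, uint64_t *value)
{
	uint32_t tail = r->tail;
	/* Pairs with the producer's release of head. */
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

	if (tail == head)
		return false;		/* empty */
	*value = r->slots[tail & (EX_RING_SLOTS - 1)];
	/* Orders the data read before the tail update. */
	__atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);
	return true;
}
```

Each index has exactly one writer, which is why a release store on one side and an acquire load on the other are enough and no lock is needed.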
Hillf Danton <hdanton@sina.com> wrote:

> > +	for (i = 0; i < wf->nr_filters; i++) {
> > +		wt = &wf->filters[i];
> > +		if (n->type == wt->type &&
> > +		    (wt->subtype_filter[n->subtype >> 5] &
> > +		     (1U << (n->subtype & 31))) &&
>
> Replace the pure numbers with something easier to understand.

How about the following:

	static bool filter_watch_notification(const struct watch_filter *wf,
					      const struct watch_notification *n)
	{
		const struct watch_type_filter *wt;
		unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8;
		unsigned int st_index = n->subtype / st_bits;
		unsigned int st_bit = 1U << (n->subtype % st_bits);
		int i;

		if (!test_bit(n->type, wf->type_filter))
			return false;

		for (i = 0; i < wf->nr_filters; i++) {
			wt = &wf->filters[i];
			if (n->type == wt->type &&
			    (wt->subtype_filter[st_index] & st_bit) &&
			    (n->info & wt->info_mask) == wt->info_filter)
				return true;
		}

		return false; /* If there is a filter, the default is to reject. */
	}

David
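The same matching rule can be tried out in userspace, which is handy when working out which bits to set in a filter before handing it to IOC_WATCH_QUEUE_SET_FILTER. The following sketch mirrors the function above under a few stated differences: `ex_type_filter` is a local stand-in for the UAPI structure, the notification is passed as separate type, subtype and info values, and the kernel-only `test_bit()` pre-check on the type is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct watch_notification_type_filter from watch_queue(7). */
struct ex_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8];	/* 8 words of 32 bits */
};

static_assert(sizeof(((struct ex_type_filter *)0)->subtype_filter) * CHAR_BIT == 256,
	      "one filter bit for each of the 256 subtypes");

/*
 * True if a record passes at least one of the filters.  The "no filters
 * installed means allow everything" case is left to the caller.
 */
static inline bool ex_filter_match(const struct ex_type_filter *filters,
				   unsigned int nr_filters, uint32_t type,
				   uint8_t subtype, uint32_t info)
{
	const unsigned int st_bits = sizeof(filters->subtype_filter[0]) * CHAR_BIT;
	const unsigned int st_index = subtype / st_bits;	/* which word */
	const uint32_t st_bit = UINT32_C(1) << (subtype % st_bits); /* which bit */
	unsigned int i;

	for (i = 0; i < nr_filters; i++) {
		const struct ex_type_filter *f = &filters[i];

		if (type == f->type &&
		    (f->subtype_filter[st_index] & st_bit) &&
		    (info & f->info_mask) == f->info_filter)
			return true;
	}
	return false;	/* with filters installed, the default is to reject */
}
```

For subtype 37, for instance, `st_index` is 1 and `st_bit` is `1 << 5`, so the filter has to set bit 5 of `subtype_filter[1]` to let that subtype through.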