Message ID | 20230823090107.65749-2-bchalios@amazon.es (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Herbert Xu |
Series | Propagating reseed notifications to user space |
On Wed, Aug 23, 2023 at 11:01:05AM +0200, Babis Chalios wrote:
> Sometimes, PRNGs need to reseed. For example, on a regular timer
> interval, to ensure nothing consumes a random value for longer than e.g.
> 5 minutes, or when VMs get cloned, to ensure seeds don't leak in to
> clones.
>
> The notification happens through a 32bit epoch value that changes every
> time cached entropy is no longer valid, hence PRNGs need to reseed. User
> space applications can get hold of a pointer to this value through
> /dev/(u)random. We introduce a new ioctl() that returns an anonymous
> file descriptor. From this file descriptor we can mmap() a single page
> which includes the epoch at offset 0.
>
> random.c maintains the epoch value in a global shared page. It exposes
> a registration API for kernel subsystems that are able to notify when
> reseeding is needed. Notifiers register with random.c and receive a
> unique 8bit ID and a pointer to the epoch. When they need to report a
> reseeding event they write a new epoch value which includes the
> notifier ID in the first 8 bits and an increasing counter value in the
> remaining 24 bits:
>
>               RNG epoch
> *-------------*---------------------*
> | notifier id | epoch counter value |
> *-------------*---------------------*
>      8 bits           24 bits

Why not just use 32/32 for a full 64bit value, or better yet, 2
different variables?  Why is 32bits and packing things together here
somehow simpler?

thanks,

greg k-h
Hi Greg,

On 23/8/23 11:08, Greg KH wrote:
> Why not just use 32/32 for a full 64bit value, or better yet, 2
> different variables?  Why is 32bits and packing things together here
> somehow simpler?

We made it 32 bits so that we can read/write it atomically in all 32bit
architectures.
Do you think that's not a problem?

Cheers,
Babis
On Wed, Aug 23, 2023 at 11:27:11AM +0200, Babis Chalios wrote:
> We made it 32 bits so that we can read/write it atomically in all 32bit
> architectures.
> Do you think that's not a problem?

What 32bit platforms care about this type of interface at all?

thanks,

greg k-h
Hey Greg,

On 23.08.23 12:08, Babis Chalios wrote:
> On 23/8/23 12:06, Greg KH wrote:
>> What 32bit platforms care about this type of interface at all?
>
> I think, any 32bit platform that gets random bytes from the kernel.

We're building an ABI here that generically propagates an atomic way for
user space to learn about an "rng epoch". Since there are 32bit user
space applications out there whose executing VMs can be cloned or that
want to learn about regular epoch changes (i386, arm32, 32bit riscv,
etc), we need to make sure we have a viable way for them to consume the
ABI as well. This applies to 32bit user space - the kernel may as well
run 64bit like a typical aarch64 setup today.

We could of course build this ABI with a "long" notion in mind. But then
you would get an ioctl to a kernel data structure that is 64bit, even
with CONFIG_COMPAT. So now we'd have to build wrappers and maintain 2
structures for 32bit and 64bit user space and everything would become
super complicated.

These events won't happen very often. 24bits is very likely easily
sufficient to not ever race between someone setting a new epoch and
someone reading the epoch value. So by keeping it 32bit, we make it
guaranteed atomic on all targets and completely remove any CONFIG_COMPAT
woes.

Alex

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
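The layout argument above can be illustrated with a small user-space sketch. Only the single `__u32 data` member comes from the patch; the struct names `epoch_fixed` and `epoch_long` are hypothetical and exist just for the comparison. A fixed-width word has the same size for 32bit and 64bit callers, whereas a `long`-based layout follows the ABI it is compiled for:

```c
#include <stdint.h>

/* Same shape as the patch's struct rand_epoch_data: one fixed-width word. */
struct epoch_fixed {
	uint32_t data;
};

/* Hypothetical alternative built on "long"; its size depends on the ABI. */
struct epoch_long {
	unsigned long data;
};

/*
 * Compile-time check: the fixed-width layout is 4 bytes whether this file
 * is built with -m32 or -m64, so no compat translation layer is needed.
 */
_Static_assert(sizeof(struct epoch_fixed) == 4,
	       "epoch must be exactly 32 bits on every ABI");
```

Building the same file once with `-m32` and once with `-m64` should leave `sizeof(struct epoch_fixed)` unchanged, while `sizeof(struct epoch_long)` would typically differ between the two.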
On Wed, Aug 23, 2023 at 12:08:35PM +0200, Babis Chalios wrote:
> On 23/8/23 12:06, Greg KH wrote:
> > What 32bit platforms care about this type of interface at all?
>
> I think, any 32bit platform that gets random bytes from the kernel.

You are making a new api, for some new functionality, for what I thought
was virtual machines (hence the virtio driver), none of which work in a
32bit system.

I thought this was an ioctl for userspace, which can handle 64bits at
once (or 2 32bit numbers).

For internal kernel stuff, a lock should be fine, or better yet, a 64bit
atomic value read (horrible on 32bit platforms, I know...)

Just asking, it feels odd to pack bits in these days for when 90% of the
cpus really don't need it.

greg k-h
On 23.08.23 12:25, Greg KH wrote:
> You are making a new api, for some new functionality, for what I thought
> was virtual machines (hence the virtio driver), none of which work in a
> 32bit system.

There should be 2 use cases of this that I'm aware of:

  * Virtual machine clones. Most 64bit VMs can execute 32bit user space.
  * Bare metal rng time-to-live. An easy mechanism to tell every PRNG in
    the system to reseed every 5 minutes. This applies to all
    architectures Linux supports.

> I thought this was an ioctl for userspace, which can handle 64bits at
> once (or 2 32bit numbers).

The ioctl is only to create a file descriptor that you can use to mmap()
a shared page between kernel and user space which you can then
atomically access to understand if you're in the old epoch (keep using
previous RNG values) or in the new epoch (discard any old cached RNG
values).

> For internal kernel stuff, a lock should be fine, or better yet, a 64bit
> atomic value read (horrible on 32bit platforms, I know...)
>
> Just asking, it feels odd to pack bits in these days for when 90% of the
> cpus really don't need it.

I agree, but we're really not really bit constrained for this value and
by making it 32bit always, we can guarantee that there will never be
muckery for 32-on-64 compatibility.

Alex
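As a rough sketch of the old-epoch/new-epoch check described above, a user-space PRNG could remember the epoch it last saw and compare it on every use. Everything named here (`struct prng_cache`, `prng_cache_needs_reseed`) is hypothetical and not part of the patch; the `epoch` pointer stands in for the 32bit word at offset 0 of the mmap()ed page:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process PRNG bookkeeping; not taken from the patch. */
struct prng_cache {
	uint32_t seen_epoch;	/* epoch observed when the cache was last filled */
	bool valid;		/* false until the cache has been seeded once */
};

/*
 * Returns true when cached random state must be discarded: either the cache
 * was never seeded, or the shared epoch differs from the recorded one.
 * A single 32bit atomic load is all the reader needs.
 */
static bool prng_cache_needs_reseed(struct prng_cache *c,
				    _Atomic uint32_t *epoch)
{
	uint32_t now = atomic_load_explicit(epoch, memory_order_acquire);

	if (c->valid && c->seen_epoch == now)
		return false;	/* still in the old epoch, keep cached values */

	c->seen_epoch = now;	/* new epoch: caller must reseed before use */
	c->valid = true;
	return true;
}
```

A caller would run this check before handing out cached random bytes and refill from getrandom() whenever it returns true. Note that the comparison is for inequality only; the value carries no ordering a consumer should rely on.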
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3cb37760dfec..72b524099b60 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -54,6 +54,8 @@
 #include <linux/suspend.h>
 #include <linux/siphash.h>
 #include <linux/sched/isolation.h>
+#include "linux/anon_inodes.h"
+#include "linux/bitmap.h"
 #include <crypto/chacha.h>
 #include <crypto/blake2s.h>
 #include <asm/archrandom.h>
@@ -206,6 +208,7 @@ enum {
 static struct {
 	u8 key[CHACHA_KEY_SIZE] __aligned(__alignof__(long));
 	unsigned long generation;
+	u32 cached_epoch;
 	spinlock_t lock;
 } base_crng = {
 	.lock = __SPIN_LOCK_UNLOCKED(base_crng.lock)
@@ -242,6 +245,138 @@ static unsigned int crng_reseed_interval(void)
 	return CRNG_RESEED_INTERVAL;
 }

+/*
+ * Tracking moments in time that PRNGs (ours and user-space) need to reseed
+ * due to an "entropy leak".
+ *
+ * We call the time period between two "entropy leak" events an "epoch".
+ * Epoch is a 32-bit unsigned value that lives in a dedicated global page.
+ * Systems that want to report entropy leaks will get an 1-byte notifier id
+ * (up to 256 notifiers) and the address of the epoch.
+ *
+ * Each notifier will write epochs in the form:
+ *
+ *      1 byte                 3 bytes
+ * +---------------+-------------------------------+
+ * |  notifier id  |   next epoch counter value    |
+ * +---------------+-------------------------------+
+ *
+ * This way, epochs are namespaced per notifier, so no two different
+ * notifiers will ever write the same epoch value.
+ */
+
+static struct {
+	struct rand_epoch_data *epoch;
+	DECLARE_BITMAP(notifiers, RNG_EPOCH_NOTIFIER_NR_BITS);
+	spinlock_t lock;
+} epoch_data = {
+	.lock = __SPIN_LOCK_UNLOCKED(epoch_data.lock),
+};
+
+static int epoch_mmap(struct file *filep, struct vm_area_struct *vma)
+{
+	if (vma->vm_pgoff || vma_pages(vma) > 1)
+		return -EINVAL;
+
+	if (vma->vm_flags & VM_WRITE)
+		return -EPERM;
+
+	/* Don't allow growing the region with mremap(). */
+	vm_flags_set(vma, VM_DONTEXPAND);
+	/* Don't allow mprotect() to make this writeable in the future */
+	vm_flags_clear(vma, VM_MAYWRITE);
+
+	return vm_insert_page(vma, vma->vm_start, virt_to_page(epoch_data.epoch));
+}
+
+static const struct file_operations rng_epoch_fops = {
+	.mmap = epoch_mmap,
+	.llseek = noop_llseek,
+};
+
+static int create_epoch_fd(void)
+{
+	unsigned long flags;
+	int ret = -ENOTTY;
+
+	spin_lock_irqsave(&epoch_data.lock, flags);
+	if (bitmap_empty(epoch_data.notifiers, RNG_EPOCH_NOTIFIER_NR_BITS))
+		goto out;
+	spin_unlock_irqrestore(&epoch_data.lock, flags);
+
+	return anon_inode_getfd("rand:epoch", &rng_epoch_fops, &epoch_data, O_RDONLY | O_CLOEXEC);
+out:
+	spin_unlock_irqrestore(&epoch_data.lock, flags);
+	return ret;
+}
+
+/*
+ * Get the current epoch. If nobody has subscribed, this will always return 0.
+ */
+static unsigned long get_epoch(void)
+{
+	u32 epoch = 0;
+
+	if (likely(epoch_data.epoch))
+		epoch = epoch_data.epoch->data;
+
+	return epoch;
+}
+
+/*
+ * Register an epoch notifier
+ *
+ * Allocate a notifier ID and provide the address to the epoch. If the address
+ * has not being allocated yet (this is the first call to register a notifier)
+ * this will allocate the page holding the epoch. If we have reached the limit
+ * of notifiers it will fail.
+ */
+int rng_register_epoch_notifier(struct rng_epoch_notifier *notifier)
+{
+	unsigned long flags;
+	u8 new_id;
+
+	if (!notifier)
+		return -EINVAL;
+
+	spin_lock_irqsave(&epoch_data.lock, flags);
+	new_id = bitmap_find_free_region(epoch_data.notifiers, RNG_EPOCH_NOTIFIER_NR_BITS, 0);
+	if (new_id < 0)
+		goto err_no_id;
+	spin_unlock_irqrestore(&epoch_data.lock, flags);
+
+	notifier->id = new_id;
+	notifier->epoch = epoch_data.epoch;
+	return 0;
+
+err_no_id:
+	spin_unlock_irqrestore(&epoch_data.lock, flags);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(rng_register_epoch_notifier);
+
+/*
+ * Unregister an epoch notifier
+ *
+ * This will release the notifier ID previously allocated through
+ * `rng_register_epoch_notifier`.
+ */
+int rng_unregister_epoch_notifier(struct rng_epoch_notifier *notifier)
+{
+	unsigned long flags;
+
+	if (!notifier)
+		return -EINVAL;
+
+	spin_lock_irqsave(&epoch_data.lock, flags);
+	bitmap_clear(epoch_data.notifiers, notifier->id, 1);
+	spin_unlock_irqrestore(&epoch_data.lock, flags);
+
+	notifier->epoch = NULL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(rng_unregister_epoch_notifier);
+
 /* Used by crng_reseed() and crng_make_state() to extract a new seed from the input pool. */
 static void extract_entropy(void *buf, size_t len);

@@ -344,6 +479,14 @@ static void crng_make_state(u32 chacha_state[CHACHA_STATE_WORDS],
 			return;
 	}

+	/*
+	 * If the epoch has changed we reseed.
+	 */
+	if (unlikely(READ_ONCE(base_crng.cached_epoch) != get_epoch())) {
+		WRITE_ONCE(base_crng.cached_epoch, get_epoch());
+		crng_reseed(NULL);
+	}
+
 	local_lock_irqsave(&crngs.lock, flags);
 	crng = raw_cpu_ptr(&crngs);

@@ -888,6 +1031,8 @@ void __init random_init(void)
 	_mix_pool_bytes(&entropy, sizeof(entropy));
 	add_latent_entropy();

+	epoch_data.epoch = (struct rand_epoch_data *)get_zeroed_page(GFP_KERNEL);
+
 	/*
 	 * If we were initialized by the cpu or bootloader before jump labels
 	 * are initialized, then we should enable the static branch here, where
@@ -1528,6 +1673,8 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 			return -ENODATA;
 		crng_reseed(NULL);
 		return 0;
+	case RNDEPOCH:
+		return create_epoch_fd();
 	default:
 		return -EINVAL;
 	}
diff --git a/include/linux/random.h b/include/linux/random.h
index b0a940af4fff..0fdacf4ee8aa 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -161,4 +161,32 @@ int random_online_cpu(unsigned int cpu);
 extern const struct file_operations random_fops, urandom_fops;
 #endif

+
+/*
+ * Constants that define the format of the epoch value.
+ *
+ * Currently we use a 8/24 split for epoch values. The lower 24 bits are used
+ * for the epoch counter and the 8 remaining are used for the notifier ID.
+ */
+#define RNG_EPOCH_NOTIFIER_NR_BITS 8
+#define RNG_EPOCH_COUNTER_SHIFT 0
+#define RNG_EPOCH_COUNTER_MASK GENMASK(23, 0)
+#define RNG_EPOCH_ID_SHIFT 24
+#define RNG_EPOCH_ID_MASK GENMASK(31, 24)
+
+/*
+ * An epoch notifier is a system that can report entropy leak events.
+ * Notifiers receive a unique identifier and the address where they will write
+ * a new epoch when an entropy leak happens.
+ */
+struct rng_epoch_notifier {
+	/* unique ID of the notifier */
+	u8 id;
+	/* pointer to epoch data */
+	struct rand_epoch_data *epoch;
+};
+
+int rng_register_epoch_notifier(struct rng_epoch_notifier *notifier);
+int rng_unregister_epoch_notifier(struct rng_epoch_notifier *notifier);
+
 #endif /* _LINUX_RANDOM_H */
diff --git a/include/uapi/linux/random.h b/include/uapi/linux/random.h
index e744c23582eb..f79d93820bdd 100644
--- a/include/uapi/linux/random.h
+++ b/include/uapi/linux/random.h
@@ -38,6 +38,9 @@
 /* Reseed CRNG.  (Superuser only.) */
 #define RNDRESEEDCRNG	_IO( 'R', 0x07 )

+/* Get a file descriptor for the RNG generation page. */
+#define RNDEPOCH	_IO('R', 0x08)
+
 struct rand_pool_info {
 	int	entropy_count;
 	int	buf_size;
@@ -55,4 +58,12 @@ struct rand_pool_info {
 #define GRND_RANDOM	0x0002
 #define GRND_INSECURE	0x0004

+/*
+ * The epoch type exposed through /dev/(u)random to notify user-space
+ * PRNGs that need to re-seed
+ */
+struct rand_epoch_data {
+	__u32 data;
+};
+
 #endif /* _UAPI_LINUX_RANDOM_H */
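To make the 8/24 split defined in include/linux/random.h concrete, here is a small user-space sketch of packing and unpacking an epoch value. The `EPOCH_*` macros mirror the patch's `RNG_EPOCH_*` constants with literal masks, since `GENMASK()` is a kernel-only helper; the function names are hypothetical. How a notifier actually advances its counter is not shown in this patch, so `epoch_next()` is only an assumption (increment modulo 2^24):

```c
#include <stdint.h>

/* User-space mirrors of the patch's RNG_EPOCH_* constants (8/24 split). */
#define EPOCH_COUNTER_MASK	0x00ffffffu	/* GENMASK(23, 0)  */
#define EPOCH_ID_SHIFT		24
#define EPOCH_ID_MASK		0xff000000u	/* GENMASK(31, 24) */

/* Combine a notifier id (upper 8 bits) with a counter (lower 24 bits). */
static uint32_t epoch_pack(uint8_t id, uint32_t counter)
{
	return ((uint32_t)id << EPOCH_ID_SHIFT) | (counter & EPOCH_COUNTER_MASK);
}

/* Extract the notifier id from an epoch value. */
static uint8_t epoch_id(uint32_t epoch)
{
	return (uint8_t)((epoch & EPOCH_ID_MASK) >> EPOCH_ID_SHIFT);
}

/* Extract the 24bit counter from an epoch value. */
static uint32_t epoch_counter(uint32_t epoch)
{
	return epoch & EPOCH_COUNTER_MASK;
}

/* Assumed update rule: keep the id, bump the counter, wrap at 2^24. */
static uint32_t epoch_next(uint32_t epoch)
{
	return epoch_pack(epoch_id(epoch), epoch_counter(epoch) + 1);
}
```

For example, `epoch_pack(3, 5)` yields `0x03000005`, and two notifiers with different ids can never produce the same word for any counter value, which is the namespacing property the comment in random.c describes.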
Sometimes, PRNGs need to reseed. For example, on a regular timer
interval, to ensure nothing consumes a random value for longer than e.g.
5 minutes, or when VMs get cloned, to ensure seeds don't leak in to
clones.

The notification happens through a 32bit epoch value that changes every
time cached entropy is no longer valid, hence PRNGs need to reseed. User
space applications can get hold of a pointer to this value through
/dev/(u)random. We introduce a new ioctl() that returns an anonymous
file descriptor. From this file descriptor we can mmap() a single page
which includes the epoch at offset 0.

random.c maintains the epoch value in a global shared page. It exposes
a registration API for kernel subsystems that are able to notify when
reseeding is needed. Notifiers register with random.c and receive a
unique 8bit ID and a pointer to the epoch. When they need to report a
reseeding event they write a new epoch value which includes the
notifier ID in the first 8 bits and an increasing counter value in the
remaining 24 bits:

              RNG epoch
*-------------*---------------------*
| notifier id | epoch counter value |
*-------------*---------------------*
     8 bits           24 bits

Like this, different notifiers always write different values in the
epoch.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
 drivers/char/random.c       | 147 ++++++++++++++++++++++++++++++++++++
 include/linux/random.h      |  28 +++++++
 include/uapi/linux/random.h |  11 +++
 3 files changed, 186 insertions(+)
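The user-space flow described in the commit message (ioctl, then mmap, then read at offset 0) might look roughly like the sketch below. `RNDEPOCH` is the ioctl proposed by this RFC and is not in mainline uapi headers, so it is defined locally; `map_rng_epoch()` is a hypothetical helper name. On a kernel without the patch the ioctl is expected to fail and the helper simply returns NULL:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Proposed in this RFC only; defined here because headers will not have it. */
#ifndef RNDEPOCH
#define RNDEPOCH _IO('R', 0x08)
#endif

/*
 * Try to map the RNG epoch page read-only. Returns a pointer to the 32bit
 * epoch at offset 0, or NULL if the kernel lacks the ioctl or a step fails.
 */
static const volatile uint32_t *map_rng_epoch(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int rfd, efd;
	void *p;

	if (page <= 0)
		return NULL;

	rfd = open("/dev/urandom", O_RDONLY);
	if (rfd < 0)
		return NULL;

	/* The ioctl hands back an anonymous fd backing the epoch page. */
	efd = ioctl(rfd, RNDEPOCH);
	close(rfd);
	if (efd < 0)
		return NULL;

	/* The patch rejects writable mappings, so request PROT_READ only. */
	p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, efd, 0);
	close(efd);	/* the mapping remains valid after closing the fd */

	return p == MAP_FAILED ? NULL : (const volatile uint32_t *)p;
}
```

A consumer would call this once at startup, fall back to its existing reseed policy when NULL is returned, and otherwise compare `*epoch` against the value it recorded when it last seeded.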