Message ID | 20161216030328.11602-4-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Dec 15, 2016 at 7:03 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> -static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
> -		__aligned(sizeof(unsigned long));
> +static DEFINE_PER_CPU(u64, get_random_int_chaining);
>
> [...]
>
>  unsigned long get_random_long(void)
>  {
> -	__u32 *hash;
>  	unsigned long ret;
> +	u64 *chaining;
>
>  	if (arch_get_random_long(&ret))
>  		return ret;
>
> -	hash = get_cpu_var(get_random_int_hash);
> -
> -	hash[0] += current->pid + jiffies + random_get_entropy();
> -	md5_transform(hash, random_int_secret);
> -	ret = *(unsigned long *)hash;
> -	put_cpu_var(get_random_int_hash);
> -
> +	chaining = &get_cpu_var(get_random_int_chaining);
> +	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
> +		current->pid, random_int_secret);
> +	put_cpu_var(get_random_int_chaining);
>  	return ret;
>  }

I think it would be nice to try to strengthen the PRNG construction. FWIW, I'm not an expert in PRNGs, and there's fairly extensive literature, but I can at least try.

Here are some properties I'd like:

1. A one-time leak of memory contents doesn't ruin security until reboot. This is especially valuable across suspend and/or hibernation.

2. An attack with a low work factor (2^64?) shouldn't break the scheme until reboot.

The patched code is effectively doing:

output = H(prev_output, weak "entropy", per-boot secret);

One unfortunate downside is that, if used in a context where an attacker can see a single output, the attacker learns the chaining value. If the attacker can guess the entropy, then, with 2^64 work, they learn the secret, and they can predict future outputs.

I would advocate adding two types of improvements. First, re-seed it every now and then (every 128 calls?) by just replacing both the chaining value and the percpu secret with fresh CSPRNG output. Second, change the mode so that an attacker doesn't learn so much internal state. For example:

output = H(old_chain, entropy, secret);
new_chain = old_chain + entropy + output;

This increases the effort needed to brute-force the internal state from 2^64 to 2^128 (barring any weaknesses in the scheme).

Also, can we not call this get_random_int()? get_random_int() sounds too much like get_random_bytes(), and the latter is intended to be a real CSPRNG. Can we call it get_weak_random_int() or similar?

--Andy
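The two suggestions in the reply above (periodic reseeding, and keeping the chaining value separate from the returned output) can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: `struct weak_rng`, `weak_rng_next()`, `toy_hash()`, `demo_fresh()` and `RESEED_INTERVAL` are hypothetical names that appear nowhere in the patch, `toy_hash()` is a non-cryptographic stand-in for H, and `demo_fresh()` is a deterministic placeholder for real CSPRNG output.

```c
#include <assert.h>
#include <stdint.h>

#define RESEED_INTERVAL 128 /* "every 128 calls?" from the reply */

/* Hypothetical per-CPU state; not a kernel structure. */
struct weak_rng {
	uint64_t chain;     /* internal chaining value, never returned as-is */
	uint64_t secret;    /* per-CPU secret */
	unsigned int calls; /* outputs produced since the last reseed */
};

/* splitmix64-style finalizer, used only as a stand-in mixer. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Stand-in for H(chain, entropy, secret). NOT a secure PRF. */
static uint64_t toy_hash(uint64_t chain, uint64_t entropy, uint64_t secret)
{
	return mix64(mix64(chain ^ secret) + entropy) ^ secret;
}

/* Placeholder for fresh CSPRNG output; deterministic, for illustration only. */
static uint64_t demo_fresh(void)
{
	static uint64_t n = 0x9e3779b97f4a7c15ULL;

	n += 0x9e3779b97f4a7c15ULL;
	return n;
}

/* One step of the suggested mode, with reseeding every RESEED_INTERVAL calls. */
static uint64_t weak_rng_next(struct weak_rng *r, uint64_t entropy,
			      uint64_t (*fresh)(void))
{
	uint64_t out;

	assert(r && fresh);
	if (r->calls >= RESEED_INTERVAL) {
		/* Replace both the chaining value and the secret. */
		r->chain = fresh();
		r->secret = fresh();
		r->calls = 0;
	}
	out = toy_hash(r->chain, entropy, r->secret);  /* output = H(old_chain, entropy, secret) */
	r->chain = r->chain + entropy + out;           /* new_chain = old_chain + entropy + output */
	r->calls++;
	return out;
}
```

A caller would keep one `struct weak_rng` per CPU and pass in whatever weak entropy is at hand (the patch mixes jiffies, the cycle counter and the pid). The point of the sketch is that someone who sees `out` still does not see `chain` directly; a real implementation would use a keyed PRF such as SipHash in place of `toy_hash()`.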
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..a51f0ff43f00 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static siphash_key_t random_int_secret;
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,16 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
 	unsigned int ret;
+	u64 *chaining;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
+		current->pid, random_int_secret);
+	put_cpu_var(get_random_int_chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2080,16 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
 	unsigned long ret;
+	u64 *chaining;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
+		current->pid, random_int_secret);
+	put_cpu_var(get_random_int_chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
This duplicates the current algorithm for get_random_int/long, but uses siphash instead. This comes with several benefits. It's certainly faster and more cryptographically secure than MD5. This patch also separates hashed fields into three values instead of one, in order to increase diffusion.

The previous MD5 algorithm used a per-cpu MD5 state, which caused successive calls to the function to chain upon each other. While it's not entirely clear that this kind of chaining is absolutely necessary when using a secure PRF like siphash, it can't hurt, and the timing of the call chain does add a degree of natural entropy. So, in keeping with this design, instead of the massive per-cpu 64-byte MD5 state, there is instead a per-cpu previously returned value for chaining.

The speed benefits are substantial:

                | siphash | md5    | speedup |
                ------------------------------
get_random_long |  137130 | 415983 | 3.03x   |
get_random_int  |   86384 | 343323 | 3.97x   |

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
 drivers/char/random.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)
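The chaining construction described in the commit message, where the previously returned value is fed back in as the first of three hashed words, can be modelled in userspace roughly as below. This is a sketch, not the kernel code: `struct sip_key`, `sip3u64()` and `chained_next()` are hypothetical names, the jiffies, cycle-counter and pid values are passed in as plain integers, and the hash follows the published SipHash-2-4 round structure for a fixed 24-byte input. Whether it matches the kernel's `siphash_3u64()` bit for bit has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit key type, standing in for the kernel's siphash_key_t. */
struct sip_key {
	uint64_t k0, k1;
};

static uint64_t rotl64(uint64_t x, int b)
{
	return (x << b) | (x >> (64 - b));
}

#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);	\
	v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);	\
} while (0)

/* SipHash-2-4 over exactly three 64-bit words (24 bytes of input). */
static uint64_t sip3u64(uint64_t a, uint64_t b, uint64_t c,
			const struct sip_key *key)
{
	uint64_t v0 = key->k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = key->k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = key->k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = key->k1 ^ 0x7465646279746573ULL;
	const uint64_t m[3] = { a, b, c };
	const uint64_t len = 24ULL << 56; /* final block: input length in the top byte */
	int i;

	for (i = 0; i < 3; i++) {
		v3 ^= m[i];
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m[i];
	}
	v3 ^= len;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= len;
	v2 ^= 0xff;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Same shape as the patched function body: the output becomes the next chaining value. */
static uint64_t chained_next(uint64_t *chaining, uint64_t jiffies,
			     uint64_t cycles, uint64_t pid,
			     const struct sip_key *key)
{
	assert(chaining && key);
	*chaining = sip3u64(*chaining, jiffies, cycles + pid, key);
	return *chaining;
}
```

Because the returned value and the stored chaining value are the same word, two calls with identical jiffies, cycle-counter and pid inputs still produce different outputs, which is the chaining behaviour the commit message keeps from the MD5 version. The 64-byte MD5 state shrinks to a single `uint64_t` per CPU.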