Message ID | 20161214163731.luj2dzmnihcuhn5p@thunk.org (mailing list archive) |
---|---|
State | New, archived |
Hey Ted,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.

Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can read arbitrary kernel
memory, there are much, much bigger issues to be concerned about. As
well, the "chaining" variable I introduce ensures that the random
numbers are, per-cpu, related to the uniqueness of timing of
subsequent calls.

> So I've been toying with the
> following, which is 4 times faster than md5. (I haven't tried
> benchmarking against siphash yet.)
>
> [ 3.606139] random benchmark!!
> [ 3.606276] get_random_int # cycles: 326578
> [ 3.606317] get_random_int_new # cycles: 95438
> [ 3.607423] get_random_bytes # cycles: 2653388

Cool, I'll benchmark it against the siphash implementation. I like
what you did with batching up lots of chacha output, and doling it out
bit by bit. I suspect this will be quite fast, because with chacha20
you get an entire block.

> P.S. It's interesting to note that siphash24 and chacha20 are both
> add-rotate-xor based algorithms.

Quite! Lots of nice shiny things are turning out to be ARX -- ChaCha,
BLAKE2, Siphash, NORX. The simplicity is really appealing.

Jason
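To make the add-rotate-xor remark concrete, here is a minimal user-space sketch of the ChaCha quarter round as described in RFC 7539, section 2.1. The helper names `rotl32` and `chacha_quarter_round` are chosen for this illustration and are not taken from the kernel sources or from the patch under discussion.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * ChaCha quarter round (RFC 7539, section 2.1).
 * Only three operations appear: 32-bit add, rotate, and xor.
 * Function name is illustrative, not a kernel API.
 */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
				 uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

There are no table lookups and no data-dependent branches, which is a large part of why ARX designs tend to be simple to implement in constant time.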
Hi again,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> [ 3.606139] random benchmark!!
> [ 3.606276] get_random_int # cycles: 326578
> [ 3.606317] get_random_int_new # cycles: 95438
> [ 3.607423] get_random_bytes # cycles: 2653388

Looks to me like my siphash implementation is much faster for
get_random_long, and more or less tied for get_random_int:

[ 1.729370] random benchmark!!
[ 1.729710] get_random_long # cycles: 349771
[ 1.730128] get_random_long_chacha # cycles: 359660
[ 1.730457] get_random_long_siphash # cycles: 94255
[ 1.731307] get_random_bytes # cycles: 1354894
[ 1.731707] get_random_int # cycles: 305640
[ 1.732095] get_random_int_chacha # cycles: 80726
[ 1.732425] get_random_int_siphash # cycles: 94265
[ 1.733278] get_random_bytes # cycles: 1315873

Given the increasing usage of get_random_long for ASLR and related, I
think this makes the siphash approach worth pursuing. The chacha
approach is also not significantly different from the md5 approach in
terms of speed for get_random_long. Additionally, since siphash is a
PRF, I think this opens up a big window for optimizing it even
further.

Benchmark here:
https://git.zx2c4.com/linux-dev/commit/?h=rng-bench

Jason
Hey Ted,

On Wed, Dec 14, 2016 at 8:12 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I think this opens up a big window for optimizing it even
> further.

I optimized it a bit further and siphash is now the clear winner over chacha:

[ 1.784801] random benchmark!!
[ 1.785161] get_random_long # cycles: 415983
[ 1.785595] get_random_long_chacha # cycles: 242047
[ 1.785997] get_random_long_siphash # cycles: 137130
[ 1.787450] get_random_bytes # cycles: 1452985
[ 1.787947] get_random_int # cycles: 343323
[ 1.788282] get_random_int_chacha # cycles: 170767
[ 1.788656] get_random_int_siphash # cycles: 86384
[ 1.789764] get_random_bytes # cycles: 2279519

And even still, there is more that could be optimized. Therefore, I'll
continue to keep this patch in the series and will CC you on the next
patch set that goes out.

Jason
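For readers who want to see what "a PRF with a chaining variable" could look like, below is a minimal user-space sketch. It implements SipHash-2-4 from the published specification and wraps it in a hypothetical `prf_random_long()`. The struct `prf_rng`, the function names, and the choice of feeding in a caller-supplied `timestamp` are assumptions made for this illustration; this is not the code from the patch series, and it ignores per-CPU state and preemption entirely.

```c
#include <stddef.h>
#include <stdint.h>

static inline uint64_t rotl64(uint64_t x, unsigned int n)
{
	return (x << n) | (x >> (64 - n));
}

/* One SipRound: add, rotate, xor on four 64-bit lanes. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);	\
	v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);	\
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Store a 64-bit word as 8 little-endian bytes. */
static void store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* SipHash-2-4 of data[0..len) under the 128-bit key (k0, k1). */
static uint64_t siphash24(const uint8_t *data, size_t len,
			  uint64_t k0, uint64_t k1)
{
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t b = (uint64_t)len << 56;	/* length goes in the top byte */
	size_t i, full = len & ~(size_t)7;

	for (i = 0; i < full; i += 8) {
		uint64_t m = load_le64(data + i);

		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	/* Remaining 0..7 bytes fill the low end of the final word. */
	for (i = 0; i < (len & 7); i++)
		b |= (uint64_t)data[full + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	v2 ^= 0xff;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical generator state; stands in for per-CPU kernel data. */
struct prf_rng {
	uint64_t k0, k1;	/* secret key, never changes */
	uint64_t chaining;	/* carried from one call to the next */
};

/*
 * Hypothetical PRF-based generator: hash (chaining, timestamp) under
 * the secret key, then store the result as the new chaining value.
 */
static uint64_t prf_random_long(struct prf_rng *s, uint64_t timestamp)
{
	uint8_t in[16];

	store_le64(in, s->chaining);
	store_le64(in + 8, timestamp);
	s->chaining = siphash24(in, sizeof(in), s->k0, s->k1);
	return s->chaining;
}
```

The point of the sketch is the argument made in the thread: the key stays fixed, and the output is unpredictable only as long as the key stays secret. The chaining value and the timestamp make successive calls differ; they do not substitute for the key.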
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..be172ea75799 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1681,6 +1681,38 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+
+static int rand_benchmark(void)
+{
+	cycles_t start,finish;
+	int i, out;
+
+	pr_crit("random benchmark!!\n");
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int_new();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int_new # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_bytes(&out, sizeof(out));
+	}
+	finish = get_cycles();
+	pr_err("get_random_bytes # cycles: %llu\n", finish - start);
+	return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2096,10 @@ unsigned int get_random_int(void)
 	__u32 *hash;
 	unsigned int ret;
 
+#if 0	// force slow path
 	if (arch_get_random_int(&ret))
 		return ret;
+#endif
 
 	hash = get_cpu_var(get_random_int_hash);
 
@@ -2100,6 +2134,38 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+	__u8 buf[CHACHA20_BLOCK_SIZE];
+	int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+	struct random_buf *p;
+
+	p = &get_cpu_var(batched_entropy);
+
+	if ((p->ptr == 0) ||
+	    (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+		extract_crng(p->buf);
+		p->ptr = 0;
+	}
+	BUG_ON(n > CHACHA20_BLOCK_SIZE);
+	memcpy(buf, p->buf, n);
+	p->ptr += n;
+	put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+	int ret;
+
+	get_batched_entropy(&ret, sizeof(ret));
+	return ret;
+}
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:	The smallest acceptable address the caller will take.
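The batching idea in get_batched_entropy() can be tried out in user space with the sketch below. Two things differ from the posted hunk and are assumptions on my part: the copy reads from `p->buf + p->ptr` so that each call consumes fresh bytes (as posted, the memcpy() reads from the start of `p->buf` every time), and the size check runs before the buffer is touched. `fill_block()` is a hypothetical stand-in for extract_crng() that writes a predictable pattern; it is NOT a random number generator and exists only so the bookkeeping can be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* same size as one ChaCha20 block */

struct random_buf {
	uint8_t buf[BLOCK_SIZE];
	int ptr;		/* offset of the next unused byte */
	uint64_t refills;	/* hypothetical: number of refills so far */
};

/*
 * Stand-in for extract_crng(). NOT a CSPRNG: fills the block with a
 * predictable byte pattern so the batching logic is easy to verify.
 */
static void fill_block(struct random_buf *p)
{
	for (int i = 0; i < BLOCK_SIZE; i++)
		p->buf[i] = (uint8_t)(p->refills * BLOCK_SIZE + i);
	p->refills++;
}

/* Hand out n bytes from the current block, refilling when it runs dry. */
static void get_batched(struct random_buf *p, void *out, int n)
{
	assert(n > 0 && n <= BLOCK_SIZE);

	/* Refill on first use, or when fewer than n bytes remain. */
	if (p->refills == 0 || p->ptr + n > BLOCK_SIZE) {
		fill_block(p);
		p->ptr = 0;
	}
	/* Assumed intent: copy from the current offset, then advance. */
	memcpy(out, p->buf + p->ptr, n);
	p->ptr += n;
}
```

With a zero-initialized `struct random_buf`, two 4-byte requests are served from the same block at offsets 0 and 4, and a refill happens only once a request no longer fits in what is left. That is where the speedup comes from: one block generation is amortized over many small requests.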