Message ID | 20211231114903.60882-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Series | [v2] random: avoid superfluous call to RDRAND in CRNG extraction | expand |
On Fri, Dec 31, 2021 at 12:49:03PM +0100, Jason A. Donenfeld wrote: > RDRAND is not fast. RDRAND is actually quite slow. We've known this for > a while, which is why functions like get_random_u{32,64} were converted > to use batching of our ChaCha-based CRNG instead. > > Yet CRNG extraction still includes a call to RDRAND, in the hot path of > every call to get_random_bytes(), /dev/urandom, and getrandom(2). > > This call to RDRAND here seems quite superfluous. CRNG is already > extracting things based on a 256-bit key, based on good entropy, which > is then reseeded periodically, updated, backtrack-mutated, and so > forth. The CRNG extraction construction is something that we're already > relying on to be secure and solid. If it's not, that's a serious > problem, and it's unlikely that mixing in a measly 32 bits from RDRAND > is going to alleviate things. > > And in the case where the CRNG doesn't have enough entropy yet, we're > already initializing the ChaCha key row with RDRAND in > crng_init_try_arch_early(). > > Removing the call to RDRAND improves performance on an i7-11850H by > 370%. In other words, the vast majority of the work done by > extract_crng() prior to this commit was devoted to fetching 32 bits of > RDRAND. > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu>
If we are removing RDRAND, what about adding some cheaper mixing? Something along these lines? The current code's mixing is triggered only once in 2^32 iterations, depends only on crng->state[], always changes the same state word, and introduces no new entropy. Make it happen more often, depend on a randomly initialised counter as well as state[], make a data-dependent choice of word to change, and use random_get_entropy(). --- drivers/char/random.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 605969ed0f96..d2be079f004d 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -985,6 +985,10 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) } } +#define CC_SHIFT 8 +#define CC_MASK ((1<<CC_SHIFT)-1) +static u32 cc_count = 0 ; + static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]) { @@ -998,8 +1002,22 @@ static void _extract_crng(struct crng_state *crng, if (arch_get_random_long(&v)) crng->state[14] ^= v; chacha20_block(&crng->state[0], out); - if (crng->state[12] == 0) - crng->state[13]++; + if (cc_count == 0) + cc_count = crng->state[9] ^ random_get_entropy() ; + switch ((crng->state[12] ^ cc_count) & CC_MASK) { + case 0: + cc_count = crng->state[10] ^ (cc_count>>CC_SHIFT); + break ; + case 31: case 97: case 253: + crng->state[crng->state[13]&7]++; + break ; + case 61: case 127: + crng->state[crng->state[11]&7] += random_get_entropy(); + break ; + default: + break ; + } + cc_count++ ; spin_unlock_irqrestore(&crng->lock, flags); }
On Tue, Jan 04, 2022 at 01:03:43PM +0800, Sandy Harris wrote: > If we are removing RDRAND, what about adding some > cheaper mixing? Something along these lines? > > The current code's mixing is triggered only once in 2^32 > iterations, depends only on crng->state[], always changes > the same state word, and introduces no new entropy. I wouldn't call it "mixing", because the state array isn't an entropy pool. Recall how ChaCha20's state array is set up. crng->state[0..3] contain ChaCha20's initial constants, crng->state[4..11] contain the ChaCha20 key, crng->state[12] is the 32-bit counter (which is incremented when we call ChaCha20), and crng->state[13..15] is the 96-bit IV. The IV and counter --- state[12..15] --- is initialized when the CRNG is initialized. We replace the key every time the CRNG is reseeded. But what if we manage to call _extract_crng() more than 2**32 times? Well, that's what this is all about: if (crng->state[12] == 0) crng->state[13]++; What we've effectively done is treat state[12..13] as a 64-bit counter, and state[14..15] is initialized to a 64-bit random value ("the IV") when the CRNG is initialized, and not updated during the life of the CRNG. This is really the only place where we've modified ChaCha20. Now, either we believe in the strength of ChaCha20, or we don't. The whole *point* of a CRNG is that we rely on the crypto, and adding some random bit-mashing to mix in the CPU cycle counter into parts of the ChaCha20 key (state[10..11]) and part of the ChaCha20 IV (state[12]) isn't consistent with the philosophy of a CRNG. At the very least, I'd like to get an opinion from a respected cryptographer about what they think this would buy us (or what it might cost). If we want to worry about what happens if we could actually manage to call _extract_crng() more than 2**64 times before the reseed interval is up --- which *is* one of the benefits of: if (arch_get_random_long(^v)) crng->state[14] ^= v; I could see doing perhaps this instead: if (crng->state[12] == 0) { crng->state[13]++; if (crng->state[13] == 0) { crng->state[14]++; if (crng->state[14] == 0) { crng->state[15]++; } } } This essentially makes state[12..15] a 128-bit counter, which is initialized to a random value when the CRNG is initialized, and we would continue to treat state[4..11] as the 256 bit ChaCha20 key. This would be a much more philosophically consistent approach, and would allow us to more easily reason about the security based on cryptographic research into ChaCha20 the stream cipher. Cheers, - Ted
On Fri, 31 Dec 2021 at 12:50, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > RDRAND is not fast. RDRAND is actually quite slow. We've known this for > a while, which is why functions like get_random_u{32,64} were converted > to use batching of our ChaCha-based CRNG instead. > > Yet CRNG extraction still includes a call to RDRAND, in the hot path of > every call to get_random_bytes(), /dev/urandom, and getrandom(2). > > This call to RDRAND here seems quite superfluous. CRNG is already > extracting things based on a 256-bit key, based on good entropy, which > is then reseeded periodically, updated, backtrack-mutated, and so > forth. The CRNG extraction construction is something that we're already > relying on to be secure and solid. If it's not, that's a serious > problem, and it's unlikely that mixing in a measly 32 bits from RDRAND > is going to alleviate things. > > And in the case where the CRNG doesn't have enough entropy yet, we're > already initializing the ChaCha key row with RDRAND in > crng_init_try_arch_early(). > > Removing the call to RDRAND improves performance on an i7-11850H by > 370%. In other words, the vast majority of the work done by > extract_crng() prior to this commit was devoted to fetching 32 bits of > RDRAND. > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > drivers/char/random.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index 4de0feb69781..17ec60948795 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -1023,7 +1023,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) > static void _extract_crng(struct crng_state *crng, > __u8 out[CHACHA_BLOCK_SIZE]) > { > - unsigned long v, flags, init_time; > + unsigned long flags, init_time; > > if (crng_ready()) { > init_time = READ_ONCE(crng->init_time); > @@ -1033,8 +1033,6 @@ static void _extract_crng(struct crng_state *crng, > &input_pool : NULL); > } > spin_lock_irqsave(&crng->lock, flags); > - if (arch_get_random_long(&v)) > - crng->state[14] ^= v; > chacha20_block(&crng->state[0], out); > if (crng->state[12] == 0) > crng->state[13]++; Given that arch_get_random_long() may be backed by other things than special instructions on some architectures/platforms, avoiding it if we can on any path that may be a hot path is good, so Acked-by: Ard Biesheuvel <ardb@kernel.org>
diff --git a/drivers/char/random.c b/drivers/char/random.c index 4de0feb69781..17ec60948795 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1023,7 +1023,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]) { - unsigned long v, flags, init_time; + unsigned long flags, init_time; if (crng_ready()) { init_time = READ_ONCE(crng->init_time); @@ -1033,8 +1033,6 @@ static void _extract_crng(struct crng_state *crng, &input_pool : NULL); } spin_lock_irqsave(&crng->lock, flags); - if (arch_get_random_long(&v)) - crng->state[14] ^= v; chacha20_block(&crng->state[0], out); if (crng->state[12] == 0) crng->state[13]++;
RDRAND is not fast. RDRAND is actually quite slow. We've known this for a while, which is why functions like get_random_u{32,64} were converted to use batching of our ChaCha-based CRNG instead. Yet CRNG extraction still includes a call to RDRAND, in the hot path of every call to get_random_bytes(), /dev/urandom, and getrandom(2). This call to RDRAND here seems quite superfluous. CRNG is already extracting things based on a 256-bit key, based on good entropy, which is then reseeded periodically, updated, backtrack-mutated, and so forth. The CRNG extraction construction is something that we're already relying on to be secure and solid. If it's not, that's a serious problem, and it's unlikely that mixing in a measly 32 bits from RDRAND is going to alleviate things. And in the case where the CRNG doesn't have enough entropy yet, we're already initializing the ChaCha key row with RDRAND in crng_init_try_arch_early(). Removing the call to RDRAND improves performance on an i7-11850H by 370%. In other words, the vast majority of the work done by extract_crng() prior to this commit was devoted to fetching 32 bits of RDRAND. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- drivers/char/random.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)