Message ID | 20220218121054.45304-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to | Herbert Xu |
Series | random: use max-period linear interrupt extractor |
On Fri, Feb 18, 2022 at 01:10:54PM +0100, Jason A. Donenfeld wrote:
> The current fast_mix() function is a piece of classic mailing list
> crypto: it just sort of sprang up without a lot of real analysis of
> what precisely it was accomplishing. As an ARX permutation of only
> two rounds, there are some easily searchable differential trails in
> it, and as a means of preventing malicious interrupts, it completely
> fails, since it xors new data into the entire state every time. It
> can't really be analyzed as a random permutation, because it clearly
> isn't, and it can't be analyzed as an interesting linear algebraic
> structure either, because it's also not that. There really is very
> little one can say about it in terms of entropy accumulation. It
> might diffuse bits, some of the time, maybe, we hope, I guess. But
> for the most part, it fails to accomplish anything concrete.
>
> As a reminder, the goal of add_interrupt_randomness() is simply to
> accumulate entropy until ~64 interrupts have elapsed, and then dump
> it into the main input pool, which uses a cryptographic hash.
>
> It would be nice to have something cryptographically strong in the
> interrupt handler itself, in case a malicious interrupt compromises
> a per-cpu fast pool within the 64 interrupts / 1 second window, and
> then somehow controls its return address and cycle counter inside
> that same window, even if that's a bit far-fetched. However, with a
> very limited CPU budget, actually doing that remains an active
> research project (and perhaps there'll be something useful for Linux
> to come out of it). And while the abundance of caution would be
> nice, this isn't *currently* the security model, and we don't yet
> have a fast enough solution to make it our security model. Plus
> there's not exactly a pressing need to do that. (And for the
> avoidance of doubt, the actual cluster of 64 accumulated interrupts
> still gets dumped into our cryptographically secure input pool.)
>
> So, for now we are going to stick with the existing interrupt
> security model, which assumes that each cluster of 64 interrupt data
> samples is mostly non-malicious and not colluding with an
> infoleaker. With this as our goal, we can then endeavor to simply
> accumulate entropy linearly, discarding as little of it as possible,
> and make sure that accumulation is sound, unlike the case with the
> current fast_mix().
>
> It turns out that this security model is also the trade-off that
> other operating systems have taken. The NT kernel, for example, uses
> something very simple to accumulate entropy in interrupts:
> `s = ror32(s, 5) ^ x`. Dodis et al. analyzed this in
> <https://eprint.iacr.org/2021/523> and found that rotation by 7
> would have been better than 5, but that otherwise, simple linear
> constructions like this can be used as an entropy collector for
> 2-monotone distributions.
>
> However, when considering this for our four-word accumulation,
> versus NT's one-word, we run into potential problems because the
> words don't contribute to each other, and some may well be fixed,
> which means we'd need something to schedule on top. And more
> importantly, our distribution is not 2-monotone like NT's, because
> in addition to the cycle counter, we also include in those 4 words a
> register value, a return address, and an inverted jiffies. (Whether
> capturing anything beyond the cycle counter in the interrupt handler
> is even adding much value is a question for a different time.)
> So since we have 4 words, and not all of them are 2-monotone, we
> instead look for a proven linear extractor that works for more
> complex distributions. It turns out that a max-period linear
> feedback shift register fits this bill quite well, easily extending
> to the larger state size and to the fact that we're mixing in more
> than just the cycle counter. By picking a linear function with no
> non-trivial invariant subspace, unlike NT's rotate-xor, we benefit
> from the analysis of <https://eprint.iacr.org/2021/1002>. This paper
> proves that those types of linear functions, used in the form
> `s = f(s) ^ x`, make very good entropy extractors for the type of
> complex distributions that we have.
>
> This commit implements one such very fast, high-diffusion max-period
> linear function in a Feistel-like fashion, which pipelines quite
> well. On an i7-11850H, this takes 34 cycles, versus the original's
> 65 cycles. (Though, as a note for posterity: if later this is
> replaced with some sort of cryptographic hash function, I'll still
> be keeping 65 cycles as my target.)
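For concreteness, here is a minimal userspace sketch of the NT-style one-word construction quoted above, using the rotation constant 7 that Dodis et al. found preferable. This is illustration only, not anything from the patch: `ror32()` is open-coded as a userspace stand-in for the kernel helper, and the sample inputs are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's ror32() helper. */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

int main(void)
{
	/* One-word pool, NT-style: s = ror32(s, 7) ^ x per sample. */
	uint32_t s = 0;
	/* Hypothetical cycle-counter samples, for illustration only. */
	const uint32_t samples[] = { 0x12345678, 0x9abcdef0, 0x0f1e2d3c };

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i)
		s = ror32(s, 7) ^ samples[i];

	printf("accumulated: 0x%08x\n", s);
	return 0;
}
```

The cited analysis covers this one-word, 2-monotone case; the four-word, not-all-2-monotone case described in the quote is what motivates the LFSR in the diff below.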
Just FYI, I'll probably sit on this for a bit before merging and will likely send a v2. In practice this is probably fine. But I'd like to spend a bit more time working through the theory before settling on this.
```diff
diff --git a/drivers/char/random.c b/drivers/char/random.c
index caa276cfbc76..4dc751ad3854 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1195,44 +1195,33 @@ void add_bootloader_randomness(const void *buf, size_t size)
 EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
 struct fast_pool {
-	union {
-		u32 pool32[4];
-		u64 pool64[2];
-	};
 	struct work_struct mix;
 	unsigned long last;
 	unsigned int count;
+	u32 pool[4];
 	u16 reg_idx;
 };
 
 /*
- * This is a fast mixing routine used by the interrupt randomness
- * collector. It's hardcoded for an 128 bit pool and assumes that any
- * locks that might be needed are taken by the caller.
+ * This is a max-period LFSR, mixing 128 bits into the 128-bit pool.
+ * It assumes that its inputs are non-malicious. It is designed to
+ * be much faster than computing a cryptographic hash function, yet
+ * still accumulate entropy, though it has no security on its own.
 */
-static void fast_mix(u32 pool[4])
+static void fast_mix(u32 h[4], const u32 v[4])
 {
-	u32 a = pool[0], b = pool[1];
-	u32 c = pool[2], d = pool[3];
-
-	a += b; c += d;
-	b = rol32(b, 6); d = rol32(d, 27);
-	d ^= a; b ^= c;
-
-	a += b; c += d;
-	b = rol32(b, 16); d = rol32(d, 14);
-	d ^= a; b ^= c;
-
-	a += b; c += d;
-	b = rol32(b, 6); d = rol32(d, 27);
-	d ^= a; b ^= c;
-
-	a += b; c += d;
-	b = rol32(b, 16); d = rol32(d, 14);
-	d ^= a; b ^= c;
+	size_t i;
 
-	pool[0] = a; pool[1] = b;
-	pool[2] = c; pool[3] = d;
+	for (i = 0; i < 4; ++i) {
+		u32 w = h[0] ^ h[1] ^ v[i] ^ h[3];
+		w ^= w << 17;
+		w ^= w >> 6;
+		w ^= w >> 9;
+		h[0] = h[1];
+		h[1] = h[2];
+		h[2] = h[3];
+		h[3] = w;
+	}
 }
 
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
@@ -1291,7 +1280,7 @@ static void mix_interrupt_randomness(struct work_struct *work)
 	 * Copy the pool to the stack so that the mixer always has a
 	 * consistent view, before we reenable irqs again.
 	 */
-	memcpy(pool, fast_pool->pool32, sizeof(pool));
+	memcpy(pool, fast_pool->pool, sizeof(pool));
 	fast_pool->count = 0;
 	fast_pool->last = jiffies;
 	local_irq_enable();
@@ -1309,35 +1298,39 @@ void add_interrupt_randomness(int irq)
 	unsigned long now = jiffies;
 	cycles_t cycles = random_get_entropy();
 	unsigned int new_count;
+	union {
+		u32 u32[4];
+		u64 u64[2];
+	} irq_data;
 
 	if (cycles == 0)
 		cycles = get_reg(fast_pool, regs);
 
 	if (sizeof(cycles) == 8)
-		fast_pool->pool64[0] ^= cycles ^ rol64(now, 32) ^ irq;
+		irq_data.u64[0] = cycles ^ rol64(now, 32) ^ irq;
 	else {
-		fast_pool->pool32[0] ^= cycles ^ irq;
-		fast_pool->pool32[1] ^= now;
+		irq_data.u32[0] = cycles ^ irq;
+		irq_data.u32[1] = now;
 	}
 
 	if (sizeof(unsigned long) == 8)
-		fast_pool->pool64[1] ^= regs ? instruction_pointer(regs) : _RET_IP_;
+		irq_data.u64[1] = regs ? instruction_pointer(regs) : _RET_IP_;
 	else {
-		fast_pool->pool32[2] ^= regs ? instruction_pointer(regs) : _RET_IP_;
-		fast_pool->pool32[3] ^= get_reg(fast_pool, regs);
+		irq_data.u32[2] = regs ? instruction_pointer(regs) : _RET_IP_;
+		irq_data.u32[3] = get_reg(fast_pool, regs);
 	}
 
-	fast_mix(fast_pool->pool32);
+	fast_mix(fast_pool->pool, irq_data.u32);
 	new_count = ++fast_pool->count;
 
 	if (unlikely(crng_init == 0)) {
 		if (new_count >= 64 &&
-		    crng_pre_init_inject(fast_pool->pool32, sizeof(fast_pool->pool32),
+		    crng_pre_init_inject(fast_pool->pool, sizeof(fast_pool->pool),
 					 true, true) > 0) {
 			fast_pool->count = 0;
 			fast_pool->last = now;
 			if (spin_trylock(&input_pool.lock)) {
-				_mix_pool_bytes(&fast_pool->pool32, sizeof(fast_pool->pool32));
+				_mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool));
 				spin_unlock(&input_pool.lock);
 			}
 		}
```
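For anyone who wants to poke at the construction outside the kernel, here is a standalone harness (not part of the patch; the `u32` typedef is a userspace stand-in) that lifts the new fast_mix() from the diff verbatim and shows a single set input bit diffusing across the 128-bit pool over a few simulated interrupts:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t u32;

/* Lifted from the patch: max-period linear function, Feistel-like. */
static void fast_mix(u32 h[4], const u32 v[4])
{
	size_t i;

	for (i = 0; i < 4; ++i) {
		u32 w = h[0] ^ h[1] ^ v[i] ^ h[3];
		w ^= w << 17;
		w ^= w >> 6;
		w ^= w >> 9;
		h[0] = h[1];
		h[1] = h[2];
		h[2] = h[3];
		h[3] = w;
	}
}

int main(void)
{
	u32 pool[4] = { 0 };
	u32 input[4] = { 1, 0, 0, 0 }; /* a single bit of "entropy" */

	/* One call per simulated interrupt; watch the bit spread. */
	for (int round = 0; round < 4; ++round) {
		fast_mix(pool, input);
		printf("round %d: %08x %08x %08x %08x\n",
		       round, pool[0], pool[1], pool[2], pool[3]);
		memset(input, 0, sizeof(input)); /* later inputs all-zero */
	}
	return 0;
}
```

Because the function is linear, the all-zero rounds after the first isolate pure state diffusion: every later output is a fixed linear image of that one injected bit, which is exactly the `s = f(s) ^ x` shape the quoted analysis covers.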