diff mbox series

[v2] random: mix build-time latent entropy into pool at init

Message ID 20220331152641.169301-1-Jason@zx2c4.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series [v2] random: mix build-time latent entropy into pool at init | expand

Commit Message

Jason A. Donenfeld March 31, 2022, 3:26 p.m. UTC
Prior, the "input_pool_data" array needed no real initialization, and so
it was easy to mark it with __latent_entropy to populate it during
compile-time. In switching to using a hash function, this required us to
specifically initialize it to some specific state, which means we
dropped the __latent_entropy attribute. An unfortunate side effect was
this meant the pool was no longer seeded using compile-time random data.
In order to bring this back, we declare an array in rand_initialize()
with __latent_entropy and call mix_pool_bytes() on that at init, which
accomplishes the same thing as before. We make this __initconst, so that
it doesn't take up space at runtime after init.

Fixes: 6e8ec2552c7d ("random: use computational hash for entropy extraction")
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- Use __initconst.

 drivers/char/random.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Michael Brooks March 31, 2022, 4:02 p.m. UTC | #1
mix_pool_bytes() has numerous problems, as discussed in prior emails.
Do we still want to be putting so much effort into a development dead
end?

-Michael

On Thu, Mar 31, 2022 at 8:28 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Prior, the "input_pool_data" array needed no real initialization, and so
> it was easy to mark it with __latent_entropy to populate it during
> compile-time. In switching to using a hash function, this required us to
> specifically initialize it to some specific state, which means we
> dropped the __latent_entropy attribute. An unfortunate side effect was
> this meant the pool was no longer seeded using compile-time random data.
> In order to bring this back, we declare an array in rand_initialize()
> with __latent_entropy and call mix_pool_bytes() on that at init, which
> accomplishes the same thing as before. We make this __initconst, so that
> it doesn't take up space at runtime after init.
>
> Fixes: 6e8ec2552c7d ("random: use computational hash for entropy extraction")
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v1->v2:
> - Use __initconst.
>
>  drivers/char/random.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 40107f8b9e9e..1d8242969751 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -975,6 +975,11 @@ int __init rand_initialize(void)
>         bool arch_init = true;
>         unsigned long rv;
>
> +#if defined(LATENT_ENTROPY_PLUGIN)
> +       static const u8 compiletime_seed[BLAKE2S_BLOCK_SIZE] __initconst __latent_entropy;
> +       _mix_pool_bytes(compiletime_seed, sizeof(compiletime_seed));
> +#endif
> +
>         for (i = 0; i < BLAKE2S_BLOCK_SIZE; i += sizeof(rv)) {
>                 if (!arch_get_random_seed_long_early(&rv) &&
>                     !arch_get_random_long_early(&rv)) {
> --
> 2.35.1
>
Theodore Ts'o March 31, 2022, 5:55 p.m. UTC | #2
On Thu, Mar 31, 2022 at 11:26:41AM -0400, Jason A. Donenfeld wrote:
> Prior, the "input_pool_data" array needed no real initialization, and so
> it was easy to mark it with __latent_entropy to populate it during
> compile-time. In switching to using a hash function, this required us to
> specifically initialize it to some specific state, which means we
> dropped the __latent_entropy attribute. An unfortunate side effect was
> this meant the pool was no longer seeded using compile-time random data.
> In order to bring this back, we declare an array in rand_initialize()
> with __latent_entropy and call mix_pool_bytes() on that at init, which
> accomplishes the same thing as before. We make this __initconst, so that
> it doesn't take up space at runtime after init.
> 
> Fixes: 6e8ec2552c7d ("random: use computational hash for entropy extraction")
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

LGTM

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o March 31, 2022, 6:21 p.m. UTC | #3
On Thu, Mar 31, 2022 at 09:02:27AM -0700, Michael Brooks wrote:
> mix_pool_bytes() has numerous problems, as discussed in prior emails.
> Do we still want to be putting so much effort into a development dead
> end?

Michael, with respect, there were a number of things in your analysis
which simply didn't make any sense.  Discussing it on an e-mail thread
relating to stable bacports wasn't the right place, so I didn't extend
the discussion there.

You believe that max_pool_bytes() has numerous problems.  That's not
the same thing as it having problems.

And making incremental changes, with code review, is the much better
approach than just doing a rip-and-replace with some something else
--- which might have different, even more exciting problems.

Something for you to consider, since your comments seem to indicate
that you are not familiar with the full random driver design.  There
are two halves to how the random driver works.  The first half is the
collection of entropy, and the priamry way this is accomplished is by
taking timestamps of various events that an external attacker
hopefully won't have access to.  For example, keystrokes from the
user, mouse motion events, network and disk interrupts, etc.  Where
possible, we don't just use jiffies, but we also use high preceision
counters, such as the CPU counter.  The idea here is that even if the
external interrupts sources can be seen by an attacker, when the
interrupt is serviced when measured by a high precision cycle counter
(for example) is not going to be as easily guessed.  That being said,
we only get a tiny amount of entropy (by which I mean uncertainty by
the attacker) out of each event.  This is why it is important to
distill it in an input pool, so that as we add more and more
unpredictable inputs into the pool, it becomes less and less tractible
for the attacker to make educating guesses about what is in the pool.

Then periodically (and doing this periodically is important, because
we want to wait until there we have a large amount of uncertainty with
respect to the attacker accumulated in the pool) we extract from the
input pool and use that to reseed the second part of the random
driver, which is used to be called the "output pool".

It used to be that both the input pool and output pool were literally
bitpools that were mixed using an LFSR scheme, and then extracted
using cryptographic hash.

The output pool is now a ChaCha-based CRNG, and most recently the
"input pool" is a accumulating entropy using a Blake2 hash.  So in
many ways, the term "input pool" is a bit of a misnomer now, and
perhaps should be renamed.

For more information, I direct you to the Yarrow paper[1].  The basic
idea of using two pools coupled with a catastrophic reseed was
shamelessly stolen from Bruce Schneier's work.

[1] https://www.schneier.com/wp-content/uploads/2016/02/paper-yarrow.pdf

Are there reasons why we didn't just implement Yarrow?  That's because
/dev/random predates Yarrow, and we made incremental changes to adopt
("steal") good ideas from other sources, which hopefully don't
invalidate previous analysis and reviews about /dev/random.  Please
note that there are a number of academic researches who have published
peer previews of /dev/random, and that is incredibly useful.

We've made changes over time to improve /dev/random and to addresses
various theoretical weaknesses noted by these academic reviewers.  So
when you claim that there are "numerous problems" with the input pool,
I'll have to note that /dev/random has undergone reviews by
cryptographers, and they have not identified the problems that you
claim are there.

Regards,

						- Ted
Sandy Harris April 2, 2022, 4:44 a.m. UTC | #4
On Fri, Apr 1, 2022 at 11:16 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Prior, the "input_pool_data" array needed no real initialization, and so
> it was easy to mark it with __latent_entropy to populate it during
> compile-time.

As I see it, that was the correct approach.

> In switching to using a hash function, this required us to
> specifically initialize it to some specific state,

Hash functions do not require that. Any such function must
work correctly with a new input block and a more-or-less
random state from hashing previous blocks.

In general, except perhaps at boot time, I do not think
any of the hopefully-random data structures -- input
pool, hash context or chacha context -- should ever be
set to any specific state. Update them only with += or
^= and preferably not with constants.

What requires a fixed initialisation is your decision to
eliminate the input pool & just collect entropy in a hash
context. In effect you are reducing the driver to a
Yarrow-like design, which I think is an error.

Yarrow is a good design, but it has limitations; in particular
the Yarrow paper says the cryptographic strength is limited
to the size of the hash context, 160 bits for their SHA-1 &
512 for our Blake.

512 bits is more than enough for nearly all use cases, but
we may have some where it is not. How many random bits
are needed to generate a 4k-bit PGP key?

Will some users try to generate one-time pads from /dev/random?
The OTP security proof requires truly random data as long as the
message; with anything short of that the proof fails & you get
a stream cipher.

Patches will follow, but likely not soon; I'm busy with
other things.
Jason A. Donenfeld April 2, 2022, 1:48 p.m. UTC | #5
Hi Sandy,

On Sat, Apr 2, 2022 at 6:45 AM Sandy Harris <sandyinchina@gmail.com> wrote:
> > In switching to using a hash function, this required us to
> > specifically initialize it to some specific state,
>
> Hash functions do not require that. Any such function must
> work correctly with a new input block and a more-or-less
> random state from hashing previous blocks.

Well yes and no. Strictly no in the sense that blake2s_state has a few
book-keeping variables, which we probably benefit in terms of caching
from having next to the other state variables. Almost yes in the sense
that in the ideal model, the hash state can become _anything_ so
initializing it to random might be okay. But in practice, maybe not,
because at the moment the latent entropy plugin is actually expanding
a 64-bit seed with a basic LFSR, rather than supplying more uniformly
random bytes (I have a patch out for that now). These details might
matter, so rather than tempting fate, just calling blake2s_update the
way the hash function is intended to be used seems a lot more cautious
than poking at the function's innards unnecessarily.

Jason
Eric Biggers April 4, 2022, 11:27 p.m. UTC | #6
On Sat, Apr 02, 2022 at 12:44:42PM +0800, Sandy Harris wrote:
> Yarrow is a good design, but it has limitations; in particular
> the Yarrow paper says the cryptographic strength is limited
> to the size of the hash context, 160 bits for their SHA-1 &
> 512 for our Blake.
> 
> 512 bits is more than enough for nearly all use cases, but
> we may have some where it is not. How many random bits
> are needed to generate a 4k-bit PGP key?
> 
> Will some users try to generate one-time pads from /dev/random?
> The OTP security proof requires truly random data as long as the
> message; with anything short of that the proof fails & you get
> a stream cipher.

All the data from /dev/{u,}random is generated by ChaCha20, which is a 256-bit
stream cipher.  We don't target, or need to target, more than 256-bit security.
So the entropy pool itself doesn't need to be more than 256 bits, provided that
it is implemented properly using a cryptographic hash function, which it now is.

- Eric
diff mbox series

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 40107f8b9e9e..1d8242969751 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -975,6 +975,11 @@  int __init rand_initialize(void)
 	bool arch_init = true;
 	unsigned long rv;
 
+#if defined(LATENT_ENTROPY_PLUGIN)
+	static const u8 compiletime_seed[BLAKE2S_BLOCK_SIZE] __initconst __latent_entropy;
+	_mix_pool_bytes(compiletime_seed, sizeof(compiletime_seed));
+#endif
+
 	for (i = 0; i < BLAKE2S_BLOCK_SIZE; i += sizeof(rv)) {
 		if (!arch_get_random_seed_long_early(&rv) &&
 		    !arch_get_random_long_early(&rv)) {