[1/5] random: fix crng_ready() test
diff mbox

Message ID 20180413013046.404-1-tytso@mit.edu
State Not Applicable
Delegated to: Herbert Xu
Headers show

Commit Message

Theodore Y. Ts'o April 13, 2018, 1:30 a.m. UTC
The crng_init variable has three states:

0: The CRNG is not initialized at all
1: The CRNG has a small amount of entropy, hopefully good enough for
   early-boot, non-cryptographical use cases
2: The CRNG is fully initialized and we are sure it is safe for
   cryptographic use cases.

The crng_ready() function should only return true once we are in the
last state.

Reported-by: Jann Horn <jannh@google.com>
Fixes: e192be9d9a30 ("random: replace non-blocking pool...")
Cc: stable@kernel.org # 4.8+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>
---
 drivers/char/random.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Stephan Mueller April 13, 2018, 5:38 a.m. UTC | #1
Am Freitag, 13. April 2018, 03:30:42 CEST schrieb Theodore Ts'o:

Hi Theodore,

> The crng_init variable has three states:
> 
> 0: The CRNG is not initialized at all
> 1: The CRNG has a small amount of entropy, hopefully good enough for
>    early-boot, non-cryptographical use cases
> 2: The CRNG is fully initialized and we are sure it is safe for
>    cryptographic use cases.
> 
> The crng_ready() function should only return true once we are in the
> last state.
> 
Do I see that correctly that getrandom(2) will now unblock after the 
input_pool has obtained 128 bits of entropy? Similarly for 
get_random_bytes_wait.

As this seems to be the only real use case for crng_ready (apart from 
logging), what is the purpose of crng_init == 1?

Ciao
Stephan
Theodore Y. Ts'o April 13, 2018, 12:53 p.m. UTC | #2
On Fri, Apr 13, 2018 at 07:38:30AM +0200, Stephan Mueller wrote:
> > The crng_init variable has three states:
> > 
> > 0: The CRNG is not initialized at all
> > 1: The CRNG has a small amount of entropy, hopefully good enough for
> >    early-boot, non-cryptographical use cases
> > 2: The CRNG is fully initialized and we are sure it is safe for
> >    cryptographic use cases.
> > 
> > The crng_ready() function should only return true once we are in the
> > last state.
> > 
> Do I see that correctly that getrandom(2) will now unblock after the 
> input_pool has obtained 128 bits of entropy? Similarly for 
> get_random_bytes_wait.

Correct.

> As this seems to be the only real use case for crng_ready (apart from 
> logging), what is the purpose of crng_init == 1?

Immediately after boot (crng_init state 0), we funnel the harvested
entropy from add_interrupt_entropy() directly to the CRNG pool.  The
goal is to get at least some amount of entropy into the CRNG is it's
not blatently predictable.  When we have accomplished this, by mixing
in 2 * CHACHA20_KEY_SIZE bytes of input from add_input_randomness, we
enter state crng_init state 1.

In crng_init state 1, we start funnelling the harvested entropy to the
input_pool.  Once we have accumulated 128 bits of entropy in the input
pool, we then reseed the CRNG from the input_pool, and we consider the
CRNG to be fully intialized.

During crng_init state 1, the CRNG is basically in a "good enough for
government work" state.  It's going to get used by things like initial
TCP sequence numbers, and UUID's by udev, et. al, but I wouldn't want
to use it for anything that has strong cryptographic use cases.

It would be preferable if nothing in the kernel and early systemd
startup used get_random_bytes() or /dev/urandom or getrandom(2) w/
GRND_NONBLOCK.  Unfortunately, (a) there is a lot of legacy code which
still uses /dev/urandom and not getrandom(2) and (b) in some cases,
such as initialization of the Stack Canary, the kernel simply can't
wait for the CRNG to be fully initialized.  Or if the kernel needs
enough of the networking stack to mount an NFS root file system, for
example.

This was always the original design intent, but I screwed up and no
one noticed until Jann reached out to be and said, "Hey.... this
doesn't seem to make much sense".

Regards,

					- Ted
Stephan Mueller April 13, 2018, 1:05 p.m. UTC | #3
Am Freitag, 13. April 2018, 14:53:13 CEST schrieb Theodore Y. Ts'o:

Hi Theodore,
> 
> This was always the original design intent, but I screwed up and no
> one noticed until Jann reached out to be and said, "Hey.... this
> doesn't seem to make much sense".

I disagree, but I guess you would have expected that. But this is not the 
issue.

What I would like to point out that more and more folks change to 
getrandom(2). As this call will now unblock much later in the boot cycle, 
these systems see a significant departure from the current system behavior.

E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes 
as of now. Now it can be a matter minutes before it responds. Thus, is such 
change in the kernel behavior something for stable?

Ciao
Stephan
Theodore Y. Ts'o April 13, 2018, 5 p.m. UTC | #4
On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
> 
> What I would like to point out that more and more folks change to 
> getrandom(2). As this call will now unblock much later in the boot cycle, 
> these systems see a significant departure from the current system behavior.
> 
> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes 
> as of now. Now it can be a matter minutes before it responds. Thus, is such 
> change in the kernel behavior something for stable?

It will have some change on the kernel behavior, but not as much as
you might think.  That's because in older kernels, we were *already*
blocking until crng_init > 2 --- if the getrandom(2) call happened
while crng_init was in state 0.

Even before this patch series, we didn't wake up a process blocked on
crng_init_wait until crng_init state 2 is reached:

static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
{
	...
	if (crng == &primary_crng && crng_init < 2) {
		invalidate_batched_entropy();
		crng_init = 2;
		process_random_ready_list();
		wake_up_interruptible(&crng_init_wait);
		pr_notice("random: crng init done\n");
	}
}

This is the reason why there are reports like this: "Boot delayed for
about 90 seconds until 'random: crng init done'"[1]

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1685794


So we have the problem already.  There will be more cases of this
after this patch series is applied, true.  But what we have already is
an inconsistent state where if you call getrandom(2) while the kernel
is in crng_init state 0, you will block until crng_init state 2, but
if you are in crng_init state 1, you will assume the CRNG is fully
initialized.

Given the documentation of how getrandom(2) works what its documented
guarantees are, I think it does justify making its behavior both more
consistent with itself, and more consistent what the security
guarantees we have promised people.

I was a little worried that on VM's this could end up causing things
to block for a long time, but an experiment on a GCE VM shows that
isn't a problem:

[    0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
[    1.282220] random: fast init done
[    3.987092] random: crng init done
[    4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)

There are some desktops where the "crng_init done" report doesn't
happen until 45-90 seconds into the boot.  I don't think I've seen
reports where it takes _minutes_ however.  Can you give me some
examples of such cases?

					- Ted

P.S.  Of course, in a VM environment, if the host supports virtio-rng,
the boot delay problem is completely not an issue.  You just have to
enable virtio-rng in the guest kernel, which I believe is already the
case for most distro kernels.

BTW, for KVM, it's fairly simple to set it the host-side support for
virtio-rng.  Just add to the kvm command-line options:

    -object rng-random,filename=/dev/urandom,id=rng0 \
	-device virtio-rng-pci,rng=rng0
Geert Uytterhoeven May 2, 2018, 4:18 p.m. UTC | #5
Hi Ted,

On Fri, Apr 13, 2018 at 3:30 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> The crng_init variable has three states:
>
> 0: The CRNG is not initialized at all
> 1: The CRNG has a small amount of entropy, hopefully good enough for
>    early-boot, non-cryptographical use cases
> 2: The CRNG is fully initialized and we are sure it is safe for
>    cryptographic use cases.
>
> The crng_ready() function should only return true once we are in the
> last state.
>
> Reported-by: Jann Horn <jannh@google.com>
> Fixes: e192be9d9a30 ("random: replace non-blocking pool...")
> Cc: stable@kernel.org # 4.8+
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Reviewed-by: Jann Horn <jannh@google.com>

Since commit 43838a23a05fbd13 ("random: fix crng_ready() test"),
all (few) remaining users of %p are printing too early, leading to "(ptrval)"
strings instead of actual hashed pointer values.

Sample timings on two platforms (arm / arm64) booting with lots of
debug ingo:

[   28.521158] random: crng init done
[   17.792705] random: crng init done

Gr{oetje,eeting}s,

                        Geert
Srivatsa S. Bhat May 17, 2018, 12:07 a.m. UTC | #6
On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote:
> On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
>>
>> What I would like to point out that more and more folks change to 
>> getrandom(2). As this call will now unblock much later in the boot cycle, 
>> these systems see a significant departure from the current system behavior.
>>
>> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes 
>> as of now. Now it can be a matter minutes before it responds. Thus, is such 
>> change in the kernel behavior something for stable?

[...]

> I was a little worried that on VM's this could end up causing things
> to block for a long time, but an experiment on a GCE VM shows that
> isn't a problem:
> 
> [    0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
> [    1.282220] random: fast init done
> [    3.987092] random: crng init done
> [    4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)
> 
> There are some desktops where the "crng_init done" report doesn't
> happen until 45-90 seconds into the boot.  I don't think I've seen
> reports where it takes _minutes_ however.  Can you give me some
> examples of such cases?

On a Photon OS VM running on VMware ESXi, this patch causes a boot speed
regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools
(rngd) installed. ]

[    1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
[    1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
[    1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns
[    1.474249] clocksource: Switched to clocksource tsc
[    1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1
[  346.620718] random: crng init done

Interestingly, the boot delay is exacerbated on VMs with large amounts
of RAM. For example, the delay is not so noticeable (< 30 seconds) on a
VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.

Also, cloud-init-local.service seems to be the one blocking for entropy
here. systemd-analyze critical-chain shows:

The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @6min 1.283s
└─vmtoolsd.service @6min 1.282s
  └─cloud-final.service @6min 366ms +914ms
    └─cloud-config.service @5min 59.174s +1.190s
      └─cloud-config.target @5min 59.172s
        └─cloud-init.service @5min 47.423s +11.744s
          └─systemd-networkd-wait-online.service @5min 45.999s +1.420s
            └─systemd-networkd.service @5min 45.975s +21ms
              └─network-pre.target @5min 45.973s
                └─cloud-init-local.service @241ms +5min 45.687s
                  └─systemd-remount-fs.service @222ms +13ms
                    └─systemd-fsck-root.service @193ms +26ms
                      └─systemd-journald.socket @188ms
                        └─-.mount @151ms
                          └─system.slice @161ms
                            └─-.slice @151ms

It would be great if this CVE can be fixed somehow without causing boot speed
to spike from ~20 seconds to 5 minutes, as that makes the system pretty much
unusable. I can workaround this by installing haveged, but ideally an in-kernel
fix would be better. If you need any other info about my setup or if you have
a patch that I can test, please let me know!

Thank you very much!

Regards,
Srivatsa
VMware Photon OS
Christophe Leroy May 17, 2018, 6:01 a.m. UTC | #7
Le 13/04/2018 à 19:00, Theodore Y. Ts'o a écrit :
> On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
>>
>> What I would like to point out that more and more folks change to
>> getrandom(2). As this call will now unblock much later in the boot cycle,
>> these systems see a significant departure from the current system behavior.
>>
>> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes
>> as of now. Now it can be a matter minutes before it responds. Thus, is such
>> change in the kernel behavior something for stable?
> 
> It will have some change on the kernel behavior, but not as much as
> you might think.  That's because in older kernels, we were *already*
> blocking until crng_init > 2 --- if the getrandom(2) call happened
> while crng_init was in state 0.
> 
> Even before this patch series, we didn't wake up a process blocked on
> crng_init_wait until crng_init state 2 is reached:
> 
> static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
> {
> 	...
> 	if (crng == &primary_crng && crng_init < 2) {
> 		invalidate_batched_entropy();
> 		crng_init = 2;
> 		process_random_ready_list();
> 		wake_up_interruptible(&crng_init_wait);
> 		pr_notice("random: crng init done\n");
> 	}
> }
> 
> This is the reason why there are reports like this: "Boot delayed for
> about 90 seconds until 'random: crng init done'"[1]
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1685794
> 
> 
> So we have the problem already.  There will be more cases of this
> after this patch series is applied, true.  But what we have already is
> an inconsistent state where if you call getrandom(2) while the kernel
> is in crng_init state 0, you will block until crng_init state 2, but
> if you are in crng_init state 1, you will assume the CRNG is fully
> initialized.
> 
> Given the documentation of how getrandom(2) works what its documented
> guarantees are, I think it does justify making its behavior both more
> consistent with itself, and more consistent what the security
> guarantees we have promised people.
> 
> I was a little worried that on VM's this could end up causing things
> to block for a long time, but an experiment on a GCE VM shows that
> isn't a problem:
> 
> [    0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
> [    1.282220] random: fast init done
> [    3.987092] random: crng init done
> [    4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)
> 
> There are some desktops where the "crng_init done" report doesn't
> happen until 45-90 seconds into the boot.  I don't think I've seen
> reports where it takes _minutes_ however.  Can you give me some
> examples of such cases?


On a powerpc embedded board which has an mpc8xx processor running at 
133Mhz, I now get the startup done in more than 7 minutes instead of 30 
seconds. This is due to the webserver blocking on read on /dev/random 
until we get 'random: crng init done':

[    0.000000] Linux version 4.17.0-rc4-00415-gd2f75d40072d 
(root@localhost) (gcc version 5.4.0 (GCC)) #203 PREEMPT Wed May 16 
16:32:02 CEST 2018
[    0.295453] random: get_random_u32 called from 
bucket_table_alloc+0x84/0x1bc with crng_init=0
[    1.030472] device: 'random': device_add
[    1.031279] device: 'urandom': device_add
[    1.420069] device: 'hw_random': device_add
[    2.156853] random: fast init done
[  462.007776] random: crng init done

This has become really critical, is there anything that can be done ?

Christophe

> 
> 					- Ted
> 
> P.S.  Of course, in a VM environment, if the host supports virtio-rng,
> the boot delay problem is completely not an issue.  You just have to
> enable virtio-rng in the guest kernel, which I believe is already the
> case for most distro kernels.
> 
> BTW, for KVM, it's fairly simple to set it the host-side support for
> virtio-rng.  Just add to the kvm command-line options:
> 
>      -object rng-random,filename=/dev/urandom,id=rng0 \
> 	-device virtio-rng-pci,rng=rng0
>
Theodore Y. Ts'o May 17, 2018, 8:53 p.m. UTC | #8
On Wed, May 16, 2018 at 05:07:08PM -0700, Srivatsa S. Bhat wrote:
> 
> On a Photon OS VM running on VMware ESXi, this patch causes a boot speed
> regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools
> (rngd) installed. ]
> 
> [    1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
> [    1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
> [    1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns
> [    1.474249] clocksource: Switched to clocksource tsc
> [    1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1
> [  346.620718] random: crng init done
> 
> Interestingly, the boot delay is exacerbated on VMs with large amounts
> of RAM. For example, the delay is not so noticeable (< 30 seconds) on a
> VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.
> 
> Also, cloud-init-local.service seems to be the one blocking for entropy
> here.

So the first thing I'd ask you to investigate is what the heck
cloud-init-local.service is doing, and why it really needs
cryptographic grade random numbers?

> It would be great if this CVE can be fixed somehow without causing boot speed
> to spuike from ~20 seconds to 5 minutes, as that makes the system pretty much
> unusable. I can workaround this by installing haveged, but ideally an in-kernel
> fix would be better. If you need any other info about my setup or if you have
> a patch that I can test, please let me know!

So the question is why is strong random numbers needed by
cloud-init-local, and how much do you trust either haveged and/or
RDRAND (which is what you will be depending upon if you install
rng-tools).  If you believe that Intel and/or the NSA hasn't
backdoored RDRAND[1], or you believe that because Intel processor's
internal cache architecture isn't publically documented, and it's
Soooooooo complicated that no one can figure it out (which is what you
will be depending upon if you if you choose haveged), be my guest.  I
personally consider the latter to be "security via obscu7rity", but
others have different opinions.

[1] As an aside, read the best paper from the 37th IEEE Symposium on
Security and Privacy and weep:

https://www.computerworld.com/article/3079417/security/researchers-built-devious-undetectable-hardware-level-backdoor-in-computer-chips.html

If it turns out that the security if your VM is critically dependent
on what cloud-init-local.service is doing, and you don't like making
those assumptions, then you may need to ask VMWare ESXi to make
virtio-rng available to guest OS's, and then make Photon OS depend on
a secure RNG available to the host OS.

Best regards,

						- Ted
Theodore Y. Ts'o May 17, 2018, 8:56 p.m. UTC | #9
On Thu, May 17, 2018 at 08:01:04AM +0200, Christophe LEROY wrote:
> 
> On a powerpc embedded board which has an mpc8xx processor running at 133Mhz,
> I now get the startup done in more than 7 minutes instead of 30 seconds.
> This is due to the webserver blocking on read on /dev/random until we get
> 'random: crng init done':
> 
> [    0.000000] Linux version 4.17.0-rc4-00415-gd2f75d40072d (root@localhost)
> (gcc version 5.4.0 (GCC)) #203 PREEMPT Wed May 16 16:32:02 CEST 2018
> [    0.295453] random: get_random_u32 called from
> bucket_table_alloc+0x84/0x1bc with crng_init=0
> [    1.030472] device: 'random': device_add
> [    1.031279] device: 'urandom': device_add
> [    1.420069] device: 'hw_random': device_add
> [    2.156853] random: fast init done
> [  462.007776] random: crng init done
> 
> This has become really critical, is there anything that can be done ?

Figure out why the webserver needs to read /dev/random and is it for a
security critical purpose?

A kernel patch which makes the kernel do a "lalalalala I'm secure"
when it really isn't secure is a simple "solution", but would it
really make you happy?


						- Ted

Patch
diff mbox

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e027e7fa1472..c8ec1e70abde 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -427,7 +427,7 @@  struct crng_state primary_crng = {
  * its value (from 0->1->2).
  */
 static int crng_init = 0;
-#define crng_ready() (likely(crng_init > 0))
+#define crng_ready() (likely(crng_init > 1))
 static int crng_init_cnt = 0;
 #define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE)
 static void _extract_crng(struct crng_state *crng,
@@ -794,7 +794,7 @@  static int crng_fast_load(const char *cp, size_t len)
 
 	if (!spin_trylock_irqsave(&primary_crng.lock, flags))
 		return 0;
-	if (crng_ready()) {
+	if (crng_init != 0) {
 		spin_unlock_irqrestore(&primary_crng.lock, flags);
 		return 0;
 	}
@@ -856,7 +856,7 @@  static void _extract_crng(struct crng_state *crng,
 {
 	unsigned long v, flags;
 
-	if (crng_init > 1 &&
+	if (crng_ready() &&
 	    time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
 		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);
 	spin_lock_irqsave(&crng->lock, flags);
@@ -1139,7 +1139,7 @@  void add_interrupt_randomness(int irq, int irq_flags)
 	fast_mix(fast_pool);
 	add_interrupt_bench(cycles);
 
-	if (!crng_ready()) {
+	if (unlikely(crng_init == 0)) {
 		if ((fast_pool->count >= 64) &&
 		    crng_fast_load((char *) fast_pool->pool,
 				   sizeof(fast_pool->pool))) {
@@ -2212,7 +2212,7 @@  void add_hwgenerator_randomness(const char *buffer, size_t count,
 {
 	struct entropy_store *poolp = &input_pool;
 
-	if (!crng_ready()) {
+	if (unlikely(crng_init == 0)) {
 		crng_fast_load(buffer, count);
 		return;
 	}