
[v24,00/12] /dev/random - a new approach with full SP800-90B compliance

Message ID 6157374.ptSnyUpaCn@positron.chronox.de (mailing list archive)

Message

Stephan Mueller Nov. 11, 2019, 6:17 p.m. UTC
Hi,

The following patch set provides a different approach to /dev/random which is
called Linux Random Number Generator (LRNG) to collect entropy within the Linux
kernel. The main improvement compared to the existing /dev/random is that it
provides sufficient entropy during boot time as well as in virtual environments
and when using SSDs. A secondary design goal is to limit the impact of the
entropy collection on massively parallel systems and also to allow the use of
accelerated cryptographic primitives. Also, all steps of the entropic data
processing are
testable.

The LRNG patch set allows a user to select use of the existing /dev/random or
the LRNG during compile time. As the LRNG provides API and ABI compatible
interfaces to the existing /dev/random implementation, the user can freely choose
the RNG implementation without affecting kernel or user space operations.

This patch set provides early boot-time entropy which implies that no
additional flags to the getrandom(2) system call discussed recently on
the LKML are considered to be necessary.

The LRNG is fully compliant with SP800-90B requirements and is shipped with a
full SP800-90B assessment and all required test tools. The existing /dev/random
implementation, on the other hand, has architectural limitations which
do not easily allow bringing the implementation into compliance with
SP800-90B. The key statement that causes concern is SP800-90B section
3.1.6. This section denies crediting entropy to multiple similar noise
sources. This section explicitly references different noise sources resting
on the timing of events and their derivatives (i.e. it is a direct complaint
about the existing /dev/random implementation). Therefore, SP800-90B
now denies the very issue that [1] has pointed out in the existing /dev/random
implementation for a long time: crediting entropy to interrupts as well as
crediting entropy to derivatives of interrupts (HID and disk events). This is
not permissible with SP800-90B.

SP800-90B specifies various requirements for the noise source(s) that seed any
DRNG including SP800-90A DRBGs. In about a year from now, SP800-90B will be
mandated for all noise sources that provide entropy to DRBGs as part of a FIPS
140-[2|3] validation or other evaluation types. That means, if no
solutions to comply with the requirements of SP800-90B are found within one year
from now, any random number generation and ciphers based on random numbers
on Linux will be considered and treated as not applicable and delivering
no entropy! As /dev/urandom, getrandom(2) and /dev/random are the most
common and prevalent noise sources for DRNGs, all these DRNGs are affected.
This applies across the board for all validations of cryptography executing on
Linux (kernel and user space modules).

For users that are not interested in SP800-90B, the entire code for the
compliance as well as test interfaces can be deselected at compile time.

The design and implementation are driven by a set of goals described in [1]
that the LRNG completely implements. Furthermore, [1] includes the full
assessment of the SP800-90B compliance as well as a comparison with RNG
design suggestions of SP800-90C, and AIS20/31.

The LRNG provides a complete separation of the noise source maintenance
and the collection of entropy into an entropy pool from the post-processing
using a pseudo-random number generator. Different DRNGs are supported,
including:

* The LRNG can be compile-time enabled to replace the existing /dev/random
  implementation. When not selecting the LRNG at compile time (default), the
  existing /dev/random implementation is built.

* Built-in ChaCha20 DRNG which has no dependency on other kernel
  frameworks.

* SP800-90A DRBG using the kernel crypto API including its accelerated
  raw cipher implementations. This implies that the output of /dev/random,
  getrandom(2), /dev/urandom or get_random_bytes is fully compliant with
  SP800-90A.

* Arbitrary DRNGs registered with the kernel crypto API

* Full compliance with SP800-90B which covers the startup and runtime health
  tests mandated by SP800-90B as well as providing the test tools and test
  interfaces to obtain raw noise data securely. The test tools are provided at
  [1].

Booting the patch with the kernel command line option
"dyndbg=file drivers/char/lrng* +p" generates logs indicating the operation
of the LRNG. Each log line is prefixed with "lrng".

The LRNG has a flexible design by allowing an easy replacement of the
deterministic random number generator component.

Compared to the existing /dev/random implementation, the compiled binary
is smaller when the LRNG is compiled with all options equal to the
existing /dev/random (i.e. only CONFIG_LRNG and
CONFIG_LRNG_TRNG_SUPPORT are set): random.o is 52.5 kBytes whereas
all LRNG object files together are 49 kBytes in size. The fully
SP800-90A/SP800-90B compliant binary code (CONFIG_LRNG,
CONFIG_LRNG_DRNG_SWITCH, CONFIG_LRNG_DRBG, CONFIG_LRNG_HEALTH_TESTS)
uses some 61 kBytes.

[1] http://www.chronox.de/lrng.html - If the patch is accepted, I would
be volunteering to convert the documentation into RST format and
contribute it to the Linux kernel documentation directory.

Changes (compared to the previous patch set for 5.2):

* breakup of the monolithic code base into several logically isolated
  files and move all files into drivers/char/lrng/ - this also reduces
  the number of ifdefs in the code significantly as the make system is
  used to select the enabled code

* Add Tested-by and Reviewed-by lines

* Significant speedup of code executing in interrupt handler: the LRNG
  is now almost 50% faster than the existing /dev/random. On one example
  system, the LRNG interrupt handling code executes within an average of
  65 cycles whereas the existing /dev/random on the same device takes
  about 97 cycles.

* SP800-90B compliance
	- use hash_df function defined in SP800-90A section 10.3.1 to read
	  entropy pool
	- add compile time configurable SP800-90B health tests and eliminate
	  any FIPS 140-2 code from the base code
	- consider entropy reduction of conditioning operation compliant
	  to SP800-90B
	- complete entropy assessment and entropy assessment tests available
	  at [1]

* prune base LRNG code of any FIPS-related code - all FIPS-related code is
  in the SP800-90B compliance code that can be deactivated at compile time

* testing performed with all tests offered at [1] including all required
  SP800-90B tests, as well as KASAN, UBSAN, and lockdep while executing
  stress tests. Tests were performed on: x86, S390

* make DRNG switching support compile-time configurable

* selection of entropy pool size is now a configure option

* support deactivation of TRNG (i.e. blocking behavior of /dev/random)
  at compile time. If deactivated, /dev/random behaves like
  getrandom(2).

* conditionally compile NUMA support

* eliminate in_atomic() invocation: In-kernel consumers always use
  the ChaCha20 DRNG unless the new API call get_random_bytes_full
  is invoked which may sleep but offer access to the full functionality
  of the LRNG including all types of DRNG.

* use debugfs file for obtaining raw entropy test data required to fulfill
  SP800-90B requirements

* fix: ensure that gathering raw entropy does not affect runtime of the kernel

* fix: import upstream patch b7d5dc21072cda7124d13eae2aefb7343ef94197

* fix: import upstream patch 428826f5358c922dc378830a1717b682c0823160

* fix: integrate patch "random: Don't freeze in add_hwgenerator_randomness()
  if stopping kthread"

* documentation enhancement: import upstream patch
  92e507d216139b356a375afbda2824e85235e748 into documentation to cover all
  interfaces of the LRNG

* speedup of injection of non-aligned data into entropy pool

As a side note: With the switchable DRNG support offered in this patch set,
the following areas could be removed. As the existing /dev/random has no support
for switchable DRNGs, however, this is not yet feasible.

* remove lrng_ready_list and all code around it in lrng_interfaces.c

* remove the kernel crypto API RNG API to avoid having two random number
  providing APIs - this would imply that all RNGs developed for this API would
  be converted to the LRNG interface

CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: "Alexander E. Patrakov" <patrakov@gmail.com>
CC: "Ahmed S. Darwish" <darwish.07@gmail.com>
CC: "Theodore Y. Ts'o" <tytso@mit.edu>
CC: Willy Tarreau <w@1wt.eu>
CC: Matthew Garrett <mjg59@srcf.ucam.org>
CC: Vito Caputo <vcaputo@pengaru.com>
CC: Andreas Dilger <adilger.kernel@dilger.ca>
CC: Jan Kara <jack@suse.cz>
CC: Ray Strode <rstrode@redhat.com>
CC: William Jon McCann <mccann@jhu.edu>
CC: zhangjs <zachary@baishancloud.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Florian Weimer <fweimer@redhat.com>
CC: Lennart Poettering <mzxreary@0pointer.de>
CC: Nicolai Stange <nstange@suse.de>
Tested-by: Roman Drahtmüller <draht@schaltsekun.de>
Tested-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Tested-by: Neil Horman <nhorman@redhat.com>

Stephan Mueller (12):
  Linux Random Number Generator
  LRNG - allocate one SDRNG instance per NUMA node
  LRNG - /proc interface
  LRNG - add switchable DRNG support
  crypto: DRBG - externalize DRBG functions for LRNG
  LRNG - add SP800-90A DRBG extension
  LRNG - add kernel crypto API PRNG extension
  crypto: provide access to a static Jitter RNG state
  LRNG - add Jitter RNG fast noise source
  LRNG - add TRNG support
  LRNG - add SP800-90B compliance
  LRNG - add interface for gathering of raw entropy

 MAINTAINERS                         |   7 +
 crypto/drbg.c                       |  16 +-
 crypto/jitterentropy.c              |  23 +
 drivers/char/Kconfig                |   2 +
 drivers/char/Makefile               |   9 +-
 drivers/char/lrng/Kconfig           | 145 ++++++
 drivers/char/lrng/Makefile          |  19 +
 drivers/char/lrng/lrng_archrandom.c | 105 +++++
 drivers/char/lrng/lrng_aux.c        | 161 +++++++
 drivers/char/lrng/lrng_chacha20.c   | 341 ++++++++++++++
 drivers/char/lrng/lrng_drbg.c       | 274 +++++++++++
 drivers/char/lrng/lrng_health.c     | 424 ++++++++++++++++++
 drivers/char/lrng/lrng_interfaces.c | 648 ++++++++++++++++++++++++++
 drivers/char/lrng/lrng_internal.h   | 322 +++++++++++++
 drivers/char/lrng/lrng_jent.c       | 101 +++++
 drivers/char/lrng/lrng_kcapi.c      | 341 ++++++++++++++
 drivers/char/lrng/lrng_numa.c       | 114 +++++
 drivers/char/lrng/lrng_pool.c       | 673 ++++++++++++++++++++++++++++
 drivers/char/lrng/lrng_proc.c       | 192 ++++++++
 drivers/char/lrng/lrng_sdrng.c      | 458 +++++++++++++++++++
 drivers/char/lrng/lrng_sw_noise.c   | 156 +++++++
 drivers/char/lrng/lrng_switch.c     | 198 ++++++++
 drivers/char/lrng/lrng_testing.c    | 324 +++++++++++++
 drivers/char/lrng/lrng_trng.c       | 301 +++++++++++++
 include/crypto/drbg.h               |   7 +
 include/linux/lrng.h                |  83 ++++
 26 files changed, 5437 insertions(+), 7 deletions(-)
 create mode 100644 drivers/char/lrng/Kconfig
 create mode 100644 drivers/char/lrng/Makefile
 create mode 100644 drivers/char/lrng/lrng_archrandom.c
 create mode 100644 drivers/char/lrng/lrng_aux.c
 create mode 100644 drivers/char/lrng/lrng_chacha20.c
 create mode 100644 drivers/char/lrng/lrng_drbg.c
 create mode 100644 drivers/char/lrng/lrng_health.c
 create mode 100644 drivers/char/lrng/lrng_interfaces.c
 create mode 100644 drivers/char/lrng/lrng_internal.h
 create mode 100644 drivers/char/lrng/lrng_jent.c
 create mode 100644 drivers/char/lrng/lrng_kcapi.c
 create mode 100644 drivers/char/lrng/lrng_numa.c
 create mode 100644 drivers/char/lrng/lrng_pool.c
 create mode 100644 drivers/char/lrng/lrng_proc.c
 create mode 100644 drivers/char/lrng/lrng_sdrng.c
 create mode 100644 drivers/char/lrng/lrng_sw_noise.c
 create mode 100644 drivers/char/lrng/lrng_switch.c
 create mode 100644 drivers/char/lrng/lrng_testing.c
 create mode 100644 drivers/char/lrng/lrng_trng.c
 create mode 100644 include/linux/lrng.h

Comments

Florian Weimer Nov. 12, 2019, 1:23 p.m. UTC | #1
* Stephan Müller:

> * support deactivation of TRNG (i.e. blocking behavior of /dev/random)
>   at compile time. If deactivated, /dev/random behaves like
>   getrandom(2).

I don't quite understand this comment.  Doesn't getrandom with the
GRND_RANDOM always behave like /dev/random?  Presumably, without the
TRNG tap, the GRND_RANDOM flag for getrandom is ignored, and reading
from /dev/random behaves like reading from /dev/urandom.

Anyway, reading the accompanying PDF, this looks rather impressive:
the userspace bootstrapping problem is gone (the issue where waiting
for more entropy prevents the collection of more entropy), *and* we
can still make the standards people happy.

(Replying from my other account due to mail issues, sorry.)
Andy Lutomirski Nov. 12, 2019, 3:33 p.m. UTC | #2
On Mon, Nov 11, 2019 at 11:13 AM Stephan Müller <smueller@chronox.de> wrote:
>
> The following patch set provides a different approach to /dev/random which is
> called Linux Random Number Generator (LRNG) to collect entropy within the Linux
> kernel. The main improvements compared to the existing /dev/random is to provide
> sufficient entropy during boot time as well as in virtual environments and when
> using SSDs. A secondary design goal is to limit the impact of the entropy
> collection on massive parallel systems and also allow the use accelerated
> cryptographic primitives. Also, all steps of the entropic data processing are
> testable.

This is very nice!

>
> The LRNG patch set allows a user to select use of the existing /dev/random or
> the LRNG during compile time. As the LRNG provides API and ABI compatible
> interfaces to the existing /dev/random implementation, the user can freely chose
> the RNG implementation without affecting kernel or user space operations.
>
> This patch set provides early boot-time entropy which implies that no
> additional flags to the getrandom(2) system call discussed recently on
> the LKML is considered to be necessary.

I'm uneasy about this.  I fully believe that, *on x86*, this works.
But on embedded systems with in-order CPUs, a single clock, and very
lightweight boot processes, most or all of boot might be too
deterministic for this to work.

I have a somewhat competing patch set here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=random/kill-it

(Ignore the "horrible test hack" and the debugfs part.)

The basic summary is that I change /dev/random so that it becomes
functionally identical to getrandom(..., 0) -- in other words, it
blocks until the CRNG is initialized but is then identical to
/dev/urandom.  And I add getrandom(...., GRND_INSECURE) that is
functionally identical to the existing /dev/urandom: it always returns
*something* immediately, but it may or may not actually be
cryptographically random or even random at all depending on system
details.

In other words, my series simplifies the ABI that we support.  Right
now, we have three ways to ask for random numbers with different
semantics and we need to have two RNGs in the kernel at all times.  With
my changes, we have only two ways to ask for random numbers, and the
/dev/random pool is entirely gone.

Would you be amenable to merging this into your series (i.e. either
merging the code or just the ideas)?  This would let you get rid of
things like the compile-time selection of the blocking TRNG, since the
blocking TRNG would be entirely gone.

Or do you think that a kernel-provided blocking TRNG is a genuinely
useful thing to keep around?

--Andy
Stephan Mueller Nov. 12, 2019, 10:43 p.m. UTC | #3
Am Dienstag, 12. November 2019, 14:23:10 CET schrieb Florian Weimer:

Hi Florian,

> * Stephan Müller:
> > * support deactivation of TRNG (i.e. blocking behavior of /dev/random)
> > 
> >   at compile time. If deactivated, /dev/random behaves like
> >   getrandom(2).
> 
> I don't quite understand this comment.  Doesn't getrandom with the
> GRND_RANDOM always behave like /dev/random?  Presumably, without the
> TRNG tap, the GRND_RANDOM flag for getrandom is ignored, and reading
> from /dev/random behaves like reading from /dev/urandom.

Absolutely. Apologies for the imprecision here. I will correct that.

The idea is that the constant blocking behavior of /dev/random and GRND_RANDOM 
is replaced with the blocking behavior of getrandom(2) without the GRND_RANDOM 
flag (i.e. the interface waits until the LRNG thinks it is completely seeded 
before it provides unlimited data).
> 
> Anyway, reading the accompanying PDF, this looks rather impressive:
> the userspace bootstrapping problem is gone (the issue where waiting
> for more entropy prevents the collection of more entropy), *and* we
> can still make the standards people happy.
> 
> (Replying from my other account due to mail issues, sorry.)


Ciao
Stephan
Stephan Mueller Nov. 12, 2019, 11:03 p.m. UTC | #4
Am Dienstag, 12. November 2019, 16:33:59 CET schrieb Andy Lutomirski:

Hi Andy,

> On Mon, Nov 11, 2019 at 11:13 AM Stephan Müller <smueller@chronox.de> wrote:
> > The following patch set provides a different approach to /dev/random which
> > is called Linux Random Number Generator (LRNG) to collect entropy within
> > the Linux kernel. The main improvements compared to the existing
> > /dev/random is to provide sufficient entropy during boot time as well as
> > in virtual environments and when using SSDs. A secondary design goal is
> > to limit the impact of the entropy collection on massive parallel systems
> > and also allow the use accelerated cryptographic primitives. Also, all
> > steps of the entropic data processing are testable.
> 
> This is very nice!
> 
> > The LRNG patch set allows a user to select use of the existing /dev/random
> > or the LRNG during compile time. As the LRNG provides API and ABI
> > compatible interfaces to the existing /dev/random implementation, the
> > user can freely chose the RNG implementation without affecting kernel or
> > user space operations.
> > 
> > This patch set provides early boot-time entropy which implies that no
> > additional flags to the getrandom(2) system call discussed recently on
> > the LKML is considered to be necessary.
> 
> I'm uneasy about this.  I fully believe that, *on x86*, this works.
> But on embedded systems with in-order CPUs, a single clock, and very
> lightweight boot processes, most or all of boot might be too
> deterministic for this to work.

I agree that in such cases, my LRNG getrandom(2) would also block until the 
LRNG thinks it collected 256 bits of entropy. However, I am under the 
impression that the LRNG collects that entropy faster than the existing
/dev/random implementation, even in this case.

Nicolai is copied on this thread. He promised to have the LRNG tested on such 
a minimalistic system that you describe. I hope he could contribute some 
numbers from that test helping us to understand how much of a problem we face.
> 
> I have a somewhat competing patch set here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=random/kill-it
> 
> (Ignore the "horrible test hack" and the debugfs part.)
> 
> The basic summary is that I change /dev/random so that it becomes
> functionally identical to getrandom(..., 0) -- in other words, it
> blocks until the CRNG is initialized but is then identical to
> /dev/urandom.

This would be equal to the LRNG code without compiling the TRNG.

> And I add getrandom(...., GRND_INSECURE) that is
> functionally identical to the existing /dev/urandom: it always returns
> *something* immediately, but it may or may not actually be
> cryptographically random or even random at all depending on system
> details.

Ok, if it is suggested that getrandom(2) should also have a mode to behave 
exactly like /dev/urandom by not waiting until it is fully seeded, I am happy 
to add that.
> 
> In other words, my series simplifies the ABI that we support.  Right
> now, we have three ways to ask for random numbers with different
> semantics and we need to have to RNGs in the kernel at all time.  With
> my changes, we have only two ways to ask for random numbers, and the
> /dev/random pool is entirely gone.

Again, I do not want to stand in the way of changing the ABI if this is the 
agreed way. All I want to say is that the LRNG seemingly is initialized much 
faster than the existing /dev/random. If this is not fast enough for some 
embedded environments, I would not want to stand in the way to make their life 
easier.
> 
> Would you be amenable to merging this into your series (i.e. either
> merging the code or just the ideas)? 

Absolutely. I would be happy to do that.

Allow me to pull your code (I am currently behind a slow line) and review it 
to see how best to integrate it.

> This would let you get rid of
> things like the compile-time selection of the blocking TRNG, since the
> blocking TRNG would be entirely gone.

Hm, I am not so sure we should do that.

Allow me to explain: I am also collaborating on the European side with the 
German BSI. They love /dev/random as it is a "NTG.1" RNG based on their AIS 31 
standard.

In order to seed a deterministic RNG (like OpenSSL, GnuTLS, etc. which are all 
defined to be "DRG.3" or "DRG.2"), BSI mandates that the seed source is an 
NTG.1.

By getting rid of the TRNG entirely and having /dev/random entirely behaving 
like /dev/urandom or getrandom(2) without the GRND_RANDOM flag, the kernel 
would "only" provide a "DRG.3" type RNG. This type of RNG would be disallowed 
to seed another "DRG.3" or "DRG.2".

In plain English that means that for BSI's requirements, if the TRNG is gone 
there would be no native seed source on Linux any more that can satisfy the 
requirement. This is the ultimate reason why I made the TRNG compile-time 
selectable: to support embedded systems but also support use cases like the 
BSI case.

Please consider that I have maintained a study for BSI over the last years,
trying to ensure that the NTG.1 property is always met [1] [2]. That study is
solely concerned with this NTG.1 property.
> 
> Or do you think that a kernel-provided blocking TRNG is a genuinely
> useful thing to keep around?

Yes, as I hope I explained it appropriately above, there are standardization 
requirements that need the TRNG.

PS: When I was forwarding Linus' email on eliminating the blocking_pool to 
BSI, I saw unhappy faces. :-)

I would like to help both sides here.

[1] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/NTG1_Kerneltabelle_EN.pdf?__blob=publicationFile&v=3

[2] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/NTG1_Kerneltabelle_EN.pdf?__blob=publicationFile&v=3

Ciao
Stephan
Stephan Mueller Nov. 12, 2019, 11:26 p.m. UTC | #5
Am Mittwoch, 13. November 2019, 00:03:47 CET schrieb Stephan Müller:

Hi Stephan,

> Am Dienstag, 12. November 2019, 16:33:59 CET schrieb Andy Lutomirski:
> 
> Hi Andy,
> 
> > > On Mon, Nov 11, 2019 at 11:13 AM Stephan Müller <smueller@chronox.de> wrote:
> > > > The following patch set provides a different approach to /dev/random
> > > > which is called Linux Random Number Generator (LRNG) to collect entropy
> > > > within the Linux kernel. The main improvements compared to the existing
> > > > /dev/random is to provide sufficient entropy during boot time as well as
> > > > in virtual environments and when using SSDs. A secondary design goal is
> > > > to limit the impact of the entropy collection on massive parallel
> > > > systems and also allow the use accelerated cryptographic primitives.
> > > > Also, all steps of the entropic data processing are testable.
> > 
> > This is very nice!
> > 
> > > > The LRNG patch set allows a user to select use of the existing
> > > > /dev/random or the LRNG during compile time. As the LRNG provides API
> > > > and ABI compatible interfaces to the existing /dev/random
> > > > implementation, the user can freely chose the RNG implementation
> > > > without affecting kernel or user space operations.
> > > 
> > > This patch set provides early boot-time entropy which implies that no
> > > additional flags to the getrandom(2) system call discussed recently on
> > > the LKML is considered to be necessary.
> > 
> > I'm uneasy about this.  I fully believe that, *on x86*, this works.
> > But on embedded systems with in-order CPUs, a single clock, and very
> > lightweight boot processes, most or all of boot might be too
> > deterministic for this to work.
> 
> I agree that in such cases, my LRNG getrandom(2) would also block until the
> LRNG thinks it collected 256 bits of entropy. However, I am under the
> impression that the LRNG collects that entropy faster that the existing
> /dev/random implementation, even in this case.
> 
> Nicolai is copied on this thread. He promised to have the LRNG tested on
> such a minimalistic system that you describe. I hope he could contribute
> some numbers from that test helping us to understand how much of a problem
> we face.
> > I have a somewhat competing patch set here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=random/kill-it
> > 
> > (Ignore the "horrible test hack" and the debugfs part.)
> > 
> > The basic summary is that I change /dev/random so that it becomes
> > functionally identical to getrandom(..., 0) -- in other words, it
> > blocks until the CRNG is initialized but is then identical to
> > /dev/urandom.
> 
> This would be equal to the LRNG code without compiling the TRNG.
> 
> > And I add getrandom(...., GRND_INSECURE) that is
> > functionally identical to the existing /dev/urandom: it always returns
> > *something* immediately, but it may or may not actually be
> > cryptographically random or even random at all depending on system
> > details.
> 
> Ok, if it is suggested that getrandom(2) should also have a mode to behave
> exactly like /dev/urandom by not waiting until it is fully seeded, I am
> happy to add that.
> 
> > In other words, my series simplifies the ABI that we support.  Right
> > now, we have three ways to ask for random numbers with different
> > semantics and we need to have to RNGs in the kernel at all time.  With
> > my changes, we have only two ways to ask for random numbers, and the
> > /dev/random pool is entirely gone.
> 
> Again, I do not want to stand in the way of changing the ABI if this is the
> agreed way. All I want to say is that the LRNG seemingly is initialized much
> faster than the existing /dev/random. If this is not fast enough for some
> embedded environments, I would not want to stand in the way to make their
> life easier.
> 
> > Would you be amenable to merging this into your series (i.e. either
> > merging the code or just the ideas)?
> 
> Absolutely. I would be happy to do that.
> 
> Allow me to pull your code (I am currently behind a slow line) and review it
> to see how best to integrate it.
> 
> > This would let you get rid of
> > things like the compile-time selection of the blocking TRNG, since the
> > blocking TRNG would be entirely gone.
> 
> Hm, I am not so sure we should do that.
> 
> Allow me to explain: I am also collaborating on the European side with the
> German BSI. They love /dev/random as it is a "NTG.1" RNG based on their AIS
> 31 standard.
> 
> In order to seed a deterministic RNG (like OpenSSL, GnuTLS, etc. which are
> all defined to be "DRG.3" or "DRG.2"), BSI mandates that the seed source is
> an NTG.1.
> 
> By getting rid of the TRNG entirely and having /dev/random entirely behaving
> like /dev/urandom or getrandom(2) without the GRND_RANDOM flag, the kernel
> would "only" provide a "DRG.3" type RNG. This type of RNG would be
> disallowed to seed another "DRG.3" or "DRG.2".
> 
> In plain English that means that for BSI's requirements, if the TRNG is gone
> there would be no native seed source on Linux any more that can satisfy the
> requirement. This is the ultimate reason why I made the TRNG compile-time
> selectable: to support embedded systems but also support use cases like the
> BSI case.
> 
> Please consider that I maintain a study over the last years for BSI trying
> to ensure that the NTG.1 property is always met [1] [2]. The sole purpose
> of that study is around this NTG.1.
> 
> > Or do you think that a kernel-provided blocking TRNG is a genuinely
> > useful thing to keep around?
> 
> Yes, as I hope I explained it appropriately above, there are standardization
> requirements that need the TRNG.
> 
> PS: When I was forwarding Linus' email on eliminating the blocking_pool to
> BSI, I saw unhappy faces. :-)
> 
> I would like to help both sides here.
> 
> [1] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/NTG1_Kerneltabelle_EN.pdf?__blob=publicationFile&v=3
> 
> [2] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/NTG1_Kerneltabelle_EN.pdf?__blob=publicationFile&v=3

Sorry, the copy did not work:

[2] https://bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf?__blob=publicationFile&v=16
> 
> Ciao
> Stephan


Ciao
Stephan
Stephan Mueller Nov. 13, 2019, 4:24 a.m. UTC | #6
Am Dienstag, 12. November 2019, 16:33:59 CET schrieb Andy Lutomirski:

Hi Andy,

> On Mon, Nov 11, 2019 at 11:13 AM Stephan Müller <smueller@chronox.de> wrote:
> > The following patch set provides a different approach to /dev/random which
> > is called Linux Random Number Generator (LRNG) to collect entropy within
> > the Linux kernel. The main improvements compared to the existing
> > /dev/random is to provide sufficient entropy during boot time as well as
> > in virtual environments and when using SSDs. A secondary design goal is
> > to limit the impact of the entropy collection on massive parallel systems
> > and also allow the use accelerated cryptographic primitives. Also, all
> > steps of the entropic data processing are testable.
> 
> This is very nice!
> 
> > The LRNG patch set allows a user to select use of the existing /dev/random
> > or the LRNG during compile time. As the LRNG provides API and ABI
> > compatible interfaces to the existing /dev/random implementation, the
> > user can freely chose the RNG implementation without affecting kernel or
> > user space operations.
> > 
> > This patch set provides early boot-time entropy which implies that no
> > additional flags to the getrandom(2) system call discussed recently on
> > the LKML is considered to be necessary.
> 
> I'm uneasy about this.  I fully believe that, *on x86*, this works.
> But on embedded systems with in-order CPUs, a single clock, and very
> lightweight boot processes, most or all of boot might be too
> deterministic for this to work.
> 
> I have a somewhat competing patch set here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=random/kill-it
> 
> (Ignore the "horrible test hack" and the debugfs part.)
> 
> The basic summary is that I change /dev/random so that it becomes
> functionally identical to getrandom(..., 0) -- in other words, it
> blocks until the CRNG is initialized but is then identical to
> /dev/urandom.  And I add getrandom(...., GRND_INSECURE) that is
> functionally identical to the existing /dev/urandom: it always returns
> *something* immediately, but it may or may not actually be
> cryptographically random or even random at all depending on system
> details.
> 
> In other words, my series simplifies the ABI that we support.  Right
> now, we have three ways to ask for random numbers with different
> semantics and we need to have to RNGs in the kernel at all time.  With
> my changes, we have only two ways to ask for random numbers, and the
> /dev/random pool is entirely gone.
> 
> Would you be amenable to merging this into your series (i.e. either
> merging the code or just the ideas)?  This would let you get rid of
> things like the compile-time selection of the blocking TRNG, since the
> blocking TRNG would be entirely gone.

I pulled your code and reviewed it in light of my earlier explanation that I 
would suggest keeping the TRNG at least as an option. My findings:

- 7d54ef8512b06baf396f12584f7f48a9558ecd0f does not seem applicable: I also 
have an equivalent "lrng_init_wait" wait queue. This wait queue is used to let 
in-kernel users wait until the LRNG has obtained 128 bits of entropy. In 
addition, it is used to wake up user space callers once the LRNG has received 
256 bits of entropy (which implies that the kernel waiters are woken up 
earlier). In-kernel waiters are all callers of wait_for_random_bytes and its 
derivatives. User space callers have to call getrandom(..., 0) to be 
registered in this wait queue. So, I think the wakeup calls I have in the LRNG 
for lrng_init_wait should remain.

- 6a26a3146e5fb90878dca9fde8caa1ca4233156a: My handlers for /dev/urandom and 
getrandom(..., 0) use one callback which issues a warning in both use 
cases (see lrng_sdrng_read). So I think this patch may not be applicable as 
the LRNG code already implements the warning about being unseeded.

- 3e8e159da49b44ae0bb08e68fa2be760722fa033: I am happy to take that code which 
would almost directly apply. The last hunk however would be:

if (!(flags & GRND_INSECURE) && unlikely(!lrng_state_operational())) {

==> Shall I apply it to my code base? If yes, how shall the changes to 
random.h be handled?


- 920e97e7fc508e6f0da9c7dec94c8073fd63ab4d: I would pass on this patch for 
the following reason: it unconditionally starts removing access to the TRNG 
(the LRNG's logical equivalent to the blocking_pool). As patch 10/12 of the 
LRNG patch series provides the TRNG as a compile-time option, your patch would 
be logically and functionally equivalent to deselecting 
CONFIG_LRNG_TRNG_SUPPORT in the LRNG without any further changes to the LRNG 
code.

- 693b9ffdf0fdc93456b5ad293ac05edf240a531b: This patch is applicable to the 
LRNG. In case CONFIG_LRNG_TRNG_SUPPORT is not set, the TRNG is not present. 
Yet, /dev/random and getrandom(GRND_RANDOM) should still block until the LRNG 
is fully initialized. I have now added this general blocking until the LRNG is 
fully initialized to lrng_trng_read_common, the common interface function of 
/dev/random and getrandom(GRND_RANDOM). With that, the LRNG would be 
fully equivalent to this patch if CONFIG_LRNG_TRNG_SUPPORT is not set.

- 66f660842ec6d34134b9c3c1c9c65972834797f6: This patch is implicitly covered 
when CONFIG_LRNG_TRNG_SUPPORT is not set.

- d8f59b5c25af22fb9d85b7fa96de601ea03f2eac: This patch is not applicable to 
the LRNG as the deactivation of CONFIG_LRNG_TRNG_SUPPORT implies that there 
should be no unused code left in the LRNG.

- 4046ac638761821aef67af10537ebcbc80715785: In theory that patch is applicable 
to the LRNG as well. The LRNG has the lrng_read_wait queue. If 
CONFIG_LRNG_TRNG_SUPPORT is not set, the code that adds a caller to this wait 
queue will never be triggered. To avoid cluttering the LRNG code with 
ifdefs, may I suggest leaving these few lines in place even though they are 
dead code?
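
To make the two wake-up thresholds of lrng_init_wait and the effect of the 
GRND_INSECURE hunk easier to discuss, here is a small user-space model. It is 
only a sketch, not LRNG code: all MODEL_* names and function names are made up 
for illustration, the flag value is arbitrary, and I assume that "operational" 
in the hunk above corresponds to the 256-bit level.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds as described for lrng_init_wait; names are hypothetical. */
#define MODEL_MIN_SEED_BITS   128u  /* in-kernel waiters are released */
#define MODEL_FULL_SEED_BITS  256u  /* getrandom(..., 0) callers are released */

/* Illustrative flag value for this model only. */
#define MODEL_GRND_INSECURE   0x4u

/* Kernel waiters must never be released later than user space waiters. */
static_assert(MODEL_MIN_SEED_BITS <= MODEL_FULL_SEED_BITS,
	      "kernel threshold must not exceed user space threshold");

/* Would a caller of wait_for_random_bytes() still sleep at this level? */
bool model_kernel_waiter_blocks(unsigned int entropy_bits)
{
	return entropy_bits < MODEL_MIN_SEED_BITS;
}

/* Would a user space getrandom() caller still sleep at this level? */
bool model_getrandom_blocks(unsigned int flags, unsigned int entropy_bits)
{
	/* With the insecure flag the caller never waits for seeding. */
	if (flags & MODEL_GRND_INSECURE)
		return false;
	return entropy_bits < MODEL_FULL_SEED_BITS;
}
```

With 128 bits collected, the model releases the kernel waiter while a 
getrandom(..., 0) caller keeps sleeping until 256 bits are reached.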



Bottom line: the only patch that seems to be relevant to me and that I would 
be happy to apply is the one adding GRND_INSECURE. All other patches are 
implicitly covered by deselecting CONFIG_LRNG_TRNG_SUPPORT.

By making the TRNG compile-time selectable, I was hoping to serve all users: I 
wanted to cover the conclusions of the discussion to remove the blocking_pool. 
On the other hand, however, I want to support requirements that need the 
blocking behavior.

The current LRNG patch set, however, defaults to Y for 
CONFIG_LRNG_TRNG_SUPPORT. I would see no issue if it defaults to N.


Thank you very much.

Ciao
Stephan
Andy Lutomirski Nov. 13, 2019, 4:48 a.m. UTC | #7
On Tue, Nov 12, 2019 at 8:25 PM Stephan Müller <smueller@chronox.de> wrote:
>
> Am Dienstag, 12. November 2019, 16:33:59 CET schrieb Andy Lutomirski:
>
> Hi Andy,
>
> > On Mon, Nov 11, 2019 at 11:13 AM Stephan Müller <smueller@chronox.de> wrote:
> > > The following patch set provides a different approach to /dev/random which
> > > is called Linux Random Number Generator (LRNG) to collect entropy within
> > > the Linux kernel. The main improvements compared to the existing
> > > /dev/random is to provide sufficient entropy during boot time as well as
> > > in virtual environments and when using SSDs. A secondary design goal is
> > > to limit the impact of the entropy collection on massive parallel systems
> > > and also allow the use accelerated cryptographic primitives. Also, all
> > > steps of the entropic data processing are testable.
> >
> > This is very nice!
> >
> > > The LRNG patch set allows a user to select use of the existing /dev/random
> > > or the LRNG during compile time. As the LRNG provides API and ABI
> > > compatible interfaces to the existing /dev/random implementation, the
> > > user can freely chose the RNG implementation without affecting kernel or
> > > user space operations.
> > >
> > > This patch set provides early boot-time entropy which implies that no
> > > additional flags to the getrandom(2) system call discussed recently on
> > > the LKML is considered to be necessary.
> >
> > I'm uneasy about this.  I fully believe that, *on x86*, this works.
> > But on embedded systems with in-order CPUs, a single clock, and very
> > lightweight boot processes, most or all of boot might be too
> > deterministic for this to work.
> >
> > I have a somewhat competing patch set here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=random/kill-it
> >
> > (Ignore the "horrible test hack" and the debugfs part.)
> >
> > The basic summary is that I change /dev/random so that it becomes
> > functionally identical to getrandom(..., 0) -- in other words, it
> > blocks until the CRNG is initialized but is then identical to
> > /dev/urandom.  And I add getrandom(...., GRND_INSECURE) that is
> > functionally identical to the existing /dev/urandom: it always returns
> > *something* immediately, but it may or may not actually be
> > cryptographically random or even random at all depending on system
> > details.
> >
> > In other words, my series simplifies the ABI that we support.  Right
> > now, we have three ways to ask for random numbers with different
> > semantics and we need to have two RNGs in the kernel at all times.  With
> > my changes, we have only two ways to ask for random numbers, and the
> > /dev/random pool is entirely gone.
> >
> > Would you be amenable to merging this into your series (i.e. either
> > merging the code or just the ideas)?  This would let you get rid of
> > things like the compile-time selection of the blocking TRNG, since the
> > blocking TRNG would be entirely gone.
>
> I pulled your code and found the following based on my explanation that I
> would suggest to keep the TRNG at least as an option.
>
> - 7d54ef8512b06baf396f12584f7f48a9558ecd0f does not seem applicable:

Not surprising.  It's just a cleanup to the existing code, and I doubt
you inherited the oddity I'm fixing.

> - 6a26a3146e5fb90878dca9fde8caa1ca4233156a: My handler for /dev/urandom and
> getrandom(..., 0) are using one callback which issues a warning in both use
> cases (see lrng_sdrng_read). So I think this patch may not be applicable as
> the LRNG code implements warning about being unseeded.

Probably true.

What is the actual semantics of /dev/urandom with your series applied?
 Is there any situation in which it will block?

>
> - 3e8e159da49b44ae0bb08e68fa2be760722fa033: I am happy to take that code which
> would almost directly apply. The last hunk however would be:
>
> if (!(flags & GRND_INSECURE) && unlikely(!lrng_state_operational())) {
>
> ==> Shall I apply it to my code base? If yes, how shall the changes to
> random.h be handled?
>

This might be a question for Ted.  Once the merge window opens, I'll
resubmit it.

>
> - 920e97e7fc508e6f0da9c7dec94c8073fd63ab4d: I would pass on this patch due to
> the following: it unconditionally starts removing the access to the TRNG (the
> LRNG's logical equivalent to the blocking_pool). As patch 10/12 of the LRNG
> patch series provides the TRNG that is a compile time option, your patch would
> logically and functionally be equivalent when deselecting
> CONFIG_LRNG_TRNG_SUPPORT in the LRNG without any further changes to the LRNG
> code.

Given your previous email about the TRNG, I'm wondering what the API
for the TRNG should be.  I am willing to grant that there are users
who need a TRNG for various reasons, and that not all of them can use
hwrng.  (And the current hwrng API is pretty bad.)  But I'm not
convinced that /dev/random or getrandom(..., GRND_RANDOM) is a
reasonable way to access it.  A blocking_pool-style TRNG is a very
limited resource, and I think it could make sense to require some sort
of actual permission to use it.  GRND_RANDOM has no access control at
all, and everyone expects /dev/random to be world-readable.  The most
widespread user of /dev/random that I know of is gnupg, and gnupg
really should not be using it.

Would it make sense to have a /dev/true_random that is 0400 by default
for users who actually need it?  Then /dev/random and GRND_RANDOM
could work as they do with my patch, and maybe it does the right thing
for everyone.

>
> - 693b9ffdf0fdc93456b5ad293ac05edf240a531b: This patch is applicable to the
> LRNG. In case CONFIG_LRNG_TRNG_SUPPORT is not set, the TRNG is not present.
> Yet, the /dev/random and getrandom(GRND_RANDOM) would behave blocked until
> fully initialized. I have now added the general blocking until the LRNG is
> fully initialized to the common /dev/random and getrandom(GRND_RANDOM)
> interface function of lrng_trng_read_common. With that, the LRNG would be
> fully equivalent to this patch if CONFIG_LRNG_TRNG_SUPPORT is not set.

Sounds reasonable.

> By making the TRNG compile-time selectable, I was hoping to serve all users: I
> wanted to cover the conclusions of the discussion to remove the blocking_pool.
> On the other hand, however, I want to support requirements that need the
> blocking behavior.

I find it odd that /dev/random would be either a TRNG or not a TRNG
depending on kernel configuration.  For the small fraction of users
that actually want a TRNG, wouldn't it be better to have an interface
that fails outright if the TRNG is not enabled?

--Andy
Stephan Mueller Nov. 13, 2019, 12:16 p.m. UTC | #8
Am Mittwoch, 13. November 2019, 05:48:30 CET schrieb Andy Lutomirski:

Hi Andy,

> 
> > - 6a26a3146e5fb90878dca9fde8caa1ca4233156a: My handler for /dev/urandom
> > and
> > getrandom(..., 0) are using one callback which issues a warning in both
> > use
> > cases (see lrng_sdrng_read). So I think this patch may not be applicable
> > as
> > the LRNG code implements warning about being unseeded.
> 
> Probably true.
> 
> What is the actual semantics of /dev/urandom with your series applied?
>  Is there any situation in which it will block?

The LRNG tries to provide a 100% identical user interface to the existing 
/dev/random:

- /dev/urandom never blocks

- getrandom(..., 0) blocks until the LRNG has received 256 bits of entropy 
(i.e. the LRNG is fully seeded)

Yet, both may issue a warning if CONFIG_WARN_ALL_UNSEEDED_RANDOM is set.
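
From the user space side, these semantics can be probed with the existing 
getrandom(2) interface. The following is a minimal sketch, not part of the 
patch set; the function names rng_is_seeded and fill_random are made up for 
illustration. It relies only on the documented behavior that 
getrandom(..., GRND_NONBLOCK) fails with EAGAIN while the RNG is not yet 
initialized.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Returns 1 if the kernel RNG is seeded, 0 if getrandom(..., 0) would
 * currently block, and -1 on any other failure.
 */
int rng_is_seeded(void)
{
	unsigned char probe;
	ssize_t ret = getrandom(&probe, sizeof(probe), GRND_NONBLOCK);

	if (ret == (ssize_t)sizeof(probe))
		return 1;
	if (ret < 0 && errno == EAGAIN)
		return 0;
	return -1;
}

/*
 * Fill buf with len random bytes using the blocking getrandom(..., 0).
 * Retries on EINTR and on short reads. Returns 0 on success, -1 on error.
 */
int fill_random(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t ret = getrandom(p, len, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += ret;
		len -= (size_t)ret;
	}
	return 0;
}
```

A caller that must not stall during early boot could check rng_is_seeded() 
first and only then call fill_random().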
> 
> > - 3e8e159da49b44ae0bb08e68fa2be760722fa033: I am happy to take that code
> > which would almost directly apply. The last hunk however would be:
> > 
> > if (!(flags & GRND_INSECURE) && unlikely(!lrng_state_operational())) {
> > 
> > ==> Shall I apply it to my code base? If yes, how shall the changes to
> > random.h be handled?
> 
> This might be a question for Ted.  Once the merge window opens, I'll
> resubmit it.

Ok, I will keep it out of the LRNG for now, but once your patch is merged, I 
would integrate it.
> 
> > - 920e97e7fc508e6f0da9c7dec94c8073fd63ab4d: I would pass on this patch due
> > to the following: it unconditionally starts removing the access to the
> > TRNG (the LRNG's logical equivalent to the blocking_pool). As patch 10/12
> > of the LRNG patch series provides the TRNG that is a compile time option,
> > your patch would logically and functionally be equivalent when
> > deselecting
> > CONFIG_LRNG_TRNG_SUPPORT in the LRNG without any further changes to the
> > LRNG code.
> 
> Given your previous email about the TRNG, I'm wondering what the API
> for the TRNG should be.  I am willing to grant that there are users
> who need a TRNG for various reasons, and that not all of them can use
> hwrng.  (And the current hwrng API is pretty bad.)  But I'm not
> convinced that /dev/random or getrandom(..., GRND_RANDOM) is a
> reasonable way to access it.  A blocking_pool-style TRNG is a very
> limited resource, and I think it could make sense to require some sort
> of actual permission to use it.  GRND_RANDOM has no access control at
> all, and everyone expects /dev/random to be world-readable.  The most
> widespread user of /dev/random that I know of is gnupg, and gnupg
> really should not be using it.
> 
> Would it make sense to have a /dev/true_random that is 0400 by default
> for users who actually need it?  Then /dev/random and GRND_RANDOM
> could work as they do with my patch, and maybe it does the right thing
> for everyone.

That is surely a reasonable way to do it. However, I am not sure 0400 should 
be applied; I would rather use 0440. This would allow introducing a group in 
user space so that processes which need the TRNG are not required to have root 
privilege, but only need to be a member of some otherwise unprivileged group.
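
To illustrate the difference between 0400 and 0440 for such a device node, 
here is a simplified model of the classic owner/group/other read check. It is 
only a sketch: the function name is made up, and supplementary groups, ACLs, 
and fine-grained capabilities are ignored (uid 0 is simply treated as always 
allowed).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* 0440 is exactly "owner may read" plus "group may read". */
static_assert((S_IRUSR | S_IRGRP) == 0440, "unexpected permission bit values");

/*
 * Simplified read permission check for a file with the given mode, owner
 * and group, accessed by a process with the given uid and primary gid.
 */
bool model_may_read(mode_t mode, uid_t owner, gid_t group, uid_t uid, gid_t gid)
{
	if (uid == 0)
		return true;			/* root bypass, simplified */
	if (uid == owner)
		return (mode & S_IRUSR) != 0;	/* owner class */
	if (gid == group)
		return (mode & S_IRGRP) != 0;	/* group class */
	return (mode & S_IROTH) != 0;		/* everyone else */
}
```

With a root-owned node, mode 0400 leaves only root as a reader, whereas mode 
0440 additionally admits unprivileged processes whose group matches the group 
of the node.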
> 
> > - 693b9ffdf0fdc93456b5ad293ac05edf240a531b: This patch is applicable to
> > the
> > LRNG. In case CONFIG_LRNG_TRNG_SUPPORT is not set, the TRNG is not
> > present.
> > Yet, the /dev/random and getrandom(GRND_RANDOM) would behave blocked until
> > fully initialized. I have now added the general blocking until the LRNG is
> > fully initialized to the common /dev/random and getrandom(GRND_RANDOM)
> > interface function of lrng_trng_read_common. With that, the LRNG would be
> > fully equivalent to this patch if CONFIG_LRNG_TRNG_SUPPORT is not set.
> 
> Sounds reasonable.
> 
> > By making the TRNG compile-time selectable, I was hoping to serve all
> > users: I wanted to cover the conclusions of the discussion to remove the
> > blocking_pool. On the other hand, however, I want to support requirements
> > that need the blocking behavior.
> 
> I find it odd that /dev/random would be either a TRNG or not a TRNG
> depending on kernel configuration.  For the small fraction of users
> that actually want a TRNG, wouldn't it be better to have an interface
> that fails outright if the TRNG is not enabled?

Sure, I would have no concerns here.

> 
> --Andy


Ciao
Stephan