diff mbox series

[v1] random: block in /dev/urandom

Message ID 20220217162848.303601-1-Jason@zx2c4.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series [v1] random: block in /dev/urandom | expand

Commit Message

Jason A. Donenfeld Feb. 17, 2022, 4:28 p.m. UTC
This topic has come up countless times, and usually doesn't go anywhere.
This time I thought I'd bring it up with a slightly narrower focus,
updated for some developments over the last three years: we finally can
make /dev/urandom always secure, in light of the fact that our RNG is
now always seeded.

Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
rather than passively wait for it"), the RNG does a haveged-style jitter
dance around the scheduler, in order to produce entropy (and credit it)
for the case when we're stuck in wait_for_random_bytes(). How ever you
feel about the Linus Jitter Dance is beside the point: it's been there
for three years and usually gets the RNG initialized in a second or so.

As a matter of fact, this is what happens currently when people use
getrandom(). It's already there and working, and most people have been
using it for years without realizing.

So, given that the kernel has grown this mechanism for seeding itself
from nothing, and that this procedure happens pretty fast, maybe there's
no point any longer in having /dev/urandom give insecure bytes. In the
past we didn't want the boot process to deadlock, which was
understandable. But now, in the worst case, a second goes by, and the
problem is resolved. It seems like maybe we're finally at a point when
we can get rid of the infamous "urandom read hole".

The one slight drawback is that the Linus Jitter Dance relies on random_
get_entropy() being implemented. The first lines of try_to_generate_
entropy() are:

	stack.now = random_get_entropy();
	if (stack.now == random_get_entropy())
		return;

On most platforms, random_get_entropy() is simply aliased to get_cycles().
The number of machines without a cycle counter or some other
implementation of random_get_entropy() in 2022, which can also run a
mainline kernel, and at the same time have a both broken and out of date
userspace that relies on /dev/urandom never blocking at boot is thought
to be exceedingly low. And to be clear: those museum pieces without
cycle counters will continue to run Linux just fine, and even
/dev/urandom will be operable just like before; the RNG just needs to be
seeded first through the usual means, which should already be the case
now.

On systems that really do want unseeded randomness, we already offer
getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
their hash tables at boot. Nothing in this commit would affect
GRND_INSECURE, and it remains the means of getting those types of random
numbers.

This patch goes a long way toward eliminating a long overdue userspace
crypto footgun. After several decades of endless user confusion, we will
finally be able to say, "use any single one of our random interfaces and
you'll be fine. They're all the same. It doesn't matter." And that, I
think, is really something. Finally all of those blog posts and
disagreeing forums and contradictory articles will all become correct
about whatever they happened to recommend, and along with it, a whole
class of vulnerabilities eliminated.

With very minimal downside, we're finally in a position where we can
make this change.

Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Having learned that MIPS32 isn't affected by this (initially my largest
worry), and then heartened today upon reading LWN's summary of our
previous discussion ("it would seem there are no huge barriers to
removing the final distinction between /dev/random and /dev/urandom"), I
figured I'd go ahead and submit a v1 of this. It seems at least worth
trying and seeing if somebody arrives with legitimate complaints. To
that end I've also widened the CC list quite a bit.

Changes v0->v1:
- We no longer touch GRND_INSECURE at all, in anyway. Lennart (and to an
  extent, Andy) pointed out that getting insecure numbers immediately at
  boot is still something that has legitimate use cases, so this patch
  no longer touches that code.

 drivers/char/mem.c     |  2 +-
 drivers/char/random.c  | 51 ++++++------------------------------------
 include/linux/random.h |  2 +-
 3 files changed, 9 insertions(+), 46 deletions(-)

Comments

Andy Lutomirski Feb. 21, 2022, 6:01 p.m. UTC | #1
On Thu, Feb 17, 2022, at 8:28 AM, Jason A. Donenfeld wrote:
> This topic has come up countless times, and usually doesn't go anywhere.
> This time I thought I'd bring it up with a slightly narrower focus,
> updated for some developments over the last three years: we finally can
> make /dev/urandom always secure, in light of the fact that our RNG is
> now always seeded.
>
> Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
> rather than passively wait for it"), the RNG does a haveged-style jitter
> dance around the scheduler, in order to produce entropy (and credit it)
> for the case when we're stuck in wait_for_random_bytes(). How ever you
> feel about the Linus Jitter Dance is beside the point: it's been there
> for three years and usually gets the RNG initialized in a second or so.
>
> As a matter of fact, this is what happens currently when people use
> getrandom(). It's already there and working, and most people have been
> using it for years without realizing.
>
> So, given that the kernel has grown this mechanism for seeding itself
> from nothing, and that this procedure happens pretty fast, maybe there's
> no point any longer in having /dev/urandom give insecure bytes. In the
> past we didn't want the boot process to deadlock, which was
> understandable. But now, in the worst case, a second goes by, and the
> problem is resolved. It seems like maybe we're finally at a point when
> we can get rid of the infamous "urandom read hole".
>

This patch is 100% about a historical mistake.  Way back when (not actually that long ago), there were two usable interfaces to the random number generator: /dev/random and /dev/urandom.  /dev/random was, at least in principle, secure, but it blocked unnecessarily and was, therefore, incredibly slow.  It was totally unsuitable for repeated use by any sort of server.  /dev/urandom didn't block but was insecure if called too early.  *But* urandom was also the correct interface to get best-effort-i-need-them-right-now random bits.  The actual semantics that general crypography users wanted were not available.

Fast forward to today.  /dev/random has the correct semantics for cryptographic purposes.  getrandom() also has the correct semantics for cryptographic purposes and is reliable as such -- it is guaranteed to either not exist or to DTRT.  And best-effort users can use GRND_INSECURE or /dev/urandom.

If we imagine that every user program we care about uses GRND_INSECURE for best-effort and /dev/random or getrandom() without GRND_INSECURE for cryptography, then we're in great shape and this patch is irrelevant.

But we don't get to rely on that.  New kernels are supposed to be compatible with old userspace.  And with *old* userspace, we do not know whether /dev/urandom users want cryptographically secure output or whether they want insecure output.  And there is this window during boot that lasts, supposedly, up to 1 second, there is a massive difference. [0]

So, sorry, this patch is an ABI break.  You're reinterpreting any program that wanted best-effort randomness right after boot as wanting cryptographic randomness, this can delay boot by up to a second [0], and that's more than enough delay to be considered a break.

So I don't like this without a stronger justification and a clearer compatibility story.  I could *maybe* get on board if you had a urandom=insecure boot option to switch back to the old behavior and a very clear message like "random: startup of %s is delayed. Set urandom=insecure for faster boot if you do not need cryptographically secure urandom during boot", but I don't think this patch is okay otherwise.

Or we stick with the status quo and make the warning clearer.  "random: %s us using insecure urandom output.  Fix it to use getrandom() or /dev/rando as appropriate."

[0] I just booted 5.16 in a Skylake -rdrand,-rdseed VM and it took 1.14 seconds to initialize.
Jason A. Donenfeld Feb. 23, 2022, 5:02 p.m. UTC | #2
Hi Andy,

I think your analysis is a bit mismatched from the reality of the
situation. That reality is that cryptographic users still find
themselves using /dev/urandom, as that's been the "standard good
advice" for a very long time. And people are still encouraged to do
that, either out of ignorance or out of "compatibility". The
cryptographic problem is not going away.

Fixing this issue means, yes, adding a 1 second delay to the small
group of init system users who haven't switched to using
getrandom(GRND_INSECURE) for that less common usage (who even are
those users actually?). That's not breaking compatibility or breaking
userspace or breaking anything; that's accepting the reality of _how_
/dev/urandom is mostly used -- for crypto -- and making that usage
finally secure, at the expense of a 1 second delay for those other
users who haven't switched to getrandom(GRND_INSECURE) yet. That seems
like a _very_ small price to pay for eliminating a footgun.

And in general, deemphasizing the rare performance of the less common
usage in favor of fixing a commonly triggered footgun seems on par
with how things morph and change over time. There's no actual
breakage. There's no ABI change violation. What you're saying simply
isn't so.

In other words, I'm not really at all convinced by what you're saying.

Jason
Theodore Ts'o Feb. 23, 2022, 5:55 p.m. UTC | #3
On Wed, Feb 23, 2022 at 06:02:52PM +0100, Jason A. Donenfeld wrote:
> 
> I think your analysis is a bit mismatched from the reality of the
> situation. That reality is that cryptographic users still find
> themselves using /dev/urandom, as that's been the "standard good
> advice" for a very long time. And people are still encouraged to do
> that, either out of ignorance or out of "compatibility". The
> cryptographic problem is not going away.

Or they open /dev/urandom because getrandom() and getentropy() isn't
available on some OS's (all the world is not Linux, despite what the
systemd folks like to believe), and some other OS's have a
/dev/urandom-like device that they can open, and so it's just more
convenient for application programers to open and read from
/dev/urandom.

> Fixing this issue means, yes, adding a 1 second delay to the small
> group of init system users who haven't switched to using
> getrandom(GRND_INSECURE) for that less common usage (who even are
> those users actually?). That's not breaking compatibility or breaking
> userspace or breaking anything; that's accepting the reality of _how_
> /dev/urandom is mostly used -- for crypto -- and making that usage
> finally secure, at the expense of a 1 second delay for those other
> users who haven't switched to getrandom(GRND_INSECURE) yet. That seems
> like a _very_ small price to pay for eliminating a footgun.

I agree.  So long as we're only blocking for short amount of time, and
only during early after the system was booted, people shouldn't care.
The reason why we had to add the "gee-I-hope-this-jitterentropy-like-
hack-is-actually-secure on all architectures but it's better than the
alternatives people were trying to get Linus to adopt" was because
there were systems were hanging for hours or days.

      	   	   		    	  - Ted
Eric Biggers March 12, 2022, 8:17 p.m. UTC | #4
On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> This topic has come up countless times, and usually doesn't go anywhere.
> This time I thought I'd bring it up with a slightly narrower focus,
> updated for some developments over the last three years: we finally can
> make /dev/urandom always secure, in light of the fact that our RNG is
> now always seeded.
> 
> Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
> rather than passively wait for it"), the RNG does a haveged-style jitter
> dance around the scheduler, in order to produce entropy (and credit it)
> for the case when we're stuck in wait_for_random_bytes(). How ever you
> feel about the Linus Jitter Dance is beside the point: it's been there
> for three years and usually gets the RNG initialized in a second or so.
> 
> As a matter of fact, this is what happens currently when people use
> getrandom(). It's already there and working, and most people have been
> using it for years without realizing.
> 
> So, given that the kernel has grown this mechanism for seeding itself
> from nothing, and that this procedure happens pretty fast, maybe there's
> no point any longer in having /dev/urandom give insecure bytes. In the
> past we didn't want the boot process to deadlock, which was
> understandable. But now, in the worst case, a second goes by, and the
> problem is resolved. It seems like maybe we're finally at a point when
> we can get rid of the infamous "urandom read hole".
> 
> The one slight drawback is that the Linus Jitter Dance relies on random_
> get_entropy() being implemented. The first lines of try_to_generate_
> entropy() are:
> 
> 	stack.now = random_get_entropy();
> 	if (stack.now == random_get_entropy())
> 		return;
> 
> On most platforms, random_get_entropy() is simply aliased to get_cycles().
> The number of machines without a cycle counter or some other
> implementation of random_get_entropy() in 2022, which can also run a
> mainline kernel, and at the same time have a both broken and out of date
> userspace that relies on /dev/urandom never blocking at boot is thought
> to be exceedingly low. And to be clear: those museum pieces without
> cycle counters will continue to run Linux just fine, and even
> /dev/urandom will be operable just like before; the RNG just needs to be
> seeded first through the usual means, which should already be the case
> now.
> 
> On systems that really do want unseeded randomness, we already offer
> getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
> their hash tables at boot. Nothing in this commit would affect
> GRND_INSECURE, and it remains the means of getting those types of random
> numbers.
> 
> This patch goes a long way toward eliminating a long overdue userspace
> crypto footgun. After several decades of endless user confusion, we will
> finally be able to say, "use any single one of our random interfaces and
> you'll be fine. They're all the same. It doesn't matter." And that, I
> think, is really something. Finally all of those blog posts and
> disagreeing forums and contradictory articles will all become correct
> about whatever they happened to recommend, and along with it, a whole
> class of vulnerabilities eliminated.
> 
> With very minimal downside, we're finally in a position where we can
> make this change.
> 
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> Cc: Nick Hu <nickhu@andestech.com>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Joshua Kinard <kumba@gentoo.org>
> Cc: David Laight <David.Laight@aculab.com>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Eric Biggers <ebiggers@google.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Having learned that MIPS32 isn't affected by this (initially my largest
> worry), and then heartened today upon reading LWN's summary of our
> previous discussion ("it would seem there are no huge barriers to
> removing the final distinction between /dev/random and /dev/urandom"), I
> figured I'd go ahead and submit a v1 of this. It seems at least worth
> trying and seeing if somebody arrives with legitimate complaints. To
> that end I've also widened the CC list quite a bit.
> 
> Changes v0->v1:
> - We no longer touch GRND_INSECURE at all, in anyway. Lennart (and to an
>   extent, Andy) pointed out that getting insecure numbers immediately at
>   boot is still something that has legitimate use cases, so this patch
>   no longer touches that code.
> 
>  drivers/char/mem.c     |  2 +-
>  drivers/char/random.c  | 51 ++++++------------------------------------
>  include/linux/random.h |  2 +-
>  3 files changed, 9 insertions(+), 46 deletions(-)
> 

Just a small nit: the comments above rng_is_initialized() and
wait_for_random_bytes() still imply that /dev/urandom is nonblocking.

- Eric
Eric Biggers March 12, 2022, 8:27 p.m. UTC | #5
On Sat, Mar 12, 2022 at 12:17:09PM -0800, Eric Biggers wrote:
> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> > This topic has come up countless times, and usually doesn't go anywhere.
> > This time I thought I'd bring it up with a slightly narrower focus,
> > updated for some developments over the last three years: we finally can
> > make /dev/urandom always secure, in light of the fact that our RNG is
> > now always seeded.
> > 
> > Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
> > rather than passively wait for it"), the RNG does a haveged-style jitter
> > dance around the scheduler, in order to produce entropy (and credit it)
> > for the case when we're stuck in wait_for_random_bytes(). How ever you
> > feel about the Linus Jitter Dance is beside the point: it's been there
> > for three years and usually gets the RNG initialized in a second or so.
> > 
> > As a matter of fact, this is what happens currently when people use
> > getrandom(). It's already there and working, and most people have been
> > using it for years without realizing.
> > 
> > So, given that the kernel has grown this mechanism for seeding itself
> > from nothing, and that this procedure happens pretty fast, maybe there's
> > no point any longer in having /dev/urandom give insecure bytes. In the
> > past we didn't want the boot process to deadlock, which was
> > understandable. But now, in the worst case, a second goes by, and the
> > problem is resolved. It seems like maybe we're finally at a point when
> > we can get rid of the infamous "urandom read hole".
> > 
> > The one slight drawback is that the Linus Jitter Dance relies on random_
> > get_entropy() being implemented. The first lines of try_to_generate_
> > entropy() are:
> > 
> > 	stack.now = random_get_entropy();
> > 	if (stack.now == random_get_entropy())
> > 		return;
> > 
> > On most platforms, random_get_entropy() is simply aliased to get_cycles().
> > The number of machines without a cycle counter or some other
> > implementation of random_get_entropy() in 2022, which can also run a
> > mainline kernel, and at the same time have a both broken and out of date
> > userspace that relies on /dev/urandom never blocking at boot is thought
> > to be exceedingly low. And to be clear: those museum pieces without
> > cycle counters will continue to run Linux just fine, and even
> > /dev/urandom will be operable just like before; the RNG just needs to be
> > seeded first through the usual means, which should already be the case
> > now.
> > 
> > On systems that really do want unseeded randomness, we already offer
> > getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
> > their hash tables at boot. Nothing in this commit would affect
> > GRND_INSECURE, and it remains the means of getting those types of random
> > numbers.
> > 
> > This patch goes a long way toward eliminating a long overdue userspace
> > crypto footgun. After several decades of endless user confusion, we will
> > finally be able to say, "use any single one of our random interfaces and
> > you'll be fine. They're all the same. It doesn't matter." And that, I
> > think, is really something. Finally all of those blog posts and
> > disagreeing forums and contradictory articles will all become correct
> > about whatever they happened to recommend, and along with it, a whole
> > class of vulnerabilities eliminated.
> > 
> > With very minimal downside, we're finally in a position where we can
> > make this change.
> > 
> > Cc: Dinh Nguyen <dinguyen@kernel.org>
> > Cc: Nick Hu <nickhu@andestech.com>
> > Cc: Max Filippov <jcmvbkbc@gmail.com>
> > Cc: Palmer Dabbelt <palmer@dabbelt.com>
> > Cc: David S. Miller <davem@davemloft.net>
> > Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> > Cc: Michal Simek <monstr@monstr.eu>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Guo Ren <guoren@kernel.org>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: Joshua Kinard <kumba@gentoo.org>
> > Cc: David Laight <David.Laight@aculab.com>
> > Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> > Cc: Eric Biggers <ebiggers@google.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Theodore Ts'o <tytso@mit.edu>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > ---
> > Having learned that MIPS32 isn't affected by this (initially my largest
> > worry), and then heartened today upon reading LWN's summary of our
> > previous discussion ("it would seem there are no huge barriers to
> > removing the final distinction between /dev/random and /dev/urandom"), I
> > figured I'd go ahead and submit a v1 of this. It seems at least worth
> > trying and seeing if somebody arrives with legitimate complaints. To
> > that end I've also widened the CC list quite a bit.
> > 
> > Changes v0->v1:
> > - We no longer touch GRND_INSECURE at all, in anyway. Lennart (and to an
> >   extent, Andy) pointed out that getting insecure numbers immediately at
> >   boot is still something that has legitimate use cases, so this patch
> >   no longer touches that code.
> > 
> >  drivers/char/mem.c     |  2 +-
> >  drivers/char/random.c  | 51 ++++++------------------------------------
> >  include/linux/random.h |  2 +-
> >  3 files changed, 9 insertions(+), 46 deletions(-)
> > 
> 
> Just a small nit: the comments above rng_is_initialized() and
> wait_for_random_bytes() still imply that /dev/urandom is nonblocking.
> 

Also the comment describing 'Fast key erasure RNG, the "crng".' still claims
that get_random_bytes(), get_random_u32(), etc. are "equivalent to a read from
/dev/urandom".  With this patch, they're not, since they don't block whereas
/dev/urandom will block.

- Eric
Guenter Roeck March 22, 2022, 3:58 p.m. UTC | #6
On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> This topic has come up countless times, and usually doesn't go anywhere.
> This time I thought I'd bring it up with a slightly narrower focus,
> updated for some developments over the last three years: we finally can
> make /dev/urandom always secure, in light of the fact that our RNG is
> now always seeded.
> 

[ ... ]

This patch (or a later version of it) made it into mainline and causes a
large number of qemu boot test failures for various architectures (arm,
m68k, microblaze, sparc32, xtensa are the ones I observed). Common
denominator is that boot hangs at "Saving random seed:". A sample bisect
log is attached. Reverting this patch fixes the problem.

Guenter

---
# bad: [8565d64430f8278bea38dab0a3ab60b4e11c71e4] Merge tag 'bounds-fixes-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
# good: [f443e374ae131c168a065ea1748feac6b2e76613] Linux 5.17
git bisect start 'HEAD' 'v5.17'
# bad: [5628b8de1228436d47491c662dc521bc138a3d43] Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
git bisect bad 5628b8de1228436d47491c662dc521bc138a3d43
# good: [a04b1bf574e1f4875ea91f5c62ca051666443200] Merge tag 'for-5.18/parisc-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good a04b1bf574e1f4875ea91f5c62ca051666443200
# good: [242ba6656d604aa8dc87451fc08143cb28d5a587] Merge tag 'acpi-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 242ba6656d604aa8dc87451fc08143cb28d5a587
# good: [02b82b02c34321dde10d003aafcd831a769b2a8a] Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 02b82b02c34321dde10d003aafcd831a769b2a8a
# bad: [77553cf8f44863b31da242cf24671d76ddb61597] random: don't let 644 read-only sysctls be written to
git bisect bad 77553cf8f44863b31da242cf24671d76ddb61597
# good: [a07fdae346c35c6ba286af1c88e0effcfa330bf9] random: add proper SPDX header
git bisect good a07fdae346c35c6ba286af1c88e0effcfa330bf9
# good: [58340f8e952b613e0ead0bed58b97b05bf4743c5] random: defer fast pool mixing to worker
git bisect good 58340f8e952b613e0ead0bed58b97b05bf4743c5
# good: [da3951ebdcd1cb1d5c750e08cd05aee7b0c04d9a] random: round-robin registers as ulong, not u32
git bisect good da3951ebdcd1cb1d5c750e08cd05aee7b0c04d9a
# good: [abded93ec1e9692920fe309f07f40bd1035f2940] random: unify cycles_t and jiffies usage and types
git bisect good abded93ec1e9692920fe309f07f40bd1035f2940
# bad: [6f98a4bfee72c22f50aedb39fb761567969865fe] random: block in /dev/urandom
git bisect bad 6f98a4bfee72c22f50aedb39fb761567969865fe
# good: [c2a7de4feb6e09f23af7accc0f882a8fa92e7ae5] random: do crng pre-init loading in worker rather than irq
git bisect good c2a7de4feb6e09f23af7accc0f882a8fa92e7ae5
# first bad commit: [6f98a4bfee72c22f50aedb39fb761567969865fe] random: block in /dev/urandom
Linus Torvalds March 22, 2022, 4:21 p.m. UTC | #7
On Tue, Mar 22, 2022 at 8:58 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> This patch (or a later version of it) made it into mainline and causes a
> large number of qemu boot test failures for various architectures (arm,
> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> denominator is that boot hangs at "Saving random seed:". A sample bisect
> log is attached. Reverting this patch fixes the problem.

Ok, it was worth trying, but yeah, it clearly causes problems for
various platforms that can't do jitter entropy and have nothing else
happening either.

Will revert.

Thanks.

               Linus
Jason A. Donenfeld March 22, 2022, 4:40 p.m. UTC | #8
Hi Linus,

On Tue, Mar 22, 2022 at 10:27 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Mar 22, 2022 at 8:58 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > This patch (or a later version of it) made it into mainline and causes a
> > large number of qemu boot test failures for various architectures (arm,
> > m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> > denominator is that boot hangs at "Saving random seed:". A sample bisect
> > log is attached. Reverting this patch fixes the problem.
>
> Ok, it was worth trying, but yeah, it clearly causes problems for
> various platforms that can't do jitter entropy and have nothing else
> happening either.
>
> Will revert.

Shucks. I wish I had a good argument here, but I suppose the most I
can say is maybe we'll be able to revisit it some time off. I'll send
you the revert patch in a minute.

Jason
Jason A. Donenfeld March 22, 2022, 5:09 p.m. UTC | #9
Hey Guenter,

On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> > This topic has come up countless times, and usually doesn't go anywhere.
> > This time I thought I'd bring it up with a slightly narrower focus,
> > updated for some developments over the last three years: we finally can
> > make /dev/urandom always secure, in light of the fact that our RNG is
> > now always seeded.
> > 
> 
> [ ... ]
> 
> This patch (or a later version of it) made it into mainline and causes a
> large number of qemu boot test failures for various architectures (arm,
> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> denominator is that boot hangs at "Saving random seed:". A sample bisect
> log is attached. Reverting this patch fixes the problem.

As Linus said, it was worth a try, but I guess it just didn't work. For
my own curiosity, though, do you have a link to those QEMU VMs you could
share? I'd sort of like to poke around, and if we do ever reattempt this
sometime down the road, it seems like understanding everything about why
the previous time failed might be a good idea.

Jason
Guenter Roeck March 22, 2022, 5:56 p.m. UTC | #10
On 3/22/22 10:09, Jason A. Donenfeld wrote:
> Hey Guenter,
> 
> On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
>> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
>>> This topic has come up countless times, and usually doesn't go anywhere.
>>> This time I thought I'd bring it up with a slightly narrower focus,
>>> updated for some developments over the last three years: we finally can
>>> make /dev/urandom always secure, in light of the fact that our RNG is
>>> now always seeded.
>>>
>>
>> [ ... ]
>>
>> This patch (or a later version of it) made it into mainline and causes a
>> large number of qemu boot test failures for various architectures (arm,
>> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
>> denominator is that boot hangs at "Saving random seed:". A sample bisect
>> log is attached. Reverting this patch fixes the problem.
> 
> As Linus said, it was worth a try, but I guess it just didn't work. For
> my own curiosity, though, do you have a link to those QEMU VMs you could
> share? I'd sort of like to poke around, and if we do ever reattempt this
> sometime down the road, it seems like understanding everything about why
> the previous time failed might be a good idea.
> 

Everything - including the various root file systems - is at
git@github.com:groeck/linux-build-test.git. Look into rootfs/ for the
various boot tests. I'll be happy to provide some qemu command lines
if needed.

Guenter
Jason A. Donenfeld March 22, 2022, 6:19 p.m. UTC | #11
Hi Guenter,

On Tue, Mar 22, 2022 at 10:56:44AM -0700, Guenter Roeck wrote:
> Everything - including the various root file systems - is at
> git@github.com:groeck/linux-build-test.git. Look into rootfs/ for the
> various boot tests. I'll be happy to provide some qemu command lines
> if needed.

Thanks. It looks like the "problem" is with this shell script:

  init_rng() {
    if check_file_size; then
      printf 'Initializing random number generator: '
      dd if="$URANDOM_SEED" bs="$pool_size" of=/dev/urandom count=1 2> /dev/null
      status=$?
      if [ "$status" -eq 0 ]; then
        echo "OK"
      else
        echo "FAIL"
      fi
      return "$status"
    fi
  }
  
  save_random_seed() {
    printf 'Saving random seed: '
    if touch "$URANDOM_SEED" 2> /dev/null; then
      old_umask=$(umask)
      umask 077
      dd if=/dev/urandom of="$URANDOM_SEED" bs="$pool_size" count=1 2> /dev/null
      status=$?
      umask "$old_umask"
      if [ "$status" -eq 0 ]; then
        echo "OK"
      else
        echo "FAIL"
      fi
    else
      status=$?
      echo "SKIP (read-only file system detected)"
    fi
    return "$status"
  }

  case "$1" in
    start|restart|reload)
      # Carry a random seed from start-up to start-up
      # Load and then save the whole entropy pool
      init_rng && save_random_seed;;

This code is actually problematic for a number of reasons. (And Linus,
I'm not saying "userspace is wrong" to justify breaking it or something,
don't worry.)

The first `dd if="$URANDOM_SEED" bs="$pool_size" of=/dev/urandom count=1`
will write the seed into the input pool, but:

  - It won't credit the entropy from that seed, so the pool won't
    actually initialize. (You need to use the ioctl to credit it.)
  - Because the pool doesn't initialize, subsequent reads from
    /dev/urandom won't actually use that seed.

The first point is why we had to revert this patch. But the second one
is actually a bit dangerous: you might write in a perfectly good seed to
/dev/urandom, but what you read out for the subsequent seed may be
complete deterministic crap. This is because the call to write_pool()
goes right into the input pool and doesn't touch any of the "fast init"
stuff, where we immediately mutate the crng key during early boot.

As far as I can tell, this has been the behavior for a really long time,
making the above innocuous pattern a pretty old thing that's broken. So
I could perhaps say, "this behavior is so old now, that your userspace
code is just plain broken," but I think I might actually have a very
quick unobtrusive fix for this. I'll mull some things over for rc2 or
later in rc1.

But, anyway, this only fixes the second point mentioned above. The first
one -- which resulted in the revert -- remains a stumper for now.

Jason
Mark Brown March 22, 2022, 6:24 p.m. UTC | #12
On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:

> This patch (or a later version of it) made it into mainline and causes a
> large number of qemu boot test failures for various architectures (arm,
> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> denominator is that boot hangs at "Saving random seed:". A sample bisect
> log is attached. Reverting this patch fixes the problem.

Just as a datapoint for debugging at least qemu/arm is getting coverage
in CI systems (KernelCI is covering a bunch of different emulated
machines and LKFT has at least one configuration as well, clang's tests
have some wider architecture coverage as well I think) and they don't
seem to be seeing any problems - there's some other variable in there.

For example current basic boot tests for KernelCI are at:

   https://linux.kernelci.org/test/job/mainline/branch/master/kernel/v5.17-1442-gb47d5a4f6b8d/plan/baseline/

for mainline and -next has:

   https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20220322/plan/baseline/

These are with a buildroot based rootfs that has a "Saving random seed: " 
step in the boot process FWIW.
Linus Torvalds March 22, 2022, 6:29 p.m. UTC | #13
On Tue, Mar 22, 2022 at 11:19 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> The first point is why we had to revert this patch. But the second one
> is actually a bit dangerous: you might write in a perfectly good seed to
> /dev/urandom, but what you read out for the subsequent seed may be
> complete deterministic crap. This is because the call to write_pool()
> goes right into the input pool and doesn't touch any of the "fast init"
> stuff, where we immediately mutate the crng key during early boot.

Christ, how I hate the crazy "no entropy means that we can't use it".

It's a disease, I tell you.

And it seems to be the direct cause of this misfeature.

By all means the code can say "I can't credit this as entropy", but
the fact that it then doesn't even mix it into the fast pool is just
wrong, wrong, wrong.

I think *that* is what we should fix. The fact is, urandom has
long-standing semantics as "don't block", and that it shouldn't care
about the (often completely insane) entropy crediting rules.

But that "don't care about entropy rules" should then also mean "oh,
we'll mix things in even if we don't credit entropy".

I hope that's the easy fix you are thinking about.

              Linus
Jason A. Donenfeld March 22, 2022, 6:36 p.m. UTC | #14
Hi Linus,

On Tue, Mar 22, 2022 at 12:29 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Christ, how I hate the crazy "no entropy means that we can't use it".
>
> It's a disease, I tell you.
>
> And it seems to be the direct cause of this misfeature.

A disease indeed.

> By all means the code can say "I can't credit this as entropy", but
> the fact that it then doesn't even mix it into the fast pool is just
> wrong, wrong, wrong.
>
> I think *that* is what we should fix. The fact is, urandom has
> long-standing semantics as "don't block", and that it shouldn't care
> about the (often completely insane) entropy crediting rules.
>
> But that "don't care about entropy rules" should then also mean "oh,
> we'll mix things in even if we don't credit entropy".
>
> I hope that's the easy fix you are thinking about.

Yes, exactly. And the patch to fix it is literally 2 lines. I'm
playing with it now and I'll think about it a bit and hopefully have
something for you to pull not before long.

In general, your intuition is correct, I think, that the entropy
crediting scheme is sort of insane and leads to problems. As I wrote
in <https://www.zx2c4.com/projects/linux-rng-5.17-5.18/> at the end,
other RNG schemes like Fortuna don't really suffer from this in the
same way because they're not even counting entropy. This might be
something to look at seriously in the future.

Jason
Guenter Roeck March 22, 2022, 9:54 p.m. UTC | #15
On 3/22/22 11:24, Mark Brown wrote:
> On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> 
>> This patch (or a later version of it) made it into mainline and causes a
>> large number of qemu boot test failures for various architectures (arm,
>> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
>> denominator is that boot hangs at "Saving random seed:". A sample bisect
>> log is attached. Reverting this patch fixes the problem.
> 
> Just as a datapoint for debugging at least qemu/arm is getting coverage
> in CI systems (KernelCI is covering a bunch of different emulated
> machines and LKFT has at least one configuration as well, clang's tests
> have some wider architecture coverage as well I think) and they don't
> seem to be seeing any problems - there's some other variable in there.
> 
> For example current basic boot tests for KernelCI are at:
> 
>     https://linux.kernelci.org/test/job/mainline/branch/master/kernel/v5.17-1442-gb47d5a4f6b8d/plan/baseline/
> 
> for mainline and -next has:
> 
>     https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20220322/plan/baseline/
> 
> These are with a buildroot based rootfs that has a "Saving random seed: "
> step in the boot process FWIW.

I use buildroot 2021.02.3. I have not changed the buildroot code, and it
still seems to be the same in 2022.02. I don't see the problem with all
boot tests, only with the architectures mentioned above, and not with all
qemu machines on the affected platforms. For arm, mostly older machines
are affected (versatile, realview, pxa configurations, collie, integratorcp,
sx1, mps2-an385, vexpress-a9, cubieboard). I didn't check, but maybe
kernelci doesn't test those machines ?

Guenter
David Laight March 22, 2022, 10:25 p.m. UTC | #16
From: Guenter Roeck
> Sent: 22 March 2022 21:54
> 
> On 3/22/22 11:24, Mark Brown wrote:
> > On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> >
> >> This patch (or a later version of it) made it into mainline and causes a
> >> large number of qemu boot test failures for various architectures (arm,
> >> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> >> denominator is that boot hangs at "Saving random seed:". A sample bisect
> >> log is attached. Reverting this patch fixes the problem.
> >
> > Just as a datapoint for debugging at least qemu/arm is getting coverage
> > in CI systems (KernelCI is covering a bunch of different emulated
> > machines and LKFT has at least one configuration as well, clang's tests
> > have some wider architecture coverage as well I think) and they don't
> > seem to be seeing any problems - there's some other variable in there.
> >
> > For example current basic boot tests for KernelCI are at:
> >
> >     https://linux.kernelci.org/test/job/mainline/branch/master/kernel/v5.17-1442-
> gb47d5a4f6b8d/plan/baseline/
> >
> > for mainline and -next has:
> >
> >     https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20220322/plan/baseline/
> >
> > These are with a buildroot based rootfs that has a "Saving random seed: "
> > step in the boot process FWIW.
> 
> I use buildroot 2021.02.3. I have not changed the buildroot code, and it
> still seems to be the same in 2022.02. I don't see the problem with all
> boot tests, only with the architectures mentioned above, and not with all
> qemu machines on the affected platforms. For arm, mostly older machines
> are affected (versatile, realview, pxa configurations, collie, integratorcp,
> sx1, mps2-an385, vexpress-a9, cubieboard). I didn't check, but maybe
> kernelci doesn't test those machines ?

I was trying to fix the buildroot save/restore random seed of a system
of mine.
I thought I'd fixed it - needed to use a persistent filesystem.
But I can't get rid of the 'uninitialised random read' messages.
(Which I expected to go away after writing the seed.)
But a quick look at the kernel code didn't seem to credit the
write into the correct logic.
I didn't check whether the data actually got used though.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Mark Brown March 23, 2022, 12:10 p.m. UTC | #17
On Tue, Mar 22, 2022 at 02:54:20PM -0700, Guenter Roeck wrote:
> On 3/22/22 11:24, Mark Brown wrote:

> > Just as a datapoint for debugging at least qemu/arm is getting coverage
> > in CI systems (KernelCI is covering a bunch of different emulated
> > machines and LKFT has at least one configuration as well, clang's tests
> > have some wider architecture coverage as well I think) and they don't
> > seem to be seeing any problems - there's some other variable in there.

> I use buildroot 2021.02.3. I have not changed the buildroot code, and it
> still seems to be the same in 2022.02. I don't see the problem with all
> boot tests, only with the architectures mentioned above, and not with all
> qemu machines on the affected platforms. For arm, mostly older machines
> are affected (versatile, realview, pxa configurations, collie, integratorcp,
> sx1, mps2-an385, vexpress-a9, cubieboard). I didn't check, but maybe
> kernelci doesn't test those machines ?

Kind of academic given that Jason seems to have a handle on what the
issues are but for KernelCI it's variations on mach-virt, plus
versatile-pb.  There's a physical cubietruck as well, and BeagleBone
Blacks among others.  My best guess would be systems with low RAM are
somehow more prone to issues.
Guenter Roeck March 23, 2022, 2:23 p.m. UTC | #18
On 3/23/22 05:10, Mark Brown wrote:
> On Tue, Mar 22, 2022 at 02:54:20PM -0700, Guenter Roeck wrote:
>> On 3/22/22 11:24, Mark Brown wrote:
> 
>>> Just as a datapoint for debugging at least qemu/arm is getting coverage
>>> in CI systems (KernelCI is covering a bunch of different emulated
>>> machines and LKFT has at least one configuration as well, clang's tests
>>> have some wider architecture coverage as well I think) and they don't
>>> seem to be seeing any problems - there's some other variable in there.
> 
>> I use buildroot 2021.02.3. I have not changed the buildroot code, and it
>> still seems to be the same in 2022.02. I don't see the problem with all
>> boot tests, only with the architectures mentioned above, and not with all
>> qemu machines on the affected platforms. For arm, mostly older machines
>> are affected (versatile, realview, pxa configurations, collie, integratorcp,
>> sx1, mps2-an385, vexpress-a9, cubieboard). I didn't check, but maybe
>> kernelci doesn't test those machines ?
> 
> Kind of academic given that Jason seems to have a handle on what the
> issues are but for KernelCI it's variations on mach-virt, plus
> versatile-pb.  There's a physical cubietruck as well, and BeagleBone
> Blacks among others.  My best guess would be systems with low RAM are
> somehow more prone to issues.

I don't think it is entirely academic. versatile-pb fails for me;
if it doesn't fail at KernelCI, I'd like to understand why - not to
fix it in my test environment, but to make sure that I _don't_ fix it.
After all, it _is_ a regression. Even if that regression is triggered
by bad (for a given definition of "bad") userspace code, it is still
a regression.

Thanks,
Guenter
Arnd Bergmann March 23, 2022, 3:53 p.m. UTC | #19
On Wed, Mar 23, 2022 at 3:23 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 3/23/22 05:10, Mark Brown wrote:
> > On Tue, Mar 22, 2022 at 02:54:20PM -0700, Guenter Roeck wrote:
> > Kind of academic given that Jason seems to have a handle on what the
> > issues are but for KernelCI it's variations on mach-virt, plus
> > versatile-pb.  There's a physical cubietruck as well, and BeagleBone
> > Blacks among others.  My best guess would be systems with low RAM are
> > somehow more prone to issues.
>
> I don't think it is entirely academic. versatile-pb fails for me;
> if it doesn't fail at KernelCI, I'd like to understand why - not to
> fix it in my test environment, but to make sure that I _don't_ fix it.
> After all, it _is_ a regression. Even if that regression is triggered
> by bad (for a given definition of "bad") userspace code, it is still
> a regression.

Maybe kernelci has a virtio-rng device assigned to the machine
and you don't? That would clearly avoid the issue here.

        Arnd
Mark Brown March 23, 2022, 4:18 p.m. UTC | #20
On Wed, Mar 23, 2022 at 04:53:13PM +0100, Arnd Bergmann wrote:
> On Wed, Mar 23, 2022 at 3:23 PM Guenter Roeck <linux@roeck-us.net> wrote:

> > I don't think it is entirely academic. versatile-pb fails for me;
> > if it doesn't fail at KernelCI, I'd like to understand why - not to
> > fix it in my test environment, but to make sure that I _don't_ fix it.
> > After all, it _is_ a regression. Even if that regression is triggered
> > by bad (for a given definition of "bad") userspace code, it is still
> > a regression.

> Maybe kernelci has a virtio-rng device assigned to the machine
> and you don't? That would clearly avoid the issue here.

No, nothing I can see in the boot log:

https://storage.kernelci.org/next/master/next-20220323/arm/versatile_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-versatilepb.html

and I'd be surprised if virtio devices made it through with a specific
platform emulation.  However it looks like for that test the init
scripts didn't do anything with the random seed (possibly due to running
from ramdisk?) so we'd not have hit the condition.
Arnd Bergmann March 23, 2022, 4:41 p.m. UTC | #21
On Wed, Mar 23, 2022 at 5:18 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Mar 23, 2022 at 04:53:13PM +0100, Arnd Bergmann wrote:
> > On Wed, Mar 23, 2022 at 3:23 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> > > I don't think it is entirely academic. versatile-pb fails for me;
> > > if it doesn't fail at KernelCI, I'd like to understand why - not to
> > > fix it in my test environment, but to make sure that I _don't_ fix it.
> > > After all, it _is_ a regression. Even if that regression is triggered
> > > by bad (for a given definition of "bad") userspace code, it is still
> > > a regression.
>
> > Maybe kernelci has a virtio-rng device assigned to the machine
> > and you don't? That would clearly avoid the issue here.
>
> No, nothing I can see in the boot log:
>
> https://storage.kernelci.org/next/master/next-20220323/arm/versatile_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-versatilepb.html
>
> and I'd be surprised if virtio devices made it through with a specific
> platform emulation.

In general they do: virtio devices appear as regular PCI devices
and get probed from there, as long as the drivers are available.

It looks like the PCI driver does not get initialized here though,
presumably because it's not enabled in versatile_defconfig.
It used to also not be enabled in multi_v5_defconfig, but I have
merged a patch from Anders that enables it in 5.18 for the
multi_v5_defconfig.

> However it looks like for that test the init
> scripts didn't do anything with the random seed (possibly due to running
> from ramdisk?) so we'd not have hit the condition.

Right.

     Arnd
Mark Brown March 23, 2022, 4:47 p.m. UTC | #22
On Wed, Mar 23, 2022 at 05:41:01PM +0100, Arnd Bergmann wrote:
> On Wed, Mar 23, 2022 at 5:18 PM Mark Brown <broonie@kernel.org> wrote:

> > and I'd be surprised if virtio devices made it through with a specific
> > platform emulation.

> In general they do: virtio devices appear as regular PCI devices
> and get probed from there, as long as the drivers are available.

> It looks like the PCI driver does not get initialized here though,
> presumably because it's not enabled in versatile_defconfig.
> It used to also not be enabled in multi_v5_defconfig, but I have
> merged a patch from Anders that enables it in 5.18 for the
> multi_v5_defconfig.

Ah, I thought Versatile was like the other older Arm reference platforms
and didn't have PCI.
Jason A. Donenfeld April 22, 2022, 1:42 p.m. UTC | #23
Hey Guenter,

On Tue, Mar 22, 2022 at 6:56 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 3/22/22 10:09, Jason A. Donenfeld wrote:
> > Hey Guenter,
> >
> > On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> >> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> >>> This topic has come up countless times, and usually doesn't go anywhere.
> >>> This time I thought I'd bring it up with a slightly narrower focus,
> >>> updated for some developments over the last three years: we finally can
> >>> make /dev/urandom always secure, in light of the fact that our RNG is
> >>> now always seeded.
> >>>
> >>
> >> [ ... ]
> >>
> >> This patch (or a later version of it) made it into mainline and causes a
> >> large number of qemu boot test failures for various architectures (arm,
> >> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> >> denominator is that boot hangs at "Saving random seed:". A sample bisect
> >> log is attached. Reverting this patch fixes the problem.
> >
> > As Linus said, it was worth a try, but I guess it just didn't work. For
> > my own curiosity, though, do you have a link to those QEMU VMs you could
> > share? I'd sort of like to poke around, and if we do ever reattempt this
> > sometime down the road, it seems like understanding everything about why
> > the previous time failed might be a good idea.
> >
>
> Everything - including the various root file systems - is at
> git@github.com:groeck/linux-build-test.git. Look into rootfs/ for the
> various boot tests. I'll be happy to provide some qemu command lines
> if needed.

I've been playing with a few things, and I'm wondering how close I am
to making this problem go away. I just made this branch:
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=jd/for-guenter

Any interest in setting your tests on that and seeing if it still
breaks? Or, perhaps better, do you have a single script that runs all
your various tests and does all the toolchain things right, so I can
just point it at that tree and iterate?

Jason
Guenter Roeck April 22, 2022, 11:46 p.m. UTC | #24
On Fri, Apr 22, 2022 at 03:42:46PM +0200, Jason A. Donenfeld wrote:
> Hey Guenter,
> 
> On Tue, Mar 22, 2022 at 6:56 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 3/22/22 10:09, Jason A. Donenfeld wrote:
> > > Hey Guenter,
> > >
> > > On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> > >> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> > >>> This topic has come up countless times, and usually doesn't go anywhere.
> > >>> This time I thought I'd bring it up with a slightly narrower focus,
> > >>> updated for some developments over the last three years: we finally can
> > >>> make /dev/urandom always secure, in light of the fact that our RNG is
> > >>> now always seeded.
> > >>>
> > >>
> > >> [ ... ]
> > >>
> > >> This patch (or a later version of it) made it into mainline and causes a
> > >> large number of qemu boot test failures for various architectures (arm,
> > >> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> > >> denominator is that boot hangs at "Saving random seed:". A sample bisect
> > >> log is attached. Reverting this patch fixes the problem.
> > >
> > > As Linus said, it was worth a try, but I guess it just didn't work. For
> > > my own curiosity, though, do you have a link to those QEMU VMs you could
> > > share? I'd sort of like to poke around, and if we do ever reattempt this
> > > sometime down the road, it seems like understanding everything about why
> > > the previous time failed might be a good idea.
> > >
> >
> > Everything - including the various root file systems - is at
> > git@github.com:groeck/linux-build-test.git. Look into rootfs/ for the
> > various boot tests. I'll be happy to provide some qemu command lines
> > if needed.
> 
> I've been playing with a few things, and I'm wondering how close I am
> to making this problem go away. I just made this branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=jd/for-guenter
> 
> Any interest in setting your tests on that and seeing if it still
> breaks? Or, perhaps better, do you have a single script that runs all

I applied your branch to my 'testing' branch. It will build tonight.
We should have results by tomorrow morning.

> your various tests and does all the toolchain things right, so I can
> just point it at that tree and iterate?
> 

Sorry, my system isn't that fancy. I don't mind running tests like this one,
though.

Guenter
Jason A. Donenfeld April 23, 2022, 12:52 a.m. UTC | #25
Hey Arnd/Guenter,

On Wed, Mar 23, 2022 at 4:53 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 23, 2022 at 3:23 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 3/23/22 05:10, Mark Brown wrote:
> > > On Tue, Mar 22, 2022 at 02:54:20PM -0700, Guenter Roeck wrote:
> > > Kind of academic given that Jason seems to have a handle on what the
> > > issues are but for KernelCI it's variations on mach-virt, plus
> > > versatile-pb.  There's a physical cubietruck as well, and BeagleBone
> > > Blacks among others.  My best guess would be systems with low RAM are
> > > somehow more prone to issues.
> >
> > I don't think it is entirely academic. versatile-pb fails for me;
> > if it doesn't fail at KernelCI, I'd like to understand why - not to
> > fix it in my test environment, but to make sure that I _don't_ fix it.
> > After all, it _is_ a regression. Even if that regression is triggered
> > by bad (for a given definition of "bad") userspace code, it is still
> > a regression.
>
> Maybe kernelci has a virtio-rng device assigned to the machine
> and you don't? That would clearly avoid the issue here.

Indeed it's probably something like that. Or maybe they're networked
with something that has a steady stream of interrupts. I say this
because I was able to reproduce Guenter's findings using the
versatilepb machine with the versatile_defconfig config and the
versatile-pb.dtb file. Indeed this board doesn't have a cycle counter.
However, I did have success using the fallback timer and the other
patches in the jd/for-guenter branch, so at least for versatile's
nuances, I think (hope?) there's a reasonable success story here.

Jason
Guenter Roeck April 23, 2022, 1:56 p.m. UTC | #26
On Fri, Apr 22, 2022 at 03:42:46PM +0200, Jason A. Donenfeld wrote:
> Hey Guenter,
> 
> On Tue, Mar 22, 2022 at 6:56 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 3/22/22 10:09, Jason A. Donenfeld wrote:
> > > Hey Guenter,
> > >
> > > On Tue, Mar 22, 2022 at 08:58:20AM -0700, Guenter Roeck wrote:
> > >> On Thu, Feb 17, 2022 at 05:28:48PM +0100, Jason A. Donenfeld wrote:
> > >>> This topic has come up countless times, and usually doesn't go anywhere.
> > >>> This time I thought I'd bring it up with a slightly narrower focus,
> > >>> updated for some developments over the last three years: we finally can
> > >>> make /dev/urandom always secure, in light of the fact that our RNG is
> > >>> now always seeded.
> > >>>
> > >>
> > >> [ ... ]
> > >>
> > >> This patch (or a later version of it) made it into mainline and causes a
> > >> large number of qemu boot test failures for various architectures (arm,
> > >> m68k, microblaze, sparc32, xtensa are the ones I observed). Common
> > >> denominator is that boot hangs at "Saving random seed:". A sample bisect
> > >> log is attached. Reverting this patch fixes the problem.
> > >
> > > As Linus said, it was worth a try, but I guess it just didn't work. For
> > > my own curiosity, though, do you have a link to those QEMU VMs you could
> > > share? I'd sort of like to poke around, and if we do ever reattempt this
> > > sometime down the road, it seems like understanding everything about why
> > > the previous time failed might be a good idea.
> > >
> >
> > Everything - including the various root file systems - is at
> > git@github.com:groeck/linux-build-test.git. Look into rootfs/ for the
> > various boot tests. I'll be happy to provide some qemu command lines
> > if needed.
> 
> I've been playing with a few things, and I'm wondering how close I am
> to making this problem go away. I just made this branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=jd/for-guenter
> 
> Any interest in setting your tests on that and seeing if it still
> breaks? Or, perhaps better, do you have a single script that runs all

Looks like your code is already in -next; I see the same failures in
your tree and there.

openrisc generates a warning backtrace.

WARNING: CPU: 0 PID: 0 at drivers/char/random.c:1006 rand_initialize+0x148/0x174
Missing cycle counter and fallback timer; RNG entropy collection will consequently suffer.

parisc crashes.

[    0.000000] Kernel Fault: Code=15 (Data TLB miss fault) at addr 00000000
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc3-32bit+ #1
[    0.000000]
[    0.000000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[    0.000000] PSW: 00000000000001001011111100001110 Not tainted
[    0.000000] r00-03  0004bf0e 10f2c978 10773aa0 10e74300
[    0.000000] r04-07  00000004 10e74208 10e869d0 10e83978
[    0.000000] r08-11  f0023b90 f0023390 0004000e 10104f68
[    0.000000] r12-15  00000002 00000000 00000008 fffffff9
[    0.000000] r16-19  00000028 00080000 00000000 10dc6364
[    0.000000] r20-23  10dc6364 00000000 00000000 fefefeff
[    0.000000] r24-27  00000000 00000004 00000000 10dc6178
[    0.000000] r28-31  0073a08d 80000000 10e74340 00000000
[    0.000000] sr00-03  00000000 00000000 00000000 00000000
[    0.000000] sr04-07  00000000 00000000 00000000 00000000
[    0.000000]
[    0.000000] IASQ: 00000000 00000000 IAOQ: 1024d09c 1024d0a0
[    0.000000]  IIR: 0f401096    ISR: 00000000  IOR: 00000000
[    0.000000]  CPU:        0   CR30: 10e869d0 CR31: 00000000
[    0.000000]  ORIG_R28: 10e83ce8
[    0.000000]  IAOQ[0]: random_get_entropy_fallback+0x18/0x38
[    0.000000]  IAOQ[1]: random_get_entropy_fallback+0x1c/0x38
[    0.000000]  RP(r2): add_device_randomness+0x30/0xc8
[    0.000000] Backtrace:
[    0.000000]  [<10773aa0>] add_device_randomness+0x30/0xc8
[    0.000000]  [<10108734>] collect_boot_cpu_data+0x44/0x270
[    0.000000]  [<10104f28>] setup_arch+0x98/0xd4
[    0.000000]  [<10100a90>] start_kernel+0x8c/0x6d0

s390 crashes silently, no crash log.

Hope that helps,
Guenter
Jason A. Donenfeld April 23, 2022, 2:28 p.m. UTC | #27
Hi Guenter,

On Sat, Apr 23, 2022 at 06:56:31AM -0700, Guenter Roeck wrote:
> Looks like your code is already in -next; I see the same failures in
> your tree and there.

It's not in next, actually. The branch I made for you has that
additional testing commit.

Jason
Guenter Roeck April 23, 2022, 4:35 p.m. UTC | #28
On Sat, Apr 23, 2022 at 04:28:59PM +0200, Jason A. Donenfeld wrote:
> Hi Guenter,
> 
> On Sat, Apr 23, 2022 at 06:56:31AM -0700, Guenter Roeck wrote:
> > Looks like your code is already in -next; I see the same failures in
> > your tree and there.
> 
> It's not in next, actually. The branch I made for you has that
> additional testing commit.
> 

Hmm, then I can't really test it because the other 16 patches
in your branch (which are in -next) already cause a number
of failures.

Guenter
Jason A. Donenfeld April 23, 2022, 9:10 p.m. UTC | #29
Hey Guenter,

On Sat, Apr 23, 2022 at 06:56:31AM -0700, Guenter Roeck wrote:
> Looks like your code is already in -next; I see the same failures in
> your tree and there.

So interestingly, none of the old issues are now present (the hangs on
versatilepb and such), so that's very positive. As for the crashes you
found:

> openrisc generates a warning backtrace.
> parisc crashes.
> s390 crashes silently, no crash log.

I've now fixed these too, and tested the fixes as well. Hopefully the
new jd/for-guenther branch has no regressions at all now... Knock on
wood.

Thanks a bunch for looking at this. Very much appreciated.

Jason
Guenter Roeck April 24, 2022, 2:04 a.m. UTC | #30
On 4/23/22 14:10, Jason A. Donenfeld wrote:
> Hey Guenter,
> 
> On Sat, Apr 23, 2022 at 06:56:31AM -0700, Guenter Roeck wrote:
>> Looks like your code is already in -next; I see the same failures in
>> your tree and there.
> 
> So interestingly, none of the old issues are now present (the hangs on
> versatilepb and such), so that's very positive. As for the crashes you
> found:
> 
>> openrisc generates a warning backtrace.
>> parisc crashes.
>> s390 crashes silently, no crash log.
> 
> I've now fixed these too, and tested the fixes as well. Hopefully the
> new jd/for-guenther branch has no regressions at all now... Knock on
> wood.
> 
> Thanks a bunch for looking at this. Very much appreciated.
> 

I'll run another test tonight.

Guenter
Jason A. Donenfeld April 25, 2022, 12:12 a.m. UTC | #31
Hi Guenter,

On Sat, Apr 23, 2022 at 07:04:26PM -0700, Guenter Roeck wrote:
> I'll run another test tonight.

Super, thanks. Looking forward to learning what transpires. Hopefully
all pass this time through...

Jason
Guenter Roeck April 25, 2022, 1:54 a.m. UTC | #32
On 4/24/22 17:12, Jason A. Donenfeld wrote:
> Hi Guenter,
> 
> On Sat, Apr 23, 2022 at 07:04:26PM -0700, Guenter Roeck wrote:
>> I'll run another test tonight.
> 
> Super, thanks. Looking forward to learning what transpires. Hopefully
> all pass this time through...
> 

Build results:
	total: 147 pass: 146 fail: 1
Failed builds:
	m68k:allmodconfig
Qemu test results:
	total: 489 pass: 489 fail: 0

The failure is inherited from mainline, so all looks good.

Guenter
Jason A. Donenfeld April 25, 2022, 11:11 a.m. UTC | #33
Hi Guenter,

On Sun, Apr 24, 2022 at 06:54:10PM -0700, Guenter Roeck wrote:
> On 4/24/22 17:12, Jason A. Donenfeld wrote:
> > Hi Guenter,
> > 
> > On Sat, Apr 23, 2022 at 07:04:26PM -0700, Guenter Roeck wrote:
> >> I'll run another test tonight.
> > 
> > Super, thanks. Looking forward to learning what transpires. Hopefully
> > all pass this time through...
> > 
> 
> Build results:
> 	total: 147 pass: 146 fail: 1
> Failed builds:
> 	m68k:allmodconfig
> Qemu test results:
> 	total: 489 pass: 489 fail: 0
> 
> The failure is inherited from mainline, so all looks good.

That is excellent news! Thanks again for testing.

So what this means is: the rationale for reverting the /dev/random +
/dev/urandom unification has now been fixed. That's some real tangible
progress.

Now, I don't want to rush into trying the unification again too soon. I
think if anything, the lesson from the first attempt wasn't simply, "I
should fix a few of Guenter's test cases," but rather that the problem
is fairly nuanced and will take a lot wider testing and research.
However, the fact that the initial thing, across multiple platforms,
that lead to the revert has been fixed gives me a decent amount of
optimism that at /some point/ down the road, we'll be able to try this
again. One step at a time.

Jason
Mark Brown April 25, 2022, 12:09 p.m. UTC | #34
On Sat, Apr 23, 2022 at 02:52:51AM +0200, Jason A. Donenfeld wrote:
> On Wed, Mar 23, 2022 at 4:53 PM Arnd Bergmann <arnd@arndb.de> wrote:

> > Maybe kernelci has a virtio-rng device assigned to the machine
> > and you don't? That would clearly avoid the issue here.

> Indeed it's probably something like that. Or maybe they're networked
> with something that has a steady stream of interrupts. I say this
> because I was able to reproduce Guenter's findings using the
> versatilepb machine with the versatile_defconfig config and the
> versatile-pb.dtb file. Indeed this board doesn't have a cycle counter.
> However, I did have success using the fallback timer and the other
> patches in the jd/for-guenter branch, so at least for versatile's
> nuances, I think (hope?) there's a reasonable success story here.

There's no virtio-rng device being instantiated, unless qemu is doing
that by default (I can't see anything in the logs that suggests it did).
There is networking though.  A sample command for invoking qemu for
versatilepb is:

qemu-system-arm -cpu arm926 -machine versatilepb -nographic -net nic,model=smc91c111,macaddr=52:54:00:12:34:58 -net user -m 256 -monitor none -dtb /var/lib/lava/dispatcher/tmp/85180/deployimages-hitd6sn_/dtb/versatile-pb.dtb -kernel /var/lib/lava/dispatcher/tmp/85180/deployimages-hitd6sn_/kernel/zImage -append "console=ttyAMA0,115200 root=/dev/ram0 debug verbose console_msg_format=syslog earlycon" -initrd /var/lib/lava/dispatcher/tmp/85180/deployimages-hitd6sn_/ramdisk/rootfs.cpio.gz -drive format=qcow2,file=/var/lib/lava/dispatcher/tmp/85180/apply-overlay-guest-l9_f_lxl/lava-guest.qcow2,media=disk,if=scsi,id=lavatest
diff mbox series

Patch

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index cc296f0823bd..9f586025dbe6 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -707,7 +707,7 @@  static const struct memdev {
 	 [5] = { "zero", 0666, &zero_fops, FMODE_NOWAIT },
 	 [7] = { "full", 0666, &full_fops, 0 },
 	 [8] = { "random", 0666, &random_fops, 0 },
-	 [9] = { "urandom", 0666, &urandom_fops, 0 },
+	 [9] = { "urandom", 0666, &random_fops, 0 },
 #ifdef CONFIG_PRINTK
 	[11] = { "kmsg", 0644, &kmsg_fops, 0 },
 #endif
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8d5abeefcc4f..fda5182d655d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -89,8 +89,6 @@  static LIST_HEAD(random_ready_list);
 /* Control how we warn userspace. */
 static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
-static struct ratelimit_state urandom_warning =
-	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
 static int ratelimit_disable __read_mostly;
 module_param_named(ratelimit_disable, ratelimit_disable, int, 0644);
 MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
@@ -336,11 +334,6 @@  static void crng_reseed(void)
 				  unseeded_warning.missed);
 			unseeded_warning.missed = 0;
 		}
-		if (urandom_warning.missed) {
-			pr_notice("%d urandom warning(s) missed due to ratelimiting\n",
-				  urandom_warning.missed);
-			urandom_warning.missed = 0;
-		}
 	}
 }
 
@@ -993,10 +986,8 @@  int __init rand_initialize(void)
 		pr_notice("crng init done (trusting CPU's manufacturer)\n");
 	}
 
-	if (ratelimit_disable) {
-		urandom_warning.interval = 0;
+	if (ratelimit_disable)
 		unseeded_warning.interval = 0;
-	}
 	return 0;
 }
 
@@ -1382,20 +1373,16 @@  static void try_to_generate_entropy(void)
  * getrandom(2) is the primary modern interface into the RNG and should
  * be used in preference to anything else.
  *
- * Reading from /dev/random has the same functionality as calling
- * getrandom(2) with flags=0. In earlier versions, however, it had
- * vastly different semantics and should therefore be avoided, to
- * prevent backwards compatibility issues.
- *
- * Reading from /dev/urandom has the same functionality as calling
- * getrandom(2) with flags=GRND_INSECURE. Because it does not block
- * waiting for the RNG to be ready, it should not be used.
+ * Reading from /dev/random and /dev/urandom both have the same effect
+ * as calling getrandom(2) with flags=0. (In earlier versions, however,
+ * they each had different semantics.)
  *
  * Writing to either /dev/random or /dev/urandom adds entropy to
  * the input pool but does not credit it.
  *
- * Polling on /dev/random indicates when the RNG is initialized, on
- * the read side, and when it wants new entropy, on the write side.
+ * Polling on /dev/random or /dev/urandom indicates when the RNG
+ * is initialized, on the read side, and when it wants new entropy,
+ * on the write side.
  *
  * Both /dev/random and /dev/urandom have the same set of ioctls for
  * adding entropy, getting the entropy count, zeroing the count, and
@@ -1480,21 +1467,6 @@  static ssize_t random_write(struct file *file, const char __user *buffer,
 	return (ssize_t)count;
 }
 
-static ssize_t urandom_read(struct file *file, char __user *buf, size_t nbytes,
-			    loff_t *ppos)
-{
-	static int maxwarn = 10;
-
-	if (!crng_ready() && maxwarn > 0) {
-		maxwarn--;
-		if (__ratelimit(&urandom_warning))
-			pr_notice("%s: uninitialized urandom read (%zd bytes read)\n",
-				  current->comm, nbytes);
-	}
-
-	return get_random_bytes_user(buf, nbytes);
-}
-
 static ssize_t random_read(struct file *file, char __user *buf, size_t nbytes,
 			   loff_t *ppos)
 {
@@ -1581,15 +1553,6 @@  const struct file_operations random_fops = {
 	.llseek = noop_llseek,
 };
 
-const struct file_operations urandom_fops = {
-	.read = urandom_read,
-	.write = random_write,
-	.unlocked_ioctl = random_ioctl,
-	.compat_ioctl = compat_ptr_ioctl,
-	.fasync = random_fasync,
-	.llseek = noop_llseek,
-};
-
 
 /********************************************************************
  *
diff --git a/include/linux/random.h b/include/linux/random.h
index d7354de9351e..38ff777aad19 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -44,7 +44,7 @@  extern void del_random_ready_callback(struct random_ready_callback *rdy);
 extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes);
 
 #ifndef MODULE
-extern const struct file_operations random_fops, urandom_fops;
+extern const struct file_operations random_fops;
 #endif
 
 u32 get_random_u32(void);