Message ID | 20170614224526.29076-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello Ted, With rc6 already released and rc7 coming up, I'd really appreciate you stepping in here and either ACKing the above commit, or giving your two cents about it in case I need to roll something different. Thanks, Jason On Thu, Jun 15, 2017 at 12:45 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Odd versions of gcc for the sh4 architecture will actually warn about > flags being used while uninitialized, so we set them to zero. Non crazy > gccs will optimize that out again, so it doesn't make a difference. > > Next, over aggressive gccs could inline the expression that defines > use_lock, which could then introduce a race resulting in a lock > imbalance. By using READ_ONCE, we prevent that fate. Finally, we make > that assignment const, so that gcc can still optimize a nice amount. > > Finally, we fix a potential deadlock between primary_crng.lock and > batched_entropy_reset_lock, where they could be called in opposite > order. Moving the call to invalidate_batched_entropy to outside the lock > rectifies this issue. > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > Ted -- the first part of this is the fixup patch we discussed earlier. > Then I added on top a fix for a potentially related race. > > I'm not totally convinced that moving this block to outside the spinlock > is 100% okay, so please give this a close look before merging. > > > drivers/char/random.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index e870f329db88..01a260f67437 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -803,13 +803,13 @@ static int crng_fast_load(const char *cp, size_t len) > p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp; > cp++; crng_init_cnt++; len--; > } > + spin_unlock_irqrestore(&primary_crng.lock, flags); > if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) { > invalidate_batched_entropy(); > crng_init = 1; > wake_up_interruptible(&crng_init_wait); > pr_notice("random: fast init done\n"); > } > - spin_unlock_irqrestore(&primary_crng.lock, flags); > return 1; > } > > @@ -841,6 +841,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) > } > memzero_explicit(&buf, sizeof(buf)); > crng->init_time = jiffies; > + spin_unlock_irqrestore(&primary_crng.lock, flags); > if (crng == &primary_crng && crng_init < 2) { > invalidate_batched_entropy(); > crng_init = 2; > @@ -848,7 +849,6 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) > wake_up_interruptible(&crng_init_wait); > pr_notice("random: crng init done\n"); > } > - spin_unlock_irqrestore(&primary_crng.lock, flags); > } > > static inline void crng_wait_ready(void) > @@ -2041,8 +2041,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64); > u64 get_random_u64(void) > { > u64 ret; > - bool use_lock = crng_init < 2; > - unsigned long flags; > + bool use_lock = READ_ONCE(crng_init) < 2; > + unsigned long flags = 0; > struct batched_entropy *batch; > > #if BITS_PER_LONG == 64 > @@ -2073,8 +2073,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32); > u32 get_random_u32(void) > { > u32 ret; > - bool use_lock = crng_init < 2; > - unsigned long flags; > + bool use_lock = READ_ONCE(crng_init) < 2; > + unsigned long flags = 0; > struct batched_entropy *batch; > > if (arch_get_random_int(&ret)) > -- > 2.13.1 >
On Mon, Jun 19, 2017 at 10:57:18PM +0200, Jason A. Donenfeld wrote: > > With rc6 already released and rc7 coming up, I'd really appreciate you > stepping in here and either ACKing the above commit, or giving your > two cents about it in case I need to roll something different. I actually had set up an earlier version of your patch for on Saturday while I was in Beijing. (Like Linus, I'm attending the LinuxCon China conference Monday and Tuesday.) I had even created the signed tag, but I didn't send the pull request to Linus because I was waiting to see about how discussions over the locking strategy and the spammy log messages on PowerPC was going to get resolved. I've since respun the commit to reflect your newer patch (see the random_for_linus_stable tag on random.git) and rebased the dev branch on top of that. Please take a look and comment. The other open issue I want to resolve before sending a pull request this week is whether we want to change the default for CONFIG_WARN_UNSEEDED_RANDOM so that the answer is 'n'. It *is* spammy for PowerPC, because they aren't getting their CRNG initialized quickly enough, so several userspace processes are getting fork/exec'ed with an uninitialized CRNG. That being said, it is a valid warning because it means that the initial stack canary for the first couple of PowerPC processes are being created without a fully initialized CRNG, which may mean that an attacker might be able to circumvent the stack canary on the first couple of processes. So that could potentially be a real security issue on Power. OTOH, most Power users aren't going to be able to do anything about the fact the stack canaries of the system daemons started during early boot don't have strong randomness, so perhaps we should disable the warning by default. Opinions? - Ted
On Tue, Jun 20, 2017 at 3:33 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Mon, Jun 19, 2017 at 10:57:18PM +0200, Jason A. Donenfeld wrote: >> >> With rc6 already released and rc7 coming up, I'd really appreciate you >> stepping in here and either ACKing the above commit, or giving your >> two cents about it in case I need to roll something different. > > I actually had set up an earlier version of your patch for on Saturday > while I was in Beijing. (Like Linus, I'm attending the LinuxCon China > conference Monday and Tuesday.) I had even created the signed tag, > but I didn't send the pull request to Linus because I was waiting to > see about how discussions over the locking strategy and the spammy log > messages on PowerPC was going to get resolved. > > I've since respun the commit to reflect your newer patch (see the > random_for_linus_stable tag on random.git) and rebased the dev branch > on top of that. Please take a look and comment. > > The other open issue I want to resolve before sending a pull request > this week is whether we want to change the default for > CONFIG_WARN_UNSEEDED_RANDOM so that the answer is 'n'. It *is* spammy > for PowerPC, because they aren't getting their CRNG initialized > quickly enough, so several userspace processes are getting > fork/exec'ed with an uninitialized CRNG. It's very spammy for ARM as well. I booted next-20170619 on an Aspeed (32-bit ARM) board and by the time I made it to a shell the log buffer contained only warnings: [ 10.452921] random: arch_pick_mmap_layout+0xa8/0xe8 get_random_u32 called with crng_init=0 [ 10.461255] random: load_elf_binary+0x3c8/0x104c get_random_u32 called with crng_init=0 [ 10.471464] random: arch_setup_additional_pages+0x6c/0x110 get_random_u32 called with crng_init=0 [ 10.480429] random: randomize_page+0x44/0x58 get_random_u32 called with crng_init=0 [ 10.494802] random: arch_pick_mmap_layout+0xa8/0xe8 get_random_u32 called with crng_init=0 [ 10.503141] random: load_elf_binary+0x3c8/0x104c get_random_u32 called with crng_init=0 [ 10.511571] random: arch_setup_additional_pages+0x6c/0x110 get_random_u32 called with crng_init=0 [ 10.520527] random: randomize_page+0x44/0x58 get_random_u32 called with crng_init=0 [ 10.537847] random: arch_pick_mmap_layout+0xa8/0xe8 get_random_u32 called with crng_init=0 [ 10.546182] random: load_elf_binary+0x3c8/0x104c get_random_u32 called with crng_init=0 [ 10.554611] random: arch_setup_additional_pages+0x6c/0x110 get_random_u32 called with crng_init=0 [ 10.563563] random: randomize_page+0x44/0x58 get_random_u32 called with crng_init=0 So +1 for defaulting CONFIG_WARN_UNSEEDED_RANDOM=n. Cheers, Joel > That being said, it is a > valid warning because it means that the initial stack canary for the > first couple of PowerPC processes are being created without a fully > initialized CRNG, which may mean that an attacker might be able to > circumvent the stack canary on the first couple of processes. So that > could potentially be a real security issue on Power. OTOH, most Power > users aren't going to be able to do anything about the fact the stack > canaries of the system daemons started during early boot don't have > strong randomness, so perhaps we should disable the warning by > default. > > Opinions? > > - Ted
Theodore Ts'o <tytso@mit.edu> writes: > On Mon, Jun 19, 2017 at 10:57:18PM +0200, Jason A. Donenfeld wrote: >> >> With rc6 already released and rc7 coming up, I'd really appreciate you >> stepping in here and either ACKing the above commit, or giving your >> two cents about it in case I need to roll something different. > > I actually had set up an earlier version of your patch for on Saturday > while I was in Beijing. (Like Linus, I'm attending the LinuxCon China > conference Monday and Tuesday.) I had even created the signed tag, > but I didn't send the pull request to Linus because I was waiting to > see about how discussions over the locking strategy and the spammy log > messages on PowerPC was going to get resolved. > > I've since respun the commit to reflect your newer patch (see the > random_for_linus_stable tag on random.git) and rebased the dev branch > on top of that. Please take a look and comment. > > The other open issue I want to resolve before sending a pull request > this week is whether we want to change the default for > CONFIG_WARN_UNSEEDED_RANDOM so that the answer is 'n'. Yes please. > It *is* spammy for PowerPC, because they aren't getting their CRNG *some* powerpc machines ... > initialized quickly enough, so several userspace processes are getting > fork/exec'ed with an uninitialized CRNG. That being said, it is a > valid warning because it means that the initial stack canary for the > first couple of PowerPC processes are being created without a fully > initialized CRNG, which may mean that an attacker might be able to > circumvent the stack canary on the first couple of processes. So that > could potentially be a real security issue on Power. OTOH, most Power > users aren't going to be able to do anything about the fact the stack > canaries of the system daemons started during early boot don't have > strong randomness, so perhaps we should disable the warning by > default. powerpc supports a wide range of hardware platforms, some of which are 10-15 years old, and don't have a hardware RNG. Is there anything we can do on those machines? Seems like our only option would be to block the boot while some more "entropy" builds up, but that's unlikely to be popular with users. On our newer machines (>= Power8) we have a hardware RNG which we wire up to arch_get_random_seed_long(), so on those machines the warnings would be valid, because they'd indicate a bug. So I think it should be up to arches to decide whether this is turned on via their defconfigs, and the default should be 'n' because a lot of old hardware won't be able to do anything useful with the warnings. cheers
Hey Ted, On Tue, Jun 20, 2017 at 02:03:44AM -0400, Theodore Ts'o wrote: > I actually had set up an earlier version of your patch for on Saturday > while I was in Beijing. (Like Linus, I'm attending the LinuxCon China > conference Monday and Tuesday.) I had even created the signed tag, > I've since respun the commit to reflect your newer patch (see the > random_for_linus_stable tag on random.git) and rebased the dev branch > on top of that. Please take a look and comment. So it looks like you've gone with 4a072c71f49. If that looks good (moving the lock, etc) to you, then great, we're done. If there are still locking objections (are there?), then we'll need to revisit. > but I didn't send the pull request to Linus because I was waiting to > see about how discussions over the locking strategy and the spammy log > messages on PowerPC was going to get resolved. > The other open issue I want to resolve before sending a pull request > this week is whether we want to change the default for > CONFIG_WARN_UNSEEDED_RANDOM so that the answer is 'n'. In the v1 of this patch many moons ago, it was just vanilla, default y, but due to the spamminess, I thought folks would revolt. So I made a change: Specifically, I added `depends on DEBUG_KERNEL`. This means that these useful warnings will only poke other kernel developers. This is probably exactly what we want. If the various associated developers see a warning coming from their particular subsystem, they'll be more motivated to fix it. Ordinary users on distribution kernels shouldn't see the warnings or the spam at all, since typically users aren't using DEBUG_KERNEL. Then, to make things _even less_ annoying to kernel developers, you added a nice patch on top to squelch repeated messages. So, I still think this current strategy is a good one, of default y, but depends on DEBUG_KERNEL. Regards, Jason
On Tue, Jun 20, 2017 at 4:14 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote: >... > Specifically, I added `depends on DEBUG_KERNEL`. This means that these > useful warnings will only poke other kernel developers. This is probably > exactly what we want. If the various associated developers see a warning > coming from their particular subsystem, they'll be more motivated to > fix it. Ordinary users on distribution kernels shouldn't see the > warnings or the spam at all, since typically users aren't using > DEBUG_KERNEL. I think it is a bad idea to suppress all messages from a security engineering point of view. Many folks don't run debug kernels. Most of the users who want or need to know of the issues won't realize its happening. Consider, the reason we learned of systemd's problems was due to dmesg's. Suppressing all messages for all configurations cast a wider net than necessary. Configurations that could potentially be detected and fixed likely will go unnoticed. If the problem is not brought to light, then it won't be fixed. I feel like the kernel is making policy decisions for some organizations. For those who have hardware that is effectively unfixable, then organization has to decide what to do based on their risk adversity. They may decide to live with the risk, or they may decide to refresh the hardware. However, without information on the issue, they may not even realize they have an actionable item. Jeff
On Tue, Jun 20, 2017 at 10:33 AM, Jeffrey Walton <noloader@gmail.com> wrote: > I think it is a bad idea to suppress all messages from a security > engineering point of view. > > Many folks don't run debug kernels. Most of the users who want or need > to know of the issues won't realize its happening. Consider, the > reason we learned of systemd's problems was due to dmesg's. > > Suppressing all messages for all configurations cast a wider net than > necessary. Configurations that could potentially be detected and fixed > likely will go unnoticed. If the problem is not brought to light, then > it won't be fixed. I more or less agree with you that we should just turn this on for all users and they'll just have to live with the spam and report odd entries, and overtime we'll fix all the violations. But I think there's another camp that would mutiny in the face of this kind of hubris. That's why I moved pretty readily toward the compromise position of default y, but depends on DEBUG_KERNEL. My hope was that it'd to an extent satisfy both camps, and also disappoint both camps in an equal way.
On Tue, Jun 20, 2017 at 10:53:35AM +0200, Jason A. Donenfeld wrote: > > Suppressing all messages for all configurations cast a wider net than > > necessary. Configurations that could potentially be detected and fixed > > likely will go unnoticed. If the problem is not brought to light, then > > it won't be fixed. > > I more or less agree with you that we should just turn this on for all > users and they'll just have to live with the spam and report odd > entries, and overtime we'll fix all the violations. Fix all the problems *how*? If you are on an old system which doesn't a hardware random number generator, and which doesn't have a high resolution cycle counter, and may not have a lot of entropy easily harvestable from the environment, there may not be a lot you can do. Sure, you can pretend that the cache (which by the way is usually determinstic) is ***so*** complicated that no one can figure it out, and essentially pretend that you have entropy when you probably don't; that just simply becomes a different way of handwaving and suppressing the warning messages. > But I think there's another camp that would mutiny in the face of this > kind of hubris. Blocking the boot for hours and hours until we have enough entropy to initialize the CRNG is ***not*** an acceptable way of making the warning messages go away. Do that and the users **will** mutiny. It's this sort of attitude which is why Linus has in the past said that security people are sometimes insane.... - Ted
On Tue, Jun 20, 2017 at 5:36 AM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Jun 20, 2017 at 10:53:35AM +0200, Jason A. Donenfeld wrote: >> > Suppressing all messages for all configurations cast a wider net than >> > necessary. Configurations that could potentially be detected and fixed >> > likely will go unnoticed. If the problem is not brought to light, then >> > it won't be fixed. >> >> I more or less agree with you that we should just turn this on for all >> users and they'll just have to live with the spam and report odd >> entries, and overtime we'll fix all the violations. > > Fix all the problems *how*? If you are on an old system which doesn't > a hardware random number generator, and which doesn't have a high > resolution cycle counter, and may not have a lot of entropy easily > harvestable from the environment, there may not be a lot you can do. > Sure, you can pretend that the cache (which by the way is usually > determinstic) is ***so*** complicated that no one can figure it out, > and essentially pretend that you have entropy when you probably don't; > that just simply becomes a different way of handwaving and suppressing > the warning messages. > >> But I think there's another camp that would mutiny in the face of this >> kind of hubris. > > Blocking the boot for hours and hours until we have enough entropy to > initialize the CRNG is ***not*** an acceptable way of making the > warning messages go away. Do that and the users **will** mutiny. > > It's this sort of attitude which is why Linus has in the past said > that security people are sometimes insane.... I don't believe it has anything to do with insanity. Its sound security engineering. Are there compelling reasons a single dmesg warning cannot be provided? A single message avoids spamming the logs. It also informs the system owner of the problem. An individual or organization can then take action based on their risk posture. Finally, it avoids the kernel making policy decisions for a user or organization. Jeff
On Tue, Jun 20, 2017 at 11:36 AM, Theodore Ts'o <tytso@mit.edu> wrote: >> But I think there's another camp that would mutiny in the face of this >> kind of hubris. > > Blocking the boot for hours and hours until we have enough entropy to > initialize the CRNG is ***not*** an acceptable way of making the > warning messages go away. Do that and the users **will** mutiny. > > It's this sort of attitude which is why Linus has in the past said > that security people are sometimes insane.... Uh, talk about a totally unnecessary punch... In case my last email wasn't clear, I fully recognize that `default y` is a tad too extreme, which is why from one of the earliest revisions in this series, I moved directly to the compromise solution (`depends DEBUG_KERNEL`) without even waiting for people to complain first.
On Tue, Jun 20, 2017 at 5:49 AM, Jeffrey Walton <noloader@gmail.com> wrote: > On Tue, Jun 20, 2017 at 5:36 AM, Theodore Ts'o <tytso@mit.edu> wrote: >> On Tue, Jun 20, 2017 at 10:53:35AM +0200, Jason A. Donenfeld wrote: >>> > Suppressing all messages for all configurations cast a wider net than >>> > necessary. Configurations that could potentially be detected and fixed >>> > likely will go unnoticed. If the problem is not brought to light, then >>> > it won't be fixed. > Are there compelling reasons a single dmesg warning cannot be provided? > > A single message avoids spamming the logs. It also informs the system > owner of the problem. An individual or organization can then take > action based on their risk posture. Finally, it avoids the kernel > making policy decisions for a user or organization. I'd say the best solution is to have no configuration option specifically for these messages. Always give some, but let DEBUG_KERNEL control how many. If DEBUG_KERNEL is not set, emit exactly one message & ignore any other errors of this type. On some systems, that message may have to be ignored, on some it might start an incremental process where one problem gets fixed only to have another crop up & on some it might prompt the admin to explore further by compiling with DEBUG_KERNEL. If DEBUG_KERNEL is set, emit a message for every error of this type.
On Tue, Jun 20, 2017 at 10:50 AM, Sandy Harris <sandyinchina@gmail.com> wrote: > On Tue, Jun 20, 2017 at 5:49 AM, Jeffrey Walton <noloader@gmail.com> wrote: >> On Tue, Jun 20, 2017 at 5:36 AM, Theodore Ts'o <tytso@mit.edu> wrote: >>> On Tue, Jun 20, 2017 at 10:53:35AM +0200, Jason A. Donenfeld wrote: > >>>> > Suppressing all messages for all configurations cast a wider net than >>>> > necessary. Configurations that could potentially be detected and fixed >>>> > likely will go unnoticed. If the problem is not brought to light, then >>>> > it won't be fixed. > >> Are there compelling reasons a single dmesg warning cannot be provided? >> >> A single message avoids spamming the logs. It also informs the system >> owner of the problem. An individual or organization can then take >> action based on their risk posture. Finally, it avoids the kernel >> making policy decisions for a user or organization. > > I'd say the best solution is to have no configuration option > specifically for these messages. Always give some, but let > DEBUG_KERNEL control how many. > > If DEBUG_KERNEL is not set, emit exactly one message & ignore any > other errors of this type. On some systems, that message may have to > be ignored, on some it might start an incremental process where one > problem gets fixed only to have another crop up & on some it might > prompt the admin to explore further by compiling with DEBUG_KERNEL. > > If DEBUG_KERNEL is set, emit a message for every error of this type. How about doing this: default DEBUG_KERNEL Most distro kernel select DEBUG_KERNEL because it unhides a bunch of other useful configs. Since it doesn't strictly _depend_ on DEBUG_KERNEL, I think it's probably a mistake to enforce a false dependency. Using it as a hint for the default seems maybe like a good middle ground. (And if people can't agree on that, then I guess "default n"...) -Kees
On Tue, Jun 20, 2017 at 8:14 PM, Kees Cook <keescook@chromium.org> wrote: > How about doing this: > > default DEBUG_KERNEL > > Most distro kernel select DEBUG_KERNEL because it unhides a bunch of > other useful configs. Since it doesn't strictly _depend_ on > DEBUG_KERNEL, I think it's probably a mistake to enforce a false > dependency. Using it as a hint for the default seems maybe like a good > middle ground. (And if people can't agree on that, then I guess > "default n"...) I didn't know you could do that with Kconfig. Great idea. I'll make this change and submit a new patch to Ted. Jason
On Tue, Jun 20, 2017 at 11:49:07AM +0200, Jason A. Donenfeld wrote: > Uh, talk about a totally unnecessary punch... In case my last email > wasn't clear, I fully recognize that `default y` is a tad too extreme, > which is why from one of the earliest revisions in this series, I > moved directly to the compromise solution (`depends DEBUG_KERNEL`) > without even waiting for people to complain first. The punch was in response to this statement, which I personally found fairly infuriating: >> I more or less agree with you that we should just turn this on for all >> users and they'll just have to live with the spam and report odd >> entries, and overtime we'll fix all the violations. There seems to be a fundamental misapprehension that it will be easy to "fix all the violations". For certain hardware types, this is not easy, and the "eh, let them get spammed until we get around to fixing it" attitude is precisely what I was pushing back against. There's a certain amount of privilege for those of us who are using x86 systems with built-in hardware random number generators, and cycle counters, where the problem is much easier to solve. But for other platforms, it really, REALLY isn't that easy to fix. One solution that might be acceptable is to simply print a *single* warning, the first time some piece of kernel code tries to get randomness before the CRNG is initialized. And that's it. If it's only a single dmesg line, then we probably don't need to hide it behind a #ifdef. That might satisfy the security-obsessed who want to rub users' noses in the face that their kernel is doing something potentially insecure and there is nothing they can do about it. But since it's also a single line in the syslog, it's not actively annoying. The #ifdef will allow the current behaviour where we suppress duplicates, but we warn for every attempt to get randomness. That we can default to no, since it will only be people who are trying to audit calls to see if the real fix is to switch the call to prandom_u32, because the use case really was't security/crypto sensitive. As I have said before, ultimately I think the only real solution to this whole mess is to allow the bootloader to read entropy from the boot device (or maybe some NVRAM or battery-backed memory), which is then overwritten as early as possible in the boot process with new random data. This is what OpenBSD does, but OpenBSD has a much easier job because they only have to support a small set of architectures. We will need to do this for each bootloader that is used by Linux, which is a pretty large set. But ultimately, it solves *all* the problems, including getting entropy for KASLR, which needs the randomness super-early in the boot process, long before we have any hope of initializing the entropy pool from environmental noise. So personally, I think this focus on trying to warn/spam users is not particularly useful. If we can mute the warnings to those people who want to play whack-a-mole, it's not harmful, but for those who think that futzing with get_random_* calls is the right approach, personally I'm really not convinced. Of course, the kernel is a volunteer project, so ultimately all a maintainer can do is to say no to patches, and not command people to work on things that he or she wishes. I *can* try to pursude people about what the right way to go is, because doing something that involves boot loaders is going to require a huge amount of effort from many people. It's certainly not anything I or anyone else can do by him or herself. - Ted
On Wed, Jun 21, 2017 at 1:38 AM, Theodore Ts'o <tytso@mit.edu> wrote: > The punch was in response to this statement, which I personally found > fairly infuriating: > >>> I more or less agree with you that we should just turn this on for all >>> users and they'll just have to live with the spam and report odd >>> entries, and overtime we'll fix all the violations. Holy cow, please cool it. I think the "or less" part was relevant, as was the subsequent sentence which characterized that sentiment as "hubris". Also, the subsequent email when I made explicit the fact that I was more in agreement with you than you interpreted. So, in case you're really really really not getting the message I'm trying to make so explicitly clear to you: I AGREE WITH YOU. So, no more punching, pretty please? > > There seems to be a fundamental misapprehension that it will be easy > to "fix all the violations". I don't think it will be easy. I agree with you. > But for other > platforms, it really, REALLY isn't that easy to fix. Other platforms will be hard. I agree with you. > So personally, I think I'm going to roll Kees' suggestion into a PATCH and send it to you. You can decide if you want to apply it. I'll be satisfied with whatever you choose and will follow your lead. Jason
On Tue, Jun 20, 2017 at 7:38 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Jun 20, 2017 at 11:49:07AM +0200, Jason A. Donenfeld wrote: >> ... >>> I more or less agree with you that we should just turn this on for all >>> users and they'll just have to live with the spam and report odd >>> entries, and overtime we'll fix all the violations. > > There seems to be a fundamental misapprehension that it will be easy > to "fix all the violations". For certain hardware types, this is > not easy, and the "eh, let them get spammed until we get around to > fixing it" attitude is precisely what I was pushing back against. I can't speak for others, but for me: I think they will fall into three categories: 1. easy to fix 2. difficult to fix 3. unable to fix (1) is low hanging fruit and they will probably (hopefully?) be cleared easily. Like systemd on x86_64 with rdrand and rdseed. There's no reason for systemd to find itself starved of entropy on that platform. (cf., http://github.com/systemd/systemd/issues/4167). Organizations that find themselves in (3) can choose to use a board or server and accept the risk, or they can choose to remediate it in another way. The "other way" may include a capital expenditure and a hardware refresh. The central point is, they know about the risk and they can make the decision. Jeff
diff --git a/drivers/char/random.c b/drivers/char/random.c index e870f329db88..01a260f67437 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -803,13 +803,13 @@ static int crng_fast_load(const char *cp, size_t len) p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp; cp++; crng_init_cnt++; len--; } + spin_unlock_irqrestore(&primary_crng.lock, flags); if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) { invalidate_batched_entropy(); crng_init = 1; wake_up_interruptible(&crng_init_wait); pr_notice("random: fast init done\n"); } - spin_unlock_irqrestore(&primary_crng.lock, flags); return 1; } @@ -841,6 +841,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) } memzero_explicit(&buf, sizeof(buf)); crng->init_time = jiffies; + spin_unlock_irqrestore(&primary_crng.lock, flags); if (crng == &primary_crng && crng_init < 2) { invalidate_batched_entropy(); crng_init = 2; @@ -848,7 +849,6 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r) wake_up_interruptible(&crng_init_wait); pr_notice("random: crng init done\n"); } - spin_unlock_irqrestore(&primary_crng.lock, flags); } static inline void crng_wait_ready(void) @@ -2041,8 +2041,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64); u64 get_random_u64(void) { u64 ret; - bool use_lock = crng_init < 2; - unsigned long flags; + bool use_lock = READ_ONCE(crng_init) < 2; + unsigned long flags = 0; struct batched_entropy *batch; #if BITS_PER_LONG == 64 @@ -2073,8 +2073,8 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32); u32 get_random_u32(void) { u32 ret; - bool use_lock = crng_init < 2; - unsigned long flags; + bool use_lock = READ_ONCE(crng_init) < 2; + unsigned long flags = 0; struct batched_entropy *batch; if (arch_get_random_int(&ret))
Odd versions of gcc for the sh4 architecture will actually warn about flags being used while uninitialized, so we set them to zero. Non crazy gccs will optimize that out again, so it doesn't make a difference. Next, over aggressive gccs could inline the expression that defines use_lock, which could then introduce a race resulting in a lock imbalance. By using READ_ONCE, we prevent that fate. Finally, we make that assignment const, so that gcc can still optimize a nice amount. Finally, we fix a potential deadlock between primary_crng.lock and batched_entropy_reset_lock, where they could be called in opposite order. Moving the call to invalidate_batched_entropy to outside the lock rectifies this issue. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- Ted -- the first part of this is the fixup patch we discussed earlier. Then I added on top a fix for a potentially related race. I'm not totally convinced that moving this block to outside the spinlock is 100% okay, so please give this a close look before merging. drivers/char/random.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)