Message ID | 20220211011446.392673-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | random: ensure mix_interrupt_randomness() is consistent |
On 2022-02-11 02:14:46 [+0100], Jason A. Donenfeld wrote:
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Sultan Alsawaf <sultan@kerneltoast.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c | 42 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9c779f1bda34..caaf3c33bb38 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1210,14 +1211,39 @@ static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
>  static void mix_interrupt_randomness(struct work_struct *work)
>  {
>      struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
> -    u32 pool[ARRAY_SIZE(fast_pool->pool32)];
> +    unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
> +    unsigned int count_snapshot;
> +    size_t i;
>
> -    /* Copy the pool to the stack so that the mixer always has a consistent view. */
> -    memcpy(pool, fast_pool->pool32, sizeof(pool));
> +    /* Check to see if we're running on the wrong CPU due to hotplug. */
> +    migrate_disable();
> +    if (fast_pool != this_cpu_ptr(&irq_randomness)) {

I am not sure that acquire and release semantics are needed, and if so, a
comment would probably be helpful to explain why.

But I'm trying to avoid the migrate_disable(), so: to close the race with
losing the workqueue bit, wouldn't it be sufficient to set it to zero via
atomic_cmpxchg()? Also, if the counter didn't change between before the
memcpy() and after (at cmpxchg time), then the pool wasn't modified. So
basically:

	do {
		counter = atomic_read(&fast_pool->count); // no need to cast
		memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
	} while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);

Then it also shouldn't matter if we are _accidentally_ on the wrong CPU.

> +        migrate_enable();
> +        /*
> +         * If we are unlucky enough to have been moved to another CPU,
> +         * then we set our count to zero atomically so that when the
> +         * CPU comes back online, it can enqueue work again.
> +         */
> +        atomic_set_release(&fast_pool->count, 0);
> +        return;
> +    }
> +
> +    /*
> +     * Copy the pool to the stack so that the mixer always has a
> +     * consistent view. It's extremely unlikely but possible that
> +     * this 2 or 4 word read is interrupted by an irq, but in case
> +     * it is, we double check that count stays the same.
> +     */
> +    do {
> +        count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
> +        for (i = 0; i < ARRAY_SIZE(pool); ++i)
> +            pool[i] = READ_ONCE(fast_pool->pool_long[i]);

Why do you avoid memcpy()? Since it is a small memcpy, I'm sure the
compiler will inline the register moves.

Sebastian
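The retry pattern suggested above can be modeled outside the kernel with plain C11 atomics, which may help in following the rest of the thread. The sketch below is illustrative only: the structure and function names are invented, a 2-word pool stands in for the real 16-byte one, and the copy length uses sizeof() rather than ARRAY_SIZE().

    /* Userspace model of the read-count / copy / cmpxchg-to-zero retry loop.
     * Hypothetical names; not the code in drivers/char/random.c. */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <string.h>

    struct fake_fast_pool {
        unsigned long pool_long[2];   /* stands in for the 16-byte pool */
        atomic_uint count;            /* stands in for fast_pool->count */
    };

    static void snapshot_and_reset(struct fake_fast_pool *fp,
                                   unsigned long *snapshot)
    {
        unsigned int counter;

        do {
            counter = atomic_load(&fp->count);
            memcpy(snapshot, fp->pool_long, sizeof(fp->pool_long));
            /* If count is still 'counter', nothing touched the pool in the
             * meantime; claim it by resetting count to 0 in the same atomic
             * step.  Otherwise retry with a fresh copy. */
        } while (!atomic_compare_exchange_strong(&fp->count, &counter, 0));
    }

    int main(void)
    {
        struct fake_fast_pool fp = { .pool_long = { 1, 2 } };
        unsigned long snap[2];

        atomic_store(&fp.count, 64);
        snapshot_and_reset(&fp, snap);
        printf("snapshot: %lu %lu, count now %u\n",
               snap[0], snap[1], atomic_load(&fp.count));
        return 0;
    }

Built with cc -std=c11, this prints the snapshot and a final count of 0; if a concurrent writer bumped count between the load and the cmpxchg, the loop would simply run again with a fresh copy.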
Hi Sebastian,

On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> But I'm trying to avoid the migrate_disable(), so: to close the race with
> losing the workqueue bit, wouldn't it be sufficient to set it to zero via
> atomic_cmpxchg()? Also, if the counter didn't change between before the
> memcpy() and after (at cmpxchg time), then the pool wasn't modified. So
> basically:
>
> 	do {
> 		counter = atomic_read(&fast_pool->count); // no need to cast
> 		memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
> 	} while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);
>
> Then it also shouldn't matter if we are _accidentally_ on the wrong CPU.

This won't work. If we're executing on a different CPU, the CPU mutating
the pool won't necessarily update the count at the right time. This isn't
actually a seqlock or something like that. Rather, it depends on running
on the same CPU, where the interrupting irq handler runs in full before
giving control back, so that count and pool are either both updated or
not at all. Making this work across CPUs makes things a lot more
complicated, and I'd rather not do that.

Actually, though, a nicer fix would be to just disable local interrupts
for that *2 word copy*. That's a tiny period of time. If you permit me,
that seems nicer. But if you don't like that, I'll keep that loop.

Unfortunately, though, I think disabling migration is required. Sultan
(CC'd) found that these workqueues can migrate even midway through
running. And generally the whole idea is to keep this on the *same* CPU
so that we don't have to introduce locks and synchronization.

I'll add comments around the acquire/release. The remaining question, I
believe, is: would you prefer disabling irqs during the 2 word memcpy,
or this counter double read loop?

Jason
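The "disable local interrupts for the 2-word copy" alternative mentioned above was not part of the posted diff; as a rough kernel-style sketch against the structures in this patch, shown only to illustrate the trade-off being discussed, it would look something like this:

    /* Hypothetical alternative, not the code that was posted: take the
     * snapshot with local interrupts off, so the irq handler on this CPU
     * cannot update pool or count mid-copy. */
    unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
    unsigned long flags;

    local_irq_save(flags);
    memcpy(pool, fast_pool->pool_long, sizeof(pool));
    atomic_set(&fast_pool->count, 0);   /* no local irq can race with us here */
    fast_pool->last = jiffies;
    local_irq_restore(flags);

The window is only a handful of instructions, which is what makes the option tempting; the rest of the thread weighs it against the cmpxchg-based loop, which avoids touching the irq-disable path entirely.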
Sorry, I missed this in your last email:

On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > +    do {
> > +        count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
> > +        for (i = 0; i < ARRAY_SIZE(pool); ++i)
> > +            pool[i] = READ_ONCE(fast_pool->pool_long[i]);
>
> Why do you avoid memcpy()? Since it is a small memcpy, I'm sure the
> compiler will inline the register moves.

Because the compiler will otherwise reorder it to be after the two
counter reads. I checked. And a barrier() is too heavy, as it flushes
the writes to the stack instead of keeping them read into registers.
READ_ONCE() gives exactly the semantics we care about here.

Jason
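For context, READ_ONCE() in the kernel boils down to a volatile load: the compiler cannot eliminate it, tear it, or reorder it with respect to other volatile accesses, such as the atomic_read()s of count here, which is what pins the pool loads between the two count reads. A simplified stand-in (the real kernel macro also handles other access sizes and adds sanity checks):

    /* Rough, simplified approximation of the kernel's READ_ONCE();
     * illustrative only, not the actual kernel definition. */
    #define MY_READ_ONCE(x)  (*(const volatile __typeof__(x) *)&(x))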
On 2022-02-11 11:48:15 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,

Hi,

> On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > But I'm trying to avoid the migrate_disable(), so: to close the race with
> > losing the workqueue bit, wouldn't it be sufficient to set it to zero via
> > atomic_cmpxchg()? Also, if the counter didn't change between before the
> > memcpy() and after (at cmpxchg time), then the pool wasn't modified. So
> > basically:
> >
> > 	do {
> > 		counter = atomic_read(&fast_pool->count); // no need to cast
> > 		memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
> > 	} while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);
> >
> > Then it also shouldn't matter if we are _accidentally_ on the wrong CPU.
>
> This won't work. If we're executing on a different CPU, the CPU mutating
> the pool won't necessarily update the count at the right time. This isn't
> actually a seqlock or something like that. Rather, it

But it is atomic, isn't it?

> depends on running on the same CPU, where the interrupting irq handler
> runs in full before giving control back, so that count and pool are
> either both updated or not at all. Making this work across CPUs makes
> things a lot more complicated, and I'd rather not do that.

But that isn't the rule, is it? It runs on the same CPU, so we should
observe the update in IRQ context, and the worker should observe the
counter _and_ the pool update. The cross-CPU case isn't the rule. We only
re-do the loop if
- an interrupt came in on the local CPU between atomic_read() and
  atomic_cmpxchg(), or
- the worker was migrated due to CPU hotplug and we managed to properly
  reset the counter back to 0.

> Actually, though, a nicer fix would be to just disable local interrupts
> for that *2 word copy*. That's a tiny period of time. If you permit me,
> that seems nicer. But if you don't like that, I'll keep that loop.

Here, I don't mind, but I don't think it is needed.

> Unfortunately, though, I think disabling migration is required. Sultan
> (CC'd) found that these workqueues can migrate even midway through
> running. And generally the whole idea is to keep this on the *same* CPU
> so that we don't have to introduce locks and synchronization.

They can't. Your workqueue is not unbound _and_ you specify a specific
CPU instead of WORK_CPU_UNBOUND (or an offlined CPU). The only way it
can migrate is if the CPU goes down while the worker is running (or
before it had a chance to run, I think), which forces the scheduler to
break the worker's CPU affinity and move it to another CPU.

> I'll add comments around the acquire/release. The remaining question, I
> believe, is: would you prefer disabling irqs during the 2 word memcpy,
> or this counter double read loop?

I would prefer the cmpxchg, in case the worker (however unlikely) gets
moved to another CPU and we might lose that SCHED bit. That is why we
switched to atomics, I think. Otherwise, if the updates are only ever
local, we can disable interrupts during the update. But I don't mind
disabling interrupts for that copy.

> Jason

Sebastian
Hi Sebastian,

On Fri, Feb 11, 2022 at 3:51 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > Unfortunately, though, I think disabling migration is required. Sultan
> > (CC'd) found that these workqueues can migrate even midway through
> > running. And generally the whole idea is to keep this on the *same* CPU
> > so that we don't have to introduce locks and synchronization.
>
> They can't. Your workqueue is not unbound _and_ you specify a specific
> CPU instead of WORK_CPU_UNBOUND (or an offlined CPU). The only way it
> can migrate is if the CPU goes down while the worker is running (or
> before it had a chance to run, I think), which forces the scheduler to
> break the worker's CPU affinity and move it to another CPU.

Right, but the CPU could come back up while the worker is running on the
wrong CPU, and then kaboom. Anyway, the migrate_disable() window is very,
very small - a few instructions at most, with no loops. I think it'll be
fine.

> > I'll add comments around the acquire/release. The remaining question, I
> > believe, is: would you prefer disabling irqs during the 2 word memcpy,
> > or this counter double read loop?
>
> I would prefer the cmpxchg

I'll do the cmpxchg and send you a v+1. Sorry it wasn't in the last one.
It only now clicked with me what that code would look like, after I
stepped away from the screen for a bit to defrobulate my brains.

Jason
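Putting the two conclusions together - keep the READ_ONCE() snapshot loop, but close it with a cmpxchg back to zero instead of a plain re-read - the reworked worker would presumably look roughly like the following. This is a sketch of where the discussion lands, not the actual follow-up patch:

    /* Sketch only: snapshot the pool, then try to claim it by swapping
     * count back to zero.  If count changed in the meantime (more irqs
     * mixed in, or the inflight bit was set again), take a fresh
     * snapshot and retry. */
    do {
        count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
        for (i = 0; i < ARRAY_SIZE(pool); ++i)
            pool[i] = READ_ONCE(fast_pool->pool_long[i]);
    } while ((unsigned int)atomic_cmpxchg(&fast_pool->count,
                                          count_snapshot, 0) != count_snapshot);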
On 2022-02-11 17:19:21 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,

Hi,

> I'll do the cmpxchg and send you a v+1. Sorry it wasn't in the last one.
> It only now clicked with me what that code would look like, after I
> stepped away from the screen for a bit to defrobulate my brains.

No worries. As much as I liked my first series, I like even more where
this is heading ;)

> Jason

Sebastian
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9c779f1bda34..caaf3c33bb38 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1152,10 +1152,11 @@ struct fast_pool {
 	union {
 		u64 pool64[2];
 		u32 pool32[4];
+		unsigned long pool_long[16 / sizeof(long)];
 	};
 	struct work_struct mix;
 	unsigned long last;
-	unsigned int count;
+	atomic_t count;
 	u16 reg_idx;
 };
 #define FAST_POOL_MIX_INFLIGHT (1U << 31)
@@ -1210,14 +1211,39 @@ static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
 static void mix_interrupt_randomness(struct work_struct *work)
 {
 	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
-	u32 pool[ARRAY_SIZE(fast_pool->pool32)];
+	unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
+	unsigned int count_snapshot;
+	size_t i;
 
-	/* Copy the pool to the stack so that the mixer always has a consistent view. */
-	memcpy(pool, fast_pool->pool32, sizeof(pool));
+	/* Check to see if we're running on the wrong CPU due to hotplug. */
+	migrate_disable();
+	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
+		migrate_enable();
+		/*
+		 * If we are unlucky enough to have been moved to another CPU,
+		 * then we set our count to zero atomically so that when the
+		 * CPU comes back online, it can enqueue work again.
+		 */
+		atomic_set_release(&fast_pool->count, 0);
+		return;
+	}
+
+	/*
+	 * Copy the pool to the stack so that the mixer always has a
+	 * consistent view. It's extremely unlikely but possible that
+	 * this 2 or 4 word read is interrupted by an irq, but in case
+	 * it is, we double check that count stays the same.
+	 */
+	do {
+		count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
+		for (i = 0; i < ARRAY_SIZE(pool); ++i)
+			pool[i] = READ_ONCE(fast_pool->pool_long[i]);
+	} while (count_snapshot != (unsigned int)atomic_read(&fast_pool->count));
 
 	/* We take care to zero out the count only after we're done reading the pool. */
-	WRITE_ONCE(fast_pool->count, 0);
+	atomic_set(&fast_pool->count, 0);
 	fast_pool->last = jiffies;
+	migrate_enable();
 
 	mix_pool_bytes(pool, sizeof(pool));
 	credit_entropy_bits(1);
@@ -1246,12 +1272,12 @@ void add_interrupt_randomness(int irq)
 	}
 
 	fast_mix(fast_pool->pool32);
-	new_count = ++fast_pool->count;
+	new_count = (unsigned int)atomic_inc_return_acquire(&fast_pool->count);
 
 	if (unlikely(crng_init == 0)) {
 		if (new_count >= 64 &&
 		    crng_fast_load(fast_pool->pool32, sizeof(fast_pool->pool32)) > 0) {
-			fast_pool->count = 0;
+			atomic_set(&fast_pool->count, 0);
 			fast_pool->last = now;
 
 			/*
@@ -1273,7 +1299,7 @@ void add_interrupt_randomness(int irq)
 	if (unlikely(!fast_pool->mix.func))
 		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
 
-	fast_pool->count |= FAST_POOL_MIX_INFLIGHT;
+	atomic_or(FAST_POOL_MIX_INFLIGHT, &fast_pool->count);
 	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);
This addresses two issues alluded to in an earlier commit.

The first issue is that mix_interrupt_randomness() might be migrated to
another CPU during CPU hotplug. This is rectified by checking that it
hasn't been migrated (after disabling migration). If it has been
migrated, then we set the count to zero, so that when the CPU comes
online again, it can requeue the work. As part of this, we switch to
using an atomic_t, so that the increment in the irq handler doesn't wipe
out the zeroing if the CPU comes back online while this worker is
running.

The second issue is that, though relatively minor in effect, we probably
do after all want to make sure we get a consistent view of the pool onto
the stack, in case it's interrupted by an irq while reading. To do this,
we simply read count before and after the memcpy and make sure they're
the same. If they're not, we try again. The likelihood of actually
hitting this is very low, as we're talking about a 2 or 4 word mov, and
we're executing on the same CPU as the potential interruption.
Nonetheless, it's easy enough to fix, so we do that here.

If we were only concerned with the first issue rather than the second,
we could fix this entirely with just an atomic_t. But fixing both
requires a bit of coordination, which this patch tackles.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)