
random: ensure mix_interrupt_randomness() is consistent

Message ID: 20220211011446.392673-1-Jason@zx2c4.com (mailing list archive)
State: Not Applicable
Delegated to: Herbert Xu
Series: random: ensure mix_interrupt_randomness() is consistent

Commit Message

Jason A. Donenfeld Feb. 11, 2022, 1:14 a.m. UTC
This addresses two issues alluded to in an earlier commit.

The first issue is that mix_interrupt_randomness() might be migrated to
another CPU during CPU hotplug. This issue is rectified by checking that
it hasn't been migrated (after disabling migration). If it has been
migrated, then we set the count to zero, so that when the CPU comes
online again, it can requeue the work. As part of this, we switch to
using an atomic_t, so that the increment in the irq handler doesn't wipe
out the zeroing if the CPU comes back online while this worker is
running.

The second issue, though relatively minor in effect, is that we probably
do after all want to make sure the copy of the pool onto the stack is
consistent, in case the read is interrupted by an irq. To do this, we
simply read count before and after the memcpy and make sure they're the
same; if they're not, we try again. The likelihood of actually hitting
this is very low, as we're talking about a 2 or 4 word mov, and we're
executing on the same CPU as the potential interruption. Nonetheless,
it's easy enough to fix, so we do so here.

If we were only concerned with the first issue, we could fix it entirely
by just using an atomic_t. But fixing both requires a bit of
coordination, which this patch tackles.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

Comments

Sebastian Andrzej Siewior Feb. 11, 2022, 8:16 a.m. UTC | #1
On 2022-02-11 02:14:46 [+0100], Jason A. Donenfeld wrote:
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Sultan Alsawaf <sultan@kerneltoast.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c | 42 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9c779f1bda34..caaf3c33bb38 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1210,14 +1211,39 @@ static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
>  static void mix_interrupt_randomness(struct work_struct *work)
>  {
>  	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
> -	u32 pool[ARRAY_SIZE(fast_pool->pool32)];
> +	unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
> +	unsigned int count_snapshot;
> +	size_t i;
>  
> -	/* Copy the pool to the stack so that the mixer always has a consistent view. */
> -	memcpy(pool, fast_pool->pool32, sizeof(pool));
> +	/* Check to see if we're running on the wrong CPU due to hotplug. */
> +	migrate_disable();
> +	if (fast_pool != this_cpu_ptr(&irq_randomness)) {

I am not sure that acquire and release semantics are needed, and if so a
comment would probably be helpful to explain why.
But I'm trying to avoid the migrate_disable(), so:
To close the race of losing the workqueue bit, wouldn't it be
sufficient to set it to zero via atomic_cmpxchg()? Also if the counter
before the memcpy() and after (at cmpxchg time) didn't change then the
pool wasn't modified. So basically 

 do {
 	counter = atomic_read(&fast_pool->count); // no need to cast
	memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
    } while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);


then it also shouldn't matter if we are _accidentally_ on the wrong CPU.
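
Spelled out a bit more, the suggested loop might look like this inside
mix_interrupt_randomness(). This is a sketch only, reusing the names from
the patch above, and copying sizeof(pool) bytes rather than the
ARRAY_SIZE(pool) written in the shorthand above:

        struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
        unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
        int counter;

        do {
                counter = atomic_read(&fast_pool->count);
                /* sizeof(pool), not ARRAY_SIZE(pool): copy all the bytes. */
                memcpy(pool, fast_pool->pool_long, sizeof(pool));
                /*
                 * Zero the count only if it didn't change while we copied;
                 * if an irq mixed new bytes in meanwhile, retry so the
                 * snapshot and the zeroed count stay consistent. Zeroing
                 * also clears FAST_POOL_MIX_INFLIGHT, so the work can be
                 * queued again.
                 */
        } while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);

        fast_pool->last = jiffies;
        mix_pool_bytes(pool, sizeof(pool));
        credit_entropy_bits(1);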

> +		migrate_enable();
> +		/*
> +		 * If we are unlucky enough to have been moved to another CPU,
> +		 * then we set our count to zero atomically so that when the
> +		 * CPU comes back online, it can enqueue work again.
> +		 */
> +		atomic_set_release(&fast_pool->count, 0);
> +		return;
> +	}
> +
> +	/*
> +	 * Copy the pool to the stack so that the mixer always has a
> +	 * consistent view. It's extremely unlikely but possible that
> +	 * this 2 or 4 word read is interrupted by an irq, but in case
> +	 * it is, we double check that count stays the same.
> +	 */
> +	do {
> +		count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
> +		for (i = 0; i < ARRAY_SIZE(pool); ++i)
> +			pool[i] = READ_ONCE(fast_pool->pool_long[i]);

Why do you avoid memcpy()? Since it is a small memcpy, I'm sure the
compiler will inline the register moves.

Sebastian
Jason A. Donenfeld Feb. 11, 2022, 10:48 a.m. UTC | #2
Hi Sebastian,

On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> But I'm trying to avoid the migrate_disable(), so:
> To close the race of losing the workqueue bit, wouldn't it be
> sufficient to set it to zero via atomic_cmpxchg()? Also if the counter
> before the memcpy() and after (at cmpxchg time) didn't change then the
> pool wasn't modified. So basically
>
>  do {
>         counter = atomic_read(&fast_pool->count); // no need to cast
>         memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
>     } while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);
>
>
> then it also shouldn't matter if we are _accidentally_ on the wrong CPU.

This won't work. If we're executing on a different CPU, the CPU
mutating the pool won't necessarily update the count at the right
time. This isn't actually a seqlock or something like that. Rather, it
depends on running on the same CPU, where the interrupting irq handler
runs in full before giving control back, so that count and pool are
either both updated or not at all. Making this work across CPUs makes
things a lot more complicated and I'd rather not do that.

Actually, though, a nicer fix would be to just disable local
interrupts for that *2 word copy*. That's a tiny period of time. If
you permit me, that seems nicer. But if you don't like that, I'll keep
that loop.
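
For illustration, that alternative would amount to something like the
following (a sketch only, reusing the names from the patch; not what was
posted):

        unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
        unsigned long flags;

        /*
         * Mask local interrupts for just the copy and the reset, so the
         * irq handler on this CPU cannot touch count or the pool while
         * we snapshot them.
         */
        local_irq_save(flags);
        memcpy(pool, fast_pool->pool_long, sizeof(pool));
        atomic_set(&fast_pool->count, 0);
        fast_pool->last = jiffies;
        local_irq_restore(flags);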

Unfortunately, though, I think disabling migration is required. Sultan
(CC'd) found that these workqueues can migrate even midway through
running. And generally the whole idea is to keep this on the *same*
CPU so that we don't have to introduce locks and synchronization.

I'll add comments around the acquire/release. The remaining question
I believe is: would you prefer disabling irqs during the 2 word memcpy,
or this counter double read loop?


Jason
Jason A. Donenfeld Feb. 11, 2022, 1:04 p.m. UTC | #3
Sorry, missed this in your last email:

On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > +     do {
> > +             count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
> > +             for (i = 0; i < ARRAY_SIZE(pool); ++i)
> > +                     pool[i] = READ_ONCE(fast_pool->pool_long[i]);
>
> Why do you avoid memcpy()? Since it is a small memcpy, I'm sure the
> compiler will inline the register moves.

Because the compiler will otherwise reorder it to be after the two
counter reads. I checked. And a barrier() is too heavy, as it flushes
the writes out to the stack instead of keeping the values in registers.
READ_ONCE() gives exactly the semantics we care about here.
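
Concretely, the concern is with something like this (illustrative only,
with the names from the patch; "reread" is just a stand-in variable):

        /*
         * atomic_read() is only a volatile load of count, not a compiler
         * barrier, so with a plain memcpy() the compiler is free to sink
         * the (non-volatile) pool loads past the second read of count,
         * defeating the double-read check.
         */
        count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
        memcpy(pool, fast_pool->pool_long, sizeof(pool));       /* may move down */
        reread = (unsigned int)atomic_read(&fast_pool->count);

        /*
         * With READ_ONCE() each pool load is itself a volatile access, so
         * the compiler cannot reorder it against the two volatile reads
         * of count.
         */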

Jason
Sebastian Andrzej Siewior Feb. 11, 2022, 2:51 p.m. UTC | #4
On 2022-02-11 11:48:15 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi,

> On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > But I'm trying to avoid the migrate_disable(), so:
> > To close the race of losing the workqueue bit, wouldn't it be
> > sufficient to set it to zero via atomic_cmpxchg()? Also if the counter
> > before the memcpy() and after (at cmpxchg time) didn't change then the
> > pool wasn't modified. So basically
> >
> >  do {
> >         counter = atomic_read(&fast_pool->count); // no need to cast
> >         memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
> >     } while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);
> >
> >
> > then it also shouldn't matter if we are _accidentally_ on the wrong CPU.
> 
> This won't work. If we're executing on a different CPU, the CPU
> mutating the pool won't necessarily update the count at the right
> time. This isn't actually a seqlock or something like that. Rather, it

But it is atomic, isn't it?

> depends on running on the same CPU, where the interrupting irq handler
> runs in full before giving control back, so that count and pool are
> either both updated or not at all. Making this work across CPUs makes
> things a lot more complicated and I'd rather not do that.

But this isn't the rule, is it? It runs on the same CPU, so we should
observe the update from IRQ context, and the worker should observe the
counter _and_ the pool update.

And cross-CPU isn't the rule. We only re-do the loop if
- an interrupt came in on the local CPU between atomic_read() and
  atomic_cmpxchg().

- the worker was migrated due to CPU hotplug and we managed to properly
  reset the counter back to 0.

> Actually, though, a nicer fix would be to just disable local
> interrupts for that *2 word copy*. That's a tiny period of time. If
> you permit me, that seems nicer. But if you don't like that, I'll keep
> that loop.

Here, I don't mind but I don't think it is needed.

> Unfortunately, though, I think disabling migration is required. Sultan
> (CC'd) found that these workqueues can migrate even midway through
> running. And generally the whole idea is to keep this on the *same*
> CPU so that we don't have to introduce locks and synchronization.

They can't. Your workqueue is not unbound _and_ you specify a specific
CPU instead of WORK_CPU_UNBOUND (or an offlined CPU).
The only way it can migrate is if the CPU goes down while the worker is
running (or before it had a chance to run, I think), which forces the
scheduler to break the worker's CPU affinity and move it to another CPU.
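
For context, the relevant queueing line from the patch above:

        /*
         * The work is queued on the submitting CPU, not WORK_CPU_UNBOUND,
         * so only CPU hotplug can break the worker's affinity.
         */
        queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);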

> I'll add comments around the acquire/release. The remaining question
> I believe is: would you prefer disabling irqs during the 2 word memcpy,
> or this counter double read loop?

I would prefer the cmpxchg, in case the worker, however unlikely, gets
moved to another CPU and we might lose that SCHED bit. That is why we
switched to atomics, I think. Otherwise, if the updates are only local,
we can disable interrupts during the update.
But I don't mind disabling interrupts for that copy.

> Jason

Sebastian
Jason A. Donenfeld Feb. 11, 2022, 4:19 p.m. UTC | #5
Hi Sebastian,

On Fri, Feb 11, 2022 at 3:51 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > Unfortunately, though, I think disabling migration is required. Sultan
> > (CC'd) found that these workqueues can migrate even midway through
> > running. And generally the whole idea is to keep this on the *same*
> > CPU so that we don't have to introduce locks and synchronization.
>
> They can't. Your workqueue is not unbound _and_ you specify a specific
> CPU instead of WORK_CPU_UNBOUND (or an offlined CPU).
> The only way it can migrate is if the CPU goes down while the worker is
> running (or before it had a chance to run, I think), which forces the
> scheduler to break the worker's CPU affinity and move it to another CPU.

Right, but the CPU could come back up while the worker is running on
the wrong CPU, and then kaboom. Anyway, the migrate_disable() window
is very, very small - a few instructions at most with no loops. I
think it'll be fine.

> > I'll add comments around the acquire/release. The remaining question
> > I believe is: would you prefer disabling irqs during the 2 word memcpy,
> > or this counter double read loop?
>
> I would prefer the cmpxchg

I'll do the cmpxchg and send you a v+1. Sorry it wasn't in the last
one. It only now clicked with me what that code would look like, after
I stepped away from the screen for a bit to defrobulate my brains.

Jason
Sebastian Andrzej Siewior Feb. 11, 2022, 4:24 p.m. UTC | #6
On 2022-02-11 17:19:21 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi,

> I'll do the cmpxchg and send you a v+1. Sorry it wasn't in the last
> one. It only now clicked with me what that code would look like, after
> I stepped away from the screen for a bit to defrobulate my brains.

No worries. As much as I liked my first series, I like even more where
this is heading ;)

> Jason

Sebastian

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9c779f1bda34..caaf3c33bb38 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1152,10 +1152,11 @@  struct fast_pool {
 	union {
 		u64 pool64[2];
 		u32 pool32[4];
+		unsigned long pool_long[16 / sizeof(long)];
 	};
 	struct work_struct mix;
 	unsigned long last;
-	unsigned int count;
+	atomic_t count;
 	u16 reg_idx;
 };
 #define FAST_POOL_MIX_INFLIGHT (1U << 31)
@@ -1210,14 +1211,39 @@  static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
 static void mix_interrupt_randomness(struct work_struct *work)
 {
 	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
-	u32 pool[ARRAY_SIZE(fast_pool->pool32)];
+	unsigned long pool[ARRAY_SIZE(fast_pool->pool_long)];
+	unsigned int count_snapshot;
+	size_t i;
 
-	/* Copy the pool to the stack so that the mixer always has a consistent view. */
-	memcpy(pool, fast_pool->pool32, sizeof(pool));
+	/* Check to see if we're running on the wrong CPU due to hotplug. */
+	migrate_disable();
+	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
+		migrate_enable();
+		/*
+		 * If we are unlucky enough to have been moved to another CPU,
+		 * then we set our count to zero atomically so that when the
+		 * CPU comes back online, it can enqueue work again.
+		 */
+		atomic_set_release(&fast_pool->count, 0);
+		return;
+	}
+
+	/*
+	 * Copy the pool to the stack so that the mixer always has a
+	 * consistent view. It's extremely unlikely but possible that
+	 * this 2 or 4 word read is interrupted by an irq, but in case
+	 * it is, we double check that count stays the same.
+	 */
+	do {
+		count_snapshot = (unsigned int)atomic_read(&fast_pool->count);
+		for (i = 0; i < ARRAY_SIZE(pool); ++i)
+			pool[i] = READ_ONCE(fast_pool->pool_long[i]);
+	} while (count_snapshot != (unsigned int)atomic_read(&fast_pool->count));
 
 	/* We take care to zero out the count only after we're done reading the pool. */
-	WRITE_ONCE(fast_pool->count, 0);
+	atomic_set(&fast_pool->count, 0);
 	fast_pool->last = jiffies;
+	migrate_enable();
 
 	mix_pool_bytes(pool, sizeof(pool));
 	credit_entropy_bits(1);
@@ -1246,12 +1272,12 @@  void add_interrupt_randomness(int irq)
 	}
 
 	fast_mix(fast_pool->pool32);
-	new_count = ++fast_pool->count;
+	new_count = (unsigned int)atomic_inc_return_acquire(&fast_pool->count);
 
 	if (unlikely(crng_init == 0)) {
 		if (new_count >= 64 &&
 		    crng_fast_load(fast_pool->pool32, sizeof(fast_pool->pool32)) > 0) {
-			fast_pool->count = 0;
+			atomic_set(&fast_pool->count, 0);
 			fast_pool->last = now;
 
 			/*
@@ -1273,7 +1299,7 @@  void add_interrupt_randomness(int irq)
 
 	if (unlikely(!fast_pool->mix.func))
 		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
-	fast_pool->count |= FAST_POOL_MIX_INFLIGHT;
+	atomic_or(FAST_POOL_MIX_INFLIGHT, &fast_pool->count);
 	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);