Message ID | 20220211162515.554867-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | [v6] random: defer fast pool mixing to worker |
On 2022-02-11 17:25:15 [+0100], Jason A. Donenfeld wrote:
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index c42c07a7eb56..20b11a4b6559 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1214,12 +1215,59 @@ static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
>  	return *ptr;
>  }
>
> +static void mix_interrupt_randomness(struct work_struct *work)
> +{
> +	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
> +	unsigned long pool[ARRAY_SIZE(fast_pool->pool)];
> +	int count;
> +
> +	/* Check to see if we're running on the wrong CPU due to hotplug. */
> +	migrate_disable();
> +	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
> +		migrate_enable();
> +		/*
> +		 * If we are unlucky enough to have been moved to another CPU,

+ "during CPU hotplug while the CPU was shutdown". It should not look
like the worker can be migrated on system without CPU-hotplug involved.

> +		 * then we set our count to zero atomically so that when the
> +		 * CPU comes back online, it can enqueue work again. The
> +		 * _release here pairs with the atomic_inc_return_acquire in
> +		 * add_interrupt_randomness().
> +		 */
> +		atomic_set_release(&fast_pool->count, 0);
> +		return;
> +	}
> +
> +	/*
> +	 * Copy the pool to the stack so that the mixer always has a
> +	 * consistent view. It's extremely unlikely but possible that
> +	 * this 2 or 4 word read is interrupted by an irq, but in case
> +	 * it is, we double check that count stays the same.
> +	 *
> +	 * We set the count to 0 so that irqs can immediately begin to
> +	 * accumulate again after. Since any possible interruptions
> +	 * at this stage are guaranteed to be on the same CPU, we can
> +	 * use cmpxchg_relaxed.
> +	 */
> +	count = atomic_read(&fast_pool->count);
> +	do {
> +		memcpy(pool, fast_pool->pool, sizeof(pool));
> +	} while (atomic_try_cmpxchg_relaxed(&fast_pool->count, &count, 0));

I *think* we could drop that "fast_pool !=
this_cpu_ptr(&irq_randomness)" check at the top since that cmpxchg will
save us and redo the loop. But if I remember correctly you worried about
fast_pool->pool being modified (which is only a corner case if we are on
the other CPU while the orig CPU is back again). Either way, it would be
random and we would not consume more entropy.

So if we have to keep this then please swap that migrate_disable() with
local_irq_disable(). Otherwise PeterZ will yell at me.

> +	fast_pool->last = jiffies;
> +	migrate_enable();
> +
> +	mix_pool_bytes(pool, sizeof(pool));
> +	credit_entropy_bits(1);
> +	memzero_explicit(pool, sizeof(pool));
> +}
> +
…
> @@ -1235,12 +1283,13 @@ void add_interrupt_randomness(int irq)
>  	}
>
>  	fast_mix((u32 *)fast_pool->pool);
> -	++fast_pool->count;
> +	/* The _acquire here pairs with the atomic_set_release in mix_interrupt_randomness(). */
> +	new_count = (unsigned int)atomic_inc_return_acquire(&fast_pool->count);
>
>  	if (unlikely(crng_init == 0)) {
> -		if (fast_pool->count >= 64 &&
> +		if (new_count >= 64 &&
>  		    crng_fast_load(fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
> -			fast_pool->count = 0;
> +			atomic_set(&fast_pool->count, 0);
>  			fast_pool->last = now;

I'm fine if we keep this as is for now.
What do we do here vs RT? I suggested this
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=a2d2d54409481aa23a3e11ab9559a843e36a79ec

Is this doable?

Sebastian
Hi Sebastian,

On Fri, Feb 11, 2022 at 5:44 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > +
> > +	/* Check to see if we're running on the wrong CPU due to hotplug. */
> > +	migrate_disable();
> > +	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
> > +		migrate_enable();
> > +		/*
> > +		 * If we are unlucky enough to have been moved to another CPU,
>
> + "during CPU hotplug while the CPU was shutdown". It should not look
> like the worker can be migrated on system without CPU-hotplug involved.

Will adjust comment.

> I *think* we could drop that "fast_pool !=
> this_cpu_ptr(&irq_randomness)" check at the top since that cmpxchg will
> save us and redo the loop. But if I remember correctly you worried about
> fast_pool->pool being modified (which is only a corner case if we are on
> the other CPU while the orig CPU is back again). Either way, it would be
> random and we would not consume more entropy.

No, we cannot, and "it's all random anyway so who cares if we corrupt
things!" is not rigorous, as entropy may actually be thrown away as
it's moved between words on each mix. If we're not running on the same
CPU, one CPU can corrupt the other's view of fast pool before updating
count. We must keep this.

> So if we have to keep this then please swap that migrate_disable() with
> local_irq_disable(). Otherwise PeterZ will yell at me.

Okay, I'll do that then, and then in the process get rid of the
cmpxchg loop since it's no longer required.

> >  	if (unlikely(crng_init == 0)) {
> > -		if (fast_pool->count >= 64 &&
> > +		if (new_count >= 64 &&
> >  		    crng_fast_load(fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
> > -			fast_pool->count = 0;
> > +			atomic_set(&fast_pool->count, 0);
> >  			fast_pool->last = now;
>
> I'm fine if we keep this as is for now.
> What do we do here vs RT? I suggested this
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=a2d2d54409481aa23a3e11ab9559a843e36a79ec
>
> Is this doable?

It might be, but last time I checked it seemed problematic. As I
mentioned in an earlier thread, I'll take a look again at that next
week after this patch here settles. Haven't forgotten.

v+1 coming up with irqs disabled.

Jason
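A minimal sketch of the shape that agreed-upon change could take, with migrate_disable() swapped for local_irq_disable() and the cmpxchg loop dropped, reusing the helpers from the quoted patch. Illustrative only; this is not the v7 that was actually posted:

static void mix_interrupt_randomness(struct work_struct *work)
{
	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
	unsigned long pool[ARRAY_SIZE(fast_pool->pool)];

	/*
	 * With hard irqs off, nothing on this CPU can touch the fast pool
	 * concurrently, so a plain copy and reset replaces the cmpxchg loop.
	 */
	local_irq_disable();
	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
		local_irq_enable();
		/*
		 * The worker was moved off its home CPU by CPU hotplug; reset
		 * the count so that CPU can enqueue work again once it is back.
		 */
		atomic_set_release(&fast_pool->count, 0);
		return;
	}

	memcpy(pool, fast_pool->pool, sizeof(pool));
	atomic_set(&fast_pool->count, 0);
	fast_pool->last = jiffies;
	local_irq_enable();

	mix_pool_bytes(pool, sizeof(pool));
	credit_entropy_bits(1);
	memzero_explicit(pool, sizeof(pool));
}

In this shape the atomic_t is only still needed for the hotplug path at the top, which is exactly what the rest of the thread goes on to discuss.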
On 2022-02-11 17:50:34 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,

Hi Jason,

> > I *think* we could drop that "fast_pool !=
> > this_cpu_ptr(&irq_randomness)" check at the top since that cmpxchg will
> > save us and redo the loop. But if I remember correctly you worried about
> > fast_pool->pool being modified (which is only a corner case if we are on
> > the other CPU while the orig CPU is back again). Either way, it would be
> > random and we would not consume more entropy.
>
> No, we cannot, and "it's all random anyway so who cares if we corrupt
> things!" is not rigorous, as entropy may actually be thrown away as
> it's moved between words on each mix. If we're not running on the same
> CPU, one CPU can corrupt the other's view of fast pool before updating
> count. We must keep this.

Okay, I assumed something like that.

> > So if we have to keep this then please swap that migrate_disable() with
> > local_irq_disable(). Otherwise PeterZ will yell at me.
>
> Okay, I'll do that then, and then in the process get rid of the
> cmpxchg loop since it's no longer required.

So the only reason why we have that atomic_t is for the rare case where we run
on the remote CPU and need to remove the upper bit in the counter?

> > >  	if (unlikely(crng_init == 0)) {
> > > -		if (fast_pool->count >= 64 &&
> > > +		if (new_count >= 64 &&
> > >  		    crng_fast_load(fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
> > > -			fast_pool->count = 0;
> > > +			atomic_set(&fast_pool->count, 0);
> > >  			fast_pool->last = now;
> >
> > I'm fine if we keep this as is for now.
> > What do we do here vs RT? I suggested this
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=a2d2d54409481aa23a3e11ab9559a843e36a79ec
> >
> > Is this doable?
>
> It might be, but last time I checked it seemed problematic. As I
> mentioned in an earlier thread, I'll take a look again at that next
> week after this patch here settles. Haven't forgotten.

Ah, cheers.

> v+1 coming up with irqs disabled.
>
> Jason

Sebastian
Hi Sebastian,

On Fri, Feb 11, 2022 at 5:59 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > Okay, I'll do that then, and then in the process get rid of the
> > cmpxchg loop since it's no longer required.
>
> So the only reason why we have that atomic_t is for the rare case where we run
> on the remote CPU and need to remove the upper bit in the counter?

Yes. That's the only remaining reason. Annoying, but whatareyagonnado?

Jason
On 2022-02-11 18:00:21 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,

Hi Jason,

> On Fri, Feb 11, 2022 at 5:59 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > > Okay, I'll do that then, and then in the process get rid of the
> > > cmpxchg loop since it's no longer required.
> >
> > So the only reason why we have that atomic_t is for the rare case where we run
> > on the remote CPU and need to remove the upper bit in the counter?
>
> Yes. That's the only remaining reason. Annoying, but whatareyagonnado?

A CPU hotplug notifier which removes unconditionally that bit when the
CPU goes down or sets it to 0.
We can keep it as is. Just an idea for later maybe ;)

> Jason

Sebastian
On Fri, Feb 11, 2022 at 6:15 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> A CPU hotplug notifier which removes unconditionally that bit when the
> CPU goes down or sets it to 0.
> We can keep it as is. Just an idea for later maybe ;)

I looked into it and it seemed like the plumbing was kind of miserable
for that. If you want to take a stab, though, that might be an okay
followup patch, and then we can assess atomics vs notifier. I think
notifier will wind up being a lot clunkier, though.

Sounds like we should be all set for the v7 I sent out?

Jason
On 2022-02-11 18:17:57 [+0100], Jason A. Donenfeld wrote:
> On Fri, Feb 11, 2022 at 6:15 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > A CPU hotplug notifier which removes unconditionally that bit when the
> > CPU goes down or sets it to 0.
> > We can keep it as is. Just an idea for later maybe ;)
>
> I looked into it and it seemed like the plumbing was kind of miserable
> for that. If you want to take a stab, though, that might be an okay
> followup patch, and then we can assess atomics vs notifier. I think
> notifier will wind up being a lot clunkier, though.
>
> Sounds like we should be all set for the v7 I sent out?

Sure. I can do the CPU-HP notifier later on once we are done with
everything. I acked the v7, don't see a road block.

> Jason

Sebastian
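A rough sketch of the CPU-HP notifier idea being parked here, assuming a dynamic hotplug state is used; the state, callback name, and init hook are made up for illustration and this is not the follow-up patch that appears later in the thread:

/*
 * Hypothetical teardown callback: it runs once the CPU is down, so nothing
 * can race with it on that CPU's fast pool.
 */
static int random_dead_cpu(unsigned int cpu)
{
	struct fast_pool *fast_pool = per_cpu_ptr(&irq_randomness, cpu);

	/*
	 * Drop MIX_INFLIGHT and the accumulated count so the CPU starts
	 * from a clean slate if it comes back online.
	 */
	atomic_set(&fast_pool->count, 0);
	return 0;
}

static int __init random_cpuhp_init(void)
{
	int ret;

	/* Dynamic state: no startup callback, teardown clears the pool count. */
	ret = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "random:fast_pool",
				NULL, random_dead_cpu);
	return ret < 0 ? ret : 0;
}

With something like this in place, the worker would no longer need the atomic upper bit to recover from hotplug, which is the performance angle picked up again further down the thread.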
On Fri, Feb 11, 2022 at 5:59 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> > > What do we do here vs RT? I suggested this
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=a2d2d54409481aa23a3e11ab9559a843e36a79ec
> > >
> > > Is this doable?
> >
> > It might be, but last time I checked it seemed problematic. As I
> > mentioned in an earlier thread, I'll take a look again at that next
> > week after this patch here settles. Haven't forgotten.

I started looking at this and came up with this draft with questions:
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-irq-trylocks

Some research remains...

Jason
Hey Sebastian,

On Fri, Feb 11, 2022 at 6:26 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> Sure. I can do the CPU-HP notifier later on once we are done with
> everything. I acked the v7, don't see a road block.

I've been running this over the weekend and performance is generally okay.

# perf top --symbols $(objdump -t /usr/src/linux/drivers/char/random.o|fgrep 'F .text'|awk '{print $6}'|tr '\n' ,) -F max -g

One thing I noticed though was that add_interrupt_randomness spends
most of its time on:

    lock xadd %eax,0x38(%rbp)

So, as expected, those atomics really need to go. Indeed we might be
best off with the CPU hotplug notifier for setting the count back to
zero.

Do you want to prepare a patch for this? Or should I take a stab at it?

Jason
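To make that concrete, this is roughly what the hot path could look like once the counter stops being an atomic_t: a plain per-CPU increment, so no lock-prefixed instruction is left in add_interrupt_randomness(). The field type and the trimmed-down flow are assumptions for illustration, not the patch linked in the next message:

void add_interrupt_randomness(int irq)
{
	enum { MIX_INFLIGHT = 1U << 31 };
	struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
	unsigned long now = jiffies;
	unsigned long new_count;

	/* ... cycles/irq/regs are mixed into fast_pool->pool as before ... */
	fast_mix((u32 *)fast_pool->pool);

	/*
	 * Plain increment: with count an unsigned long touched only by this
	 * CPU (and by hotplug once the CPU is dead), no "lock xadd" remains.
	 */
	new_count = ++fast_pool->count;

	/* (The crng_init == 0 early-boot branch is omitted for brevity.) */

	if (new_count & MIX_INFLIGHT)
		return;

	if (new_count < 64 && !time_after(now, fast_pool->last + HZ))
		return;

	if (unlikely(!fast_pool->mix.func))
		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
	fast_pool->count |= MIX_INFLIGHT;
	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
}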
On 2022-02-13 18:37:33 [+0100], Jason A. Donenfeld wrote:
> I started looking at this and came up with this draft with questions:
> https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-irq-trylocks

to
| - Does anything anywhere call get_random_xx() before the worker has a
|   chance to run?

Once you queue a work item I don't think that the scheduler needs to put
it on the CPU right away. It may already have other tasks waiting,
including some with a RT priority.
Also, the lock is irqsave() so there can be users in an interrupt
handler. I remember the original reason why I made it irqsave is because
something did kmalloc() and SLUB somehow asked for random bits.

> Some research remains...
>
> Jason

Sebastian
On 2/14/22, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> to
> | - Does anything anywhere call get_random_xx() before the worker has a
> |   chance to run?
>
> Once you queue a work item I don't think that the scheduler needs to put
> it on the CPU right away. It may already have other tasks waiting,
> including some with a RT priority.
> Also, the lock is irqsave() so there can be users in an interrupt
> handler. I remember the original reason why I made it irqsave is because
> something did kmalloc() and SLUB somehow asked for random bits.

Right. So there are two sides of the question: 1) how bad is this
actual race, and are there any drivers that do regularly get bit by
this? 2) There's a largeish window between workqueue_init_early()
setting up the system highprio workqueue, and workqueue_init()
enabling queued workers to actually run. Interrupts also get enabled
in the interim. Does anything get bit by that window?

Jason
On 2/13/22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hey Sebastian,
>
> On Fri, Feb 11, 2022 at 6:26 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
>> Sure. I can do the CPU-HP notifier later on once we are done with
>> everything. I acked the v7, don't see a road block.
>
> I've been running this over the weekend and performance is generally okay.
>
> # perf top --symbols $(objdump -t /usr/src/linux/drivers/char/random.o|fgrep 'F .text'|awk '{print $6}'|tr '\n' ,) -F max -g
>
> One thing I noticed though was that add_interrupt_randomness spends
> most of its time on:
>
>     lock xadd %eax,0x38(%rbp)
>
> So, as expected, those atomics really need to go. Indeed we might be
> best off with the CPU hotplug notifier for setting the count back to
> zero.
>
> Do you want to prepare a patch for this? Or should I take a stab at it?

FYI, I was overwhelmed with a compulsion to try doing it myself and
posted that here
https://lore.kernel.org/lkml/20220213215343.11652-1-Jason@zx2c4.com/
which is now pending your review.
On 2022-02-14 11:17:20 [+0100], Jason A. Donenfeld wrote:
> On 2/14/22, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > to
> > | - Does anything anywhere call get_random_xx() before the worker has a
> > |   chance to run?
> >
> > Once you queue a work item I don't think that the scheduler needs to put
> > it on the CPU right away. It may already have other tasks waiting,
> > including some with a RT priority.
> > Also, the lock is irqsave() so there can be users in an interrupt
> > handler. I remember the original reason why I made it irqsave is because
> > something did kmalloc() and SLUB somehow asked for random bits.
>
> Right. So there are two sides of the question: 1) how bad is this
> actual race, and are there any drivers that do regularly get bit by
> this? 2) There's a largeish window between workqueue_init_early()
> setting up the system highprio workqueue, and workqueue_init()
> enabling queued workers to actually run. Interrupts also get enabled
> in the interim. Does anything get bit by that window?

This is only important during boot-up, right? Otherwise it just extracts
entropy from the pool.
I posted numbers earlier on where the work got scheduled and the three
or four interrupts came in before the work-item was scheduled. I could
send you the diff if you want to up it on some machines.

> Jason

Sebastian
On Mon, Feb 14, 2022 at 12:16 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-02-14 11:17:20 [+0100], Jason A. Donenfeld wrote:
> > On 2/14/22, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > to
> > > | - Does anything anywhere call get_random_xx() before the worker has a
> > > |   chance to run?
> > >
> > > Once you queue a work item I don't think that the scheduler needs to put
> > > it on the CPU right away. It may already have other tasks waiting,
> > > including some with a RT priority.
> > > Also, the lock is irqsave() so there can be users in an interrupt
> > > handler. I remember the original reason why I made it irqsave is because
> > > something did kmalloc() and SLUB somehow asked for random bits.
> >
> > Right. So there are two sides of the question: 1) how bad is this
> > actual race, and are there any drivers that do regularly get bit by
> > this? 2) There's a largeish window between workqueue_init_early()
> > setting up the system highprio workqueue, and workqueue_init()
> > enabling queued workers to actually run. Interrupts also get enabled
> > in the interim. Does anything get bit by that window?
>
> This is only important during boot-up, right?

Right. This is a pre-init window only. But a bunch of things are done
pre-init -- siphash secret keys, aslr seeds, and so forth.
diff --git a/drivers/char/random.c b/drivers/char/random.c
index c42c07a7eb56..20b11a4b6559 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1162,9 +1162,10 @@ EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
 struct fast_pool {
 	unsigned long pool[16 / sizeof(long)];
+	struct work_struct mix;
 	unsigned long last;
+	atomic_t count;
 	u16 reg_idx;
-	u8 count;
 };
 
 /*
@@ -1214,12 +1215,59 @@ static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
 	return *ptr;
 }
 
+static void mix_interrupt_randomness(struct work_struct *work)
+{
+	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
+	unsigned long pool[ARRAY_SIZE(fast_pool->pool)];
+	int count;
+
+	/* Check to see if we're running on the wrong CPU due to hotplug. */
+	migrate_disable();
+	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
+		migrate_enable();
+		/*
+		 * If we are unlucky enough to have been moved to another CPU,
+		 * then we set our count to zero atomically so that when the
+		 * CPU comes back online, it can enqueue work again. The
+		 * _release here pairs with the atomic_inc_return_acquire in
+		 * add_interrupt_randomness().
+		 */
+		atomic_set_release(&fast_pool->count, 0);
+		return;
+	}
+
+	/*
+	 * Copy the pool to the stack so that the mixer always has a
+	 * consistent view. It's extremely unlikely but possible that
+	 * this 2 or 4 word read is interrupted by an irq, but in case
+	 * it is, we double check that count stays the same.
+	 *
+	 * We set the count to 0 so that irqs can immediately begin to
+	 * accumulate again after. Since any possible interruptions
+	 * at this stage are guaranteed to be on the same CPU, we can
+	 * use cmpxchg_relaxed.
+	 */
+	count = atomic_read(&fast_pool->count);
+	do {
+		memcpy(pool, fast_pool->pool, sizeof(pool));
+	} while (atomic_try_cmpxchg_relaxed(&fast_pool->count, &count, 0));
+
+	fast_pool->last = jiffies;
+	migrate_enable();
+
+	mix_pool_bytes(pool, sizeof(pool));
+	credit_entropy_bits(1);
+	memzero_explicit(pool, sizeof(pool));
+}
+
 void add_interrupt_randomness(int irq)
 {
+	enum { MIX_INFLIGHT = 1U << 31 };
 	struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
 	struct pt_regs *regs = get_irq_regs();
 	unsigned long now = jiffies;
 	cycles_t cycles = random_get_entropy();
+	unsigned int new_count;
 
 	if (cycles == 0)
 		cycles = get_reg(fast_pool, regs);
@@ -1235,12 +1283,13 @@ void add_interrupt_randomness(int irq)
 	}
 
 	fast_mix((u32 *)fast_pool->pool);
-	++fast_pool->count;
+	/* The _acquire here pairs with the atomic_set_release in mix_interrupt_randomness(). */
+	new_count = (unsigned int)atomic_inc_return_acquire(&fast_pool->count);
 
 	if (unlikely(crng_init == 0)) {
-		if (fast_pool->count >= 64 &&
+		if (new_count >= 64 &&
 		    crng_fast_load(fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
-			fast_pool->count = 0;
+			atomic_set(&fast_pool->count, 0);
 			fast_pool->last = now;
 
 			/*
@@ -1254,20 +1303,16 @@ void add_interrupt_randomness(int irq)
 		return;
 	}
 
-	if ((fast_pool->count < 64) && !time_after(now, fast_pool->last + HZ))
+	if (new_count & MIX_INFLIGHT)
 		return;
 
-	if (!spin_trylock(&input_pool.lock))
+	if (new_count < 64 && !time_after(now, fast_pool->last + HZ))
 		return;
 
-	fast_pool->last = now;
-	_mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool));
-	spin_unlock(&input_pool.lock);
-
-	fast_pool->count = 0;
-
-	/* Award one bit for the contents of the fast pool. */
-	credit_entropy_bits(1);
+	if (unlikely(!fast_pool->mix.func))
+		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
+	atomic_or(MIX_INFLIGHT, &fast_pool->count);
+	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);