
irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier

Message ID: 20210930103754.2128949-1-bigeasy@linutronix.de (mailing list archive)
State: Changes Requested
Series: irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier

Commit Message

Sebastian Andrzej Siewior Sept. 30, 2021, 10:37 a.m. UTC
__raise_softirq_irqoff() sets a bit in the pending softirq mask and
nothing more. The softirq is not handled in a deterministic way but
randomly: when an interrupt fires and handles softirqs in its
irq_exit() routine, or when something in the call chain happens to
check and handle pending softirqs before the CPU goes idle.

Add a local_bh_disable()/enable() pair around the IRQ-off section;
the local_bh_enable() will handle the pending softirqs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 lib/irq_poll.c | 2 ++
 1 file changed, 2 insertions(+)
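
For the mechanism the commit message describes, here is a simplified
sketch of the two paths (condensed from kernel/softirq.c; the real
functions carry extra lockdep, tracing and preempt-count accounting):

/* __raise_softirq_irqoff() only marks the softirq pending: */
void __raise_softirq_irqoff(unsigned int nr)
{
	or_softirq_pending(1UL << nr);	/* set the bit, nothing else runs */
}

/* local_bh_enable() also processes whatever became pending: */
void local_bh_enable(void)
{
	/* drop the SOFTIRQ_DISABLE part of the preempt count ... */
	if (!in_interrupt() && local_softirq_pending())
		do_softirq();	/* pending softirqs handled right here */
	/* ... then drop the last count and check for preemption */
}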

Comments

Sebastian Andrzej Siewior Sept. 30, 2021, 10:56 a.m. UTC | #1
I was looking at irq_poll for missing scheduling points.
It raised the question why there are still 7 drivers using irq_poll that
have not moved on to something else like threaded interrupts or a kworker.

There are:
- InfiniBand: can complete directly, via irq_poll, or via a kworker.
- be2iscsi: irq_poll only.
- cxlflash: irq_poll only. Not sure how IRQs are acked.
- ipr: direct or irq_poll, configurable. Not sure how IRQs are acked.
- lpfc: kworker and/or irq_poll. Not sure all invocations are from
  interrupt-like context [0].
- megaraid: irq_poll. Not sure all invocations are from interrupt-like
  context [0].
- mpt3sas: irq_poll or io_uring io-poll. Not sure all invocations are
  from interrupt-like context [0].

[0] If irq_poll_sched() is not used from interrupt context (an
interrupt service routine, timer handler, or tasklet (not encouraging
tasklets, just noting it)) but from task context (a kworker for
instance), then the irq-poll handler will not be invoked right away.
Instead it is delayed to a random point in time, until an interrupt
fires or something down the stack performs a pending-softirq check.
Waking ksoftirqd itself doesn't help much either, since that only sets
the NEED_RESCHED bit in the task_struct, which local_irq_restore()
does not test for. So the scheduling event is delayed until, say, the
next spin_unlock().
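
To illustrate [0] with a hypothetical snippet (the example_dev bits
are made up, not taken from any of the drivers above):

#include <linux/irq_poll.h>
#include <linux/workqueue.h>

struct example_dev {
	struct irq_poll iop;
	struct work_struct work;
};

static void example_work_fn(struct work_struct *work)
{
	struct example_dev *dev = container_of(work, struct example_dev, work);

	/*
	 * Task context: irq_poll_sched() only marks IRQ_POLL_SOFTIRQ
	 * pending. Nothing here runs the softirq, so dev->iop's poll
	 * callback waits for the next interrupt or softirq check.
	 */
	irq_poll_sched(&dev->iop);
}

Wrapping the irq_poll_sched() call in a local_bh_disable()/
local_bh_enable() pair would get the poll callback invoked on the spot.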

Is there a reason for the remaining users of irq_poll to keep using it?

Sebastian
Christoph Hellwig Oct. 1, 2021, 4:24 a.m. UTC | #2
On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> Is there a reason for the remaining users of irq_poll to keep using it?

At least for RDMA there are workloads where the latency difference
matters.  That's why we added both the irq_poll and workqueue modes
to the new CQ API a few years ago.
Sebastian Andrzej Siewior Oct. 1, 2021, 6:41 a.m. UTC | #3
On 2021-10-01 05:24:49 [+0100], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> > Is there a reason for the remaining users of irq_poll to keep using it?
> 
> At least for RDMA there are workloads where the latency difference
> matters.  That's why we added both the irq_poll and workqueue modes
> to the new CQ API a few years ago.

Would it work for them to move to threaded interrupts, or is the
NAPI-like behaviour (deferring to the next jiffy after a while) the
killer feature?

Sebastian
Sebastian Andrzej Siewior Oct. 18, 2021, 7:45 a.m. UTC | #4
On 2021-09-30 12:37:54 [+0200], To linux-kernel@vger.kernel.org wrote:
> __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> nothing more. The softirq is not handled in a deterministic way but
> randomly: when an interrupt fires and handles softirqs in its
> irq_exit() routine, or when something in the call chain happens to
> check and handle pending softirqs before the CPU goes idle.
> 
> Add a local_bh_disable()/enable() pair around the IRQ-off section;
> the local_bh_enable() will handle the pending softirqs.

ping

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  lib/irq_poll.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/irq_poll.c b/lib/irq_poll.c
> index 2f17b488d58e1..2b9f797642f60 100644
> --- a/lib/irq_poll.c
> +++ b/lib/irq_poll.c
> @@ -191,11 +191,13 @@ static int irq_poll_cpu_dead(unsigned int cpu)
>  	 * If a CPU goes away, splice its entries to the current CPU
>  	 * and trigger a run of the softirq
>  	 */
> +	local_bh_disable();
>  	local_irq_disable();
>  	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
>  			 this_cpu_ptr(&blk_cpu_iopoll));
>  	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
>  	local_irq_enable();
> +	local_bh_enable();
>  
>  	return 0;
>  }
> -- 
> 2.33.0
> 

Sebastian
Christoph Hellwig Oct. 18, 2021, 10:53 a.m. UTC | #5
On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> nothing more. The softirq is not handled in a deterministic way but
> randomly: when an interrupt fires and handles softirqs in its
> irq_exit() routine, or when something in the call chain happens to
> check and handle pending softirqs before the CPU goes idle.
> 
> Add a local_bh_disable()/enable() pair around the IRQ-off section;
> the local_bh_enable() will handle the pending softirqs.

This patch leaves me extremely confused, and it would even more so if
I were just reading the code.  local_irq_disable() is supposed to
disable BHs as well, so the code looks pretty much nonsensical to me.
But apparently that isn't the point: if I follow your commit message,
you don't care about an extra level of BH disabling but want to force
a side effect of the re-enabling?  Why not directly call the helper
to schedule the softirq then?
Sebastian Andrzej Siewior Oct. 18, 2021, 11:49 a.m. UTC | #6
On 2021-10-18 03:53:20 [-0700], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> > __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> > nothing more. The softirq is not handled in a deterministic way but
> > randomly: when an interrupt fires and handles softirqs in its
> > irq_exit() routine, or when something in the call chain happens to
> > check and handle pending softirqs before the CPU goes idle.
> > 
> > Add a local_bh_disable()/enable() pair around the IRQ-off section;
> > the local_bh_enable() will handle the pending softirqs.
> 
> This patch leaves me extremely confused, and it would even more so if
> I were just reading the code.  local_irq_disable() is supposed to
> disable BHs as well, so the code looks pretty much nonsensical to me.
> But apparently that isn't the point: if I follow your commit message,
> you don't care about an extra level of BH disabling but want to force
> a side effect of the re-enabling?  Why not directly call the helper
> to schedule the softirq then?

This side effect is actually the way it works most of the time. There is
one exception in the network stack where do_softirq() is called manually
after checking for pending softirqs. I managed to remove one such
instance in commit
   fe420d87bbc23 ("net/core: remove explicit do_softirq() from busy_poll_stop()")

but wasn't so lucky with the other one (yet). Other than that, it is
either local_bh_enable() that ensures pending softirqs are processed,
or __irq_exit_rcu(). In the same way, preempt_enable() ensures that a
context switch happens if the scheduler decided one is needed while the
CPU was in a preempt-disabled section.

Anyway. Even with s/__raise_softirq_irqoff/raise_softirq_irqoff/ you are
not much better off. The latter will wake ksoftirqd, but that may only
set the NEED_RESCHED bit, which local_irq_enable() does not check. So
you end up waiting until a random spin_unlock() which has the needed
check. Nothing in this code path does that, though presumably something
does before the CPU-hotplug code is left.
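
Roughly (again a sketch, not the exact kernel code):

/* raise_softirq_irqoff() additionally wakes ksoftirqd from task
 * context: */
void raise_softirq_irqoff(unsigned int nr)
{
	__raise_softirq_irqoff(nr);	/* mark pending */
	if (!in_interrupt())
		wakeup_softirqd();	/* may set NEED_RESCHED ... */
}

/* ... but local_irq_enable() is a plain irq-flags operation and never
 * looks at NEED_RESCHED, so the wakeup only takes effect at the next
 * preemption point, e.g. a spin_unlock(). */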

Sebastian

Patch

diff --git a/lib/irq_poll.c b/lib/irq_poll.c
index 2f17b488d58e1..2b9f797642f60 100644
--- a/lib/irq_poll.c
+++ b/lib/irq_poll.c
@@ -191,11 +191,13 @@  static int irq_poll_cpu_dead(unsigned int cpu)
 	 * If a CPU goes away, splice its entries to the current CPU
 	 * and trigger a run of the softirq
 	 */
+	local_bh_disable();
 	local_irq_disable();
 	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
 			 this_cpu_ptr(&blk_cpu_iopoll));
 	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
 	local_irq_enable();
+	local_bh_enable();
 
 	return 0;
 }