
irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier

Message ID: 20210930103754.2128949-1-bigeasy@linutronix.de (mailing list archive)
State: Changes Requested
Series: irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier

Commit Message

Sebastian Andrzej Siewior Sept. 30, 2021, 10:37 a.m. UTC
__raise_softirq_irqoff() sets a bit in the pending softirq mask and
nothing more. The softirq is not handled in a deterministic way but
randomly: when an interrupt fires and handles softirqs in its
irq_exit() routine, or when something in the call chain happens to
check and handle pending softirqs before the CPU goes idle.

Add a local_bh_disable()/enable() pair around the IRQ-off section;
the local_bh_enable() will handle the pending softirqs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 lib/irq_poll.c | 2 ++
 1 file changed, 2 insertions(+)
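
For the mechanism the commit message describes, here is a simplified
sketch of the two paths (condensed from kernel/softirq.c; the real
functions carry extra lockdep, tracing and preempt-count accounting):

/* __raise_softirq_irqoff() only marks the softirq pending: */
void __raise_softirq_irqoff(unsigned int nr)
{
	or_softirq_pending(1UL << nr);	/* set the bit, nothing else runs */
}

/* local_bh_enable() also processes whatever became pending: */
void local_bh_enable(void)
{
	/* drop the SOFTIRQ_DISABLE part of the preempt count ... */
	if (!in_interrupt() && local_softirq_pending())
		do_softirq();	/* pending softirqs handled right here */
	/* ... then drop the last count and check for preemption */
}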

Comments

Sebastian Andrzej Siewior Sept. 30, 2021, 10:56 a.m. UTC | #1
I was looking at irq_poll for missing scheduling points.
It raised the question why there are still 7 drivers using irq_poll that
have not moved on to something else like threaded interrupts or a kworker.

There are:
- InfiniBand: can complete directly, via irq_poll, or via a kworker.
- be2iscsi: irq_poll only.
- cxlflash: irq_poll only. Not sure how IRQs are acked.
- ipr: direct or irq_poll, configurable. Not sure how IRQs are acked.
- lpfc: kworker and/or irq_poll. Not sure all invocations are from
  interrupt-like context [0].
- megaraid: irq_poll. Not sure all invocations are from interrupt-like
  context [0].
- mpt3sas: irq_poll or io_uring io-poll. Not sure all invocations are
  from interrupt-like context [0].

[0] If irq_poll_sched() is not used from interrupt context (an
interrupt service routine, timer handler, or tasklet (not encouraging
tasklets, just noting it)) but from task context (a kworker for
instance), then the irq-poll handler will not be invoked right away.
Instead it is delayed to a random point in time, until an interrupt
fires or something down the stack performs a pending-softirq check.
Waking ksoftirqd itself doesn't help much either, since that only sets
the NEED_RESCHED bit in the task_struct, which local_irq_restore()
does not test for. So the scheduling event is delayed until, say, the
next spin_unlock().
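
To illustrate [0] with a hypothetical snippet (the example_dev bits
are made up, not taken from any of the drivers above):

#include <linux/irq_poll.h>
#include <linux/workqueue.h>

struct example_dev {
	struct irq_poll iop;
	struct work_struct work;
};

static void example_work_fn(struct work_struct *work)
{
	struct example_dev *dev = container_of(work, struct example_dev, work);

	/*
	 * Task context: irq_poll_sched() only marks IRQ_POLL_SOFTIRQ
	 * pending. Nothing here runs the softirq, so dev->iop's poll
	 * callback waits for the next interrupt or softirq check.
	 */
	irq_poll_sched(&dev->iop);
}

Wrapping the irq_poll_sched() call in a local_bh_disable()/
local_bh_enable() pair would get the poll callback invoked on the spot.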

Is there a reason for the remaining users of irq_poll to keep using it?

Sebastian
Christoph Hellwig Oct. 1, 2021, 4:24 a.m. UTC | #2
On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> Is there a reason for the remaining users of irq_poll to keep using it?

At least for RDMA there are workloads where the latency difference
matters.  That's why we added both the irq_poll and workqueue modes
to the new CQ API a few years ago.
Sebastian Andrzej Siewior Oct. 1, 2021, 6:41 a.m. UTC | #3
On 2021-10-01 05:24:49 [+0100], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> > Is there a reason for the remaining users of irq_poll to keep using it?
> 
> At least for RDMA there are workloads where the latency difference
> matters.  That's why we added both the irq_poll and workqueue modes
> to the new CQ API a few years ago.

Would it work for them to move to threaded interrupts, or is the
NAPI-like behaviour (deferring to the next jiffy after a while) the
killer feature?

Sebastian
Sebastian Andrzej Siewior Oct. 18, 2021, 7:45 a.m. UTC | #4
On 2021-09-30 12:37:54 [+0200], To linux-kernel@vger.kernel.org wrote:
> __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> nothing more. The softirq is not handled in a deterministic way but
> randomly: when an interrupt fires and handles softirqs in its
> irq_exit() routine, or when something in the call chain happens to
> check and handle pending softirqs before the CPU goes idle.
> 
> Add a local_bh_disable()/enable() pair around the IRQ-off section;
> the local_bh_enable() will handle the pending softirqs.

ping

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  lib/irq_poll.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/irq_poll.c b/lib/irq_poll.c
> index 2f17b488d58e1..2b9f797642f60 100644
> --- a/lib/irq_poll.c
> +++ b/lib/irq_poll.c
> @@ -191,11 +191,13 @@ static int irq_poll_cpu_dead(unsigned int cpu)
>  	 * If a CPU goes away, splice its entries to the current CPU
>  	 * and trigger a run of the softirq
>  	 */
> +	local_bh_disable();
>  	local_irq_disable();
>  	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
>  			 this_cpu_ptr(&blk_cpu_iopoll));
>  	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
>  	local_irq_enable();
> +	local_bh_enable();
>  
>  	return 0;
>  }
> -- 
> 2.33.0
> 

Sebastian
Christoph Hellwig Oct. 18, 2021, 10:53 a.m. UTC | #5
On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> nothing more. The softirq is not handled in a deterministic way but
> randomly: when an interrupt fires and handles softirqs in its
> irq_exit() routine, or when something in the call chain happens to
> check and handle pending softirqs before the CPU goes idle.
> 
> Add a local_bh_disable()/enable() pair around the IRQ-off section;
> the local_bh_enable() will handle the pending softirqs.

This patch leaves me extremely confused, and it would even more so if
I were just reading the code.  local_irq_disable() is supposed to
disable BHs as well, so the code looks pretty much nonsensical to me.
But apparently that isn't the point: if I follow your commit message,
you don't care about an extra level of BH disabling but want to force
a side effect of the re-enabling?  Why not directly call the helper
to schedule the softirq then?
Sebastian Andrzej Siewior Oct. 18, 2021, 11:49 a.m. UTC | #6
On 2021-10-18 03:53:20 [-0700], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> > __raise_softirq_irqoff() sets a bit in the pending softirq mask and
> > nothing more. The softirq is not handled in a deterministic way but
> > randomly: when an interrupt fires and handles softirqs in its
> > irq_exit() routine, or when something in the call chain happens to
> > check and handle pending softirqs before the CPU goes idle.
> > 
> > Add a local_bh_disable()/enable() pair around the IRQ-off section;
> > the local_bh_enable() will handle the pending softirqs.
> 
> This patch leaves me extremely confused, and it would even more so if
> I were just reading the code.  local_irq_disable() is supposed to
> disable BHs as well, so the code looks pretty much nonsensical to me.
> But apparently that isn't the point: if I follow your commit message,
> you don't care about an extra level of BH disabling but want to force
> a side effect of the re-enabling?  Why not directly call the helper
> to schedule the softirq then?

This side effect is actually the way it works most of the time. There is
one exception in the network stack where do_softirq() is called manually
after checking for pending softirqs. I managed to remove one such
instance in commit
   fe420d87bbc23 ("net/core: remove explicit do_softirq() from busy_poll_stop()")

but wasn't so lucky with the other one (yet). Other than that, it is
either local_bh_enable() that ensures pending softirqs are processed,
or __irq_exit_rcu(). In the same way, preempt_enable() ensures that a
context switch happens if the scheduler decided one is needed while the
CPU was in a preempt-disabled section.

Anyway. Even with s/__raise_softirq_irqoff/raise_softirq_irqoff/ you are
not much better off. The latter will wake ksoftirqd, but that may only
set the NEED_RESCHED bit, which local_irq_enable() does not check. So
you end up waiting until a random spin_unlock() which has the needed
check. Nothing in this code path does that, though presumably something
does before the CPU-hotplug code is left.
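
Roughly (again a sketch, not the exact kernel code):

/* raise_softirq_irqoff() additionally wakes ksoftirqd from task
 * context: */
void raise_softirq_irqoff(unsigned int nr)
{
	__raise_softirq_irqoff(nr);	/* mark pending */
	if (!in_interrupt())
		wakeup_softirqd();	/* may set NEED_RESCHED ... */
}

/* ... but local_irq_enable() is a plain irq-flags operation and never
 * looks at NEED_RESCHED, so the wakeup only takes effect at the next
 * preemption point, e.g. a spin_unlock(). */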

Sebastian

Patch

diff --git a/lib/irq_poll.c b/lib/irq_poll.c
index 2f17b488d58e1..2b9f797642f60 100644
--- a/lib/irq_poll.c
+++ b/lib/irq_poll.c
@@ -191,11 +191,13 @@  static int irq_poll_cpu_dead(unsigned int cpu)
 	 * If a CPU goes away, splice its entries to the current CPU
 	 * and trigger a run of the softirq
 	 */
+	local_bh_disable();
 	local_irq_disable();
 	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
 			 this_cpu_ptr(&blk_cpu_iopoll));
 	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
 	local_irq_enable();
+	local_bh_enable();
 
 	return 0;
 }