Message ID | 20210930103754.2128949-1-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | Changes Requested |
Series | irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier |
I was looking at irq_poll for missing scheduling points. It raised the question why there are still 7 drivers using irq_poll instead of having moved on to something else like threaded interrupts or kworker. There is:

- InfiniBand: can complete directly, via irq_poll or kworker.
- be2iscsi: only irq_poll.
- cxlflash: only irq_poll. Not sure how IRQs are acked.
- ipr: direct or irq_poll, can be configured. Not sure how IRQs are acked.
- lpfc: kworker and/or irq_poll. Not sure whether all invocations are from interrupt-like context [0].
- megaraid: irq_poll. Not sure whether all invocations are from interrupt-like context [0].
- mpt3sas: irq_poll or io_uring io poll. Not sure whether all invocations are from interrupt-like context [0].

[0] If irq_poll_sched() is not used from an interrupt (as in an interrupt service routine, timer handler or tasklet (not encouraging it, just noticed)) but from task context (a kworker for instance), then the irq_poll handler will not be invoked right away. Instead it is delayed to a random point in time, until an interrupt fires or something down the stack performs a pending-softirq check. Waking ksoftirqd itself doesn't help much either, since that only sets a NEED_RESCHED bit in the task_struct, which local_irq_restore() isn't testing for. So the scheduling event is delayed until a spin_unlock() for instance.

Is there a reason for the remaining user of irq_poll to keep using it?

Sebastian
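As a concrete illustration of footnote [0], here is a minimal, hypothetical sketch of the task-context pattern it describes; the `my_dev` structure and the kworker callback are made-up names for illustration, only irq_poll_sched() is the real API:

```c
#include <linux/irq_poll.h>
#include <linux/workqueue.h>

struct my_dev {
	struct irq_poll iop;
	struct work_struct work;
};

/* Runs in task context (a kworker), not from an interrupt. */
static void my_dev_complete_work(struct work_struct *work)
{
	struct my_dev *dev = container_of(work, struct my_dev, work);

	/*
	 * irq_poll_sched() only marks IRQ_POLL_SOFTIRQ pending via
	 * __raise_softirq_irqoff().  There is no irq_exit() on this
	 * path, so the poll handler only runs once something else
	 * (an interrupt, local_bh_enable(), idle entry, ...) checks
	 * for pending softirqs.
	 */
	irq_poll_sched(&dev->iop);
}
```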
On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> Is there a reason for the remaining user of irq_poll to keep using it?
At least for RDMA there are workloads where the latency difference
matters. That's why we added both the irq_poll and workqueue mode
to the new CQ API a few years ago.
On 2021-10-01 05:24:49 [+0100], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:56:05PM +0200, Sebastian Andrzej Siewior wrote:
> > Is there a reason for the remaining user of irq_poll to keep using it?
>
> At least for RDMA there are workloads where the latency difference
> matters. That's why we added both the irq_poll and workqueue mode
> to the new CQ API a few years ago.

Would it work for them to move to threaded interrupts, or is the NAPI-like behaviour (delay to the next jiffy after a while) the killer feature?

Sebastian
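For reference, a rough sketch of the threaded-interrupt alternative being suggested; the `my_*` names and the completion helper are hypothetical, only request_threaded_irq() and the irqreturn values are the real API:

```c
#include <linux/interrupt.h>

static irqreturn_t my_hard_handler(int irq, void *data)
{
	/* Ack/mask the device here, then defer the actual work. */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t my_thread_fn(int irq, void *data)
{
	struct my_dev *dev = data;

	/*
	 * Completions are processed in a sleepable kernel thread
	 * instead of the IRQ_POLL_SOFTIRQ handler, so scheduling
	 * points come for free.
	 */
	my_dev_process_completions(dev);
	return IRQ_HANDLED;
}

static int my_dev_setup_irq(struct my_dev *dev, int irq)
{
	return request_threaded_irq(irq, my_hard_handler, my_thread_fn,
				    0, "my_dev", dev);
}
```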
On 2021-09-30 12:37:54 [+0200], To linux-kernel@vger.kernel.org wrote:
> __raise_softirq_irqoff() adds a bit to the pending softirq mask and this
> is it. The softirq won't be handled in a deterministic way but randomly
> when an interrupt fires and handles softirqs in its irq_exit() routine or
> if something randomly checks and handles pending softirqs in the call
> chain before the CPU goes idle.
>
> Add a local_bh_disable/enable() around the IRQ-off section which will
> handle pending softirqs.

ping

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  lib/irq_poll.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/irq_poll.c b/lib/irq_poll.c
> index 2f17b488d58e1..2b9f797642f60 100644
> --- a/lib/irq_poll.c
> +++ b/lib/irq_poll.c
> @@ -191,11 +191,13 @@ static int irq_poll_cpu_dead(unsigned int cpu)
>  	 * If a CPU goes away, splice its entries to the current CPU
>  	 * and trigger a run of the softirq
>  	 */
> +	local_bh_disable();
>  	local_irq_disable();
>  	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
>  			 this_cpu_ptr(&blk_cpu_iopoll));
>  	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
>  	local_irq_enable();
> +	local_bh_enable();
>
>  	return 0;
>  }
> --
> 2.33.0

Sebastian
On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> __raise_softirq_irqoff() adds a bit to the pending softirq mask and this
> is it. The softirq won't be handled in a deterministic way but randomly
> when an interrupt fires and handles softirqs in its irq_exit() routine or
> if something randomly checks and handles pending softirqs in the call
> chain before the CPU goes idle.
>
> Add a local_bh_disable/enable() around the IRQ-off section which will
> handle pending softirqs.

This patch leaves me extremely confused, and it would even more so if I were just reading the code. local_irq_disable() is supposed to disable BHs as well, so the code looks pretty much nonsensical to me. But apparently that isn't the point if I follow your commit message, as you don't care about an extra level of BH disabling but want to force a side-effect of the re-enabling? Why not directly call the helper to schedule the softirq then?
On 2021-10-18 03:53:20 [-0700], Christoph Hellwig wrote:
> On Thu, Sep 30, 2021 at 12:37:54PM +0200, Sebastian Andrzej Siewior wrote:
> > __raise_softirq_irqoff() adds a bit to the pending softirq mask and this
> > is it. The softirq won't be handled in a deterministic way but randomly
> > when an interrupt fires and handles softirqs in its irq_exit() routine or
> > if something randomly checks and handles pending softirqs in the call
> > chain before the CPU goes idle.
> >
> > Add a local_bh_disable/enable() around the IRQ-off section which will
> > handle pending softirqs.
>
> This patch leaves me extremely confused, and it would even more so if I
> were just reading the code. local_irq_disable() is supposed to disable
> BHs as well, so the code looks pretty much nonsensical to me. But
> apparently that isn't the point if I follow your commit message, as you
> don't care about an extra level of BH disabling but want to force a
> side-effect of the re-enabling? Why not directly call the helper to
> schedule the softirq then?

This side-effect is actually the way it works most of the time. There is one exception in the network stack where do_softirq() is called manually after checking for pending softirqs. I managed to remove one instance in commit fe420d87bbc23 ("net/core: remove explicit do_softirq() from busy_poll_stop()"), I just wasn't so lucky with the other one (yet). Other than that, it is either local_bh_enable() that ensures that pending softirqs are processed, or __irq_exit_rcu(). Same as preempt_enable() ensures that a context switch happens if the scheduler decided that one is needed while the CPU was in a preempt-disabled section at that time.

Anyway. Even with s/__raise_softirq_irqoff/raise_softirq_irqoff/ you are not much better off. The latter will wake ksoftirqd, but it might only result in setting the NEED_RESCHED bit, which is not checked by local_irq_enable(). So you end up waiting until a random spin_unlock() which has the needed check. Nothing here does that, but probably something does before the CPU-HP code is left.

Sebastian
diff --git a/lib/irq_poll.c b/lib/irq_poll.c
index 2f17b488d58e1..2b9f797642f60 100644
--- a/lib/irq_poll.c
+++ b/lib/irq_poll.c
@@ -191,11 +191,13 @@ static int irq_poll_cpu_dead(unsigned int cpu)
 	 * If a CPU goes away, splice its entries to the current CPU
 	 * and trigger a run of the softirq
 	 */
+	local_bh_disable();
 	local_irq_disable();
 	list_splice_init(&per_cpu(blk_cpu_iopoll, cpu),
 			 this_cpu_ptr(&blk_cpu_iopoll));
 	__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
 	local_irq_enable();
+	local_bh_enable();

 	return 0;
 }
__raise_softirq_irqoff() adds a bit to the pending softirq mask and this is it. The softirq won't be handled in a deterministic way but randomly when an interrupt fires and handles softirqs in its irq_exit() routine, or if something randomly checks and handles pending softirqs in the call chain before the CPU goes idle.

Add a local_bh_disable/enable() around the IRQ-off section which will handle pending softirqs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 lib/irq_poll.c | 2 ++
 1 file changed, 2 insertions(+)