diff mbox

[2/2] IB/hfi1: Handle packets in the theaded handler only

Message ID 20171003154920.31566-3-acme@kernel.org (mailing list archive)
State Not Applicable
Headers show

Commit Message

Arnaldo Carvalho de Melo Oct. 3, 2017, 3:49 p.m. UTC
From: Arnaldo Carvalho de Melo <acme@redhat.com>

The hfi1 driver calls request_threaded_irq with two parameters:

      handler = receive_context_interrupt;
      thread = receive_context_thread;
      request_threaded_irq(me->msix.vector, handler, thread, 0, me->name, arg);

And tries to process packets on the hard irq one, receive_context_interrupt(),
only waking up the thread (returning IRQ_WAKE_THREAD) when some threshold is
crossed in the number of packets available in the NIC, trying to balance
latency and bandwidth.

But in a CONFIG_PREEMPT_RT_FULL kernel it ends up calling spin locks from the
hard irq handler (receive_context_interrupt) which causes BUGs like this:

  [ 1002.740581] hfi1 0000:21:00.0: hfi1_0: set_link_state: current ARMED, new ACTIVE
  [ 1002.740583] hfi1 0000:21:00.0: hfi1_0: logical state changed to PORT_ACTIVE (0x4)
  [ 1002.740599] hfi1 0000:21:00.0: hfi1_0: send_idle_message: sending idle message 0x203
  [ 1002.741873] hfi1 0000:21:00.0: hfi1_0: read_idle_message: read idle message 0x203
  [ 1002.741874] hfi1 0000:21:00.0: hfi1_0: handle_sma_message: SMA message 0x2
  [ 1002.741923] hfi1 0000:21:00.0: hfi1_0: Switching to NO_DMA_RTAIL
  [ 1004.744192] IPv6: ADDRCONF(NETDEV_CHANGE): hfi1_opa0: link becomes ready
  [ 1167.907754] ------------[ cut here ]------------
  [ 1167.907756] kernel BUG at kernel/rtmutex.c:902!
  [ 1167.907758] invalid opcode: 0000 [#1] PREEMPT SMP
  <SNIP>
  [ 1167.907805] CPU: 10 PID: 1505 Comm: hfi1_cq0 Not tainted 3.10.0-708.rt56.635.test.el7.x86_64 #1
  <SNIP>
  [ 1167.907823] Call Trace:
  [ 1167.907826]  <IRQ>
  [ 1167.907850]  [<ffffffffc06e4981>] ? hfi1_rvt_get_rwqe+0x141/0x400 [hfi1]
  [ 1167.907852]  [<ffffffff816b7625>] rt_spin_lock+0x25/0x30
  [ 1167.907856]  [<ffffffff810aa774>] queue_kthread_work+0x24/0x60
  [ 1167.907861]  [<ffffffffc068845b>] rvt_cq_enter+0x17b/0x250 [rdmavt]
  [ 1167.907869]  [<ffffffffc06e391a>] hfi1_rc_rcv+0x67a/0x1260 [hfi1]
  [ 1167.907878]  [<ffffffffc06fefc8>] hfi1_ib_rcv+0x2c8/0x400 [hfi1]
  [ 1167.907886]  [<ffffffffc06c381c>] process_receive_ib+0x6c/0x150 [hfi1]
  [ 1167.907888]  [<ffffffff810cee9d>] ? enqueue_pushable_task+0x6d/0x90
  [ 1167.907895]  [<ffffffffc06c1f31>] handle_receive_interrupt_nodma_rtail+0x161/0x310 [hfi1]
  [ 1167.907914]  [<ffffffffc06b49d3>] receive_context_interrupt+0x53/0x390 [hfi1]
  [ 1167.907917]  [<ffffffff8112fb26>] __handle_irq_event_percpu+0x56/0x240
  [ 1167.907919]  [<ffffffff816b7616>] ? rt_spin_lock+0x16/0x30
  [ 1167.907920]  [<ffffffff8112fd59>] handle_irq_event_percpu+0x49/0xa0
  [ 1167.907922]  [<ffffffff8112fe28>] handle_irq_event+0x78/0xb0
  [ 1167.907924]  [<ffffffff81132d29>] handle_edge_irq+0x99/0x1a0
  [ 1167.907926]  [<ffffffff8101ea7b>] handle_irq+0xbb/0x150
  [ 1167.907929]  [<ffffffff816c298d>] do_IRQ+0x4d/0xe0
  [ 1167.907931]  [<ffffffff816b7fad>] common_interrupt+0x6d/0x6d
  [ 1167.907931]  <EOI>
  [ 1167.907932]  [<ffffffff816b7616>] ? rt_spin_lock+0x16/0x30
  [ 1167.907934]  [<ffffffff810aaa55>] ? kthread_worker_fn+0xb5/0x170
  [ 1167.907935]  [<ffffffff810aa9a0>] ? flush_kthread_work+0x130/0x130
  [ 1167.907937]  [<ffffffff810aabdf>] kthread+0xcf/0xe0
  [ 1167.907938]  [<ffffffff810aab10>] ? kthread_worker_fn+0x170/0x170
  [ 1167.907940]  [<ffffffff816c0498>] ret_from_fork+0x58/0x90
  [ 1167.907941]  [<ffffffff810aab10>] ? kthread_worker_fn+0x170/0x170
  [ 1167.907951] Code: 90 e8 eb f0 ff ff e9 d4 fd ff ff 66 0f 1f 44 00 00 e8 db f0 ff ff eb b6 0f 0b 0f 1f 80 00 00 00 00 e8 0b f7 a3 ff e8 46 86 9c ff <0f> 0b 0f 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 65 4c 8b 3c
  [ 1167.907952] RIP  [<ffffffff816b62fa>] rt_spin_lock_slowlock+0x34a/0x350
  [ 1167.907952]  RSP <ffff880c3f403ad0>

To get it to work on RT just keep the prologue that clears the chip receive
interrupt and immediately return IRQ_WAKE_THREAD, deferring all packet
processing, with its locking, to the thread.

With this test systems are able to pass traffic over this hardware using a
CONFIG_PREEMPT_RT_FULL patched kernel without triggering these BUGs.

Cc: Clark Williams <williams@redhat.com>
Cc: Dean Luick <dean.luick@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Julia Cartwright <julia@ni.com>
Cc: Kaike Wan <kaike.wan@intel.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 drivers/infiniband/hw/hfi1/chip.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Sebastian Andrzej Siewior Oct. 5, 2017, 4:27 p.m. UTC | #1
On 2017-10-03 12:49:20 [-0300], Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> The hfi1 driver calls request_threaded_irq with two parameters:
> 
>       handler = receive_context_interrupt;
>       thread = receive_context_thread;
>       request_threaded_irq(me->msix.vector, handler, thread, 0, me->name, arg);
> 
> And tries to process packets on the hard irq one, receive_context_interrupt(),
> only waking up the thread (returning IRQ_WAKE_THREAD) when some threshold is
> crossed in the number of packets available in the NIC, trying to balance
> latency and bandwidth.
> 
> But in a CONFIG_PREEMPT_RT_FULL kernel it ends up calling spin locks from the
> hard irq handler (receive_context_interrupt) which causes BUGs like this:

If I am not mistaken current devel-tree of RT (and a few releases before
that) handle that case correctly and force-thread both threads.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dennis Dalessandro Oct. 10, 2017, 7:06 p.m. UTC | #2
On 10/3/2017 11:49 AM, Arnaldo Carvalho de Melo wrote:
> Cc: Clark Williams <williams@redhat.com>
> Cc: Dean Luick <dean.luick@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Julia Cartwright <julia@ni.com>
> Cc: Kaike Wan <kaike.wan@intel.com>
> Cc: Leon Romanovsky <leonro@mellanox.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
> Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Mike has been working with these developers off list and has provided a 
slightly different approach. I'm going to let them drive that before I 
add an RB/Ack here.

-Dennis
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 10, 2017, 7:15 p.m. UTC | #3
Em Tue, Oct 10, 2017 at 03:06:00PM -0400, Dennis Dalessandro escreveu:
> On 10/3/2017 11:49 AM, Arnaldo Carvalho de Melo wrote:
> > Cc: Clark Williams <williams@redhat.com>
> > Cc: Dean Luick <dean.luick@intel.com>
> > Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> > Cc: Doug Ledford <dledford@redhat.com>
> > Cc: Julia Cartwright <julia@ni.com>
> > Cc: Kaike Wan <kaike.wan@intel.com>
> > Cc: Leon Romanovsky <leonro@mellanox.com>
> > Cc: linux-rdma@vger.kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>
> > Cc: Sebastian Sanchez <sebastian.sanchez@intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> Mike has been working with these developers off list and has provided a
> slightly different approach. I'm going to let them drive that before I add
> an RB/Ack here.

This may not even be needed, i.e. Sebastian Siewior said that recent
rt patches already identify the way this driver uses threaded interrupts
and automagically does the right thing, Sebastian?

He mentioned this in this message:

https://marc.info/?l=linux-rt-users&m=150722087514307&w=2

But I replied to Mike and don't recall getting an answer to my last
question.

Meanwhile a kernel with these two patches survived the original bug
reported at Red Hat's bugzilla.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Oct. 11, 2017, 10:44 a.m. UTC | #4
On 2017-10-10 16:15:29 [-0300], Arnaldo Carvalho de Melo wrote:
> This may not even be needed, i.e. Sebastian Siewior said that recent
> rt patches already identify the way this driver uses threaded interrupts
> and automagically does the right thing, Sebastian?
> 
> He mentioned this in this message:
> 
> https://marc.info/?l=linux-rt-users&m=150722087514307&w=2
> 
> But I replied to Mike and don't recall getting an answer to my last
> question.

correct, since 4.1.7-rt8
  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.1.7-rt8-patches&id=7461758b9e982e4ea6280ce9308492e7cceda2ed

> - Arnaldo

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 11, 2017, 1:42 p.m. UTC | #5
Em Wed, Oct 11, 2017 at 12:44:56PM +0200, Sebastian Andrzej Siewior escreveu:
> On 2017-10-10 16:15:29 [-0300], Arnaldo Carvalho de Melo wrote:
> > This may not even be needed, i.e. Sebastian Siewior said that recent
> > rt patches already identify the way this driver uses threaded interrupts
> > and automagically does the right thing, Sebastian?
> > 
> > He mentioned this in this message:
> > 
> > https://marc.info/?l=linux-rt-users&m=150722087514307&w=2
> > 
> > But I replied to Mike and don't recall getting an answer to my last
> > question.
> 
> correct, since 4.1.7-rt8
>   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.1.7-rt8-patches&id=7461758b9e982e4ea6280ce9308492e7cceda2ed

Ok, I'll try backporting that change and will try without 2/2.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 11, 2017, 7:07 p.m. UTC | #6
Em Wed, Oct 11, 2017 at 12:44:56PM +0200, Sebastian Andrzej Siewior escreveu:
> On 2017-10-10 16:15:29 [-0300], Arnaldo Carvalho de Melo wrote:
> > This may not even be needed, i.e. Sebastian Siewior said that recent
> > rt patches already identify the way this driver uses threaded interrupts
> > and automagically does the right thing, Sebastian?
> > 
> > He mentioned this in this message:
> > 
> > https://marc.info/?l=linux-rt-users&m=150722087514307&w=2
> > 
> > But I replied to Mike and don't recall getting an answer to my last
> > question.
> 
> correct, since 4.1.7-rt8
>   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.1.7-rt8-patches&id=7461758b9e982e4ea6280ce9308492e7cceda2ed

Ouch, it seems I looked at the wrong place, as patches-4.11.12-rt14
doesn't have this one, right?

Off I go trying to make sense of this...

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnaldo Carvalho de Melo Oct. 11, 2017, 7:14 p.m. UTC | #7
Em Wed, Oct 11, 2017 at 04:07:45PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Oct 11, 2017 at 12:44:56PM +0200, Sebastian Andrzej Siewior escreveu:
> > On 2017-10-10 16:15:29 [-0300], Arnaldo Carvalho de Melo wrote:
> > > This may not even be needed, i.e. Sebastian Siewior said that recent
> > > rt patches already identify the way this driver uses threaded interrupts
> > > and automagically does the right thing, Sebastian?
> > > 
> > > He mentioned this in this message:
> > > 
> > > https://marc.info/?l=linux-rt-users&m=150722087514307&w=2
> > > 
> > > But I replied to Mike and don't recall getting an answer to my last
> > > question.
> > 
> > correct, since 4.1.7-rt8
> >   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.1.7-rt8-patches&id=7461758b9e982e4ea6280ce9308492e7cceda2ed
> 
> Ouch, it seems I looked at the wrong place, as patches-4.11.12-rt14
> doesn't have this one, right?
> 
> Off I go trying to make sense of this...

Ok, this was merged upstream on v4.4:

[acme@jouet linux]$ git log -1 --oneline 2a1d3ab8986d1
2a1d3ab8986d genirq: Handle force threading of irqs with primary and thread handler

nevermind :-)

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 121a4c920f1b..733a00d8ea4c 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8226,15 +8226,17 @@  static irqreturn_t receive_context_interrupt(int irq, void *data)
 {
 	struct hfi1_ctxtdata *rcd = data;
 	struct hfi1_devdata *dd = rcd->dd;
-	int disposition;
-	int present;
 
 	trace_hfi1_receive_interrupt(dd, rcd->ctxt);
 	this_cpu_inc(*dd->int_counter);
 	aspm_ctx_disable(rcd);
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+	return IRQ_WAKE_THREAD;
+#else
+{
 	/* receive interrupt remains blocked while processing packets */
-	disposition = rcd->do_interrupt(rcd, 0);
+	int disposition = rcd->do_interrupt(rcd, 0), present;
 
 	/*
 	 * Too many packets were seen while processing packets in this
@@ -8257,6 +8259,8 @@  static irqreturn_t receive_context_interrupt(int irq, void *data)
 
 	return IRQ_HANDLED;
 }
+#endif
+}
 
 /*
  * Receive packet thread handler.  This expects to be invoked with the