Linux kernel v4.15-rc4 and rdma_rxe

Message ID 1515014747.2582.46.camel@wdc.com (mailing list archive)
State Not Applicable

Commit Message

Bart Van Assche Jan. 3, 2018, 9:25 p.m. UTC
On Wed, 2018-01-03 at 07:13 +0200, Moni Shoua wrote:
> > Does this perhaps mean that the rxe_qp structure can be freed while rxe_do_task()
> > is in progress? Please note that the ib_srpt driver only destroys a QP
> > (srpt_destroy_ch_ib() call in srpt_release_channel_work()) after all SCSI command
> > processing has finished (transport_deregister_session()).
>
> If I understand right you say that the system is hung when trying to
> take a lock in rxe_do_task() (line 89). Is that right?
> Anyway, it's possible that you hit a bug related to destroying a QP.


Hello Moni,

The issues I had reported may be unrelated. BTW, this is what I saw appearing
in the system log a few minutes ago:

Jan  3 13:03:56 ubuntu-vm kernel: ib_srpt:srpt_close_ch: ib_srpt 192.168.122.76-18: queued zerolength write
Jan  3 13:03:56 ubuntu-vm kernel: rdma_rxe:rxe_completer: rdma_rxe: rxe_completer(): qp valid 1, state ERROR
[ ... ]
Jan  3 13:04:09 ubuntu-vm kernel: ib_srpt:srpt_disconnect_ch_sync: ib_srpt ch 192.168.122.76-18 state 3
[ ... ]
Jan  3 13:04:14 ubuntu-vm kernel: ib_srpt srpt_disconnect_ch_sync(192.168.122.76-18 state 3): still waiting ...

In other words, the ib_srpt driver had queued a zero-length write and changed
the QP state to ERROR, but no completion was ever queued for that zero-length
write, so srpt_disconnect_ch_sync() kept waiting.
The rdma_rxe log message was generated by the following code:


Bart.
Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 6cdc40ed8a9f..f6c40edbddc6 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -550,6 +550,9 @@  int rxe_completer(void *arg)
 
 	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
 	    qp->req.state == QP_STATE_RESET) {
+		pr_debug("rxe_completer(): qp valid %d, state %s\n",
+			 qp->valid, qp->req.state == QP_STATE_ERROR ? "ERROR" :
+			 qp->req.state == QP_STATE_RESET ? "RESET" : "(?)");
 		rxe_drain_resp_pkts(qp, qp->valid &&
 				    qp->req.state == QP_STATE_ERROR);
 		goto exit;
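
To check which text the added pr_debug() can produce without building a
kernel, the state-to-string mapping can be reproduced in user space. The
following is only a minimal sketch: the enum is a stand-in and not the
kernel's definition, and early_exit_state_name() and format_completer_msg()
are hypothetical helper names that do not exist in rxe. It formats the
message body only; the "rdma_rxe:rxe_completer: rdma_rxe: " part seen in the
log above is added by the kernel logging code and is not modeled here.

```c
#include <stdio.h>

/* Stand-in for the rxe QP state enum; names and order are illustrative only. */
enum qp_state {
	QP_STATE_RESET,
	QP_STATE_INIT,
	QP_STATE_READY,
	QP_STATE_ERROR,
};

/* Same nested ternary as in the pr_debug() arguments of the patch. */
static const char *early_exit_state_name(enum qp_state state)
{
	return state == QP_STATE_ERROR ? "ERROR" :
	       state == QP_STATE_RESET ? "RESET" : "(?)";
}

/* Format the message body with the same format string as the patch. */
static int format_completer_msg(char *buf, size_t len, int valid,
				enum qp_state state)
{
	return snprintf(buf, len, "rxe_completer(): qp valid %d, state %s\n",
			valid, early_exit_state_name(state));
}
```

Calling format_completer_msg(buf, sizeof(buf), 1, QP_STATE_ERROR) yields the
"qp valid 1, state ERROR" text from the log. The "(?)" branch can only be
reached through the !qp->valid half of the condition, that is, for a QP that
is no longer valid but whose requester state is neither ERROR nor RESET.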