
possible isert bug in tear down sequence

Message ID 77fdee90-b51c-1f38-16ea-0183a223f06c@grimberg.me (mailing list archive)
State Not Applicable

Commit Message

Sagi Grimberg Nov. 26, 2017, 12:18 p.m. UTC
Hey Ram,

> Let me add a third possibility, that is what we are hitting:
> I see that isert uses isert_cma_handler() and in the following cases
> drain won't be invoked:
>          case RDMA_CM_EVENT_REJECTED:       /* FALLTHRU */
>                  isert_info("Connection rejected: %s\n",
>                             rdma_reject_msg(cma_id, event->status));
>          case RDMA_CM_EVENT_UNREACHABLE:    /* FALLTHRU */
>          case RDMA_CM_EVENT_CONNECT_ERROR:
>                  ret = isert_connect_error(cma_id);
>                  break;
> 
> Specifically, I hit the rejected case. See dmesg below with added prints (rrr...).
> We Are using
> 
> [ 2455.241978] rrr created QP ffff880e984d6c00
> [ 2455.241982] isert: isert_login_post_recv: Setup sge: addr: eb19e4000 length: 8268 0x00000000
> [ 2455.241987] rrr post_recv qp=ffff880e984d6c00, wr_id=ffff880eb19e6064
> [ 2455.242108] isert: isert_cma_handler: rejected (8): status 10 id ffff880eb1f9b000 np ffff8810454d2c40
> [ 2455.242114] isert: isert_cma_handler: Connection rejected: stale conn
> [ 2455.242121] isert: isert_release_kref: conn ffff880eb19e2000 final kref kworker/7:2/6058
> [ 2455.242127] isert: isert_connect_release: conn ffff880eb19e2000
> [ 2455.242156] rrr poll_recv qp=ffff880e984d6c00 RDMA_CQE_RESP_STS_WORK_REQUEST_FLUSHED_ERR, wr_id=ffff880eb19e6064
> [ 2455.242157] rrr destroyed QP ffff880e984d6c00
> [ 2455.242164] Modules linked in: netconsole target_core_user target_core_pscsi target_core_file target_core_iblock
> [ 2455.242183] BUG: unable to handle kernel
> [ 2455.242202]  [<ffffffffa0823813>] isert_login_recv_done+0x23/0x160 [ib_isert]
> 
> A QP gets created, post_recv is invoked, and poll_cq as well (flushed); the QP is then destroyed, and afterwards the workqueue tries to dereference the QP...
> 
> I'm checking why the connection got stale, but anyway I think ib_drain_qp() should be invoked.

100% correct :)

> What do you think?

Does this fix your issue:
--
--

Comments

Amrani, Ram Nov. 26, 2017, 12:23 p.m. UTC | #1
> Does this fix your issue:
> --
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index ceabdb85df8b..9d4785ba24cb 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id)
>   {
>          struct isert_conn *isert_conn = cma_id->qp->qp_context;
>
> +       ib_drain_qp(isert_conn->qp);
>          list_del_init(&isert_conn->node);
>          isert_conn->cm_id = NULL;
>          isert_put_conn(isert_conn);
> --

Yes, that is exactly what I meant and wanted to check with you.
As I'm debugging remotely, it will take me a few days to confirm it.

Thanks,
Ram

Patch

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index ceabdb85df8b..9d4785ba24cb 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id)
  {
         struct isert_conn *isert_conn = cma_id->qp->qp_context;

+       ib_drain_qp(isert_conn->qp);
         list_del_init(&isert_conn->node);
         isert_conn->cm_id = NULL;
         isert_put_conn(isert_conn);
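
The ordering issue discussed in this thread can be illustrated with a small userspace model. This is only a sketch and is not kernel code: every toy_* name below is hypothetical, and toy_drain_qp() merely imitates the documented effect of ib_drain_qp() (waiting until outstanding work requests have completed) by running the pending completions synchronously. The point it tries to show is that a completion handler dereferences the connection, so completions have to be flushed before the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the connection and QP; not the kernel structures. */
struct toy_conn {
    int login_recvs_done;   /* completions delivered to the handler */
};

struct toy_qp {
    struct toy_conn *ctx;   /* plays the role of qp->qp_context */
    int pending;            /* posted receives whose handler has not run */
};

/* Post one receive: its completion handler will run at some later point. */
static void toy_post_recv(struct toy_qp *qp)
{
    qp->pending++;
}

/* Completion handler: it touches the connection, so the connection
 * must still be alive when this runs. */
static void toy_recv_done(struct toy_qp *qp)
{
    assert(qp->ctx != NULL);
    qp->ctx->login_recvs_done++;
}

/* Model of a drain: run every outstanding completion before returning. */
static void toy_drain_qp(struct toy_qp *qp)
{
    while (qp->pending > 0) {
        qp->pending--;
        toy_recv_done(qp);
    }
}

/* Model of the error path. Returns how many completions were still
 * pending when the connection was freed; a nonzero value stands for
 * handlers that would later dereference freed memory. */
static int toy_connect_error(struct toy_qp *qp, bool drain_first)
{
    int stranded;

    if (drain_first)
        toy_drain_qp(qp);

    stranded = qp->pending;
    free(qp->ctx);          /* stands in for the final put of the conn */
    qp->ctx = NULL;
    return stranded;
}
```

In this model, posting one receive and then calling toy_connect_error(&qp, false) leaves one completion stranded after the free, which corresponds to the rejected-connection trace above. Calling it with drain_first set to true leaves none, which is the ordering the one-line patch aims for.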