
[PATCHv2,for-next,02/19] RDMA/rtrs-srv: Occasionally flush ongoing session closing

Message ID 20201217141915.56989-3-jinpu.wang@cloud.ionos.com (mailing list archive)
State Rejected
Delegated to: Jason Gunthorpe
Series Misc update for rtrs

Commit Message

Jinpu Wang Dec. 17, 2020, 2:18 p.m. UTC
If there are many establishments/teardowns, we need to make sure
we do not consume too much system memory. Thus let ongoing
session closing finish before accepting a new connection.

In cma_ib_req_handler, the conn_id is newly created and its
handler_mutex is held when this function is called, and
flush_workqueue waits for close_work to finish. In close_work,
rdma_destroy_id is called, which also takes a handler_mutex, but
these are mutexes of different rdma_cm_ids.

To avoid circular-locking lockdep warnings, disable lockdep around
flush_workqueue in rtrs_rdma_connect; it is a false positive.

Inspired by commit 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
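
For readers unfamiliar with the lockdep_off()/lockdep_on() pair the patch
below relies on: they disable and re-enable lockdep's dependency tracking
for the current task, so lock acquisitions made in between are not
recorded. A generic illustration of the pattern (not taken from the rtrs
code) follows:

#include <linux/lockdep.h>
#include <linux/workqueue.h>

/*
 * Generic illustration (not rtrs code): silence a known false-positive
 * dependency that lockdep would otherwise record while waiting for a
 * workqueue to drain.  Note this also hides any real deadlock in the
 * covered region, which is exactly the concern raised in the review below.
 */
static void flush_wq_without_lockdep(struct workqueue_struct *wq)
{
	lockdep_off();		/* stop recording lock dependencies for this task */
	flush_workqueue(wq);	/* wait for all queued work items to finish */
	lockdep_on();		/* resume normal lockdep tracking */
}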

Comments

Jason Gunthorpe Jan. 12, 2021, 7:13 p.m. UTC | #1
On Thu, Dec 17, 2020 at 03:18:58PM +0100, Jack Wang wrote:
> If there are many establishments/teardowns, we need to make sure
> we do not consume too much system memory. Thus let ongoing
> session closing finish before accepting a new connection.

Then just limit it, why this scheme?

> In cma_ib_req_handler, the conn_id is newly created and its
> handler_mutex is held when this function is called, and
> flush_workqueue waits for close_work to finish. In close_work,
> rdma_destroy_id is called, which also takes a handler_mutex, but
> these are mutexes of different rdma_cm_ids.

No, there are multiple handler locks held here, and the new one is
already marked nested, so it isn't even the thing triggering lockdep.

The locking for CM is already bonkers; I don't want to see drivers
turning off lockdep. How are you sure that work queue doesn't
(and won't ever in the future) become entangled with the listening
handler_mutex?

Jason
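
For context on the "multiple handler locks" and nested marking mentioned
above, the locking shape in cma_ib_req_handler is roughly as in the sketch
below; this is a paraphrase with stand-in types, not a copy of
drivers/infiniband/core/cma.c.

#include <linux/mutex.h>

/* Stand-in for the handler_mutex embedded in the CM's private id struct. */
struct demo_cm_id {
	struct mutex handler_mutex;
};

/*
 * Rough paraphrase of the locking shape in cma_ib_req_handler: the
 * listener's handler_mutex is held, the new conn_id's handler_mutex is
 * taken with a nested annotation, and only then is the ULP's event
 * handler (here, rtrs_rdma_connect) invoked.  If close_work ever needed
 * either of these mutexes, flushing the workqueue from the handler would
 * be a real deadlock, not just a lockdep false positive.
 */
static void req_handler_locking_shape(struct demo_cm_id *listen_id,
				      struct demo_cm_id *conn_id)
{
	mutex_lock(&listen_id->handler_mutex);
	mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING);
	/* ... ULP event handler runs here ... */
	mutex_unlock(&conn_id->handler_mutex);
	mutex_unlock(&listen_id->handler_mutex);
}
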
Jinpu Wang Jan. 13, 2021, 8:55 a.m. UTC | #2
Hi Jason,
On Tue, Jan 12, 2021 at 8:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Dec 17, 2020 at 03:18:58PM +0100, Jack Wang wrote:
> > If there are many establishments/teardowns, we need to make sure
> > we do not consume too much system memory. Thus let ongoing
> > session closing finish before accepting a new connection.
>
> Then just limit it, why this scheme?
Will think about it, thanks for the suggestion.

>
> > In cma_ib_req_handler, the conn_id is newly created and its
> > handler_mutex is held when this function is called, and
> > flush_workqueue waits for close_work to finish. In close_work,
> > rdma_destroy_id is called, which also takes a handler_mutex, but
> > these are mutexes of different rdma_cm_ids.
>
> No, there are multiple handler locks held here, and the new one is
> already marked nested, so it isn't even the thing triggering lockdep.
>
> The locking for CM is already bonkers; I don't want to see drivers
> turning off lockdep. How are you sure that work queue doesn't
> (and won't ever in the future) become entangled with the listening
> handler_mutex?
IIUC, only the newly created conn_id is passed on and saved in the ULP,
but I understand your concerns.
I will drop this patch and think about another solution. Do you need a
resend/rebase for the rest of the patchset?
>
> Jason

Thanks!
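
One hypothetical reading of the "just limit it" suggestion discussed above,
sketched with made-up names (inflight_closes, MAX_INFLIGHT_CLOSES and the
helpers below are not part of rtrs-srv): bound the number of sessions in
teardown and reject new connect requests beyond that bound, instead of
flushing the workqueue from the connect path.

#include <linux/atomic.h>

/* Illustrative only; these names do not exist in rtrs-srv. */
#define MAX_INFLIGHT_CLOSES	128

static atomic_t inflight_closes = ATOMIC_INIT(0);

/* Called where close_work is queued and where it completes. */
static void note_close_queued(void)	{ atomic_inc(&inflight_closes); }
static void note_close_done(void)	{ atomic_dec(&inflight_closes); }

/*
 * Called from the connect path: if too many teardowns are still pending,
 * reject the request instead of blocking in the CM handler, which keeps
 * memory bounded without flushing a workqueue under handler_mutex.
 */
static bool too_many_inflight_closes(void)
{
	return atomic_read(&inflight_closes) >= MAX_INFLIGHT_CLOSES;
}
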

Patch

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index ed4628f032bb..eb7bb93884e7 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1791,6 +1791,20 @@  static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
 		err = -ENOMEM;
 		goto reject_w_err;
 	}
+	if (!cid) {
+		/* Let inflight session teardown complete.
+		 *
+		 * Lockdep is disabled around flush_workqueue to avoid a
+		 * false-positive circular-locking warning: cma_ib_req_handler
+		 * holds the handler_mutex of the newly created conn_id while
+		 * calling this function, and flush_workqueue waits for
+		 * close_work, where rdma_destroy_id takes the handler_mutex
+		 * of a different rdma_cm_id.
+		 */
+		lockdep_off();
+		flush_workqueue(rtrs_wq);
+		lockdep_on();
+	}
 	mutex_lock(&srv->paths_mutex);
 	sess = __find_sess(srv, &msg->sess_uuid);
 	if (sess) {