Message ID | 20221113010823.6436-4-guoqing.jiang@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | Misc changes for rtrs | expand |
On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > RTRS creates several connections per nr_cpu_ids, it makes more sense > to only close the path when it just allocated. > > Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> > --- > drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > index 2cc8b423bcaa..063082d29fc6 100644 > --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c > +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > u16 version, con_num, cid; > u16 recon_cnt; > int err = -ECONNRESET; > + bool alloc_path = false; > > if (len < sizeof(*msg)) { > pr_err("Invalid RTRS connection request\n"); > @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > pr_err("RTRS server session allocation failed: %d\n", err); > goto reject_w_err; > } > + alloc_path = true; > } > err = create_con(srv_path, cm_id, cid); > if (err) { > @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > > close_and_return_err: > mutex_unlock(&srv->paths_mutex); > - close_path(srv_path); > + if (alloc_path) > + close_path(srv_path); I think the way this is coded is, if there is a problem opening a connection in that srv_path, then it closes that srv_path gracefully, which results in all other connections in that path getting closed, and the IOs being failover'ed to the other path (in case of multipath), and then the client would trigger an auto reconnect. @Jinpu can shed some more light, what would be the preferred course of action if there is a failure in one connection. To keep the current design or to avoid closing the path in case of a connection issue. > > return err; > } > -- > 2.31.1 >
On 11/15/22 12:18 AM, Haris Iqbal wrote: > On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@linux.dev> wrote: >> RTRS creates several connections per nr_cpu_ids, it makes more sense >> to only close the path when it just allocated. >> >> Signed-off-by: Guoqing Jiang<guoqing.jiang@linux.dev> >> --- >> drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c >> index 2cc8b423bcaa..063082d29fc6 100644 >> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c >> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, >> u16 version, con_num, cid; >> u16 recon_cnt; >> int err = -ECONNRESET; >> + bool alloc_path = false; >> >> if (len < sizeof(*msg)) { >> pr_err("Invalid RTRS connection request\n"); >> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, >> pr_err("RTRS server session allocation failed: %d\n", err); >> goto reject_w_err; >> } >> + alloc_path = true; >> } >> err = create_con(srv_path, cm_id, cid); >> if (err) { >> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, >> >> close_and_return_err: >> mutex_unlock(&srv->paths_mutex); >> - close_path(srv_path); >> + if (alloc_path) >> + close_path(srv_path); > I think the way this is coded is, if there is a problem opening a > connection in that srv_path, then it closes that srv_path gracefully, > which results in all other connections in that path getting closed, > and the IOs being failover'ed to the other path (in case of > multipath), and then the client would trigger an auto reconnect. Could it possible that IO is happening in the previous connection, then a later failed connection try to close all the connections. > @Jinpu can shed some more light, what would be the preferred course of > action if there is a failure in one connection. To keep the current > design or to avoid closing the path in case of a connection issue. Frankly, I am less confident about this patch and seems it is a rare condition. Will just drop it for now. Thanks, Guoqing
On Tue, Nov 15, 2022 at 11:24 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > > > On 11/15/22 12:18 AM, Haris Iqbal wrote: > > On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@linux.dev> wrote: > >> RTRS creates several connections per nr_cpu_ids, it makes more sense > >> to only close the path when it just allocated. > >> > >> Signed-off-by: Guoqing Jiang<guoqing.jiang@linux.dev> > >> --- > >> drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++- > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> index 2cc8b423bcaa..063082d29fc6 100644 > >> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> u16 version, con_num, cid; > >> u16 recon_cnt; > >> int err = -ECONNRESET; > >> + bool alloc_path = false; > >> > >> if (len < sizeof(*msg)) { > >> pr_err("Invalid RTRS connection request\n"); > >> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> pr_err("RTRS server session allocation failed: %d\n", err); > >> goto reject_w_err; > >> } > >> + alloc_path = true; > >> } > >> err = create_con(srv_path, cm_id, cid); > >> if (err) { > >> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> > >> close_and_return_err: > >> mutex_unlock(&srv->paths_mutex); > >> - close_path(srv_path); > >> + if (alloc_path) > >> + close_path(srv_path); > > I think the way this is coded is, if there is a problem opening a > > connection in that srv_path, then it closes that srv_path gracefully, > > which results in all other connections in that path getting closed, > > and the IOs being failover'ed to the other path (in case of > > multipath), and then the client would trigger an auto reconnect. > > Could it possible that IO is happening in the previous connection, then > a later > failed connection try to close all the connections. Yes, the idea is if any failure during rdma_connect, we close the full path and expecting the other healthy path to take over. > > > @Jinpu can shed some more light, what would be the preferred course of > > action if there is a failure in one connection. To keep the current > > design or to avoid closing the path in case of a connection issue. > > Frankly, I am less confident about this patch and seems it is a rare > condition. > Will just drop it for now. > > Thanks, > Guoqing
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c index 2cc8b423bcaa..063082d29fc6 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, u16 version, con_num, cid; u16 recon_cnt; int err = -ECONNRESET; + bool alloc_path = false; if (len < sizeof(*msg)) { pr_err("Invalid RTRS connection request\n"); @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, pr_err("RTRS server session allocation failed: %d\n", err); goto reject_w_err; } + alloc_path = true; } err = create_con(srv_path, cm_id, cid); if (err) { @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, close_and_return_err: mutex_unlock(&srv->paths_mutex); - close_path(srv_path); + if (alloc_path) + close_path(srv_path); return err; }
RTRS creates several connections per nr_cpu_ids, it makes more sense to only close the path when it just allocated. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> --- drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)