diff mbox series

[RFC,03/12] RDMA/rtrs-srv: Only close srv_path if it is just allocated

Message ID 20221113010823.6436-4-guoqing.jiang@linux.dev (mailing list archive)
State Changes Requested
Headers show
Series Misc changes for rtrs | expand

Commit Message

Guoqing Jiang Nov. 13, 2022, 1:08 a.m. UTC
RTRS creates several connections per nr_cpu_ids, it makes more sense
to only close the path when it just allocated.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Haris Iqbal Nov. 14, 2022, 4:18 p.m. UTC | #1
On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> RTRS creates several connections per nr_cpu_ids, it makes more sense
> to only close the path when it just allocated.
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>  drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> index 2cc8b423bcaa..063082d29fc6 100644
> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>         u16 version, con_num, cid;
>         u16 recon_cnt;
>         int err = -ECONNRESET;
> +       bool alloc_path = false;
>
>         if (len < sizeof(*msg)) {
>                 pr_err("Invalid RTRS connection request\n");
> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>                         pr_err("RTRS server session allocation failed: %d\n", err);
>                         goto reject_w_err;
>                 }
> +               alloc_path = true;
>         }
>         err = create_con(srv_path, cm_id, cid);
>         if (err) {
> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>
>  close_and_return_err:
>         mutex_unlock(&srv->paths_mutex);
> -       close_path(srv_path);
> +       if (alloc_path)
> +               close_path(srv_path);

I think the way this is coded is, if there is a problem opening a
connection in that srv_path, then it closes that srv_path gracefully,
which results in all other connections in that path getting closed,
and the IOs being failover'ed to the other path (in case of
multipath), and then the client would trigger an auto reconnect.

@Jinpu can shed some more light, what would be the preferred course of
action if there is a failure in one connection. To keep the current
design or to avoid closing the path in case of a connection issue.

>
>         return err;
>  }
> --
> 2.31.1
>
Guoqing Jiang Nov. 15, 2022, 10:24 a.m. UTC | #2
On 11/15/22 12:18 AM, Haris Iqbal wrote:
> On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@linux.dev>  wrote:
>> RTRS creates several connections per nr_cpu_ids, it makes more sense
>> to only close the path when it just allocated.
>>
>> Signed-off-by: Guoqing Jiang<guoqing.jiang@linux.dev>
>> ---
>>   drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
>> index 2cc8b423bcaa..063082d29fc6 100644
>> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
>> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>>          u16 version, con_num, cid;
>>          u16 recon_cnt;
>>          int err = -ECONNRESET;
>> +       bool alloc_path = false;
>>
>>          if (len < sizeof(*msg)) {
>>                  pr_err("Invalid RTRS connection request\n");
>> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>>                          pr_err("RTRS server session allocation failed: %d\n", err);
>>                          goto reject_w_err;
>>                  }
>> +               alloc_path = true;
>>          }
>>          err = create_con(srv_path, cm_id, cid);
>>          if (err) {
>> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
>>
>>   close_and_return_err:
>>          mutex_unlock(&srv->paths_mutex);
>> -       close_path(srv_path);
>> +       if (alloc_path)
>> +               close_path(srv_path);
> I think the way this is coded is, if there is a problem opening a
> connection in that srv_path, then it closes that srv_path gracefully,
> which results in all other connections in that path getting closed,
> and the IOs being failover'ed to the other path (in case of
> multipath), and then the client would trigger an auto reconnect.

Could it possible that IO is happening in the previous connection, then 
a later
failed connection try to close all the connections.

> @Jinpu can shed some more light, what would be the preferred course of
> action if there is a failure in one connection. To keep the current
> design or to avoid closing the path in case of a connection issue.

Frankly, I am less confident about this patch and seems it is a rare 
condition.
Will just drop it for now.

Thanks,
Guoqing
Jinpu Wang Nov. 15, 2022, 10:31 a.m. UTC | #3
On Tue, Nov 15, 2022 at 11:24 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 11/15/22 12:18 AM, Haris Iqbal wrote:
> > On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@linux.dev>  wrote:
> >> RTRS creates several connections per nr_cpu_ids, it makes more sense
> >> to only close the path when it just allocated.
> >>
> >> Signed-off-by: Guoqing Jiang<guoqing.jiang@linux.dev>
> >> ---
> >>   drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> index 2cc8b423bcaa..063082d29fc6 100644
> >> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>          u16 version, con_num, cid;
> >>          u16 recon_cnt;
> >>          int err = -ECONNRESET;
> >> +       bool alloc_path = false;
> >>
> >>          if (len < sizeof(*msg)) {
> >>                  pr_err("Invalid RTRS connection request\n");
> >> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>                          pr_err("RTRS server session allocation failed: %d\n", err);
> >>                          goto reject_w_err;
> >>                  }
> >> +               alloc_path = true;
> >>          }
> >>          err = create_con(srv_path, cm_id, cid);
> >>          if (err) {
> >> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>
> >>   close_and_return_err:
> >>          mutex_unlock(&srv->paths_mutex);
> >> -       close_path(srv_path);
> >> +       if (alloc_path)
> >> +               close_path(srv_path);
> > I think the way this is coded is, if there is a problem opening a
> > connection in that srv_path, then it closes that srv_path gracefully,
> > which results in all other connections in that path getting closed,
> > and the IOs being failover'ed to the other path (in case of
> > multipath), and then the client would trigger an auto reconnect.
>
> Could it possible that IO is happening in the previous connection, then
> a later
> failed connection try to close all the connections.

Yes, the idea is if any failure during rdma_connect, we close the full
path and expecting the other healthy path
to take over.
>
> > @Jinpu can shed some more light, what would be the preferred course of
> > action if there is a failure in one connection. To keep the current
> > design or to avoid closing the path in case of a connection issue.
>
> Frankly, I am less confident about this patch and seems it is a rare
> condition.
> Will just drop it for now.
>
> Thanks,
> Guoqing
diff mbox series

Patch

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 2cc8b423bcaa..063082d29fc6 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -1833,6 +1833,7 @@  static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
 	u16 version, con_num, cid;
 	u16 recon_cnt;
 	int err = -ECONNRESET;
+	bool alloc_path = false;
 
 	if (len < sizeof(*msg)) {
 		pr_err("Invalid RTRS connection request\n");
@@ -1906,6 +1907,7 @@  static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
 			pr_err("RTRS server session allocation failed: %d\n", err);
 			goto reject_w_err;
 		}
+		alloc_path = true;
 	}
 	err = create_con(srv_path, cm_id, cid);
 	if (err) {
@@ -1940,7 +1942,8 @@  static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
 
 close_and_return_err:
 	mutex_unlock(&srv->paths_mutex);
-	close_path(srv_path);
+	if (alloc_path)
+		close_path(srv_path);
 
 	return err;
 }