diff mbox series

[v2] RDMA/siw: Fix connection failure handling

Message ID 20230905145822.446263-1-bmt@zurich.ibm.com (mailing list archive)
State Accepted
Headers show
Series [v2] RDMA/siw: Fix connection failure handling | expand

Commit Message

Bernard Metzler Sept. 5, 2023, 2:58 p.m. UTC
In case immediate MPA request processing fails, the newly
created endpoint unlinks the listening endpoint and is
ready to be dropped. This special case was not handled
correctly by the code handling the later TCP socket close,
causing a NULL dereference crash in siw_cm_work_handler()
when dereferencing a NULL listener. We now also cancel
the useless MPA timeout, if immediate MPA request
processing fails.

This patch furthermore simplifies MPA processing in general:
Scheduling a useless TCP socket read in sk_data_ready() upcall
is now surpressed, if the socket is already moved out of
TCP_ESTABLISHED state.

Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
ChangeLog v1->v2:
- Move debug message to now conditional listener drop
---
 drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Guoqing Jiang Sept. 8, 2023, 1:34 a.m. UTC | #1
Hi,

On 9/5/23 22:58, Bernard Metzler wrote:
> In case immediate MPA request processing fails, the newly
> created endpoint unlinks the listening endpoint and is
> ready to be dropped. This special case was not handled
> correctly by the code handling the later TCP socket close,
> causing a NULL dereference crash in siw_cm_work_handler()
> when dereferencing a NULL listener. We now also cancel
> the useless MPA timeout, if immediate MPA request
> processing fails.
>
> This patch furthermore simplifies MPA processing in general:
> Scheduling a useless TCP socket read in sk_data_ready() upcall
> is now surpressed, if the socket is already moved out of
> TCP_ESTABLISHED state.
>
> Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
> Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com>
> ---
> ChangeLog v1->v2:
> - Move debug message to now conditional listener drop
> ---
>   drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
> index a2605178f4ed..43e776073f49 100644
> --- a/drivers/infiniband/sw/siw/siw_cm.c
> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep)
>   			siw_cep_put(cep);
>   			new_cep->listen_cep = NULL;
>   			if (rv) {
> +				siw_cancel_mpatimer(new_cep);

One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead 
of call
siw_cancel_mpatimer like above? Could it be memory leak issue for 
cep->mpa_timer?
Let's say mpa_timer is set to NULL before  call siw_cancel_mpatimer.

>   				siw_cep_set_free(new_cep);
>   				goto error;
>   			}
> @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct *w)
>   				/*
>   				 * Socket close before MPA request received.
>   				 */
> -				siw_dbg_cep(cep, "no mpareq: drop listener\n");
> -				siw_cep_put(cep->listen_cep);
> -				cep->listen_cep = NULL;
> +				if (cep->listen_cep) {
> +					siw_dbg_cep(cep,
> +						"no mpareq: drop listener\n");
> +					siw_cep_put(cep->listen_cep);
> +					cep->listen_cep = NULL;
> +				}
>   			}
>   		}
>   		release_cep = 1;
> @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk)
>   	if (!cep)
>   		goto out;
>   
> -	siw_dbg_cep(cep, "state: %d\n", cep->state);
> +	siw_dbg_cep(cep, "cep state: %d, socket state %d\n",
> +		    cep->state, sk->sk_state);
> +
> +	if (sk->sk_state != TCP_ESTABLISHED)
> +		goto out;
>   

Maybe split above to another patch makes more sense, just my $0.02.

Thanks,
Guoqing
Bernard Metzler Sept. 8, 2023, 10:55 a.m. UTC | #2
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Friday, 8 September 2023 03:35
> To: Bernard Metzler <BMT@zurich.ibm.com>; linux-rdma@vger.kernel.org
> Cc: jgg@ziepe.ca; leon@kernel.org
> Subject: [EXTERNAL] Re: [PATCH v2] RDMA/siw: Fix connection failure
> handling
> 
> Hi,
> 
> On 9/5/23 22:58, Bernard Metzler wrote:
> > In case immediate MPA request processing fails, the newly
> > created endpoint unlinks the listening endpoint and is
> > ready to be dropped. This special case was not handled
> > correctly by the code handling the later TCP socket close,
> > causing a NULL dereference crash in siw_cm_work_handler()
> > when dereferencing a NULL listener. We now also cancel
> > the useless MPA timeout, if immediate MPA request
> > processing fails.
> >
> > This patch furthermore simplifies MPA processing in general:
> > Scheduling a useless TCP socket read in sk_data_ready() upcall
> > is now surpressed, if the socket is already moved out of
> > TCP_ESTABLISHED state.
> >
> > Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
> > Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com>
> > ---
> > ChangeLog v1->v2:
> > - Move debug message to now conditional listener drop
> > ---
> >   drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++----
> >   1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> b/drivers/infiniband/sw/siw/siw_cm.c
> > index a2605178f4ed..43e776073f49 100644
> > --- a/drivers/infiniband/sw/siw/siw_cm.c
> > +++ b/drivers/infiniband/sw/siw/siw_cm.c
> > @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep)
> >   			siw_cep_put(cep);
> >   			new_cep->listen_cep = NULL;
> >   			if (rv) {
> > +				siw_cancel_mpatimer(new_cep);
> 
> One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead
> of call
> siw_cancel_mpatimer like above? Could it be memory leak issue for
> cep->mpa_timer?

Thanks for reviewing.

Above in the proposed patch, we cancel a pending
MPA timer.

You are referring to siw_cm.c:1115

        case SIW_CM_WORK_MPATIMEOUT:
                cep->mpa_timer = NULL;

We explicitly set it NULL here since we are handling
the timeout, so the MPA timer has fired and we
cannot cancel it anymore. At the end of
siw_cm_work_handler(), any work, including the MPA
timeout, gets recycled to the cep's list of free work
items via siw_put_work(). This list is recycled only at
the end of the cep's lifetime.


> Let's say mpa_timer is set to NULL before  call siw_cancel_mpatimer.
> 
> >   				siw_cep_set_free(new_cep);
> >   				goto error;
> >   			}
> > @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct
> *w)
> >   				/*
> >   				 * Socket close before MPA request received.
> >   				 */
> > -				siw_dbg_cep(cep, "no mpareq: drop listener\n");
> > -				siw_cep_put(cep->listen_cep);
> > -				cep->listen_cep = NULL;
> > +				if (cep->listen_cep) {
> > +					siw_dbg_cep(cep,
> > +						"no mpareq: drop listener\n");
> > +					siw_cep_put(cep->listen_cep);
> > +					cep->listen_cep = NULL;
> > +				}
> >   			}
> >   		}
> >   		release_cep = 1;
> > @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk)
> >   	if (!cep)
> >   		goto out;
> >
> > -	siw_dbg_cep(cep, "state: %d\n", cep->state);
> > +	siw_dbg_cep(cep, "cep state: %d, socket state %d\n",
> > +		    cep->state, sk->sk_state);
> > +
> > +	if (sk->sk_state != TCP_ESTABLISHED)
> > +		goto out;
> >
> 
> Maybe split above to another patch makes more sense, just my $0.02.
> 
I preferred to have it in one patch, since it was all triggered
by your patch b056327bee09
'RDMA/siw: Balance the reference of cep->kref in the error path'

Alarmed by that, I re-tested many error cases including a failing
MPA request processing, which triggered this NULL bug. It also
showed me this useless extra read event processing when the TCP
socket is closing while the endpoint is still in MPA handshake mode.

But you are right, it is not necessarily related. I can split it.


> Thanks,
> Guoqing
Guoqing Jiang Sept. 11, 2023, 1:52 a.m. UTC | #3
On 9/8/23 18:55, Bernard Metzler wrote:
>> -----Original Message-----
>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
>> Sent: Friday, 8 September 2023 03:35
>> To: Bernard Metzler <BMT@zurich.ibm.com>; linux-rdma@vger.kernel.org
>> Cc: jgg@ziepe.ca; leon@kernel.org
>> Subject: [EXTERNAL] Re: [PATCH v2] RDMA/siw: Fix connection failure
>> handling
>>
>> Hi,
>>
>> On 9/5/23 22:58, Bernard Metzler wrote:
>>> In case immediate MPA request processing fails, the newly
>>> created endpoint unlinks the listening endpoint and is
>>> ready to be dropped. This special case was not handled
>>> correctly by the code handling the later TCP socket close,
>>> causing a NULL dereference crash in siw_cm_work_handler()
>>> when dereferencing a NULL listener. We now also cancel
>>> the useless MPA timeout, if immediate MPA request
>>> processing fails.
>>>
>>> This patch furthermore simplifies MPA processing in general:
>>> Scheduling a useless TCP socket read in sk_data_ready() upcall
>>> is now surpressed, if the socket is already moved out of
>>> TCP_ESTABLISHED state.
>>>
>>> Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
>>> Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com>
>>> ---
>>> ChangeLog v1->v2:
>>> - Move debug message to now conditional listener drop
>>> ---
>>>    drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++----
>>>    1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>> b/drivers/infiniband/sw/siw/siw_cm.c
>>> index a2605178f4ed..43e776073f49 100644
>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>>> @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep)
>>>    			siw_cep_put(cep);
>>>    			new_cep->listen_cep = NULL;
>>>    			if (rv) {
>>> +				siw_cancel_mpatimer(new_cep);
>> One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead
>> of call
>> siw_cancel_mpatimer like above? Could it be memory leak issue for
>> cep->mpa_timer?
> Thanks for reviewing.
>
> Above in the proposed patch, we cancel a pending
> MPA timer.
>
> You are referring to siw_cm.c:1115
>
>          case SIW_CM_WORK_MPATIMEOUT:
>                  cep->mpa_timer = NULL;
>
> We explicitly set it NULL here since we are handling
> the timeout, so the MPA timer has fired and we
> cannot cancel it anymore. At the end of
> siw_cm_work_handler(), any work, including the MPA
> timeout, gets recycled to the cep's list of free work
> items via siw_put_work(). This list is recycled only at
> the end of the cep's lifetime.

Got it, and thanks for detailed explanation!

>> Let's say mpa_timer is set to NULL before  call siw_cancel_mpatimer.
>>
>>>    				siw_cep_set_free(new_cep);
>>>    				goto error;
>>>    			}
>>> @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct
>> *w)
>>>    				/*
>>>    				 * Socket close before MPA request received.
>>>    				 */
>>> -				siw_dbg_cep(cep, "no mpareq: drop listener\n");
>>> -				siw_cep_put(cep->listen_cep);
>>> -				cep->listen_cep = NULL;
>>> +				if (cep->listen_cep) {
>>> +					siw_dbg_cep(cep,
>>> +						"no mpareq: drop listener\n");
>>> +					siw_cep_put(cep->listen_cep);
>>> +					cep->listen_cep = NULL;
>>> +				}
>>>    			}
>>>    		}
>>>    		release_cep = 1;
>>> @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk)
>>>    	if (!cep)
>>>    		goto out;
>>>
>>> -	siw_dbg_cep(cep, "state: %d\n", cep->state);
>>> +	siw_dbg_cep(cep, "cep state: %d, socket state %d\n",
>>> +		    cep->state, sk->sk_state);
>>> +
>>> +	if (sk->sk_state != TCP_ESTABLISHED)
>>> +		goto out;
>>>
>> Maybe split above to another patch makes more sense, just my $0.02.
>>
> I preferred to have it in one patch, since it was all triggered
> by your patch b056327bee09
> 'RDMA/siw: Balance the reference of cep->kref in the error path'
>
> Alarmed by that, I re-tested many error cases including a failing
> MPA request processing, which triggered this NULL bug. It also
> showed me this useless extra read event processing when the TCP
> socket is closing while the endpoint is still in MPA handshake mode.

Good to know the background :).

> But you are right, it is not necessarily related. I can split it.

With or without the split, Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>

Thanks,
Guoqing
Leon Romanovsky Sept. 11, 2023, 11:56 a.m. UTC | #4
On Tue, 05 Sep 2023 16:58:22 +0200, Bernard Metzler wrote:
> In case immediate MPA request processing fails, the newly
> created endpoint unlinks the listening endpoint and is
> ready to be dropped. This special case was not handled
> correctly by the code handling the later TCP socket close,
> causing a NULL dereference crash in siw_cm_work_handler()
> when dereferencing a NULL listener. We now also cancel
> the useless MPA timeout, if immediate MPA request
> processing fails.
> 
> [...]

Applied, thanks!

[1/1] RDMA/siw: Fix connection failure handling
      https://git.kernel.org/rdma/rdma/c/53a3f777049771

Best regards,
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index a2605178f4ed..43e776073f49 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -976,6 +976,7 @@  static void siw_accept_newconn(struct siw_cep *cep)
 			siw_cep_put(cep);
 			new_cep->listen_cep = NULL;
 			if (rv) {
+				siw_cancel_mpatimer(new_cep);
 				siw_cep_set_free(new_cep);
 				goto error;
 			}
@@ -1100,9 +1101,12 @@  static void siw_cm_work_handler(struct work_struct *w)
 				/*
 				 * Socket close before MPA request received.
 				 */
-				siw_dbg_cep(cep, "no mpareq: drop listener\n");
-				siw_cep_put(cep->listen_cep);
-				cep->listen_cep = NULL;
+				if (cep->listen_cep) {
+					siw_dbg_cep(cep,
+						"no mpareq: drop listener\n");
+					siw_cep_put(cep->listen_cep);
+					cep->listen_cep = NULL;
+				}
 			}
 		}
 		release_cep = 1;
@@ -1227,7 +1231,11 @@  static void siw_cm_llp_data_ready(struct sock *sk)
 	if (!cep)
 		goto out;
 
-	siw_dbg_cep(cep, "state: %d\n", cep->state);
+	siw_dbg_cep(cep, "cep state: %d, socket state %d\n",
+		    cep->state, sk->sk_state);
+
+	if (sk->sk_state != TCP_ESTABLISHED)
+		goto out;
 
 	switch (cep->state) {
 	case SIW_EPSTATE_RDMA_MODE: