Message ID | 20230905145822.446263-1-bmt@zurich.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v2] RDMA/siw: Fix connection failure handling | expand |
Hi, On 9/5/23 22:58, Bernard Metzler wrote: > In case immediate MPA request processing fails, the newly > created endpoint unlinks the listening endpoint and is > ready to be dropped. This special case was not handled > correctly by the code handling the later TCP socket close, > causing a NULL dereference crash in siw_cm_work_handler() > when dereferencing a NULL listener. We now also cancel > the useless MPA timeout, if immediate MPA request > processing fails. > > This patch furthermore simplifies MPA processing in general: > Scheduling a useless TCP socket read in sk_data_ready() upcall > is now surpressed, if the socket is already moved out of > TCP_ESTABLISHED state. > > Fixes: 6c52fdc244b5 ("rdma/siw: connection management") > Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com> > --- > ChangeLog v1->v2: > - Move debug message to now conditional listener drop > --- > drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c > index a2605178f4ed..43e776073f49 100644 > --- a/drivers/infiniband/sw/siw/siw_cm.c > +++ b/drivers/infiniband/sw/siw/siw_cm.c > @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep) > siw_cep_put(cep); > new_cep->listen_cep = NULL; > if (rv) { > + siw_cancel_mpatimer(new_cep); One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead of call siw_cancel_mpatimer like above? Could it be memory leak issue for cep->mpa_timer? Let's say mpa_timer is set to NULL before call siw_cancel_mpatimer. > siw_cep_set_free(new_cep); > goto error; > } > @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct *w) > /* > * Socket close before MPA request received. > */ > - siw_dbg_cep(cep, "no mpareq: drop listener\n"); > - siw_cep_put(cep->listen_cep); > - cep->listen_cep = NULL; > + if (cep->listen_cep) { > + siw_dbg_cep(cep, > + "no mpareq: drop listener\n"); > + siw_cep_put(cep->listen_cep); > + cep->listen_cep = NULL; > + } > } > } > release_cep = 1; > @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk) > if (!cep) > goto out; > > - siw_dbg_cep(cep, "state: %d\n", cep->state); > + siw_dbg_cep(cep, "cep state: %d, socket state %d\n", > + cep->state, sk->sk_state); > + > + if (sk->sk_state != TCP_ESTABLISHED) > + goto out; > Maybe split above to another patch makes more sense, just my $0.02. Thanks, Guoqing
> -----Original Message----- > From: Guoqing Jiang <guoqing.jiang@linux.dev> > Sent: Friday, 8 September 2023 03:35 > To: Bernard Metzler <BMT@zurich.ibm.com>; linux-rdma@vger.kernel.org > Cc: jgg@ziepe.ca; leon@kernel.org > Subject: [EXTERNAL] Re: [PATCH v2] RDMA/siw: Fix connection failure > handling > > Hi, > > On 9/5/23 22:58, Bernard Metzler wrote: > > In case immediate MPA request processing fails, the newly > > created endpoint unlinks the listening endpoint and is > > ready to be dropped. This special case was not handled > > correctly by the code handling the later TCP socket close, > > causing a NULL dereference crash in siw_cm_work_handler() > > when dereferencing a NULL listener. We now also cancel > > the useless MPA timeout, if immediate MPA request > > processing fails. > > > > This patch furthermore simplifies MPA processing in general: > > Scheduling a useless TCP socket read in sk_data_ready() upcall > > is now surpressed, if the socket is already moved out of > > TCP_ESTABLISHED state. > > > > Fixes: 6c52fdc244b5 ("rdma/siw: connection management") > > Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com> > > --- > > ChangeLog v1->v2: > > - Move debug message to now conditional listener drop > > --- > > drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++---- > > 1 file changed, 12 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/infiniband/sw/siw/siw_cm.c > b/drivers/infiniband/sw/siw/siw_cm.c > > index a2605178f4ed..43e776073f49 100644 > > --- a/drivers/infiniband/sw/siw/siw_cm.c > > +++ b/drivers/infiniband/sw/siw/siw_cm.c > > @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep) > > siw_cep_put(cep); > > new_cep->listen_cep = NULL; > > if (rv) { > > + siw_cancel_mpatimer(new_cep); > > One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead > of call > siw_cancel_mpatimer like above? Could it be memory leak issue for > cep->mpa_timer? Thanks for reviewing. Above in the proposed patch, we cancel a pending MPA timer. You are referring to siw_cm.c:1115 case SIW_CM_WORK_MPATIMEOUT: cep->mpa_timer = NULL; We explicitly set it NULL here since we are handling the timeout, so the MPA timer has fired and we cannot cancel it anymore. At the end of siw_cm_work_handler(), any work, including the MPA timeout, gets recycled to the cep's list of free work items via siw_put_work(). This list is recycled only at the end of the cep's lifetime. > Let's say mpa_timer is set to NULL before call siw_cancel_mpatimer. > > > siw_cep_set_free(new_cep); > > goto error; > > } > > @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct > *w) > > /* > > * Socket close before MPA request received. > > */ > > - siw_dbg_cep(cep, "no mpareq: drop listener\n"); > > - siw_cep_put(cep->listen_cep); > > - cep->listen_cep = NULL; > > + if (cep->listen_cep) { > > + siw_dbg_cep(cep, > > + "no mpareq: drop listener\n"); > > + siw_cep_put(cep->listen_cep); > > + cep->listen_cep = NULL; > > + } > > } > > } > > release_cep = 1; > > @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk) > > if (!cep) > > goto out; > > > > - siw_dbg_cep(cep, "state: %d\n", cep->state); > > + siw_dbg_cep(cep, "cep state: %d, socket state %d\n", > > + cep->state, sk->sk_state); > > + > > + if (sk->sk_state != TCP_ESTABLISHED) > > + goto out; > > > > Maybe split above to another patch makes more sense, just my $0.02. > I preferred to have it in one patch, since it was all triggered by your patch b056327bee09 'RDMA/siw: Balance the reference of cep->kref in the error path' Alarmed by that, I re-tested many error cases including a failing MPA request processing, which triggered this NULL bug. It also showed me this useless extra read event processing when the TCP socket is closing while the endpoint is still in MPA handshake mode. But you are right, it is not necessarily related. I can split it. > Thanks, > Guoqing
On 9/8/23 18:55, Bernard Metzler wrote: >> -----Original Message----- >> From: Guoqing Jiang <guoqing.jiang@linux.dev> >> Sent: Friday, 8 September 2023 03:35 >> To: Bernard Metzler <BMT@zurich.ibm.com>; linux-rdma@vger.kernel.org >> Cc: jgg@ziepe.ca; leon@kernel.org >> Subject: [EXTERNAL] Re: [PATCH v2] RDMA/siw: Fix connection failure >> handling >> >> Hi, >> >> On 9/5/23 22:58, Bernard Metzler wrote: >>> In case immediate MPA request processing fails, the newly >>> created endpoint unlinks the listening endpoint and is >>> ready to be dropped. This special case was not handled >>> correctly by the code handling the later TCP socket close, >>> causing a NULL dereference crash in siw_cm_work_handler() >>> when dereferencing a NULL listener. We now also cancel >>> the useless MPA timeout, if immediate MPA request >>> processing fails. >>> >>> This patch furthermore simplifies MPA processing in general: >>> Scheduling a useless TCP socket read in sk_data_ready() upcall >>> is now surpressed, if the socket is already moved out of >>> TCP_ESTABLISHED state. >>> >>> Fixes: 6c52fdc244b5 ("rdma/siw: connection management") >>> Signed-off-by: Bernard Metzler<bmt@zurich.ibm.com> >>> --- >>> ChangeLog v1->v2: >>> - Move debug message to now conditional listener drop >>> --- >>> drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++---- >>> 1 file changed, 12 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c >> b/drivers/infiniband/sw/siw/siw_cm.c >>> index a2605178f4ed..43e776073f49 100644 >>> --- a/drivers/infiniband/sw/siw/siw_cm.c >>> +++ b/drivers/infiniband/sw/siw/siw_cm.c >>> @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep) >>> siw_cep_put(cep); >>> new_cep->listen_cep = NULL; >>> if (rv) { >>> + siw_cancel_mpatimer(new_cep); >> One question, why siw_cm_work_handler set cep->mpa_timer to NULL instead >> of call >> siw_cancel_mpatimer like above? Could it be memory leak issue for >> cep->mpa_timer? > Thanks for reviewing. > > Above in the proposed patch, we cancel a pending > MPA timer. > > You are referring to siw_cm.c:1115 > > case SIW_CM_WORK_MPATIMEOUT: > cep->mpa_timer = NULL; > > We explicitly set it NULL here since we are handling > the timeout, so the MPA timer has fired and we > cannot cancel it anymore. At the end of > siw_cm_work_handler(), any work, including the MPA > timeout, gets recycled to the cep's list of free work > items via siw_put_work(). This list is recycled only at > the end of the cep's lifetime. Got it, and thanks for detailed explanation! >> Let's say mpa_timer is set to NULL before call siw_cancel_mpatimer. >> >>> siw_cep_set_free(new_cep); >>> goto error; >>> } >>> @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct >> *w) >>> /* >>> * Socket close before MPA request received. >>> */ >>> - siw_dbg_cep(cep, "no mpareq: drop listener\n"); >>> - siw_cep_put(cep->listen_cep); >>> - cep->listen_cep = NULL; >>> + if (cep->listen_cep) { >>> + siw_dbg_cep(cep, >>> + "no mpareq: drop listener\n"); >>> + siw_cep_put(cep->listen_cep); >>> + cep->listen_cep = NULL; >>> + } >>> } >>> } >>> release_cep = 1; >>> @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk) >>> if (!cep) >>> goto out; >>> >>> - siw_dbg_cep(cep, "state: %d\n", cep->state); >>> + siw_dbg_cep(cep, "cep state: %d, socket state %d\n", >>> + cep->state, sk->sk_state); >>> + >>> + if (sk->sk_state != TCP_ESTABLISHED) >>> + goto out; >>> >> Maybe split above to another patch makes more sense, just my $0.02. >> > I preferred to have it in one patch, since it was all triggered > by your patch b056327bee09 > 'RDMA/siw: Balance the reference of cep->kref in the error path' > > Alarmed by that, I re-tested many error cases including a failing > MPA request processing, which triggered this NULL bug. It also > showed me this useless extra read event processing when the TCP > socket is closing while the endpoint is still in MPA handshake mode. Good to know the background :). > But you are right, it is not necessarily related. I can split it. With or without the split, Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Thanks, Guoqing
On Tue, 05 Sep 2023 16:58:22 +0200, Bernard Metzler wrote: > In case immediate MPA request processing fails, the newly > created endpoint unlinks the listening endpoint and is > ready to be dropped. This special case was not handled > correctly by the code handling the later TCP socket close, > causing a NULL dereference crash in siw_cm_work_handler() > when dereferencing a NULL listener. We now also cancel > the useless MPA timeout, if immediate MPA request > processing fails. > > [...] Applied, thanks! [1/1] RDMA/siw: Fix connection failure handling https://git.kernel.org/rdma/rdma/c/53a3f777049771 Best regards,
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c index a2605178f4ed..43e776073f49 100644 --- a/drivers/infiniband/sw/siw/siw_cm.c +++ b/drivers/infiniband/sw/siw/siw_cm.c @@ -976,6 +976,7 @@ static void siw_accept_newconn(struct siw_cep *cep) siw_cep_put(cep); new_cep->listen_cep = NULL; if (rv) { + siw_cancel_mpatimer(new_cep); siw_cep_set_free(new_cep); goto error; } @@ -1100,9 +1101,12 @@ static void siw_cm_work_handler(struct work_struct *w) /* * Socket close before MPA request received. */ - siw_dbg_cep(cep, "no mpareq: drop listener\n"); - siw_cep_put(cep->listen_cep); - cep->listen_cep = NULL; + if (cep->listen_cep) { + siw_dbg_cep(cep, + "no mpareq: drop listener\n"); + siw_cep_put(cep->listen_cep); + cep->listen_cep = NULL; + } } } release_cep = 1; @@ -1227,7 +1231,11 @@ static void siw_cm_llp_data_ready(struct sock *sk) if (!cep) goto out; - siw_dbg_cep(cep, "state: %d\n", cep->state); + siw_dbg_cep(cep, "cep state: %d, socket state %d\n", + cep->state, sk->sk_state); + + if (sk->sk_state != TCP_ESTABLISHED) + goto out; switch (cep->state) { case SIW_EPSTATE_RDMA_MODE:
In case immediate MPA request processing fails, the newly created endpoint unlinks the listening endpoint and is ready to be dropped. This special case was not handled correctly by the code handling the later TCP socket close, causing a NULL dereference crash in siw_cm_work_handler() when dereferencing a NULL listener. We now also cancel the useless MPA timeout, if immediate MPA request processing fails. This patch furthermore simplifies MPA processing in general: Scheduling a useless TCP socket read in sk_data_ready() upcall is now surpressed, if the socket is already moved out of TCP_ESTABLISHED state. Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> --- ChangeLog v1->v2: - Move debug message to now conditional listener drop --- drivers/infiniband/sw/siw/siw_cm.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)