
[for-next,v2] RDMA/rxe: Fix "Replace mr by rkey in responder resources"

Message ID 20220418174103.3040-1-rpearsonhpe@gmail.com (mailing list archive)
State Accepted
Commit 570a4bf7440e9fb2a4164244a6bf60a46362b627
Delegated to: Jason Gunthorpe
Series [for-next,v2] RDMA/rxe: Fix "Replace mr by rkey in responder resources"

Commit Message

Bob Pearson April 18, 2022, 5:41 p.m. UTC
The rping benchmark fails on long runs. The root cause of this
failure has been traced to a failure to compute a nonzero value of mr
in rare situations.

Fix this failure by correctly handling the computation of mr in
read_reply() in rxe_resp.c in the replay flow.

Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v2
  Renamed commit
  Changed fixes line to correctly ID the bug
  Added a link to the reported mr == NULL issue

 drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


base-commit: 98c8026331ceabe1df579940b81eec75eb49cdd9

Comments

Jason Gunthorpe April 19, 2022, 4:18 p.m. UTC | #1
On Mon, Apr 18, 2022 at 12:41:04PM -0500, Bob Pearson wrote:
> The rping benchmark fails on long runs. The root cause of this
> failure has been traced to a failure to compute a nonzero value of mr
> in rare situations.
> 
> Fix this failure by correctly handling the computation of mr in
> read_reply() in rxe_resp.c in the replay flow.
> 
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v2
>   Renamed commit
>   Changed fixes line to correctly ID the bug
>   Added a link to the reported mr == NULL issue
> 
>  drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

I'm confused, does this one replace this patch:

https://lore.kernel.org/all/20220411030647.20011-1-rpearsonhpe@gmail.com/

?

It has the same title but is completely different

Jason
Pearson, Robert B April 19, 2022, 10:02 p.m. UTC | #2
> I'm confused, does this one replace this patch:
> https://lore.kernel.org/all/20220411030647.20011-1-rpearsonhpe@gmail.com/
> ?
> It has the same title but is completely different
> Jason

This is a new bug. It needs a better title. Is the old one still hanging around or was it accepted upstream?
We could call this "Fix read_reply in rxe_resp.c" or anything else that works for you.
Jason Gunthorpe April 20, 2022, 3:53 p.m. UTC | #3
On Mon, Apr 18, 2022 at 12:41:04PM -0500, Bob Pearson wrote:
> The rping benchmark fails on long runs. The root cause of this
> failure has been traced to a failure to compute a nonzero value of mr
> in rare situations.
> 
> Fix this failure by correctly handling the computation of mr in
> read_reply() in rxe_resp.c in the replay flow.
> 
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v2
>   Renamed commit
>   Changed fixes line to correctly ID the bug
>   Added a link to the reported mr == NULL issue
> 
>  drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Applied to for-rc, thanks

Jason
Pearson, Robert B April 20, 2022, 4:06 p.m. UTC | #4
thanks


Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e2653a8721fe..2e627685e804 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -734,8 +734,14 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 	}
 
 	if (res->state == rdatm_res_state_new) {
-		mr = qp->resp.mr;
-		qp->resp.mr = NULL;
+		if (!res->replay) {
+			mr = qp->resp.mr;
+			qp->resp.mr = NULL;
+		} else {
+			mr = rxe_recheck_mr(qp, res->read.rkey);
+			if (!mr)
+				return RESPST_ERR_RKEY_VIOLATION;
+		}
 
 		if (res->read.resid <= mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
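
The branch added in the hunk above can be modeled outside the kernel. The following is a minimal user-space sketch, not the driver code: every `model_*` name is hypothetical, the MR table is a plain array, and reference counting and memory-window handling are left out. It assumes only what the patch itself shows, namely that the non-replay path takes over the MR cached on the QP, while the replay path looks the MR up again from the saved rkey and reports an rkey violation when that lookup returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types; not the kernel structs. */
struct model_mr {
	uint32_t rkey;
	int valid;			/* 0 once the MR has been invalidated */
};

enum model_result {
	MODEL_OK,
	MODEL_ERR_RKEY_VIOLATION,	/* models RESPST_ERR_RKEY_VIOLATION */
};

struct model_res {
	int replay;			/* models res->replay */
	uint32_t rkey;			/* models res->read.rkey */
};

struct model_qp {
	struct model_mr *cached_mr;	/* models qp->resp.mr */
	struct model_mr *table;		/* assumed registry of MRs */
	size_t table_len;
};

/* Models the role of rxe_recheck_mr(): find a valid MR by rkey or return NULL. */
static struct model_mr *model_recheck_mr(struct model_qp *qp, uint32_t rkey)
{
	for (size_t i = 0; i < qp->table_len; i++) {
		if (qp->table[i].valid && qp->table[i].rkey == rkey)
			return &qp->table[i];
	}
	return NULL;
}

/* Selects the MR for a read reply whose resource is in the "new" state. */
static enum model_result model_pick_mr(struct model_qp *qp,
				       const struct model_res *res,
				       struct model_mr **out)
{
	assert(qp && res && out);

	if (!res->replay) {
		/* First execution: take ownership of the MR cached on the QP. */
		*out = qp->cached_mr;
		qp->cached_mr = NULL;
		return MODEL_OK;
	}

	/* Replay: the cached pointer is gone, so look the MR up again by rkey. */
	*out = model_recheck_mr(qp, res->rkey);
	return *out ? MODEL_OK : MODEL_ERR_RKEY_VIOLATION;
}
```

In this model, a replayed request that arrives after the cached pointer has been cleared still resolves to an MR as long as the rkey is valid, and otherwise fails cleanly instead of continuing with a NULL mr.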