Message ID | 20220418174103.3040-1-rpearsonhpe@gmail.com |
---|---|
State | Accepted |
Commit | 570a4bf7440e9fb2a4164244a6bf60a46362b627 |
Delegated to: | Jason Gunthorpe |
Series | [for-next,v2] RDMA/rxe: Fix "Replace mr by rkey in responder resources" |
On Mon, Apr 18, 2022 at 12:41:04PM -0500, Bob Pearson wrote:
> The rping benchmark fails on long runs. The root cause of this
> failure has been traced to a failure to compute a nonzero value of mr
> in rare situations.
>
> Fix this failure by correctly handling the computation of mr in
> read_reply() in rxe_resp.c in the replay flow.
>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v2
>   Renamed commit
>   Changed fixes line to correctly ID the bug
>   Added a link to the reported mr == NULL issue
>
>  drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

I'm confused, does this one replace this patch:

https://lore.kernel.org/all/20220411030647.20011-1-rpearsonhpe@gmail.com/

?

It has the same title but is completely different

Jason

> I'm confused, does this one replace this patch:
>
> https://lore.kernel.org/all/20220411030647.20011-1-rpearsonhpe@gmail.com/
>
> ?
>
> It has the same title but is completely different
>
> Jason

This is a new bug. It needs a better title. Is the old one still hanging
around or was it accepted upstream? We could call this "Fix read_reply in
rxe_resp.c" or anything else that works for you.

On Mon, Apr 18, 2022 at 12:41:04PM -0500, Bob Pearson wrote:
> The rping benchmark fails on long runs. The root cause of this
> failure has been traced to a failure to compute a nonzero value of mr
> in rare situations.
>
> Fix this failure by correctly handling the computation of mr in
> read_reply() in rxe_resp.c in the replay flow.
>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v2
>   Renamed commit
>   Changed fixes line to correctly ID the bug
>   Added a link to the reported mr == NULL issue
>
>  drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Applied to for-rc, thanks

Jason

thanks

-----Original Message-----
From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Wednesday, April 20, 2022 10:53 AM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com; linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next v2] RDMA/rxe: Fix "Replace mr by rkey in responder resources"

On Mon, Apr 18, 2022 at 12:41:04PM -0500, Bob Pearson wrote:
> The rping benchmark fails on long runs. The root cause of this failure
> has been traced to a failure to compute a nonzero value of mr in rare
> situations.
>
> Fix this failure by correctly handling the computation of mr in
> read_reply() in rxe_resp.c in the replay flow.
>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v2
>   Renamed commit
>   Changed fixes line to correctly ID the bug
>   Added a link to the reported mr == NULL issue
>
>  drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Applied to for-rc, thanks

Jason

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e2653a8721fe..2e627685e804 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -734,8 +734,14 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	}
 
 	if (res->state == rdatm_res_state_new) {
-		mr = qp->resp.mr;
-		qp->resp.mr = NULL;
+		if (!res->replay) {
+			mr = qp->resp.mr;
+			qp->resp.mr = NULL;
+		} else {
+			mr = rxe_recheck_mr(qp, res->read.rkey);
+			if (!mr)
+				return RESPST_ERR_RKEY_VIOLATION;
+		}
 
 		if (res->read.resid <= mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;

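The heart of the change is the new branch in read_reply(). On the first pass the responder takes over the MR cached in qp->resp.mr and clears that cache; for a replayed READ, as we read the patch and the linked mr == NULL report, the cached pointer cannot be relied on, so the MR is looked up again from res->read.rkey with rxe_recheck_mr(), and the response fails with RESPST_ERR_RKEY_VIOLATION if no valid MR is found. Below is a minimal userspace model of that decision, a sketch under stated assumptions and not kernel code: every model_* type and function is hypothetical, and model_recheck_mr() only compares an rkey and a validity flag, whereas the kernel's rxe_recheck_mr() performs additional checks that this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rxe_mr: only what the model needs. */
struct model_mr {
	uint32_t rkey;
	bool valid;		/* false models an MR that is no longer usable */
};

/* Hypothetical stand-in for struct rxe_qp. */
struct model_qp {
	struct model_mr *resp_mr;	/* models qp->resp.mr */
	struct model_mr *mrs;		/* hypothetical table of registered MRs */
	size_t nr_mrs;
};

/* Hypothetical stand-in for the saved READ responder resource. */
struct model_res {
	bool replay;		/* models res->replay: a duplicate READ served again */
	uint32_t rkey;		/* models res->read.rkey */
};

enum model_status {
	MODEL_OK,
	MODEL_ERR_RKEY_VIOLATION,	/* models RESPST_ERR_RKEY_VIOLATION */
};

/* Hypothetical analogue of rxe_recheck_mr(): find a still-valid MR by rkey. */
static struct model_mr *model_recheck_mr(struct model_qp *qp, uint32_t rkey)
{
	for (size_t i = 0; i < qp->nr_mrs; i++) {
		if (qp->mrs[i].rkey == rkey && qp->mrs[i].valid)
			return &qp->mrs[i];
	}
	return NULL;
}

/* Mirrors the patched branch of read_reply() for a new READ response. */
static enum model_status model_pick_mr(struct model_qp *qp,
				       const struct model_res *res,
				       struct model_mr **out)
{
	struct model_mr *mr;

	assert(out != NULL);

	if (!res->replay) {
		/* First pass: take over the cached MR and clear the cache. */
		mr = qp->resp_mr;
		qp->resp_mr = NULL;
	} else {
		/* Replay: do not trust qp->resp_mr; look the MR up again by rkey. */
		mr = model_recheck_mr(qp, res->rkey);
		if (!mr)
			return MODEL_ERR_RKEY_VIOLATION;
	}

	*out = mr;
	return MODEL_OK;
}
```

In this model a first-pass request hands over and clears the cached pointer, so a later replay would see NULL if it reused the cache; with the replay branch it finds the same MR again by rkey, and once that MR is marked invalid the replay reports an rkey violation instead of continuing with a NULL mr.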
The rping benchmark fails on long runs. The root cause of this
failure has been traced to a failure to compute a nonzero value of mr
in rare situations.

Fix this failure by correctly handling the computation of mr in
read_reply() in rxe_resp.c in the replay flow.

Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/linux-rdma/1a9a9190-368d-3442-0a62-443b1a6c1209@linux.dev/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v2
  Renamed commit
  Changed fixes line to correctly ID the bug
  Added a link to the reported mr == NULL issue

 drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

base-commit: 98c8026331ceabe1df579940b81eec75eb49cdd9