diff mbox

IB/rxe: fix for duplicate request processing and ack psns

Message ID 20180613012047.GA4859@attalasystems.com (mailing list archive)
State Accepted
Headers show

Commit Message

Vijay Immanuel June 13, 2018, 1:20 a.m. UTC
Don't reset the resp opcode for a replayed read response.
The resp opcode could be in the middle of a write or send
sequence, when the duplicate read request was received.
An example sequence is as follows:
- Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22.
- Receive write first PSN 23.
  At this point the resp psn is 24 and resp opcode is write first.
- The sender notices that PSN 20 is dropped and retransmits.
  Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22. The resp opcode is
  set to -1, the resp psn remains 24.
- Receive write first PSN 23. This is processed by duplicate_request().
  The resp opcode remains -1 and resp psn remains 24.
- Receive write middle PSN 24. check_op_seq() reports a missing
  first error since the resp opcode is -1.

When sending an ack for a duplicate send or write request,
use the psn of the previous ack sent. Do not use the psn
of a read response for the ack.
An example sequence is as follows:
- Receive write PSN 30. Transmit ACK for PSN 30.
- Receive read request 4KB PSN 31. Transmit read response with
  PSN 31. The resp psn is now 32.
- The sender notices that PSN 30 is dropped and retransmits.
  Receive write PSN 30. duplicate_request() sends an ACK with
  PSN 31. That is incorrect since PSN 31 was a read request.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c  | 8 ++++++--
 drivers/infiniband/sw/rxe/rxe_verbs.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

Comments

Doug Ledford Aug. 30, 2018, 9:21 p.m. UTC | #1
On Tue, 2018-06-12 at 18:20 -0700, Vijay Immanuel wrote:
> Don't reset the resp opcode for a replayed read response.
> The resp opcode could be in the middle of a write or send
> sequence, when the duplicate read request was received.
> An example sequence is as follows:
> - Receive read request for 12KB PSN 20. Transmit read response
>   first, middle and last with PSNs 20,21,22.
> - Receive write first PSN 23.
>   At this point the resp psn is 24 and resp opcode is write first.
> - The sender notices that PSN 20 is dropped and retransmits.
>   Receive read request for 12KB PSN 20. Transmit read response
>   first, middle and last with PSNs 20,21,22. The resp opcode is
>   set to -1, the resp psn remains 24.
> - Receive write first PSN 23. This is processed by duplicate_request().
>   The resp opcode remains -1 and resp psn remains 24.
> - Receive write middle PSN 24. check_op_seq() reports a missing
>   first error since the resp opcode is -1.
> 
> When sending an ack for a duplicate send or write request,
> use the psn of the previous ack sent. Do not use the psn
> of a read response for the ack.
> An example sequence is as follows:
> - Receive write PSN 30. Transmit ACK for PSN 30.
> - Receive read request 4KB PSN 31. Transmit read response with
>   PSN 31. The resp psn is now 32.
> - The sender notices that PSN 30 is dropped and retransmits.
>   Receive write PSN 30. duplicate_request() sends an ACK with
>   PSN 31. That is incorrect since PSN 31 was a read request.
> 
> Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>

This has been left to languish due to lack of competent review (both
Jason and I beg off on this level of technical review of rxe).

As a result, I've decided to go ahead and put this into for-next at the
earliest possible stage so that it gets the most testing possible before
the next merge window.

So, thanks, applied to for-next.
diff mbox

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 955ff3b..9ea3bcf 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -683,6 +683,7 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 		rxe_advance_resp_resource(qp);
 
 		res->type		= RXE_READ_MASK;
+		res->replay		= 0;
 
 		res->read.va		= qp->resp.va;
 		res->read.va_org	= qp->resp.va;
@@ -753,7 +754,8 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 		state = RESPST_DONE;
 	} else {
 		qp->resp.res = NULL;
-		qp->resp.opcode = -1;
+		if (!res->replay)
+			qp->resp.opcode = -1;
 		if (psn_compare(res->cur_psn, qp->resp.psn) >= 0)
 			qp->resp.psn = res->cur_psn;
 		state = RESPST_CLEANUP;
@@ -815,6 +817,7 @@  static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 
 	/* next expected psn, read handles this separately */
 	qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+	qp->resp.ack_psn = qp->resp.psn;
 
 	qp->resp.opcode = pkt->opcode;
 	qp->resp.status = IB_WC_SUCCESS;
@@ -1061,7 +1064,7 @@  static enum resp_states duplicate_request(struct rxe_qp *qp,
 					  struct rxe_pkt_info *pkt)
 {
 	enum resp_states rc;
-	u32 prev_psn = (qp->resp.psn - 1) & BTH_PSN_MASK;
+	u32 prev_psn = (qp->resp.ack_psn - 1) & BTH_PSN_MASK;
 
 	if (pkt->mask & RXE_SEND_MASK ||
 	    pkt->mask & RXE_WRITE_MASK) {
@@ -1104,6 +1107,7 @@  static enum resp_states duplicate_request(struct rxe_qp *qp,
 			res->state = (pkt->psn == res->first_psn) ?
 					rdatm_res_state_new :
 					rdatm_res_state_replay;
+			res->replay = 1;
 
 			/* Reset the resource, except length. */
 			res->read.va_org = iova;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index af1470d..332a16da 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -171,6 +171,7 @@  enum rdatm_res_state {
 
 struct resp_res {
 	int			type;
+	int			replay;
 	u32			first_psn;
 	u32			last_psn;
 	u32			cur_psn;
@@ -195,6 +196,7 @@  struct rxe_resp_info {
 	enum rxe_qp_state	state;
 	u32			msn;
 	u32			psn;
+	u32			ack_psn;
 	int			opcode;
 	int			drop_msg;
 	int			goto_error;