
[v2,4/4] rsockets: acknowledge completion queue events in batch

Message ID c144abfe27bd0bf36ff1e3c9c912d7c3@imap.linux.ibm.com (mailing list archive)
State Not Applicable, archived

Commit Message

Sreedhar Kodali Sept. 5, 2014, 1:23 p.m. UTC
From: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>

     Expose the '/unackcqe_default' variable so that completion event
     acknowledgment can be done in batches instead of one at a time,
     to minimize locking overhead.

     The default value of this variable is 1, so the existing
     functionality is preserved.  A higher value delays acknowledging
     until the specified number of completion events has been
     retrieved.

     Signed-off-by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>
     ---

  	fastlock_init(&rs->cq_lock);
@@ -1044,8 +1055,11 @@ static void rs_free(struct rsocket *rs)

  	if (rs->cm_id) {
  		rs_free_iomappings(rs);
-		if (rs->cm_id->qp)
+		if (rs->cm_id->qp) {
+			if (rs->unackcqe > 0)
+				ibv_ack_cq_events(rs->cm_id->recv_cq, rs->unackcqe);
  			rdma_destroy_qp(rs->cm_id);
+		}
  		rdma_destroy_id(rs->cm_id);
  	}

@@ -2026,7 +2040,11 @@ static int rs_get_cq_event(struct rsocket *rs)
  resume_get_cq_event:
  	ret = ibv_get_cq_event(rs->cm_id->recv_cq_channel, &cq, &context);
  	if (!ret) {
-		ibv_ack_cq_events(rs->cm_id->recv_cq, 1);
+		rs->unackcqe += 1;
+		if (rs->unackcqe == def_unackcqe) {
+			ibv_ack_cq_events(rs->cm_id->recv_cq, rs->unackcqe);
+			rs->unackcqe = 0;
+		}
  		rs->cq_armed = 0;
  	} else if (restart_onintr == 1 && errno == EINTR) {
  		errno = 0;
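
For reference, the batching pattern that the two hunks above implement can be
sketched standalone as follows.  This is only a minimal sketch: the helper name
wait_and_ack(), the batch_limit parameter and the file-scope unacked counter
are illustrative and do not appear in the patch; only the libibverbs calls
(ibv_get_cq_event(), ibv_ack_cq_events(), ibv_req_notify_cq()) are real.

#include <infiniband/verbs.h>

/* Events retrieved from the completion channel but not yet acknowledged. */
static unsigned int unacked;

/* Wait for one CQ event and acknowledge in batches of 'batch_limit'
 * instead of one at a time, analogous to the rs_get_cq_event() change. */
static int wait_and_ack(struct ibv_comp_channel *channel,
                        unsigned int batch_limit)
{
        struct ibv_cq *cq;
        void *context;

        if (ibv_get_cq_event(channel, &cq, &context))
                return -1;                      /* errno is set, e.g. EINTR */

        if (++unacked >= batch_limit) {         /* flush the batch */
                ibv_ack_cq_events(cq, unacked);
                unacked = 0;
        }

        /* Re-arm notification before going back to poll the CQ. */
        return ibv_req_notify_cq(cq, 0);
}

With batch_limit set to 1, the sketch degenerates to the original
ack-per-event behavior, which matches the patch's default.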


Comments

Hefty, Sean Sept. 5, 2014, 6:01 p.m. UTC | #1
>      Expose the '/unackcqe_default' variable so that completion event
>      acknowledgment can be done in batches instead of one at a time,
>      to minimize locking overhead.
> 
>      The default value of this variable is 1, so the existing
>      functionality is preserved.  A higher value delays acknowledging
>      until the specified number of completion events has been
>      retrieved.

I would rather just change the default behavior, rather than introduce more checks and options into the code.
Sreedhar Kodali Sept. 6, 2014, 2:22 a.m. UTC | #2
On 2014-09-05 23:31, Hefty, Sean wrote:
>>      Expose the '/unackcqe_default' variable so that completion event
>>      acknowledgment can be done in batches instead of one at a time,
>>      to minimize locking overhead.
>> 
>>      The default value of this variable is 1, so the existing
>>      functionality is preserved.  A higher value delays acknowledging
>>      until the specified number of completion events has been
>>      retrieved.
> 
> I would rather just change the default behavior, rather than introduce
> more checks and options into the code.

Hi Sean,

That's indeed a good approach.  But we need to have an upper limit
before a batch of events is flushed.  Can we decide the limit based on
the completion event queue size?

Thank You.

- Sreedhar
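
One way to bound the batch by the completion queue depth, as asked above, is
sketched below.  Reading cq->cqe (the number of entries actually allocated for
the CQ) is standard libibverbs; the clamp policy and the helper name
clamp_batch_to_cq() are purely illustrative and not part of any posted patch.

#include <infiniband/verbs.h>

/* Clamp a requested batch limit to the depth of the completion queue. */
static unsigned int clamp_batch_to_cq(struct ibv_cq *cq, unsigned int requested)
{
        unsigned int depth = cq->cqe > 0 ? (unsigned int) cq->cqe : 1;

        if (requested < 1)
                return 1;               /* preserve the single-ack default */
        return requested < depth ? requested : depth;
}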

Hefty, Sean Sept. 6, 2014, 4:37 p.m. UTC | #3
> That's indeed a good approach.  But we need to have an upper limit
> before a batch of events is flushed.  Can we decide the limit based on
> the completion event queue size?

I think that's fine.  We only need to ack before wrapping the 32-bit value, plus ensure that everything is acked when closing the rsocket.

But does this change really make any difference?  At the point where the ack is called, there's no contention, so the overhead seems small considering that we are in a part of the code where we have likely just blocked the thread.
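
A minimal sketch of the two constraints mentioned here, assuming a hypothetical
per-socket structure that tracks its own unacked count (the struct batched_cq
type and both helper names are invented for illustration and are not taken from
the patch):

#include <stdint.h>
#include <infiniband/verbs.h>

/* Per-socket bookkeeping: the CQ plus how many events still need an ack. */
struct batched_cq {
        struct ibv_cq *cq;
        uint32_t       unacked;
};

/* Count one retrieved event; ack when the batch is full or the 32-bit
 * counter passed to ibv_ack_cq_events() is about to wrap. */
static void note_cq_event(struct batched_cq *b, uint32_t batch_limit)
{
        if (++b->unacked >= batch_limit || b->unacked == UINT32_MAX) {
                ibv_ack_cq_events(b->cq, b->unacked);
                b->unacked = 0;
        }
}

/* Flush any outstanding acks before the QP/CQ is destroyed. */
static void flush_on_close(struct batched_cq *b)
{
        if (b->unacked) {
                ibv_ack_cq_events(b->cq, b->unacked);
                b->unacked = 0;
        }
}

flush_on_close() corresponds to the extra ibv_ack_cq_events() call added to
rs_free() in the hunk near the top of this page.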
Sreedhar Kodali Sept. 7, 2014, 3:01 a.m. UTC | #4
On 2014-09-06 22:07, Hefty, Sean wrote:
>> That's indeed a good approach.  But we need to have an upper limit
>> before a batch of events is flushed.  Can we decide the limit based on
>> the completion event queue size?
> 
> I think that's fine.  We only need to ack before wrapping the 32-bit
> value, plus ensure that everything is acked when closing the rsocket.
> 

Sure.  I will get back to you with a modified patch.

> But does this change really make any difference?  At the point where
> the ack is called, there's no contention, so the overhead seems small
> considering that we are in a part of the code where we have likely just
> blocked the thread.

Yes.  It indeed makes a small improvement when all cores are fully
saturated in a scale-up environment.

Thank You.

- Sreedhar


Patch

diff --git a/src/rsocket.c b/src/rsocket.c
index ffea0ca..3c39da7 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -118,6 +118,7 @@ static uint32_t polling_time = 10;
  static uint16_t restart_onintr = 0;
  static uint16_t next_comp_vector = 0;
  static uint64_t comp_vector_mask = 0;
+static uint32_t def_unackcqe = 1;

  /*
   * Immediate data format is determined by the upper bits
@@ -384,6 +385,7 @@ struct rsocket {
  	dlist_entry	  iomap_list;
  	dlist_entry	  iomap_queue;
  	int		  iomap_pending;
+	int	      unackcqe;
  };

  #define DS_UDP_TAG 0x55555555
@@ -581,6 +583,14 @@ void rs_configure(void)
  			}
  		}
  	}
+
+	if ((f = fopen(RS_CONF_DIR "/unackcqe_default", "r"))) {
+		(void) fscanf(f, "%u", &def_unackcqe);
+		fclose(f);
+		if (def_unackcqe < 1) {
+			def_unackcqe = 1;
+		}
+	}
  	init = 1;
  out:
  	pthread_mutex_unlock(&mut);
@@ -638,6 +648,7 @@ static struct rsocket *rs_alloc(struct rsocket *inherited_rs, int type)
  			rs->target_iomap_size = def_iomap_size;
  		}
  	}
+	rs->unackcqe = 0;
  	fastlock_init(&rs->slock);
  	fastlock_init(&rs->rlock);