
IB/mlx5: Reduce mlx5_ib_wq cacheline bouncing

Message ID 5694EEAA.3050600@mellanox.com (mailing list archive)
State Not Applicable

Commit Message

Or Gerlitz Jan. 12, 2016, 12:16 p.m. UTC
On 1/12/2016 12:32 PM, Sagi Grimberg wrote:
> mlx5 keeps a lot of internal accounting for wr processing.
> mlx5_ib_wq consists of multiple arrays:
> struct mlx5_ib_wq {
> 	u64		       *wrid;
> 	u32		       *wr_data;
> 	struct wr_list	       *w_list;
> 	unsigned	       *wqe_head;
> 	...
> }
>
> Each time we access each of these arrays, even for a single index
> we fetch a cacheline. Reduce cacheline bounces by fitting these members
> in a cacheline-aligned struct (swr_ctx) and allocating an array. Accessing
> this array will fetch all of these members in a single shot.
>
> Since the receive queue needs only the wrid, we use an anonymous union
> where rwr_ctx has only a wrid member.
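
For reference, a minimal sketch of the layout change described above (the
_old/_new struct names and the exact placement of the union are illustrative
assumptions, not code taken verbatim from the patch):

/* Before: four separate per-index arrays -- posting or completing a single
 * wqe index may touch up to four different cachelines.
 */
struct mlx5_ib_wq_old {
	u64		*wrid;
	u32		*wr_data;
	struct wr_list	*w_list;
	unsigned	*wqe_head;
	/* ... */
};

/* After: one context array; wq->swr_ctx[idx] brings all four fields in with
 * a single cacheline fetch. The receive queue shares the allocation through
 * an anonymous union, since it only needs wrid.
 */
struct mlx5_ib_wq_new {
	union {
		struct swr_ctx	*swr_ctx;	/* send: wrid, wr_data, w_list, wqe_head */
		struct rwr_ctx	*rwr_ctx;	/* receive: wrid only */
	};
	/* ... */
};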

Do you have performance numbers from before/after this patch to support the 
proposed change?

Also, I have asked you the same question regarding the iSER remote invalidation 
series [1]; this data is needed there too.

Or.

[1] http://marc.info/?l=linux-rdma&m=145188949807652&w=2

> +	u64		       wrid;
> +}__packed;
> +


Comments

Sagi Grimberg Jan. 12, 2016, 2:44 p.m. UTC | #1
>> Each time we access each of these arrays, even for a single index
>> we fetch a cacheline. Reduce cacheline bounces by fitting these members
>> in a cacheline-aligned struct (swr_ctx) and allocating an array. Accessing
>> this array will fetch all of these members in a single shot.
>>
>> Since the receive queue needs only the wrid, we use an anonymous union
>> where rwr_ctx has only a wrid member.
>
> Do you have performance numbers from before/after this patch to support the
> proposed change?

I didn't take the time to measure cache hits/misses. I just noticed it
a while ago and it's been bugging me for some time, so I figured I'd
send it out...

> Also, I have asked you the same question regarding the iSER remote invalidation
> series [1]; this data is needed there too.

I didn't get to it; we had rough initial numbers, but we have yet to do
a full evaluation on different devices.

> +/* Please don't let this exceed a single cacheline */
> +struct swr_ctx {
> +    u64        wrid;
> +    u32        wr_data;
> +    struct wr_list    w_list;
> +    u32        wqe_head;
> +    u8        rsvd[12];
> +}__packed;
>
>
> What is the role of the reserved field, is that for alignment purposes? If
> yes, maybe a better name would be "align"

OK. I can change it.
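
As a side note, assuming struct wr_list is two u16s (opcode and next) as in
the existing mlx5_ib.h, the packed swr_ctx above works out to 8 + 4 + 4 + 4 +
12 = 32 bytes, i.e. exactly two entries per 64-byte cacheline. A compile-time
guard along these lines (purely illustrative, not part of the patch) would
enforce the "don't let this exceed a single cacheline" comment:

#include <linux/bug.h>		/* BUILD_BUG_ON */
#include <linux/cache.h>	/* L1_CACHE_BYTES */

/* Could be called from the send-queue allocation path: fails the build if
 * swr_ctx ever grows past a cacheline, or stops dividing one evenly (which
 * would let a single array entry straddle two lines).
 */
static void check_swr_ctx_layout(void)
{
	BUILD_BUG_ON(sizeof(struct swr_ctx) > L1_CACHE_BYTES);
	BUILD_BUG_ON(L1_CACHE_BYTES % sizeof(struct swr_ctx) != 0);
}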

>
> Nit, (same) checkpatch error here and below
>
> ERROR: space required after that close brace '}'
> #111: FILE: drivers/infiniband/hw/mlx5/mlx5_ib.h:139:
> +}__packed;

Will fix.
Or Gerlitz Jan. 12, 2016, 2:53 p.m. UTC | #2
On 1/12/2016 4:44 PM, Sagi Grimberg wrote:
>
>>> Each time we access each of these arrays, even for a single index
>>> we fetch a cacheline. Reduce cacheline bounces by fitting these members
>>> in a cacheline-aligned struct (swr_ctx) and allocating an array. 
>>> Accessing
>>> this array will fetch all of these members in a single shot.
>>>
>>> Since the receive queue needs only the wrid, we use an anonymous union
>>> where rwr_ctx has only a wrid member.
>>
>> Do you have performance numbers from before/after this patch to support the
>> proposed change?
>
> I didn't take the time to measure cache hits/misses. I just noticed it
> a while ago and it's been bugging me for some time, so I figured I'd
> send it out... 

The thing is that for data-path changes in high-performance network 
drivers, we at least need to know that the perf is as good as it was 
before the change. So you could run your iSER perf IOPS test 
before/after the change and post 1-2 lines of results as part of the 
change-log.
Sagi Grimberg Jan. 12, 2016, 2:58 p.m. UTC | #3
>> I didn't take the time to measure cache hits/misses. I just noticed it
>> a while ago and it's been bugging me for some time, so I figured I'd
>> send it out...
>
> The thing is that for data-path changes in high-performance network
> drivers, we at least need to know that the perf is as good as it was
> before the change. So you could run your iSER perf IOPS test
> before/after the change and post 1-2 lines of results as part of the
> change-log.

I tested iSER perf and it seems to hold up. I didn't see any major
difference, but I really don't think block storage IOPS/latency is
the correct way to evaluate this change.
Or Gerlitz Jan. 12, 2016, 3:05 p.m. UTC | #4
On 1/12/2016 4:58 PM, Sagi Grimberg wrote:
>
>>> I didn't take the time to measure cache hits/misses. I just noticed it
>>> a while ago and it's been bugging me for some time, so I figured I'd
>>> send it out...
>>
>> The thing is that for data-path changes in high-performance network
>> drivers, we at least need to know that the perf is as good as it was
>> before the change. So you could run your iSER perf IOPS test
>> before/after the change and post 1-2 lines of results as part of the
>> change-log.
>
> I tested iSER perf and it seems to hold up. I didn't see any major
> difference, but I really don't think block storage IOPS/latency is
> the correct way to evaluate this change. 

Maybe one nice option would be to take a look at how things are 
organized in libmlx5 in that area.
Sagi Grimberg Jan. 12, 2016, 3:09 p.m. UTC | #5
> Maybe one nice option would be to take a look at how things are
> organized in libmlx5 in that area.

We don't have this accounting in libmlx5, although I suspect we'll have
it at some point.
Or Gerlitz Jan. 13, 2016, 8:55 a.m. UTC | #6
On 1/12/2016 4:58 PM, Sagi Grimberg wrote:
> I tested iSER perf and it seems to hold up. 

That's good. I also think we should move forward with the patch once the 
review passes okay, even if we don't have the means to evaluate the 
gain; the future will tell.

> I didn't see any major difference, but I really don't think block 
> storage IOPS/latency is the correct way to evaluate this change. 

Yes... pktgen doesn't work for IPoIB (if it did, we could use 64B UDP 
packets without checksum to see the impact). Maybe RDS (the rds-stress 
tool) has more ops/sec than a conventional block RDMA driver?

Or.


Patch

--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -129,11 +129,24 @@  struct wr_list {
  	u16	next;
  };
  
+/* Please don't let this exceed a single cacheline */
+struct swr_ctx {
+	u64		wrid;
+	u32		wr_data;
+	struct wr_list	w_list;
+	u32		wqe_head;
+	u8		rsvd[12];
+}__packed;


What is the role of the reserved field, is that for alignment purposes? If 
yes, maybe a better name would be "align"

Nit, (same) checkpatch error here and below

ERROR: space required after that close brace '}'
#111: FILE: drivers/infiniband/hw/mlx5/mlx5_ib.h:139:
+}__packed;

> +
> +struct rwr_ctx {