Message ID | 5694EEAA.3050600@mellanox.com (mailing list archive)
---|---
State | Not Applicable
>> Each time we access each of these arrays, even for a single index,
>> we fetch a cacheline. Reduce cacheline bounces by fitting these members
>> in a cacheline-aligned struct (swr_ctx) and allocating an array.
>> Accessing this array will fetch all of these members in a single shot.
>>
>> Since the receive queue needs only the wrid, we use a nameless union
>> where the rwr_ctx has only a wrid member.
>
> Do you have performance numbers from before/after this patch to support
> the proposed change?

I didn't take the time to measure cache hit/miss. I just noticed it
a while ago and it's been bugging me for some time, so I figured I'd
send it out...

> Also, I have asked you the same question re the iser remote invalidation
> series [1]; this data is needed there too.

I didn't get to it. We had rough initial numbers, but we have yet to do
a full evaluation on different devices.

> +/* Please don't let this exceed a single cacheline */
> +struct swr_ctx {
> +	u64		wrid;
> +	u32		wr_data;
> +	struct wr_list	w_list;
> +	u32		wqe_head;
> +	u8		rsvd[12];
> +}__packed;
>
> What is the role of the reserved field? Is it for alignment purposes?
> If yes, maybe a better name would be "align".

OK, I can change it.

> Nit, (same) checkpatch error here and below:
>
> ERROR: space required after that close brace '}'
> #111: FILE: drivers/infiniband/hw/mlx5/mlx5_ib.h:139:
> +}__packed;

Will fix.
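A compile-time guard could enforce the "don't exceed a single cacheline"
comment rather than relying on review. A minimal sketch, assuming the
kernel's BUILD_BUG_ON() and L1_CACHE_BYTES macros; the helper name is
hypothetical and this is not part of the posted patch:

	#include <linux/bug.h>
	#include <linux/cache.h>

	/*
	 * Fails the build if swr_ctx ever outgrows one cacheline. The
	 * call only needs to sit on some compiled path, e.g. next to
	 * the allocation of the swr_ctx array during QP creation.
	 */
	static inline void swr_ctx_size_check(void)
	{
		BUILD_BUG_ON(sizeof(struct swr_ctx) > L1_CACHE_BYTES);
	}

Placed next to the array allocation, the check would fire automatically
on any future growth of the struct instead of silently doubling the
per-entry cacheline footprint.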
On 1/12/2016 4:44 PM, Sagi Grimberg wrote:
>
>>> Each time we access each of these arrays, even for a single index,
>>> we fetch a cacheline. Reduce cacheline bounces by fitting these members
>>> in a cacheline-aligned struct (swr_ctx) and allocating an array.
>>> Accessing this array will fetch all of these members in a single shot.
>>>
>>> Since the receive queue needs only the wrid, we use a nameless union
>>> where the rwr_ctx has only a wrid member.
>>
>> Do you have performance numbers from before/after this patch to support
>> the proposed change?
>
> I didn't take the time to measure cache hit/miss. I just noticed it
> a while ago and it's been bugging me for some time, so I figured I'd
> send it out...

The thing is that for data-path changes on high-performance network
drivers, we at least need to know that the perf is as good as it was
before the change. So you could run your iser perf IOPS test
before/after the change and post 1-2 lines with the results as part of
the change-log.
>> I didn't take the time to measure cache hit/miss. I just noticed it
>> a while ago and it's been bugging me for some time, so I figured I'd
>> send it out...
>
> The thing is that for data-path changes on high-performance network
> drivers, we at least need to know that the perf is as good as it was
> before the change. So you could run your iser perf IOPS test
> before/after the change and post 1-2 lines with the results as part of
> the change-log.

I tested iser perf and it seems to sustain. I didn't see any major
difference, but I really don't think block storage iops/latency is the
correct way to evaluate this change.
On 1/12/2016 4:58 PM, Sagi Grimberg wrote:
>
>>> I didn't take the time to measure cache hit/miss. I just noticed it
>>> a while ago and it's been bugging me for some time, so I figured I'd
>>> send it out...
>>
>> The thing is that for data-path changes on high-performance network
>> drivers, we at least need to know that the perf is as good as it was
>> before the change. So you could run your iser perf IOPS test
>> before/after the change and post 1-2 lines with the results as part of
>> the change-log.
>
> I tested iser perf and it seems to sustain. I didn't see any major
> difference, but I really don't think block storage iops/latency is the
> correct way to evaluate this change.

Maybe one nice option would be to take a look at how things are
organized in libmlx5 around that corner.
> Maybe one nice option would be to take a look at how things are
> organized in libmlx5 around that corner.

We don't have this accounting in libmlx5, although I suspect we'll have
it at some point.
On 1/12/2016 4:58 PM, Sagi Grimberg wrote:
> I tested iser perf and it seems to sustain.

That's good. I also think we should move forward with the patch once
the review passes okay, even if we don't have the means to evaluate the
gain; the future will tell.

> I didn't see any major difference, but I really don't think block
> storage iops/latency is the correct way to evaluate this change.

Yes... pktgen doesn't work for IPoIB (if it did, we could use 64B UDP
packets without checksum to see the impact). Maybe RDS (the rds-stress
tool) has more ops/sec than a conventional block RDMA driver?

Or.
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -129,11 +129,24 @@ struct wr_list {
 	u16	next;
 };

+/* Please don't let this exceed a single cacheline */
+struct swr_ctx {
+	u64		wrid;
+	u32		wr_data;
+	struct wr_list	w_list;
+	u32		wqe_head;
+	u8		rsvd[12];
+}__packed;

What is the role of the reserved field? Is it for alignment purposes?
If yes, maybe a better name would be "align".

Nit, (same) checkpatch error here and below:

ERROR: space required after that close brace '}'
#111: FILE: drivers/infiniband/hw/mlx5/mlx5_ib.h:139:
+}__packed;

> +
> +struct rwr_ctx {
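The hunk is truncated above, but the change-log earlier in the thread
pins down what follows: the receive queue keeps only the wrid, and the
two per-entry context types share storage through a nameless union. A
sketch of what that layout could look like; the container struct name
and anything beyond rwr_ctx's single wrid member are assumptions, not
the posted patch:

	struct rwr_ctx {
		u64	wrid;
	} __packed;

	/* Hypothetical container; the driver's real WQ struct differs. */
	struct wq_ctx {
		union {				/* one pointer, two views */
			struct swr_ctx	*swr_ctx;	/* send queue */
			struct rwr_ctx	*rwr_ctx;	/* receive queue */
		};
	};

The nameless union lets a single pointer field serve both queue types
while the receive side allocates the much smaller rwr_ctx entries.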