[rdma-core,2/2] mlx5: Allow creation of a Multi-Packet RQ using direct verbs

Message ID 1511119268-5934-3-git-send-email-yishaih@mellanox.com (mailing list archive)
State Superseded

Commit Message

Yishai Hadas Nov. 19, 2017, 7:21 p.m. UTC
From: Noa Osherovich <noaos@mellanox.com>

Add needed definitions to allow creation of a Multi-Packet RQ using the
mlx5 direct verbs interface.

In order to create a Multi-Packet RQ, one needs to provide an
mlx5dv_wq_init_attr containing the following information in its
striding_rq_attrs struct (a usage sketch follows the list):
- single_stride_log_num_of_bytes: log of the byte size of each stride
- single_wqe_log_num_of_strides: log of the number of strides per WQE
- two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros
  before writing the message to memory (e.g. for IP alignment).
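
A minimal usage sketch (not part of the patch; the stride and WQE values
are illustrative and must fall within the striding_rq_caps reported by
mlx5dv_query_device(); pd, cq and context are assumed to exist already):

struct ibv_wq_init_attr wq_attr = {
	.wq_type = IBV_WQT_RQ,
	.max_wr  = 64,
	.max_sge = 1,
	.pd      = pd,	/* assumed: previously allocated PD */
	.cq      = cq,	/* assumed: previously created CQ */
};

struct mlx5dv_wq_init_attr mlx5_wq_attr = {
	.comp_mask = MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ,
	.striding_rq_attrs = {
		/* illustrative values; check striding_rq_caps limits */
		.single_stride_log_num_of_bytes = 6,	/* 64-byte strides */
		.single_wqe_log_num_of_strides  = 9,	/* 512 strides per WQE */
		.two_byte_shift_en              = 1,	/* pad 2 bytes for IP alignment */
	},
};

struct ibv_wq *wq = mlx5dv_create_wq(context, &wq_attr, &mlx5_wq_attr);
if (!wq)
	/* errno is set, e.g. EINVAL for out-of-range stride values */
	return NULL;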

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
---
 debian/ibverbs-providers.symbols |  2 ++
 providers/mlx5/CMakeLists.txt    |  2 +-
 providers/mlx5/libmlx5.map       |  5 +++
 providers/mlx5/mlx5-abi.h        |  9 +++++-
 providers/mlx5/mlx5dv.h          | 47 +++++++++++++++++++++++++++-
 providers/mlx5/verbs.c           | 66 ++++++++++++++++++++++++++++++++++++----
 6 files changed, 122 insertions(+), 9 deletions(-)

Comments

Jason Gunthorpe Nov. 20, 2017, 7:04 p.m. UTC | #1
On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
> +	if (mlx5wq_attr) {
> +		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
> +			return -EINVAL;

Please no. Check that only the bits this function actually supports
are set and get rid of _RESERVED.

Jason
Leon Romanovsky Nov. 21, 2017, 6:05 a.m. UTC | #2
On Mon, Nov 20, 2017 at 12:04:34PM -0700, Jason Gunthorpe wrote:
> On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
> > +	if (mlx5wq_attr) {
> > +		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
> > +			return -EINVAL;
>
> Please no. Check that only the bits this function actually supports
> are set and get rid of _RESERVED.

It is a classic API mistake to not properly check the validity of input parameters [1].

Thanks

[1] https://lwn.net/Articles/726917/

>
> Jason
Yishai Hadas Nov. 21, 2017, 8:21 a.m. UTC | #3
On 11/21/2017 8:05 AM, Leon Romanovsky wrote:
> On Mon, Nov 20, 2017 at 12:04:34PM -0700, Jason Gunthorpe wrote:
>> On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
>>> +	if (mlx5wq_attr) {
>>> +		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
>>> +			return -EINVAL;
>>
>> Please no. Check that only the bits this function actually supports
>> are set and get rid of _RESERVED.
> 
> It is a classic API mistake to not properly check the validity of input parameters [1].
> 

Each new bit that is added to the DV API is supported by the mlx5 provider 
it was compiled with, so there is no validity issue here when checking 
against the _RESERVED value. That concern only arises when a verbs API 
exposes bits that might not be supported by a specific driver.

However, I'm fine with having an internal supported mask and dropping 
_RESERVED, as applications don't really need to see it; I'll send V1 with 
that change.
Yishai Hadas Nov. 21, 2017, 12:01 p.m. UTC | #4
On 11/20/2017 9:04 PM, Jason Gunthorpe wrote:
> On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
>> +	if (mlx5wq_attr) {
>> +		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
>> +			return -EINVAL;
> 
> Please no. Check that only the bits this function actually supports
> are set and get rid of _RESERVED.
> 

OK, PR was updated accordingly:
https://github.com/linux-rdma/rdma-core/pull/257

Jason Gunthorpe Nov. 21, 2017, 3:53 p.m. UTC | #5
On Tue, Nov 21, 2017 at 02:01:33PM +0200, Yishai Hadas wrote:
> On 11/20/2017 9:04 PM, Jason Gunthorpe wrote:
> >On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
> >>+	if (mlx5wq_attr) {
> >>+		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
> >>+			return -EINVAL;
> >
> >Please no. Check that only the bits this function actually supports
> >are set and get rid of _RESERVED.
> >
> 
> OK, PR was updated accordingly:
> https://github.com/linux-rdma/rdma-core/pull/257

This new version is mixing bit widths:

+enum mlx5dv_wq_init_attr_mask {
+	MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ	= 1 << 0,
+};

+enum {
+	DV_CREATE_WQ_SUPPORTED_COMP_MASK = MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ
+};

+struct mlx5dv_wq_init_attr {
+	uint64_t				comp_mask;
[..]

So this expression:

+	if (mlx5wq_attr->comp_mask & ~DV_CREATE_WQ_SUPPORTED_COMP_MASK)


This will bitwise-invert a 32-bit value and then zero-extend it to 64 bits,
which is not the intention.
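
For illustration, a standalone sketch of that failure mode (assuming the
supported mask is a plain 32-bit unsigned value; the exact behaviour with
an enum constant depends on its promoted type):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t supported = 1u << 0;      /* 32-bit supported mask */
	uint64_t comp_mask = 1ULL << 40;   /* user sets an unsupported high bit */

	/* ~supported is 0xFFFFFFFE (32 bits) and zero-extends to
	 * 0x00000000FFFFFFFE, so bit 40 is never tested. */
	if (comp_mask & ~supported)
		printf("rejected\n");
	else
		printf("accepted - unsupported bit slipped through\n");

	/* Widening the mask to 64 bits before inverting catches it. */
	if (comp_mask & ~(uint64_t)supported)
		printf("rejected with a 64-bit mask\n");
	return 0;
}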

As I suggested earlier, I recommend helper functions for this in
driver.h:

static inline bool check_comp_mask(uint64_t input, uint64_t supported)
{
     return (input & ~supported) == 0;
}


if (!check_comp_mask(mlx5wq_attr->comp_mask,
     MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ))
       return -EINVAL;
       
I also dislike the unnecessary extra enum..

Jason
Yishai Hadas Nov. 21, 2017, 5:17 p.m. UTC | #6
On 11/21/2017 5:53 PM, Jason Gunthorpe wrote:
> On Tue, Nov 21, 2017 at 02:01:33PM +0200, Yishai Hadas wrote:
>> On 11/20/2017 9:04 PM, Jason Gunthorpe wrote:
>>> On Sun, Nov 19, 2017 at 09:21:08PM +0200, Yishai Hadas wrote:
>>>> +	if (mlx5wq_attr) {
>>>> +		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
>>>> +			return -EINVAL;
>>>
>>> Please no. Check that only the bits this function actually supports
>>> are set and get rid of _RESERVED.
>>>
>>
>> OK, PR was updated accordingly:
>> https://github.com/linux-rdma/rdma-core/pull/257
> 
> This new version is mixing bit widths:
> 
> +enum mlx5dv_wq_init_attr_mask {
> +	MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ	= 1 << 0,
> +};
> 
> +enum {
> +	DV_CREATE_WQ_SUPPORTED_COMP_MASK = MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ
> +};
> 
> +struct mlx5dv_wq_init_attr {
> +	uint64_t				comp_mask;
> [..]
> 
> So this expression:
> 
> +	if (mlx5wq_attr->comp_mask & ~DV_CREATE_WQ_SUPPORTED_COMP_MASK)
> 
> 
> This will bitwise-invert a 32-bit value and then zero-extend it to 64 bits,
> which is not the intention.

Are you referring to a case where the user turns on a bit in the 
comp_mask above the low 32 bits and won't get an error? Look at the 
MLX5DV_CQ_INIT_ATTR_MASK_RESERVED usage; it has the same potential issue.

> As I suggested earlier, I recommend helper functions for this in
> driver.h:
> 
> static inline bool check_comp_mask(uint64_t input, uint64_t supported)
> {
>       return (input & ~supported) == 0;

OK, I can add this helper function in a pre-patch along with a fix for the 
above CQ case.

> 
> if (!check_comp_mask(mlx5wq_attr->comp_mask,
>       MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ))
>         return -EINVAL;
>         
> I also dislike the unnecessary extra enum..

I prefer staying with the above enum (i.e. 
DV_CREATE_WQ_SUPPORTED_COMP_MASK) to hold the supported bits, as is done 
in mlx5 for QP & CQ (see MLX5_CREATE_QP_SUP_COMP_MASK). That way the 
code around check_comp_mask() won't need to change when extra bits are 
added.
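
For illustration, a sketch of that style (hypothetical composition of the
helper and the enum, names as in the PR):

/* The internal enum collects the supported bits, so the call below stays
 * unchanged when new bits are added - only the enum grows. */
enum {
	DV_CREATE_WQ_SUPPORTED_COMP_MASK = MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ
};

	if (!check_comp_mask(mlx5wq_attr->comp_mask,
			     DV_CREATE_WQ_SUPPORTED_COMP_MASK))
		return -EINVAL;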
Jason Gunthorpe Nov. 21, 2017, 5:20 p.m. UTC | #7
On Tue, Nov 21, 2017 at 07:17:00PM +0200, Yishai Hadas wrote:

> >This will bitwise-invert a 32-bit value and then zero-extend it to 64 bits,
> >which is not the intention.
> 
> Are you referring to a case where the user turns on a bit in the
> comp_mask above the low 32 bits and won't get an error? Look at the
> MLX5DV_CQ_INIT_ATTR_MASK_RESERVED usage; it has the same potential issue.

Yes, I'm not surprised, this is possibly systemically broken. We
should add the helper and deploy it everywhere we are doing this style
of check.

> >if (!check_comp_mask(mlx5wq_attr->comp_mask,
> >      MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ))
> >        return -EINVAL;
> >I also dislike the unnecessary extra enum..
> 
> I prefer staying with the above enum (i.e. DV_CREATE_WQ_SUPPORTED_COMP_MASK)
> to hold the supported bits, as is done in mlx5 for QP & CQ (see
> MLX5_CREATE_QP_SUP_COMP_MASK). That way the code around check_comp_mask()
> won't need to change when extra bits are added.

Well, it is up to you, but I dislike this because it sprinkles the
logic around too much. Seeing a 'check_comp_mask' line at the top of
the function that clearly lists all the supported bits makes it very
easy to understand and review.

If there were more than one user of DV_CREATE_WQ_SUPPORTED_COMP_MASK I
could understand having it, but since there isn't, it is just ugly
noise.

Jason

Patch

diff --git a/debian/ibverbs-providers.symbols b/debian/ibverbs-providers.symbols
index cb21dc5..08ff906 100644
--- a/debian/ibverbs-providers.symbols
+++ b/debian/ibverbs-providers.symbols
@@ -8,8 +8,10 @@  libmlx5.so.1 ibverbs-providers #MINVER#
  MLX5_1.0@MLX5_1.0 13
  MLX5_1.1@MLX5_1.1 14
  MLX5_1.2@MLX5_1.2 15
+ MLX5_1.3@MLX5_1.3 16
  mlx5dv_init_obj@MLX5_1.0 13
  mlx5dv_init_obj@MLX5_1.2 15
  mlx5dv_query_device@MLX5_1.0 13
  mlx5dv_create_cq@MLX5_1.1 14
  mlx5dv_set_context_attr@MLX5_1.2 15
+ mlx5dv_create_wq@MLX5_1.3 16
diff --git a/providers/mlx5/CMakeLists.txt b/providers/mlx5/CMakeLists.txt
index ab6a42d..88a406d 100644
--- a/providers/mlx5/CMakeLists.txt
+++ b/providers/mlx5/CMakeLists.txt
@@ -11,7 +11,7 @@  if (MLX5_MW_DEBUG)
 endif()
 
 rdma_shared_provider(mlx5 libmlx5.map
-  1 1.2.${PACKAGE_VERSION}
+  1 1.3.${PACKAGE_VERSION}
   buf.c
   cq.c
   dbrec.c
diff --git a/providers/mlx5/libmlx5.map b/providers/mlx5/libmlx5.map
index 09d886d..b1402dc 100644
--- a/providers/mlx5/libmlx5.map
+++ b/providers/mlx5/libmlx5.map
@@ -17,3 +17,8 @@  MLX5_1.2 {
 		mlx5dv_init_obj;
 		mlx5dv_set_context_attr;
 } MLX5_1.1;
+
+MLX5_1.3 {
+	global:
+		mlx5dv_create_wq;
+} MLX5_1.2;
diff --git a/providers/mlx5/mlx5-abi.h b/providers/mlx5/mlx5-abi.h
index b569bd4..8d95418 100644
--- a/providers/mlx5/mlx5-abi.h
+++ b/providers/mlx5/mlx5-abi.h
@@ -210,6 +210,11 @@  struct mlx5_create_qp_resp {
 	__u32				uuar_index;
 };
 
+enum mlx5_create_wq_comp_mask {
+	MLX5_IB_CREATE_WQ_STRIDING_RQ =		1 << 0,
+	MLX5_IB_CREATE_WQ_RESERVED =		1 << 1,
+};
+
 struct mlx5_drv_create_wq {
 	__u64		buf_addr;
 	__u64		db_addr;
@@ -218,7 +223,9 @@  struct mlx5_drv_create_wq {
 	__u32		user_index;
 	__u32		flags;
 	__u32		comp_mask;
-	__u32		reserved;
+	__u32		single_stride_log_num_of_bytes;
+	__u32		single_wqe_log_num_of_strides;
+	__u32		two_byte_shift_en;
 };
 
 struct mlx5_create_wq {
diff --git a/providers/mlx5/mlx5dv.h b/providers/mlx5/mlx5dv.h
index 3566bcb..09c7361 100644
--- a/providers/mlx5/mlx5dv.h
+++ b/providers/mlx5/mlx5dv.h
@@ -213,6 +213,44 @@  enum mlx5dv_obj_type {
 	MLX5DV_OBJ_RWQ	= 1 << 3,
 };
 
+enum mlx5dv_wq_init_attr_mask {
+	MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ	= 1 << 0,
+	MLX5DV_WQ_INIT_ATTR_MASK_RESERVED	= 1 << 1,
+};
+
+struct mlx5dv_striding_rq_init_attr {
+	uint32_t	single_stride_log_num_of_bytes;
+	uint32_t	single_wqe_log_num_of_strides;
+	uint8_t		two_byte_shift_en;
+};
+
+struct mlx5dv_wq_init_attr {
+	uint64_t				comp_mask; /* Use enum mlx5dv_wq_init_attr_mask */
+	struct mlx5dv_striding_rq_init_attr	striding_rq_attrs;
+};
+
+/*
+ * This function creates a work queue object with extra properties
+ * defined by mlx5dv_wq_init_attr struct.
+ *
+ * For each bit in the comp_mask, a field in mlx5dv_wq_init_attr
+ * should follow.
+ *
+ * MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ: Create a work queue with
+ * striding RQ capabilities.
+ * - single_stride_log_num_of_bytes represents the size of each stride in the
+ *   WQE and its value should be between min_single_stride_log_num_of_bytes
+ *   and max_single_stride_log_num_of_bytes that are reported in
+ *   mlx5dv_query_device.
+ * - single_wqe_log_num_of_strides represents the number of strides in each WQE.
+ *   Its value should be between min_single_wqe_log_num_of_strides and
+ *   max_single_wqe_log_num_of_strides that are reported in mlx5dv_query_device.
+ * - two_byte_shift_en: When enabled, hardware pads 2 bytes of zeroes
+ *   before writing the message to memory (e.g. for IP alignment)
+ */
+struct ibv_wq *mlx5dv_create_wq(struct ibv_context *context,
+				struct ibv_wq_init_attr *wq_init_attr,
+				struct mlx5dv_wq_init_attr *mlx5_wq_attr);
 /*
  * This function will initialize mlx5dv_xxx structs based on supplied type.
  * The information for initialization is taken from ibv_xx structs supplied
@@ -328,7 +366,9 @@  struct mlx5_tm_cqe {
 struct mlx5_cqe64 {
 	union {
 		struct {
-			uint8_t		rsvd0[17];
+			uint8_t		rsvd0[2];
+			__be16		wqe_id;
+			uint8_t		rsvd4[13];
 			uint8_t		ml_path;
 			uint8_t		rsvd20[4];
 			__be16		slid;
@@ -455,6 +495,11 @@  struct mlx5_wqe_ctrl_seg {
 	__be32		imm;
 };
 
+struct mlx5_mprq_wqe {
+	struct mlx5_wqe_srq_next_seg	nseg;
+	struct mlx5_wqe_data_seg	dseg;
+};
+
 struct mlx5_wqe_av {
 	union {
 		struct {
diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c
index 478d9a1..c555973 100644
--- a/providers/mlx5/verbs.c
+++ b/providers/mlx5/verbs.c
@@ -953,19 +953,29 @@  static int mlx5_calc_sq_size(struct mlx5_context *ctx,
 
 static int mlx5_calc_rwq_size(struct mlx5_context *ctx,
 			      struct mlx5_rwq *rwq,
-			      struct ibv_wq_init_attr *attr)
+			      struct ibv_wq_init_attr *attr,
+			      struct mlx5dv_wq_init_attr *mlx5wq_attr)
 {
 	size_t wqe_size;
 	int wq_size;
 	uint32_t num_scatter;
+	int is_mprq = 0;
 	int scat_spc;
 
 	if (!attr->max_wr)
 		return -EINVAL;
+	if (mlx5wq_attr) {
+		if (mlx5wq_attr->comp_mask & ~(MLX5DV_WQ_INIT_ATTR_MASK_RESERVED - 1))
+			return -EINVAL;
+
+		is_mprq = !!(mlx5wq_attr->comp_mask &
+			     MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ);
+	}
 
 	/* TBD: check caps for RQ */
 	num_scatter = max_t(uint32_t, attr->max_sge, 1);
-	wqe_size = sizeof(struct mlx5_wqe_data_seg) * num_scatter;
+	wqe_size = sizeof(struct mlx5_wqe_data_seg) * num_scatter +
+		sizeof(struct mlx5_wqe_srq_next_seg) * is_mprq;
 
 	if (rwq->wq_sig)
 		wqe_size += sizeof(struct mlx5_rwqe_sig);
@@ -980,7 +990,8 @@  static int mlx5_calc_rwq_size(struct mlx5_context *ctx,
 	rwq->rq.wqe_shift = mlx5_ilog2(wqe_size);
 	rwq->rq.max_post = 1 << mlx5_ilog2(wq_size / wqe_size);
 	scat_spc = wqe_size -
-		((rwq->wq_sig) ? sizeof(struct mlx5_rwqe_sig) : 0);
+		((rwq->wq_sig) ? sizeof(struct mlx5_rwqe_sig) : 0) -
+		is_mprq * sizeof(struct mlx5_wqe_srq_next_seg);
 	rwq->rq.max_gs = scat_spc / sizeof(struct mlx5_wqe_data_seg);
 	return wq_size;
 }
@@ -2225,8 +2236,9 @@  static int mlx5_alloc_rwq_buf(struct ibv_context *context,
 	return 0;
 }
 
-struct ibv_wq *mlx5_create_wq(struct ibv_context *context,
-			      struct ibv_wq_init_attr *attr)
+static struct ibv_wq *create_wq(struct ibv_context *context,
+			 struct ibv_wq_init_attr *attr,
+			 struct mlx5dv_wq_init_attr *mlx5wq_attr)
 {
 	struct mlx5_create_wq		cmd;
 	struct mlx5_create_wq_resp		resp;
@@ -2251,7 +2263,7 @@  struct ibv_wq *mlx5_create_wq(struct ibv_context *context,
 	if (rwq->wq_sig)
 		cmd.drv.flags = MLX5_RWQ_FLAG_SIGNATURE;
 
-	ret = mlx5_calc_rwq_size(ctx, rwq, attr);
+	ret = mlx5_calc_rwq_size(ctx, rwq, attr, mlx5wq_attr);
 	if (ret < 0) {
 		errno = -ret;
 		goto err;
@@ -2285,6 +2297,35 @@  struct ibv_wq *mlx5_create_wq(struct ibv_context *context,
 	}
 
 	cmd.drv.user_index = usr_idx;
+
+	if (mlx5wq_attr) {
+		if (mlx5wq_attr->comp_mask & MLX5DV_WQ_INIT_ATTR_MASK_STRIDING_RQ) {
+			if ((mlx5wq_attr->striding_rq_attrs.single_stride_log_num_of_bytes <
+			    ctx->striding_rq_caps.min_single_stride_log_num_of_bytes) ||
+			    (mlx5wq_attr->striding_rq_attrs.single_stride_log_num_of_bytes >
+			     ctx->striding_rq_caps.max_single_stride_log_num_of_bytes)) {
+				errno = EINVAL;
+				goto err_create;
+			}
+
+			if ((mlx5wq_attr->striding_rq_attrs.single_wqe_log_num_of_strides <
+			     ctx->striding_rq_caps.min_single_wqe_log_num_of_strides) ||
+			    (mlx5wq_attr->striding_rq_attrs.single_wqe_log_num_of_strides >
+			     ctx->striding_rq_caps.max_single_wqe_log_num_of_strides)) {
+				errno = EINVAL;
+				goto err_create;
+			}
+
+			cmd.drv.single_stride_log_num_of_bytes =
+				mlx5wq_attr->striding_rq_attrs.single_stride_log_num_of_bytes;
+			cmd.drv.single_wqe_log_num_of_strides =
+				mlx5wq_attr->striding_rq_attrs.single_wqe_log_num_of_strides;
+			cmd.drv.two_byte_shift_en =
+				mlx5wq_attr->striding_rq_attrs.two_byte_shift_en;
+			cmd.drv.comp_mask |= MLX5_IB_CREATE_WQ_STRIDING_RQ;
+		}
+	}
+
 	err = ibv_cmd_create_wq(context, attr, &rwq->wq, &cmd.ibv_cmd,
 				sizeof(cmd.ibv_cmd),
 				sizeof(cmd),
@@ -2310,6 +2351,19 @@  err:
 	return NULL;
 }
 
+struct ibv_wq *mlx5_create_wq(struct ibv_context *context,
+			      struct ibv_wq_init_attr *attr)
+{
+	return create_wq(context, attr, NULL);
+}
+
+struct ibv_wq *mlx5dv_create_wq(struct ibv_context *context,
+				struct ibv_wq_init_attr *attr,
+				struct mlx5dv_wq_init_attr *mlx5_wq_attr)
+{
+	return create_wq(context, attr, mlx5_wq_attr);
+}
+
 int mlx5_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *attr)
 {
 	struct mlx5_modify_wq	cmd = {};