From patchwork Mon Mar 25 14:45:48 2019
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 10869425
From: Yishai Hadas
To: linux-rdma@vger.kernel.org
Cc: yishaih@mellanox.com, artemyko@mellanox.com, jgg@mellanox.com, majd@mellanox.com
Subject: [PATCH rdma-core 1/4] mlx5: Expose DV APIs to create and destroy indirect mkey
Date: Mon, 25 Mar 2019 16:45:48 +0200
Message-Id: <1553525151-14005-2-git-send-email-yishaih@mellanox.com>
In-Reply-To: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>
References: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>

Expose DV APIs to create and destroy an indirect mkey; the internal implementation is done over the DEVX API.
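A minimal usage sketch of the new API, assuming a context opened with mlx5dv_open_device() (DEVX support) and a PD allocated on it; names such as "pd" are illustrative and error handling is trimmed:

    struct mlx5dv_mkey_init_attr mkey_attr = {};
    struct mlx5dv_mkey *mkey;

    mkey_attr.pd = pd;
    mkey_attr.create_flags = MLX5DV_MKEY_INIT_ATTR_FLAGS_INDIRECT;
    mkey_attr.max_entries = 4;      /* updated on return to the actual value */

    mkey = mlx5dv_create_mkey(&mkey_attr);
    if (!mkey)
            return -1;              /* errno is set */

    /* mkey->lkey / mkey->rkey can be used once a layout is registered */

    mlx5dv_destroy_mkey(mkey);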
Reviewed-by: Artemy Kovalyov
Signed-off-by: Yishai Hadas
---
 debian/ibverbs-providers.symbols           |  2 +
 providers/mlx5/libmlx5.map                 |  2 +
 providers/mlx5/man/CMakeLists.txt          |  2 +
 providers/mlx5/man/mlx5dv_create_mkey.3.md | 75 +++++++++++++++++++++++++
 providers/mlx5/mlx5.h                      |  6 ++
 providers/mlx5/mlx5_ifc.h                  | 89 ++++++++++++++++++++++++++++++
 providers/mlx5/mlx5dv.h                    | 18 ++++++
 providers/mlx5/verbs.c                     | 61 ++++++++++++++++++++
 8 files changed, 255 insertions(+)
 create mode 100644 providers/mlx5/man/mlx5dv_create_mkey.3.md

diff --git a/debian/ibverbs-providers.symbols b/debian/ibverbs-providers.symbols
index 309bbef..f2f64ae 100644
--- a/debian/ibverbs-providers.symbols
+++ b/debian/ibverbs-providers.symbols
@@ -58,4 +58,6 @@ libmlx5.so.1 ibverbs-providers #MINVER#
 mlx5dv_devx_destroy_cmd_comp@MLX5_1.9 23
 mlx5dv_devx_get_async_cmd_comp@MLX5_1.9 23
 mlx5dv_devx_obj_query_async@MLX5_1.9 23
+ mlx5dv_create_mkey@MLX5_1.10 24
+ mlx5dv_destroy_mkey@MLX5_1.10 24
 mlx5dv_qp_ex_from_ibv_qp_ex@MLX5_1.10 24
diff --git a/providers/mlx5/libmlx5.map b/providers/mlx5/libmlx5.map
index 862cb38..c97874a 100644
--- a/providers/mlx5/libmlx5.map
+++ b/providers/mlx5/libmlx5.map
@@ -83,5 +83,7 @@ MLX5_1.9 {
 MLX5_1.10 {
 	global:
+		mlx5dv_create_mkey;
+		mlx5dv_destroy_mkey;
 		mlx5dv_qp_ex_from_ibv_qp_ex;
 } MLX5_1.9;
diff --git a/providers/mlx5/man/CMakeLists.txt b/providers/mlx5/man/CMakeLists.txt
index 24bd5d8..dedfd98 100644
--- a/providers/mlx5/man/CMakeLists.txt
+++ b/providers/mlx5/man/CMakeLists.txt
@@ -4,6 +4,7 @@ rdma_man_pages(
 	mlx5dv_create_flow_action_modify_header.3.md
 	mlx5dv_create_flow_action_packet_reformat.3.md
 	mlx5dv_create_flow_matcher.3.md
+	mlx5dv_create_mkey.3.md
 	mlx5dv_create_qp.3.md
 	mlx5dv_devx_alloc_uar.3.md
 	mlx5dv_devx_create_cmd_comp.3.md
@@ -22,6 +23,7 @@ rdma_man_pages(
 	mlx5dv.7
 )
 rdma_alias_man_pages(
+	mlx5dv_create_mkey.3 mlx5dv_destroy_mkey.3
 	mlx5dv_devx_alloc_uar.3 mlx5dv_devx_free_uar.3
 	mlx5dv_devx_create_cmd_comp.3 mlx5dv_devx_destroy_cmd_comp.3
 	mlx5dv_devx_create_cmd_comp.3 mlx5dv_devx_get_async_cmd_comp.3
diff --git a/providers/mlx5/man/mlx5dv_create_mkey.3.md b/providers/mlx5/man/mlx5dv_create_mkey.3.md
new file mode 100644
index 0000000..0a03d26
--- /dev/null
+++ b/providers/mlx5/man/mlx5dv_create_mkey.3.md
@@ -0,0 +1,75 @@
+---
+layout: page
+title: mlx5dv_create_mkey / mlx5dv_destroy_mkey
+section: 3
+tagline: Verbs
+---
+
+# NAME
+
+mlx5dv_create_mkey - Creates an indirect mkey
+
+mlx5dv_destroy_mkey - Destroys an indirect mkey
+
+# SYNOPSIS
+
+```c
+#include <infiniband/mlx5dv.h>
+
+struct mlx5dv_mkey_init_attr {
+	struct ibv_pd *pd;
+	uint32_t create_flags;
+	uint16_t max_entries;
+};
+
+struct mlx5dv_mkey {
+	uint32_t lkey;
+	uint32_t rkey;
+};
+
+struct mlx5dv_mkey *
+mlx5dv_create_mkey(struct mlx5dv_mkey_init_attr *mkey_init_attr);
+
+int mlx5dv_destroy_mkey(struct mlx5dv_mkey *mkey);
+
+```
+
+# DESCRIPTION
+
+Create / destroy an indirect mkey.
+
+Create an indirect mkey to enable an application to use its device-specific functionality.
+
+# ARGUMENTS
+
+## mkey_init_attr
+
+*pd*
+:	ibv protection domain.
+
+*create_flags*
+:	MLX5DV_MKEY_INIT_ATTR_FLAGS_INDIRECT:
+	An indirect mkey is being created.
+
+*max_entries*
+:	Requested maximum number of entries that this indirect mkey can point to.
+	The function will update *mkey_init_attr->max_entries* with the actual number of entries supported by the mkey that was created; it will be greater than or equal to the value requested.
+
+# RETURN VALUE
+
+Upon success *mlx5dv_create_mkey* will return a new *struct
+mlx5dv_mkey*; on error NULL will be returned and errno will be set.
+
+Upon success of *mlx5dv_destroy_mkey* 0 is returned, or the value of errno on a failure.
+
+# NOTES
+
+For this functionality to work, a DEVX context should be opened by using *mlx5dv_open_device*.
+
+# SEE ALSO
+
+**mlx5dv_open_device**
+
+# AUTHOR
+
+Yishai Hadas
diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h
index c7c54fd..14a610c 100644
--- a/providers/mlx5/mlx5.h
+++ b/providers/mlx5/mlx5.h
@@ -601,6 +601,12 @@ struct mlx5_devx_umem {
 	uint32_t handle;
 };
 
+struct mlx5_mkey {
+	struct mlx5dv_mkey dv_mkey;
+	struct mlx5dv_devx_obj *devx_obj;
+	uint16_t num_desc;
+};
+
 static inline int mlx5_ilog2(int n)
 {
 	int t;
diff --git a/providers/mlx5/mlx5_ifc.h b/providers/mlx5/mlx5_ifc.h
index 5cf89d6..72f735a 100644
--- a/providers/mlx5/mlx5_ifc.h
+++ b/providers/mlx5/mlx5_ifc.h
@@ -38,6 +38,7 @@ enum mlx5_cap_mode {
 
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 };
 
 struct mlx5_ifc_atomic_caps_bits {
@@ -98,3 +99,91 @@ struct mlx5_ifc_query_hca_cap_in_bits {
 enum mlx5_cap_type {
 	MLX5_CAP_ATOMIC = 3,
 };
+
+enum {
+	MLX5_MKC_ACCESS_MODE_KLMS = 0x2,
+};
+
+struct mlx5_ifc_mkc_bits {
+	u8 reserved_at_0[0x1];
+	u8 free[0x1];
+	u8 reserved_at_2[0x1];
+	u8 access_mode_4_2[0x3];
+	u8 reserved_at_6[0x7];
+	u8 relaxed_ordering_write[0x1];
+	u8 reserved_at_e[0x1];
+	u8 small_fence_on_rdma_read_response[0x1];
+	u8 umr_en[0x1];
+	u8 a[0x1];
+	u8 rw[0x1];
+	u8 rr[0x1];
+	u8 lw[0x1];
+	u8 lr[0x1];
+	u8 access_mode_1_0[0x2];
+	u8 reserved_at_18[0x8];
+
+	u8 qpn[0x18];
+	u8 mkey_7_0[0x8];
+
+	u8 reserved_at_40[0x20];
+
+	u8 length64[0x1];
+	u8 bsf_en[0x1];
+	u8 sync_umr[0x1];
+	u8 reserved_at_63[0x2];
+	u8 expected_sigerr_count[0x1];
+	u8 reserved_at_66[0x1];
+	u8 en_rinval[0x1];
+	u8 pd[0x18];
+
+	u8 start_addr[0x40];
+
+	u8 len[0x40];
+
+	u8 bsf_octword_size[0x20];
+
+	u8 reserved_at_120[0x80];
+
+	u8 translations_octword_size[0x20];
+
+	u8 reserved_at_1c0[0x1b];
+	u8 log_page_size[0x5];
+
+	u8 reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+
+	u8 syndrome[0x20];
+
+	u8 reserved_at_40[0x8];
+	u8 mkey_index[0x18];
+
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+
+	u8 reserved_at_40[0x20];
+
+	u8 pg_access[0x1];
+	u8 mkey_umem_valid[0x1];
+	u8 reserved_at_62[0x1e];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8 reserved_at_280[0x80];
+
+	u8 translations_octword_actual_size[0x20];
+
+	u8 reserved_at_320[0x560];
+
+	u8 klm_pas_mtt[0][0x20];
+};
diff --git a/providers/mlx5/mlx5dv.h b/providers/mlx5/mlx5dv.h
index de4018c..ce033dc 100644
--- a/providers/mlx5/mlx5dv.h
+++ b/providers/mlx5/mlx5dv.h
@@ -168,6 +168,24 @@ enum mlx5dv_qp_create_flags {
 	MLX5DV_QP_CREATE_PACKET_BASED_CREDIT_MODE = 1 << 5,
 };
 
+enum mlx5dv_mkey_init_attr_flags {
+	MLX5DV_MKEY_INIT_ATTR_FLAGS_INDIRECT = 1 << 0,
+};
+
+struct mlx5dv_mkey_init_attr {
+	struct ibv_pd *pd;
+	uint32_t create_flags; /* Use enum mlx5dv_mkey_init_attr_flags */
+	uint16_t max_entries; /* Requested max number of pointed entries by this indirect mkey */
+};
+
+struct mlx5dv_mkey {
+	uint32_t lkey;
+	uint32_t rkey;
+};
+
+struct mlx5dv_mkey *mlx5dv_create_mkey(struct mlx5dv_mkey_init_attr *mkey_init_attr);
+int mlx5dv_destroy_mkey(struct mlx5dv_mkey *mkey);
+
 enum mlx5dv_qp_init_attr_mask {
 	MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS = 1 << 0,
 	MLX5DV_QP_INIT_ATTR_MASK_DC = 1 << 1,
diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c
index
abbbf5a..839f43c 100644
--- a/providers/mlx5/verbs.c
+++ b/providers/mlx5/verbs.c
@@ -4500,3 +4500,64 @@ int mlx5dv_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
 	return 0;
 }
 
+struct mlx5dv_mkey *mlx5dv_create_mkey(struct mlx5dv_mkey_init_attr *mkey_init_attr)
+{
+	uint32_t out[DEVX_ST_SZ_DW(create_mkey_out)] = {};
+	uint32_t in[DEVX_ST_SZ_DW(create_mkey_in)] = {};
+	struct mlx5_mkey *mkey;
+	void *mkc;
+
+	if (!mkey_init_attr->create_flags ||
+	    !check_comp_mask(mkey_init_attr->create_flags,
+			     MLX5DV_MKEY_INIT_ATTR_FLAGS_INDIRECT)) {
+		errno = EOPNOTSUPP;
+		return NULL;
+	}
+
+	mkey = calloc(1, sizeof(*mkey));
+	if (!mkey) {
+		errno = ENOMEM;
+		return NULL;
+	}
+
+	mkey->num_desc = align(mkey_init_attr->max_entries, 4);
+	DEVX_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	mkc = DEVX_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	DEVX_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_KLMS);
+	DEVX_SET(mkc, mkc, free, 1);
+	DEVX_SET(mkc, mkc, umr_en, 1);
+	DEVX_SET(mkc, mkc, pd, to_mpd(mkey_init_attr->pd)->pdn);
+	DEVX_SET(mkc, mkc, translations_octword_size, mkey->num_desc);
+	DEVX_SET(mkc, mkc, lr, 1);
+	DEVX_SET(mkc, mkc, qpn, 0xffffff);
+	DEVX_SET(mkc, mkc, mkey_7_0, 0);
+
+	mkey->devx_obj = mlx5dv_devx_obj_create(mkey_init_attr->pd->context,
+						in, sizeof(in), out, sizeof(out));
+	if (!mkey->devx_obj)
+		goto end;
+
+	mkey_init_attr->max_entries = mkey->num_desc;
+	mkey->dv_mkey.lkey = (DEVX_GET(create_mkey_out, out, mkey_index) << 8) | 0;
+	mkey->dv_mkey.rkey = mkey->dv_mkey.lkey;
+
+	return &mkey->dv_mkey;
+end:
+	free(mkey);
+	return NULL;
+}
+
+int mlx5dv_destroy_mkey(struct mlx5dv_mkey *dv_mkey)
+{
+	struct mlx5_mkey *mkey = container_of(dv_mkey, struct mlx5_mkey,
+					      dv_mkey);
+	int ret;
+
+	ret = mlx5dv_devx_obj_destroy(mkey->devx_obj);
+	if (ret)
+		return ret;
+
+	free(mkey);
+	return 0;
+}
+

From patchwork Mon Mar 25 14:45:49 2019
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 10869427
From: Yishai Hadas
To: linux-rdma@vger.kernel.org
Cc: yishaih@mellanox.com, artemyko@mellanox.com, jgg@mellanox.com, majd@mellanox.com
Subject: [PATCH rdma-core 2/4] verbs: Introduce IBV_WR/WC_DRIVER opcodes
Date: Mon, 25 Mar 2019 16:45:49 +0200
Message-Id: <1553525151-14005-3-git-send-email-yishaih@mellanox.com>
In-Reply-To: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>
References: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>

Introduce the IBV_WR_DRIVER1/IBV_WC_DRIVER1 opcodes, to be used/defined per driver for its own use cases.

Reviewed-by: Artemy Kovalyov
Signed-off-by: Yishai Hadas
---
 libibverbs/man/ibv_poll_cq.3   | 2 ++
 libibverbs/man/ibv_post_send.3 | 2 ++
 libibverbs/verbs.h             | 2 ++
 providers/rxe/rxe.c            | 1 +
 4 files changed, 7 insertions(+)

diff --git a/libibverbs/man/ibv_poll_cq.3 b/libibverbs/man/ibv_poll_cq.3
index 12d1a76..957fd15 100644
--- a/libibverbs/man/ibv_poll_cq.3
+++ b/libibverbs/man/ibv_poll_cq.3
@@ -82,6 +82,8 @@ The user should consume work completions at a rate that prevents CQ overrun
 from occurrence. In case of a CQ overrun, the async event
 .B IBV_EVENT_CQ_ERR
 will be triggered, and the CQ cannot be used.
+.PP
+IBV_WC_DRIVER1 will be reported as a response to the IBV_WR_DRIVER1 opcode.
 .SH "SEE ALSO"
 .BR ibv_post_send (3),
 .BR ibv_post_recv (3)
diff --git a/libibverbs/man/ibv_post_send.3 b/libibverbs/man/ibv_post_send.3
index e6514d0..4fb99f7 100644
--- a/libibverbs/man/ibv_post_send.3
+++ b/libibverbs/man/ibv_post_send.3
@@ -166,6 +166,8 @@ request is fully executed and a work completion has been retrieved from the
 corresponding completion queue (CQ). However, if the IBV_SEND_INLINE
 flag was set, the buffer can be reused immediately after the call returns.
+.PP
+IBV_WR_DRIVER1 is an opcode that should be used to issue a driver-specific operation.
.SH "SEE ALSO" .BR ibv_create_qp (3), .BR ibv_create_ah (3), diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h index a2bae25..cb2d843 100644 --- a/libibverbs/verbs.h +++ b/libibverbs/verbs.h @@ -497,6 +497,7 @@ enum ibv_wc_opcode { IBV_WC_TM_SYNC, IBV_WC_TM_RECV, IBV_WC_TM_NO_TAG, + IBV_WC_DRIVER1, }; enum { @@ -1057,6 +1058,7 @@ enum ibv_wr_opcode { IBV_WR_BIND_MW, IBV_WR_SEND_WITH_INV, IBV_WR_TSO, + IBV_WR_DRIVER1, }; enum ibv_send_flags { diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c index 4c21a4a..909c3f7 100644 --- a/providers/rxe/rxe.c +++ b/providers/rxe/rxe.c @@ -585,6 +585,7 @@ static void convert_send_wr(struct rxe_send_wr *kwr, struct ibv_send_wr *uwr) case IBV_WR_BIND_MW: case IBV_WR_SEND_WITH_INV: case IBV_WR_TSO: + case IBV_WR_DRIVER1: break; } } From patchwork Mon Mar 25 14:45:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10869431 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B1F4C1708 for ; Mon, 25 Mar 2019 14:46:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C845293F7 for ; Mon, 25 Mar 2019 14:46:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 90DA8293FC; Mon, 25 Mar 2019 14:46:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 56AA5293F8 for ; Mon, 25 Mar 2019 14:46:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729023AbfCYOqY (ORCPT ); Mon, 25 Mar 2019 10:46:24 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:59066 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726217AbfCYOqY (ORCPT ); Mon, 25 Mar 2019 10:46:24 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from yishaih@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 16:46:16 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [10.7.2.17]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PEkGIh027256; Mon, 25 Mar 2019 16:46:16 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [127.0.0.1]) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8) with ESMTP id x2PEkG5Y014136; Mon, 25 Mar 2019 16:46:16 +0200 Received: (from yishaih@localhost) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8/Submit) id x2PEkG1r014135; Mon, 25 Mar 2019 16:46:16 +0200 From: Yishai Hadas To: linux-rdma@vger.kernel.org Cc: yishaih@mellanox.com, artemyko@mellanox.com, jgg@mellanox.com, majd@mellanox.com Subject: [PATCH rdma-core 3/4] mlx5: Introduce mlx5dv_wr_mr_interleaved post send builder Date: Mon, 25 Mar 2019 16:45:50 +0200 Message-Id: <1553525151-14005-4-git-send-email-yishaih@mellanox.com> X-Mailer: git-send-email 1.8.2.3 In-Reply-To: <1553525151-14005-1-git-send-email-yishaih@mellanox.com> References: <1553525151-14005-1-git-send-email-yishaih@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce 
mlx5dv_wr_mr_interleaved() post send builder to be used for issuing a WR that may register an interleaved memory layout. Reviewed-by: Artemy Kovalyov Signed-off-by: Yishai Hadas --- providers/mlx5/man/mlx5dv_create_qp.3.md | 9 ++ providers/mlx5/man/mlx5dv_wr_post.3.md | 35 ++++++ providers/mlx5/mlx5dv.h | 50 +++++++++ providers/mlx5/qp.c | 178 ++++++++++++++++++++++++++++++- providers/mlx5/verbs.c | 37 +++++-- 5 files changed, 293 insertions(+), 16 deletions(-) diff --git a/providers/mlx5/man/mlx5dv_create_qp.3.md b/providers/mlx5/man/mlx5dv_create_qp.3.md index 7a93e84..74a2193 100644 --- a/providers/mlx5/man/mlx5dv_create_qp.3.md +++ b/providers/mlx5/man/mlx5dv_create_qp.3.md @@ -38,6 +38,7 @@ struct mlx5dv_qp_init_attr { uint64_t comp_mask; uint32_t create_flags; struct mlx5dv_dc_init_attr dc_init_attr; + uint64_t send_ops_flags; }; ``` @@ -47,6 +48,8 @@ struct mlx5dv_qp_init_attr { valid values in *create_flags* MLX5DV_QP_INIT_ATTR_MASK_DC: valid values in *dc_init_attr* + MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS: + valid values in *send_ops_flags* *create_flags* : A bitwise OR of the various values described below. @@ -95,6 +98,12 @@ struct mlx5dv_dc_init_attr { : used to create a DCT QP. +*send_ops_flags* +: A bitwise OR of the various values described below. + + MLX5DV_QP_EX_WITH_MR_INTERLEAVED: + Enables the mlx5dv_wr_mr_interleaved() work requset on this QP. + # NOTES **mlx5dv_qp_ex_from_ibv_qp_ex()** is used to get *struct mlx5dv_qp_ex* for diff --git a/providers/mlx5/man/mlx5dv_wr_post.3.md b/providers/mlx5/man/mlx5dv_wr_post.3.md index 2c17627..42e680c 100644 --- a/providers/mlx5/man/mlx5dv_wr_post.3.md +++ b/providers/mlx5/man/mlx5dv_wr_post.3.md @@ -22,6 +22,20 @@ static inline void mlx5dv_wr_set_dc_addr(struct mlx5dv_qp_ex *mqp, struct ibv_ah *ah, uint32_t remote_dctn, uint64_t remote_dc_key); + +struct mlx5dv_mr_interleaved { + uint64_t addr; + uint32_t bytes_count; + uint32_t bytes_skip; + uint32_t lkey; +}; + +static inline void mlx5dv_wr_mr_interleaved(struct mlx5dv_qp_ex *mqp, + struct mlx5dv_mkey *mkey, + uint32_t access_flags, /* use enum ibv_access_flags */ + uint32_t repeat_count, + uint16_t num_interleaved, + struct mlx5dv_mr_interleaved *data); ``` # DESCRIPTION @@ -45,6 +59,27 @@ features on the posted WR. A work request creation requires to use the ibv_qp_ex as described in the man for ibv_wr_post and mlx5dv_qp with its available builders and setters. +## QP Specific builders +*RC* QPs +: *mlx5dv_wr_mr_interleaved()* + + registers an interleaved memory layout by using an indirect mkey and some interleaved data. + The layout of the memory pointed by the mkey after its registration will be the *data* representation for the *num_interleaved* entries. + This single layout representation is repeated by *repeat_count*. + + The *data* as described by struct mlx5dv_mr_interleaved will hold real data defined by *bytes_count* and then a padding of *bytes_skip*. + Post a successful registration, RDMA operations can use this *mkey*. The hardware will scatter the data according to the pattern. + The *mkey* should be used in a zero-based mode. The *addr* field in its *ibv_sge* is an offset in the total data. + + Current implementation requires the IBV_SEND_INLINE option to be on in *ibv_qp_ex->wr_flags* field. + To be able to have more than 3 *num_interleaved* entries, the QP should be created with a larger WQE size that may fit it. + This should be done using the *max_inline_data* attribute of *struct ibv_qp_cap* upon its creation. 
+
+	As one entry will be consumed for the strided header, the *mkey* should be created with one more entry than the required *num_interleaved*.
+
+	In case *ibv_qp_ex->wr_flags* turns on IBV_SEND_SIGNALED, the reported WC opcode will be MLX5DV_WC_UMR.
+	Unregistering the *mkey*, to enable another pattern registration, should be done via ibv_post_send with the IBV_WR_LOCAL_INV opcode.
+
 ## QP Specific setters
 
 *DCI* QPs
diff --git a/providers/mlx5/mlx5dv.h b/providers/mlx5/mlx5dv.h
index ce033dc..c5aae57 100644
--- a/providers/mlx5/mlx5dv.h
+++ b/providers/mlx5/mlx5dv.h
@@ -189,6 +189,7 @@ int mlx5dv_destroy_mkey(struct mlx5dv_mkey *mkey);
 enum mlx5dv_qp_init_attr_mask {
 	MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS = 1 << 0,
 	MLX5DV_QP_INIT_ATTR_MASK_DC = 1 << 1,
+	MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS = 1 << 2,
 };
 
 enum mlx5dv_dc_type {
@@ -201,16 +202,32 @@ struct mlx5dv_dc_init_attr {
 	uint64_t dct_access_key;
 };
 
+enum mlx5dv_qp_create_send_ops_flags {
+	MLX5DV_QP_EX_WITH_MR_INTERLEAVED = 1 << 0,
+};
+
 struct mlx5dv_qp_init_attr {
 	uint64_t comp_mask; /* Use enum mlx5dv_qp_init_attr_mask */
 	uint32_t create_flags; /* Use enum mlx5dv_qp_create_flags */
 	struct mlx5dv_dc_init_attr dc_init_attr;
+	uint64_t send_ops_flags; /* Use enum mlx5dv_qp_create_send_ops_flags */
 };
 
 struct ibv_qp *mlx5dv_create_qp(struct ibv_context *context,
 				struct ibv_qp_init_attr_ex *qp_attr,
 				struct mlx5dv_qp_init_attr *mlx5_qp_attr);
 
+struct mlx5dv_mr_interleaved {
+	uint64_t addr;
+	uint32_t bytes_count;
+	uint32_t bytes_skip;
+	uint32_t lkey;
+};
+
+enum mlx5dv_wc_opcode {
+	MLX5DV_WC_UMR = IBV_WC_DRIVER1,
+};
+
 struct mlx5dv_qp_ex {
 	uint64_t comp_mask;
 	/*
@@ -219,6 +236,12 @@ struct mlx5dv_qp_ex {
 	 */
 	void (*wr_set_dc_addr)(struct mlx5dv_qp_ex *mqp, struct ibv_ah *ah,
 			       uint32_t remote_dctn, uint64_t remote_dc_key);
+	void (*wr_mr_interleaved)(struct mlx5dv_qp_ex *mqp,
+				  struct mlx5dv_mkey *mkey,
+				  uint32_t access_flags, /* use enum ibv_access_flags */
+				  uint32_t repeat_count,
+				  uint16_t num_interleaved,
+				  struct mlx5dv_mr_interleaved *data);
 };
 
 struct mlx5dv_qp_ex *mlx5dv_qp_ex_from_ibv_qp_ex(struct ibv_qp_ex *qp);
@@ -231,6 +254,17 @@ static inline void mlx5dv_wr_set_dc_addr(struct mlx5dv_qp_ex *mqp,
 	mqp->wr_set_dc_addr(mqp, ah, remote_dctn, remote_dc_key);
 }
 
+static inline void mlx5dv_wr_mr_interleaved(struct mlx5dv_qp_ex *mqp,
+					    struct mlx5dv_mkey *mkey,
+					    uint32_t access_flags,
+					    uint32_t repeat_count,
+					    uint16_t num_interleaved,
+					    struct mlx5dv_mr_interleaved *data)
+{
+	mqp->wr_mr_interleaved(mqp, mkey, access_flags, repeat_count,
+			       num_interleaved, data);
+}
+
 enum mlx5dv_flow_action_esp_mask {
 	MLX5DV_FLOW_ACTION_ESP_MASK_FLAGS = 1 << 0,
 };
@@ -843,6 +877,22 @@ union mlx5_wqe_umr_inline_seg {
 	struct mlx5_wqe_umr_klm_seg klm;
 };
 
+struct mlx5_wqe_umr_repeat_ent_seg {
+	__be16 stride;
+	__be16 byte_count;
+	__be32 memkey;
+	__be64 va;
+};
+
+struct mlx5_wqe_umr_repeat_block_seg {
+	__be32 byte_count;
+	__be32 op;
+	__be32 repeat_count;
+	__be16 reserved;
+	__be16 num_ent;
+	struct mlx5_wqe_umr_repeat_ent_seg entries[0];
+};
+
 enum {
 	MLX5_WQE_MKEY_CONTEXT_FREE = 1 << 6
 };
diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c
index b2f749c..ecfe844 100644
--- a/providers/mlx5/qp.c
+++ b/providers/mlx5/qp.c
@@ -57,6 +57,7 @@ static const uint32_t mlx5_ib_opcode[] = {
 	[IBV_WR_BIND_MW]	= MLX5_OPCODE_UMR,
 	[IBV_WR_LOCAL_INV]	= MLX5_OPCODE_UMR,
 	[IBV_WR_TSO]		= MLX5_OPCODE_TSO,
+	[IBV_WR_DRIVER1]	= MLX5_OPCODE_UMR,
 };
 
 static void *get_recv_wqe(struct mlx5_qp *qp, int n)
@@ -1245,6 +1246,8 @@ static inline void _common_wqe_init(struct
ibv_qp_ex *ibqp, mqp->sq.wr_data[idx] = IBV_WC_BIND_MW; else if (ib_op == IBV_WR_LOCAL_INV) mqp->sq.wr_data[idx] = IBV_WC_LOCAL_INV; + else if (ib_op == IBV_WR_DRIVER1) + mqp->sq.wr_data[idx] = IBV_WC_DRIVER1; ctrl = mlx5_get_send_wqe(mqp, idx); *(uint32_t *)((void *)ctrl + 8) = 0; @@ -2044,6 +2047,156 @@ mlx5_send_wr_set_xrc_srqn(struct ibv_qp_ex *ibqp, uint32_t remote_srqn) mqp->cur_setters_cnt++; } +static uint8_t get_umr_mr_flags(uint32_t acc) +{ + return ((acc & IBV_ACCESS_REMOTE_ATOMIC ? + MLX5_WQE_MKEY_CONTEXT_ACCESS_FLAGS_ATOMIC : 0) | + (acc & IBV_ACCESS_REMOTE_WRITE ? + MLX5_WQE_MKEY_CONTEXT_ACCESS_FLAGS_REMOTE_WRITE : 0) | + (acc & IBV_ACCESS_REMOTE_READ ? + MLX5_WQE_MKEY_CONTEXT_ACCESS_FLAGS_REMOTE_READ : 0) | + (acc & IBV_ACCESS_LOCAL_WRITE ? + MLX5_WQE_MKEY_CONTEXT_ACCESS_FLAGS_LOCAL_WRITE : 0)); +} + +/* The strided block format is as the following: + * | repeat_block | entry_block | entry_block |...| entry_block | + * While the repeat entry contains details on the list of the block_entries. + */ +static void umr_strided_seg_create(struct mlx5_qp *qp, + uint32_t repeat_count, + uint16_t num_interleaved, + struct mlx5dv_mr_interleaved *data, + void *seg, + void *qend, int *wqe_size, int *xlat_size, + uint64_t *reglen) +{ + struct mlx5_wqe_umr_repeat_block_seg *rb = seg; + struct mlx5_wqe_umr_repeat_ent_seg *eb; + int byte_count = 0; + int tmp; + int i; + + rb->op = htobe32(0x400); + rb->reserved = 0; + rb->num_ent = htobe16(num_interleaved); + rb->repeat_count = htobe32(repeat_count); + eb = rb->entries; + + /* + * ------------------------------------------------------------ + * | repeat_block | entry_block | entry_block |...| entry_block + * ------------------------------------------------------------ + */ + for (i = 0; i < num_interleaved; i++, eb++) { + if (unlikely(eb == qend)) + eb = mlx5_get_send_wqe(qp, 0); + + byte_count += data[i].bytes_count; + eb->va = htobe64(data[i].addr); + eb->byte_count = htobe16(data[i].bytes_count); + eb->stride = htobe16(data[i].bytes_count + data[i].bytes_skip); + eb->memkey = htobe32(data[i].lkey); + } + + rb->byte_count = htobe32(byte_count); + *reglen = byte_count * repeat_count; + + tmp = align(num_interleaved + 1, 4) - num_interleaved - 1; + memset(eb, 0, tmp * sizeof(*eb)); + + *wqe_size = align(sizeof(*rb) + sizeof(*eb) * num_interleaved, 64); + *xlat_size = (num_interleaved + 1) * sizeof(*eb); +} + +static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, + struct mlx5dv_mkey *dv_mkey, + uint32_t access_flags, + uint32_t repeat_count, + uint16_t num_interleaved, + struct mlx5dv_mr_interleaved *data) +{ + struct mlx5_qp *mqp = mqp_from_mlx5dv_qp_ex(dv_qp); + struct ibv_qp_ex *ibqp = &mqp->verbs_qp.qp_ex; + struct mlx5_wqe_umr_ctrl_seg *umr_ctrl_seg; + struct mlx5_wqe_mkey_context_seg *mk; + struct mlx5_mkey *mkey = container_of(dv_mkey, struct mlx5_mkey, + dv_mkey); + int xlat_size; + int size; + uint64_t reglen = 0; + void *qend = mqp->sq.qend; + void *seg; + uint16_t max_entries; + + if (unlikely(!(ibqp->wr_flags & IBV_SEND_INLINE))) { + mqp->err = EOPNOTSUPP; + return; + } + + max_entries = min_t(size_t, + (mqp->max_inline_data + sizeof(struct mlx5_wqe_inl_data_seg)) / + sizeof(struct mlx5_wqe_umr_repeat_ent_seg) - 1, + mkey->num_desc); + + if (unlikely(num_interleaved > max_entries)) { + mqp->err = ENOMEM; + return; + } + + if (unlikely(!check_comp_mask(access_flags, + IBV_ACCESS_LOCAL_WRITE | + IBV_ACCESS_REMOTE_WRITE | + IBV_ACCESS_REMOTE_READ | + IBV_ACCESS_REMOTE_ATOMIC))) { + mqp->err = EINVAL; + return; + } + + 
_common_wqe_init(ibqp, IBV_WR_DRIVER1); + mqp->cur_size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + mqp->cur_ctrl->imm = htobe32(dv_mkey->lkey); + seg = umr_ctrl_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + + memset(umr_ctrl_seg, 0, sizeof(*umr_ctrl_seg)); + umr_ctrl_seg->flags = MLX5_WQE_UMR_CTRL_FLAG_INLINE; + umr_ctrl_seg->mkey_mask = htobe64(MLX5_WQE_UMR_CTRL_MKEY_MASK_LEN | + MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_LOCAL_WRITE | + MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_REMOTE_READ | + MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_REMOTE_WRITE | + MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_ATOMIC | + MLX5_WQE_UMR_CTRL_MKEY_MASK_FREE); + + seg += sizeof(struct mlx5_wqe_umr_ctrl_seg); + mqp->cur_size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16; + + if (unlikely(seg == qend)) + seg = mlx5_get_send_wqe(mqp, 0); + + mk = seg; + memset(mk, 0, sizeof(*mk)); + mk->access_flags = get_umr_mr_flags(access_flags); + mk->qpn_mkey = htobe32(0xffffff00 | (dv_mkey->lkey & 0xff)); + + seg += sizeof(*mk); + mqp->cur_size += (sizeof(*mk) / 16); + + if (unlikely(seg == qend)) + seg = mlx5_get_send_wqe(mqp, 0); + + umr_strided_seg_create(mqp, repeat_count, num_interleaved, data, + seg, qend, &size, &xlat_size, ®len); + mk->len = htobe64(reglen); + umr_ctrl_seg->klm_octowords = htobe16(align(xlat_size, 64) / 16); + mqp->cur_size += size / 16; + + mqp->fm_cache = MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE; + mqp->nreq++; + mqp->inl_wqe = 1; + + _common_wqe_finilize(mqp); +} + static void mlx5_send_wr_set_dc_addr(struct mlx5dv_qp_ex *dv_qp, struct ibv_ah *ah, uint32_t remote_dctn, @@ -2164,6 +2317,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, struct ibv_qp_ex *ibqp = &mqp->verbs_qp.qp_ex; uint64_t ops = attr->send_ops_flags; struct mlx5dv_qp_ex *dv_qp; + uint64_t mlx5_ops = 0; ibqp->wr_start = mlx5_send_wr_start; ibqp->wr_complete = mlx5_send_wr_complete; @@ -2174,6 +2328,10 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, ops & IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD)) return EOPNOTSUPP; + if (mlx5_attr && + mlx5_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS) + mlx5_ops = mlx5_attr->send_ops_flags; + /* Set all supported micro-functions regardless user request */ switch (attr->qp_type) { case IBV_QPT_RC: @@ -2182,10 +2340,20 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, fill_wr_builders_rc_xrc_dc(ibqp); fill_wr_setters_rc_uc(ibqp); + + if (mlx5_ops) { + if (!check_comp_mask(mlx5_ops, + MLX5DV_QP_EX_WITH_MR_INTERLEAVED)) + return EOPNOTSUPP; + + dv_qp = &mqp->dv_qp; + dv_qp->wr_mr_interleaved = mlx5_send_wr_mr_interleaved; + } + break; case IBV_QPT_UC: - if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UC) + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UC || mlx5_ops) return EOPNOTSUPP; fill_wr_builders_uc(ibqp); @@ -2193,7 +2361,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, break; case IBV_QPT_XRC_SEND: - if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC) + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC || mlx5_ops) return EOPNOTSUPP; fill_wr_builders_rc_xrc_dc(ibqp); @@ -2202,7 +2370,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, break; case IBV_QPT_UD: - if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UD) + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UD || mlx5_ops) return EOPNOTSUPP; if (mqp->flags & MLX5_QP_FLAGS_USE_UNDERLAY) @@ -2214,7 +2382,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, break; case IBV_QPT_RAW_PACKET: - if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_RAW_PACKET) + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_RAW_PACKET || mlx5_ops) return EOPNOTSUPP; fill_wr_builders_eth(ibqp); @@ -2228,7 +2396,7 @@ int 
mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, mlx5_attr->dc_init_attr.dc_type == MLX5DV_DCTYPE_DCI)) return EOPNOTSUPP; - if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_DCI) + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_DCI || mlx5_ops) return EOPNOTSUPP; fill_wr_builders_rc_xrc_dc(ibqp); diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c index 839f43c..136c0d2 100644 --- a/providers/mlx5/verbs.c +++ b/providers/mlx5/verbs.c @@ -1131,7 +1131,8 @@ int mlx5_destroy_srq(struct ibv_srq *srq) static int _sq_overhead(struct mlx5_qp *qp, enum ibv_qp_type qp_type, - uint64_t ops) + uint64_t ops, + uint64_t mlx5_ops) { size_t size = sizeof(struct mlx5_wqe_ctrl_seg); size_t rdma_size = 0; @@ -1151,7 +1152,8 @@ static int _sq_overhead(struct mlx5_qp *qp, sizeof(struct mlx5_wqe_raddr_seg) + sizeof(struct mlx5_wqe_atomic_seg); - if (ops & (IBV_QP_EX_WITH_BIND_MW | IBV_QP_EX_WITH_LOCAL_INV)) + if (ops & (IBV_QP_EX_WITH_BIND_MW | IBV_QP_EX_WITH_LOCAL_INV) || + (mlx5_ops & MLX5DV_QP_EX_WITH_MR_INTERLEAVED)) mw_size = sizeof(struct mlx5_wqe_ctrl_seg) + sizeof(struct mlx5_wqe_umr_ctrl_seg) + sizeof(struct mlx5_wqe_mkey_context_seg) + @@ -1195,9 +1197,11 @@ static int _sq_overhead(struct mlx5_qp *qp, return size; } -static int sq_overhead(struct mlx5_qp *qp, struct ibv_qp_init_attr_ex *attr) +static int sq_overhead(struct mlx5_qp *qp, struct ibv_qp_init_attr_ex *attr, + struct mlx5dv_qp_init_attr *mlx5_qp_attr) { uint64_t ops; + uint64_t mlx5_ops = 0; if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) { ops = attr->send_ops_flags; @@ -1236,11 +1240,17 @@ static int sq_overhead(struct mlx5_qp *qp, struct ibv_qp_init_attr_ex *attr) } } - return _sq_overhead(qp, attr->qp_type, ops); + + if (mlx5_qp_attr && + mlx5_qp_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS) + mlx5_ops = mlx5_qp_attr->send_ops_flags; + + return _sq_overhead(qp, attr->qp_type, ops, mlx5_ops); } static int mlx5_calc_send_wqe(struct mlx5_context *ctx, struct ibv_qp_init_attr_ex *attr, + struct mlx5dv_qp_init_attr *mlx5_qp_attr, struct mlx5_qp *qp) { int size; @@ -1248,7 +1258,7 @@ static int mlx5_calc_send_wqe(struct mlx5_context *ctx, int max_gather; int tot_size; - size = sq_overhead(qp, attr); + size = sq_overhead(qp, attr, mlx5_qp_attr); if (size < 0) return size; @@ -1301,6 +1311,7 @@ static int mlx5_calc_rcv_wqe(struct mlx5_context *ctx, static int mlx5_calc_sq_size(struct mlx5_context *ctx, struct ibv_qp_init_attr_ex *attr, + struct mlx5dv_qp_init_attr *mlx5_qp_attr, struct mlx5_qp *qp) { int wqe_size; @@ -1310,7 +1321,7 @@ static int mlx5_calc_sq_size(struct mlx5_context *ctx, if (!attr->cap.max_send_wr) return 0; - wqe_size = mlx5_calc_send_wqe(ctx, attr, qp); + wqe_size = mlx5_calc_send_wqe(ctx, attr, mlx5_qp_attr, qp); if (wqe_size < 0) { mlx5_dbg(fp, MLX5_DBG_QP, "\n"); return wqe_size; @@ -1321,7 +1332,7 @@ static int mlx5_calc_sq_size(struct mlx5_context *ctx, return -EINVAL; } - qp->max_inline_data = wqe_size - sq_overhead(qp, attr) - + qp->max_inline_data = wqe_size - sq_overhead(qp, attr, mlx5_qp_attr) - sizeof(struct mlx5_wqe_inl_data_seg); attr->cap.max_inline_data = qp->max_inline_data; @@ -1441,12 +1452,13 @@ static int mlx5_calc_rq_size(struct mlx5_context *ctx, static int mlx5_calc_wq_size(struct mlx5_context *ctx, struct ibv_qp_init_attr_ex *attr, + struct mlx5dv_qp_init_attr *mlx5_qp_attr, struct mlx5_qp *qp) { int ret; int result; - ret = mlx5_calc_sq_size(ctx, attr, qp); + ret = mlx5_calc_sq_size(ctx, attr, mlx5_qp_attr, qp); if (ret < 0) return ret; @@ -1677,7 +1689,8 @@ enum { enum { 
	MLX5_DV_CREATE_QP_SUP_COMP_MASK = MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS |
-					  MLX5DV_QP_INIT_ATTR_MASK_DC
+					  MLX5DV_QP_INIT_ATTR_MASK_DC |
+					  MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS
 };
 
 enum {
@@ -1912,7 +1925,9 @@ static struct ibv_qp *create_qp(struct ibv_context *context,
 	if (ctx->atomic_cap == IBV_ATOMIC_HCA)
 		qp->atomics_enabled = 1;
 
-	if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) {
+	if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS ||
+	    (mlx5_qp_attr &&
+	     mlx5_qp_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_SEND_OPS_FLAGS)) {
 		/*
 		 * Scatter2cqe, which is a data-path optimization, is disabled
 		 * since driver DC data-path doesn't support it.
@@ -1939,7 +1954,7 @@ static struct ibv_qp *create_qp(struct ibv_context *context,
 	if (!scatter_to_cqe_configured && use_scatter_to_cqe())
 		cmd.flags |= MLX5_QP_FLAG_SCATTER_CQE;
 
-	ret = mlx5_calc_wq_size(ctx, attr, qp);
+	ret = mlx5_calc_wq_size(ctx, attr, mlx5_qp_attr, qp);
 	if (ret < 0) {
 		errno = -ret;
 		goto err;

From patchwork Mon Mar 25 14:45:51 2019
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 10869429
From: Yishai Hadas
To: linux-rdma@vger.kernel.org
Cc: yishaih@mellanox.com, artemyko@mellanox.com, jgg@mellanox.com, majd@mellanox.com
Subject: [PATCH rdma-core 4/4] mlx5: Introduce mlx5dv_wr_mr_list post send builder
Date: Mon, 25 Mar 2019 16:45:51 +0200
Message-Id: <1553525151-14005-5-git-send-email-yishaih@mellanox.com>
In-Reply-To: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>
References: <1553525151-14005-1-git-send-email-yishaih@mellanox.com>

Introduce the mlx5dv_wr_mr_list() post send builder, to be used for issuing a WR that registers a memory layout based on a list of ibv_sge.

Reviewed-by: Artemy Kovalyov
Signed-off-by: Yishai Hadas
---
 providers/mlx5/man/mlx5dv_create_qp.3.md |  3 +
 providers/mlx5/man/mlx5dv_wr_post.3.md   | 20 +++++++
 providers/mlx5/mlx5dv.h                  | 15 +++++
 providers/mlx5/qp.c                      | 97 +++++++++++++++++++++++++-----
 providers/mlx5/verbs.c                   |  3 +-
 5 files changed, 122 insertions(+), 16 deletions(-)

diff --git a/providers/mlx5/man/mlx5dv_create_qp.3.md b/providers/mlx5/man/mlx5dv_create_qp.3.md
index 74a2193..856c69a 100644
--- a/providers/mlx5/man/mlx5dv_create_qp.3.md
+++ b/providers/mlx5/man/mlx5dv_create_qp.3.md
@@ -104,6 +104,9 @@ struct mlx5dv_dc_init_attr {
 	MLX5DV_QP_EX_WITH_MR_INTERLEAVED:
 	Enables the mlx5dv_wr_mr_interleaved() work request on this QP.
 
+	MLX5DV_QP_EX_WITH_MR_LIST:
+	Enables the mlx5dv_wr_mr_list() work request on this QP.
+
 # NOTES
 
 **mlx5dv_qp_ex_from_ibv_qp_ex()** is used to get *struct mlx5dv_qp_ex* for
diff --git a/providers/mlx5/man/mlx5dv_wr_post.3.md b/providers/mlx5/man/mlx5dv_wr_post.3.md
index 42e680c..0f7ff4e 100644
--- a/providers/mlx5/man/mlx5dv_wr_post.3.md
+++ b/providers/mlx5/man/mlx5dv_wr_post.3.md
@@ -36,6 +36,12 @@ static inline void mlx5dv_wr_mr_interleaved(struct mlx5dv_qp_ex *mqp,
 					    uint32_t repeat_count,
 					    uint16_t num_interleaved,
 					    struct mlx5dv_mr_interleaved *data);
+
+static inline void mlx5dv_wr_mr_list(struct mlx5dv_qp_ex *mqp,
+				     struct mlx5dv_mkey *mkey,
+				     uint32_t access_flags, /* use enum ibv_access_flags */
+				     uint16_t num_sges,
+				     struct ibv_sge *sge);
 ```
 
 # DESCRIPTION
@@ -80,6 +86,20 @@ man for ibv_wr_post and mlx5dv_qp with its available builders and setters.
 	In case *ibv_qp_ex->wr_flags* turns on IBV_SEND_SIGNALED, the reported WC opcode will be MLX5DV_WC_UMR.
 	Unregistering the *mkey*, to enable another pattern registration, should be done via ibv_post_send with the IBV_WR_LOCAL_INV opcode.
 
+:	*mlx5dv_wr_mr_list()*
+
+	registers a memory layout based on a list of ibv_sge.
+	After its registration, the layout of the memory pointed to by the *mkey* will be based on the list of *sge* entries counted by *num_sges*.
+	Following a successful registration, RDMA operations can use this *mkey*; the hardware will scatter the data according to the pattern.
+	The *mkey* should be used in a zero-based mode; the *addr* field in its *ibv_sge* is an offset in the total data.
+
+	The current implementation requires the IBV_SEND_INLINE option to be on in the *ibv_qp_ex->wr_flags* field.
+	To be able to have more than 4 *num_sge* entries, the QP should be created with a WQE size large enough to fit them.
+	This should be done using the *max_inline_data* attribute of *struct ibv_qp_cap* upon its creation.
+
+	In case *ibv_qp_ex->wr_flags* turns on IBV_SEND_SIGNALED, the reported WC opcode will be MLX5DV_WC_UMR.
+	Unregistering the *mkey*, to enable another pattern registration, should be done via ibv_post_send with the IBV_WR_LOCAL_INV opcode.
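+
+	A minimal usage sketch (assuming a QP created with MLX5DV_QP_EX_WITH_MR_LIST set in *send_ops_flags*, *qpx*/*mqpx* obtained via ibv_qp_to_qp_ex() and mlx5dv_qp_ex_from_ibv_qp_ex(), an indirect *mkey* from mlx5dv_create_mkey(), and two already registered buffers; all names are illustrative and error handling is elided):
+
+	```c
+	struct ibv_sge sgl[2] = {
+		/* the sge list describes the real memory backing the pattern */
+		{ .addr = (uintptr_t)buf1, .length = len1, .lkey = mr1->lkey },
+		{ .addr = (uintptr_t)buf2, .length = len2, .lkey = mr2->lkey },
+	};
+
+	ibv_wr_start(qpx);
+	qpx->wr_flags = IBV_SEND_INLINE | IBV_SEND_SIGNALED;
+	mlx5dv_wr_mr_list(mqpx, mkey, IBV_ACCESS_LOCAL_WRITE, 2, sgl);
+	ret = ibv_wr_complete(qpx);
+	```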
+ ## QP Specific setters *DCI* QPs diff --git a/providers/mlx5/mlx5dv.h b/providers/mlx5/mlx5dv.h index c5aae57..8b88026 100644 --- a/providers/mlx5/mlx5dv.h +++ b/providers/mlx5/mlx5dv.h @@ -204,6 +204,7 @@ struct mlx5dv_dc_init_attr { enum mlx5dv_qp_create_send_ops_flags { MLX5DV_QP_EX_WITH_MR_INTERLEAVED = 1 << 0, + MLX5DV_QP_EX_WITH_MR_LIST = 1 << 1, }; struct mlx5dv_qp_init_attr { @@ -242,6 +243,11 @@ struct mlx5dv_qp_ex { uint32_t repeat_count, uint16_t num_interleaved, struct mlx5dv_mr_interleaved *data); + void (*wr_mr_list)(struct mlx5dv_qp_ex *mqp, + struct mlx5dv_mkey *mkey, + uint32_t access_flags, /* use enum ibv_access_flags */ + uint16_t num_sges, + struct ibv_sge *sge); }; struct mlx5dv_qp_ex *mlx5dv_qp_ex_from_ibv_qp_ex(struct ibv_qp_ex *qp); @@ -265,6 +271,15 @@ static inline void mlx5dv_wr_mr_interleaved(struct mlx5dv_qp_ex *mqp, num_interleaved, data); } +static inline void mlx5dv_wr_mr_list(struct mlx5dv_qp_ex *mqp, + struct mlx5dv_mkey *mkey, + uint32_t access_flags, + uint16_t num_sges, + struct ibv_sge *sge) +{ + mqp->wr_mr_list(mqp, mkey, access_flags, num_sges, sge); +} + enum mlx5dv_flow_action_esp_mask { MLX5DV_FLOW_ACTION_ESP_MASK_FLAGS = 1 << 0, }; diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c index ecfe844..7707c2f 100644 --- a/providers/mlx5/qp.c +++ b/providers/mlx5/qp.c @@ -2059,6 +2059,40 @@ static uint8_t get_umr_mr_flags(uint32_t acc) MLX5_WQE_MKEY_CONTEXT_ACCESS_FLAGS_LOCAL_WRITE : 0)); } +static int umr_sg_list_create(struct mlx5_qp *qp, + uint16_t num_sges, + struct ibv_sge *sge, + void *seg, + void *qend, int *size, int *xlat_size, + uint64_t *reglen) +{ + struct mlx5_wqe_data_seg *dseg; + int byte_count = 0; + int i; + size_t tmp; + + dseg = seg; + + for (i = 0; i < num_sges; i++, dseg++) { + if (unlikely(dseg == qend)) + dseg = mlx5_get_send_wqe(qp, 0); + + dseg->addr = htobe64(sge[i].addr); + dseg->lkey = htobe32(sge[i].lkey); + dseg->byte_count = htobe32(sge[i].length); + byte_count += sge[i].length; + } + + tmp = align(num_sges, 4) - num_sges; + memset(dseg, 0, tmp * sizeof(*dseg)); + + *size = align(num_sges * sizeof(*dseg), 64); + *reglen = byte_count; + *xlat_size = num_sges * sizeof(*dseg); + + return 0; +} + /* The strided block format is as the following: * | repeat_block | entry_block | entry_block |...| entry_block | * While the repeat entry contains details on the list of the block_entries. @@ -2109,12 +2143,13 @@ static void umr_strided_seg_create(struct mlx5_qp *qp, *xlat_size = (num_interleaved + 1) * sizeof(*eb); } -static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, - struct mlx5dv_mkey *dv_mkey, - uint32_t access_flags, - uint32_t repeat_count, - uint16_t num_interleaved, - struct mlx5dv_mr_interleaved *data) +static void mlx5_send_wr_mr(struct mlx5dv_qp_ex *dv_qp, + struct mlx5dv_mkey *dv_mkey, + uint32_t access_flags, + uint32_t repeat_count, + uint16_t num_entries, + struct mlx5dv_mr_interleaved *data, + struct ibv_sge *sge) { struct mlx5_qp *mqp = mqp_from_mlx5dv_qp_ex(dv_qp); struct ibv_qp_ex *ibqp = &mqp->verbs_qp.qp_ex; @@ -2134,12 +2169,17 @@ static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, return; } - max_entries = min_t(size_t, - (mqp->max_inline_data + sizeof(struct mlx5_wqe_inl_data_seg)) / - sizeof(struct mlx5_wqe_umr_repeat_ent_seg) - 1, - mkey->num_desc); - - if (unlikely(num_interleaved > max_entries)) { + max_entries = data ? 
+ min_t(size_t, + (mqp->max_inline_data + sizeof(struct mlx5_wqe_inl_data_seg)) / + sizeof(struct mlx5_wqe_umr_repeat_ent_seg) - 1, + mkey->num_desc) : + min_t(size_t, + (mqp->max_inline_data + sizeof(struct mlx5_wqe_inl_data_seg)) / + sizeof(struct mlx5_wqe_data_seg), + mkey->num_desc); + + if (unlikely(num_entries > max_entries)) { mqp->err = ENOMEM; return; } @@ -2184,8 +2224,13 @@ static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, if (unlikely(seg == qend)) seg = mlx5_get_send_wqe(mqp, 0); - umr_strided_seg_create(mqp, repeat_count, num_interleaved, data, - seg, qend, &size, &xlat_size, ®len); + if (data) + umr_strided_seg_create(mqp, repeat_count, num_entries, data, + seg, qend, &size, &xlat_size, ®len); + else + umr_sg_list_create(mqp, num_entries, sge, seg, + qend, &size, &xlat_size, ®len); + mk->len = htobe64(reglen); umr_ctrl_seg->klm_octowords = htobe16(align(xlat_size, 64) / 16); mqp->cur_size += size / 16; @@ -2197,6 +2242,26 @@ static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, _common_wqe_finilize(mqp); } +static void mlx5_send_wr_mr_interleaved(struct mlx5dv_qp_ex *dv_qp, + struct mlx5dv_mkey *mkey, + uint32_t access_flags, + uint32_t repeat_count, + uint16_t num_interleaved, + struct mlx5dv_mr_interleaved *data) +{ + mlx5_send_wr_mr(dv_qp, mkey, access_flags, repeat_count, + num_interleaved, data, NULL); +} + +static inline void mlx5_send_wr_mr_list(struct mlx5dv_qp_ex *dv_qp, + struct mlx5dv_mkey *mkey, + uint32_t access_flags, + uint16_t num_sges, + struct ibv_sge *sge) +{ + mlx5_send_wr_mr(dv_qp, mkey, access_flags, 0, num_sges, NULL, sge); +} + static void mlx5_send_wr_set_dc_addr(struct mlx5dv_qp_ex *dv_qp, struct ibv_ah *ah, uint32_t remote_dctn, @@ -2343,11 +2408,13 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, if (mlx5_ops) { if (!check_comp_mask(mlx5_ops, - MLX5DV_QP_EX_WITH_MR_INTERLEAVED)) + MLX5DV_QP_EX_WITH_MR_INTERLEAVED | + MLX5DV_QP_EX_WITH_MR_LIST)) return EOPNOTSUPP; dv_qp = &mqp->dv_qp; dv_qp->wr_mr_interleaved = mlx5_send_wr_mr_interleaved; + dv_qp->wr_mr_list = mlx5_send_wr_mr_list; } break; diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c index 136c0d2..831ea46 100644 --- a/providers/mlx5/verbs.c +++ b/providers/mlx5/verbs.c @@ -1153,7 +1153,8 @@ static int _sq_overhead(struct mlx5_qp *qp, sizeof(struct mlx5_wqe_atomic_seg); if (ops & (IBV_QP_EX_WITH_BIND_MW | IBV_QP_EX_WITH_LOCAL_INV) || - (mlx5_ops & MLX5DV_QP_EX_WITH_MR_INTERLEAVED)) + (mlx5_ops & (MLX5DV_QP_EX_WITH_MR_INTERLEAVED | + MLX5DV_QP_EX_WITH_MR_LIST))) mw_size = sizeof(struct mlx5_wqe_ctrl_seg) + sizeof(struct mlx5_wqe_umr_ctrl_seg) + sizeof(struct mlx5_wqe_mkey_context_seg) +