From patchwork Mon Mar 18 12:24:14 2019
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 10857549
From: Yishai Hadas
To: linux-rdma@vger.kernel.org
Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com,
 jgg@mellanox.com, majd@mellanox.com, Alex Rosenbaum
Subject: [PATCH rdma-core 1/6] verbs: Introduce a new post send API
Date: Mon, 18 Mar 2019 14:24:14 +0200
Message-Id: <1552911859-4073-2-git-send-email-yishaih@mellanox.com>
In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com>
References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Alex Rosenbaum

Introduce a new set of QP send operations that allows extensibility for new
send opcodes. This makes it easy to add vendor-specific send opcodes, or even
vendor-specific QP types.

Users request the set of QP send operations they need at QP creation time.
The new 'struct ibv_qp_ex' then exposes the requested post-send operations.

The post-send API is broken down into three steps:
1. Create a new send WR entry on the QP's send queue buffer.
2. Set WR attributes for specific fields, depending on the QP type and the
   create opcode.
3. Commit the queued send buffer entries to HW and ring the doorbell.
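In application terms the three steps map onto plain function calls. A minimal,
hedged sketch (assuming the QP was created with IBV_QP_INIT_ATTR_SEND_OPS_FLAGS,
qpx was obtained via ibv_qp_to_qp_ex(), and lkey/buf/len describe an already
registered buffer):

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch only: post a single signaled SEND through the new API. */
static int post_one_send(struct ibv_qp_ex *qpx, uint32_t lkey,
			 void *buf, uint32_t len)
{
	ibv_wr_start(qpx);                       /* open the posting batch        */

	qpx->wr_id = 1;                          /* returned in the completion    */
	qpx->wr_flags = IBV_SEND_SIGNALED;
	ibv_wr_send(qpx);                        /* 1. create the send WR entry   */
	ibv_wr_set_sge(qpx, lkey,                /* 2. set the WR data attributes */
		       (uint64_t)(uintptr_t)buf, len);

	return ibv_wr_complete(qpx);             /* 3. commit to HW and ring the doorbell */
}
```

Calling ibv_wr_abort() instead of ibv_wr_complete() discards everything
prepared since ibv_wr_start().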
The application can 'create' multiple new WR entries on the QP's send queue buffer before ringing the doorbell Signed-off-by: Alex Rosenbaum Signed-off-by: Jason Gunthorpe Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- debian/libibverbs1.symbols | 2 + libibverbs/CMakeLists.txt | 2 +- libibverbs/cmd.c | 14 +- libibverbs/driver.h | 8 +- libibverbs/libibverbs.map.in | 5 + libibverbs/man/CMakeLists.txt | 21 +++ libibverbs/man/ibv_create_qp_ex.3 | 26 ++- libibverbs/man/ibv_wr_post.3.md | 333 ++++++++++++++++++++++++++++++++++++++ libibverbs/verbs.c | 9 ++ libibverbs/verbs.h | 191 +++++++++++++++++++++- 10 files changed, 604 insertions(+), 7 deletions(-) create mode 100644 libibverbs/man/ibv_wr_post.3.md diff --git a/debian/libibverbs1.symbols b/debian/libibverbs1.symbols index 774862e..a077f9a 100644 --- a/debian/libibverbs1.symbols +++ b/debian/libibverbs1.symbols @@ -3,6 +3,7 @@ libibverbs.so.1 libibverbs1 #MINVER# IBVERBS_1.0@IBVERBS_1.0 1.1.6 IBVERBS_1.1@IBVERBS_1.1 1.1.6 IBVERBS_1.5@IBVERBS_1.5 20 + IBVERBS_1.6@IBVERBS_1.6 24 (symver)IBVERBS_PRIVATE_22 22 ibv_ack_async_event@IBVERBS_1.0 1.1.6 ibv_ack_async_event@IBVERBS_1.1 1.1.6 @@ -70,6 +71,7 @@ libibverbs.so.1 libibverbs1 #MINVER# ibv_open_device@IBVERBS_1.0 1.1.6 ibv_open_device@IBVERBS_1.1 1.1.6 ibv_port_state_str@IBVERBS_1.1 1.1.6 + ibv_qp_to_qp_ex@IBVERBS_1.6 24 ibv_query_device@IBVERBS_1.0 1.1.6 ibv_query_device@IBVERBS_1.1 1.1.6 ibv_query_gid@IBVERBS_1.0 1.1.6 diff --git a/libibverbs/CMakeLists.txt b/libibverbs/CMakeLists.txt index 0671977..76aa522 100644 --- a/libibverbs/CMakeLists.txt +++ b/libibverbs/CMakeLists.txt @@ -27,7 +27,7 @@ configure_file("libibverbs.map.in" rdma_library(ibverbs "${CMAKE_CURRENT_BINARY_DIR}/libibverbs.map" # See Documentation/versioning.md - 1 1.5.${PACKAGE_VERSION} + 1 1.6.${PACKAGE_VERSION} all_providers.c cmd.c cmd_ah.c diff --git a/libibverbs/cmd.c b/libibverbs/cmd.c index d976842..5291abf 100644 --- a/libibverbs/cmd.c +++ b/libibverbs/cmd.c @@ -806,7 +806,14 @@ int ibv_cmd_create_qp_ex2(struct ibv_context *context, struct verbs_xrcd *vxrcd = NULL; int err; - if (qp_attr->comp_mask >= IBV_QP_INIT_ATTR_RESERVED) + if (!check_comp_mask(qp_attr->comp_mask, + IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_XRCD | + IBV_QP_INIT_ATTR_CREATE_FLAGS | + IBV_QP_INIT_ATTR_MAX_TSO_HEADER | + IBV_QP_INIT_ATTR_IND_TABLE | + IBV_QP_INIT_ATTR_RX_HASH | + IBV_QP_INIT_ATTR_SEND_OPS_FLAGS)) return EINVAL; memset(&cmd->core_payload, 0, sizeof(cmd->core_payload)); @@ -850,7 +857,10 @@ int ibv_cmd_create_qp_ex(struct ibv_context *context, struct verbs_xrcd *vxrcd = NULL; int err; - if (attr_ex->comp_mask > (IBV_QP_INIT_ATTR_XRCD | IBV_QP_INIT_ATTR_PD)) + if (!check_comp_mask(attr_ex->comp_mask, + IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_XRCD | + IBV_QP_INIT_ATTR_SEND_OPS_FLAGS)) return ENOSYS; err = create_qp_ex_common(qp, attr_ex, vxrcd, diff --git a/libibverbs/driver.h b/libibverbs/driver.h index b9f648c..e4d624b 100644 --- a/libibverbs/driver.h +++ b/libibverbs/driver.h @@ -77,7 +77,7 @@ struct verbs_srq { enum verbs_qp_mask { VERBS_QP_XRCD = 1 << 0, - VERBS_QP_RESERVED = 1 << 1 + VERBS_QP_EX = 1 << 1, }; enum ibv_gid_type { @@ -101,10 +101,14 @@ static inline struct verbs_mr *verbs_get_mr(struct ibv_mr *mr) } struct verbs_qp { - struct ibv_qp qp; + union { + struct ibv_qp qp; + struct ibv_qp_ex qp_ex; + }; uint32_t comp_mask; struct verbs_xrcd *xrcd; }; +static_assert(offsetof(struct ibv_qp_ex, qp_base) == 0, "Invalid qp layout"); enum ibv_flow_action_type { IBV_FLOW_ACTION_UNSPECIFIED, diff --git 
a/libibverbs/libibverbs.map.in b/libibverbs/libibverbs.map.in index 4bffb1b..87a1b9f 100644 --- a/libibverbs/libibverbs.map.in +++ b/libibverbs/libibverbs.map.in @@ -111,6 +111,11 @@ IBVERBS_1.5 { ibv_get_pkey_index; } IBVERBS_1.1; +IBVERBS_1.6 { + global: + ibv_qp_to_qp_ex; +} IBVERBS_1.5; + /* If any symbols in this stanza change ABI then the entire staza gets a new symbol version. See the top level CMakeLists.txt for this setting. */ diff --git a/libibverbs/man/CMakeLists.txt b/libibverbs/man/CMakeLists.txt index 4d5abef..e1d5edf 100644 --- a/libibverbs/man/CMakeLists.txt +++ b/libibverbs/man/CMakeLists.txt @@ -68,6 +68,7 @@ rdma_man_pages( ibv_srq_pingpong.1 ibv_uc_pingpong.1 ibv_ud_pingpong.1 + ibv_wr_post.3.md ibv_xsrq_pingpong.1 ) rdma_alias_man_pages( @@ -101,4 +102,24 @@ rdma_alias_man_pages( ibv_rate_to_mbps.3 mbps_to_ibv_rate.3 ibv_rate_to_mult.3 mult_to_ibv_rate.3 ibv_reg_mr.3 ibv_dereg_mr.3 + ibv_wr_post.3 ibv_wr_abort.3 + ibv_wr_post.3 ibv_wr_complete.3 + ibv_wr_post.3 ibv_wr_start.3 + ibv_wr_post.3 ibv_wr_atomic_cmp_swp.3 + ibv_wr_post.3 ibv_wr_atomic_fetch_add.3 + ibv_wr_post.3 ibv_wr_bind_mw.3 + ibv_wr_post.3 ibv_wr_local_inv.3 + ibv_wr_post.3 ibv_wr_rdma_read.3 + ibv_wr_post.3 ibv_wr_rdma_write.3 + ibv_wr_post.3 ibv_wr_rdma_write_imm.3 + ibv_wr_post.3 ibv_wr_send.3 + ibv_wr_post.3 ibv_wr_send_imm.3 + ibv_wr_post.3 ibv_wr_send_inv.3 + ibv_wr_post.3 ibv_wr_send_tso.3 + ibv_wr_post.3 ibv_wr_set_inline_data.3 + ibv_wr_post.3 ibv_wr_set_inline_data_list.3 + ibv_wr_post.3 ibv_wr_set_sge.3 + ibv_wr_post.3 ibv_wr_set_sge_list.3 + ibv_wr_post.3 ibv_wr_set_ud_addr.3 + ibv_wr_post.3 ibv_wr_set_xrc_srqn.3 ) diff --git a/libibverbs/man/ibv_create_qp_ex.3 b/libibverbs/man/ibv_create_qp_ex.3 index 47e0d9e..277e9fa 100644 --- a/libibverbs/man/ibv_create_qp_ex.3 +++ b/libibverbs/man/ibv_create_qp_ex.3 @@ -39,6 +39,7 @@ uint16_t max_tso_header; /* Maximum TSO header size */ struct ibv_rwq_ind_table *rwq_ind_tbl; /* Indirection table to be associated with the QP */ struct ibv_rx_hash_conf rx_hash_conf; /* RX hash configuration to be used */ uint32_t source_qpn; /* Source QP number, creation flag IBV_QP_CREATE_SOURCE_QPN should be set, few NOTEs below */ +uint64_t send_ops_flags; /* Select which QP send ops will be defined in struct ibv_qp_ex. Use enum ibv_qp_create_send_ops_flags */ .in -8 }; .sp @@ -62,6 +63,7 @@ IBV_QP_CREATE_SOURCE_QPN = 1 << 10, /* The created QP will use th IBV_QP_CREATE_PCI_WRITE_END_PADDING = 1 << 11, /* Incoming packets will be padded to cacheline size */ .in -8 }; +.fi .nf struct ibv_rx_hash_conf { .in +8 @@ -72,7 +74,6 @@ uint64_t rx_hash_fields_mask; /* RX fields that should particip .in -8 }; .fi - .nf enum ibv_rx_hash_fields { .in +8 @@ -92,6 +93,23 @@ IBV_RX_HASH_INNER = (1UL << 31), .in -8 }; .fi +.nf +struct ibv_qp_create_send_ops_flags { +.in +8 +IBV_QP_EX_WITH_RDMA_WRITE = 1 << 0, +IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM = 1 << 1, +IBV_QP_EX_WITH_SEND = 1 << 2, +IBV_QP_EX_WITH_SEND_WITH_IMM = 1 << 3, +IBV_QP_EX_WITH_RDMA_READ = 1 << 4, +IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP = 1 << 5, +IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD = 1 << 6, +IBV_QP_EX_WITH_LOCAL_INV = 1 << 7, +IBV_QP_EX_WITH_BIND_MW = 1 << 8, +IBV_QP_EX_WITH_SEND_WITH_INV = 1 << 9, +IBV_QP_EX_WITH_TSO = 1 << 10, +.in -8 +}; +.fi .PP The function @@ -119,6 +137,12 @@ if the QP is to be associated with an SRQ. .PP The attribute source_qpn is supported only on UD QP, without flow steering RX should not be possible. 
.PP +Use +.B ibv_qp_to_qp_ex() +to get the +.I ibv_qp_ex +for accessing the send ops iterator interface, when QP create attr IBV_QP_INIT_ATTR_SEND_OPS_FLAGS is used. +.PP .B ibv_destroy_qp() fails if the QP is attached to a multicast group. .PP diff --git a/libibverbs/man/ibv_wr_post.3.md b/libibverbs/man/ibv_wr_post.3.md new file mode 100644 index 0000000..1fc1acb --- /dev/null +++ b/libibverbs/man/ibv_wr_post.3.md @@ -0,0 +1,333 @@ +--- +date: 2018-11-27 +footer: libibverbs +header: "Libibverbs Programmer's Manual" +layout: page +license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md' +section: 3 +title: IBV_WR API +--- + +# NAME + +ibv_wr_abort, ibv_wr_complete, ibv_wr_start - Manage regions allowed to post work + +ibv_wr_atomic_cmp_swp, ibv_wr_atomic_fetch_add - Post remote atomic operation work requests + +ibv_wr_bind_mw, ibv_wr_local_inv - Post work requests for memory windows + +ibv_wr_rdma_read, ibv_wr_rdma_write, ibv_wr_rdma_write_imm - Post RDMA work requests + +ibv_wr_send, ibv_wr_send_imm, ibv_wr_send_inv - Post send work requests + +ibv_wr_send_tso - Post segmentation offload work requests + +ibv_wr_set_inline_data, ibv_wr_set_inline_data_list - Attach inline data to the last work request + +ibv_wr_set_sge, ibv_wr_set_sge_list - Attach data to the last work request + +ibv_wr_set_ud_addr - Attach UD addressing info to the last work request + +ibv_wr_set_xrc_srqn - Attach an XRC SRQN to the last work request + +# SYNOPSIS + +```c +#include + +void ibv_wr_abort(struct ibv_qp_ex *qp); +int ibv_wr_complete(struct ibv_qp_ex *qp); +void ibv_wr_start(struct ibv_qp_ex *qp); + +void ibv_wr_atomic_cmp_swp(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t compare, + uint64_t swap); +void ibv_wr_atomic_fetch_add(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t add); + +void ibv_wr_bind_mw(struct ibv_qp_ex *qp, struct ibv_mw *mw, uint32_t rkey, + const struct ibv_mw_bind_info *bind_info); +void ibv_wr_local_inv(struct ibv_qp_ex *qp, uint32_t invalidate_rkey); + +void ibv_wr_rdma_read(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr); +void ibv_wr_rdma_write(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr); +void ibv_wr_rdma_write_imm(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, __be32 imm_data); + +void ibv_wr_send(struct ibv_qp_ex *qp); +void ibv_wr_send_imm(struct ibv_qp_ex *qp, __be32 imm_data); +void ibv_wr_send_inv(struct ibv_qp_ex *qp, uint32_t invalidate_rkey); +void ibv_wr_send_tso(struct ibv_qp_ex *qp, void *hdr, uint16_t hdr_sz, + uint16_t mss); + +void ibv_wr_set_inline_data(struct ibv_qp_ex *qp, void *addr, size_t length); +void ibv_wr_set_inline_data_list(struct ibv_qp_ex *qp, size_t num_buf, + const struct ibv_data_buf *buf_list); +void ibv_wr_set_sge(struct ibv_qp_ex *qp, uint32_t lkey, uint64_t addr, + uint32_t length); +void ibv_wr_set_sge_list(struct ibv_qp_ex *qp, size_t num_sge, + const struct ibv_sge *sg_list); + +void ibv_wr_set_ud_addr(struct ibv_qp_ex *qp, struct ibv_ah *ah, + uint32_t remote_qpn, uint32_t remote_qkey); +void ibv_wr_set_xrc_srqn(struct ibv_qp_ex *qp, uint32_t remote_srqn); +``` + +# DESCRIPTION + +The verbs work request API (ibv_wr_\*) allows efficient posting of work to a send +queue using function calls instead of the struct based *ibv_post_send()* +scheme. This approach is designed to minimize CPU branching and locking during +the posting process. 
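As a rough comparison (a sketch only, not normative text of this page; qp,
local_buf, length, lkey, remote_addr and rkey are assumed to exist), a single
signaled RDMA write posted through the legacy interface requires building the
request structures by hand:

```c
/* Legacy struct-based post of one signaled RDMA write (fragment). */
struct ibv_sge sge = {
	.addr   = (uint64_t)(uintptr_t)local_buf,
	.length = length,
	.lkey   = lkey,
};
struct ibv_send_wr wr = {
	.wr_id      = 1,
	.sg_list    = &sge,
	.num_sge    = 1,
	.opcode     = IBV_WR_RDMA_WRITE,
	.send_flags = IBV_SEND_SIGNALED,
	.wr.rdma.remote_addr = remote_addr,
	.wr.rdma.rkey        = rkey,
};
struct ibv_send_wr *bad_wr;
int ret = ibv_post_send(qp, &wr, &bad_wr);
```

The ibv_wr_*() calls shown in the EXAMPLE section below express the same post
directly, without assembling a *struct ibv_send_wr* chain.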
+
+This API is intended to be used to access additional functionality beyond
+what is provided by *ibv_post_send()*.
+
+Batches of WRs posted through *ibv_post_send()* and batches posted through
+this API may be interleaved, but only if they are not posted within each
+other's critical regions. (In this API a critical region is formed by
+*ibv_wr_start()* and *ibv_wr_complete()*/*ibv_wr_abort()*.)
+
+# USAGE
+
+To use these APIs the QP must be created using ibv_create_qp_ex() which allows
+setting the **IBV_QP_INIT_ATTR_SEND_OPS_FLAGS** in *comp_mask*. The
+*send_ops_flags* should be set to the OR of the work request types that will
+be posted to the QP.
+
+If the QP does not support all the requested work request types then QP
+creation will fail.
+
+Posting work requests to the QP is done within the critical region formed by
+*ibv_wr_start()* and *ibv_wr_complete()*/*ibv_wr_abort()* (see CONCURRENCY below).
+
+Each work request is created by calling a WR builder function (see the table
+column WR builder below) to start creating the work request, followed by
+allowed/required setter functions described below.
+
+The WR builder and setter combination can be called multiple times to
+efficiently post multiple work requests within a single critical region.
+
+Each WR builder will use the *wr_id* member of *struct ibv_qp_ex* to set the
+value to be returned in the completion. Some operations will also use the
+*wr_flags* member to influence operation (see Flags below). These values
+should be set before invoking the WR builder function.
+
+For example, a simple send could be formed as follows:
+
+```C
+qpx->wr_id = 1;
+ibv_wr_send(qpx);
+ibv_wr_set_sge(qpx, lkey, (uintptr_t)&data, sizeof(data));
+```
+
+The section WORK REQUESTS describes the various WR builders and setters in
+detail.
+
+Posting work is completed by calling *ibv_wr_complete()* or *ibv_wr_abort()*.
+No work is submitted to the queue until *ibv_wr_complete()* returns
+success. *ibv_wr_abort()* will discard all work prepared since *ibv_wr_start()*.
+
+# WORK REQUESTS
+
+Many of the operations match the opcodes available for *ibv_post_send()*. Each
+operation has a WR builder function, a list of allowed setters, and a flag bit
+to request the operation with *send_ops_flags* in *struct
+ibv_qp_init_attr_ex* (see the EXAMPLE below).
+
+| Operation            | WR builder                | QP Type Supported                | setters  |
+|----------------------|---------------------------|----------------------------------|----------|
+| ATOMIC_CMP_AND_SWP   | ibv_wr_atomic_cmp_swp()   | RC, XRC_SEND                     | DATA, QP |
+| ATOMIC_FETCH_AND_ADD | ibv_wr_atomic_fetch_add() | RC, XRC_SEND                     | DATA, QP |
+| BIND_MW              | ibv_wr_bind_mw()          | UC, RC, XRC_SEND                 | NONE     |
+| LOCAL_INV            | ibv_wr_local_inv()        | UC, RC, XRC_SEND                 | NONE     |
+| RDMA_READ            | ibv_wr_rdma_read()        | RC, XRC_SEND                     | DATA, QP |
+| RDMA_WRITE           | ibv_wr_rdma_write()       | UC, RC, XRC_SEND                 | DATA, QP |
+| RDMA_WRITE_WITH_IMM  | ibv_wr_rdma_write_imm()   | UC, RC, XRC_SEND                 | DATA, QP |
+| SEND                 | ibv_wr_send()             | UD, UC, RC, XRC_SEND, RAW_PACKET | DATA, QP |
+| SEND_WITH_IMM        | ibv_wr_send_imm()         | UD, UC, RC, XRC_SEND             | DATA, QP |
+| SEND_WITH_INV        | ibv_wr_send_inv()         | UC, RC, XRC_SEND                 | DATA, QP |
+| TSO                  | ibv_wr_send_tso()         | UD, RAW_PACKET                   | DATA, QP |
+
+
+## Atomic operations
+
+Atomic operations are only atomic so long as all writes to memory go only
+through the same RDMA hardware. They are not atomic with respect to writes
+performed by the CPU or by other RDMA hardware in the system.
+
+*ibv_wr_atomic_cmp_swp()*
+:   If the remote 64-bit memory location specified by *rkey* and *remote_addr*
+    equals *compare* then set it to *swap*.
+
+*ibv_wr_atomic_fetch_add()*
+:   Add *add* to the 64-bit memory location specified by *rkey* and
+    *remote_addr*.
+
+## Memory Windows
+
+These operations work on type 2 memory windows (see **ibv_alloc_mw**(3)).
+
+*ibv_wr_bind_mw()*
+:   Bind the type 2 MW specified by **mw**, assign it the new **rkey** and set
+    its properties from **bind_info**.
+
+*ibv_wr_local_inv()*
+:   Invalidate the type 2 MW associated with **rkey**.
+
+## RDMA
+
+*ibv_wr_rdma_read()*
+:   Read from the remote memory location specified by *rkey* and
+    *remote_addr*. The number of bytes to read, and the local location to
+    store the data, is determined by the DATA buffers set after this call.
+
+*ibv_wr_rdma_write()*, *ibv_wr_rdma_write_imm()*
+:   Write to the remote memory location specified by *rkey* and
+    *remote_addr*. The number of bytes to write, and the local location to get
+    the data, is determined by the DATA buffers set after this call.
+
+    The _imm version causes the remote side to get an IBV_WC_RECV_RDMA_WITH_IMM
+    completion containing the 32 bits of immediate data.
+
+## Message Send
+
+*ibv_wr_send()*, *ibv_wr_send_imm()*
+:   Send a message. The number of bytes to send, and the local location to get
+    the data, is determined by the DATA buffers set after this call.
+
+    The _imm version causes the remote side to get an IBV_WC_RECV completion
+    with the IBV_WC_WITH_IMM flag set, carrying the 32 bits of immediate data.
+
+*ibv_wr_send_inv()*
+:   The data transfer is the same as for *ibv_wr_send()*, however the remote
+    side will invalidate the MR specified by *invalidate_rkey* before
+    delivering a completion.
+
+*ibv_wr_send_tso()*
+:   Produce multiple SEND messages using TCP Segmentation Offload. The SGE
+    points to a TCP stream buffer which will be segmented into MSS-sized
+    SENDs. *hdr* includes the entire set of network headers up to and
+    including the TCP header, and is prefixed to each segment.
+
+## QP Specific setters
+
+Certain QP types require each post to be accompanied by additional setters;
+these setters are mandatory for any operation listing a QP setter in the above
+table.
+
+*UD* QPs
+:   *ibv_wr_set_ud_addr()* must be called to set the destination address of
+    the work.
+
+*XRC_SEND* QPs
+:   *ibv_wr_set_xrc_srqn()* must be called to set the destination SRQN field.
+
+## DATA transfer setters
+
+For work that requires a data transfer, one of the following setters should
+be called once after the WR builder:
+
+*ibv_wr_set_sge()*
+:   Transfer data to/from a single buffer given by the lkey, addr and
+    length. This is equivalent to *ibv_wr_set_sge_list()* with a single
+    element.
+
+*ibv_wr_set_sge_list()*
+:   Transfer data to/from a list of buffers, logically concatenated
+    together. Each buffer is specified by an element in an array of *struct
+    ibv_sge*.
+
+Inline setters copy the send data during the setter call and allow the caller
+to immediately re-use the buffer. This behavior is identical to the
+IBV_SEND_INLINE flag. Generally this copy is done in a way that optimizes
+SEND latency and is suitable for small messages. The provider limits the
+amount of data it can support in a single operation. This limit is requested
+in the *max_inline_data* member of *struct ibv_qp_init_attr*. Inline data is
+valid only for SEND and RDMA_WRITE.
+
+*ibv_wr_set_inline_data()*
+:   Copy send data from a single buffer given by the addr and length.
+    This is equivalent to *ibv_wr_set_inline_data_list()* with a single
+    element.
+
+*ibv_wr_set_inline_data_list()*
+:   Copy send data from a list of buffers, logically concatenated
+    together. Each buffer is specified by an element in an array of *struct
+    ibv_data_buf*.
+
+## Flags
+
+A bit mask of flags may be specified in *wr_flags* to control the behavior of
+the work request.
+
+**IBV_SEND_FENCE**
+:   Do not start this work request until prior work has completed.
+
+**IBV_SEND_IP_CSUM**
+:   Offload the IPv4 and TCP/UDP checksum calculation.
+
+**IBV_SEND_SIGNALED**
+:   A completion will be generated in the completion queue for the operation.
+
+**IBV_SEND_SOLICITED**
+:   Set the solicited bit in the RDMA packet. This informs the other side to
+    generate a completion event upon receiving the RDMA operation.
+
+# CONCURRENCY
+
+The provider will provide locking to ensure that *ibv_wr_start()* and
+*ibv_wr_complete()*/*ibv_wr_abort()* form a per-QP critical section where no
+other threads can enter.
+
+If an *ibv_td* is provided during QP creation then no locking will be performed
+and it is up to the caller to ensure that only one thread can be within the
+critical region at a time.
+
+# RETURN VALUE
+
+Applications should use this API in a way that does not create failures. The
+individual APIs do not return a failure indication to avoid branching.
+
+If a failure is detected during operation, for instance due to an invalid
+argument, then *ibv_wr_complete()* will return failure and the entire posting
+will be aborted.
+
+# EXAMPLE
+
+```c
+/* create an RC QP and specify the required send opcodes */
+qp_init_attr_ex.qp_type = IBV_QPT_RC;
+qp_init_attr_ex.comp_mask |= IBV_QP_INIT_ATTR_SEND_OPS_FLAGS;
+qp_init_attr_ex.send_ops_flags |= IBV_QP_EX_WITH_RDMA_WRITE;
+qp_init_attr_ex.send_ops_flags |= IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM;
+
+struct ibv_qp *qp = ibv_create_qp_ex(ctx, &qp_init_attr_ex);
+struct ibv_qp_ex *qpx = ibv_qp_to_qp_ex(qp);
+
+ibv_wr_start(qpx);
+
+/* create 1st WRITE WR entry */
+qpx->wr_id = my_wr_id_1;
+ibv_wr_rdma_write(qpx, rkey, remote_addr_1);
+ibv_wr_set_sge(qpx, lkey, local_addr_1, length_1);
+
+/* create 2nd WRITE_WITH_IMM WR entry */
+qpx->wr_id = my_wr_id_2;
+qpx->wr_flags = IBV_SEND_SIGNALED;
+ibv_wr_rdma_write_imm(qpx, rkey, remote_addr_2, htonl(0x1234));
+ibv_wr_set_sge(qpx, lkey, local_addr_2, length_2);
+
+/* Begin processing WRs */
+ret = ibv_wr_complete(qpx);
+```
+
+# SEE ALSO
+
+**ibv_post_send**(3), **ibv_create_qp_ex**(3).
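As an additional illustration beyond the EXAMPLE above (a hedged sketch only;
qpx, ah, remote_qpn, remote_qkey, lkey, local_addr, length and my_wr_id are
assumed to be set up by the application), a UD QP created with
IBV_QP_EX_WITH_SEND also needs the QP setter listed in the WORK REQUESTS table:

```c
ibv_wr_start(qpx);

qpx->wr_id = my_wr_id;
qpx->wr_flags = IBV_SEND_SIGNALED;
ibv_wr_send(qpx);
ibv_wr_set_ud_addr(qpx, ah, remote_qpn, remote_qkey); /* QP setter (UD) */
ibv_wr_set_sge(qpx, lkey, local_addr, length);        /* DATA setter    */

ret = ibv_wr_complete(qpx);
```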
+ +# AUTHOR + +Jason Gunthorpe +Guy Levi diff --git a/libibverbs/verbs.c b/libibverbs/verbs.c index 188bccb..1766b9f 100644 --- a/libibverbs/verbs.c +++ b/libibverbs/verbs.c @@ -589,6 +589,15 @@ LATEST_SYMVER_FUNC(ibv_create_qp, 1_1, "IBVERBS_1.1", return qp; } +struct ibv_qp_ex *ibv_qp_to_qp_ex(struct ibv_qp *qp) +{ + struct verbs_qp *vqp = (struct verbs_qp *)qp; + + if (vqp->comp_mask & VERBS_QP_EX) + return &vqp->qp_ex; + return NULL; +} + LATEST_SYMVER_FUNC(ibv_query_qp, 1_1, "IBVERBS_1.1", int, struct ibv_qp *qp, struct ibv_qp_attr *attr, diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h index 243cd8c..a2bae25 100644 --- a/libibverbs/verbs.h +++ b/libibverbs/verbs.h @@ -889,7 +889,7 @@ enum ibv_qp_init_attr_mask { IBV_QP_INIT_ATTR_MAX_TSO_HEADER = 1 << 3, IBV_QP_INIT_ATTR_IND_TABLE = 1 << 4, IBV_QP_INIT_ATTR_RX_HASH = 1 << 5, - IBV_QP_INIT_ATTR_RESERVED = 1 << 6 + IBV_QP_INIT_ATTR_SEND_OPS_FLAGS = 1 << 6, }; enum ibv_qp_create_flags { @@ -900,6 +900,20 @@ enum ibv_qp_create_flags { IBV_QP_CREATE_PCI_WRITE_END_PADDING = 1 << 11, }; +enum ibv_qp_create_send_ops_flags { + IBV_QP_EX_WITH_RDMA_WRITE = 1 << 0, + IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM = 1 << 1, + IBV_QP_EX_WITH_SEND = 1 << 2, + IBV_QP_EX_WITH_SEND_WITH_IMM = 1 << 3, + IBV_QP_EX_WITH_RDMA_READ = 1 << 4, + IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP = 1 << 5, + IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD = 1 << 6, + IBV_QP_EX_WITH_LOCAL_INV = 1 << 7, + IBV_QP_EX_WITH_BIND_MW = 1 << 8, + IBV_QP_EX_WITH_SEND_WITH_INV = 1 << 9, + IBV_QP_EX_WITH_TSO = 1 << 10, +}; + struct ibv_rx_hash_conf { /* enum ibv_rx_hash_function_flags */ uint8_t rx_hash_function; @@ -926,6 +940,8 @@ struct ibv_qp_init_attr_ex { struct ibv_rwq_ind_table *rwq_ind_tbl; struct ibv_rx_hash_conf rx_hash_conf; uint32_t source_qpn; + /* See enum ibv_qp_create_send_ops_flags */ + uint64_t send_ops_flags; }; enum ibv_qp_open_attr_mask { @@ -1051,6 +1067,11 @@ enum ibv_send_flags { IBV_SEND_IP_CSUM = 1 << 4 }; +struct ibv_data_buf { + void *addr; + size_t length; +}; + struct ibv_sge { uint64_t addr; uint32_t length; @@ -1206,6 +1227,174 @@ struct ibv_qp { uint32_t events_completed; }; +struct ibv_qp_ex { + struct ibv_qp qp_base; + uint64_t comp_mask; + + uint64_t wr_id; + /* bitmask from enum ibv_send_flags */ + unsigned int wr_flags; + + void (*wr_atomic_cmp_swp)(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t compare, + uint64_t swap); + void (*wr_atomic_fetch_add)(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t add); + void (*wr_bind_mw)(struct ibv_qp_ex *qp, struct ibv_mw *mw, + uint32_t rkey, + const struct ibv_mw_bind_info *bind_info); + void (*wr_local_inv)(struct ibv_qp_ex *qp, uint32_t invalidate_rkey); + void (*wr_rdma_read)(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr); + void (*wr_rdma_write)(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr); + void (*wr_rdma_write_imm)(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, __be32 imm_data); + + void (*wr_send)(struct ibv_qp_ex *qp); + void (*wr_send_imm)(struct ibv_qp_ex *qp, __be32 imm_data); + void (*wr_send_inv)(struct ibv_qp_ex *qp, uint32_t invalidate_rkey); + void (*wr_send_tso)(struct ibv_qp_ex *qp, void *hdr, uint16_t hdr_sz, + uint16_t mss); + + void (*wr_set_ud_addr)(struct ibv_qp_ex *qp, struct ibv_ah *ah, + uint32_t remote_qpn, uint32_t remote_qkey); + void (*wr_set_xrc_srqn)(struct ibv_qp_ex *qp, uint32_t remote_srqn); + + void (*wr_set_inline_data)(struct ibv_qp_ex *qp, void *addr, + size_t length); + void 
(*wr_set_inline_data_list)(struct ibv_qp_ex *qp, size_t num_buf, + const struct ibv_data_buf *buf_list); + void (*wr_set_sge)(struct ibv_qp_ex *qp, uint32_t lkey, uint64_t addr, + uint32_t length); + void (*wr_set_sge_list)(struct ibv_qp_ex *qp, size_t num_sge, + const struct ibv_sge *sg_list); + + void (*wr_start)(struct ibv_qp_ex *qp); + int (*wr_complete)(struct ibv_qp_ex *qp); + void (*wr_abort)(struct ibv_qp_ex *qp); +}; + +struct ibv_qp_ex *ibv_qp_to_qp_ex(struct ibv_qp *qp); + +static inline void ibv_wr_atomic_cmp_swp(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t compare, + uint64_t swap) +{ + qp->wr_atomic_cmp_swp(qp, rkey, remote_addr, compare, swap); +} + +static inline void ibv_wr_atomic_fetch_add(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, uint64_t add) +{ + qp->wr_atomic_fetch_add(qp, rkey, remote_addr, add); +} + +static inline void ibv_wr_bind_mw(struct ibv_qp_ex *qp, struct ibv_mw *mw, + uint32_t rkey, + const struct ibv_mw_bind_info *bind_info) +{ + qp->wr_bind_mw(qp, mw, rkey, bind_info); +} + +static inline void ibv_wr_local_inv(struct ibv_qp_ex *qp, + uint32_t invalidate_rkey) +{ + qp->wr_local_inv(qp, invalidate_rkey); +} + +static inline void ibv_wr_rdma_read(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr) +{ + qp->wr_rdma_read(qp, rkey, remote_addr); +} + +static inline void ibv_wr_rdma_write(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr) +{ + qp->wr_rdma_write(qp, rkey, remote_addr); +} + +static inline void ibv_wr_rdma_write_imm(struct ibv_qp_ex *qp, uint32_t rkey, + uint64_t remote_addr, __be32 imm_data) +{ + qp->wr_rdma_write_imm(qp, rkey, remote_addr, imm_data); +} + +static inline void ibv_wr_send(struct ibv_qp_ex *qp) +{ + qp->wr_send(qp); +} + +static inline void ibv_wr_send_imm(struct ibv_qp_ex *qp, __be32 imm_data) +{ + qp->wr_send_imm(qp, imm_data); +} + +static inline void ibv_wr_send_inv(struct ibv_qp_ex *qp, + uint32_t invalidate_rkey) +{ + qp->wr_send_inv(qp, invalidate_rkey); +} + +static inline void ibv_wr_send_tso(struct ibv_qp_ex *qp, void *hdr, + uint16_t hdr_sz, uint16_t mss) +{ + qp->wr_send_tso(qp, hdr, hdr_sz, mss); +} + +static inline void ibv_wr_set_ud_addr(struct ibv_qp_ex *qp, struct ibv_ah *ah, + uint32_t remote_qpn, uint32_t remote_qkey) +{ + qp->wr_set_ud_addr(qp, ah, remote_qpn, remote_qkey); +} + +static inline void ibv_wr_set_xrc_srqn(struct ibv_qp_ex *qp, + uint32_t remote_srqn) +{ + qp->wr_set_xrc_srqn(qp, remote_srqn); +} + +static inline void ibv_wr_set_inline_data(struct ibv_qp_ex *qp, void *addr, + size_t length) +{ + qp->wr_set_inline_data(qp, addr, length); +} + +static inline void ibv_wr_set_inline_data_list(struct ibv_qp_ex *qp, + size_t num_buf, + const struct ibv_data_buf *buf_list) +{ + qp->wr_set_inline_data_list(qp, num_buf, buf_list); +} + +static inline void ibv_wr_set_sge(struct ibv_qp_ex *qp, uint32_t lkey, + uint64_t addr, uint32_t length) +{ + qp->wr_set_sge(qp, lkey, addr, length); +} + +static inline void ibv_wr_set_sge_list(struct ibv_qp_ex *qp, size_t num_sge, + const struct ibv_sge *sg_list) +{ + qp->wr_set_sge_list(qp, num_sge, sg_list); +} + +static inline void ibv_wr_start(struct ibv_qp_ex *qp) +{ + qp->wr_start(qp); +} + +static inline int ibv_wr_complete(struct ibv_qp_ex *qp) +{ + return qp->wr_complete(qp); +} + +static inline void ibv_wr_abort(struct ibv_qp_ex *qp) +{ + qp->wr_abort(qp); +} + struct ibv_comp_channel { struct ibv_context *context; int fd; From patchwork Mon Mar 18 12:24:15 2019 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 10857545
From: Yishai Hadas
To: linux-rdma@vger.kernel.org
Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com,
 jgg@mellanox.com, majd@mellanox.com
Subject: [PATCH rdma-core 2/6] mlx5: Support new post send API
Date: Mon, 18 Mar 2019 14:24:15 +0200
Message-Id: <1552911859-4073-3-git-send-email-yishaih@mellanox.com>
In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com>
References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Guy Levi

Add support for the new post send API that was previously introduced by
libibverbs. It covers all QP types and all operations, except for the raw
packet transport type, which will be supported in an upcoming patch. It
supports the SGE data setters; the inline data setters will be added in an
upcoming patch.

Supporting this API lets the send path be extended gracefully with new
mlx5-specific (or generic) operations and features, keeps it optimized, and
has no impact on the legacy path.
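The wiring follows the model libibverbs defines: the provider fills the
function pointers in 'struct ibv_qp_ex' at QP creation time, and the
ibv_wr_*() inline wrappers call straight through them. A rough, hypothetical
sketch of that pattern (not the actual mlx5 code; my_qp, my_wr_* and
my_fill_wr_pfns are invented names):

```c
#include <errno.h>
#include <infiniband/driver.h>

struct my_qp {
	struct verbs_qp verbs_qp;	/* ibv_qp_ex lives inside verbs_qp */
	/* ... provider-private state ... */
};

static void my_wr_start(struct ibv_qp_ex *qpx) { /* lock SQ, save rollback state */ }
static void my_wr_send(struct ibv_qp_ex *qpx) { /* build a SEND WQE from qpx->wr_id/wr_flags */ }
static int my_wr_complete(struct ibv_qp_ex *qpx) { /* ring doorbell or roll back */ return 0; }
static void my_wr_abort(struct ibv_qp_ex *qpx) { /* restore rollback state, unlock SQ */ }

/* Called once at QP creation when IBV_QP_INIT_ATTR_SEND_OPS_FLAGS is set */
static int my_fill_wr_pfns(struct my_qp *qp, const struct ibv_qp_init_attr_ex *attr)
{
	struct ibv_qp_ex *qpx = &qp->verbs_qp.qp_ex;

	if (attr->send_ops_flags & ~IBV_QP_EX_WITH_SEND)
		return EOPNOTSUPP;	/* reject unsupported opcodes up front */

	qpx->wr_start = my_wr_start;
	qpx->wr_send = my_wr_send;
	qpx->wr_complete = my_wr_complete;
	qpx->wr_abort = my_wr_abort;
	return 0;
}
```

mlx5_qp_fill_wr_pfns() in this patch follows the same pattern per QP type,
returning EOPNOTSUPP for send_ops_flags combinations the transport cannot
support.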
Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- providers/mlx5/mlx5.h | 16 +- providers/mlx5/qp.c | 653 +++++++++++++++++++++++++++++++++++++++++++++++-- providers/mlx5/verbs.c | 160 ++++++++---- 3 files changed, 768 insertions(+), 61 deletions(-) diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h index 9129c0f..71220a2 100644 --- a/providers/mlx5/mlx5.h +++ b/providers/mlx5/mlx5.h @@ -505,7 +505,6 @@ struct mlx5_qp { struct verbs_qp verbs_qp; struct ibv_qp *ibv_qp; struct mlx5_buf buf; - void *sq_start; int max_inline_data; int buf_size; /* For Raw Packet QP, use different buffers for the SQ and RQ */ @@ -513,8 +512,20 @@ struct mlx5_qp { int sq_buf_size; struct mlx5_bf *bf; + /* Start of new post send API specific fields */ + uint8_t cur_setters_cnt; + uint8_t fm_cache_rb; + int err; + int nreq; + uint32_t cur_size; + uint32_t cur_post_rb; + void *cur_data; + struct mlx5_wqe_ctrl_seg *cur_ctrl; + /* End of new post send API specific fields */ + uint8_t fm_cache; uint8_t sq_signal_bits; + void *sq_start; struct mlx5_wq sq; __be32 *db; @@ -916,6 +927,9 @@ int mlx5_advise_mr(struct ibv_pd *pd, uint32_t flags, struct ibv_sge *sg_list, uint32_t num_sges); +int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, + const struct ibv_qp_init_attr_ex *attr); + static inline void *mlx5_find_uidx(struct mlx5_context *ctx, uint32_t uidx) { int tind = uidx >> MLX5_UIDX_TABLE_SHIFT; diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c index 4054008..c933ee6 100644 --- a/providers/mlx5/qp.c +++ b/providers/mlx5/qp.c @@ -224,25 +224,47 @@ static void set_tm_seg(struct mlx5_wqe_tm_seg *tmseg, int op, tmseg->append_mask = htobe64(wr->tm.add.mask); } -static void set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, - enum ibv_wr_opcode opcode, - uint64_t swap, - uint64_t compare_add) +static inline void _set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, + enum ibv_wr_opcode opcode, + uint64_t swap, + uint64_t compare_add) + ALWAYS_INLINE; +static inline void _set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, + enum ibv_wr_opcode opcode, + uint64_t swap, + uint64_t compare_add) { if (opcode == IBV_WR_ATOMIC_CMP_AND_SWP) { aseg->swap_add = htobe64(swap); - aseg->compare = htobe64(compare_add); + aseg->compare = htobe64(compare_add); } else { aseg->swap_add = htobe64(compare_add); } } +static void set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, + enum ibv_wr_opcode opcode, + uint64_t swap, + uint64_t compare_add) +{ + _set_atomic_seg(aseg, opcode, swap, compare_add); +} + +static inline void _set_datagram_seg(struct mlx5_wqe_datagram_seg *dseg, + struct mlx5_wqe_av *av, + uint32_t remote_qpn, + uint32_t remote_qkey) +{ + memcpy(&dseg->av, av, sizeof(dseg->av)); + dseg->av.dqp_dct = htobe32(remote_qpn | MLX5_EXTENDED_UD_AV); + dseg->av.key.qkey.qkey = htobe32(remote_qkey); +} + static void set_datagram_seg(struct mlx5_wqe_datagram_seg *dseg, struct ibv_send_wr *wr) { - memcpy(&dseg->av, &to_mah(wr->wr.ud.ah)->av, sizeof dseg->av); - dseg->av.dqp_dct = htobe32(wr->wr.ud.remote_qpn | MLX5_EXTENDED_UD_AV); - dseg->av.key.qkey.qkey = htobe32(wr->wr.ud.remote_qkey); + _set_datagram_seg(dseg, &to_mah(wr->wr.ud.ah)->av, wr->wr.ud.remote_qpn, + wr->wr.ud.remote_qkey); } static void set_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg, @@ -453,7 +475,8 @@ static inline __be16 get_klm_octo(int nentries) } static void set_umr_data_seg(struct mlx5_qp *qp, enum ibv_mw_type type, - int32_t rkey, struct ibv_mw_bind_info *bind_info, + int32_t rkey, + const struct ibv_mw_bind_info *bind_info, uint32_t qpn, 
void **seg, int *size) { union { @@ -473,7 +496,8 @@ static void set_umr_data_seg(struct mlx5_qp *qp, enum ibv_mw_type type, } static void set_umr_mkey_seg(struct mlx5_qp *qp, enum ibv_mw_type type, - int32_t rkey, struct ibv_mw_bind_info *bind_info, + int32_t rkey, + const struct ibv_mw_bind_info *bind_info, uint32_t qpn, void **seg, int *size) { struct mlx5_wqe_mkey_context_seg *mkey = *seg; @@ -511,7 +535,8 @@ static void set_umr_mkey_seg(struct mlx5_qp *qp, enum ibv_mw_type type, } static inline void set_umr_control_seg(struct mlx5_qp *qp, enum ibv_mw_type type, - int32_t rkey, struct ibv_mw_bind_info *bind_info, + int32_t rkey, + const struct ibv_mw_bind_info *bind_info, uint32_t qpn, void **seg, int *size) { struct mlx5_wqe_umr_ctrl_seg *ctrl = *seg; @@ -548,7 +573,8 @@ static inline void set_umr_control_seg(struct mlx5_qp *qp, enum ibv_mw_type type } static inline int set_bind_wr(struct mlx5_qp *qp, enum ibv_mw_type type, - int32_t rkey, struct ibv_mw_bind_info *bind_info, + int32_t rkey, + const struct ibv_mw_bind_info *bind_info, uint32_t qpn, void **seg, int *size) { void *qend = qp->sq.qend; @@ -701,8 +727,7 @@ static inline int mlx5_post_send_underlay(struct mlx5_qp *qp, struct ibv_send_wr } static inline void post_send_db(struct mlx5_qp *qp, struct mlx5_bf *bf, - int nreq, int inl, int size, - uint8_t next_fence, void *ctrl) + int nreq, int inl, int size, void *ctrl) { struct mlx5_context *ctx; @@ -710,7 +735,6 @@ static inline void post_send_db(struct mlx5_qp *qp, struct mlx5_bf *bf, return; qp->sq.head += nreq; - qp->fm_cache = next_fence; /* * Make sure that descriptors are written before @@ -1087,7 +1111,8 @@ static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, } out: - post_send_db(qp, bf, nreq, inl, size, next_fence, ctrl); + qp->fm_cache = next_fence; + post_send_db(qp, bf, nreq, inl, size, ctrl); mlx5_spin_unlock(&qp->sq.lock); @@ -1115,6 +1140,599 @@ int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, return _mlx5_post_send(ibqp, wr, bad_wr); } +enum { + WQE_REQ_SETTERS_UD_XRC = 2, +}; + +static void mlx5_send_wr_start(struct ibv_qp_ex *ibqp) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + mlx5_spin_lock(&mqp->sq.lock); + + mqp->cur_post_rb = mqp->sq.cur_post; + mqp->fm_cache_rb = mqp->fm_cache; + mqp->err = 0; + mqp->nreq = 0; +} + +static int mlx5_send_wr_complete(struct ibv_qp_ex *ibqp) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + int err = mqp->err; + + if (unlikely(err)) { + /* Rolling back */ + mqp->sq.cur_post = mqp->cur_post_rb; + mqp->fm_cache = mqp->fm_cache_rb; + goto out; + } + + post_send_db(mqp, mqp->bf, mqp->nreq, 0, mqp->cur_size, + mqp->cur_ctrl); + +out: + mlx5_spin_unlock(&mqp->sq.lock); + + return err; +} + +static void mlx5_send_wr_abort(struct ibv_qp_ex *ibqp) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + /* Rolling back */ + mqp->sq.cur_post = mqp->cur_post_rb; + mqp->fm_cache = mqp->fm_cache_rb; + + mlx5_spin_unlock(&mqp->sq.lock); +} + +static inline void _common_wqe_init(struct ibv_qp_ex *ibqp, + enum ibv_wr_opcode ib_op) + ALWAYS_INLINE; +static inline void _common_wqe_init(struct ibv_qp_ex *ibqp, + enum ibv_wr_opcode ib_op) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_ctrl_seg *ctrl; + uint8_t fence; + uint32_t idx; + + if (unlikely(mlx5_wq_overflow(&mqp->sq, mqp->nreq, to_mcq(ibqp->qp_base.send_cq)))) { + FILE *fp = to_mctx(((struct ibv_qp *)ibqp)->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "Work queue overflow\n"); + 
+ if (!mqp->err) + mqp->err = ENOMEM; + + return; + } + + idx = mqp->sq.cur_post & (mqp->sq.wqe_cnt - 1); + mqp->sq.wrid[idx] = ibqp->wr_id; + mqp->sq.wqe_head[idx] = mqp->sq.head + mqp->nreq; + if (ib_op == IBV_WR_BIND_MW) + mqp->sq.wr_data[idx] = IBV_WC_BIND_MW; + else if (ib_op == IBV_WR_LOCAL_INV) + mqp->sq.wr_data[idx] = IBV_WC_LOCAL_INV; + + ctrl = mlx5_get_send_wqe(mqp, idx); + *(uint32_t *)((void *)ctrl + 8) = 0; + + fence = (ibqp->wr_flags & IBV_SEND_FENCE) ? MLX5_WQE_CTRL_FENCE : + mqp->fm_cache; + mqp->fm_cache = 0; + + ctrl->fm_ce_se = + mqp->sq_signal_bits | fence | + (ibqp->wr_flags & IBV_SEND_SIGNALED ? + MLX5_WQE_CTRL_CQ_UPDATE : 0) | + (ibqp->wr_flags & IBV_SEND_SOLICITED ? + MLX5_WQE_CTRL_SOLICITED : 0); + + ctrl->opmod_idx_opcode = htobe32(((mqp->sq.cur_post & 0xffff) << 8) | + mlx5_ib_opcode[ib_op]); + + mqp->cur_ctrl = ctrl; +} + +static inline void _common_wqe_finilize(struct mlx5_qp *mqp) +{ + mqp->cur_ctrl->qpn_ds = htobe32(mqp->cur_size | (mqp->ibv_qp->qp_num << 8)); + + if (unlikely(mqp->wq_sig)) + mqp->cur_ctrl->signature = wq_sig(mqp->cur_ctrl); + +#ifdef MLX5_DEBUG + if (mlx5_debug_mask & MLX5_DBG_QP_SEND) { + int idx = mqp->sq.cur_post & (mqp->sq.wqe_cnt - 1); + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + dump_wqe(fp, idx, mqp->cur_size, mqp); + } +#endif + + mqp->sq.cur_post += DIV_ROUND_UP(mqp->cur_size, 4); +} + +static inline void _mlx5_send_wr_send(struct ibv_qp_ex *ibqp, + enum ibv_wr_opcode ib_op) + ALWAYS_INLINE; +static inline void _mlx5_send_wr_send(struct ibv_qp_ex *ibqp, + enum ibv_wr_opcode ib_op) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + size_t transport_seg_sz = 0; + + _common_wqe_init(ibqp, ib_op); + + if (ibqp->qp_base.qp_type == IBV_QPT_UD) + transport_seg_sz = sizeof(struct mlx5_wqe_datagram_seg); + else if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) + transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); + + mqp->cur_data = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + + transport_seg_sz; + /* In UD, cur_data may overrun the SQ */ + if (unlikely(mqp->cur_data == mqp->sq.qend)) + mqp->cur_data = mlx5_get_send_wqe(mqp, 0); + + mqp->cur_size = (sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz) / 16; + mqp->nreq++; + + /* Relevant just for WQE construction which requires more than 1 setter */ + mqp->cur_setters_cnt = 0; +} + +static void mlx5_send_wr_send_other(struct ibv_qp_ex *ibqp) +{ + _mlx5_send_wr_send(ibqp, IBV_WR_SEND); +} + +static void mlx5_send_wr_send_imm(struct ibv_qp_ex *ibqp, __be32 imm_data) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_send(ibqp, IBV_WR_SEND_WITH_IMM); + + mqp->cur_ctrl->imm = imm_data; +} + +static void mlx5_send_wr_send_inv(struct ibv_qp_ex *ibqp, + uint32_t invalidate_rkey) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_send(ibqp, IBV_WR_SEND_WITH_INV); + + mqp->cur_ctrl->imm = htobe32(invalidate_rkey); +} + +static inline void _mlx5_send_wr_rdma(struct ibv_qp_ex *ibqp, + uint32_t rkey, + uint64_t remote_addr, + enum ibv_wr_opcode ib_op) + ALWAYS_INLINE; +static inline void _mlx5_send_wr_rdma(struct ibv_qp_ex *ibqp, + uint32_t rkey, + uint64_t remote_addr, + enum ibv_wr_opcode ib_op) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + size_t transport_seg_sz = 0; + void *raddr_seg; + + _common_wqe_init(ibqp, ib_op); + + if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) + transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); + + raddr_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + + 
transport_seg_sz; + + set_raddr_seg(raddr_seg, remote_addr, rkey); + + mqp->cur_data = raddr_seg + sizeof(struct mlx5_wqe_raddr_seg); + mqp->cur_size = (sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz + + sizeof(struct mlx5_wqe_raddr_seg)) / 16; + mqp->nreq++; + + /* Relevant just for WQE construction which requires more than 1 setter */ + mqp->cur_setters_cnt = 0; +} + +static void mlx5_send_wr_rdma_write(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr) +{ + _mlx5_send_wr_rdma(ibqp, rkey, remote_addr, IBV_WR_RDMA_WRITE); +} + +static void mlx5_send_wr_rdma_write_imm(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr, __be32 imm_data) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_rdma(ibqp, rkey, remote_addr, IBV_WR_RDMA_WRITE_WITH_IMM); + + mqp->cur_ctrl->imm = imm_data; +} + +static void mlx5_send_wr_rdma_read(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr) +{ + _mlx5_send_wr_rdma(ibqp, rkey, remote_addr, IBV_WR_RDMA_READ); +} + +static inline void _mlx5_send_wr_atomic(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr, + uint64_t compare_add, + uint64_t swap, enum ibv_wr_opcode ib_op) + ALWAYS_INLINE; +static inline void _mlx5_send_wr_atomic(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr, + uint64_t compare_add, + uint64_t swap, enum ibv_wr_opcode ib_op) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + size_t transport_seg_sz = 0; + void *raddr_seg; + + _common_wqe_init(ibqp, ib_op); + + if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) + transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); + + raddr_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + + transport_seg_sz; + + set_raddr_seg(raddr_seg, remote_addr, rkey); + + _set_atomic_seg((struct mlx5_wqe_atomic_seg *)(raddr_seg + sizeof(struct mlx5_wqe_raddr_seg)), + ib_op, swap, compare_add); + + mqp->cur_data = raddr_seg + sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg); + /* In XRC, cur_data may overrun the SQ */ + if (unlikely(mqp->cur_data == mqp->sq.qend)) + mqp->cur_data = mlx5_get_send_wqe(mqp, 0); + + mqp->cur_size = (sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz + + sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg)) / 16; + mqp->nreq++; + + /* Relevant just for WQE construction which requires more than 1 setter */ + mqp->cur_setters_cnt = 0; +} + +static void mlx5_send_wr_atomic_cmp_swp(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr, uint64_t compare, + uint64_t swap) +{ + _mlx5_send_wr_atomic(ibqp, rkey, remote_addr, compare, swap, + IBV_WR_ATOMIC_CMP_AND_SWP); +} + +static void mlx5_send_wr_atomic_fetch_add(struct ibv_qp_ex *ibqp, uint32_t rkey, + uint64_t remote_addr, uint64_t add) +{ + _mlx5_send_wr_atomic(ibqp, rkey, remote_addr, add, 0, + IBV_WR_ATOMIC_FETCH_AND_ADD); +} + +static inline void _build_umr_wqe(struct ibv_qp_ex *ibqp, uint32_t orig_rkey, + uint32_t new_rkey, + const struct ibv_mw_bind_info *bind_info, + enum ibv_wr_opcode ib_op) + ALWAYS_INLINE; +static inline void _build_umr_wqe(struct ibv_qp_ex *ibqp, uint32_t orig_rkey, + uint32_t new_rkey, + const struct ibv_mw_bind_info *bind_info, + enum ibv_wr_opcode ib_op) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + void *umr_seg; + int err = 0; + int size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + + _common_wqe_init(ibqp, ib_op); + + mqp->cur_ctrl->imm = htobe32(orig_rkey); + + umr_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + err = 
set_bind_wr(mqp, IBV_MW_TYPE_2, new_rkey, bind_info, + ((struct ibv_qp *)ibqp)->qp_num, &umr_seg, &size); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + return; + } + + mqp->cur_size = size; + mqp->fm_cache = MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE; + mqp->nreq++; + _common_wqe_finilize(mqp); +} + +static void mlx5_send_wr_bind_mw(struct ibv_qp_ex *ibqp, struct ibv_mw *mw, + uint32_t rkey, + const struct ibv_mw_bind_info *bind_info) +{ + _build_umr_wqe(ibqp, mw->rkey, rkey, bind_info, IBV_WR_BIND_MW); +} + +static void mlx5_send_wr_local_inv(struct ibv_qp_ex *ibqp, + uint32_t invalidate_rkey) +{ + const struct ibv_mw_bind_info bind_info = {}; + + _build_umr_wqe(ibqp, invalidate_rkey, 0, &bind_info, IBV_WR_LOCAL_INV); +} + +static inline void +_mlx5_send_wr_set_sge(struct mlx5_qp *mqp, uint32_t lkey, uint64_t addr, + uint32_t length) +{ + struct mlx5_wqe_data_seg *dseg; + + if (unlikely(!length)) + return; + + dseg = mqp->cur_data; + dseg->byte_count = htobe32(length); + dseg->lkey = htobe32(lkey); + dseg->addr = htobe64(addr); + mqp->cur_size += sizeof(*dseg) / 16; +} + +static void +mlx5_send_wr_set_sge_rc_uc(struct ibv_qp_ex *ibqp, uint32_t lkey, + uint64_t addr, uint32_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_sge(mqp, lkey, addr, length); + _common_wqe_finilize(mqp); +} + +static void +mlx5_send_wr_set_sge_ud_xrc(struct ibv_qp_ex *ibqp, uint32_t lkey, + uint64_t addr, uint32_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_sge(mqp, lkey, addr, length); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +static inline void +_mlx5_send_wr_set_sge_list(struct mlx5_qp *mqp, size_t num_sge, + const struct ibv_sge *sg_list) +{ + struct mlx5_wqe_data_seg *dseg = mqp->cur_data; + size_t i; + + if (unlikely(num_sge > mqp->sq.max_gs)) { + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "Num SGEs %zu exceeds the maximum (%d)\n", + num_sge, mqp->sq.max_gs); + + if (!mqp->err) + mqp->err = ENOMEM; + + return; + } + + for (i = 0; i < num_sge; i++) { + if (unlikely(dseg == mqp->sq.qend)) + dseg = mlx5_get_send_wqe(mqp, 0); + + if (unlikely(!sg_list[i].length)) + continue; + + dseg->byte_count = htobe32(sg_list[i].length); + dseg->lkey = htobe32(sg_list[i].lkey); + dseg->addr = htobe64(sg_list[i].addr); + dseg++; + mqp->cur_size += (sizeof(*dseg) / 16); + } +} + +static void +mlx5_send_wr_set_sge_list_rc_uc(struct ibv_qp_ex *ibqp, size_t num_sge, + const struct ibv_sge *sg_list) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_sge_list(mqp, num_sge, sg_list); + _common_wqe_finilize(mqp); +} + +static void +mlx5_send_wr_set_sge_list_ud_xrc(struct ibv_qp_ex *ibqp, size_t num_sge, + const struct ibv_sge *sg_list) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_sge_list(mqp, num_sge, sg_list); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +static void +mlx5_send_wr_set_ud_addr(struct ibv_qp_ex *ibqp, struct ibv_ah *ah, + uint32_t remote_qpn, uint32_t remote_qkey) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_datagram_seg *dseg = + (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + struct mlx5_ah *mah = to_mah(ah); + + _set_datagram_seg(dseg, &mah->av, remote_qpn, remote_qkey); + + if (mqp->cur_setters_cnt == 
WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +static void +mlx5_send_wr_set_xrc_srqn(struct ibv_qp_ex *ibqp, uint32_t remote_srqn) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_xrc_seg *xrc_seg = + (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + + xrc_seg->xrc_srqn = htobe32(remote_srqn); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +enum { + MLX5_SUPPORTED_SEND_OPS_FLAGS_RC = + IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_SEND_WITH_INV | + IBV_QP_EX_WITH_SEND_WITH_IMM | + IBV_QP_EX_WITH_RDMA_WRITE | + IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM | + IBV_QP_EX_WITH_RDMA_READ | + IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP | + IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD | + IBV_QP_EX_WITH_LOCAL_INV | + IBV_QP_EX_WITH_BIND_MW, + MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC = + MLX5_SUPPORTED_SEND_OPS_FLAGS_RC, + MLX5_SUPPORTED_SEND_OPS_FLAGS_UD = + IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_SEND_WITH_IMM, + MLX5_SUPPORTED_SEND_OPS_FLAGS_UC = + IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_SEND_WITH_INV | + IBV_QP_EX_WITH_SEND_WITH_IMM | + IBV_QP_EX_WITH_RDMA_WRITE | + IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM | + IBV_QP_EX_WITH_LOCAL_INV | + IBV_QP_EX_WITH_BIND_MW, +}; + +static void fill_wr_builders_rc_xrc(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_send = mlx5_send_wr_send_other; + ibqp->wr_send_imm = mlx5_send_wr_send_imm; + ibqp->wr_send_inv = mlx5_send_wr_send_inv; + ibqp->wr_rdma_write = mlx5_send_wr_rdma_write; + ibqp->wr_rdma_write_imm = mlx5_send_wr_rdma_write_imm; + ibqp->wr_rdma_read = mlx5_send_wr_rdma_read; + ibqp->wr_atomic_cmp_swp = mlx5_send_wr_atomic_cmp_swp; + ibqp->wr_atomic_fetch_add = mlx5_send_wr_atomic_fetch_add; + ibqp->wr_bind_mw = mlx5_send_wr_bind_mw; + ibqp->wr_local_inv = mlx5_send_wr_local_inv; +} + +static void fill_wr_builders_uc(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_send = mlx5_send_wr_send_other; + ibqp->wr_send_imm = mlx5_send_wr_send_imm; + ibqp->wr_send_inv = mlx5_send_wr_send_inv; + ibqp->wr_rdma_write = mlx5_send_wr_rdma_write; + ibqp->wr_rdma_write_imm = mlx5_send_wr_rdma_write_imm; + ibqp->wr_bind_mw = mlx5_send_wr_bind_mw; + ibqp->wr_local_inv = mlx5_send_wr_local_inv; +} + +static void fill_wr_builders_ud(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_send = mlx5_send_wr_send_other; + ibqp->wr_send_imm = mlx5_send_wr_send_imm; +} + +static void fill_wr_setters_rc_uc(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_set_sge = mlx5_send_wr_set_sge_rc_uc; + ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_rc_uc; +} + +static void fill_wr_setters_ud_xrc(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_set_sge = mlx5_send_wr_set_sge_ud_xrc; + ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_ud_xrc; +} + +int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, + const struct ibv_qp_init_attr_ex *attr) +{ + struct ibv_qp_ex *ibqp = &mqp->verbs_qp.qp_ex; + uint64_t ops = attr->send_ops_flags; + + ibqp->wr_start = mlx5_send_wr_start; + ibqp->wr_complete = mlx5_send_wr_complete; + ibqp->wr_abort = mlx5_send_wr_abort; + + if (!mqp->atomics_enabled && + (ops & IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP || + ops & IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD)) + return EOPNOTSUPP; + + /* Set all supported micro-functions regardless user request */ + switch (attr->qp_type) { + case IBV_QPT_RC: + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_RC) + return EOPNOTSUPP; + + fill_wr_builders_rc_xrc(ibqp); + fill_wr_setters_rc_uc(ibqp); + break; + + case IBV_QPT_UC: + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UC) + return 
EOPNOTSUPP; + + fill_wr_builders_uc(ibqp); + fill_wr_setters_rc_uc(ibqp); + break; + + case IBV_QPT_XRC_SEND: + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC) + return EOPNOTSUPP; + + fill_wr_builders_rc_xrc(ibqp); + fill_wr_setters_ud_xrc(ibqp); + ibqp->wr_set_xrc_srqn = mlx5_send_wr_set_xrc_srqn; + break; + + case IBV_QPT_UD: + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_UD) + return EOPNOTSUPP; + + if (mqp->flags & MLX5_QP_FLAGS_USE_UNDERLAY) + return EOPNOTSUPP; + + fill_wr_builders_ud(ibqp); + fill_wr_setters_ud_xrc(ibqp); + ibqp->wr_set_ud_addr = mlx5_send_wr_set_ud_addr; + break; + + default: + return EOPNOTSUPP; + } + + return 0; +} + int mlx5_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw, struct ibv_mw_bind *mw_bind) { @@ -1522,7 +2140,8 @@ int mlx5_post_srq_ops(struct ibv_srq *ibsrq, struct ibv_ops_wr *wr, } out: - post_send_db(qp, bf, nreq, 0, size, 0, ctrl); + qp->fm_cache = 0; + post_send_db(qp, bf, nreq, 0, size, ctrl); mlx5_spin_unlock(&srq->lock); diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c index 71d9ca6..870279e 100644 --- a/providers/mlx5/verbs.c +++ b/providers/mlx5/verbs.c @@ -1129,56 +1129,63 @@ int mlx5_destroy_srq(struct ibv_srq *srq) return 0; } -static int sq_overhead(struct mlx5_qp *qp, enum ibv_qp_type qp_type) -{ - size_t size = 0; - size_t mw_bind_size = - sizeof(struct mlx5_wqe_umr_ctrl_seg) + - sizeof(struct mlx5_wqe_mkey_context_seg) + - max_t(size_t, sizeof(struct mlx5_wqe_umr_klm_seg), 64); - +static int _sq_overhead(struct mlx5_qp *qp, + enum ibv_qp_type qp_type, + uint64_t ops) +{ + size_t size = sizeof(struct mlx5_wqe_ctrl_seg); + size_t rdma_size = 0; + size_t atomic_size = 0; + size_t mw_size = 0; + + /* Operation overhead */ + if (ops & (IBV_QP_EX_WITH_RDMA_WRITE | + IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM | + IBV_QP_EX_WITH_RDMA_READ)) + rdma_size = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_raddr_seg); + + if (ops & (IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP | + IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD)) + atomic_size = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg); + + if (ops & (IBV_QP_EX_WITH_BIND_MW | IBV_QP_EX_WITH_LOCAL_INV)) + mw_size = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_umr_ctrl_seg) + + sizeof(struct mlx5_wqe_mkey_context_seg) + + max_t(size_t, sizeof(struct mlx5_wqe_umr_klm_seg), 64); + + size = max_t(size_t, size, rdma_size); + size = max_t(size_t, size, atomic_size); + size = max_t(size_t, size, mw_size); + + /* Transport overhead */ switch (qp_type) { case IBV_QPT_DRIVER: if (qp->dc_type != MLX5DV_DCTYPE_DCI) return -EINVAL; - size += sizeof(struct mlx5_wqe_datagram_seg); SWITCH_FALLTHROUGH; - case IBV_QPT_RC: - size += sizeof(struct mlx5_wqe_ctrl_seg) + - max(sizeof(struct mlx5_wqe_atomic_seg) + - sizeof(struct mlx5_wqe_raddr_seg), - mw_bind_size); - break; - - case IBV_QPT_UC: - size = sizeof(struct mlx5_wqe_ctrl_seg) + - max(sizeof(struct mlx5_wqe_raddr_seg), - mw_bind_size); - break; - case IBV_QPT_UD: - size = sizeof(struct mlx5_wqe_ctrl_seg) + - sizeof(struct mlx5_wqe_datagram_seg); - + size += sizeof(struct mlx5_wqe_datagram_seg); if (qp->flags & MLX5_QP_FLAGS_USE_UNDERLAY) - size += (sizeof(struct mlx5_wqe_eth_seg) + sizeof(struct mlx5_wqe_eth_pad)); - + size += sizeof(struct mlx5_wqe_eth_seg) + + sizeof(struct mlx5_wqe_eth_pad); break; - case IBV_QPT_XRC_SEND: - size = sizeof(struct mlx5_wqe_ctrl_seg) + mw_bind_size; - SWITCH_FALLTHROUGH; - case IBV_QPT_XRC_RECV: - size = max(size, sizeof(struct mlx5_wqe_ctrl_seg) + - 
sizeof(struct mlx5_wqe_xrc_seg) + - sizeof(struct mlx5_wqe_raddr_seg)); + case IBV_QPT_XRC_SEND: + size += sizeof(struct mlx5_wqe_xrc_seg); break; case IBV_QPT_RAW_PACKET: - size = sizeof(struct mlx5_wqe_ctrl_seg) + - sizeof(struct mlx5_wqe_eth_seg); + size += sizeof(struct mlx5_wqe_eth_seg); + break; + + case IBV_QPT_RC: + case IBV_QPT_UC: break; default: @@ -1188,6 +1195,50 @@ static int sq_overhead(struct mlx5_qp *qp, enum ibv_qp_type qp_type) return size; } +static int sq_overhead(struct mlx5_qp *qp, struct ibv_qp_init_attr_ex *attr) +{ + uint64_t ops; + + if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) { + ops = attr->send_ops_flags; + } else { + switch (attr->qp_type) { + case IBV_QPT_RC: + case IBV_QPT_UC: + case IBV_QPT_DRIVER: + case IBV_QPT_XRC_RECV: + case IBV_QPT_XRC_SEND: + ops = IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_SEND_WITH_INV | + IBV_QP_EX_WITH_SEND_WITH_IMM | + IBV_QP_EX_WITH_RDMA_WRITE | + IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM | + IBV_QP_EX_WITH_RDMA_READ | + IBV_QP_EX_WITH_ATOMIC_CMP_AND_SWP | + IBV_QP_EX_WITH_ATOMIC_FETCH_AND_ADD | + IBV_QP_EX_WITH_LOCAL_INV | + IBV_QP_EX_WITH_BIND_MW; + break; + + case IBV_QPT_UD: + ops = IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_SEND_WITH_IMM | + IBV_QP_EX_WITH_TSO; + break; + + case IBV_QPT_RAW_PACKET: + ops = IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_TSO; + break; + + default: + return -EINVAL; + } + } + + return _sq_overhead(qp, attr->qp_type, ops); +} + static int mlx5_calc_send_wqe(struct mlx5_context *ctx, struct ibv_qp_init_attr_ex *attr, struct mlx5_qp *qp) @@ -1197,7 +1248,7 @@ static int mlx5_calc_send_wqe(struct mlx5_context *ctx, int max_gather; int tot_size; - size = sq_overhead(qp, attr->qp_type); + size = sq_overhead(qp, attr); if (size < 0) return size; @@ -1270,7 +1321,7 @@ static int mlx5_calc_sq_size(struct mlx5_context *ctx, return -EINVAL; } - qp->max_inline_data = wqe_size - sq_overhead(qp, attr->qp_type) - + qp->max_inline_data = wqe_size - sq_overhead(qp, attr) - sizeof(struct mlx5_wqe_inl_data_seg); attr->cap.max_inline_data = qp->max_inline_data; @@ -1620,7 +1671,8 @@ enum { IBV_QP_INIT_ATTR_CREATE_FLAGS | IBV_QP_INIT_ATTR_MAX_TSO_HEADER | IBV_QP_INIT_ATTR_IND_TABLE | - IBV_QP_INIT_ATTR_RX_HASH), + IBV_QP_INIT_ATTR_RX_HASH | + IBV_QP_INIT_ATTR_SEND_OPS_FLAGS), }; enum { @@ -1727,13 +1779,23 @@ static struct ibv_qp *create_qp(struct ibv_context *context, (attr->qp_type != IBV_QPT_RAW_PACKET)) return NULL; + if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS && + (attr->comp_mask & IBV_QP_INIT_ATTR_RX_HASH || + (attr->qp_type == IBV_QPT_DRIVER && + mlx5_qp_attr && + mlx5_qp_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_DC && + mlx5_qp_attr->dc_init_attr.dc_type == MLX5DV_DCTYPE_DCT))) { + errno = EINVAL; + return NULL; + } + qp = calloc(1, sizeof(*qp)); if (!qp) { mlx5_dbg(fp, MLX5_DBG_QP, "\n"); return NULL; } - ibqp = (struct ibv_qp *)&qp->verbs_qp; + ibqp = &qp->verbs_qp.qp; qp->ibv_qp = ibqp; if ((attr->comp_mask & IBV_QP_INIT_ATTR_CREATE_FLAGS) && @@ -1847,6 +1909,18 @@ static struct ibv_qp *create_qp(struct ibv_context *context, return ibqp; } + if (ctx->atomic_cap == IBV_ATOMIC_HCA) + qp->atomics_enabled = 1; + + if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) { + ret = mlx5_qp_fill_wr_pfns(qp, attr); + if (ret) { + errno = ret; + mlx5_dbg(fp, MLX5_DBG_QP, "Failed to handle operations flags (errno %d)\n", errno); + goto err; + } + } + cmd.flags = mlx5_create_flags; qp->wq_sig = qp_sig_enabled(); if (qp->wq_sig) @@ -1911,9 +1985,6 @@ static struct ibv_qp *create_qp(struct ibv_context *context, 
cmd.rq_wqe_count = qp->rq.wqe_cnt; cmd.rq_wqe_shift = qp->rq.wqe_shift; - if (ctx->atomic_cap == IBV_ATOMIC_HCA) - qp->atomics_enabled = 1; - if (!ctx->cqe_version) { cmd.uidx = 0xffffff; pthread_mutex_lock(&ctx->qp_table_mutex); @@ -1992,6 +2063,9 @@ static struct ibv_qp *create_qp(struct ibv_context *context, if (resp_drv->comp_mask & MLX5_IB_CREATE_QP_RESP_MASK_SQN) qp->sqn = resp_drv->sqn; + if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) + qp->verbs_qp.comp_mask |= VERBS_QP_EX; + return ibqp; err_destroy: From patchwork Mon Mar 18 12:24:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10857541 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 969451390 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7B67529227 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6FDD3292D1; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C439D29227 for ; Mon, 18 Mar 2019 12:25:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727220AbfCRMZk (ORCPT ); Mon, 18 Mar 2019 08:25:40 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:53317 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727141AbfCRMZi (ORCPT ); Mon, 18 Mar 2019 08:25:38 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from yishaih@mellanox.com) with ESMTPS (AES256-SHA encrypted); 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [10.7.2.17]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUFt019557; Mon, 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [127.0.0.1]) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPTdg004904; Mon, 18 Mar 2019 14:25:30 +0200 Received: (from yishaih@localhost) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8/Submit) id x2ICPTL2004903; Mon, 18 Mar 2019 14:25:29 +0200 From: Yishai Hadas To: linux-rdma@vger.kernel.org Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com, jgg@mellanox.com, majd@mellanox.com Subject: [PATCH rdma-core 3/6] mlx5: Support inline data WR over new post send API Date: Mon, 18 Mar 2019 14:24:16 +0200 Message-Id: <1552911859-4073-4-git-send-email-yishaih@mellanox.com> X-Mailer: git-send-email 1.8.2.3 In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Guy Levi It is a complementary part for new post send API support. 
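For orientation, a minimal sketch of how an application drives this flow end to end; it is not part of the patch itself, and qpx, lkey, buf, len and the WR id are placeholder names:

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Post a single signaled SEND through the extended QP object. */
static int post_one_send(struct ibv_qp_ex *qpx, uint32_t lkey,
			 void *buf, uint32_t len)
{
	ibv_wr_start(qpx);                  /* open a batch of work requests */

	qpx->wr_id = 0x1234;                /* per-WR attributes */
	qpx->wr_flags = IBV_SEND_SIGNALED;
	ibv_wr_send(qpx);                   /* builder: choose the opcode */
	ibv_wr_set_sge(qpx, lkey,           /* setter: attach the payload */
		       (uintptr_t)buf, len);

	return ibv_wr_complete(qpx);        /* commit the batch to HW */
}
```

The inline-data setters listed below plug into the same setter step, copying the payload directly into the WQE instead of referencing it through an SGE.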
Now inline data setters are also supported: - ibv_wr_set_inline_data() - ibv_wr_set_inline_data_list() Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- providers/mlx5/mlx5.h | 1 + providers/mlx5/qp.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 155 insertions(+), 1 deletion(-) diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h index 71220a2..b31619c 100644 --- a/providers/mlx5/mlx5.h +++ b/providers/mlx5/mlx5.h @@ -513,6 +513,7 @@ struct mlx5_qp { struct mlx5_bf *bf; /* Start of new post send API specific fields */ + bool inl_wqe; uint8_t cur_setters_cnt; uint8_t fm_cache_rb; int err; diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c index c933ee6..8cff584 100644 --- a/providers/mlx5/qp.c +++ b/providers/mlx5/qp.c @@ -1154,6 +1154,7 @@ static void mlx5_send_wr_start(struct ibv_qp_ex *ibqp) mqp->fm_cache_rb = mqp->fm_cache; mqp->err = 0; mqp->nreq = 0; + mqp->inl_wqe = 0; } static int mlx5_send_wr_complete(struct ibv_qp_ex *ibqp) @@ -1168,7 +1169,7 @@ static int mlx5_send_wr_complete(struct ibv_qp_ex *ibqp) goto out; } - post_send_db(mqp, mqp->bf, mqp->nreq, 0, mqp->cur_size, + post_send_db(mqp, mqp->bf, mqp->nreq, mqp->inl_wqe, mqp->cur_size, mqp->cur_ctrl); out: @@ -1570,6 +1571,154 @@ mlx5_send_wr_set_sge_list_ud_xrc(struct ibv_qp_ex *ibqp, size_t num_sge, mqp->cur_setters_cnt++; } +static inline void memcpy_to_wqe(struct mlx5_qp *mqp, void *dest, void *src, + size_t n) +{ + if (unlikely(dest + n > mqp->sq.qend)) { + size_t copy = mqp->sq.qend - dest; + + memcpy(dest, src, copy); + src += copy; + n -= copy; + dest = mlx5_get_send_wqe(mqp, 0); + } + memcpy(dest, src, n); +} + +static inline void memcpy_to_wqe_and_update(struct mlx5_qp *mqp, void **dest, + void *src, size_t n) +{ + if (unlikely(*dest + n > mqp->sq.qend)) { + size_t copy = mqp->sq.qend - *dest; + + memcpy(*dest, src, copy); + src += copy; + n -= copy; + *dest = mlx5_get_send_wqe(mqp, 0); + } + memcpy(*dest, src, n); + + *dest += n; +} + +static inline void +_mlx5_send_wr_set_inline_data(struct mlx5_qp *mqp, void *addr, size_t length) +{ + struct mlx5_wqe_inline_seg *dseg = mqp->cur_data; + + if (unlikely(length > mqp->max_inline_data)) { + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, + "Inline data %zu exceeds the maximum (%d)\n", + length, mqp->max_inline_data); + + if (!mqp->err) + mqp->err = ENOMEM; + + return; + } + + mqp->inl_wqe = 1; /* Encourage a BlueFlame usage */ + + if (unlikely(!length)) + return; + + memcpy_to_wqe(mqp, (void *)dseg + sizeof(*dseg), addr, length); + dseg->byte_count = htobe32(length | MLX5_INLINE_SEG); + mqp->cur_size += DIV_ROUND_UP(length + sizeof(*dseg), 16); +} + +static void +mlx5_send_wr_set_inline_data_rc_uc(struct ibv_qp_ex *ibqp, void *addr, + size_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_inline_data(mqp, addr, length); + _common_wqe_finilize(mqp); +} + +static void +mlx5_send_wr_set_inline_data_ud_xrc(struct ibv_qp_ex *ibqp, void *addr, + size_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_inline_data(mqp, addr, length); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +static inline void +_mlx5_send_wr_set_inline_data_list(struct mlx5_qp *mqp, + size_t num_buf, + const struct ibv_data_buf *buf_list) +{ + struct mlx5_wqe_inline_seg *dseg = mqp->cur_data; + void *wqe = (void *)dseg + sizeof(*dseg); + size_t inl_size = 0; + int i; + 
+ for (i = 0; i < num_buf; i++) { + size_t length = buf_list[i].length; + + inl_size += length; + + if (unlikely(inl_size > mqp->max_inline_data)) { + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, + "Inline data %zu exceeds the maximum (%d)\n", + inl_size, mqp->max_inline_data); + + if (!mqp->err) + mqp->err = ENOMEM; + + return; + } + + memcpy_to_wqe_and_update(mqp, &wqe, buf_list[i].addr, length); + } + + mqp->inl_wqe = 1; /* Encourage a BlueFlame usage */ + + if (unlikely(!inl_size)) + return; + + dseg->byte_count = htobe32(inl_size | MLX5_INLINE_SEG); + mqp->cur_size += DIV_ROUND_UP(inl_size + sizeof(*dseg), 16); +} + +static void +mlx5_send_wr_set_inline_data_list_rc_uc(struct ibv_qp_ex *ibqp, + size_t num_buf, + const struct ibv_data_buf *buf_list) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_inline_data_list(mqp, num_buf, buf_list); + _common_wqe_finilize(mqp); +} + +static void +mlx5_send_wr_set_inline_data_list_ud_xrc(struct ibv_qp_ex *ibqp, + size_t num_buf, + const struct ibv_data_buf *buf_list) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + + _mlx5_send_wr_set_inline_data_list(mqp, num_buf, buf_list); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + static void mlx5_send_wr_set_ud_addr(struct ibv_qp_ex *ibqp, struct ibv_ah *ah, uint32_t remote_qpn, uint32_t remote_qkey) @@ -1664,12 +1813,16 @@ static void fill_wr_setters_rc_uc(struct ibv_qp_ex *ibqp) { ibqp->wr_set_sge = mlx5_send_wr_set_sge_rc_uc; ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_rc_uc; + ibqp->wr_set_inline_data = mlx5_send_wr_set_inline_data_rc_uc; + ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_rc_uc; } static void fill_wr_setters_ud_xrc(struct ibv_qp_ex *ibqp) { ibqp->wr_set_sge = mlx5_send_wr_set_sge_ud_xrc; ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_ud_xrc; + ibqp->wr_set_inline_data = mlx5_send_wr_set_inline_data_ud_xrc; + ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_ud_xrc; } int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, From patchwork Mon Mar 18 12:24:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10857539 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 261D16C2 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 05E8829222 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EE274292D1; Mon, 18 Mar 2019 12:25:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BCB2E29222 for ; Mon, 18 Mar 2019 12:25:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726969AbfCRMZj (ORCPT ); Mon, 18 Mar 2019 08:25:39 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:53318 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by 
vger.kernel.org with ESMTP id S1727220AbfCRMZj (ORCPT ); Mon, 18 Mar 2019 08:25:39 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from yishaih@mellanox.com) with ESMTPS (AES256-SHA encrypted); 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [10.7.2.17]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUIE019560; Mon, 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [127.0.0.1]) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUkM004908; Mon, 18 Mar 2019 14:25:30 +0200 Received: (from yishaih@localhost) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8/Submit) id x2ICPU2P004907; Mon, 18 Mar 2019 14:25:30 +0200 From: Yishai Hadas To: linux-rdma@vger.kernel.org Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com, jgg@mellanox.com, majd@mellanox.com Subject: [PATCH rdma-core 4/6] mlx5: Support raw packet QPT over new post send API Date: Mon, 18 Mar 2019 14:24:17 +0200 Message-Id: <1552911859-4073-5-git-send-email-yishaih@mellanox.com> X-Mailer: git-send-email 1.8.2.3 In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Guy Levi As a complementary part for mlx5 support of new post send API, now raw packet QP transport with all its operations is supported. This completes the full support for all QP types and operations. Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- providers/mlx5/mlx5.h | 1 + providers/mlx5/qp.c | 367 +++++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 338 insertions(+), 30 deletions(-) diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h index b31619c..3a22fde 100644 --- a/providers/mlx5/mlx5.h +++ b/providers/mlx5/mlx5.h @@ -520,6 +520,7 @@ struct mlx5_qp { int nreq; uint32_t cur_size; uint32_t cur_post_rb; + void *cur_eth; void *cur_data; struct mlx5_wqe_ctrl_seg *cur_ctrl; /* End of new post send API specific fields */ diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c index 8cff584..f3bce40 100644 --- a/providers/mlx5/qp.c +++ b/providers/mlx5/qp.c @@ -411,36 +411,59 @@ void *mlx5_get_atomic_laddr(struct mlx5_qp *qp, uint16_t idx, int *byte_count) } static inline int copy_eth_inline_headers(struct ibv_qp *ibqp, - struct ibv_send_wr *wr, + const void *list, + size_t nelem, struct mlx5_wqe_eth_seg *eseg, - struct mlx5_sg_copy_ptr *sg_copy_ptr) + struct mlx5_sg_copy_ptr *sg_copy_ptr, + bool is_sge) + ALWAYS_INLINE; +static inline int copy_eth_inline_headers(struct ibv_qp *ibqp, + const void *list, + size_t nelem, + struct mlx5_wqe_eth_seg *eseg, + struct mlx5_sg_copy_ptr *sg_copy_ptr, + bool is_sge) { uint32_t inl_hdr_size = to_mctx(ibqp->context)->eth_min_inline_size; - int inl_hdr_copy_size = 0; + size_t inl_hdr_copy_size = 0; int j = 0; FILE *fp = to_mctx(ibqp->context)->dbg_fp; + size_t length; + void *addr; - if (unlikely(wr->num_sge < 1)) { - mlx5_dbg(fp, MLX5_DBG_QP_SEND, "illegal num_sge: %d, minimum is 1\n", - wr->num_sge); + if (unlikely(nelem < 1)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, + "illegal num_sge: %zu, minimum is 1\n", nelem); return EINVAL; } - if (likely(wr->sg_list[0].length >= MLX5_ETH_L2_INLINE_HEADER_SIZE)) { + if (is_sge) { + addr = (void *)(uintptr_t)((struct ibv_sge *)list)[0].addr; + length = (size_t)((struct ibv_sge *)list)[0].length; + } else { + addr = ((struct ibv_data_buf 
*)list)[0].addr; + length = ((struct ibv_data_buf *)list)[0].length; + } + + if (likely(length >= MLX5_ETH_L2_INLINE_HEADER_SIZE)) { inl_hdr_copy_size = inl_hdr_size; - memcpy(eseg->inline_hdr_start, - (void *)(uintptr_t)wr->sg_list[0].addr, - inl_hdr_copy_size); + memcpy(eseg->inline_hdr_start, addr, inl_hdr_copy_size); } else { uint32_t inl_hdr_size_left = inl_hdr_size; - for (j = 0; j < wr->num_sge && inl_hdr_size_left > 0; ++j) { - inl_hdr_copy_size = min(wr->sg_list[j].length, - inl_hdr_size_left); + for (j = 0; j < nelem && inl_hdr_size_left > 0; ++j) { + if (is_sge) { + addr = (void *)(uintptr_t)((struct ibv_sge *)list)[j].addr; + length = (size_t)((struct ibv_sge *)list)[j].length; + } else { + addr = ((struct ibv_data_buf *)list)[j].addr; + length = ((struct ibv_data_buf *)list)[j].length; + } + + inl_hdr_copy_size = min_t(size_t, length, inl_hdr_size_left); memcpy(eseg->inline_hdr_start + (MLX5_ETH_L2_INLINE_HEADER_SIZE - inl_hdr_size_left), - (void *)(uintptr_t)wr->sg_list[j].addr, - inl_hdr_copy_size); + addr, inl_hdr_copy_size); inl_hdr_size_left -= inl_hdr_copy_size; } if (unlikely(inl_hdr_size_left)) { @@ -456,7 +479,7 @@ static inline int copy_eth_inline_headers(struct ibv_qp *ibqp, /* If we copied all the sge into the inline-headers, then we need to * start copying from the next sge into the data-segment. */ - if (unlikely(wr->sg_list[j].length == inl_hdr_copy_size)) { + if (unlikely(length == inl_hdr_copy_size)) { ++j; inl_hdr_copy_size = 0; } @@ -619,17 +642,17 @@ static inline int set_bind_wr(struct mlx5_qp *qp, enum ibv_mw_type type, /* Copy tso header to eth segment with considering padding and WQE * wrap around in WQ buffer. */ -static inline int set_tso_eth_seg(void **seg, struct ibv_send_wr *wr, - void *qend, struct mlx5_qp *qp, int *size) +static inline int set_tso_eth_seg(void **seg, void *hdr, uint16_t hdr_sz, + uint16_t mss, + struct mlx5_qp *qp, int *size) { struct mlx5_wqe_eth_seg *eseg = *seg; int size_of_inl_hdr_start = sizeof(eseg->inline_hdr_start); uint64_t left, left_len, copy_sz; - void *pdata = wr->tso.hdr; FILE *fp = to_mctx(qp->ibv_qp->context)->dbg_fp; - if (unlikely(wr->tso.hdr_sz < MLX5_ETH_L2_MIN_HEADER_SIZE || - wr->tso.hdr_sz > qp->max_tso_header)) { + if (unlikely(hdr_sz < MLX5_ETH_L2_MIN_HEADER_SIZE || + hdr_sz > qp->max_tso_header)) { mlx5_dbg(fp, MLX5_DBG_QP_SEND, "TSO header size should be at least %d and at most %d\n", MLX5_ETH_L2_MIN_HEADER_SIZE, @@ -637,18 +660,18 @@ static inline int set_tso_eth_seg(void **seg, struct ibv_send_wr *wr, return EINVAL; } - left = wr->tso.hdr_sz; - eseg->mss = htobe16(wr->tso.mss); - eseg->inline_hdr_sz = htobe16(wr->tso.hdr_sz); + left = hdr_sz; + eseg->mss = htobe16(mss); + eseg->inline_hdr_sz = htobe16(hdr_sz); /* Check if there is space till the end of queue, if yes, * copy all in one shot, otherwise copy till the end of queue, * rollback and then copy the left */ - left_len = qend - (void *)eseg->inline_hdr_start; + left_len = qp->sq.qend - (void *)eseg->inline_hdr_start; copy_sz = min(left_len, left); - memcpy(eseg->inline_hdr_start, pdata, copy_sz); + memcpy(eseg->inline_hdr_start, hdr, copy_sz); /* The -1 is because there are already 16 bytes included in * eseg->inline_hdr[16] @@ -660,8 +683,8 @@ static inline int set_tso_eth_seg(void **seg, struct ibv_send_wr *wr, if (unlikely(copy_sz < left)) { *seg = mlx5_get_send_wqe(qp, 0); left -= copy_sz; - pdata += copy_sz; - memcpy(*seg, pdata, left); + hdr += copy_sz; + memcpy(*seg, hdr, left); *seg += align(left, 16); *size += align(left, 16) / 16; } @@ 
-1003,7 +1026,9 @@ static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, if (wr->opcode == IBV_WR_TSO) { max_tso = qp->max_tso; - err = set_tso_eth_seg(&seg, wr, qend, qp, &size); + err = set_tso_eth_seg(&seg, wr->tso.hdr, + wr->tso.hdr_sz, + wr->tso.mss, qp, &size); if (unlikely(err)) { *bad_wr = wr; goto out; @@ -1021,7 +1046,9 @@ static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, uint32_t inl_hdr_size = to_mctx(ibqp->context)->eth_min_inline_size; - err = copy_eth_inline_headers(ibqp, wr, seg, &sg_copy_ptr); + err = copy_eth_inline_headers(ibqp, wr->sg_list, + wr->num_sge, seg, + &sg_copy_ptr, 1); if (unlikely(err)) { *bad_wr = wr; mlx5_dbg(fp, MLX5_DBG_QP_SEND, @@ -1292,6 +1319,45 @@ static void mlx5_send_wr_send_other(struct ibv_qp_ex *ibqp) _mlx5_send_wr_send(ibqp, IBV_WR_SEND); } +static void mlx5_send_wr_send_eth(struct ibv_qp_ex *ibqp) +{ + uint32_t inl_hdr_size = + to_mctx(((struct ibv_qp *)ibqp)->context)->eth_min_inline_size; + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_eth_seg *eseg; + size_t eseg_sz; + + _common_wqe_init(ibqp, IBV_WR_SEND); + + eseg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + memset(eseg, 0, sizeof(struct mlx5_wqe_eth_seg)); + if (inl_hdr_size) + mqp->cur_eth = eseg; + + if (ibqp->wr_flags & IBV_SEND_IP_CSUM) { + if (unlikely(!(mqp->qp_cap_cache & + MLX5_CSUM_SUPPORT_RAW_OVER_ETH))) { + if (!mqp->err) + mqp->err = EINVAL; + + return; + } + + eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; + } + + /* The eth segment size depends on the device's min inline + * header requirement which can be 0 or 18. The basic eth segment + * always includes room for first 2 inline header bytes (even if + * copy size is 0) so the additional seg size is adjusted accordingly. 
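+ * As a concrete illustration (assuming the usual layout in which
+ * inline_hdr starts right after the 16-byte base of the segment): a
+ * device requiring 18 inline header bytes gives
+ * eseg_sz = (16 + 18) & ~0xf = 32, while a 0-byte requirement leaves
+ * the 16-byte base, i.e. the eth segment contributes two or one
+ * 16-byte units to cur_size in addition to the control segment.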
+ */ + eseg_sz = (offsetof(struct mlx5_wqe_eth_seg, inline_hdr) + + inl_hdr_size) & ~0xf; + mqp->cur_data = (void *)eseg + eseg_sz; + mqp->cur_size = (sizeof(struct mlx5_wqe_ctrl_seg) + eseg_sz) >> 4; + mqp->nreq++; +} + static void mlx5_send_wr_send_imm(struct ibv_qp_ex *ibqp, __be32 imm_data) { struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); @@ -1311,6 +1377,48 @@ static void mlx5_send_wr_send_inv(struct ibv_qp_ex *ibqp, mqp->cur_ctrl->imm = htobe32(invalidate_rkey); } +static void mlx5_send_wr_send_tso(struct ibv_qp_ex *ibqp, void *hdr, + uint16_t hdr_sz, uint16_t mss) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_eth_seg *eseg; + int size = 0; + int err; + + _common_wqe_init(ibqp, IBV_WR_TSO); + + eseg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + memset(eseg, 0, sizeof(struct mlx5_wqe_eth_seg)); + + if (ibqp->wr_flags & IBV_SEND_IP_CSUM) { + if (unlikely(!(mqp->qp_cap_cache & MLX5_CSUM_SUPPORT_RAW_OVER_ETH))) { + if (!mqp->err) + mqp->err = EINVAL; + + return; + } + + eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; + } + + err = set_tso_eth_seg((void *)&eseg, hdr, hdr_sz, mss, mqp, &size); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + return; + } + + /* eseg and cur_size was updated with hdr size inside set_tso_eth_seg */ + mqp->cur_data = (void *)eseg + sizeof(struct mlx5_wqe_eth_seg); + mqp->cur_size = size + + ((sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_eth_seg)) >> 4); + + mqp->cur_eth = NULL; + mqp->nreq++; +} + static inline void _mlx5_send_wr_rdma(struct ibv_qp_ex *ibqp, uint32_t rkey, uint64_t remote_addr, @@ -1513,6 +1621,36 @@ mlx5_send_wr_set_sge_ud_xrc(struct ibv_qp_ex *ibqp, uint32_t lkey, mqp->cur_setters_cnt++; } +static void +mlx5_send_wr_set_sge_eth(struct ibv_qp_ex *ibqp, uint32_t lkey, + uint64_t addr, uint32_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_eth_seg *eseg = mqp->cur_eth; + int err; + + if (eseg) { /* Inline-headers was set */ + struct mlx5_sg_copy_ptr sg_copy_ptr = {.index = 0, .offset = 0}; + struct ibv_sge sge = {.addr = addr, .length = length}; + + err = copy_eth_inline_headers((struct ibv_qp *)ibqp, &sge, 1, + eseg, &sg_copy_ptr, 1); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + return; + } + + addr += sg_copy_ptr.offset; + length -= sg_copy_ptr.offset; + } + + _mlx5_send_wr_set_sge(mqp, lkey, addr, length); + + _common_wqe_finilize(mqp); +} + static inline void _mlx5_send_wr_set_sge_list(struct mlx5_qp *mqp, size_t num_sge, const struct ibv_sge *sg_list) @@ -1571,6 +1709,61 @@ mlx5_send_wr_set_sge_list_ud_xrc(struct ibv_qp_ex *ibqp, size_t num_sge, mqp->cur_setters_cnt++; } +static void +mlx5_send_wr_set_sge_list_eth(struct ibv_qp_ex *ibqp, size_t num_sge, + const struct ibv_sge *sg_list) +{ + struct mlx5_sg_copy_ptr sg_copy_ptr = {.index = 0, .offset = 0}; + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_data_seg *dseg = mqp->cur_data; + struct mlx5_wqe_eth_seg *eseg = mqp->cur_eth; + size_t i; + + if (unlikely(num_sge > mqp->sq.max_gs)) { + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "Num SGEs %zu exceeds the maximum (%d)\n", + num_sge, mqp->sq.max_gs); + + if (!mqp->err) + mqp->err = ENOMEM; + + return; + } + + if (eseg) { /* Inline-headers was set */ + int err; + + err = copy_eth_inline_headers((struct ibv_qp *)ibqp, sg_list, + num_sge, eseg, &sg_copy_ptr, 1); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + 
return; + } + } + + for (i = sg_copy_ptr.index; i < num_sge; i++) { + uint32_t length = sg_list[i].length - sg_copy_ptr.offset; + + if (unlikely(!length)) + continue; + + if (unlikely(dseg == mqp->sq.qend)) + dseg = mlx5_get_send_wqe(mqp, 0); + + dseg->addr = htobe64(sg_list[i].addr + sg_copy_ptr.offset); + dseg->byte_count = htobe32(length); + dseg->lkey = htobe32(sg_list[i].lkey); + dseg++; + mqp->cur_size += (sizeof(*dseg) / 16); + sg_copy_ptr.offset = 0; + } + + _common_wqe_finilize(mqp); +} + static inline void memcpy_to_wqe(struct mlx5_qp *mqp, void *dest, void *src, size_t n) { @@ -1653,6 +1846,35 @@ mlx5_send_wr_set_inline_data_ud_xrc(struct ibv_qp_ex *ibqp, void *addr, mqp->cur_setters_cnt++; } +static void +mlx5_send_wr_set_inline_data_eth(struct ibv_qp_ex *ibqp, void *addr, + size_t length) +{ + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_eth_seg *eseg = mqp->cur_eth; + + if (eseg) { /* Inline-headers was set */ + struct mlx5_sg_copy_ptr sg_copy_ptr = {.index = 0, .offset = 0}; + struct ibv_data_buf buf = {.addr = addr, .length = length}; + int err; + + err = copy_eth_inline_headers((struct ibv_qp *)ibqp, &buf, 1, + eseg, &sg_copy_ptr, 0); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + return; + } + + addr += sg_copy_ptr.offset; + length -= sg_copy_ptr.offset; + } + + _mlx5_send_wr_set_inline_data(mqp, addr, length); + _common_wqe_finilize(mqp); +} + static inline void _mlx5_send_wr_set_inline_data_list(struct mlx5_qp *mqp, size_t num_buf, @@ -1720,6 +1942,66 @@ mlx5_send_wr_set_inline_data_list_ud_xrc(struct ibv_qp_ex *ibqp, } static void +mlx5_send_wr_set_inline_data_list_eth(struct ibv_qp_ex *ibqp, + size_t num_buf, + const struct ibv_data_buf *buf_list) +{ + struct mlx5_sg_copy_ptr sg_copy_ptr = {.index = 0, .offset = 0}; + struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); + struct mlx5_wqe_inline_seg *dseg = mqp->cur_data; + struct mlx5_wqe_eth_seg *eseg = mqp->cur_eth; + void *wqe = (void *)dseg + sizeof(*dseg); + size_t inl_size = 0; + size_t i; + + if (eseg) { /* Inline-headers was set */ + int err; + + err = copy_eth_inline_headers((struct ibv_qp *)ibqp, buf_list, + num_buf, eseg, &sg_copy_ptr, 0); + if (unlikely(err)) { + if (!mqp->err) + mqp->err = err; + + return; + } + } + + for (i = sg_copy_ptr.index; i < num_buf; i++) { + size_t length = buf_list[i].length - sg_copy_ptr.offset; + + inl_size += length; + + if (unlikely(inl_size > mqp->max_inline_data)) { + FILE *fp = to_mctx(mqp->ibv_qp->context)->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_QP_SEND, + "Inline data %zu exceeds the maximum (%d)\n", + inl_size, mqp->max_inline_data); + + if (!mqp->err) + mqp->err = EINVAL; + + return; + } + + memcpy_to_wqe_and_update(mqp, &wqe, + buf_list[i].addr + sg_copy_ptr.offset, + length); + + sg_copy_ptr.offset = 0; + } + + if (likely(inl_size)) { + dseg->byte_count = htobe32(inl_size | MLX5_INLINE_SEG); + mqp->cur_size += DIV_ROUND_UP(inl_size + sizeof(*dseg), 16); + } + + mqp->inl_wqe = 1; /* Encourage a BlueFlame usage */ + _common_wqe_finilize(mqp); +} + +static void mlx5_send_wr_set_ud_addr(struct ibv_qp_ex *ibqp, struct ibv_ah *ah, uint32_t remote_qpn, uint32_t remote_qkey) { @@ -1776,6 +2058,9 @@ enum { IBV_QP_EX_WITH_RDMA_WRITE_WITH_IMM | IBV_QP_EX_WITH_LOCAL_INV | IBV_QP_EX_WITH_BIND_MW, + MLX5_SUPPORTED_SEND_OPS_FLAGS_RAW_PACKET = + IBV_QP_EX_WITH_SEND | + IBV_QP_EX_WITH_TSO, }; static void fill_wr_builders_rc_xrc(struct ibv_qp_ex *ibqp) @@ -1809,6 +2094,12 @@ static void fill_wr_builders_ud(struct ibv_qp_ex *ibqp) ibqp->wr_send_imm 
= mlx5_send_wr_send_imm; } +static void fill_wr_builders_eth(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_send = mlx5_send_wr_send_eth; + ibqp->wr_send_tso = mlx5_send_wr_send_tso; +} + static void fill_wr_setters_rc_uc(struct ibv_qp_ex *ibqp) { ibqp->wr_set_sge = mlx5_send_wr_set_sge_rc_uc; @@ -1825,6 +2116,14 @@ static void fill_wr_setters_ud_xrc(struct ibv_qp_ex *ibqp) ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_ud_xrc; } +static void fill_wr_setters_eth(struct ibv_qp_ex *ibqp) +{ + ibqp->wr_set_sge = mlx5_send_wr_set_sge_eth; + ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_eth; + ibqp->wr_set_inline_data = mlx5_send_wr_set_inline_data_eth; + ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_eth; +} + int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, const struct ibv_qp_init_attr_ex *attr) { @@ -1879,6 +2178,14 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, ibqp->wr_set_ud_addr = mlx5_send_wr_set_ud_addr; break; + case IBV_QPT_RAW_PACKET: + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_RAW_PACKET) + return EOPNOTSUPP; + + fill_wr_builders_eth(ibqp); + fill_wr_setters_eth(ibqp); + break; + default: return EOPNOTSUPP; } From patchwork Mon Mar 18 12:24:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10857543 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 428706C2 for ; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25DB029227 for ; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1A02C292C8; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D9B429301 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727373AbfCRMZi (ORCPT ); Mon, 18 Mar 2019 08:25:38 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:53329 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727246AbfCRMZi (ORCPT ); Mon, 18 Mar 2019 08:25:38 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from yishaih@mellanox.com) with ESMTPS (AES256-SHA encrypted); 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [10.7.2.17]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUDA019563; Mon, 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [127.0.0.1]) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUZe004912; Mon, 18 Mar 2019 14:25:30 +0200 Received: (from yishaih@localhost) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8/Submit) id x2ICPUs4004911; Mon, 18 Mar 2019 14:25:30 +0200 From: Yishai Hadas To: linux-rdma@vger.kernel.org Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com, jgg@mellanox.com, majd@mellanox.com Subject: [PATCH rdma-core 5/6] verbs: Demonstrate the usage of new post send API Date: Mon, 18 Mar 2019 14:24:18 +0200 Message-Id: 
<1552911859-4073-6-git-send-email-yishaih@mellanox.com> X-Mailer: git-send-email 1.8.2.3 In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Guy Levi Expose a new flag in ibv_rc_pingpong command line so send WRs will be posted by the new post send method as introduced previously by libibverbs. It can be used as a simple example for this API usage, as a sanity for this API functionality or for any other goal. We use this opportunity to fix the man page with device memory option description which was missed. Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- libibverbs/examples/rc_pingpong.c | 49 ++++++++++++++++++++++++++++++++++++--- libibverbs/man/ibv_rc_pingpong.1 | 10 ++++++-- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/libibverbs/examples/rc_pingpong.c b/libibverbs/examples/rc_pingpong.c index d03dd7d..9781c4f 100644 --- a/libibverbs/examples/rc_pingpong.c +++ b/libibverbs/examples/rc_pingpong.c @@ -62,6 +62,7 @@ static int prefetch_mr; static int use_ts; static int validate_buf; static int use_dm; +static int use_new_send; struct pingpong_context { struct ibv_context *context; @@ -74,6 +75,7 @@ struct pingpong_context { struct ibv_cq_ex *cq_ex; } cq_s; struct ibv_qp *qp; + struct ibv_qp_ex *qpx; char *buf; int size; int send_flags; @@ -492,12 +494,35 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size, .qp_type = IBV_QPT_RC }; - ctx->qp = ibv_create_qp(ctx->pd, &init_attr); + if (use_new_send) { + struct ibv_qp_init_attr_ex init_attr_ex = {}; + + init_attr_ex.send_cq = pp_cq(ctx); + init_attr_ex.recv_cq = pp_cq(ctx); + init_attr_ex.cap.max_send_wr = 1; + init_attr_ex.cap.max_recv_wr = rx_depth; + init_attr_ex.cap.max_send_sge = 1; + init_attr_ex.cap.max_recv_sge = 1; + init_attr_ex.qp_type = IBV_QPT_RC; + + init_attr_ex.comp_mask |= IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_SEND_OPS_FLAGS; + init_attr_ex.pd = ctx->pd; + init_attr_ex.send_ops_flags = IBV_QP_EX_WITH_SEND; + + ctx->qp = ibv_create_qp_ex(ctx->context, &init_attr_ex); + } else { + ctx->qp = ibv_create_qp(ctx->pd, &init_attr); + } + if (!ctx->qp) { fprintf(stderr, "Couldn't create QP\n"); goto clean_cq; } + if (use_new_send) + ctx->qpx = ibv_qp_to_qp_ex(ctx->qp); + ibv_query_qp(ctx->qp, &attr, IBV_QP_CAP, &init_attr); if (init_attr.cap.max_inline_data >= size && !use_dm) ctx->send_flags |= IBV_SEND_INLINE; @@ -640,7 +665,19 @@ static int pp_post_send(struct pingpong_context *ctx) }; struct ibv_send_wr *bad_wr; - return ibv_post_send(ctx->qp, &wr, &bad_wr); + if (use_new_send) { + ibv_wr_start(ctx->qpx); + + ctx->qpx->wr_id = PINGPONG_SEND_WRID; + ctx->qpx->wr_flags = ctx->send_flags; + + ibv_wr_send(ctx->qpx); + ibv_wr_set_sge(ctx->qpx, list.lkey, list.addr, list.length); + + return ibv_wr_complete(ctx->qpx); + } else { + return ibv_post_send(ctx->qp, &wr, &bad_wr); + } } struct ts_params { @@ -749,6 +786,7 @@ static void usage(const char *argv0) printf(" -t, --ts get CQE with timestamp\n"); printf(" -c, --chk validate received buffer\n"); printf(" -j, --dm use device memory\n"); + printf(" -N, --new_send use new post send WR API\n"); } int main(int argc, char *argv[]) @@ -798,10 +836,11 @@ int main(int argc, char *argv[]) { .name = "ts", .has_arg = 0, .val = 't' }, { .name = "chk", .has_arg = 0, .val = 'c' }, { .name = "dm", .has_arg 
= 0, .val = 'j' }, + { .name = "new_send", .has_arg = 0, .val = 'N' }, {} }; - c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:oOPtcj", + c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:oOPtcjN", long_options, NULL); if (c == -1) @@ -881,6 +920,10 @@ int main(int argc, char *argv[]) use_dm = 1; break; + case 'N': + use_new_send = 1; + break; + default: usage(argv[0]); return 1; diff --git a/libibverbs/man/ibv_rc_pingpong.1 b/libibverbs/man/ibv_rc_pingpong.1 index 5561fe5..92554c0 100644 --- a/libibverbs/man/ibv_rc_pingpong.1 +++ b/libibverbs/man/ibv_rc_pingpong.1 @@ -8,12 +8,12 @@ ibv_rc_pingpong \- simple InfiniBand RC transport test .B ibv_rc_pingpong [\-p port] [\-d device] [\-i ib port] [\-s size] [\-m size] [\-r rx depth] [\-n iters] [\-l sl] [\-e] [\-g gid index] -[\-o] [\-P] [\-t] \fBHOSTNAME\fR +[\-o] [\-P] [\-t] [\-j] [\-N] \fBHOSTNAME\fR .B ibv_rc_pingpong [\-p port] [\-d device] [\-i ib port] [\-s size] [\-m size] [\-r rx depth] [\-n iters] [\-l sl] [\-e] [\-g gid index] -[\-o] [\-P] [\-t] +[\-o] [\-P] [\-t] [\-j] [\-N] .SH DESCRIPTION .PP @@ -66,6 +66,12 @@ get CQE with timestamp .TP \fB\-c\fR, \fB\-\-chk\fR validate received buffer +.TP +\fB\-j\fR, \fB\-\-dm\fR +use device memory +.TP +\fB\-N\fR, \fB\-\-new_send\fR +use new post send WR API .SH SEE ALSO .BR ibv_uc_pingpong (1), From patchwork Mon Mar 18 12:24:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 10857547 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C349017EF for ; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A2EB629222 for ; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9703C292C8; Mon, 18 Mar 2019 12:25:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5BD0229222 for ; Mon, 18 Mar 2019 12:25:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726435AbfCRMZk (ORCPT ); Mon, 18 Mar 2019 08:25:40 -0400 Received: from mail-il-dmz.mellanox.com ([193.47.165.129]:53330 "EHLO mellanox.co.il" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727188AbfCRMZk (ORCPT ); Mon, 18 Mar 2019 08:25:40 -0400 Received: from Internal Mail-Server by MTLPINE1 (envelope-from yishaih@mellanox.com) with ESMTPS (AES256-SHA encrypted); 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [10.7.2.17]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUZf019566; Mon, 18 Mar 2019 14:25:30 +0200 Received: from vnc17.mtl.labs.mlnx (vnc17.mtl.labs.mlnx [127.0.0.1]) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8) with ESMTP id x2ICPUjZ004918; Mon, 18 Mar 2019 14:25:30 +0200 Received: (from yishaih@localhost) by vnc17.mtl.labs.mlnx (8.13.8/8.13.8/Submit) id x2ICPU7k004917; Mon, 18 Mar 2019 14:25:30 +0200 From: Yishai Hadas To: linux-rdma@vger.kernel.org Cc: yishaih@mellanox.com, guyle@mellanox.com, Alexr@mellanox.com, jgg@mellanox.com, 
majd@mellanox.com Subject: [PATCH rdma-core 6/6] mlx5: Introduce a new send API in direct verbs Date: Mon, 18 Mar 2019 14:24:19 +0200 Message-Id: <1552911859-4073-7-git-send-email-yishaih@mellanox.com> X-Mailer: git-send-email 1.8.2.3 In-Reply-To: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> References: <1552911859-4073-1-git-send-email-yishaih@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Guy Levi A new send API was introduced by libibverbs. This is a mlx5 specific API extension to the generic one which is introduced in DV. By calling mlx5dv_create_qp with the generic attr, send_ops_flags, mlx5 specific send work features can be used. The new struct, mlx5dv_qp_ex, is used to access mlx5 specific send work micro-functions. Till now, the driver let to create a QP with DC transport type w/o a data-path support (Users had to implement data-path by themselves). Now, we introduce a DC support over DV new post send API which is actually a complete DC data-path support (Post a send WR, post a receive WR and poll a WC). Signed-off-by: Guy Levi Signed-off-by: Yishai Hadas --- debian/ibverbs-providers.symbols | 2 + providers/mlx5/CMakeLists.txt | 2 +- providers/mlx5/libmlx5.map | 6 ++ providers/mlx5/man/CMakeLists.txt | 3 + providers/mlx5/man/mlx5dv_create_qp.3.md | 5 ++ providers/mlx5/man/mlx5dv_wr_post.3.md | 94 ++++++++++++++++++++++++++ providers/mlx5/mlx5.h | 9 ++- providers/mlx5/mlx5dv.h | 20 ++++++ providers/mlx5/qp.c | 112 ++++++++++++++++++++++--------- providers/mlx5/verbs.c | 17 ++++- 10 files changed, 236 insertions(+), 34 deletions(-) create mode 100644 providers/mlx5/man/mlx5dv_wr_post.3.md diff --git a/debian/ibverbs-providers.symbols b/debian/ibverbs-providers.symbols index 9be0a94..309bbef 100644 --- a/debian/ibverbs-providers.symbols +++ b/debian/ibverbs-providers.symbols @@ -17,6 +17,7 @@ libmlx5.so.1 ibverbs-providers #MINVER# MLX5_1.7@MLX5_1.7 21 MLX5_1.8@MLX5_1.8 22 MLX5_1.9@MLX5_1.9 23 + MLX5_1.10@MLX5_1.10 24 mlx5dv_init_obj@MLX5_1.0 13 mlx5dv_init_obj@MLX5_1.2 15 mlx5dv_query_device@MLX5_1.0 13 @@ -57,3 +58,4 @@ libmlx5.so.1 ibverbs-providers #MINVER# mlx5dv_devx_destroy_cmd_comp@MLX5_1.9 23 mlx5dv_devx_get_async_cmd_comp@MLX5_1.9 23 mlx5dv_devx_obj_query_async@MLX5_1.9 23 + mlx5dv_qp_ex_from_ibv_qp_ex@MLX5_1.10 24 diff --git a/providers/mlx5/CMakeLists.txt b/providers/mlx5/CMakeLists.txt index d629c58..88b1246 100644 --- a/providers/mlx5/CMakeLists.txt +++ b/providers/mlx5/CMakeLists.txt @@ -11,7 +11,7 @@ if (MLX5_MW_DEBUG) endif() rdma_shared_provider(mlx5 libmlx5.map - 1 1.9.${PACKAGE_VERSION} + 1 1.10.${PACKAGE_VERSION} buf.c cq.c dbrec.c diff --git a/providers/mlx5/libmlx5.map b/providers/mlx5/libmlx5.map index be99767..28c8616 100644 --- a/providers/mlx5/libmlx5.map +++ b/providers/mlx5/libmlx5.map @@ -79,4 +79,10 @@ MLX5_1.9 { mlx5dv_devx_destroy_cmd_comp; mlx5dv_devx_get_async_cmd_comp; mlx5dv_devx_obj_query_async; + mlx5dv_qp_ex_from_ibv_qp_ex; } MLX5_1.8; + +MLX5_1.10 { + global: + mlx5dv_qp_ex_from_ibv_qp_ex; +} MLX5_1.9; diff --git a/providers/mlx5/man/CMakeLists.txt b/providers/mlx5/man/CMakeLists.txt index d8d42c3..24bd5d8 100644 --- a/providers/mlx5/man/CMakeLists.txt +++ b/providers/mlx5/man/CMakeLists.txt @@ -18,6 +18,7 @@ rdma_man_pages( mlx5dv_open_device.3.md mlx5dv_query_device.3 mlx5dv_ts_to_ns.3 + mlx5dv_wr_post.3.md mlx5dv.7 ) rdma_alias_man_pages( @@ -39,4 +40,6 @@ rdma_alias_man_pages( mlx5dv_devx_qp_modify.3 
mlx5dv_devx_ind_tbl_modify.3 mlx5dv_devx_qp_modify.3 mlx5dv_devx_ind_tbl_query.3 mlx5dv_devx_umem_reg.3 mlx5dv_devx_umem_dereg.3 + mlx5dv_wr_post.3 mlx5dv_wr_set_dc_addr.3 + mlx5dv_wr_post.3 mlx5dv_qp_ex_from_ibv_qp_ex.3 ) diff --git a/providers/mlx5/man/mlx5dv_create_qp.3.md b/providers/mlx5/man/mlx5dv_create_qp.3.md index c21b527..7a93e84 100644 --- a/providers/mlx5/man/mlx5dv_create_qp.3.md +++ b/providers/mlx5/man/mlx5dv_create_qp.3.md @@ -95,6 +95,11 @@ struct mlx5dv_dc_init_attr { : used to create a DCT QP. +# NOTES + +**mlx5dv_qp_ex_from_ibv_qp_ex()** is used to get *struct mlx5dv_qp_ex* for +accessing the send ops interfaces when IBV_QP_INIT_ATTR_SEND_OPS_FLAGS is used. + # RETURN VALUE **mlx5dv_create_qp()** diff --git a/providers/mlx5/man/mlx5dv_wr_post.3.md b/providers/mlx5/man/mlx5dv_wr_post.3.md new file mode 100644 index 0000000..2c17627 --- /dev/null +++ b/providers/mlx5/man/mlx5dv_wr_post.3.md @@ -0,0 +1,94 @@ +--- +date: 2019-02-24 +footer: mlx5 +header: "mlx5 Programmer's Manual" +tagline: Verbs +layout: page +license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md' +section: 3 +title: MLX5DV_WR +--- + +# NAME + +mlx5dv_wr_set_dc_addr - Attach a DC info to the last work request + +# SYNOPSIS + +```c +#include + +static inline void mlx5dv_wr_set_dc_addr(struct mlx5dv_qp_ex *mqp, + struct ibv_ah *ah, + uint32_t remote_dctn, + uint64_t remote_dc_key); +``` + +# DESCRIPTION + +The MLX5DV work request APIs (mlx5dv_wr_\*) is an extension for IBV work +request API (ibv_wr_\*) with mlx5 specific features for send work request. +This may be used together with or without ibv_wr_* calls. + +# USAGE + +To use these APIs a QP must be created using mlx5dv_create_qp() with +*send_ops_flags* of struct ibv_qp_init_attr_ex set. + +If the QP does not support all the requested work request types then QP +creation will fail. + +The mlx5dv_qp_ex is extracted from the IBV_QP by ibv_qp_to_qp_ex() and +mlx5dv_qp_ex_from_ibv_qp_ex(). This should be used to apply the mlx5 specific +features on the posted WR. + +A work request creation requires to use the ibv_qp_ex as described in the +man for ibv_wr_post and mlx5dv_qp with its available builders and setters. + +## QP Specific setters + +*DCI* QPs +: *mlx5dv_wr_set_dc_addr()* must be called to set the DCI WR properties. The + destination address of the work is specified by *ah*, the remote DCT + number is specified by *remote_dctn* and the DC key is specified by + *remote_dc_key*. + This setter is available when the QP transport is DCI and send_ops_flags + in struct ibv_qp_init_attr_ex is set. + The available builders and setters for DCI QP are the same as RC QP. 
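+
+Because the DC address is set per work request, a single DCI may target
+several different DCTs within one ibv_wr_start()/ibv_wr_complete()
+session. A schematic sketch, with illustrative placeholder names
+(ah1/ah2, dctn1/dctn2, key1/key2 and the buffer variables):
+
+```c
+ibv_wr_start(qpx);
+
+qpx->wr_id = 1;
+qpx->wr_flags = IBV_SEND_SIGNALED;
+ibv_wr_rdma_write(qpx, rkey1, raddr1);
+ibv_wr_set_sge(qpx, lkey, laddr1, len1);
+mlx5dv_wr_set_dc_addr(mqpx, ah1, dctn1, key1);
+
+qpx->wr_id = 2;
+ibv_wr_rdma_write(qpx, rkey2, raddr2);
+ibv_wr_set_sge(qpx, lkey, laddr2, len2);
+mlx5dv_wr_set_dc_addr(mqpx, ah2, dctn2, key2);
+
+ret = ibv_wr_complete(qpx);
+```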
+ +# EXAMPLE + +```c +/* create DC QP type and specify the required send opcodes */ +attr_ex.qp_type = IBV_QPT_DRIVER; +attr_ex.comp_mask |= IBV_QP_INIT_ATTR_SEND_OPS_FLAGS; +attr_ex.send_ops_flags |= IBV_QP_EX_WITH_RDMA_WRITE; + +attr_dv.comp_mask |= MLX5DV_QP_INIT_ATTR_MASK_DC; +attr_dv.dc_init_attr.dc_type = MLX5DV_DCTYPE_DCI; + +struct ibv_qp *qp = mlx5dv_create_qp(ctx, &attr_ex, &attr_dv); +struct ibv_qp_ex *qpx = ibv_qp_to_qp_ex(qp); +struct mlx5dv_qp_ex *mqpx = mlx5dv_qp_ex_from_ibv_qp_ex(qpx); + +ibv_wr_start(qpx); + +/* Use ibv_qp_ex object to set WR generic attributes */ +qpx->wr_id = my_wr_id_1; +qpx->wr_flags = IBV_SEND_SIGNALED; +ibv_wr_rdma_write(qpx, rkey, remote_addr_1); +ibv_wr_set_sge(qpx, lkey, local_addr_1, length_1); + +/* Use the mlx5 DC setter through the mlx5dv_qp_ex object */ +mlx5dv_wr_set_dc_addr(mqpx, ah, remote_dctn, remote_dc_key); + +ret = ibv_wr_complete(qpx); +``` + +# SEE ALSO + +**ibv_post_send**(3), **ibv_create_qp_ex**(3), **ibv_wr_post**(3). + +# AUTHOR + +Guy Levi diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h index 3a22fde..c7c54fd 100644 --- a/providers/mlx5/mlx5.h +++ b/providers/mlx5/mlx5.h @@ -503,6 +503,7 @@ enum mlx5_qp_flags { struct mlx5_qp { struct mlx5_resource rsc; /* This struct must be first */ struct verbs_qp verbs_qp; + struct mlx5dv_qp_ex dv_qp; struct ibv_qp *ibv_qp; struct mlx5_buf buf; int max_inline_data; @@ -690,6 +691,11 @@ static inline struct mlx5_qp *to_mqp(struct ibv_qp *ibqp) return container_of(vqp, struct mlx5_qp, verbs_qp); } +static inline struct mlx5_qp *mqp_from_mlx5dv_qp_ex(struct mlx5dv_qp_ex *dv_qp) +{ + return container_of(dv_qp, struct mlx5_qp, dv_qp); +} + static inline struct mlx5_rwq *to_mrwq(struct ibv_wq *ibwq) { return container_of(ibwq, struct mlx5_rwq, wq); @@ -930,7 +936,8 @@ int mlx5_advise_mr(struct ibv_pd *pd, struct ibv_sge *sg_list, uint32_t num_sges); int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, - const struct ibv_qp_init_attr_ex *attr); + const struct ibv_qp_init_attr_ex *attr, + const struct mlx5dv_qp_init_attr *mlx5_attr); static inline void *mlx5_find_uidx(struct mlx5_context *ctx, uint32_t uidx) { diff --git a/providers/mlx5/mlx5dv.h b/providers/mlx5/mlx5dv.h index e2788d8..de4018c 100644 --- a/providers/mlx5/mlx5dv.h +++ b/providers/mlx5/mlx5dv.h @@ -193,6 +193,26 @@ struct ibv_qp *mlx5dv_create_qp(struct ibv_context *context, struct ibv_qp_init_attr_ex *qp_attr, struct mlx5dv_qp_init_attr *mlx5_qp_attr); +struct mlx5dv_qp_ex { + uint64_t comp_mask; + /* + * Available just for the MLX5 DC QP type with send opcodes of type: + * rdma, atomic and send.
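+ * Applications normally reach this through the
+ * mlx5dv_wr_set_dc_addr() inline wrapper declared below rather
+ * than by calling the function pointer directly.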
+ */ + void (*wr_set_dc_addr)(struct mlx5dv_qp_ex *mqp, struct ibv_ah *ah, + uint32_t remote_dctn, uint64_t remote_dc_key); +}; + +struct mlx5dv_qp_ex *mlx5dv_qp_ex_from_ibv_qp_ex(struct ibv_qp_ex *qp); + +static inline void mlx5dv_wr_set_dc_addr(struct mlx5dv_qp_ex *mqp, + struct ibv_ah *ah, + uint32_t remote_dctn, + uint64_t remote_dc_key) +{ + mqp->wr_set_dc_addr(mqp, ah, remote_dctn, remote_dc_key); +} + enum mlx5dv_flow_action_esp_mask { MLX5DV_FLOW_ACTION_ESP_MASK_FLAGS = 1 << 0, }; diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c index f3bce40..b2f749c 100644 --- a/providers/mlx5/qp.c +++ b/providers/mlx5/qp.c @@ -1168,7 +1168,7 @@ int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, } enum { - WQE_REQ_SETTERS_UD_XRC = 2, + WQE_REQ_SETTERS_UD_XRC_DC = 2, }; static void mlx5_send_wr_start(struct ibv_qp_ex *ibqp) @@ -1296,14 +1296,15 @@ static inline void _mlx5_send_wr_send(struct ibv_qp_ex *ibqp, _common_wqe_init(ibqp, ib_op); - if (ibqp->qp_base.qp_type == IBV_QPT_UD) + if (ibqp->qp_base.qp_type == IBV_QPT_UD || + ibqp->qp_base.qp_type == IBV_QPT_DRIVER) transport_seg_sz = sizeof(struct mlx5_wqe_datagram_seg); else if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); mqp->cur_data = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz; - /* In UD, cur_data may overrun the SQ */ + /* In UD/DC cur_data may overrun the SQ */ if (unlikely(mqp->cur_data == mqp->sq.qend)) mqp->cur_data = mlx5_get_send_wqe(mqp, 0); @@ -1435,11 +1436,16 @@ static inline void _mlx5_send_wr_rdma(struct ibv_qp_ex *ibqp, _common_wqe_init(ibqp, ib_op); - if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) + if (ibqp->qp_base.qp_type == IBV_QPT_DRIVER) + transport_seg_sz = sizeof(struct mlx5_wqe_datagram_seg); + else if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); raddr_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz; + /* In DC raddr_seg may overrun the SQ */ + if (unlikely(raddr_seg == mqp->sq.qend)) + raddr_seg = mlx5_get_send_wqe(mqp, 0); set_raddr_seg(raddr_seg, remote_addr, rkey); @@ -1490,11 +1496,16 @@ static inline void _mlx5_send_wr_atomic(struct ibv_qp_ex *ibqp, uint32_t rkey, _common_wqe_init(ibqp, ib_op); - if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) + if (ibqp->qp_base.qp_type == IBV_QPT_DRIVER) + transport_seg_sz = sizeof(struct mlx5_wqe_datagram_seg); + else if (ibqp->qp_base.qp_type == IBV_QPT_XRC_SEND) transport_seg_sz = sizeof(struct mlx5_wqe_xrc_seg); raddr_seg = (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg) + transport_seg_sz; + /* In DC raddr_seg may overrun the SQ */ + if (unlikely(raddr_seg == mqp->sq.qend)) + raddr_seg = mlx5_get_send_wqe(mqp, 0); set_raddr_seg(raddr_seg, remote_addr, rkey); @@ -1608,14 +1619,14 @@ mlx5_send_wr_set_sge_rc_uc(struct ibv_qp_ex *ibqp, uint32_t lkey, } static void -mlx5_send_wr_set_sge_ud_xrc(struct ibv_qp_ex *ibqp, uint32_t lkey, - uint64_t addr, uint32_t length) +mlx5_send_wr_set_sge_ud_xrc_dc(struct ibv_qp_ex *ibqp, uint32_t lkey, + uint64_t addr, uint32_t length) { struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); _mlx5_send_wr_set_sge(mqp, lkey, addr, length); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -1696,14 +1707,14 @@ mlx5_send_wr_set_sge_list_rc_uc(struct ibv_qp_ex *ibqp, size_t num_sge, } static void 
-mlx5_send_wr_set_sge_list_ud_xrc(struct ibv_qp_ex *ibqp, size_t num_sge, - const struct ibv_sge *sg_list) +mlx5_send_wr_set_sge_list_ud_xrc_dc(struct ibv_qp_ex *ibqp, size_t num_sge, + const struct ibv_sge *sg_list) { struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); _mlx5_send_wr_set_sge_list(mqp, num_sge, sg_list); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -1833,14 +1844,14 @@ mlx5_send_wr_set_inline_data_rc_uc(struct ibv_qp_ex *ibqp, void *addr, } static void -mlx5_send_wr_set_inline_data_ud_xrc(struct ibv_qp_ex *ibqp, void *addr, - size_t length) +mlx5_send_wr_set_inline_data_ud_xrc_dc(struct ibv_qp_ex *ibqp, void *addr, + size_t length) { struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); _mlx5_send_wr_set_inline_data(mqp, addr, length); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -1927,15 +1938,15 @@ mlx5_send_wr_set_inline_data_list_rc_uc(struct ibv_qp_ex *ibqp, } static void -mlx5_send_wr_set_inline_data_list_ud_xrc(struct ibv_qp_ex *ibqp, - size_t num_buf, - const struct ibv_data_buf *buf_list) +mlx5_send_wr_set_inline_data_list_ud_xrc_dc(struct ibv_qp_ex *ibqp, + size_t num_buf, + const struct ibv_data_buf *buf_list) { struct mlx5_qp *mqp = to_mqp((struct ibv_qp *)ibqp); _mlx5_send_wr_set_inline_data_list(mqp, num_buf, buf_list); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -2012,7 +2023,7 @@ mlx5_send_wr_set_ud_addr(struct ibv_qp_ex *ibqp, struct ibv_ah *ah, _set_datagram_seg(dseg, &mah->av, remote_qpn, remote_qkey); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -2027,7 +2038,27 @@ mlx5_send_wr_set_xrc_srqn(struct ibv_qp_ex *ibqp, uint32_t remote_srqn) xrc_seg->xrc_srqn = htobe32(remote_srqn); - if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC - 1) + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) + _common_wqe_finilize(mqp); + else + mqp->cur_setters_cnt++; +} + +static void mlx5_send_wr_set_dc_addr(struct mlx5dv_qp_ex *dv_qp, + struct ibv_ah *ah, + uint32_t remote_dctn, + uint64_t remote_dc_key) +{ + struct mlx5_qp *mqp = mqp_from_mlx5dv_qp_ex(dv_qp); + struct mlx5_wqe_datagram_seg *dseg = + (void *)mqp->cur_ctrl + sizeof(struct mlx5_wqe_ctrl_seg); + struct mlx5_ah *mah = to_mah(ah); + + memcpy(&dseg->av, &mah->av, sizeof(dseg->av)); + dseg->av.dqp_dct |= htobe32(remote_dctn | MLX5_EXTENDED_UD_AV); + dseg->av.key.dc_key = htobe64(remote_dc_key); + + if (mqp->cur_setters_cnt == WQE_REQ_SETTERS_UD_XRC_DC - 1) _common_wqe_finilize(mqp); else mqp->cur_setters_cnt++; @@ -2047,6 +2078,8 @@ enum { IBV_QP_EX_WITH_BIND_MW, MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC = MLX5_SUPPORTED_SEND_OPS_FLAGS_RC, + MLX5_SUPPORTED_SEND_OPS_FLAGS_DCI = + MLX5_SUPPORTED_SEND_OPS_FLAGS_RC, MLX5_SUPPORTED_SEND_OPS_FLAGS_UD = IBV_QP_EX_WITH_SEND | IBV_QP_EX_WITH_SEND_WITH_IMM, @@ -2063,7 +2096,7 @@ enum { IBV_QP_EX_WITH_TSO, }; -static void fill_wr_builders_rc_xrc(struct ibv_qp_ex *ibqp) +static void fill_wr_builders_rc_xrc_dc(struct ibv_qp_ex *ibqp) { ibqp->wr_send = mlx5_send_wr_send_other; ibqp->wr_send_imm = mlx5_send_wr_send_imm; @@ -2108,12 +2141,12 @@ static void 
fill_wr_setters_rc_uc(struct ibv_qp_ex *ibqp) ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_rc_uc; } -static void fill_wr_setters_ud_xrc(struct ibv_qp_ex *ibqp) +static void fill_wr_setters_ud_xrc_dc(struct ibv_qp_ex *ibqp) { - ibqp->wr_set_sge = mlx5_send_wr_set_sge_ud_xrc; - ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_ud_xrc; - ibqp->wr_set_inline_data = mlx5_send_wr_set_inline_data_ud_xrc; - ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_ud_xrc; + ibqp->wr_set_sge = mlx5_send_wr_set_sge_ud_xrc_dc; + ibqp->wr_set_sge_list = mlx5_send_wr_set_sge_list_ud_xrc_dc; + ibqp->wr_set_inline_data = mlx5_send_wr_set_inline_data_ud_xrc_dc; + ibqp->wr_set_inline_data_list = mlx5_send_wr_set_inline_data_list_ud_xrc_dc; } static void fill_wr_setters_eth(struct ibv_qp_ex *ibqp) @@ -2125,10 +2158,12 @@ static void fill_wr_setters_eth(struct ibv_qp_ex *ibqp) } int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, - const struct ibv_qp_init_attr_ex *attr) + const struct ibv_qp_init_attr_ex *attr, + const struct mlx5dv_qp_init_attr *mlx5_attr) { struct ibv_qp_ex *ibqp = &mqp->verbs_qp.qp_ex; uint64_t ops = attr->send_ops_flags; + struct mlx5dv_qp_ex *dv_qp; ibqp->wr_start = mlx5_send_wr_start; ibqp->wr_complete = mlx5_send_wr_complete; @@ -2145,7 +2180,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_RC) return EOPNOTSUPP; - fill_wr_builders_rc_xrc(ibqp); + fill_wr_builders_rc_xrc_dc(ibqp); fill_wr_setters_rc_uc(ibqp); break; @@ -2161,8 +2196,8 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_XRC) return EOPNOTSUPP; - fill_wr_builders_rc_xrc(ibqp); - fill_wr_setters_ud_xrc(ibqp); + fill_wr_builders_rc_xrc_dc(ibqp); + fill_wr_setters_ud_xrc_dc(ibqp); ibqp->wr_set_xrc_srqn = mlx5_send_wr_set_xrc_srqn; break; @@ -2174,7 +2209,7 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, return EOPNOTSUPP; fill_wr_builders_ud(ibqp); - fill_wr_setters_ud_xrc(ibqp); + fill_wr_setters_ud_xrc_dc(ibqp); ibqp->wr_set_ud_addr = mlx5_send_wr_set_ud_addr; break; @@ -2186,6 +2221,21 @@ int mlx5_qp_fill_wr_pfns(struct mlx5_qp *mqp, fill_wr_setters_eth(ibqp); break; + case IBV_QPT_DRIVER: + dv_qp = &mqp->dv_qp; + + if (!(mlx5_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_DC && + mlx5_attr->dc_init_attr.dc_type == MLX5DV_DCTYPE_DCI)) + return EOPNOTSUPP; + + if (ops & ~MLX5_SUPPORTED_SEND_OPS_FLAGS_DCI) + return EOPNOTSUPP; + + fill_wr_builders_rc_xrc_dc(ibqp); + fill_wr_setters_ud_xrc_dc(ibqp); + dv_qp->wr_set_dc_addr = mlx5_send_wr_set_dc_addr; + break; + default: return EOPNOTSUPP; } diff --git a/providers/mlx5/verbs.c b/providers/mlx5/verbs.c index 870279e..abbbf5a 100644 --- a/providers/mlx5/verbs.c +++ b/providers/mlx5/verbs.c @@ -1913,7 +1913,17 @@ static struct ibv_qp *create_qp(struct ibv_context *context, qp->atomics_enabled = 1; if (attr->comp_mask & IBV_QP_INIT_ATTR_SEND_OPS_FLAGS) { - ret = mlx5_qp_fill_wr_pfns(qp, attr); + /* + * Scatter2cqe, which is a data-path optimization, is disabled + * since driver DC data-path doesn't support it. 
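+ * Disabling it only forfeits that optimization for small messages;
+ * completions and data delivery still go through the regular
+ * receive path.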
+ */ + if (mlx5_qp_attr && + mlx5_qp_attr->comp_mask & MLX5DV_QP_INIT_ATTR_MASK_DC) { + mlx5_create_flags &= ~MLX5_QP_FLAG_SCATTER_CQE; + scatter_to_cqe_configured = true; + } + + ret = mlx5_qp_fill_wr_pfns(qp, attr, mlx5_qp_attr); if (ret) { errno = ret; mlx5_dbg(fp, MLX5_DBG_QP, "Failed to handle operations flags (errno %d)\n", errno); @@ -2589,6 +2599,11 @@ struct ibv_qp *mlx5dv_create_qp(struct ibv_context *context, return create_qp(context, qp_attr, mlx5_qp_attr); } +struct mlx5dv_qp_ex *mlx5dv_qp_ex_from_ibv_qp_ex(struct ibv_qp_ex *qp) +{ + return &(container_of(qp, struct mlx5_qp, verbs_qp.qp_ex))->dv_qp; +} + int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num) { struct mlx5_srq *msrq = to_msrq(srq);