From patchwork Tue Feb 26 09:18:57 2019
X-Patchwork-Submitter: Gal Pressman
X-Patchwork-Id: 10829747
From: Gal Pressman
To: Jason Gunthorpe, Doug Ledford
CC: Yossi Leybovich, Alexander Matushevsky, Leah Shalev, Dave Goodell,
 Brian Barrett, Sean Hefty, Dennis Dalessandro, Leon Romanovsky,
 Christoph Hellwig, Parav Pandit, Sagi Grimberg, Gal Pressman
Subject: [PATCH rdma-core 1/3] efa: Elastic Fabric Adapter (EFA) userspace RDMA provider
Date: Tue, 26 Feb 2019 11:18:57 +0200
Message-ID:
<1551172739-399-2-git-send-email-galpress@amazon.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1551172739-399-1-git-send-email-galpress@amazon.com>
References: <1551172739-399-1-git-send-email-galpress@amazon.com>

Introduce a provider that exposes EFA devices to user applications via the
verbs interface.

Signed-off-by: Gal Pressman <galpress@amazon.com>
---
 CMakeLists.txt                             |    1 +
 MAINTAINERS                                |    5 +
 README.md                                  |    1 +
 debian/control                             |    9 +-
 debian/copyright                           |    4 +
 kernel-headers/CMakeLists.txt              |    2 +
 kernel-headers/rdma/efa-abi.h              |  129 ++++
 kernel-headers/rdma/rdma_user_ioctl_cmds.h |    1 +
 libibverbs/verbs.h                         |    1 +
 providers/efa/CMakeLists.txt               |    4 +
 providers/efa/efa-abi.h                    |   56 ++
 providers/efa/efa.c                        |  155 ++++
 providers/efa/efa.h                        |  200 +++++
 providers/efa/efa_io_defs.h                |  679 +++++++++++++++++
 providers/efa/verbs.c                      | 1142 ++++++++++++++++++++++++++++
 providers/efa/verbs.h                      |   72 ++
 redhat/rdma-core.spec                      |    3 +
 suse/rdma-core.spec                        |    2 +
 18 files changed, 2463 insertions(+), 3 deletions(-)
 create mode 100644 kernel-headers/rdma/efa-abi.h
 create mode 100644 providers/efa/CMakeLists.txt
 create mode 100644 providers/efa/efa-abi.h
 create mode 100644 providers/efa/efa.c
 create mode 100644 providers/efa/efa.h
 create mode 100644 providers/efa/efa_io_defs.h
 create mode 100644 providers/efa/verbs.c
 create mode 100644 providers/efa/verbs.h

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 5959e257b7fa..d0f42283ff8a 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -579,6 +579,7 @@ if (HAVE_COHERENT_DMA)
 add_subdirectory(providers/bnxt_re)
 add_subdirectory(providers/cxgb3) # NO SPARSE
 add_subdirectory(providers/cxgb4) # NO SPARSE
+add_subdirectory(providers/efa)
 add_subdirectory(providers/hns)
 add_subdirectory(providers/i40iw) # NO SPARSE
 add_subdirectory(providers/mlx4)
diff --git a/MAINTAINERS b/MAINTAINERS
index cc81916ea499..a3baa6630bed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -61,6 +61,11 @@ M: Steve Wise
 S: Supported
 F: providers/cxgb4/
 
+EFA USERSPACE PROVIDER (for efa.ko)
+M: Gal Pressman <galpress@amazon.com>
+S: Supported
+F: providers/efa/
+
 HF1 USERSPACE PROVIDER (for hf1.ko)
 M: Mike Marciniszyn
 M: Dennis Dalessandro
diff --git a/README.md b/README.md
index 9b78c78a15f5..0cda4ee61aa5 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@ is included:
  - qedr.ko
  - rdma_rxe.ko
  - vmw_pvrdma.ko
+ - efa.ko
 
 Additional service daemons are provided for:
  - srp_daemon (ib_srp.ko)
diff --git a/debian/control b/debian/control
index 9fb546ac8999..2f3c82d03ab6 100644
--- a/debian/control
+++ b/debian/control
@@ -64,19 +64,21 @@ Package: ibverbs-providers
 Architecture: linux-any
 Multi-Arch: same
 Depends: ${misc:Depends}, ${shlibs:Depends}
-Provides: libcxgb3-1, libipathverbs1, libmlx4-1, libmlx5-1, libmthca1, libnes1
+Provides: libcxgb3-1, libipathverbs1, libmlx4-1, libmlx5-1, libmthca1, libnes1, libefa-1
 Replaces: libcxgb3-1 (<< 15),
           libipathverbs1 (<< 15),
           libmlx4-1 (<< 15),
           libmlx5-1 (<< 15),
           libmthca1 (<< 15),
-          libnes1 (<< 15)
+          libnes1 (<< 15),
+          libefa-1 (<< 15)
 Breaks: libcxgb3-1 (<< 15),
         libipathverbs1 (<< 15),
         libmlx4-1 (<< 15),
         libmlx5-1 (<< 15),
         libmthca1 (<< 15),
-        libnes1 (<< 15)
+        libnes1 (<< 15),
+        libefa-1 (<< 15)
 Description: User space provider drivers for libibverbs
  libibverbs is a library that allows userspace processes to use RDMA
  "verbs" as described in the InfiniBand Architecture Specification and
@@ -105,6 +107,7 @@ Description: User space provider drivers for
libibverbs
  - qedr: QLogic QL4xxx RoCE HCAs
  - rxe: A software implementation of the RoCE protocol
  - vmw_pvrdma: VMware paravirtual RDMA device
+ - efa: Amazon Elastic Fabric Adapter
 
 Package: ibverbs-utils
 Architecture: linux-any
diff --git a/debian/copyright b/debian/copyright
index da7f27d21e89..e60bf20b1f4f 100644
--- a/debian/copyright
+++ b/debian/copyright
@@ -154,6 +154,10 @@ Files: providers/cxgb3/*
 Copyright: 2003-2016, Chelsio Communications, Inc.
 License: BSD-MIT or GPL-2
 
+Files: providers/efa/*
+Copyright: 2019 Amazon.com, Inc. or its affiliates.
+License: BSD-2-clause or GPL-2
+
 Files: providers/hfi1verbs/*
 Copyright: 2005 PathScale, Inc.
            2006-2009 QLogic Corporation
diff --git a/kernel-headers/CMakeLists.txt b/kernel-headers/CMakeLists.txt
index 16ba165e8d45..89f318f147b2 100644
--- a/kernel-headers/CMakeLists.txt
+++ b/kernel-headers/CMakeLists.txt
@@ -2,6 +2,7 @@ publish_internal_headers(rdma
   rdma/bnxt_re-abi.h
   rdma/cxgb3-abi.h
   rdma/cxgb4-abi.h
+  rdma/efa-abi.h
   rdma/hns-abi.h
   rdma/i40iw-abi.h
   rdma/ib_user_cm.h
@@ -60,6 +61,7 @@ rdma_kernel_provider_abi(
   rdma/bnxt_re-abi.h
   rdma/cxgb3-abi.h
   rdma/cxgb4-abi.h
+  rdma/efa-abi.h
   rdma/hns-abi.h
   rdma/i40iw-abi.h
   rdma/ib_user_verbs.h
diff --git a/kernel-headers/rdma/efa-abi.h b/kernel-headers/rdma/efa-abi.h
new file mode 100644
index 000000000000..a97373cdfbd2
--- /dev/null
+++ b/kernel-headers/rdma/efa-abi.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+ * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef EFA_ABI_USER_H
+#define EFA_ABI_USER_H
+
+#include <linux/types.h>
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define EFA_UVERBS_ABI_VERSION 1
+
+/*
+ * Keep structs aligned to 8 bytes.
+ * Keep reserved fields as arrays of __u8 named reserved_XXX where XXX is the
+ * hex bit offset of the field.
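+ * For example, reserved_30 in struct efa_ibv_alloc_pd_resp below starts at
+ * bit offset 0x30 (48), immediately after the 32-bit comp_mask and the
+ * 16-bit pdn fields.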
+ */ + +enum efa_ibv_user_cmds_supp_udata { + EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE = 1 << 0, + EFA_USER_CMDS_SUPP_UDATA_CREATE_AH = 1 << 1, +}; + +struct efa_ibv_alloc_ucontext_resp { + __u32 comp_mask; + __u32 cmds_supp_udata_mask; + __u16 sub_cqs_per_cq; + __u16 inline_buf_size; + __u32 max_llq_size; /* bytes */ +}; + +struct efa_ibv_alloc_pd_resp { + __u32 comp_mask; + __u16 pdn; + __u8 reserved_30[0x2]; +}; + +struct efa_ibv_create_cq { + __u32 comp_mask; + __u32 cq_entry_size; + __u16 num_sub_cqs; + __u8 reserved_50[0x6]; +}; + +struct efa_ibv_create_cq_resp { + __u32 comp_mask; + __u8 reserved_20[0x4]; + __aligned_u64 q_mmap_key; + __aligned_u64 q_mmap_size; + __u16 cq_idx; + __u8 reserved_d0[0x6]; +}; + +enum { + EFA_QP_DRIVER_TYPE_SRD = 0, +}; + +struct efa_ibv_create_qp { + __u32 comp_mask; + __u32 rq_ring_size; /* bytes */ + __u32 sq_ring_size; /* bytes */ + __u32 driver_qp_type; +}; + +struct efa_ibv_create_qp_resp { + __u32 comp_mask; + /* the offset inside the page of the rq db */ + __u32 rq_db_offset; + /* the offset inside the page of the sq db */ + __u32 sq_db_offset; + /* the offset inside the page of descriptors buffer */ + __u32 llq_desc_offset; + __aligned_u64 rq_mmap_key; + __aligned_u64 rq_mmap_size; + __aligned_u64 rq_db_mmap_key; + __aligned_u64 sq_db_mmap_key; + __aligned_u64 llq_desc_mmap_key; + __u16 send_sub_cq_idx; + __u16 recv_sub_cq_idx; + __u8 reserved_1e0[0x4]; +}; + +struct efa_ibv_create_ah_resp { + __u32 comp_mask; + __u16 efa_address_handle; + __u8 reserved_30[0x2]; +}; + +struct efa_ibv_ex_query_device_resp { + __u32 comp_mask; + __u32 max_sq_wr; + __u32 max_rq_wr; + __u16 max_sq_sge; + __u16 max_rq_sge; +}; + +#endif /* EFA_ABI_USER_H */ diff --git a/kernel-headers/rdma/rdma_user_ioctl_cmds.h b/kernel-headers/rdma/rdma_user_ioctl_cmds.h index 06c34d99be85..26213f49f5c8 100644 --- a/kernel-headers/rdma/rdma_user_ioctl_cmds.h +++ b/kernel-headers/rdma/rdma_user_ioctl_cmds.h @@ -102,6 +102,7 @@ enum rdma_driver_id { RDMA_DRIVER_RXE, RDMA_DRIVER_HFI1, RDMA_DRIVER_QIB, + RDMA_DRIVER_EFA, }; #endif diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h index 4cc87202b881..50c671daf607 100644 --- a/libibverbs/verbs.h +++ b/libibverbs/verbs.h @@ -1960,6 +1960,7 @@ extern const struct verbs_device_ops verbs_provider_ocrdma; extern const struct verbs_device_ops verbs_provider_qedr; extern const struct verbs_device_ops verbs_provider_rxe; extern const struct verbs_device_ops verbs_provider_vmw_pvrdma; +extern const struct verbs_device_ops verbs_provider_efa; extern const struct verbs_device_ops verbs_provider_all; extern const struct verbs_device_ops verbs_provider_none; void ibv_static_providers(void *unused, ...); diff --git a/providers/efa/CMakeLists.txt b/providers/efa/CMakeLists.txt new file mode 100644 index 000000000000..1fb6acaa4ecb --- /dev/null +++ b/providers/efa/CMakeLists.txt @@ -0,0 +1,4 @@ +rdma_provider(efa + efa.c + verbs.c +) diff --git a/providers/efa/efa-abi.h b/providers/efa/efa-abi.h new file mode 100644 index 000000000000..4ebf228263d6 --- /dev/null +++ b/providers/efa/efa-abi.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ +/* + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __EFA_ABI_H__
+#define __EFA_ABI_H__
+
+#include <infiniband/kern-abi.h>
+#include <rdma/efa-abi.h>
+#include <kernel-abi/efa-abi.h>
+
+#define EFA_ABI_VERSION 1
+
+DECLARE_DRV_CMD(efa_alloc_ucontext, IB_USER_VERBS_CMD_GET_CONTEXT, empty,
+		efa_ibv_alloc_ucontext_resp);
+DECLARE_DRV_CMD(efa_alloc_pd, IB_USER_VERBS_CMD_ALLOC_PD, empty,
+		efa_ibv_alloc_pd_resp);
+DECLARE_DRV_CMD(efa_create_cq, IB_USER_VERBS_CMD_CREATE_CQ, efa_ibv_create_cq,
+		efa_ibv_create_cq_resp);
+DECLARE_DRV_CMD(efa_create_qp, IB_USER_VERBS_CMD_CREATE_QP, efa_ibv_create_qp,
+		efa_ibv_create_qp_resp);
+DECLARE_DRV_CMD(efa_create_ah, IB_USER_VERBS_CMD_CREATE_AH, empty,
+		efa_ibv_create_ah_resp);
+DECLARE_DRV_CMD(efa_query_device_ex, IB_USER_VERBS_EX_CMD_QUERY_DEVICE, empty,
+		efa_ibv_ex_query_device_resp);
+
+#endif /* __EFA_ABI_H__ */
diff --git a/providers/efa/efa.c b/providers/efa/efa.c
new file mode 100644
index 000000000000..d2287bda4467
--- /dev/null
+++ b/providers/efa/efa.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include "efa.h" +#include "verbs.h" + +#define PCI_VENDOR_ID_AMAZON 0x1d0f + +static const struct verbs_match_ent efa_table[] = { + VERBS_PCI_MATCH(PCI_VENDOR_ID_AMAZON, 0xefa0, NULL), +}; + +static const struct verbs_context_ops efa_ctx_ops = { + .alloc_pd = efa_alloc_pd, + .create_ah = efa_create_ah, + .create_cq = efa_create_cq, + .create_qp = efa_create_qp, + .dealloc_pd = efa_dealloc_pd, + .dereg_mr = efa_dereg_mr, + .destroy_ah = efa_destroy_ah, + .destroy_cq = efa_destroy_cq, + .destroy_qp = efa_destroy_qp, + .modify_qp = efa_modify_qp, + .poll_cq = efa_poll_cq, + .post_recv = efa_post_recv, + .post_send = efa_post_send, + .query_device = efa_query_device, + .query_device_ex = efa_query_device_ex, + .query_port = efa_query_port, + .query_qp = efa_query_qp, + .reg_mr = efa_reg_mr, +}; + +static struct verbs_context *efa_alloc_context(struct ibv_device *vdev, + int cmd_fd, + void *private_data) +{ + struct efa_alloc_ucontext_resp resp; + struct ibv_device_attr_ex attr; + struct ibv_get_context cmd; + struct efa_context *ctx; + int err; + + ctx = verbs_init_and_alloc_context(vdev, cmd_fd, ctx, ibvctx, + RDMA_DRIVER_EFA); + if (!ctx) + return NULL; + + memset(&resp, 0, sizeof(resp)); + if (ibv_cmd_get_context(&ctx->ibvctx, &cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp))) + goto failed; + + ctx->sub_cqs_per_cq = resp.sub_cqs_per_cq; + ctx->cmds_supp_udata_mask = resp.cmds_supp_udata_mask; + ctx->cqe_size = sizeof(struct efa_io_rx_cdesc); + ctx->inline_buf_size = resp.inline_buf_size; + ctx->max_llq_size = resp.max_llq_size; + pthread_spin_init(&ctx->qp_table_lock, PTHREAD_PROCESS_PRIVATE); + + verbs_set_ops(&ctx->ibvctx, &efa_ctx_ops); + + err = efa_query_device_ex(&ctx->ibvctx.context, NULL, &attr, + sizeof(attr)); + if (err) + goto failed; + + ctx->qp_table = calloc(attr.orig_attr.max_qp, sizeof(*ctx->qp_table)); + if (!ctx->qp_table) + goto failed; + + return &ctx->ibvctx; + +failed: + verbs_uninit_context(&ctx->ibvctx); + free(ctx); + return NULL; +} + +static void efa_free_context(struct ibv_context *ibvctx) +{ + struct efa_context *ctx = to_efa_context(ibvctx); + + pthread_spin_destroy(&ctx->qp_table_lock); + verbs_uninit_context(&ctx->ibvctx); + free(ctx); +} + +static struct verbs_device *efa_device_alloc(struct verbs_sysfs_dev *sysfs_dev) +{ + struct efa_dev *dev; + + dev = calloc(1, sizeof(*dev)); + if (!dev) + return NULL; + + dev->pg_sz = sysconf(_SC_PAGESIZE); + + return &dev->vdev; +} + +static void efa_uninit_device(struct verbs_device *verbs_device) +{ + struct efa_dev *dev = to_efa_dev(&verbs_device->device); + + free(dev); +} + +static const struct verbs_device_ops efa_dev_ops = { + .name = "efa", + .match_min_abi_version = EFA_ABI_VERSION, + .match_max_abi_version = EFA_ABI_VERSION, + .match_table = efa_table, + .alloc_device = efa_device_alloc, + .uninit_device = efa_uninit_device, + .alloc_context = efa_alloc_context, + .free_context = efa_free_context, +}; +PROVIDER_DRIVER(efa, efa_dev_ops); diff --git a/providers/efa/efa.h b/providers/efa/efa.h new file mode 100644 index 000000000000..dfd4eb7a2b18 --- /dev/null +++ b/providers/efa/efa.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ +/* + * Copyright 2019 
Amazon.com, Inc. or its affiliates. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __EFA_H__ +#define __EFA_H__ + +#include +#include +#include + +#include +#include + +#include "efa-abi.h" +#include "efa_io_defs.h" + +static inline unsigned long align(unsigned long val, unsigned long align) +{ + return (val + align - 1) & ~(align - 1); +} + +static inline uint32_t align_up_queue_size(uint32_t req) +{ + req--; + req |= req >> 1; + req |= req >> 2; + req |= req >> 4; + req |= req >> 8; + req |= req >> 16; + req++; + return req; +} + +struct efa_context { + struct verbs_context ibvctx; + uint32_t cmds_supp_udata_mask; + uint16_t sub_cqs_per_cq; + uint16_t inline_buf_size; + uint32_t max_llq_size; + size_t cqe_size; + struct efa_qp **qp_table; + pthread_spinlock_t qp_table_lock; +}; + +struct efa_pd { + struct ibv_pd ibvpd; + struct efa_context *context; + uint16_t pdn; +}; + +struct efa_sub_cq { + uint16_t consumed_cnt; + int phase; + uint8_t *buf; + int qmask; + int cqe_size; + uint32_t ref_cnt; +}; + +struct efa_cq { + struct ibv_cq ibvcq; + uint32_t cqn; + size_t cqe_size; + uint8_t *buf; + size_t buf_size; + struct efa_sub_cq *sub_cq_arr; + uint16_t num_sub_cqs; + /* Index of next sub cq idx to poll. This is used to guarantee fairness for sub cqs */ + uint16_t next_poll_idx; + pthread_spinlock_t lock; +}; + +struct efa_wq { + uint64_t *wrid; + /* wrid_idx_pool: Pool of free indexes in the wrid array, used to select the + * wrid entry to be used to hold the next tx packet's context. + * At init time, entry N will hold value N, as OOO tx-completions arrive, + * the value stored in a given entry might not equal the entry's index. + */ + uint32_t *wrid_idx_pool; + uint32_t wqe_cnt; + uint32_t wqe_posted; + uint32_t wqe_completed; + uint16_t desc_idx; + uint16_t desc_mask; + /* wrid_idx_pool_next: Index of the next entry to use in wrid_idx_pool. 
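+ * Posting a WR consumes the index stored at this position; polling a
+ * completion rewinds it and returns the completed wrid index to the pool
+ * (see efa_poll_sub_cq() in verbs.c).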
*/ + uint16_t wrid_idx_pool_next; + int max_sge; + int phase; + pthread_spinlock_t wqlock; +}; + +struct efa_rq { + struct efa_wq wq; + uint32_t *db; + uint8_t *buf; + size_t buf_size; + uint16_t sub_cq_idx; +}; + +struct efa_sq { + struct efa_wq wq; + uint32_t *db; + uint8_t *desc; + uint32_t desc_offset; + size_t desc_ring_mmap_size; + size_t max_inline_data; + uint16_t sub_cq_idx; +}; + +struct efa_qp { + struct ibv_qp ibvqp; + struct efa_sq sq; + struct efa_rq rq; + int page_size; + struct efa_cq *rcq; + struct efa_cq *scq; + int sq_sig_all; +}; + +struct efa_mr { + struct verbs_mr vmr; +}; + +struct efa_ah { + struct ibv_ah ibvah; + uint16_t efa_ah; +}; + +struct efa_dev { + struct verbs_device vdev; + uint8_t abi_version; + uint32_t pg_sz; + uint32_t max_sq_wr; + uint32_t max_rq_wr; + uint16_t max_sq_sge; + uint16_t max_rq_sge; +}; + +static inline struct efa_dev *to_efa_dev(struct ibv_device *ibvdev) +{ + return container_of(ibvdev, struct efa_dev, vdev.device); +} + +static inline struct efa_context *to_efa_context(struct ibv_context *ibvctx) +{ + return container_of(ibvctx, struct efa_context, ibvctx.context); +} + +static inline struct efa_pd *to_efa_pd(struct ibv_pd *ibvpd) +{ + return container_of(ibvpd, struct efa_pd, ibvpd); +} + +static inline struct efa_cq *to_efa_cq(struct ibv_cq *ibvcq) +{ + return container_of(ibvcq, struct efa_cq, ibvcq); +} + +static inline struct efa_qp *to_efa_qp(struct ibv_qp *ibvqp) +{ + return container_of(ibvqp, struct efa_qp, ibvqp); +} + +static inline struct efa_ah *to_efa_ah(struct ibv_ah *ibvah) +{ + return container_of(ibvah, struct efa_ah, ibvah); +} + +#endif /* __EFA_H__ */ diff --git a/providers/efa/efa_io_defs.h b/providers/efa/efa_io_defs.h new file mode 100644 index 000000000000..90b115ccfa48 --- /dev/null +++ b/providers/efa/efa_io_defs.h @@ -0,0 +1,679 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */ +/* + * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef _EFA_IO_H_ +#define _EFA_IO_H_ + +#define BIT(nr) (1UL << (nr)) +#define GENMASK(h, l) (((1U << ((h) - (l) + 1)) - 1) << (l)) + +#define EFA_IO_TX_DESC_NUM_BUFS 2 +#define EFA_IO_TX_DESC_INLINE_MAX_SIZE 32 +#define EFA_IO_TX_DESC_IMM_DATA_SIZE 4 + +enum efa_io_queue_type { + /* send queue (of a QP) */ + EFA_IO_SEND_QUEUE = 1, + /* recv queue (of a QP) */ + EFA_IO_RECV_QUEUE = 2, +}; + +enum efa_io_send_op_type { + /* invalid op */ + EFA_IO_INVALID_OP = 0, + /* send message */ + EFA_IO_SEND = 1, + /* RDMA read, future, not supported yet */ + EFA_IO_RDMA_READ = 2, + /* RDMA write, future, not supported yet */ + EFA_IO_RDMA_WRITE = 3, +}; + +enum efa_io_comp_status { + /* Successful completion */ + EFA_IO_COMP_STATUS_OK = 0, + /* Flushed during QP destroy */ + EFA_IO_COMP_STATUS_FLUSHED = 1, + /* Internal QP error */ + EFA_IO_COMP_STATUS_LOCAL_ERROR_QP_INTERNAL_ERROR = 2, + /* Bad operation type */ + EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_OP_TYPE = 3, + /* Bad AH */ + EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_AH = 4, + /* LKEY not registered or does not match IOVA */ + EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_LKEY = 5, + /* Message too long */ + EFA_IO_COMP_STATUS_LOCAL_ERROR_BAD_LENGTH = 6, + /* Destination ENI is down or does not run EFA */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_ADDRESS = 7, + /* Connection was reset by remote side */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_ABORT = 8, + /* Bad dest QP number (QP does not exist or is in error state) */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_DEST_QPN = 9, + /* Destination resource not ready (no WQEs posted on RQ) */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_RNR = 10, + /* Receiver SGL too short */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_LENGTH = 11, + /* Unexpected status returned by responder */ + EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_STATUS = 12, +}; + +/* Tx Meta descriptor. */ +struct efa_io_tx_meta_desc { + /* Verbs-generated Request ID */ + uint16_t req_id; + + /* + * control flags + * 3:0 : op_type - operation type: send/rdma/fast mem + * ops/etc + * 4 : has_imm - immediate_data field carries valid + * data. + * 5 : inline_msg - inline mode - inline message data + * follows this descriptor (no buffer descriptors). + * Note that it is different from immediate data + * 6 : meta_extension - Extended metadata. MBZ + * 7 : meta_desc - Indicates metadata descriptor. + * Must be set. + */ + uint8_t ctrl1; + + /* + * control flags + * 0 : phase - phase bit. + * 1 : reserved25 - MBZ + * 2 : first - Indicates first descriptor in + * transaction. Must be set. + * 3 : last - Indicates last descriptor in + * transaction. Must be set. + * 4 : comp_req - Indicates whether completion should + * be posted, after packet is transmitted. Valid only + * for the first descriptor + * 7:5 : reserved29 - MBZ + */ + uint8_t ctrl2; + + /* destination QP number */ + uint16_t dest_qp_num; + + /* + * If inline_msg bit is set, length of inline message in bytes, + * otherwise length of SGL (number of buffers). + */ + uint16_t len; + + /* + * immediate data: if has_imm is set, then this field is included + * within Tx message and reported in remote Rx completion. + */ + uint32_t immediate_data; + + /* Address handle */ + uint16_t ah; + + uint16_t reserved; +}; + +/* + * Tx buffer descriptor, for any transport type. Preceded by metadata + * descriptor. 
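+ * A complete TX WQE is one efa_io_tx_meta_desc followed by either up to
+ * EFA_IO_TX_DESC_NUM_BUFS of these buffer descriptors or by up to
+ * EFA_IO_TX_DESC_INLINE_MAX_SIZE bytes of inline data (see struct
+ * efa_io_tx_wqe below).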
+ */ +struct efa_io_tx_buf_desc { + /* length in bytes */ + uint16_t length; + + /* + * control flags + * 6:0 : reserved16 + * 7 : meta_desc - MBZ + */ + uint8_t ctrl1; + + /* + * control flags + * 0 : phase - phase bit + * 1 : reserved25 - MBZ + * 2 : first - Indicates first descriptor in + * transaction. MBZ + * 3 : last - Indicates last descriptor in transaction + * 7:4 : reserved28 - MBZ + */ + uint8_t ctrl; + + /* memory translation key */ + uint32_t lkey; + + /* Buffer address bits[31:0] */ + uint32_t buf_addr_lo; + + /* + * 15:0 : buf_addr_hi - Buffer Pointer[47:32] + * 31:16 : reserved - Reserved + */ + uint32_t buf_addr_hi; +}; + +/* Tx meta descriptor for UD */ +struct efa_io_tx_ud_meta { + /* Queue key */ + uint32_t qkey; + + uint8_t reserved[12]; +}; + +/* Remote memory address */ +struct efa_io_remote_mem_addr { + /* length in bytes */ + uint16_t length; + + /* + * control flags + * 5:0 : reserved16 + * 6 : meta_extension - Must be set + * 7 : meta_desc - Must be set + */ + uint8_t ctrl1; + + /* + * control flags + * 0 : phase - phase bit + * 1 : reserved25 - MBZ + * 2 : first - Indicates first descriptor in + * transaction. MBZ + * 3 : last - Indicates last descriptor in transaction + * 7:4 : reserved28 - MBZ + */ + uint8_t ctrl; + + /* remote memory translation key */ + uint32_t rkey; + + /* Buffer address bits[31:0] */ + uint32_t buf_addr_lo; + + /* Buffer address bits[63:32] */ + uint32_t buf_addr_hi; +}; + +/* + * Tx WQE, composed of tx meta descriptors followed by either tx buffer + * descriptors or inline data + */ +struct efa_io_tx_wqe { + /* TX meta */ + struct efa_io_tx_meta_desc common; + + union { + /* Tx meta for UD */ + struct efa_io_tx_ud_meta ud; + + /* Reserved Tx meta for SRD */ + uint8_t srd_padding[16]; + + /* RDMA memory address */ + struct efa_io_remote_mem_addr rdma_mem_addr; + } u; + + union { + /* buffer descriptors */ + struct efa_io_tx_buf_desc sgl[2]; + + /* inline data */ + uint8_t inline_data[32]; + } data; +}; + +/* + * Rx buffer descriptor; RX WQE is composed of one or more RX buffer + * descriptors. + */ +struct efa_io_rx_desc { + /* Buffer address bits[31:0] */ + uint32_t buf_addr_lo; + + /* Buffer Pointer[63:32] */ + uint32_t buf_addr_hi; + + /* Verbs-generated request id. */ + uint16_t req_id; + + /* Length in bytes. */ + uint16_t length; + + /* + * LKey and control flags + * 23:0 : lkey + * 29:24 : reserved - MBZ + * 30 : first - Indicates first descriptor in WQE + * 31 : last - Indicates last descriptor in WQE + */ + uint32_t lkey_ctrl; +}; + +/* Common IO completion descriptor */ +struct efa_io_cdesc_common { + /* + * verbs-generated request ID, as provided in the completed tx or rx + * descriptor. 
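+ * The provider uses this value to look up the application's wr_id in the
+ * owning work queue's wrid array when the completion is polled.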
+ */ + uint16_t req_id; + + /* status */ + uint8_t status; + + /* + * flags + * 0 : phase - Phase bit + * 2:1 : q_type - enum efa_io_queue_type: send/recv + * 3 : has_imm - indicates that immediate data is + * present - for RX completions only + * 4 : wide_completion - indicates that wide + * completion format is used + * 7:5 : reserved29 + */ + uint8_t flags; + + /* local QP number */ + uint16_t qp_num; + + /* Transferred length */ + uint16_t length; +}; + +/* Tx completion descriptor */ +struct efa_io_tx_cdesc { + /* Common completion info */ + struct efa_io_cdesc_common common; +}; + +/* Rx Completion Descriptor */ +struct efa_io_rx_cdesc { + /* Common completion info */ + struct efa_io_cdesc_common common; + + /* Remote Address Handle FW index, 0xFFFF indicates invalid ah */ + uint16_t ah; + + /* Source QP number */ + uint16_t src_qp_num; + + /* Immediate data */ + uint32_t imm; +}; + +/* Extended Rx Completion Descriptor */ +struct efa_io_rx_cdesc_wide { + /* Base RX completion info */ + struct efa_io_rx_cdesc rx_cdesc_base; + + /* + * Word 0 of remote (source) address, needed only for in-band + * ad-hoc AH support + */ + uint32_t src_addr_0; + + /* + * Word 1 of remote (source) address, needed only for in-band + * ad-hoc AH support + */ + uint32_t src_addr_1; + + /* + * Word 2 of remote (source) address, needed only for in-band + * ad-hoc AH support + */ + uint32_t src_addr_2; + + /* + * Word 3 of remote (source) address, needed only for in-band + * ad-hoc AH support + */ + uint32_t src_addr_3; +}; + +/* tx_meta_desc */ +#define EFA_IO_TX_META_DESC_OP_TYPE_MASK GENMASK(3, 0) +#define EFA_IO_TX_META_DESC_HAS_IMM_SHIFT 4 +#define EFA_IO_TX_META_DESC_HAS_IMM_MASK BIT(4) +#define EFA_IO_TX_META_DESC_INLINE_MSG_SHIFT 5 +#define EFA_IO_TX_META_DESC_INLINE_MSG_MASK BIT(5) +#define EFA_IO_TX_META_DESC_META_EXTENSION_SHIFT 6 +#define EFA_IO_TX_META_DESC_META_EXTENSION_MASK BIT(6) +#define EFA_IO_TX_META_DESC_META_DESC_SHIFT 7 +#define EFA_IO_TX_META_DESC_META_DESC_MASK BIT(7) +#define EFA_IO_TX_META_DESC_PHASE_MASK BIT(0) +#define EFA_IO_TX_META_DESC_FIRST_SHIFT 2 +#define EFA_IO_TX_META_DESC_FIRST_MASK BIT(2) +#define EFA_IO_TX_META_DESC_LAST_SHIFT 3 +#define EFA_IO_TX_META_DESC_LAST_MASK BIT(3) +#define EFA_IO_TX_META_DESC_COMP_REQ_SHIFT 4 +#define EFA_IO_TX_META_DESC_COMP_REQ_MASK BIT(4) + +/* tx_buf_desc */ +#define EFA_IO_TX_BUF_DESC_META_DESC_SHIFT 7 +#define EFA_IO_TX_BUF_DESC_META_DESC_MASK BIT(7) +#define EFA_IO_TX_BUF_DESC_PHASE_MASK BIT(0) +#define EFA_IO_TX_BUF_DESC_FIRST_SHIFT 2 +#define EFA_IO_TX_BUF_DESC_FIRST_MASK BIT(2) +#define EFA_IO_TX_BUF_DESC_LAST_SHIFT 3 +#define EFA_IO_TX_BUF_DESC_LAST_MASK BIT(3) +#define EFA_IO_TX_BUF_DESC_BUF_ADDR_HI_MASK GENMASK(15, 0) + +/* remote_mem_addr */ +#define EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_SHIFT 6 +#define EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_MASK BIT(6) +#define EFA_IO_REMOTE_MEM_ADDR_META_DESC_SHIFT 7 +#define EFA_IO_REMOTE_MEM_ADDR_META_DESC_MASK BIT(7) +#define EFA_IO_REMOTE_MEM_ADDR_PHASE_MASK BIT(0) +#define EFA_IO_REMOTE_MEM_ADDR_FIRST_SHIFT 2 +#define EFA_IO_REMOTE_MEM_ADDR_FIRST_MASK BIT(2) +#define EFA_IO_REMOTE_MEM_ADDR_LAST_SHIFT 3 +#define EFA_IO_REMOTE_MEM_ADDR_LAST_MASK BIT(3) + +/* rx_desc */ +#define EFA_IO_RX_DESC_LKEY_MASK GENMASK(23, 0) +#define EFA_IO_RX_DESC_FIRST_SHIFT 30 +#define EFA_IO_RX_DESC_FIRST_MASK BIT(30) +#define EFA_IO_RX_DESC_LAST_SHIFT 31 +#define EFA_IO_RX_DESC_LAST_MASK BIT(31) + +/* cdesc_common */ +#define EFA_IO_CDESC_COMMON_PHASE_MASK BIT(0) +#define EFA_IO_CDESC_COMMON_Q_TYPE_SHIFT 1 
+#define EFA_IO_CDESC_COMMON_Q_TYPE_MASK GENMASK(2, 1) +#define EFA_IO_CDESC_COMMON_HAS_IMM_SHIFT 3 +#define EFA_IO_CDESC_COMMON_HAS_IMM_MASK BIT(3) +#define EFA_IO_CDESC_COMMON_WIDE_COMPLETION_SHIFT 4 +#define EFA_IO_CDESC_COMMON_WIDE_COMPLETION_MASK BIT(4) + +static inline uint8_t get_efa_io_tx_meta_desc_op_type(const struct efa_io_tx_meta_desc *p) +{ + return p->ctrl1 & EFA_IO_TX_META_DESC_OP_TYPE_MASK; +} + +static inline void set_efa_io_tx_meta_desc_op_type(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl1 |= val & EFA_IO_TX_META_DESC_OP_TYPE_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_has_imm(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl1 & EFA_IO_TX_META_DESC_HAS_IMM_MASK) >> EFA_IO_TX_META_DESC_HAS_IMM_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_has_imm(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_TX_META_DESC_HAS_IMM_SHIFT) & EFA_IO_TX_META_DESC_HAS_IMM_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_inline_msg(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl1 & EFA_IO_TX_META_DESC_INLINE_MSG_MASK) >> EFA_IO_TX_META_DESC_INLINE_MSG_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_inline_msg(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_TX_META_DESC_INLINE_MSG_SHIFT) & EFA_IO_TX_META_DESC_INLINE_MSG_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_meta_extension(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl1 & EFA_IO_TX_META_DESC_META_EXTENSION_MASK) >> EFA_IO_TX_META_DESC_META_EXTENSION_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_meta_extension(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_TX_META_DESC_META_EXTENSION_SHIFT) & EFA_IO_TX_META_DESC_META_EXTENSION_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_meta_desc(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl1 & EFA_IO_TX_META_DESC_META_DESC_MASK) >> EFA_IO_TX_META_DESC_META_DESC_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_meta_desc(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_TX_META_DESC_META_DESC_SHIFT) & EFA_IO_TX_META_DESC_META_DESC_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_phase(const struct efa_io_tx_meta_desc *p) +{ + return p->ctrl2 & EFA_IO_TX_META_DESC_PHASE_MASK; +} + +static inline void set_efa_io_tx_meta_desc_phase(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl2 |= val & EFA_IO_TX_META_DESC_PHASE_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_first(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl2 & EFA_IO_TX_META_DESC_FIRST_MASK) >> EFA_IO_TX_META_DESC_FIRST_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_first(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl2 |= (val << EFA_IO_TX_META_DESC_FIRST_SHIFT) & EFA_IO_TX_META_DESC_FIRST_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_last(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl2 & EFA_IO_TX_META_DESC_LAST_MASK) >> EFA_IO_TX_META_DESC_LAST_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_last(struct efa_io_tx_meta_desc *p, uint8_t val) +{ + p->ctrl2 |= (val << EFA_IO_TX_META_DESC_LAST_SHIFT) & EFA_IO_TX_META_DESC_LAST_MASK; +} + +static inline uint8_t get_efa_io_tx_meta_desc_comp_req(const struct efa_io_tx_meta_desc *p) +{ + return (p->ctrl2 & EFA_IO_TX_META_DESC_COMP_REQ_MASK) >> EFA_IO_TX_META_DESC_COMP_REQ_SHIFT; +} + +static inline void set_efa_io_tx_meta_desc_comp_req(struct efa_io_tx_meta_desc *p, uint8_t val) +{ 
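+	/* Set-only helper, like the other setters in this header: it ORs the
+	 * masked value into ctrl2 and never clears previously set bits.
+	 */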
+ p->ctrl2 |= (val << EFA_IO_TX_META_DESC_COMP_REQ_SHIFT) & EFA_IO_TX_META_DESC_COMP_REQ_MASK; +} + +static inline uint8_t get_efa_io_tx_buf_desc_meta_desc(const struct efa_io_tx_buf_desc *p) +{ + return (p->ctrl1 & EFA_IO_TX_BUF_DESC_META_DESC_MASK) >> EFA_IO_TX_BUF_DESC_META_DESC_SHIFT; +} + +static inline void set_efa_io_tx_buf_desc_meta_desc(struct efa_io_tx_buf_desc *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_TX_BUF_DESC_META_DESC_SHIFT) & EFA_IO_TX_BUF_DESC_META_DESC_MASK; +} + +static inline uint8_t get_efa_io_tx_buf_desc_phase(const struct efa_io_tx_buf_desc *p) +{ + return p->ctrl & EFA_IO_TX_BUF_DESC_PHASE_MASK; +} + +static inline void set_efa_io_tx_buf_desc_phase(struct efa_io_tx_buf_desc *p, uint8_t val) +{ + p->ctrl |= val & EFA_IO_TX_BUF_DESC_PHASE_MASK; +} + +static inline uint8_t get_efa_io_tx_buf_desc_first(const struct efa_io_tx_buf_desc *p) +{ + return (p->ctrl & EFA_IO_TX_BUF_DESC_FIRST_MASK) >> EFA_IO_TX_BUF_DESC_FIRST_SHIFT; +} + +static inline void set_efa_io_tx_buf_desc_first(struct efa_io_tx_buf_desc *p, uint8_t val) +{ + p->ctrl |= (val << EFA_IO_TX_BUF_DESC_FIRST_SHIFT) & EFA_IO_TX_BUF_DESC_FIRST_MASK; +} + +static inline uint8_t get_efa_io_tx_buf_desc_last(const struct efa_io_tx_buf_desc *p) +{ + return (p->ctrl & EFA_IO_TX_BUF_DESC_LAST_MASK) >> EFA_IO_TX_BUF_DESC_LAST_SHIFT; +} + +static inline void set_efa_io_tx_buf_desc_last(struct efa_io_tx_buf_desc *p, uint8_t val) +{ + p->ctrl |= (val << EFA_IO_TX_BUF_DESC_LAST_SHIFT) & EFA_IO_TX_BUF_DESC_LAST_MASK; +} + +static inline uint32_t get_efa_io_tx_buf_desc_buf_addr_hi(const struct efa_io_tx_buf_desc *p) +{ + return p->buf_addr_hi & EFA_IO_TX_BUF_DESC_BUF_ADDR_HI_MASK; +} + +static inline void set_efa_io_tx_buf_desc_buf_addr_hi(struct efa_io_tx_buf_desc *p, uint32_t val) +{ + p->buf_addr_hi |= val & EFA_IO_TX_BUF_DESC_BUF_ADDR_HI_MASK; +} + +static inline uint8_t get_efa_io_remote_mem_addr_meta_extension(const struct efa_io_remote_mem_addr *p) +{ + return (p->ctrl1 & EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_MASK) >> EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_SHIFT; +} + +static inline void set_efa_io_remote_mem_addr_meta_extension(struct efa_io_remote_mem_addr *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_SHIFT) & EFA_IO_REMOTE_MEM_ADDR_META_EXTENSION_MASK; +} + +static inline uint8_t get_efa_io_remote_mem_addr_meta_desc(const struct efa_io_remote_mem_addr *p) +{ + return (p->ctrl1 & EFA_IO_REMOTE_MEM_ADDR_META_DESC_MASK) >> EFA_IO_REMOTE_MEM_ADDR_META_DESC_SHIFT; +} + +static inline void set_efa_io_remote_mem_addr_meta_desc(struct efa_io_remote_mem_addr *p, uint8_t val) +{ + p->ctrl1 |= (val << EFA_IO_REMOTE_MEM_ADDR_META_DESC_SHIFT) & EFA_IO_REMOTE_MEM_ADDR_META_DESC_MASK; +} + +static inline uint8_t get_efa_io_remote_mem_addr_phase(const struct efa_io_remote_mem_addr *p) +{ + return p->ctrl & EFA_IO_REMOTE_MEM_ADDR_PHASE_MASK; +} + +static inline void set_efa_io_remote_mem_addr_phase(struct efa_io_remote_mem_addr *p, uint8_t val) +{ + p->ctrl |= val & EFA_IO_REMOTE_MEM_ADDR_PHASE_MASK; +} + +static inline uint8_t get_efa_io_remote_mem_addr_first(const struct efa_io_remote_mem_addr *p) +{ + return (p->ctrl & EFA_IO_REMOTE_MEM_ADDR_FIRST_MASK) >> EFA_IO_REMOTE_MEM_ADDR_FIRST_SHIFT; +} + +static inline void set_efa_io_remote_mem_addr_first(struct efa_io_remote_mem_addr *p, uint8_t val) +{ + p->ctrl |= (val << EFA_IO_REMOTE_MEM_ADDR_FIRST_SHIFT) & EFA_IO_REMOTE_MEM_ADDR_FIRST_MASK; +} + +static inline uint8_t get_efa_io_remote_mem_addr_last(const struct efa_io_remote_mem_addr *p) 
+{ + return (p->ctrl & EFA_IO_REMOTE_MEM_ADDR_LAST_MASK) >> EFA_IO_REMOTE_MEM_ADDR_LAST_SHIFT; +} + +static inline void set_efa_io_remote_mem_addr_last(struct efa_io_remote_mem_addr *p, uint8_t val) +{ + p->ctrl |= (val << EFA_IO_REMOTE_MEM_ADDR_LAST_SHIFT) & EFA_IO_REMOTE_MEM_ADDR_LAST_MASK; +} + +static inline uint32_t get_efa_io_rx_desc_lkey(const struct efa_io_rx_desc *p) +{ + return p->lkey_ctrl & EFA_IO_RX_DESC_LKEY_MASK; +} + +static inline void set_efa_io_rx_desc_lkey(struct efa_io_rx_desc *p, uint32_t val) +{ + p->lkey_ctrl |= val & EFA_IO_RX_DESC_LKEY_MASK; +} + +static inline uint32_t get_efa_io_rx_desc_first(const struct efa_io_rx_desc *p) +{ + return (p->lkey_ctrl & EFA_IO_RX_DESC_FIRST_MASK) >> EFA_IO_RX_DESC_FIRST_SHIFT; +} + +static inline void set_efa_io_rx_desc_first(struct efa_io_rx_desc *p, uint32_t val) +{ + p->lkey_ctrl |= (val << EFA_IO_RX_DESC_FIRST_SHIFT) & EFA_IO_RX_DESC_FIRST_MASK; +} + +static inline uint32_t get_efa_io_rx_desc_last(const struct efa_io_rx_desc *p) +{ + return (p->lkey_ctrl & EFA_IO_RX_DESC_LAST_MASK) >> EFA_IO_RX_DESC_LAST_SHIFT; +} + +static inline void set_efa_io_rx_desc_last(struct efa_io_rx_desc *p, uint32_t val) +{ + p->lkey_ctrl |= (val << EFA_IO_RX_DESC_LAST_SHIFT) & EFA_IO_RX_DESC_LAST_MASK; +} + +static inline uint8_t get_efa_io_cdesc_common_phase(const struct efa_io_cdesc_common *p) +{ + return p->flags & EFA_IO_CDESC_COMMON_PHASE_MASK; +} + +static inline void set_efa_io_cdesc_common_phase(struct efa_io_cdesc_common *p, uint8_t val) +{ + p->flags |= val & EFA_IO_CDESC_COMMON_PHASE_MASK; +} + +static inline uint8_t get_efa_io_cdesc_common_q_type(const struct efa_io_cdesc_common *p) +{ + return (p->flags & EFA_IO_CDESC_COMMON_Q_TYPE_MASK) >> EFA_IO_CDESC_COMMON_Q_TYPE_SHIFT; +} + +static inline void set_efa_io_cdesc_common_q_type(struct efa_io_cdesc_common *p, uint8_t val) +{ + p->flags |= (val << EFA_IO_CDESC_COMMON_Q_TYPE_SHIFT) & EFA_IO_CDESC_COMMON_Q_TYPE_MASK; +} + +static inline uint8_t get_efa_io_cdesc_common_has_imm(const struct efa_io_cdesc_common *p) +{ + return (p->flags & EFA_IO_CDESC_COMMON_HAS_IMM_MASK) >> EFA_IO_CDESC_COMMON_HAS_IMM_SHIFT; +} + +static inline void set_efa_io_cdesc_common_has_imm(struct efa_io_cdesc_common *p, uint8_t val) +{ + p->flags |= (val << EFA_IO_CDESC_COMMON_HAS_IMM_SHIFT) & EFA_IO_CDESC_COMMON_HAS_IMM_MASK; +} + +static inline uint8_t get_efa_io_cdesc_common_wide_completion(const struct efa_io_cdesc_common *p) +{ + return (p->flags & EFA_IO_CDESC_COMMON_WIDE_COMPLETION_MASK) >> EFA_IO_CDESC_COMMON_WIDE_COMPLETION_SHIFT; +} + +static inline void set_efa_io_cdesc_common_wide_completion(struct efa_io_cdesc_common *p, uint8_t val) +{ + p->flags |= (val << EFA_IO_CDESC_COMMON_WIDE_COMPLETION_SHIFT) & EFA_IO_CDESC_COMMON_WIDE_COMPLETION_MASK; +} + +#endif /*_EFA_IO_H_ */ diff --git a/providers/efa/verbs.c b/providers/efa/verbs.c new file mode 100644 index 000000000000..b821cf2f5d44 --- /dev/null +++ b/providers/efa/verbs.c @@ -0,0 +1,1142 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause +/* + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include "efa.h" +#include "verbs.h" + +int efa_query_device(struct ibv_context *ibvctx, + struct ibv_device_attr *dev_attr) +{ + struct efa_context *ctx = to_efa_context(ibvctx); + struct ibv_query_device cmd; + uint8_t fw_ver[8]; + int err; + + memset(dev_attr, 0, sizeof(struct ibv_device_attr)); + err = ibv_cmd_query_device(ibvctx, dev_attr, (uint64_t *)&fw_ver, + &cmd, sizeof(cmd)); + if (err) + return err; + + dev_attr->max_qp_wr = min_t(int, dev_attr->max_qp_wr, + ctx->max_llq_size / sizeof(struct efa_io_tx_wqe)); + snprintf(dev_attr->fw_ver, sizeof(dev_attr->fw_ver), "%u.%u.%u.%u", + fw_ver[0], fw_ver[1], fw_ver[2], fw_ver[3]); + + return 0; +} + +int efa_query_port(struct ibv_context *ibvctx, uint8_t port, + struct ibv_port_attr *port_attr) +{ + struct ibv_query_port cmd; + + memset(port_attr, 0, sizeof(struct ibv_port_attr)); + return ibv_cmd_query_port(ibvctx, port, port_attr, &cmd, sizeof(cmd)); +} + +int efa_query_device_ex(struct ibv_context *context, + const struct ibv_query_device_ex_input *input, + struct ibv_device_attr_ex *attr, + size_t attr_size) +{ + struct efa_context *ctx = to_efa_context(context); + struct efa_dev *dev = to_efa_dev(context->device); + int cmd_supp_uhw = ctx->cmds_supp_udata_mask & + EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE; + struct efa_query_device_ex_resp resp; + struct ibv_query_device_ex cmd; + struct ibv_device_attr *a; + uint8_t fw_ver[8]; + int err; + + memset(&resp, 0, sizeof(resp)); + err = ibv_cmd_query_device_ex( + context, input, attr, attr_size, (uint64_t *)&fw_ver, &cmd, + sizeof(cmd), &resp.ibv_resp, + cmd_supp_uhw ? 
sizeof(resp) : sizeof(resp.ibv_resp)); + if (err) + return err; + + dev->max_sq_wr = resp.max_sq_wr; + dev->max_rq_wr = resp.max_rq_wr; + dev->max_sq_sge = resp.max_sq_sge; + dev->max_rq_sge = resp.max_rq_sge; + + a = &attr->orig_attr; + a->max_qp_wr = min_t(int, a->max_qp_wr, + ctx->max_llq_size / sizeof(struct efa_io_tx_wqe)); + snprintf(a->fw_ver, sizeof(a->fw_ver), "%u.%u.%u.%u", + fw_ver[0], fw_ver[1], fw_ver[2], fw_ver[3]); + + return 0; +} + +struct ibv_pd *efa_alloc_pd(struct ibv_context *ibvctx) +{ + struct efa_alloc_pd_resp resp = {}; + struct ibv_alloc_pd cmd; + struct efa_pd *pd; + + pd = calloc(1, sizeof(*pd)); + if (!pd) + return NULL; + + if (ibv_cmd_alloc_pd(ibvctx, &pd->ibvpd, &cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp))) + goto out; + + pd->context = to_efa_context(ibvctx); + pd->pdn = resp.pdn; + + return &pd->ibvpd; + +out: + free(pd); + return NULL; +} + +int efa_dealloc_pd(struct ibv_pd *ibvpd) +{ + struct efa_pd *pd = to_efa_pd(ibvpd); + int err; + + err = ibv_cmd_dealloc_pd(ibvpd); + if (err) + return err; + free(pd); + + return 0; +} + +struct ibv_mr *efa_reg_mr(struct ibv_pd *ibvpd, void *sva, size_t len, + int access) +{ + struct ib_uverbs_reg_mr_resp resp; + struct ibv_reg_mr cmd; + struct efa_mr *mr; + + mr = calloc(1, sizeof(*mr)); + if (!mr) + return NULL; + + if (ibv_cmd_reg_mr(ibvpd, sva, len, (uintptr_t)sva, access, &mr->vmr, + &cmd, sizeof(cmd), &resp, sizeof(resp))) { + free(mr); + return NULL; + } + + return &mr->vmr.ibv_mr; +} + +int efa_dereg_mr(struct verbs_mr *vmr) +{ + struct efa_mr *mr = container_of(vmr, struct efa_mr, vmr); + int err; + + err = ibv_cmd_dereg_mr(vmr); + if (err) + return err; + free(mr); + + return 0; +} + +static uint32_t efa_sub_cq_get_current_index(struct efa_sub_cq *sub_cq) +{ + return sub_cq->consumed_cnt & sub_cq->qmask; +} + +static int efa_cqe_is_pending(struct efa_io_cdesc_common *cqe_common, + int phase) +{ + return (cqe_common->flags & EFA_IO_CDESC_COMMON_PHASE_MASK) == phase; +} + +static struct efa_io_cdesc_common * +efa_sub_cq_get_cqe(struct efa_sub_cq *sub_cq, int entry) +{ + return (struct efa_io_cdesc_common *)(sub_cq->buf + + (entry * sub_cq->cqe_size)); +} + +static void efa_sub_cq_initialize(struct efa_sub_cq *sub_cq, uint8_t *buf, + int sub_cq_size, int cqe_size) +{ + sub_cq->consumed_cnt = 0; + sub_cq->phase = 1; + sub_cq->buf = buf; + sub_cq->qmask = sub_cq_size - 1; + sub_cq->cqe_size = cqe_size; + sub_cq->ref_cnt = 0; +} + +struct ibv_cq *efa_create_cq(struct ibv_context *ibvctx, int ncqe, + struct ibv_comp_channel *channel, int vec) +{ + struct efa_context *ctx = to_efa_context(ibvctx); + struct efa_create_cq_resp resp = {}; + struct efa_create_cq cmd = {}; + uint16_t num_sub_cqs; + struct efa_cq *cq; + int sub_buf_size; + int sub_cq_size; + uint8_t *buf; + int i; + + cq = calloc(1, sizeof(*cq)); + if (!cq) + return NULL; + + num_sub_cqs = ctx->sub_cqs_per_cq; + cmd.num_sub_cqs = num_sub_cqs; + cmd.cq_entry_size = ctx->cqe_size; + + memset(&resp, 0, sizeof(resp)); + ncqe = align_up_queue_size(ncqe); + if (ibv_cmd_create_cq(ibvctx, ncqe, channel, vec, + &cq->ibvcq, &cmd.ibv_cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp))) + goto err_free_cq; + + sub_cq_size = cq->ibvcq.cqe; + cq->cqn = resp.cq_idx; + cq->buf_size = resp.q_mmap_size; + cq->num_sub_cqs = num_sub_cqs; + cq->cqe_size = ctx->cqe_size; + + pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE); + + cq->buf = mmap(NULL, cq->buf_size, PROT_READ, MAP_SHARED, + ibvctx->cmd_fd, resp.q_mmap_key); + if (cq->buf == MAP_FAILED) + goto 
err_destroy_spinlock; + + cq->sub_cq_arr = calloc(cq->num_sub_cqs, sizeof(*cq->sub_cq_arr)); + if (!cq->sub_cq_arr) + goto err_unmap; + + buf = cq->buf; + sub_buf_size = cq->cqe_size * sub_cq_size; + for (i = 0; i < num_sub_cqs; i++) { + efa_sub_cq_initialize(&cq->sub_cq_arr[i], buf, sub_cq_size, + cq->cqe_size); + buf += sub_buf_size; + } + + return &cq->ibvcq; + +err_unmap: + munmap(cq->buf, cq->buf_size); +err_destroy_spinlock: + pthread_spin_destroy(&cq->lock); + ibv_cmd_destroy_cq(&cq->ibvcq); +err_free_cq: + free(cq); + return NULL; +} + +int efa_destroy_cq(struct ibv_cq *ibvcq) +{ + struct efa_cq *cq = to_efa_cq(ibvcq); + int err; + + free(cq->sub_cq_arr); + munmap(cq->buf, cq->buf_size); + + pthread_spin_destroy(&cq->lock); + + err = ibv_cmd_destroy_cq(ibvcq); + if (err) + return err; + + free(cq); + + return 0; +} + +static struct efa_io_cdesc_common * +cq_next_sub_cqe_get(struct efa_sub_cq *sub_cq) +{ + struct efa_io_cdesc_common *cqe; + uint32_t current_index; + + current_index = efa_sub_cq_get_current_index(sub_cq); + cqe = efa_sub_cq_get_cqe(sub_cq, current_index); + if (efa_cqe_is_pending(cqe, sub_cq->phase)) { + /* Do not read the rest of the completion entry before the + * phase bit has been validated. + */ + udma_from_device_barrier(); + sub_cq->consumed_cnt++; + if (!efa_sub_cq_get_current_index(sub_cq)) + sub_cq->phase = 1 - sub_cq->phase; + return cqe; + } + + return NULL; +} + +static enum ibv_wc_status to_ibv_status(enum efa_io_comp_status status) +{ + switch (status) { + case EFA_IO_COMP_STATUS_OK: + return IBV_WC_SUCCESS; + case EFA_IO_COMP_STATUS_FLUSHED: + return IBV_WC_WR_FLUSH_ERR; + case EFA_IO_COMP_STATUS_LOCAL_ERROR_QP_INTERNAL_ERROR: + case EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_OP_TYPE: + case EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_AH: + return IBV_WC_LOC_QP_OP_ERR; + case EFA_IO_COMP_STATUS_LOCAL_ERROR_INVALID_LKEY: + return IBV_WC_LOC_PROT_ERR; + case EFA_IO_COMP_STATUS_LOCAL_ERROR_BAD_LENGTH: + return IBV_WC_LOC_LEN_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_ABORT: + return IBV_WC_REM_ABORT_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_RNR: + return IBV_WC_RNR_RETRY_EXC_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_DEST_QPN: + return IBV_WC_REM_INV_RD_REQ_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_STATUS: + return IBV_WC_BAD_RESP_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_LENGTH: + return IBV_WC_REM_INV_REQ_ERR; + case EFA_IO_COMP_STATUS_REMOTE_ERROR_BAD_ADDRESS: + default: + return IBV_WC_GENERAL_ERR; + } +} + +static int efa_poll_sub_cq(struct efa_cq *cq, struct efa_sub_cq *sub_cq, + struct efa_qp **cur_qp, struct ibv_wc *wc) +{ + struct efa_context *ctx = to_efa_context(cq->ibvcq.context); + struct efa_io_cdesc_common *cqe; + uint32_t qpn, wrid_idx; + struct efa_wq *wq; + + cqe = cq_next_sub_cqe_get(sub_cq); + if (!cqe) + return ENOMEM; + + qpn = cqe->qp_num; + if (!*cur_qp || qpn != (*cur_qp)->ibvqp.qp_num) { + /* We do not have to take the QP table lock here, + * because CQs will be locked while QPs are removed + * from the table. 
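+ * (This poll path already holds cq->lock, taken in efa_poll_cq(), and a
+ * QP is removed from the table only while both of its CQ locks are held,
+ * see efa_lock_cqs().)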
+ */ + *cur_qp = ctx->qp_table[qpn]; + if (!*cur_qp) + return EINVAL; + } + + wrid_idx = cqe->req_id; + wc->status = to_ibv_status(cqe->status); + wc->vendor_err = cqe->status; + if (get_efa_io_cdesc_common_q_type(cqe) == EFA_IO_SEND_QUEUE) { + wq = &(*cur_qp)->sq.wq; + wc->opcode = IBV_WC_SEND; + } else { + struct efa_io_rx_cdesc *rcqe = + container_of(cqe, struct efa_io_rx_cdesc, common); + + wq = &(*cur_qp)->rq.wq; + + wc->byte_len = cqe->length; + wc->opcode = IBV_WC_RECV; + wc->src_qp = rcqe->src_qp_num; + wc->sl = 0; + wc->slid = 0; + } + + wc->wc_flags = 0; + wc->qp_num = qpn; + wq->wrid_idx_pool_next--; + wq->wrid_idx_pool[wq->wrid_idx_pool_next] = wrid_idx; + wc->wr_id = wq->wrid[wrid_idx]; + wq->wqe_completed++; + + return 0; +} + +static int efa_poll_sub_cqs(struct efa_cq *cq, struct ibv_wc *wc) +{ + uint16_t num_sub_cqs = cq->num_sub_cqs; + struct efa_sub_cq *sub_cq; + struct efa_qp *qp = NULL; + uint16_t sub_cq_idx; + int err = ENOMEM; + + for (sub_cq_idx = 0; sub_cq_idx < num_sub_cqs; sub_cq_idx++) { + sub_cq = &cq->sub_cq_arr[cq->next_poll_idx++]; + cq->next_poll_idx %= num_sub_cqs; + + if (!sub_cq->ref_cnt) + continue; + + err = efa_poll_sub_cq(cq, sub_cq, &qp, wc); + if (err != ENOMEM) + break; + } + + return err; +} + +int efa_poll_cq(struct ibv_cq *ibvcq, int nwc, struct ibv_wc *wc) +{ + struct efa_cq *cq = to_efa_cq(ibvcq); + ssize_t ret = 0; + int i; + + pthread_spin_lock(&cq->lock); + for (i = 0; i < nwc; i++) { + ret = efa_poll_sub_cqs(cq, &wc[i]); + if (ret) { + if (ret == ENOMEM) + ret = 0; + break; + } + } + pthread_spin_unlock(&cq->lock); + + return i ?: -ret; +} + +static void efa_cq_inc_ref_cnt(struct efa_cq *cq, uint8_t sub_cq_idx) +{ + cq->sub_cq_arr[sub_cq_idx].ref_cnt++; +} + +static void efa_cq_dec_ref_cnt(struct efa_cq *cq, uint8_t sub_cq_idx) +{ + cq->sub_cq_arr[sub_cq_idx].ref_cnt--; +} + +static void efa_wq_terminate(struct efa_wq *wq) +{ + pthread_spin_destroy(&wq->wqlock); + free(wq->wrid_idx_pool); + free(wq->wrid); +} + +static int efa_wq_initialize(struct efa_wq *wq) +{ + int err; + int i; + + wq->wrid = malloc(wq->wqe_cnt * sizeof(*wq->wrid)); + if (!wq->wrid) + return ENOMEM; + + wq->wrid_idx_pool = malloc(wq->wqe_cnt * sizeof(uint32_t)); + if (!wq->wrid_idx_pool) { + err = ENOMEM; + goto err_free_wrid; + } + + /* Initialize the wrid free indexes pool. 
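+	 * Entry N starts out holding index N, matching the wrid_idx_pool
+	 * description in efa.h; completions may later return indexes in a
+	 * different order.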
+	 */
+	for (i = 0; i < wq->wqe_cnt; i++)
+		wq->wrid_idx_pool[i] = i;
+
+	pthread_spin_init(&wq->wqlock, PTHREAD_PROCESS_PRIVATE);
+
+	return 0;
+
+err_free_wrid:
+	free(wq->wrid);
+
+	return err;
+}
+
+static void efa_sq_terminate(struct efa_qp *qp)
+{
+	void *db_aligned;
+
+	if (!qp->sq.wq.wrid)
+		return;
+
+	db_aligned = (void *)((uintptr_t)qp->sq.db & ~(qp->page_size - 1));
+	munmap(db_aligned, qp->page_size);
+	munmap(qp->sq.desc - qp->sq.desc_offset, qp->sq.desc_ring_mmap_size);
+
+	efa_wq_terminate(&qp->sq.wq);
+}
+
+static int efa_sq_initialize(struct efa_qp *qp, struct efa_create_qp_resp *resp)
+{
+	size_t desc_ring_size;
+	uint8_t *db_base;
+	int err;
+
+	if (!qp->sq.wq.wqe_cnt)
+		return 0;
+
+	err = efa_wq_initialize(&qp->sq.wq);
+	if (err)
+		return err;
+
+	qp->sq.desc_offset = resp->llq_desc_offset;
+	desc_ring_size = qp->sq.wq.wqe_cnt * sizeof(struct efa_io_tx_wqe);
+	qp->sq.desc_ring_mmap_size = align(desc_ring_size + qp->sq.desc_offset,
+					   qp->page_size);
+	qp->sq.max_inline_data = resp->ibv_resp.max_inline_data;
+
+	qp->sq.desc = mmap(NULL, qp->sq.desc_ring_mmap_size, PROT_WRITE,
+			   MAP_SHARED, qp->ibvqp.context->cmd_fd,
+			   resp->llq_desc_mmap_key);
+	if (qp->sq.desc == MAP_FAILED)
+		goto err_terminate_wq;
+
+	qp->sq.desc += qp->sq.desc_offset;
+
+	db_base = mmap(NULL, qp->page_size, PROT_WRITE, MAP_SHARED,
+		       qp->ibvqp.context->cmd_fd, resp->sq_db_mmap_key);
+	if (db_base == MAP_FAILED)
+		goto err_unmap_desc_ring;
+
+	qp->sq.db = (uint32_t *)(db_base + resp->sq_db_offset);
+	qp->sq.sub_cq_idx = resp->send_sub_cq_idx;
+
+	return 0;
+
+err_unmap_desc_ring:
+	munmap(qp->sq.desc - qp->sq.desc_offset, qp->sq.desc_ring_mmap_size);
+err_terminate_wq:
+	efa_wq_terminate(&qp->sq.wq);
+	return EINVAL;
+}
+
+static void efa_rq_terminate(struct efa_qp *qp)
+{
+	void *db_aligned;
+
+	if (!qp->rq.wq.wrid)
+		return;
+
+	db_aligned = (void *)((uintptr_t)qp->rq.db & ~(qp->page_size - 1));
+	munmap(db_aligned, qp->page_size);
+	munmap(qp->rq.buf, qp->rq.buf_size);
+
+	efa_wq_terminate(&qp->rq.wq);
+}
+
+static int efa_rq_initialize(struct efa_qp *qp, struct efa_create_qp_resp *resp)
+{
+	uint8_t *db_base;
+	int err;
+
+	if (!qp->rq.wq.wqe_cnt)
+		return 0;
+
+	err = efa_wq_initialize(&qp->rq.wq);
+	if (err)
+		return err;
+
+	qp->rq.buf_size = resp->rq_mmap_size;
+	qp->rq.buf = mmap(NULL, qp->rq.buf_size, PROT_WRITE, MAP_SHARED,
+			  qp->ibvqp.context->cmd_fd, resp->rq_mmap_key);
+	if (qp->rq.buf == MAP_FAILED)
+		goto err_terminate_wq;
+
+	db_base = mmap(NULL, qp->page_size, PROT_WRITE, MAP_SHARED,
+		       qp->ibvqp.context->cmd_fd, resp->rq_db_mmap_key);
+	if (db_base == MAP_FAILED)
+		goto err_unmap_rq_buf;
+
+	qp->rq.db = (uint32_t *)(db_base + resp->rq_db_offset);
+	qp->rq.sub_cq_idx = resp->recv_sub_cq_idx;
+
+	return 0;
+
+err_unmap_rq_buf:
+	munmap(qp->rq.buf, qp->rq.buf_size);
+err_terminate_wq:
+	efa_wq_terminate(&qp->rq.wq);
+	return EINVAL;
+}
+
+static void efa_qp_init_indices(struct efa_qp *qp)
+{
+	qp->sq.wq.wqe_posted = 0;
+	qp->sq.wq.wqe_completed = 0;
+	qp->sq.wq.desc_idx = 0;
+	qp->sq.wq.wrid_idx_pool_next = 0;
+
+	qp->rq.wq.wqe_posted = 0;
+	qp->rq.wq.wqe_completed = 0;
+	qp->rq.wq.desc_idx = 0;
+	qp->rq.wq.wrid_idx_pool_next = 0;
+}
+
+static void efa_setup_qp(struct efa_qp *qp,
+			 struct ibv_qp_cap *cap,
+			 size_t page_size)
+{
+	uint16_t rq_desc_cnt;
+
+	efa_qp_init_indices(qp);
+
+	qp->sq.wq.wqe_cnt = align_up_queue_size(cap->max_send_wr);
+	qp->sq.wq.max_sge = cap->max_send_sge;
+	qp->sq.wq.desc_mask = qp->sq.wq.wqe_cnt - 1;
+
+	qp->rq.wq.max_sge = cap->max_recv_sge;
+	rq_desc_cnt = align_up_queue_size(cap->max_recv_sge * cap->max_recv_wr);
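+	/* The RQ ring is sized in descriptors rather than WQEs: each SGE
+	 * of a receive WR consumes one descriptor, so the WQE count is
+	 * the descriptor count divided by the per-WQE SGE limit.
+	 */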
+	qp->rq.wq.desc_mask = rq_desc_cnt - 1;
+	qp->rq.wq.wqe_cnt = rq_desc_cnt / qp->rq.wq.max_sge;
+
+	qp->page_size = page_size;
+}
+
+static void efa_lock_cqs(struct ibv_qp *ibvqp)
+{
+	struct efa_cq *send_cq = to_efa_cq(ibvqp->send_cq);
+	struct efa_cq *recv_cq = to_efa_cq(ibvqp->recv_cq);
+
+	if (recv_cq == send_cq && recv_cq) {
+		pthread_spin_lock(&recv_cq->lock);
+	} else {
+		if (recv_cq)
+			pthread_spin_lock(&recv_cq->lock);
+		if (send_cq)
+			pthread_spin_lock(&send_cq->lock);
+	}
+}
+
+static void efa_unlock_cqs(struct ibv_qp *ibvqp)
+{
+	struct efa_cq *send_cq = to_efa_cq(ibvqp->send_cq);
+	struct efa_cq *recv_cq = to_efa_cq(ibvqp->recv_cq);
+
+	if (recv_cq == send_cq && recv_cq) {
+		pthread_spin_unlock(&recv_cq->lock);
+	} else {
+		if (recv_cq)
+			pthread_spin_unlock(&recv_cq->lock);
+		if (send_cq)
+			pthread_spin_unlock(&send_cq->lock);
+	}
+}
+
+static int efa_check_qp_attr(struct efa_dev *dev,
+			     struct ibv_qp_init_attr *attr)
+{
+	if (!attr->recv_cq || !attr->send_cq)
+		return EINVAL;
+
+	if (attr->srq)
+		return EINVAL;
+
+	return 0;
+}
+
+static int efa_check_qp_limits(struct efa_dev *dev,
+			       struct ibv_qp_init_attr *attr)
+{
+	if (attr->cap.max_send_sge > dev->max_sq_sge)
+		return EINVAL;
+
+	if (attr->cap.max_recv_sge > dev->max_rq_sge)
+		return EINVAL;
+
+	if (attr->cap.max_send_wr > dev->max_sq_wr)
+		return EINVAL;
+
+	if (attr->cap.max_recv_wr > dev->max_rq_wr)
+		return EINVAL;
+
+	return 0;
+}
+
+static struct ibv_qp *create_qp(struct ibv_pd *ibvpd,
+				struct ibv_qp_init_attr *attr,
+				uint32_t driver_qp_type)
+{
+	struct efa_context *ctx = to_efa_context(ibvpd->context);
+	struct efa_dev *dev = to_efa_dev(ibvpd->context->device);
+	struct efa_create_qp_resp resp = {};
+	struct efa_create_qp req = {};
+	struct efa_cq *send_cq;
+	struct efa_cq *recv_cq;
+	struct efa_qp *qp;
+	int err;
+
+	err = efa_check_qp_attr(dev, attr);
+	if (err)
+		return NULL;
+
+	err = efa_check_qp_limits(dev, attr);
+	if (err)
+		return NULL;
+
+	qp = calloc(1, sizeof(*qp));
+	if (!qp)
+		return NULL;
+
+	efa_setup_qp(qp, &attr->cap, dev->pg_sz);
+
+	attr->cap.max_send_wr = qp->sq.wq.wqe_cnt;
+	attr->cap.max_recv_wr = qp->rq.wq.wqe_cnt;
+
+	req.rq_ring_size = (qp->rq.wq.desc_mask + 1) *
+			   sizeof(struct efa_io_rx_desc);
+	req.sq_ring_size = attr->cap.max_send_wr *
+			   sizeof(struct efa_io_tx_wqe);
+	if (attr->qp_type == IBV_QPT_DRIVER)
+		req.driver_qp_type = driver_qp_type;
+
+	if (ibv_cmd_create_qp(ibvpd, &qp->ibvqp, attr, &req.ibv_cmd,
+			      sizeof(req), &resp.ibv_resp, sizeof(resp)))
+		goto err_free_qp;
+
+	qp->ibvqp.state = IBV_QPS_RESET;
+	qp->sq_sig_all = attr->sq_sig_all;
+
+	err = efa_rq_initialize(qp, &resp);
+	if (err)
+		goto err_destroy_qp;
+
+	err = efa_sq_initialize(qp, &resp);
+	if (err)
+		goto err_terminate_rq;
+
+	pthread_spin_lock(&ctx->qp_table_lock);
+	ctx->qp_table[qp->ibvqp.qp_num] = qp;
+	pthread_spin_unlock(&ctx->qp_table_lock);
+
+	if (attr->send_cq) {
+		send_cq = to_efa_cq(attr->send_cq);
+		qp->scq = send_cq;
+		pthread_spin_lock(&send_cq->lock);
+		efa_cq_inc_ref_cnt(send_cq, resp.send_sub_cq_idx);
+		pthread_spin_unlock(&send_cq->lock);
+	}
+
+	if (attr->recv_cq) {
+		recv_cq = to_efa_cq(attr->recv_cq);
+		qp->rcq = recv_cq;
+		pthread_spin_lock(&recv_cq->lock);
+		efa_cq_inc_ref_cnt(recv_cq, resp.recv_sub_cq_idx);
+		pthread_spin_unlock(&recv_cq->lock);
+	}
+
+	return &qp->ibvqp;
+
+err_terminate_rq:
+	efa_rq_terminate(qp);
+err_destroy_qp:
+	ibv_cmd_destroy_qp(&qp->ibvqp);
+err_free_qp:
+	free(qp);
+	return NULL;
+}
+
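+/* Verbs entry point: only UD QPs are accepted here; driver-specific
+ * QP types are requested through create_qp() with IBV_QPT_DRIVER, in
+ * which case driver_qp_type selects the device QP type.
+ */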
+struct ibv_qp *efa_create_qp(struct ibv_pd *ibvpd,
+			     struct ibv_qp_init_attr *attr)
+{
+	if (attr->qp_type != IBV_QPT_UD)
+		return NULL;
+
+	return create_qp(ibvpd, attr, 0);
+}
+
+int efa_modify_qp(struct ibv_qp *ibvqp, struct ibv_qp_attr *attr,
+		  int attr_mask)
+{
+	struct efa_qp *qp = to_efa_qp(ibvqp);
+	struct ibv_modify_qp cmd;
+	int err;
+
+	err = ibv_cmd_modify_qp(ibvqp, attr, attr_mask, &cmd, sizeof(cmd));
+	if (err)
+		return err;
+
+	if (attr_mask & IBV_QP_STATE) {
+		qp->ibvqp.state = attr->qp_state;
+		/* transition to reset */
+		if (qp->ibvqp.state == IBV_QPS_RESET)
+			efa_qp_init_indices(qp);
+	}
+
+	return 0;
+}
+
+int efa_query_qp(struct ibv_qp *ibvqp, struct ibv_qp_attr *attr,
+		 int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+	struct ibv_query_qp cmd;
+
+	return ibv_cmd_query_qp(ibvqp, attr, attr_mask, init_attr,
+				&cmd, sizeof(cmd));
+}
+
+int efa_destroy_qp(struct ibv_qp *ibvqp)
+{
+	struct efa_context *ctx = to_efa_context(ibvqp->context);
+	struct efa_qp *qp = to_efa_qp(ibvqp);
+	int err;
+
+	pthread_spin_lock(&ctx->qp_table_lock);
+	efa_lock_cqs(ibvqp);
+
+	if (ibvqp->send_cq)
+		efa_cq_dec_ref_cnt(to_efa_cq(ibvqp->send_cq),
+				   qp->sq.sub_cq_idx);
+
+	if (ibvqp->recv_cq)
+		efa_cq_dec_ref_cnt(to_efa_cq(ibvqp->recv_cq),
+				   qp->rq.sub_cq_idx);
+
+	ctx->qp_table[ibvqp->qp_num] = NULL;
+
+	efa_unlock_cqs(ibvqp);
+	pthread_spin_unlock(&ctx->qp_table_lock);
+
+	efa_sq_terminate(qp);
+	efa_rq_terminate(qp);
+
+	err = ibv_cmd_destroy_qp(ibvqp);
+	if (err)
+		return err;
+
+	free(qp);
+	return 0;
+}
+
+static void efa_post_send_sgl(struct ibv_send_wr *wr,
+			      struct efa_io_tx_wqe *tx_wqe,
+			      int *desc_size)
+{
+	struct efa_io_tx_buf_desc *tx_buf;
+	struct ibv_sge *sge;
+	uintptr_t addr;
+	size_t i;
+
+	for (i = 0; i < wr->num_sge; i++) {
+		sge = &wr->sg_list[i];
+		tx_buf = &tx_wqe->data.sgl[i];
+		addr = sge->addr;
+
+		/* Set TX buffer desc from SGE */
+		tx_buf->length = sge->length;
+		tx_buf->lkey = sge->lkey;
+		tx_buf->buf_addr_lo = addr & 0xffffffff;
+		set_efa_io_tx_buf_desc_buf_addr_hi(tx_buf,
+						   (uint64_t)addr >> 32);
+	}
+
+	*desc_size += sizeof(*tx_buf) * wr->num_sge;
+}
+
+static void efa_post_send_inline_data(const struct ibv_send_wr *wr,
+				      struct efa_io_tx_wqe *tx_wqe,
+				      int *desc_size)
+{
+	const struct ibv_sge *sgl = wr->sg_list;
+	uint32_t total_length = 0;
+	uint32_t length;
+	size_t i;
+
+	for (i = 0; i < wr->num_sge; i++) {
+		length = sgl[i].length;
+
+		memcpy(tx_wqe->data.inline_data + total_length,
+		       (void *)(uintptr_t)sgl[i].addr, length);
+		total_length += length;
+	}
+
+	*desc_size += total_length;
+
+	set_efa_io_tx_meta_desc_inline_msg(&tx_wqe->common, 1);
+	tx_wqe->common.len = total_length;
+}
+
+static size_t efa_sge_total_bytes(const struct ibv_send_wr *wr)
+{
+	size_t bytes = 0;
+	size_t i;
+
+	for (i = 0; i < wr->num_sge; i++)
+		bytes += wr->sg_list[i].length;
+
+	return bytes;
+}
+
+static ssize_t efa_post_send_validate(struct efa_qp *qp,
+				      const struct ibv_send_wr *wr)
+{
+	if (unlikely(qp->ibvqp.state != IBV_QPS_RTS &&
+		     qp->ibvqp.state != IBV_QPS_SQD))
+		return EINVAL;
+
+	if (unlikely(wr->opcode != IBV_WR_SEND))
+		return EINVAL;
+
+	if (unlikely(!qp->scq))
+		return EINVAL;
+
+	if (unlikely(wr->num_sge > qp->sq.wq.max_sge))
+		return EINVAL;
+
+	if (unlikely(!(wr->send_flags & IBV_SEND_SIGNALED) && !qp->sq_sig_all))
+		return EINVAL;
+
+	if (unlikely(wr->send_flags & ~(IBV_SEND_SIGNALED | IBV_SEND_INLINE)))
+		return EINVAL;
+
+	if (unlikely(wr->send_flags & IBV_SEND_INLINE &&
+		     efa_sge_total_bytes(wr) > qp->sq.max_inline_data))
+		return EINVAL;
+
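+	/* wqe_posted and wqe_completed are free-running counters, so the
+	 * ring-full check below is a wrap-safe subtraction.
+	 */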
+	if (unlikely(qp->sq.wq.wqe_posted - qp->sq.wq.wqe_completed ==
+		     qp->sq.wq.wqe_cnt))
+		return ENOMEM;
+
+	return 0;
+}
+
+int efa_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+		  struct ibv_send_wr **bad)
+{
+	struct efa_io_tx_meta_desc *meta_desc;
+	struct efa_qp *qp = to_efa_qp(ibvqp);
+	uint32_t sq_desc_offset, wrid_idx;
+	struct efa_io_tx_wqe tx_wqe;
+	struct efa_ah *ah;
+	int desc_size;
+	int err = 0;
+
+	pthread_spin_lock(&qp->sq.wq.wqlock);
+	while (wr) {
+		desc_size = sizeof(tx_wqe.common) + sizeof(tx_wqe.u);
+
+		err = efa_post_send_validate(qp, wr);
+		if (err) {
+			*bad = wr;
+			goto ring_db;
+		}
+
+		memset(&tx_wqe, 0, sizeof(tx_wqe));
+		meta_desc = &tx_wqe.common;
+		ah = to_efa_ah(wr->wr.ud.ah);
+
+		if (wr->send_flags & IBV_SEND_INLINE) {
+			efa_post_send_inline_data(wr, &tx_wqe, &desc_size);
+		} else {
+			meta_desc->len = wr->num_sge;
+			efa_post_send_sgl(wr, &tx_wqe, &desc_size);
+		}
+
+		/* Get the next wrid to be used from the index pool */
+		wrid_idx = qp->sq.wq.wrid_idx_pool[qp->sq.wq.wrid_idx_pool_next];
+		qp->sq.wq.wrid[wrid_idx] = wr->wr_id;
+		meta_desc->req_id = wrid_idx;
+		qp->sq.wq.wqe_posted++;
+
+		/* Will never overflow, as efa_post_send_validate() succeeded */
+		qp->sq.wq.wrid_idx_pool_next++;
+		assert(qp->sq.wq.wrid_idx_pool_next <= qp->sq.wq.wqe_cnt);
+
+		/* Set rest of the descriptor fields */
+		set_efa_io_tx_meta_desc_meta_desc(meta_desc, 1);
+		set_efa_io_tx_meta_desc_phase(meta_desc, qp->sq.wq.phase);
+		set_efa_io_tx_meta_desc_first(meta_desc, 1);
+		set_efa_io_tx_meta_desc_last(meta_desc, 1);
+		meta_desc->dest_qp_num = wr->wr.ud.remote_qpn;
+		set_efa_io_tx_meta_desc_comp_req(meta_desc, 1);
+		meta_desc->ah = ah->efa_ah;
+		tx_wqe.u.ud.qkey = wr->wr.ud.remote_qkey;
+
+		/* Copy descriptor */
+		sq_desc_offset = (qp->sq.wq.desc_idx & qp->sq.wq.desc_mask) *
+				 sizeof(tx_wqe);
+		memcpy(qp->sq.desc + sq_desc_offset, &tx_wqe, desc_size);
+
+		/* Advance index and change phase */
+		qp->sq.wq.desc_idx++;
+		if (!(qp->sq.wq.desc_idx & qp->sq.wq.desc_mask))
+			qp->sq.wq.phase++;
+
+		wr = wr->next;
+	}
+
+ring_db:
+	udma_to_device_barrier();
+	mmio_write32(qp->sq.db, qp->sq.wq.desc_idx);
+
+	pthread_spin_unlock(&qp->sq.wq.wqlock);
+	return err;
+}
+
+static ssize_t efa_post_recv_validate(struct efa_qp *qp, struct ibv_recv_wr *wr)
+{
+	if (unlikely(qp->ibvqp.state == IBV_QPS_RESET ||
+		     qp->ibvqp.state == IBV_QPS_ERR))
+		return EINVAL;
+
+	if (unlikely(!qp->rcq))
+		return EINVAL;
+
+	if (unlikely(wr->num_sge > qp->rq.wq.max_sge))
+		return EINVAL;
+
+	if (unlikely(qp->rq.wq.wqe_posted - qp->rq.wq.wqe_completed ==
+		     qp->rq.wq.wqe_cnt))
+		return ENOMEM;
+
+	return 0;
+}
+
+int efa_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+		  struct ibv_recv_wr **bad)
+{
+	struct efa_qp *qp = to_efa_qp(ibvqp);
+	uint32_t wqe_index, rq_desc_offset;
+	struct efa_io_rx_desc rx_buf;
+	uintptr_t addr;
+	int err = 0;
+	size_t i;
+
+	pthread_spin_lock(&qp->rq.wq.wqlock);
+	while (wr) {
+		err = efa_post_recv_validate(qp, wr);
+		if (err) {
+			*bad = wr;
+			goto ring_db;
+		}
+
+		memset(&rx_buf, 0, sizeof(rx_buf));
+
+		/* Save the wrid: get the next free index from the pool */
+		wqe_index = qp->rq.wq.wrid_idx_pool[qp->rq.wq.wrid_idx_pool_next];
+		qp->rq.wq.wrid[wqe_index] = wr->wr_id;
+		rx_buf.req_id = wqe_index;
+		qp->rq.wq.wqe_posted++;
+
+		/* Will never overflow, as efa_post_recv_validate() succeeded */
+		qp->rq.wq.wrid_idx_pool_next++;
+		assert(qp->rq.wq.wrid_idx_pool_next <= qp->rq.wq.wqe_cnt);
+
+		/* Default init of the rx buffer */
+		set_efa_io_rx_desc_first(&rx_buf, 1);
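+		/* "first" is set once per WR; "last" starts cleared and is
+		 * raised on the final SGE in the loop below.
+		 */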
+		set_efa_io_rx_desc_last(&rx_buf, 0);
+
+		for (i = 0; i < wr->num_sge; i++) {
+			/* Set last indication if needed */
+			if (i == wr->num_sge - 1)
+				set_efa_io_rx_desc_last(&rx_buf, 1);
+
+			addr = wr->sg_list[i].addr;
+
+			/* Set RX buffer desc from SGE */
+			rx_buf.length = wr->sg_list[i].length;
+			set_efa_io_rx_desc_lkey(&rx_buf, wr->sg_list[i].lkey);
+			rx_buf.buf_addr_lo = addr & 0xffffffff;
+			rx_buf.buf_addr_hi = (uint64_t)addr >> 32;
+
+			/* Copy descriptor to RX ring */
+			rq_desc_offset = (qp->rq.wq.desc_idx & qp->rq.wq.desc_mask) *
+					 sizeof(rx_buf);
+			memcpy(qp->rq.buf + rq_desc_offset, &rx_buf, sizeof(rx_buf));
+
+			/* Advance index and change phase */
+			qp->rq.wq.desc_idx++;
+			if (!(qp->rq.wq.desc_idx & qp->rq.wq.desc_mask))
+				qp->rq.wq.phase++;
+
+			/* Reset descriptor for next iov */
+			memset(&rx_buf, 0, sizeof(rx_buf));
+		}
+		wr = wr->next;
+	}
+
+ring_db:
+	udma_to_device_barrier();
+	mmio_write32(qp->rq.db, qp->rq.wq.desc_idx);
+
+	pthread_spin_unlock(&qp->rq.wq.wqlock);
+	return err;
+}
+
+struct ibv_ah *efa_create_ah(struct ibv_pd *ibvpd, struct ibv_ah_attr *attr)
+{
+	struct efa_create_ah_resp resp = {};
+	struct efa_ah *ah;
+	int err;
+
+	ah = calloc(1, sizeof(*ah));
+	if (!ah)
+		return NULL;
+
+	err = ibv_cmd_create_ah(ibvpd, &ah->ibvah, attr,
+				&resp.ibv_resp, sizeof(resp));
+	if (err) {
+		free(ah);
+		return NULL;
+	}
+
+	ah->efa_ah = resp.efa_address_handle;
+
+	return &ah->ibvah;
+}
+
+int efa_destroy_ah(struct ibv_ah *ibvah)
+{
+	struct efa_ah *ah;
+	int err;
+
+	ah = to_efa_ah(ibvah);
+	err = ibv_cmd_destroy_ah(ibvah);
+	if (err)
+		return err;
+	free(ah);
+
+	return 0;
+}
diff --git a/providers/efa/verbs.h b/providers/efa/verbs.h
new file mode 100644
index 000000000000..ffe5dea153a3
--- /dev/null
+++ b/providers/efa/verbs.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __EFA_VERBS_H__
+#define __EFA_VERBS_H__
+
+#include <infiniband/driver.h>
+#include <infiniband/verbs.h>
+
+int efa_query_device(struct ibv_context *uctx, struct ibv_device_attr *attr);
+int efa_query_port(struct ibv_context *uctx, uint8_t port,
+		   struct ibv_port_attr *attr);
+int efa_query_device_ex(struct ibv_context *context,
+			const struct ibv_query_device_ex_input *input,
+			struct ibv_device_attr_ex *attr, size_t attr_size);
+struct ibv_pd *efa_alloc_pd(struct ibv_context *uctx);
+int efa_dealloc_pd(struct ibv_pd *ibvpd);
+struct ibv_mr *efa_reg_mr(struct ibv_pd *ibvpd, void *buf, size_t len,
+			  int ibv_access_flags);
+int efa_dereg_mr(struct verbs_mr *vmr);
+
+struct ibv_cq *efa_create_cq(struct ibv_context *uctx, int ncqe,
+			     struct ibv_comp_channel *ch, int vec);
+int efa_destroy_cq(struct ibv_cq *ibvcq);
+int efa_poll_cq(struct ibv_cq *ibvcq, int nwc, struct ibv_wc *wc);
+
+struct ibv_qp *efa_create_qp(struct ibv_pd *ibvpd,
+			     struct ibv_qp_init_attr *attr);
+int efa_modify_qp(struct ibv_qp *ibvqp, struct ibv_qp_attr *attr,
+		  int ibv_qp_attr_mask);
+int efa_query_qp(struct ibv_qp *ibvqp, struct ibv_qp_attr *attr, int attr_mask,
+		 struct ibv_qp_init_attr *init_attr);
+int efa_destroy_qp(struct ibv_qp *ibvqp);
+int efa_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+		  struct ibv_send_wr **bad);
+int efa_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+		  struct ibv_recv_wr **bad);
+
+struct ibv_ah *efa_create_ah(struct ibv_pd *ibvpd, struct ibv_ah_attr *attr);
+int efa_destroy_ah(struct ibv_ah *ibvah);
+
+#endif /* __EFA_VERBS_H__ */
diff --git a/redhat/rdma-core.spec b/redhat/rdma-core.spec
index 8ddd54a164c8..a3a2f6ded447 100644
--- a/redhat/rdma-core.spec
+++ b/redhat/rdma-core.spec
@@ -119,6 +119,8 @@ Provides: libocrdma = %{version}-%{release}
 Obsoletes: libocrdma < %{version}-%{release}
 Provides: librxe = %{version}-%{release}
 Obsoletes: librxe < %{version}-%{release}
+Provides: libefa = %{version}-%{release}
+Obsoletes: libefa < %{version}-%{release}
 
 %description -n libibverbs
 libibverbs is a library that allows userspace processes to use RDMA
@@ -143,6 +145,7 @@ Device-specific plug-in ibverbs userspace drivers are included:
 - libqedr: QLogic QL4xxx RoCE HCA
 - librxe: A software implementation of the RoCE protocol
 - libvmw_pvrdma: VMware paravirtual RDMA device
+- libefa: Amazon Elastic Fabric Adapter
 
 %package -n libibverbs-utils
 Summary: Examples for the libibverbs library
diff --git a/suse/rdma-core.spec b/suse/rdma-core.spec
index 13f3feefe0ce..8479b06b4262 100644
--- a/suse/rdma-core.spec
+++ b/suse/rdma-core.spec
@@ -177,6 +177,7 @@ Obsoletes: libmthca-rdmav2 < %{version}-%{release}
 Obsoletes: libnes-rdmav2 < %{version}-%{release}
 Obsoletes: libocrdma-rdmav2 < %{version}-%{release}
 Obsoletes: librxe-rdmav2 < %{version}-%{release}
+Obsoletes: libefa-rdmav2 < %{version}-%{release}
 %if 0%{?dma_coherent}
 Requires: %{mlx4_lname} = %{version}-%{release}
 Requires: %{mlx5_lname} = %{version}-%{release}
@@ -208,6 +209,7 @@ Device-specific plug-in ibverbs userspace drivers are included:
 - libqedr: QLogic QL4xxx RoCE HCA
 - librxe: A software implementation of the RoCE protocol
 - libvmw_pvrdma: VMware paravirtual RDMA device
+- libefa: Amazon Elastic Fabric Adapter
 
 %package -n %verbs_lname
 Summary: Ibverbs runtime library