From patchwork Thu Aug 17 10:21:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cheng Xu X-Patchwork-Id: 13356236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97871C41513 for ; Thu, 17 Aug 2023 10:22:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238948AbjHQKWI (ORCPT ); Thu, 17 Aug 2023 06:22:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242433AbjHQKV7 (ORCPT ); Thu, 17 Aug 2023 06:21:59 -0400 Received: from out30-100.freemail.mail.aliyun.com (out30-100.freemail.mail.aliyun.com [115.124.30.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1AC2198C for ; Thu, 17 Aug 2023 03:21:57 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045176;MF=chengyou@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VpzmBE4_1692267714; Received: from localhost(mailfrom:chengyou@linux.alibaba.com fp:SMTPD_---0VpzmBE4_1692267714) by smtp.aliyun-inc.com; Thu, 17 Aug 2023 18:21:55 +0800 From: Cheng Xu To: jgg@ziepe.ca, leon@kernel.org Cc: linux-rdma@vger.kernel.org, KaiShen@linux.alibaba.com Subject: [PATCH for-next 1/3] RDMA/erdma: Renaming variable names and field names of struct erdma_mem Date: Thu, 17 Aug 2023 18:21:49 +0800 Message-Id: <20230817102151.75964-2-chengyou@linux.alibaba.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20230817102151.75964-1-chengyou@linux.alibaba.com> References: <20230817102151.75964-1-chengyou@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Currently, variable names and field names of 
struct erdma_mem contain 'mtt', which is not accurate. Rename them to 'xxx_mem' or 'mem'. Signed-off-by: Cheng Xu --- drivers/infiniband/hw/erdma/erdma_verbs.c | 66 +++++++++++------------ drivers/infiniband/hw/erdma/erdma_verbs.h | 8 +-- 2 files changed, 37 insertions(+), 37 deletions(-) diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c index fe0521f1536e..fbbd046b350c 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.c +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c @@ -67,30 +67,30 @@ static int create_qp_cmd(struct erdma_ucontext *uctx, struct erdma_qp *qp) user_qp = &qp->user_qp; req.sq_cqn_mtt_cfg = FIELD_PREP( ERDMA_CMD_CREATE_QP_PAGE_SIZE_MASK, - ilog2(user_qp->sq_mtt.page_size) - ERDMA_HW_PAGE_SHIFT); + ilog2(user_qp->sq_mem.page_size) - ERDMA_HW_PAGE_SHIFT); req.sq_cqn_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_CQN_MASK, qp->scq->cqn); req.rq_cqn_mtt_cfg = FIELD_PREP( ERDMA_CMD_CREATE_QP_PAGE_SIZE_MASK, - ilog2(user_qp->rq_mtt.page_size) - ERDMA_HW_PAGE_SHIFT); + ilog2(user_qp->rq_mem.page_size) - ERDMA_HW_PAGE_SHIFT); req.rq_cqn_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_CQN_MASK, qp->rcq->cqn); - req.sq_mtt_cfg = user_qp->sq_mtt.page_offset; + req.sq_mtt_cfg = user_qp->sq_mem.page_offset; req.sq_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_CNT_MASK, - user_qp->sq_mtt.mtt_nents) | + user_qp->sq_mem.mtt_nents) | FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - user_qp->sq_mtt.mtt_type); + user_qp->sq_mem.mtt_type); - req.rq_mtt_cfg = user_qp->rq_mtt.page_offset; + req.rq_mtt_cfg = user_qp->rq_mem.page_offset; req.rq_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_CNT_MASK, - user_qp->rq_mtt.mtt_nents) | + user_qp->rq_mem.mtt_nents) | FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - user_qp->rq_mtt.mtt_type); + user_qp->rq_mem.mtt_type); - req.sq_buf_addr = user_qp->sq_mtt.mtt_entry[0]; - req.rq_buf_addr = user_qp->rq_mtt.mtt_entry[0]; + req.sq_buf_addr = user_qp->sq_mem.mtt_entry[0]; + req.rq_buf_addr = 
user_qp->rq_mem.mtt_entry[0]; req.sq_db_info_dma_addr = user_qp->sq_db_info_dma_addr; req.rq_db_info_dma_addr = user_qp->rq_db_info_dma_addr; @@ -161,7 +161,7 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) { struct erdma_dev *dev = to_edev(cq->ibcq.device); struct erdma_cmdq_create_cq_req req; - struct erdma_mem *mtt; + struct erdma_mem *mem; u32 page_size; erdma_cmdq_build_reqhdr(&req.hdr, CMDQ_SUBMOD_RDMA, @@ -186,23 +186,23 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) req.cq_db_info_addr = cq->kern_cq.qbuf_dma_addr + (cq->depth << CQE_SHIFT); } else { - mtt = &cq->user_cq.qbuf_mtt; + mem = &cq->user_cq.qbuf_mem; req.cfg0 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_PAGESIZE_MASK, - ilog2(mtt->page_size) - ERDMA_HW_PAGE_SHIFT); - if (mtt->mtt_nents == 1) { - req.qbuf_addr_l = lower_32_bits(*(u64 *)mtt->mtt_buf); - req.qbuf_addr_h = upper_32_bits(*(u64 *)mtt->mtt_buf); + ilog2(mem->page_size) - ERDMA_HW_PAGE_SHIFT); + if (mem->mtt_nents == 1) { + req.qbuf_addr_l = lower_32_bits(*(u64 *)mem->mtt_buf); + req.qbuf_addr_h = upper_32_bits(*(u64 *)mem->mtt_buf); } else { - req.qbuf_addr_l = lower_32_bits(mtt->mtt_entry[0]); - req.qbuf_addr_h = upper_32_bits(mtt->mtt_entry[0]); + req.qbuf_addr_l = lower_32_bits(mem->mtt_entry[0]); + req.qbuf_addr_h = upper_32_bits(mem->mtt_entry[0]); } req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK, - mtt->mtt_nents); + mem->mtt_nents); req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_TYPE_MASK, - mtt->mtt_type); + mem->mtt_type); - req.first_page_offset = mtt->page_offset; + req.first_page_offset = mem->page_offset; req.cq_db_info_addr = cq->user_cq.db_info_dma_addr; if (uctx->ext_db.enable) { @@ -660,7 +660,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx, qp->attrs.rq_size * RQE_SIZE)) return -EINVAL; - ret = get_mtt_entries(qp->dev, &qp->user_qp.sq_mtt, va, + ret = get_mtt_entries(qp->dev, &qp->user_qp.sq_mem, va, qp->attrs.sq_size << 
SQEBB_SHIFT, 0, va, (SZ_1M - SZ_4K), 1); if (ret) @@ -669,7 +669,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx, rq_offset = ALIGN(qp->attrs.sq_size << SQEBB_SHIFT, ERDMA_HW_PAGE_SIZE); qp->user_qp.rq_offset = rq_offset; - ret = get_mtt_entries(qp->dev, &qp->user_qp.rq_mtt, va + rq_offset, + ret = get_mtt_entries(qp->dev, &qp->user_qp.rq_mem, va + rq_offset, qp->attrs.rq_size << RQE_SHIFT, 0, va + rq_offset, (SZ_1M - SZ_4K), 1); if (ret) @@ -687,18 +687,18 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx, return 0; put_rq_mtt: - put_mtt_entries(qp->dev, &qp->user_qp.rq_mtt); + put_mtt_entries(qp->dev, &qp->user_qp.rq_mem); put_sq_mtt: - put_mtt_entries(qp->dev, &qp->user_qp.sq_mtt); + put_mtt_entries(qp->dev, &qp->user_qp.sq_mem); return ret; } static void free_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx) { - put_mtt_entries(qp->dev, &qp->user_qp.sq_mtt); - put_mtt_entries(qp->dev, &qp->user_qp.rq_mtt); + put_mtt_entries(qp->dev, &qp->user_qp.sq_mem); + put_mtt_entries(qp->dev, &qp->user_qp.rq_mem); erdma_unmap_user_dbrecords(uctx, &qp->user_qp.user_dbr_page); } @@ -1041,7 +1041,7 @@ int erdma_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) cq->kern_cq.qbuf, cq->kern_cq.qbuf_dma_addr); } else { erdma_unmap_user_dbrecords(ctx, &cq->user_cq.user_dbr_page); - put_mtt_entries(dev, &cq->user_cq.qbuf_mtt); + put_mtt_entries(dev, &cq->user_cq.qbuf_mem); } xa_erase(&dev->cq_xa, cq->cqn); @@ -1089,8 +1089,8 @@ int erdma_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) WARPPED_BUFSIZE(qp->attrs.sq_size << SQEBB_SHIFT), qp->kern_qp.sq_buf, qp->kern_qp.sq_buf_dma_addr); } else { - put_mtt_entries(dev, &qp->user_qp.sq_mtt); - put_mtt_entries(dev, &qp->user_qp.rq_mtt); + put_mtt_entries(dev, &qp->user_qp.sq_mem); + put_mtt_entries(dev, &qp->user_qp.rq_mem); erdma_unmap_user_dbrecords(ctx, &qp->user_qp.user_dbr_page); } @@ -1379,7 +1379,7 @@ static int erdma_init_user_cq(struct erdma_ucontext 
*ctx, struct erdma_cq *cq, int ret; struct erdma_dev *dev = to_edev(cq->ibcq.device); - ret = get_mtt_entries(dev, &cq->user_cq.qbuf_mtt, ureq->qbuf_va, + ret = get_mtt_entries(dev, &cq->user_cq.qbuf_mem, ureq->qbuf_va, ureq->qbuf_len, 0, ureq->qbuf_va, SZ_64M - SZ_4K, 1); if (ret) @@ -1389,7 +1389,7 @@ static int erdma_init_user_cq(struct erdma_ucontext *ctx, struct erdma_cq *cq, &cq->user_cq.user_dbr_page, &cq->user_cq.db_info_dma_addr); if (ret) - put_mtt_entries(dev, &cq->user_cq.qbuf_mtt); + put_mtt_entries(dev, &cq->user_cq.qbuf_mem); return ret; } @@ -1473,7 +1473,7 @@ int erdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, err_free_res: if (!rdma_is_kernel_res(&ibcq->res)) { erdma_unmap_user_dbrecords(ctx, &cq->user_cq.user_dbr_page); - put_mtt_entries(dev, &cq->user_cq.qbuf_mtt); + put_mtt_entries(dev, &cq->user_cq.qbuf_mem); } else { dma_free_coherent(&dev->pdev->dev, WARPPED_BUFSIZE(depth << CQE_SHIFT), diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h index 429fc3063f98..abaf031fe0d2 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.h +++ b/drivers/infiniband/hw/erdma/erdma_verbs.h @@ -65,7 +65,7 @@ struct erdma_pd { * MemoryRegion definition. */ #define ERDMA_MAX_INLINE_MTT_ENTRIES 4 -#define MTT_SIZE(mtt_cnt) (mtt_cnt << 3) /* per mtt takes 8 Bytes. */ +#define MTT_SIZE(mtt_cnt) (mtt_cnt << 3) /* per mtt entry takes 8 Bytes. 
*/ #define ERDMA_MR_MAX_MTT_CNT 524288 #define ERDMA_MTT_ENTRY_SIZE 8 @@ -121,8 +121,8 @@ struct erdma_user_dbrecords_page { }; struct erdma_uqp { - struct erdma_mem sq_mtt; - struct erdma_mem rq_mtt; + struct erdma_mem sq_mem; + struct erdma_mem rq_mem; dma_addr_t sq_db_info_dma_addr; dma_addr_t rq_db_info_dma_addr; @@ -234,7 +234,7 @@ struct erdma_kcq_info { }; struct erdma_ucq_info { - struct erdma_mem qbuf_mtt; + struct erdma_mem qbuf_mem; struct erdma_user_dbrecords_page *user_dbr_page; dma_addr_t db_info_dma_addr; }; From patchwork Thu Aug 17 10:21:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cheng Xu X-Patchwork-Id: 13356235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D1F0EB64DD for ; Thu, 17 Aug 2023 10:22:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240931AbjHQKWJ (ORCPT ); Thu, 17 Aug 2023 06:22:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244765AbjHQKWB (ORCPT ); Thu, 17 Aug 2023 06:22:01 -0400 Received: from out30-111.freemail.mail.aliyun.com (out30-111.freemail.mail.aliyun.com [115.124.30.111]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15B1B2D50 for ; Thu, 17 Aug 2023 03:21:58 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R921e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=chengyou@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VpzVYzf_1692267715; Received: from localhost(mailfrom:chengyou@linux.alibaba.com fp:SMTPD_---0VpzVYzf_1692267715) by smtp.aliyun-inc.com; Thu, 17 Aug 2023 18:21:56 +0800 From: Cheng Xu To: jgg@ziepe.ca, leon@kernel.org Cc: 
linux-rdma@vger.kernel.org, KaiShen@linux.alibaba.com Subject: [PATCH for-next 2/3] RDMA/erdma: Refactor the storage structure of MTT entries Date: Thu, 17 Aug 2023 18:21:50 +0800 Message-Id: <20230817102151.75964-3-chengyou@linux.alibaba.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20230817102151.75964-1-chengyou@linux.alibaba.com> References: <20230817102151.75964-1-chengyou@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Currently our MTT only supports inline MTT entries (0-level MTT) and indirect MTT entries (1-level MTT), which limits the maximum length of MRs. In order to implement a multi-level MTT, we refactor the structure of MTT first. Signed-off-by: Cheng Xu --- drivers/infiniband/hw/erdma/erdma_hw.h | 4 +- drivers/infiniband/hw/erdma/erdma_qp.c | 2 +- drivers/infiniband/hw/erdma/erdma_verbs.c | 214 +++++++++++++--------- drivers/infiniband/hw/erdma/erdma_verbs.h | 26 ++- 4 files changed, 152 insertions(+), 94 deletions(-) diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h index a882b57aa118..80a78569bc2a 100644 --- a/drivers/infiniband/hw/erdma/erdma_hw.h +++ b/drivers/infiniband/hw/erdma/erdma_hw.h @@ -228,7 +228,7 @@ struct erdma_cmdq_ext_db_req { /* create_cq cfg1 */ #define ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK GENMASK(31, 16) -#define ERDMA_CMD_CREATE_CQ_MTT_TYPE_MASK BIT(15) +#define ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK BIT(15) #define ERDMA_CMD_CREATE_CQ_MTT_DB_CFG_MASK BIT(11) #define ERDMA_CMD_CREATE_CQ_EQN_MASK GENMASK(9, 0) @@ -258,7 +258,7 @@ struct erdma_cmdq_create_cq_req { /* regmr cfg2 */ #define ERDMA_CMD_REGMR_PAGESIZE_MASK GENMASK(31, 27) -#define ERDMA_CMD_REGMR_MTT_TYPE_MASK GENMASK(21, 20) +#define ERDMA_CMD_REGMR_MTT_LEVEL_MASK GENMASK(21, 20) #define ERDMA_CMD_REGMR_MTT_CNT_MASK GENMASK(19, 0) struct erdma_cmdq_reg_mr_req { diff --git a/drivers/infiniband/hw/erdma/erdma_qp.c b/drivers/infiniband/hw/erdma/erdma_qp.c index 
44923c51a01b..6d0330badd68 100644 --- a/drivers/infiniband/hw/erdma/erdma_qp.c +++ b/drivers/infiniband/hw/erdma/erdma_qp.c @@ -410,7 +410,7 @@ static int erdma_push_one_sqe(struct erdma_qp *qp, u16 *pi, /* Copy SGLs to SQE content to accelerate */ memcpy(get_queue_entry(qp->kern_qp.sq_buf, idx + 1, qp->attrs.sq_size, SQEBB_SHIFT), - mr->mem.mtt_buf, MTT_SIZE(mr->mem.mtt_nents)); + mr->mem.mtt->buf, MTT_SIZE(mr->mem.mtt_nents)); wqe_size = sizeof(struct erdma_reg_mr_sqe) + MTT_SIZE(mr->mem.mtt_nents); } else { diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c index fbbd046b350c..0d272f18256a 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.c +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c @@ -19,6 +19,23 @@ #include "erdma_cm.h" #include "erdma_verbs.h" +static void assemble_qbuf_mtt_for_cmd(struct erdma_mem *mem, u32 *cfg, + u64 *addr0, u64 *addr1) +{ + struct erdma_mtt *mtt = mem->mtt; + + if (mem->mtt_nents > ERDMA_MAX_INLINE_MTT_ENTRIES) { + *addr0 = mtt->buf_dma; + *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, + ERDMA_MR_INDIRECT_MTT); + } else { + *addr0 = mtt->buf[0]; + memcpy(addr1, mtt->buf + 1, MTT_SIZE(mem->mtt_nents - 1)); + *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, + ERDMA_MR_INLINE_MTT); + } +} + static int create_qp_cmd(struct erdma_ucontext *uctx, struct erdma_qp *qp) { struct erdma_dev *dev = to_edev(qp->ibqp.device); @@ -79,18 +96,16 @@ static int create_qp_cmd(struct erdma_ucontext *uctx, struct erdma_qp *qp) req.sq_mtt_cfg = user_qp->sq_mem.page_offset; req.sq_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_CNT_MASK, - user_qp->sq_mem.mtt_nents) | - FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - user_qp->sq_mem.mtt_type); + user_qp->sq_mem.mtt_nents); req.rq_mtt_cfg = user_qp->rq_mem.page_offset; req.rq_mtt_cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_CNT_MASK, - user_qp->rq_mem.mtt_nents) | - FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - user_qp->rq_mem.mtt_type); + 
user_qp->rq_mem.mtt_nents); - req.sq_buf_addr = user_qp->sq_mem.mtt_entry[0]; - req.rq_buf_addr = user_qp->rq_mem.mtt_entry[0]; + assemble_qbuf_mtt_for_cmd(&user_qp->sq_mem, &req.sq_mtt_cfg, + &req.sq_buf_addr, req.sq_mtt_entry); + assemble_qbuf_mtt_for_cmd(&user_qp->rq_mem, &req.rq_mtt_cfg, + &req.rq_buf_addr, req.rq_mtt_entry); req.sq_db_info_dma_addr = user_qp->sq_db_info_dma_addr; req.rq_db_info_dma_addr = user_qp->rq_db_info_dma_addr; @@ -117,13 +132,22 @@ static int create_qp_cmd(struct erdma_ucontext *uctx, struct erdma_qp *qp) static int regmr_cmd(struct erdma_dev *dev, struct erdma_mr *mr) { - struct erdma_cmdq_reg_mr_req req; struct erdma_pd *pd = to_epd(mr->ibmr.pd); - u64 *phy_addr; - int i; + struct erdma_cmdq_reg_mr_req req; + u32 mtt_level; erdma_cmdq_build_reqhdr(&req.hdr, CMDQ_SUBMOD_RDMA, CMDQ_OPCODE_REG_MR); + if (mr->type == ERDMA_MR_TYPE_FRMR || + mr->mem.page_cnt > ERDMA_MAX_INLINE_MTT_ENTRIES) { + req.phy_addr[0] = mr->mem.mtt->buf_dma; + mtt_level = ERDMA_MR_INDIRECT_MTT; + } else { + memcpy(req.phy_addr, mr->mem.mtt->buf, + MTT_SIZE(mr->mem.page_cnt)); + mtt_level = ERDMA_MR_INLINE_MTT; + } + req.cfg0 = FIELD_PREP(ERDMA_CMD_MR_VALID_MASK, mr->valid) | FIELD_PREP(ERDMA_CMD_MR_KEY_MASK, mr->ibmr.lkey & 0xFF) | FIELD_PREP(ERDMA_CMD_MR_MPT_IDX_MASK, mr->ibmr.lkey >> 8); @@ -132,7 +156,7 @@ static int regmr_cmd(struct erdma_dev *dev, struct erdma_mr *mr) FIELD_PREP(ERDMA_CMD_REGMR_RIGHT_MASK, mr->access); req.cfg2 = FIELD_PREP(ERDMA_CMD_REGMR_PAGESIZE_MASK, ilog2(mr->mem.page_size)) | - FIELD_PREP(ERDMA_CMD_REGMR_MTT_TYPE_MASK, mr->mem.mtt_type) | + FIELD_PREP(ERDMA_CMD_REGMR_MTT_LEVEL_MASK, mtt_level) | FIELD_PREP(ERDMA_CMD_REGMR_MTT_CNT_MASK, mr->mem.page_cnt); if (mr->type == ERDMA_MR_TYPE_DMA) @@ -143,16 +167,6 @@ static int regmr_cmd(struct erdma_dev *dev, struct erdma_mr *mr) req.size = mr->mem.len; } - if (mr->type == ERDMA_MR_TYPE_FRMR || - mr->mem.mtt_type == ERDMA_MR_INDIRECT_MTT) { - phy_addr = req.phy_addr; - *phy_addr = 
mr->mem.mtt_entry[0]; - } else { - phy_addr = req.phy_addr; - for (i = 0; i < mr->mem.mtt_nents; i++) - *phy_addr++ = mr->mem.mtt_entry[i]; - } - post_cmd: return erdma_post_cmd_wait(&dev->cmdq, &req, sizeof(req), NULL, NULL); } @@ -179,7 +193,7 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) req.qbuf_addr_h = upper_32_bits(cq->kern_cq.qbuf_dma_addr); req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK, 1) | - FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_TYPE_MASK, + FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, ERDMA_MR_INLINE_MTT); req.first_page_offset = 0; @@ -191,16 +205,20 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) FIELD_PREP(ERDMA_CMD_CREATE_CQ_PAGESIZE_MASK, ilog2(mem->page_size) - ERDMA_HW_PAGE_SHIFT); if (mem->mtt_nents == 1) { - req.qbuf_addr_l = lower_32_bits(*(u64 *)mem->mtt_buf); - req.qbuf_addr_h = upper_32_bits(*(u64 *)mem->mtt_buf); + req.qbuf_addr_l = lower_32_bits(mem->mtt->buf[0]); + req.qbuf_addr_h = upper_32_bits(mem->mtt->buf[0]); + req.cfg1 |= + FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, + ERDMA_MR_INLINE_MTT); } else { - req.qbuf_addr_l = lower_32_bits(mem->mtt_entry[0]); - req.qbuf_addr_h = upper_32_bits(mem->mtt_entry[0]); + req.qbuf_addr_l = lower_32_bits(mem->mtt->buf_dma); + req.qbuf_addr_h = upper_32_bits(mem->mtt->buf_dma); + req.cfg1 |= + FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, + ERDMA_MR_INDIRECT_MTT); } req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK, mem->mtt_nents); - req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_TYPE_MASK, - mem->mtt_type); req.first_page_offset = mem->page_offset; req.cq_db_info_addr = cq->user_cq.db_info_dma_addr; @@ -508,12 +526,77 @@ static int init_kernel_qp(struct erdma_dev *dev, struct erdma_qp *qp, return -ENOMEM; } +static void erdma_fill_bottom_mtt(struct erdma_dev *dev, struct erdma_mem *mem) +{ + struct erdma_mtt *mtt = mem->mtt; + struct ib_block_iter biter; + u32 idx = 0; + + while (mtt->low_level) + mtt = 
mtt->low_level; + + rdma_umem_for_each_dma_block(mem->umem, &biter, mem->page_size) + mtt->buf[idx++] = rdma_block_iter_dma_address(&biter); +} + +static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev, + size_t size) +{ + struct erdma_mtt *mtt; + int ret = -ENOMEM; + + mtt = kzalloc(sizeof(*mtt), GFP_KERNEL); + if (!mtt) + return ERR_PTR(-ENOMEM); + + mtt->size = size; + mtt->buf = kzalloc(mtt->size, GFP_KERNEL); + if (!mtt->buf) + goto err_free_mtt; + + mtt->continuous = true; + mtt->buf_dma = dma_map_single(&dev->pdev->dev, mtt->buf, mtt->size, + DMA_TO_DEVICE); + if (dma_mapping_error(&dev->pdev->dev, mtt->buf_dma)) + goto err_free_mtt_buf; + + return mtt; + +err_free_mtt_buf: + kfree(mtt->buf); + +err_free_mtt: + kfree(mtt); + + return ERR_PTR(ret); +} + +static struct erdma_mtt *erdma_create_mtt(struct erdma_dev *dev, size_t size, + bool force_continuous) +{ + ibdev_dbg(&dev->ibdev, "create_mtt, size:%lu, force cont:%d\n", size, + force_continuous); + + if (force_continuous) + return erdma_create_cont_mtt(dev, size); + + return ERR_PTR(-ENOTSUPP); +} + +static void erdma_destroy_mtt(struct erdma_dev *dev, struct erdma_mtt *mtt) +{ + if (mtt->continuous) { + dma_unmap_single(&dev->pdev->dev, mtt->buf_dma, mtt->size, + DMA_TO_DEVICE); + kfree(mtt->buf); + kfree(mtt); + } +} + static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem, u64 start, u64 len, int access, u64 virt, unsigned long req_page_size, u8 force_indirect_mtt) { - struct ib_block_iter biter; - uint64_t *phy_addr = NULL; int ret = 0; mem->umem = ib_umem_get(&dev->ibdev, start, len, access); @@ -529,38 +612,13 @@ static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem, mem->page_offset = start & (mem->page_size - 1); mem->mtt_nents = ib_umem_num_dma_blocks(mem->umem, mem->page_size); mem->page_cnt = mem->mtt_nents; - - if (mem->page_cnt > ERDMA_MAX_INLINE_MTT_ENTRIES || - force_indirect_mtt) { - mem->mtt_type = ERDMA_MR_INDIRECT_MTT; - mem->mtt_buf = - 
alloc_pages_exact(MTT_SIZE(mem->page_cnt), GFP_KERNEL); - if (!mem->mtt_buf) { - ret = -ENOMEM; - goto error_ret; - } - phy_addr = mem->mtt_buf; - } else { - mem->mtt_type = ERDMA_MR_INLINE_MTT; - phy_addr = mem->mtt_entry; - } - - rdma_umem_for_each_dma_block(mem->umem, &biter, mem->page_size) { - *phy_addr = rdma_block_iter_dma_address(&biter); - phy_addr++; + mem->mtt = erdma_create_mtt(dev, MTT_SIZE(mem->page_cnt), true); + if (IS_ERR(mem->mtt)) { + ret = PTR_ERR(mem->mtt); + goto error_ret; } - if (mem->mtt_type == ERDMA_MR_INDIRECT_MTT) { - mem->mtt_entry[0] = - dma_map_single(&dev->pdev->dev, mem->mtt_buf, - MTT_SIZE(mem->page_cnt), DMA_TO_DEVICE); - if (dma_mapping_error(&dev->pdev->dev, mem->mtt_entry[0])) { - free_pages_exact(mem->mtt_buf, MTT_SIZE(mem->page_cnt)); - mem->mtt_buf = NULL; - ret = -ENOMEM; - goto error_ret; - } - } + erdma_fill_bottom_mtt(dev, mem); return 0; @@ -575,11 +633,8 @@ static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem, static void put_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem) { - if (mem->mtt_buf) { - dma_unmap_single(&dev->pdev->dev, mem->mtt_entry[0], - MTT_SIZE(mem->page_cnt), DMA_TO_DEVICE); - free_pages_exact(mem->mtt_buf, MTT_SIZE(mem->page_cnt)); - } + if (mem->mtt) + erdma_destroy_mtt(dev, mem->mtt); if (mem->umem) { ib_umem_release(mem->umem); @@ -875,33 +930,20 @@ struct ib_mr *erdma_ib_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, mr->mem.page_size = PAGE_SIZE; /* update it later. 
*/ mr->mem.page_cnt = max_num_sg; - mr->mem.mtt_type = ERDMA_MR_INDIRECT_MTT; - mr->mem.mtt_buf = - alloc_pages_exact(MTT_SIZE(mr->mem.page_cnt), GFP_KERNEL); - if (!mr->mem.mtt_buf) { - ret = -ENOMEM; + mr->mem.mtt = erdma_create_mtt(dev, MTT_SIZE(max_num_sg), true); + if (IS_ERR(mr->mem.mtt)) { + ret = PTR_ERR(mr->mem.mtt); goto out_remove_stag; } - mr->mem.mtt_entry[0] = - dma_map_single(&dev->pdev->dev, mr->mem.mtt_buf, - MTT_SIZE(mr->mem.page_cnt), DMA_TO_DEVICE); - if (dma_mapping_error(&dev->pdev->dev, mr->mem.mtt_entry[0])) { - ret = -ENOMEM; - goto out_free_mtt; - } - ret = regmr_cmd(dev, mr); if (ret) - goto out_dma_unmap; + goto out_destroy_mtt; return &mr->ibmr; -out_dma_unmap: - dma_unmap_single(&dev->pdev->dev, mr->mem.mtt_entry[0], - MTT_SIZE(mr->mem.page_cnt), DMA_TO_DEVICE); -out_free_mtt: - free_pages_exact(mr->mem.mtt_buf, MTT_SIZE(mr->mem.page_cnt)); +out_destroy_mtt: + erdma_destroy_mtt(dev, mr->mem.mtt); out_remove_stag: erdma_free_idx(&dev->res_cb[ERDMA_RES_TYPE_STAG_IDX], @@ -920,7 +962,7 @@ static int erdma_set_page(struct ib_mr *ibmr, u64 addr) if (mr->mem.mtt_nents >= mr->mem.page_cnt) return -1; - *((u64 *)mr->mem.mtt_buf + mr->mem.mtt_nents) = addr; + mr->mem.mtt->buf[mr->mem.mtt_nents] = addr; mr->mem.mtt_nents++; return 0; diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h index abaf031fe0d2..5f639f27a8a9 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.h +++ b/drivers/infiniband/hw/erdma/erdma_verbs.h @@ -65,7 +65,7 @@ struct erdma_pd { * MemoryRegion definition. */ #define ERDMA_MAX_INLINE_MTT_ENTRIES 4 -#define MTT_SIZE(mtt_cnt) (mtt_cnt << 3) /* per mtt entry takes 8 Bytes. */ +#define MTT_SIZE(mtt_cnt) ((mtt_cnt) << 3) /* per mtt entry takes 8 Bytes. */ #define ERDMA_MR_MAX_MTT_CNT 524288 #define ERDMA_MTT_ENTRY_SIZE 8 @@ -90,10 +90,28 @@ static inline u8 to_erdma_access_flags(int access) (access & IB_ACCESS_REMOTE_ATOMIC ? 
ERDMA_MR_ACC_RA : 0); } +/* Hierarchical storage structure for MTT entries */ +struct erdma_mtt { + u64 *buf; + size_t size; + + bool continuous; + union { + dma_addr_t buf_dma; + struct { + struct scatterlist *sglist; + u32 nsg; + u32 level; + }; + }; + + struct erdma_mtt *low_level; +}; + struct erdma_mem { struct ib_umem *umem; - void *mtt_buf; - u32 mtt_type; + struct erdma_mtt *mtt; + u32 page_size; u32 page_offset; u32 page_cnt; @@ -101,8 +119,6 @@ struct erdma_mem { u64 va; u64 len; - - u64 mtt_entry[ERDMA_MAX_INLINE_MTT_ENTRIES]; }; struct erdma_mr { From patchwork Thu Aug 17 10:21:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cheng Xu X-Patchwork-Id: 13356237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A91AEC27C7A for ; Thu, 17 Aug 2023 10:22:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244765AbjHQKWJ (ORCPT ); Thu, 17 Aug 2023 06:22:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38122 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348524AbjHQKWC (ORCPT ); Thu, 17 Aug 2023 06:22:02 -0400 Received: from out30-111.freemail.mail.aliyun.com (out30-111.freemail.mail.aliyun.com [115.124.30.111]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 173F2198C for ; Thu, 17 Aug 2023 03:21:59 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R901e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046059;MF=chengyou@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VpzlsM5_1692267716; Received: from localhost(mailfrom:chengyou@linux.alibaba.com fp:SMTPD_---0VpzlsM5_1692267716) by smtp.aliyun-inc.com; Thu, 17 Aug 2023 18:21:57 +0800 From: Cheng Xu To: jgg@ziepe.ca, 
leon@kernel.org Cc: linux-rdma@vger.kernel.org, KaiShen@linux.alibaba.com Subject: [PATCH for-next 3/3] RDMA/erdma: Implement hierarchical MTT Date: Thu, 17 Aug 2023 18:21:51 +0800 Message-Id: <20230817102151.75964-4-chengyou@linux.alibaba.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20230817102151.75964-1-chengyou@linux.alibaba.com> References: <20230817102151.75964-1-chengyou@linux.alibaba.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Hierarchical MTT allows large MR registration without the need for continuous physical addresses. This commit adds hierarchical MTT support for erdma. Signed-off-by: Cheng Xu --- drivers/infiniband/hw/erdma/erdma_hw.h | 14 +- drivers/infiniband/hw/erdma/erdma_verbs.c | 200 +++++++++++++++++++--- drivers/infiniband/hw/erdma/erdma_verbs.h | 4 +- 3 files changed, 194 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h index 80a78569bc2a..9d316fdc6f9a 100644 --- a/drivers/infiniband/hw/erdma/erdma_hw.h +++ b/drivers/infiniband/hw/erdma/erdma_hw.h @@ -248,6 +248,7 @@ struct erdma_cmdq_create_cq_req { /* regmr/deregmr cfg0 */ #define ERDMA_CMD_MR_VALID_MASK BIT(31) +#define ERDMA_CMD_MR_VERSION_MASK GENMASK(30, 28) #define ERDMA_CMD_MR_KEY_MASK GENMASK(27, 20) #define ERDMA_CMD_MR_MPT_IDX_MASK GENMASK(19, 0) @@ -258,6 +259,7 @@ struct erdma_cmdq_create_cq_req { /* regmr cfg2 */ #define ERDMA_CMD_REGMR_PAGESIZE_MASK GENMASK(31, 27) +#define ERDMA_CMD_REGMR_MTT_PAGESIZE_MASK GENMASK(26, 24) #define ERDMA_CMD_REGMR_MTT_LEVEL_MASK GENMASK(21, 20) #define ERDMA_CMD_REGMR_MTT_CNT_MASK GENMASK(19, 0) @@ -268,7 +270,14 @@ struct erdma_cmdq_reg_mr_req { u64 start_va; u32 size; u32 cfg2; - u64 phy_addr[4]; + union { + u64 phy_addr[4]; + struct { + u64 rsvd; + u32 size_h; + u32 mtt_cnt_h; + }; + }; }; struct erdma_cmdq_dereg_mr_req { @@ -309,7 +318,7 @@ struct erdma_cmdq_modify_qp_req { /* create qp mtt_cfg */ #define 
ERDMA_CMD_CREATE_QP_PAGE_OFFSET_MASK GENMASK(31, 12) #define ERDMA_CMD_CREATE_QP_MTT_CNT_MASK GENMASK(11, 1) -#define ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK BIT(0) +#define ERDMA_CMD_CREATE_QP_MTT_LEVEL_MASK BIT(0) /* create qp db cfg */ #define ERDMA_CMD_CREATE_QP_SQDB_CFG_MASK GENMASK(31, 16) @@ -364,6 +373,7 @@ struct erdma_cmdq_reflush_req { enum { ERDMA_DEV_CAP_FLAGS_ATOMIC = 1 << 7, + ERDMA_DEV_CAP_FLAGS_MTT_VA = 1 << 5, ERDMA_DEV_CAP_FLAGS_EXTEND_DB = 1 << 3, }; diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c index 0d272f18256a..dcccb6015232 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.c +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c @@ -26,13 +26,13 @@ static void assemble_qbuf_mtt_for_cmd(struct erdma_mem *mem, u32 *cfg, if (mem->mtt_nents > ERDMA_MAX_INLINE_MTT_ENTRIES) { *addr0 = mtt->buf_dma; - *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - ERDMA_MR_INDIRECT_MTT); + *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_LEVEL_MASK, + ERDMA_MR_MTT_1LEVEL); } else { *addr0 = mtt->buf[0]; memcpy(addr1, mtt->buf + 1, MTT_SIZE(mem->mtt_nents - 1)); - *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - ERDMA_MR_INLINE_MTT); + *cfg |= FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_LEVEL_MASK, + ERDMA_MR_MTT_0LEVEL); } } @@ -70,8 +70,8 @@ static int create_qp_cmd(struct erdma_ucontext *uctx, struct erdma_qp *qp) req.sq_mtt_cfg = FIELD_PREP(ERDMA_CMD_CREATE_QP_PAGE_OFFSET_MASK, 0) | FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_CNT_MASK, 1) | - FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_TYPE_MASK, - ERDMA_MR_INLINE_MTT); + FIELD_PREP(ERDMA_CMD_CREATE_QP_MTT_LEVEL_MASK, + ERDMA_MR_MTT_0LEVEL); req.rq_mtt_cfg = req.sq_mtt_cfg; req.rq_buf_addr = qp->kern_qp.rq_buf_dma_addr; @@ -140,12 +140,17 @@ static int regmr_cmd(struct erdma_dev *dev, struct erdma_mr *mr) if (mr->type == ERDMA_MR_TYPE_FRMR || mr->mem.page_cnt > ERDMA_MAX_INLINE_MTT_ENTRIES) { - req.phy_addr[0] = mr->mem.mtt->buf_dma; - mtt_level = ERDMA_MR_INDIRECT_MTT; + if 
(mr->mem.mtt->continuous) { + req.phy_addr[0] = mr->mem.mtt->buf_dma; + mtt_level = ERDMA_MR_MTT_1LEVEL; + } else { + req.phy_addr[0] = sg_dma_address(mr->mem.mtt->sglist); + mtt_level = mr->mem.mtt->level; + } } else { memcpy(req.phy_addr, mr->mem.mtt->buf, MTT_SIZE(mr->mem.page_cnt)); - mtt_level = ERDMA_MR_INLINE_MTT; + mtt_level = ERDMA_MR_MTT_0LEVEL; } req.cfg0 = FIELD_PREP(ERDMA_CMD_MR_VALID_MASK, mr->valid) | @@ -167,6 +172,14 @@ static int regmr_cmd(struct erdma_dev *dev, struct erdma_mr *mr) req.size = mr->mem.len; } + if (!mr->mem.mtt->continuous && mr->mem.mtt->level > 1) { + req.cfg0 |= FIELD_PREP(ERDMA_CMD_MR_VERSION_MASK, 1); + req.cfg2 |= FIELD_PREP(ERDMA_CMD_REGMR_MTT_PAGESIZE_MASK, + PAGE_SHIFT - ERDMA_HW_PAGE_SHIFT); + req.size_h = upper_32_bits(mr->mem.len); + req.mtt_cnt_h = mr->mem.page_cnt >> 20; + } + post_cmd: return erdma_post_cmd_wait(&dev->cmdq, &req, sizeof(req), NULL, NULL); } @@ -194,7 +207,7 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK, 1) | FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, - ERDMA_MR_INLINE_MTT); + ERDMA_MR_MTT_0LEVEL); req.first_page_offset = 0; req.cq_db_info_addr = @@ -209,13 +222,13 @@ static int create_cq_cmd(struct erdma_ucontext *uctx, struct erdma_cq *cq) req.qbuf_addr_h = upper_32_bits(mem->mtt->buf[0]); req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, - ERDMA_MR_INLINE_MTT); + ERDMA_MR_MTT_0LEVEL); } else { req.qbuf_addr_l = lower_32_bits(mem->mtt->buf_dma); req.qbuf_addr_h = upper_32_bits(mem->mtt->buf_dma); req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_LEVEL_MASK, - ERDMA_MR_INDIRECT_MTT); + ERDMA_MR_MTT_1LEVEL); } req.cfg1 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_MTT_CNT_MASK, mem->mtt_nents); @@ -543,7 +556,6 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev, size_t size) { struct erdma_mtt *mtt; - int ret = -ENOMEM; mtt = kzalloc(sizeof(*mtt), GFP_KERNEL); if (!mtt) @@ -565,6 +577,104 @@ 
static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
 err_free_mtt_buf:
 	kfree(mtt->buf);
 
+err_free_mtt:
+	kfree(mtt);
+
+	return ERR_PTR(-ENOMEM);
+}
+
+static void erdma_destroy_mtt_buf_sg(struct erdma_dev *dev,
+				     struct erdma_mtt *mtt)
+{
+	dma_unmap_sg(&dev->pdev->dev, mtt->sglist, mtt->nsg, DMA_TO_DEVICE);
+	vfree(mtt->sglist);
+}
+
+static void erdma_destroy_scatter_mtt(struct erdma_dev *dev,
+				      struct erdma_mtt *mtt)
+{
+	erdma_destroy_mtt_buf_sg(dev, mtt);
+	vfree(mtt->buf);
+	kfree(mtt);
+}
+
+static void erdma_init_middle_mtt(struct erdma_mtt *mtt,
+				  struct erdma_mtt *low_mtt)
+{
+	struct scatterlist *sg;
+	u32 idx = 0, i;
+
+	for_each_sg(low_mtt->sglist, sg, low_mtt->nsg, i)
+		mtt->buf[idx++] = sg_dma_address(sg);
+}
+
+static int erdma_create_mtt_buf_sg(struct erdma_dev *dev, struct erdma_mtt *mtt)
+{
+	struct scatterlist *sglist;
+	void *buf = mtt->buf;
+	u32 npages, i, nsg;
+	struct page *pg;
+
+	/* Fail if buf is not page aligned */
+	if ((uintptr_t)buf & ~PAGE_MASK)
+		return -EINVAL;
+
+	npages = DIV_ROUND_UP(mtt->size, PAGE_SIZE);
+	sglist = vzalloc(npages * sizeof(*sglist));
+	if (!sglist)
+		return -ENOMEM;
+
+	sg_init_table(sglist, npages);
+	for (i = 0; i < npages; i++) {
+		pg = vmalloc_to_page(buf);
+		if (!pg)
+			goto err;
+		sg_set_page(&sglist[i], pg, PAGE_SIZE, 0);
+		buf += PAGE_SIZE;
+	}
+
+	nsg = dma_map_sg(&dev->pdev->dev, sglist, npages, DMA_TO_DEVICE);
+	if (!nsg)
+		goto err;
+
+	mtt->sglist = sglist;
+	mtt->nsg = nsg;
+
+	return 0;
+err:
+	vfree(sglist);
+
+	return -ENOMEM;
+}
+
+static struct erdma_mtt *erdma_create_scatter_mtt(struct erdma_dev *dev,
+						  size_t size)
+{
+	struct erdma_mtt *mtt;
+	int ret = -ENOMEM;
+
+	mtt = kzalloc(sizeof(*mtt), GFP_KERNEL);
+	if (!mtt)
+		return ERR_PTR(-ENOMEM);
+
+	mtt->size = ALIGN(size, PAGE_SIZE);
+	mtt->buf = vzalloc(mtt->size);
+	mtt->continuous = false;
+	if (!mtt->buf)
+		goto err_free_mtt;
+
+	ret = erdma_create_mtt_buf_sg(dev, mtt);
+	if (ret)
+		goto err_free_mtt_buf;
+
+	ibdev_dbg(&dev->ibdev, "create scatter mtt, size:%lu, nsg:%u\n",
+		  mtt->size, mtt->nsg);
+
+	return mtt;
+
+err_free_mtt_buf:
+	vfree(mtt->buf);
+
 err_free_mtt:
 	kfree(mtt);
 
@@ -574,28 +684,77 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
 static struct erdma_mtt *erdma_create_mtt(struct erdma_dev *dev, size_t size,
 					  bool force_continuous)
 {
+	struct erdma_mtt *mtt, *tmp_mtt;
+	int ret, level = 0;
+
 	ibdev_dbg(&dev->ibdev, "create_mtt, size:%lu, force cont:%d\n", size,
 		  force_continuous);
 
+	if (!(dev->attrs.cap_flags & ERDMA_DEV_CAP_FLAGS_MTT_VA))
+		force_continuous = true;
+
 	if (force_continuous)
 		return erdma_create_cont_mtt(dev, size);
 
-	return ERR_PTR(-ENOTSUPP);
+	mtt = erdma_create_scatter_mtt(dev, size);
+	if (IS_ERR(mtt))
+		return mtt;
+	level = 1;
+
+	/* Converge the MTT table. */
+	while (mtt->nsg != 1 && level <= 3) {
+		tmp_mtt = erdma_create_scatter_mtt(dev, MTT_SIZE(mtt->nsg));
+		if (IS_ERR(tmp_mtt)) {
+			ret = PTR_ERR(tmp_mtt);
+			goto err_free_mtt;
+		}
+		erdma_init_middle_mtt(tmp_mtt, mtt);
+		tmp_mtt->low_level = mtt;
+		mtt = tmp_mtt;
+		level++;
+	}
+
+	if (level > 3) {
+		ret = -ENOMEM;
+		goto err_free_mtt;
+	}
+
+	mtt->level = level;
+	ibdev_dbg(&dev->ibdev, "top mtt: level:%d, dma_addr 0x%llx\n",
+		  mtt->level, mtt->sglist[0].dma_address);
+
+	return mtt;
+err_free_mtt:
+	while (mtt) {
+		tmp_mtt = mtt->low_level;
+		erdma_destroy_scatter_mtt(dev, mtt);
+		mtt = tmp_mtt;
+	}
+
+	return ERR_PTR(ret);
 }
 
 static void erdma_destroy_mtt(struct erdma_dev *dev, struct erdma_mtt *mtt)
 {
+	struct erdma_mtt *tmp_mtt;
+
 	if (mtt->continuous) {
 		dma_unmap_single(&dev->pdev->dev, mtt->buf_dma, mtt->size,
 				 DMA_TO_DEVICE);
 		kfree(mtt->buf);
 		kfree(mtt);
+	} else {
+		while (mtt) {
+			tmp_mtt = mtt->low_level;
+			erdma_destroy_scatter_mtt(dev, mtt);
+			mtt = tmp_mtt;
+		}
 	}
 }
 
 static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem,
 			   u64 start, u64 len, int access, u64 virt,
-			   unsigned long req_page_size, u8 force_indirect_mtt)
+			   unsigned long
req_page_size, bool force_continuous)
 {
 	int ret = 0;
 
@@ -612,7 +771,8 @@ static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem,
 	mem->page_offset = start & (mem->page_size - 1);
 	mem->mtt_nents = ib_umem_num_dma_blocks(mem->umem, mem->page_size);
 	mem->page_cnt = mem->mtt_nents;
-	mem->mtt = erdma_create_mtt(dev, MTT_SIZE(mem->page_cnt), true);
+	mem->mtt = erdma_create_mtt(dev, MTT_SIZE(mem->page_cnt),
+				    force_continuous);
 	if (IS_ERR(mem->mtt)) {
 		ret = PTR_ERR(mem->mtt);
 		goto error_ret;
@@ -717,7 +877,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx,
 
 	ret = get_mtt_entries(qp->dev, &qp->user_qp.sq_mem, va,
 			      qp->attrs.sq_size << SQEBB_SHIFT, 0, va,
-			      (SZ_1M - SZ_4K), 1);
+			      (SZ_1M - SZ_4K), true);
 	if (ret)
 		return ret;
 
@@ -726,7 +886,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx,
 
 	ret = get_mtt_entries(qp->dev, &qp->user_qp.rq_mem, va + rq_offset,
 			      qp->attrs.rq_size << RQE_SHIFT, 0, va + rq_offset,
-			      (SZ_1M - SZ_4K), 1);
+			      (SZ_1M - SZ_4K), true);
 	if (ret)
 		goto put_sq_mtt;
 
@@ -998,7 +1158,7 @@ struct ib_mr *erdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 		return ERR_PTR(-ENOMEM);
 
 	ret = get_mtt_entries(dev, &mr->mem, start, len, access, virt,
-			      SZ_2G - SZ_4K, 0);
+			      SZ_2G - SZ_4K, false);
 	if (ret)
 		goto err_out_free;
 
@@ -1423,7 +1583,7 @@ static int erdma_init_user_cq(struct erdma_ucontext *ctx, struct erdma_cq *cq,
 
 	ret = get_mtt_entries(dev, &cq->user_cq.qbuf_mem, ureq->qbuf_va,
 			      ureq->qbuf_len, 0, ureq->qbuf_va, SZ_64M - SZ_4K,
-			      1);
+			      true);
 	if (ret)
 		return ret;
 
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h
index 5f639f27a8a9..eb9c0f92fb6f 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.h
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.h
@@ -73,8 +73,8 @@ struct erdma_pd {
 #define ERDMA_MR_TYPE_FRMR 1
 #define ERDMA_MR_TYPE_DMA 2
 
-#define ERDMA_MR_INLINE_MTT 0
-#define ERDMA_MR_INDIRECT_MTT 1
+#define ERDMA_MR_MTT_0LEVEL 0
+#define ERDMA_MR_MTT_1LEVEL 1
 
 #define ERDMA_MR_ACC_RA BIT(0)
 #define ERDMA_MR_ACC_LR BIT(1)