From patchwork Thu May 18 08:21:46 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246285
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 1/7] RDMA/rxe: Always defer tasks on responder and
 completer to workqueue
Date: Thu, 18 May 2023 17:21:46 +0900
Message-Id: <3edbcfacf7ff4fbe8b527a372ef73d1a6db17144.1684397037.git.matsuda-daisuke@fujitsu.com>

Both the responder and the completer need to sleep to execute a page fault
when used with ODP. This can happen when they are about to access user MRs,
so their tasks must run in process context in that case.

Additionally, the current implementation seldom defers tasks to the
workqueue; instead it typically executes them in softirq context via
do_task(), called from rxe_resp_queue_pkt() and rxe_comp_queue_pkt() in
NET_RX softirq context, where a single invocation can loop up to
RXE_MAX_ITERATIONS (=1024) times. The problem is that this task execution
appears as anonymous load on the system and that the loop can throttle
other softirqs on the same CPU.

This patch makes the responder and completer always run in process context,
which is required for ODP and also resolves the problem described above.
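For readers unfamiliar with the idiom, moving packet handling out of softirq
and into process context usually follows the standard workqueue pattern
sketched below. This is only a generic illustration with hypothetical demo_*
names, not the rxe task machinery itself (rxe wraps the equivalent logic in
struct rxe_task and rxe_sched_task()):

  #include <linux/skbuff.h>
  #include <linux/workqueue.h>

  /* Hypothetical per-QP context: packets queued from softirq, drained in
   * process context where sleeping (e.g. on a page fault) is allowed.
   */
  struct demo_qp {
  	struct sk_buff_head pkts;
  	struct work_struct work;
  };

  static void demo_drain(struct work_struct *work)	/* process context */
  {
  	struct demo_qp *qp = container_of(work, struct demo_qp, work);
  	struct sk_buff *skb;

  	while ((skb = skb_dequeue(&qp->pkts)) != NULL)
  		consume_skb(skb);	/* a real driver would process it */
  }

  static void demo_init(struct demo_qp *qp)
  {
  	skb_queue_head_init(&qp->pkts);
  	INIT_WORK(&qp->work, demo_drain);
  }

  /* Callable from NET_RX softirq: enqueue and defer, never run inline. */
  static void demo_queue_pkt(struct demo_qp *qp, struct sk_buff *skb)
  {
  	skb_queue_tail(&qp->pkts, skb);
  	schedule_work(&qp->work);
  }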
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 12 +-----------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  1 -
 drivers/infiniband/sw/rxe/rxe_resp.c        | 13 +------------
 4 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index db18ace74d2b..671fdb645030 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -128,18 +128,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
-
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
-	if (must_sched)
-		rxe_sched_task(&qp->comp.task);
-	else
-		rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index a012522b577a..dc23cf3a6967 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -14,7 +14,6 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_RCV_RNR].name = "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR].name = "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR].name = "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED].name = "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED].name = "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED].name = "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY].name = "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 71f4d4fa9dc8..303da0e3134a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -18,7 +18,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 68f6cd188d8e..ba0222bfce9e 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -46,21 +46,10 @@ static char *resp_state_name[] = {
 	[RESPST_EXIT]			= "EXIT",
 };
 
-/* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
-
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	if (must_sched)
-		rxe_sched_task(&qp->resp.task);
-	else
-		rxe_run_task(&qp->resp.task);
+	rxe_sched_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,
From patchwork Thu May 18 08:21:47 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246282
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 2/7] RDMA/rxe: Make MR functions accessible from
 other rxe source code
Date: Thu, 18 May 2023 17:21:47 +0900
Message-Id: <412a29df988820ba9b7f0e8b4198ecae5b3dc79e.1684397037.git.matsuda-daisuke@fujitsu.com>

Some functions in rxe_mr.c are going to be used in rxe_odp.c, which will be
created in a subsequent patch. List the declarations of these functions in
rxe_loc.h so that they are accessible from other rxe source files.
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_loc.h | 14 ++++++++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c  | 18 ++++--------------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 804b15e929dd..00fedd1a4980 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -60,7 +60,9 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 /* rxe_mr.c */
 u8 rxe_get_next_key(u32 last_key);
+void rxe_mr_init(int access, struct rxe_mr *mr);
 void rxe_mr_init_dma(int access, struct rxe_mr *mr);
+int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
 int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
@@ -71,6 +73,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_dir dir);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val);
 int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
@@ -82,6 +86,16 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 void rxe_mr_cleanup(struct rxe_pool_elem *elem);
 
+static inline unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
+{
+	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
+}
+
+static inline unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
+{
+	return iova & (mr_page_size(mr) - 1);
+}
+
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int rxe_dealloc_mw(struct ib_mw *ibmw);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 0e538fafcc20..ffbac6f5e828 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -49,7 +49,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 				| IB_ACCESS_REMOTE_WRITE	\
 				| IB_ACCESS_REMOTE_ATOMIC)
 
-static void rxe_mr_init(int access, struct rxe_mr *mr)
+void rxe_mr_init(int access, struct rxe_mr *mr)
 {
 	u32 lkey = mr->elem.index << 8 | rxe_get_next_key(-1);
	u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0;
@@ -77,16 +77,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->ibmr.type = IB_MR_TYPE_DMA;
 }
 
-static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
-{
-	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
-}
-
-static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
-{
-	return iova & (mr_page_size(mr) - 1);
-}
-
 static bool is_pmem_page(struct page *pg)
 {
 	unsigned long paddr = page_to_phys(pg);
@@ -96,7 +86,7 @@ static bool is_pmem_page(struct page *pg)
 			      IORES_DESC_PERSISTENT_MEMORY);
 }
 
-static int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt)
+int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt)
 {
 	XA_STATE(xas, &mr->page_list, 0);
 	struct sg_page_iter sg_iter;
@@ -247,8 +237,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
 }
 
-static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
-			      unsigned int length, enum rxe_mr_copy_dir dir)
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 	unsigned long index = rxe_mr_iova_to_index(mr, iova);
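As a concrete check of the two iova helpers made shareable above, the
following worked example uses hypothetical numbers (it is arithmetic only,
not code from the patch):

  /*
   * Assume mr->page_shift = 12 (4 KiB pages) and mr->ibmr.iova = 0x10000.
   * For an access at iova = 0x13050:
   *
   *   rxe_mr_iova_to_index(mr, 0x13050)
   *       = (0x13050 >> 12) - (0x10000 >> 12) = 0x13 - 0x10 = 3
   *
   *   rxe_mr_iova_to_page_offset(mr, 0x13050)
   *       = 0x13050 & (0x1000 - 1) = 0x50
   *
   * i.e. the MR xarray is indexed by page number relative to the MR base,
   * and the copy routines apply the in-page offset separately.
   */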
From patchwork Thu May 18 08:21:48 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246281
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 3/7] RDMA/rxe: Move resp_states definition to
 rxe_verbs.h
Date: Thu, 18 May 2023 17:21:48 +0900

To use the resp_states values in rxe_loc.h, move the enum definition to
rxe_verbs.h, where the other internal states of this driver are defined.

Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c       | 37 ---------------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h | 37 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d33dd6cf83d3..9b4d044a1264 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -100,43 +100,6 @@
 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \
 		"mw#%d %s: " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__)
 
-/* responder states */
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
 
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 26a20f088692..b6fbd9b3d086 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -127,6 +127,43 @@ struct rxe_comp_info {
 	struct rxe_task		task;
 };
 
+/* responder states */
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum rdatm_res_state {
 	rdatm_res_state_next,
 	rdatm_res_state_new,
From patchwork Thu May 18 08:21:49 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246283
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 4/7] RDMA/rxe: Add page invalidation support
Date: Thu, 18 May 2023 17:21:49 +0900
Message-Id: <6e9d5e2d1b2a63385d1e4fc2a74c3b11d4909842.1684397037.git.matsuda-daisuke@fujitsu.com>

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). It also
sets the corresponding entries in the MR xarray to NULL to prevent any
further access. The callback is registered when an ODP-enabled MR is
created.
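For context, the interval-notifier registration that makes such a callback
fire looks roughly like the generic sketch below. In this series the
registration happens inside ib_umem_odp_get() when the ODP MR is created
(patch 5), so the sketch and its demo_* names are illustrative only:

  #include <linux/mmu_notifier.h>
  #include <linux/sched.h>

  /* Hypothetical consumer tracking one VA range of a process. */
  struct demo_range {
  	struct mmu_interval_notifier mni;
  };

  static bool demo_invalidate(struct mmu_interval_notifier *mni,
  			    const struct mmu_notifier_range *range,
  			    unsigned long cur_seq)
  {
  	if (!mmu_notifier_range_blockable(range))
  		return false;
  	mmu_interval_set_seq(mni, cur_seq);
  	/* tear down device mappings of [range->start, range->end) here */
  	return true;
  }

  static const struct mmu_interval_notifier_ops demo_ops = {
  	.invalidate = demo_invalidate,
  };

  /* Watch [start, start + length) of the current process address space. */
  static int demo_track(struct demo_range *r, unsigned long start,
  		      unsigned long length)
  {
  	return mmu_interval_notifier_insert(&r->mni, current->mm, start,
  					    length, &demo_ops);
  }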
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/Makefile  |  2 ++
 drivers/infiniband/sw/rxe/rxe_odp.c | 56 +++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..b69b25e0fef6
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include <linux/hmm.h>
+
+#include <rdma/ib_umem_odp.h>
+
+#include "rxe.h"
+
+static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
+				unsigned long end)
+{
+	unsigned long lower, upper, idx;
+
+	lower = rxe_mr_iova_to_index(mr, start);
+	upper = rxe_mr_iova_to_index(mr, end);
+
+	/* make elements in xarray NULL */
+	spin_lock(&mr->page_list.xa_lock);
+	for (idx = lower; idx <= upper; idx++)
+		__xa_erase(&mr->page_list, idx);
+	spin_unlock(&mr->page_list.xa_lock);
+}
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	struct rxe_mr *mr = umem_odp->private;
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	rxe_mr_unset_xarray(mr, start, end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};
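The reader side of the same contract samples the notifier sequence before
using faulted pages and retries if an invalidation ran in between. The rxe
patches get this behaviour from ib_umem_odp_map_dma_and_lock() later in the
series; a generic sketch of the underlying pattern, reusing the hypothetical
demo_range from the sketch above:

  #include <linux/mutex.h>

  static int demo_fault_and_map(struct demo_range *r, struct mutex *lock)
  {
  	unsigned long seq;

  	do {
  		seq = mmu_interval_read_begin(&r->mni);

  		/* fault the pages here, e.g. with hmm_range_fault(),
  		 * without holding 'lock' (the invalidate callback takes
  		 * the same lock, so holding it here would deadlock)
  		 */

  		mutex_lock(lock);
  		if (!mmu_interval_read_retry(&r->mni, seq))
  			break;		/* still valid: commit under lock */
  		mutex_unlock(lock);	/* invalidated meanwhile: redo */
  	} while (true);

  	/* ... record the mappings ... */
  	mutex_unlock(lock);
  	return 0;
  }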
From patchwork Thu May 18 08:21:50 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246284
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 5/7] RDMA/rxe: Allow registering MRs for
 On-Demand Paging
Date: Thu, 18 May 2023 17:21:50 +0900
Message-Id: <7d8595c23e954e0fdc19b14e95da13ceef2adafd.1684397037.git.matsuda-daisuke@fujitsu.com>

Allow applications to register an ODP-enabled MR, in which case the flag
IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA
operations are supported on such MRs yet; they will be enabled in the
subsequent two patches.

rxe_odp_do_pagefault() is called to initialize an ODP-enabled MR. It syncs
the process address space from the CPU page table to the driver page table
(dma_list/pfn_list in umem_odp) when called with the RXE_PAGEFAULT_SNAPSHOT
flag. Additionally, it can be used to trigger a page fault when pages being
accessed are not present or do not have proper read/write permissions, and
possibly to prefetch pages in the future.
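From user space this surfaces as the standard libibverbs ODP flow: query the
device's ODP capabilities, then pass IBV_ACCESS_ON_DEMAND to ibv_reg_mr().
A minimal sketch against the stock verbs API (error handling trimmed;
nothing here is rxe-specific):

  #include <stdio.h>
  #include <infiniband/verbs.h>

  static struct ibv_mr *reg_odp_mr(struct ibv_context *ctx,
  				 struct ibv_pd *pd, void *buf, size_t len)
  {
  	struct ibv_device_attr_ex attr;

  	if (ibv_query_device_ex(ctx, NULL, &attr) ||
  	    !(attr.odp_caps.general_caps & IBV_ODP_SUPPORT)) {
  		fprintf(stderr, "device does not support ODP\n");
  		return NULL;
  	}

  	/* No pages are pinned here; they are faulted in on first access. */
  	return ibv_reg_mr(pd, buf, len,
  			  IBV_ACCESS_LOCAL_WRITE |
  			  IBV_ACCESS_REMOTE_READ |
  			  IBV_ACCESS_REMOTE_WRITE |
  			  IBV_ACCESS_ON_DEMAND);
  }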
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c       |   7 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  14 +++
 drivers/infiniband/sw/rxe/rxe_mr.c    |   9 +-
 drivers/infiniband/sw/rxe/rxe_odp.c   | 120 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  |  15 +++-
 drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   2 +
 7 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..f2284d27229b 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 			rxe->ndev->dev_addr);
 
 	rxe->max_ucontext		= RXE_MAX_UCONTEXT;
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
+		rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
+
+		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
+		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+	}
 }
 
 /* initialize port attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 00fedd1a4980..93247d123642 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -202,4 +202,18 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 	return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
 }
 
+/* rxe_odp.c */
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr);
+#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+static inline int
+rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
+		     int access_flags, struct rxe_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index ffbac6f5e828..cd368cd096c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -318,7 +318,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 			return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+	if (mr->odp_enabled)
+		return -EOPNOTSUPP;
+	else
+		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -527,6 +530,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
 	struct page *page;
 	u64 *va;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->odp_enabled)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* See IBA oA19-28 */
 	if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
 		rxe_dbg_mr(mr, "mr not in valid state");
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index b69b25e0fef6..e5497d09c399 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -24,6 +24,24 @@ static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
 	spin_unlock(&mr->page_list.xa_lock);
 }
 
+static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start,
+			      unsigned long end, unsigned long *pfn_list)
+{
+	unsigned long lower, upper, idx;
+	struct page *page;
+
+	lower = rxe_mr_iova_to_index(mr, start);
+	upper = rxe_mr_iova_to_index(mr, end);
+
+	/* make pages visible in xarray. no sleep while taking the lock */
+	spin_lock(&mr->page_list.xa_lock);
+	for (idx = lower; idx <= upper; idx++) {
+		page = hmm_pfn_to_page(pfn_list[idx]);
+		__xa_store(&mr->page_list, idx, page, GFP_ATOMIC);
+	}
+	spin_unlock(&mr->page_list.xa_lock);
+}
+
 static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 				    const struct mmu_notifier_range *range,
 				    unsigned long cur_seq)
@@ -54,3 +72,105 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 const struct mmu_interval_notifier_ops rxe_mn_ops = {
 	.invalidate = rxe_ib_invalidate_range,
 };
+
+#define RXE_PAGEFAULT_RDONLY BIT(1)
+#define RXE_PAGEFAULT_SNAPSHOT BIT(2)
+static int rxe_odp_do_pagefault(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags)
+{
+	int np;
+	u64 access_mask;
+	bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT);
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+
+	access_mask = ODP_READ_ALLOWED_BIT;
+	if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY))
+		access_mask |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success.
+	 * Callers must release the lock later to let invalidation handler
+	 * do its work again.
+	 */
+	np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt,
+					  access_mask, fault);
+	if (np < 0)
+		return np;
+
+	/* umem_mutex is still locked here, so we can use hmm_pfn_to_page()
+	 * safely to fetch pages in the range.
+	 */
+	rxe_mr_set_xarray(mr, user_va, user_va + bcnt, umem_odp->pfn_list);
+
+	return np;
+}
+
+static int rxe_odp_init_pages(struct rxe_mr *mr)
+{
+	int ret;
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+
+	ret = rxe_odp_do_pagefault(mr, mr->umem->address, mr->umem->length,
+				   RXE_PAGEFAULT_SNAPSHOT);
+
+	if (ret >= 0)
+		mutex_unlock(&umem_odp->umem_mutex);
+
+	return ret >= 0 ? 0 : ret;
+}
+
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr)
+{
+	int err;
+	struct ib_umem_odp *umem_odp;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return -EOPNOTSUPP;
+
+	rxe_mr_init(access_flags, mr);
+
+	xa_init(&mr->page_list);
+
+	if (!start && length == U64_MAX) {
+		if (iova != 0)
+			return -EINVAL;
+		if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
+			return -EINVAL;
+
+		/* Never reach here, for implicit ODP is not implemented. */
*/ + } + + umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags, + &rxe_mn_ops); + if (IS_ERR(umem_odp)) { + rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n", + (int)PTR_ERR(umem_odp)); + return PTR_ERR(umem_odp); + } + + umem_odp->private = mr; + + mr->odp_enabled = true; + mr->umem = &umem_odp->umem; + mr->access = access_flags; + mr->ibmr.length = length; + mr->ibmr.iova = iova; + mr->page_offset = ib_umem_offset(&umem_odp->umem); + + err = rxe_odp_init_pages(mr); + if (err) { + ib_umem_odp_release(umem_odp); + return err; + } + + err = rxe_mr_fill_pages_from_sgt(mr, &umem_odp->umem.sgt_append.sgt); + if (err) { + ib_umem_odp_release(umem_odp); + return err; + } + + mr->state = RXE_MR_STATE_VALID; + mr->ibmr.type = IB_MR_TYPE_USER; + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index ba0222bfce9e..90c31c4f2944 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -626,6 +626,10 @@ static enum resp_states process_flush(struct rxe_qp *qp, struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; + /* ODP is not supported right now. WIP. */ + if (mr->odp_enabled) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* oA19-14, oA19-15 */ if (res && res->replay) return RESPST_ACKNOWLEDGE; @@ -679,10 +683,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, if (!res->replay) { u64 iova = qp->resp.va + qp->resp.offset; - err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, - atmeth_comp(pkt), - atmeth_swap_add(pkt), - &res->atomic.orig_val); + if (mr->odp_enabled) + err = RESPST_ERR_UNSUPPORTED_OPCODE; + else + err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); if (err) return err; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index dea605b7f683..9c23defdc7b5 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1274,7 +1274,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start, mr->ibmr.pd = ibpd; mr->ibmr.device = ibpd->device; - err = rxe_mr_init_user(rxe, start, length, iova, access, mr); + if (access & IB_ACCESS_ON_DEMAND) + err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr); + else + err = rxe_mr_init_user(rxe, start, length, iova, access, mr); if (err) { rxe_dbg_mr(mr, "reg_user_mr failed, err = %d", err); goto err_cleanup; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index b6fbd9b3d086..de5a982c7c7e 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -333,6 +333,8 @@ struct rxe_mr { u32 nbuf; struct xarray page_list; + + bool odp_enabled; }; static inline unsigned int mr_page_size(struct rxe_mr *mr) From patchwork Thu May 18 08:21:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 13246286 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0D5BC7EE22 for ; Thu, 18 May 2023 08:22:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230179AbjERIWv (ORCPT ); Thu, 18 May 2023 04:22:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52690 
"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230124AbjERIWp (ORCPT ); Thu, 18 May 2023 04:22:45 -0400 Received: from esa5.hc1455-7.c3s2.iphmx.com (esa5.hc1455-7.c3s2.iphmx.com [68.232.139.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AFA1DA for ; Thu, 18 May 2023 01:22:35 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10713"; a="116583933" X-IronPort-AV: E=Sophos;i="5.99,284,1677510000"; d="scan'208";a="116583933" Received: from unknown (HELO yto-r3.gw.nic.fujitsu.com) ([218.44.52.219]) by esa5.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 17:22:33 +0900 Received: from yto-m3.gw.nic.fujitsu.com (yto-nat-yto-m3.gw.nic.fujitsu.com [192.168.83.66]) by yto-r3.gw.nic.fujitsu.com (Postfix) with ESMTP id D6B0FD5041; Thu, 18 May 2023 17:22:30 +0900 (JST) Received: from m3004.s.css.fujitsu.com (m3004.s.css.fujitsu.com [10.128.233.124]) by yto-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id 31585D21D5; Thu, 18 May 2023 17:22:30 +0900 (JST) Received: from localhost.localdomain (unknown [10.118.237.106]) by m3004.s.css.fujitsu.com (Postfix) with ESMTP id F0FBF200A4E1; Thu, 18 May 2023 17:22:29 +0900 (JST) From: Daisuke Matsuda To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda Subject: [PATCH for-next v5 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP Date: Thu, 18 May 2023 17:21:51 +0900 Message-Id: <25d903e0136ea1e65c612d8f6b8c18c1f010add7.1684397037.git.matsuda-daisuke@fujitsu.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org rxe_mr_copy() is used widely to copy data to/from a user MR. requester uses it to load payloads of requesting packets; responder uses it to process Send, Write, and Read operaetions; completer uses it to copy data from response packets of Read and Atomic operations to a user MR. Allow these operations to be used with ODP by adding a subordinate function rxe_odp_mr_copy(). It is comprised of the following steps: 1. Check the driver page table(umem_odp->dma_list) to see if pages being accessed are present with appropriate permission. 2. If necessary, trigger page fault to map the pages. 3. Update the MR xarray using PFNs in umem_odp->pfn_list. 4. Execute data copy to/from the pages. umem_mutex is used to ensure that dma_list (an array of addresses of an MR) is not changed while it is being checked and that mapped pages are not invalidated before data copy completes. Signed-off-by: Daisuke Matsuda Reviewed-by: Bob Pearson Reviewed-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe.c | 10 +++ drivers/infiniband/sw/rxe/rxe_loc.h | 8 ++ drivers/infiniband/sw/rxe/rxe_mr.c | 2 +- drivers/infiniband/sw/rxe/rxe_odp.c | 109 ++++++++++++++++++++++++++++ 4 files changed, 128 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index f2284d27229b..207a022156f0 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe) /* IB_ODP_SUPPORT_IMPLICIT is not supported right now. 
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c     |  10 +++
 drivers/infiniband/sw/rxe/rxe_loc.h |   8 ++
 drivers/infiniband/sw/rxe/rxe_mr.c  |   2 +-
 drivers/infiniband/sw/rxe/rxe_odp.c | 109 ++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index f2284d27229b..207a022156f0 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 93247d123642..4b95c8c46bdc 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -206,6 +206,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -213,6 +215,12 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+		int length, enum rxe_mr_copy_dir dir)
+{
+	return -EOPNOTSUPP;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index cd368cd096c8..0e3cda59d702 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -319,7 +319,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 	}
 
 	if (mr->odp_enabled)
-		return -EOPNOTSUPP;
+		return rxe_odp_mr_copy(mr, iova, addr, length, dir);
 	else
 		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index e5497d09c399..cbe5d0c3fcc4 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -174,3 +174,112 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 
 	return err;
 }
+
+static inline bool rxe_is_pagefault_necessary(struct ib_umem_odp *umem_odp,
+					      u64 iova, int length, u32 perm)
+{
+	int idx;
+	u64 addr;
+	bool need_fault = false;
+
+	addr = iova & (~(BIT(umem_odp->page_shift) - 1));
+
+	/* Skim through all pages that are to be accessed. */
+	while (addr < iova + length) {
+		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+
+		if (!(umem_odp->dma_list[idx] & perm)) {
+			need_fault = true;
+			break;
+		}
+
+		addr += BIT(umem_odp->page_shift);
+	}
+	return need_fault;
+}
+
+/* umem mutex must be locked before entering this function. */
+static int rxe_odp_map_range(struct rxe_mr *mr, u64 iova, int length, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	const int max_tries = 3;
+	int cnt = 0;
+
+	int err;
+	u64 perm;
+	bool need_fault;
+
+	if (unlikely(length < 1)) {
+		mutex_unlock(&umem_odp->umem_mutex);
+		return -EINVAL;
+	}
+
+	perm = ODP_READ_ALLOWED_BIT;
+	if (!(flags & RXE_PAGEFAULT_RDONLY))
+		perm |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * A successful return from rxe_odp_do_pagefault() does not guarantee
+	 * that all pages in the range became present. Recheck the DMA address
+	 * array, allowing max 3 tries for pagefault.
+	 */
+	while ((need_fault = rxe_is_pagefault_necessary(umem_odp,
+							iova, length, perm))) {
+		if (cnt >= max_tries)
+			break;
+
+		mutex_unlock(&umem_odp->umem_mutex);
+
+		/* umem_mutex is locked on success. */
+		err = rxe_odp_do_pagefault(mr, iova, length, flags);
+		if (err < 0)
+			return err;
+
+		cnt++;
+	}
+
+	if (need_fault)
+		return -EFAULT;
+
+	return 0;
+}
+
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u32 flags = 0;
+	int err;
+
+	if (unlikely(!mr->odp_enabled))
+		return -EOPNOTSUPP;
+
+	switch (dir) {
+	case RXE_TO_MR_OBJ:
+		break;
+
+	case RXE_FROM_MR_OBJ:
+		flags = RXE_PAGEFAULT_RDONLY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	/* If pagefault is not required, umem mutex will be held until data
+	 * copy to the MR completes. Otherwise, it is released and locked
+	 * again in rxe_odp_map_range() to let invalidation handler do its
+	 * work meanwhile.
+	 */
+	mutex_lock(&umem_odp->umem_mutex);
+
+	err = rxe_odp_map_range(mr, iova, length, flags);
+	if (err)
+		return err;
+
+	err = rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+
+	return err;
+}
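With this patch applied, user space can verify which RC operations are
ODP-capable before posting them against an on-demand region. A small sketch
using the standard libibverbs capability bits (not rxe-specific):

  #include <stdbool.h>
  #include <infiniband/verbs.h>

  /* True if RC RDMA WRITE and READ may target an ODP-registered MR. */
  static bool rc_odp_rdma_ok(struct ibv_context *ctx)
  {
  	struct ibv_device_attr_ex attr;
  	uint32_t need = IBV_ODP_SUPPORT_WRITE | IBV_ODP_SUPPORT_READ;

  	if (ibv_query_device_ex(ctx, NULL, &attr))
  		return false;

  	return (attr.odp_caps.general_caps & IBV_ODP_SUPPORT) &&
  	       (attr.odp_caps.per_transport_caps.rc_odp_caps & need) == need;
  }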
From patchwork Thu May 18 08:21:52 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13246287
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
    zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com,
    lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v5 7/7] RDMA/rxe: Add support for the traditional
 Atomic operations with ODP
Date: Thu, 18 May 2023 17:21:52 +0900
Message-Id: <2841b1a86987564f14f15ec5b59f6a8bead86b30.1684397037.git.matsuda-daisuke@fujitsu.com>

Enable 'fetch and add' and 'compare and swap' operations to manipulate data
in an ODP-enabled MR. This is comprised of the following steps:

 1. Check the driver page table (umem_odp->dma_list) to see if the target
    page is both readable and writable.
 2. If not, trigger a page fault to map the page.
 3. Update the entry in the MR xarray.
 4. Execute the operation.

umem_mutex is used to ensure that dma_list (an array of addresses of an MR)
is not changed while it is being checked and that the target page is not
invalidated before the data access completes.
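On the wire these are the usual RC atomics, so from user space an ODP-backed
fetch-and-add is posted exactly like a pinned one. A minimal sketch with the
standard libibverbs work request (names and error handling illustrative):

  #include <string.h>
  #include <infiniband/verbs.h>

  /* Post an 8-byte fetch-and-add; sge must describe 8 bytes of local
   * memory to receive the original value, and remote_addr must be
   * 8-byte aligned within the peer's (possibly ODP-backed) MR.
   */
  static int post_fetch_add(struct ibv_qp *qp, struct ibv_sge *sge,
  			  uint64_t remote_addr, uint32_t rkey, uint64_t add)
  {
  	struct ibv_send_wr wr, *bad_wr;

  	memset(&wr, 0, sizeof(wr));
  	wr.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD;
  	wr.sg_list = sge;
  	wr.num_sge = 1;
  	wr.send_flags = IBV_SEND_SIGNALED;
  	wr.wr.atomic.remote_addr = remote_addr;
  	wr.wr.atomic.rkey = rkey;
  	wr.wr.atomic.compare_add = add;	/* the value to add */

  	return ibv_post_send(qp, &wr, &bad_wr);
  }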
Signed-off-by: Daisuke Matsuda
Reviewed-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  |  9 +++++++++
 drivers/infiniband/sw/rxe/rxe_odp.c  | 26 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  5 ++++-
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 207a022156f0..abd3267c2873 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4b95c8c46bdc..b9d2985774ee 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -208,6 +208,9 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val);
+
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -221,6 +224,12 @@ rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		     u64 compare, u64 swap_add, u64 *orig_val)
+{
+	return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index cbe5d0c3fcc4..194b1fab98b7 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -283,3 +283,29 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 
 	return err;
 }
+
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val)
+{
+	int err;
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+
+	/* If pagefault is not required, umem mutex will be held until the
+	 * atomic operation completes. Otherwise, it is released and locked
+	 * again in rxe_odp_map_range() to let invalidation handler do its
+	 * work meanwhile.
+	 */
+	mutex_lock(&umem_odp->umem_mutex);
+
+	/* Atomic operations manipulate a single char. */
+	err = rxe_odp_map_range(mr, iova, sizeof(char), 0);
+	if (err)
+		return err;
+
+	err = rxe_mr_do_atomic_op(mr, iova, opcode, compare,
+				  swap_add, orig_val);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 90c31c4f2944..0a918145dc07 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -684,7 +684,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 		u64 iova = qp->resp.va + qp->resp.offset;
 
 		if (mr->odp_enabled)
-			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+			err = rxe_odp_mr_atomic_op(mr, iova, pkt->opcode,
+						   atmeth_comp(pkt),
+						   atmeth_swap_add(pkt),
+						   &res->atomic.orig_val);
 		else
 			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
 						  atmeth_comp(pkt),