From patchwork Thu Nov 9 05:44:46 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450620
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue
Date: Thu, 9 Nov 2023 14:44:46 +0900

Both responder and completer need to sleep to execute page-fault handling
when used with ODP. This can happen when they are about to access user
MRs, so tasks must be executed in process context for such cases.

Additionally, the current implementation seldom defers tasks to the
workqueue, and instead runs them in a softirq context via do_task(). It is
called from rxe_resp_queue_pkt() and rxe_comp_queue_pkt() in
SOFTIRQ_NET_RX context and can loop for up to RXE_MAX_ITERATIONS (=1024)
iterations. The problem is that the task execution appears as anonymous
load on the system and that the loop can throttle other softirqs on the
same CPU.

This patch makes the responder and completer code always run in process
context, which satisfies the ODP requirement and avoids the problem
described above.
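For reference, the softirq-to-workqueue deferral this patch standardizes
on follows the usual kernel pattern. A minimal sketch (the structure and
function names below are illustrative, not the actual rxe ones; the work
item is assumed to have been set up with INIT_WORK()):

#include <linux/workqueue.h>
#include <linux/skbuff.h>

struct pkt_queue {
	struct work_struct work;	/* INIT_WORK(&q->work, pkt_work_handler) */
	struct sk_buff_head pkts;
};

/* Runs in process context, so it may sleep, e.g. on an ODP page fault. */
static void pkt_work_handler(struct work_struct *work)
{
	struct pkt_queue *q = container_of(work, struct pkt_queue, work);
	struct sk_buff *skb;

	while ((skb = skb_dequeue(&q->pkts)))
		consume_skb(skb);	/* stand-in for the real packet processing */
}

/* Called from softirq (NET_RX): never process inline, always defer. */
static void pkt_enqueue(struct pkt_queue *q, struct sk_buff *skb)
{
	skb_queue_tail(&q->pkts, skb);
	schedule_work(&q->work);	/* pkt_work_handler() runs later */
}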
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 12 +-----------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  1 -
 drivers/infiniband/sw/rxe/rxe_resp.c        | 13 +------------
 4 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d0bdc2d8adc8..bb016a43330d 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -129,18 +129,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
-
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
-	if (must_sched)
-		rxe_sched_task(&qp->comp.task);
-	else
-		rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index a012522b577a..dc23cf3a6967 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -14,7 +14,6 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_RCV_RNR].name = "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR].name = "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR].name = "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED].name = "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED].name = "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED].name = "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY].name = "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 71f4d4fa9dc8..303da0e3134a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -18,7 +18,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index da470a925efc..969e057bbfd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -46,21 +46,10 @@ static char *resp_state_name[] = {
 	[RESPST_EXIT]				= "EXIT",
 };
 
-/* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
-
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	if (must_sched)
-		rxe_sched_task(&qp->resp.task);
-	else
-		rxe_run_task(&qp->resp.task);
+	rxe_sched_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,

From patchwork Thu Nov 9 05:44:47 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450634
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code
Date: Thu, 9 Nov 2023 14:44:47 +0900
Message-Id: <945befecbcb4d7c0a3a02d03cfa444aafc7c27b8.1699503619.git.matsuda-daisuke@fujitsu.com>

Some functions in rxe_mr.c are going to be used in rxe_odp.c, which will
be created in the subsequent patch. Add the declarations of these
functions to rxe_loc.h.
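One of the helpers made shared here, rxe_mr_iova_to_index(), maps an iova
to the xarray index of the MR page that holds it. A worked example,
assuming 4 KiB pages (page_shift = 12) and a hypothetical MR base iova:

/* Hypothetical MR: ibmr.iova = 0x7f0000201000, page_shift = 12 */
u64 iova = 0x7f0000203008;	/* an address inside the MR */

/* (0x7f0000203008 >> 12) - (0x7f0000201000 >> 12)
 *	= 0x7f0000203 - 0x7f0000201 = 2,
 * so the page holding this iova is xarray entry 2 (the third page).
 */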
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 ++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c  | 11 +++--------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..eb867f7d0d36 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -58,6 +58,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 /* rxe_mr.c */
 u8 rxe_get_next_key(u32 last_key);
+void rxe_mr_init(int access, struct rxe_mr *mr);
 void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
@@ -69,6 +70,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_dir dir);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val);
 int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
@@ -80,6 +83,11 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 void rxe_mr_cleanup(struct rxe_pool_elem *elem);
 
+static inline unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
+{
+	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
+}
+
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int rxe_dealloc_mw(struct ib_mw *ibmw);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..86b1908d304b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -45,7 +45,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 	}
 }
 
-static void rxe_mr_init(int access, struct rxe_mr *mr)
+void rxe_mr_init(int access, struct rxe_mr *mr)
 {
 	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
 
@@ -72,11 +72,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->ibmr.type = IB_MR_TYPE_DMA;
 }
 
-static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
-{
-	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
-}
-
 static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
 {
 	return iova & (mr_page_size(mr) - 1);
@@ -242,8 +237,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
 }
 
-static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
-			      unsigned int length, enum rxe_mr_copy_dir dir)
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 	unsigned long index = rxe_mr_iova_to_index(mr, iova);

From patchwork Thu Nov 9 05:44:48 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450623
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h
Date: Thu, 9 Nov 2023 14:44:48 +0900
Message-Id: <66c8c10009b4bf9e9d30ef5351976f519ee969c7.1699503619.git.matsuda-daisuke@fujitsu.com>

To use the resp_states values in rxe_loc.h, it is necessary to move their
definition to rxe_verbs.h, where the other internal states of this driver
are defined.

Reviewed-by: Bob Pearson
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.h       | 37 ---------------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h | 37 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d33dd6cf83d3..9b4d044a1264 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -100,43 +100,6 @@
 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \
 		"mw#%d %s: " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__)
 
-/* responder states */
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
 
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..1058b5de8920 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -127,6 +127,43 @@ struct rxe_comp_info {
 	struct rxe_task		task;
 };
 
+/* responder states */
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum rdatm_res_state {
 	rdatm_res_state_next,
 	rdatm_res_state_new,

From patchwork Thu Nov 9 05:44:49 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450621
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support
Date: Thu, 9 Nov 2023 14:44:49 +0900
Message-Id: <94f8aa4a634936295ae5772d6df6978c3b5de268.1699503619.git.matsuda-daisuke@fujitsu.com>

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). It also
sets the corresponding entries in the MR xarray to NULL to prevent any
access. The callback is registered when an ODP-enabled MR is created.
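For context, the callback added below follows the standard
mmu_interval_notifier contract. A minimal sketch of that contract with an
illustrative driver structure (not the rxe one):

#include <linux/mmu_notifier.h>

struct my_mr {
	struct mmu_interval_notifier mni;
	/* driver-private page bookkeeping lives here */
};

static bool my_invalidate(struct mmu_interval_notifier *mni,
			  const struct mmu_notifier_range *range,
			  unsigned long cur_seq)
{
	/* If the caller cannot sleep and we need locks, ask for a retry. */
	if (!mmu_notifier_range_blockable(range))
		return false;

	/* Bump the sequence so racing mappers know to retry... */
	mmu_interval_set_seq(mni, cur_seq);

	/* ...then drop any driver mapping of [range->start, range->end). */
	return true;
}

static const struct mmu_interval_notifier_ops my_ops = {
	.invalidate = my_invalidate,
};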
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/Makefile  |  2 +
 drivers/infiniband/sw/rxe/rxe_odp.c | 57 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..ea55b79be0c6
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include <linux/hmm.h>
+
+#include <rdma/ib_umem_odp.h>
+
+#include "rxe.h"
+
+static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
+				unsigned long end)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	void *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	/* make elements in xarray NULL */
+	xas_lock(&xas);
+	xas_for_each(&xas, entry, upper)
+		xas_store(&xas, NULL);
+	xas_unlock(&xas);
+}
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	struct rxe_mr *mr = umem_odp->private;
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	rxe_mr_unset_xarray(mr, start, end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};

From patchwork Thu Nov 9 05:44:50 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450636
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
Date: Thu, 9 Nov 2023 14:44:50 +0900
Message-Id: <5d46bd682aa8e3d5cabc38ca1cd67d2976f2731d.1699503619.git.matsuda-daisuke@fujitsu.com>

Allow userspace to register an ODP-enabled MR, in which case the flag
IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA
operations are enabled yet; they will be supported by the subsequent two
patches.

rxe_odp_do_pagefault_and_lock() is called to initialize an ODP-enabled MR.
It syncs the process address space from the CPU page table to the driver
page table (dma_list/pfn_list in umem_odp) when called with the
RXE_PAGEFAULT_SNAPSHOT flag. Additionally, it can be used to trigger a
page fault when pages being accessed are not present or do not have proper
read/write permissions, and possibly to prefetch pages in the future.
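From userspace, such an MR is requested through the ordinary verbs path. A
minimal libibverbs sketch (assuming the device reports ODP support, which
can be checked beforehand with ibv_query_device_ex()):

#include <stdlib.h>
#include <infiniband/verbs.h>

int register_odp_mr(struct ibv_pd *pd, size_t len, struct ibv_mr **mr_out)
{
	void *buf = malloc(len);	/* no pinning: pages fault in on demand */

	if (!buf)
		return -1;

	*mr_out = ibv_reg_mr(pd, buf, len,
			     IBV_ACCESS_LOCAL_WRITE |
			     IBV_ACCESS_REMOTE_READ |
			     IBV_ACCESS_REMOTE_WRITE |
			     IBV_ACCESS_ON_DEMAND);	/* maps to IB_ACCESS_ON_DEMAND */
	return *mr_out ? 0 : -1;
}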
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c       |   7 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  14 +++
 drivers/infiniband/sw/rxe/rxe_mr.c    |   9 +-
 drivers/infiniband/sw/rxe/rxe_odp.c   | 122 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  |  15 +++-
 drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
 6 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..f2284d27229b 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 			rxe->ndev->dev_addr);
 
 	rxe->max_ucontext = RXE_MAX_UCONTEXT;
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
+		rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
+
+		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
+		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+	}
 }
 
 /* initialize port attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index eb867f7d0d36..4bda154a0248 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -188,4 +188,18 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 	return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
 }
 
+/* rxe_odp.c */
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr);
+#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+static inline int
+rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
+		     int access_flags, struct rxe_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 86b1908d304b..384cb4ba1f2d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -318,7 +318,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 			return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+	if (mr->umem->is_odp)
+		return -EOPNOTSUPP;
+	else
+		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -527,6 +530,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
 	struct page *page;
 	u64 *va;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* See IBA oA19-28 */
 	if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
 		rxe_dbg_mr(mr, "mr not in valid state");
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index ea55b79be0c6..c5e24901c141 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -9,6 +9,8 @@
 
 #include "rxe.h"
 
+#define RXE_ODP_WRITABLE_BIT	1UL
+
 static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
 				unsigned long end)
 {
@@ -25,6 +27,29 @@ static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
 	xas_unlock(&xas);
 }
 
+static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start,
+			      unsigned long end, unsigned long *pfn_list)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	void *page, *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	xas_lock(&xas);
+	while (xas.xa_index <= upper) {
+		if (pfn_list[xas.xa_index] & HMM_PFN_WRITE) {
+			page = xa_tag_pointer(hmm_pfn_to_page(pfn_list[xas.xa_index]),
+					      RXE_ODP_WRITABLE_BIT);
+		} else
+			page = hmm_pfn_to_page(pfn_list[xas.xa_index]);
+
+		xas_store(&xas, page);
+		entry = xas_next(&xas);
+	}
+	xas_unlock(&xas);
+}
+
 static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 				    const struct mmu_notifier_range *range,
 				    unsigned long cur_seq)
@@ -55,3 +80,100 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 const struct mmu_interval_notifier_ops rxe_mn_ops = {
 	.invalidate = rxe_ib_invalidate_range,
 };
+
+#define RXE_PAGEFAULT_RDONLY BIT(1)
+#define RXE_PAGEFAULT_SNAPSHOT BIT(2)
+static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT);
+	u64 access_mask;
+	int np;
+
+	access_mask = ODP_READ_ALLOWED_BIT;
+	if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY))
+		access_mask |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success.
+	 * Callers must release the lock later to let invalidation handler
+	 * do its work again.
+	 */
+	np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt,
+					  access_mask, fault);
+	if (np < 0)
+		return np;
+
+	/*
+	 * umem_mutex is still locked here, so we can use hmm_pfn_to_page()
+	 * safely to fetch pages in the range.
+	 */
+	rxe_mr_set_xarray(mr, user_va, user_va + bcnt, umem_odp->pfn_list);
+
+	return np;
+}
+
+static int rxe_odp_init_pages(struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int ret;
+
+	ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address,
+					    mr->umem->length,
+					    RXE_PAGEFAULT_SNAPSHOT);
+
+	if (ret >= 0)
+		mutex_unlock(&umem_odp->umem_mutex);
+
+	return ret >= 0 ? 0 : ret;
+}
+
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return -EOPNOTSUPP;
+
+	rxe_mr_init(access_flags, mr);
+
+	xa_init(&mr->page_list);
+
+	if (!start && length == U64_MAX) {
+		if (iova != 0)
+			return -EINVAL;
+		if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
+			return -EINVAL;
+
+		/* Never reach here, for implicit ODP is not implemented. */
+	}
+
+	umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags,
+				   &rxe_mn_ops);
+	if (IS_ERR(umem_odp)) {
+		rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n",
+			   (int)PTR_ERR(umem_odp));
+		return PTR_ERR(umem_odp);
+	}
+
+	umem_odp->private = mr;
+
+	mr->umem = &umem_odp->umem;
+	mr->access = access_flags;
+	mr->ibmr.length = length;
+	mr->ibmr.iova = iova;
+	mr->page_offset = ib_umem_offset(&umem_odp->umem);
+
+	err = rxe_odp_init_pages(mr);
+	if (err) {
+		ib_umem_odp_release(umem_odp);
+		return err;
+	}
+
+	mr->state = RXE_MR_STATE_VALID;
+	mr->ibmr.type = IB_MR_TYPE_USER;
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 969e057bbfd1..9159f1bdfc6f 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -635,6 +635,10 @@ static enum resp_states process_flush(struct rxe_qp *qp,
 	struct rxe_mr *mr = qp->resp.mr;
 	struct resp_res *res = qp->resp.res;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* oA19-14, oA19-15 */
 	if (res && res->replay)
 		return RESPST_ACKNOWLEDGE;
@@ -688,10 +692,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 	if (!res->replay) {
 		u64 iova = qp->resp.va + qp->resp.offset;
 
-		err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
-					  atmeth_comp(pkt),
-					  atmeth_swap_add(pkt),
-					  &res->atomic.orig_val);
+		if (mr->umem->is_odp)
+			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+		else
+			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
+						  atmeth_comp(pkt),
+						  atmeth_swap_add(pkt),
+						  &res->atomic.orig_val);
 
 		if (err)
 			return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 48f86839d36a..192ad835c712 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1278,7 +1278,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start,
 	mr->ibmr.pd = ibpd;
 	mr->ibmr.device = ibpd->device;
 
-	err = rxe_mr_init_user(rxe, start, length, iova, access, mr);
+	if (access & IB_ACCESS_ON_DEMAND)
+		err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr);
+	else
+		err = rxe_mr_init_user(rxe, start, length, iova, access, mr);
 	if (err) {
 		rxe_dbg_mr(mr, "reg_user_mr failed, err = %d", err);
 		goto err_cleanup;

From patchwork Thu Nov 9 05:44:51 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450622
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
Date: Thu, 9 Nov 2023 14:44:51 +0900

rxe_mr_copy() is used widely to copy data to/from a user MR. The requester
uses it to load the payloads of requesting packets; the responder uses it
to process Send, Write, and Read operations; the completer uses it to copy
data from response packets of Read and Atomic operations to a user MR.

Allow these operations to be used with ODP by adding a subordinate
function rxe_odp_mr_copy(). It is comprised of the following steps:

 1. Check page presence and R/W permission.
 2. If OK, just execute data copy to/from the pages and exit.
 3. Otherwise, trigger page fault to map the pages.
 4. Update the MR xarray using PFNs in umem_odp->pfn_list.
 5. Execute data copy to/from the pages.
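The write-permission tracking used in step 1 relies on xarray pointer
tags. A minimal sketch of the lookup side (the function is illustrative;
tag bit 1 plays the role of RXE_ODP_WRITABLE_BIT):

#include <linux/xarray.h>

static struct page *load_writable_page(struct xarray *xa, unsigned long index)
{
	void *entry = xa_load(xa, index);

	/* Tag bit 1 marks "writable", as RXE_ODP_WRITABLE_BIT does here. */
	if (!entry || !(xa_pointer_tag(entry) & 1))
		return NULL;	/* absent or read-only mapping */

	return xa_untag_pointer(entry);	/* strip the tag before use */
}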
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c     | 10 ++++
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 +++
 drivers/infiniband/sw/rxe/rxe_mr.c  |  9 +++-
 drivers/infiniband/sw/rxe/rxe_odp.c | 77 +++++++++++++++++++++++++++++
 4 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index f2284d27229b..207a022156f0 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4bda154a0248..eeaeff8a1398 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -192,6 +192,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -199,6 +201,12 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+		int length, enum rxe_mr_copy_dir dir)
+{
+	return -EOPNOTSUPP;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 384cb4ba1f2d..f0ce87c0fc7d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -247,7 +247,12 @@ int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 	void *va;
 
 	while (length) {
-		page = xa_load(&mr->page_list, index);
+		if (mr->umem->is_odp)
+			page = xa_untag_pointer(xa_load(&mr->page_list,
+							index));
+		else
+			page = xa_load(&mr->page_list, index);
+
 		if (!page)
 			return -EFAULT;
 
@@ -319,7 +324,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 	}
 
 	if (mr->umem->is_odp)
-		return -EOPNOTSUPP;
+		return rxe_odp_mr_copy(mr, iova, addr, length, dir);
 	else
 		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index c5e24901c141..5aa09b9c1095 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -177,3 +177,80 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 
 	return err;
 }
+
+/* Take xarray spinlock before entry */
+static inline bool rxe_odp_check_pages(struct rxe_mr *mr, u64 iova,
+				       int length, u32 flags)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, iova + length - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, iova);
+	bool need_fault = false;
+	void *page, *entry;
+	size_t perm = 0;
+
+	if (!(flags & RXE_PAGEFAULT_RDONLY))
+		perm = RXE_ODP_WRITABLE_BIT;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	while (xas.xa_index <= upper) {
+		page = xas_load(&xas);
+
+		/* Check page presence and write permission */
+		if (!page || (perm && !(xa_pointer_tag(page) & perm))) {
+			need_fault = true;
+			break;
+		}
+		entry = xas_next(&xas);
+	}
+
+	return need_fault;
+}
+
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u32 flags = 0;
+	int err;
+
+	if (unlikely(!mr->umem->is_odp))
+		return -EOPNOTSUPP;
+
+	switch (dir) {
+	case RXE_TO_MR_OBJ:
+		break;
+
+	case RXE_FROM_MR_OBJ:
+		flags = RXE_PAGEFAULT_RDONLY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock(&mr->page_list.xa_lock);
+
+	if (rxe_odp_check_pages(mr, iova, length, flags)) {
+		spin_unlock(&mr->page_list.xa_lock);
+
+		/* umem_mutex is locked on success */
+		err = rxe_odp_do_pagefault_and_lock(mr, iova, length, flags);
+		if (err < 0)
+			return err;
+
+		/*
+		 * The spinlock is always locked under mutex_lock except
+		 * for MR initialization. No worry about deadlock.
+		 */
+		spin_lock(&mr->page_list.xa_lock);
+		mutex_unlock(&umem_odp->umem_mutex);
+	}
+
+	err = rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+
+	spin_unlock(&mr->page_list.xa_lock);
+
+	return err;
+}

From patchwork Thu Nov 9 05:44:52 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 13450635
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations with ODP
Date: Thu, 9 Nov 2023 14:44:52 +0900

Enable 'fetch and add' and 'compare and swap' operations to be used with
ODP. This is comprised of the following steps:

 1. Verify that the page is present with write permission.
 2. If OK, execute the operation and exit.
 3. If not, then trigger page fault to map the page.
 4. Update the entry in the MR xarray.
 5. Execute the operation.
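On the wire these are the RC atomic opcodes. For illustration, a userspace
compare-and-swap against such an MR would be posted roughly as below (a
sketch; qp, remote_addr, and rkey are assumed to come from the
application's connection setup):

#include <infiniband/verbs.h>

static int post_cmp_and_swap(struct ibv_qp *qp, uint64_t *local_buf,
			     struct ibv_mr *local_mr, uint64_t remote_addr,
			     uint32_t rkey, uint64_t expect, uint64_t newval)
{
	struct ibv_sge sge = {
		.addr   = (uintptr_t)local_buf,	/* old remote value lands here */
		.length = sizeof(uint64_t),
		.lkey   = local_mr->lkey,
	};
	struct ibv_send_wr wr = {
		.opcode     = IBV_WR_ATOMIC_CMP_AND_SWP,
		.sg_list    = &sge,
		.num_sge    = 1,
		.send_flags = IBV_SEND_SIGNALED,
	};
	struct ibv_send_wr *bad_wr;

	wr.wr.atomic.remote_addr = remote_addr;	/* 8-byte aligned iova in the MR */
	wr.wr.atomic.rkey        = rkey;
	wr.wr.atomic.compare_add = expect;
	wr.wr.atomic.swap        = newval;

	return ibv_post_send(qp, &wr, &bad_wr);
}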
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  |  9 ++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c   |  7 +++++-
 drivers/infiniband/sw/rxe/rxe_odp.c  | 33 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  5 ++++-
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 207a022156f0..abd3267c2873 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index eeaeff8a1398..0bae9044f362 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -194,6 +194,9 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val);
+
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -207,6 +210,12 @@ rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		     u64 compare, u64 swap_add, u64 *orig_val)
+{
+	return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f0ce87c0fc7d..0dc452ab772b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -498,7 +498,12 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 		}
 		page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 		index = rxe_mr_iova_to_index(mr, iova);
-		page = xa_load(&mr->page_list, index);
+
+		if (mr->umem->is_odp)
+			page = xa_untag_pointer(xa_load(&mr->page_list, index));
+		else
+			page = xa_load(&mr->page_list, index);
+
 		if (!page)
 			return RESPST_ERR_RKEY_VIOLATION;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 5aa09b9c1095..45b54ba15210 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -254,3 +254,36 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 
 	return err;
 }
+
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int err;
+
+	spin_lock(&mr->page_list.xa_lock);
+
+	/* Atomic operations manipulate a single char. */
+	if (rxe_odp_check_pages(mr, iova, sizeof(char), 0)) {
+		spin_unlock(&mr->page_list.xa_lock);
+
+		/* umem_mutex is locked on success */
+		err = rxe_odp_do_pagefault_and_lock(mr, iova, sizeof(char), 0);
+		if (err < 0)
+			return err;
+
+		/*
+		 * The spinlock is always locked under mutex_lock except
+		 * for MR initialization. No worry about deadlock.
+		 */
+		spin_lock(&mr->page_list.xa_lock);
+		mutex_unlock(&umem_odp->umem_mutex);
+	}
+
+	err = rxe_mr_do_atomic_op(mr, iova, opcode, compare,
+				  swap_add, orig_val);
+
+	spin_unlock(&mr->page_list.xa_lock);
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 9159f1bdfc6f..af3e669679a0 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -693,7 +693,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 		u64 iova = qp->resp.va + qp->resp.offset;
 
 		if (mr->umem->is_odp)
-			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+			err = rxe_odp_mr_atomic_op(mr, iova, pkt->opcode,
+						   atmeth_comp(pkt),
+						   atmeth_swap_add(pkt),
+						   &res->atomic.orig_val);
 		else
 			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
 						  atmeth_comp(pkt),