From patchwork Fri Jan 14 05:48:47 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 1/6] net/smc: Spread CQs to different completion
 vectors
Date: Fri, 14 Jan 2022 13:48:47 +0800
Message-Id: <20220114054852.38058-2-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

This spreads the recv and send CQs across different completion vectors.
It removes the previous limitation of a single vector, which bound all
completion interrupts of the device to one CPU.
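For reference, the completion vector is chosen at CQ creation time via
struct ib_cq_init_attr. A minimal sketch of the mechanism follows; ibdev,
ctx and example_cq_handler are placeholders, not code from this patch:

	struct ib_cq_init_attr cqattr = {
		.cqe = 128,		/* CQ depth */
		.comp_vector = 1,	/* use the 2nd vector if present */
	};
	struct ib_cq *cq;

	/* completion interrupts of this CQ are raised on the IRQ
	 * backing vector 1, which is typically bound to a different
	 * CPU than vector 0
	 */
	cq = ib_create_cq(ibdev, example_cq_handler, NULL, ctx, &cqattr);
	if (IS_ERR(cq))
		return PTR_ERR(cq);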
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_ib.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index a3e2d3b89568..d1f337522bd5 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -823,6 +823,9 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 		smcibdev->roce_cq_send = NULL;
 		goto out;
 	}
+	/* spread to different completion vector */
+	if (smcibdev->ibdev->num_comp_vectors > 1)
+		cqattr.comp_vector = 1;
 	smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev,
 					      smc_wr_rx_cq_handler, NULL,
 					      smcibdev, &cqattr);

From patchwork Fri Jan 14 05:48:48 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 2/6] net/smc: Prepare for multiple CQs per IB
 device
Date: Fri, 14 Jan 2022 13:48:48 +0800
Message-Id: <20220114054852.38058-3-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

This introduces a completion vector load helper. During IB device
setup, it picks the least used vector of the current device. Only two
CQs on two vectors are needed so far, so the helper brings no practical
gain yet; it prepares for multiple CQs support.
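As a hedged usage sketch (not part of the diff below; handler and ctx
are placeholders), the helpers are meant to be paired around CQ
creation and destruction so the per-vector counters stay balanced:

	int vec = smc_ib_get_least_used_vector(smcibdev);
	struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE };
	struct ib_cq *cq;

	cqattr.comp_vector = vec;
	cq = ib_create_cq(smcibdev->ibdev, handler, NULL, ctx, &cqattr);
	if (IS_ERR(cq)) {
		/* creation failed: undo the load count taken above */
		smc_ib_put_vector(smcibdev, vec);
		return PTR_ERR(cq);
	}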
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_ib.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
 net/smc/smc_ib.h |  1 +
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index d1f337522bd5..9a162810ed8c 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -625,6 +625,28 @@ int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }

+static int smc_ib_get_least_used_vector(struct smc_ib_device *smcibdev)
+{
+	int min = smcibdev->vector_load[0];
+	int i, index = 0;
+
+	/* use it from the beginning of vectors */
+	for (i = 0; i < smcibdev->ibdev->num_comp_vectors; i++) {
+		if (smcibdev->vector_load[i] < min) {
+			index = i;
+			min = smcibdev->vector_load[i];
+		}
+	}
+
+	smcibdev->vector_load[index]++;
+	return index;
+}
+
+static void smc_ib_put_vector(struct smc_ib_device *smcibdev, int index)
+{
+	smcibdev->vector_load[index]--;
+}
+
 static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 {
 	struct smc_link *lnk = (struct smc_link *)priv;
@@ -801,8 +823,8 @@ void smc_ib_buf_unmap_sg(struct smc_link *lnk,

 long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 {
-	struct ib_cq_init_attr cqattr = {
-		.cqe = SMC_MAX_CQE, .comp_vector = 0 };
+	struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE };
+	int cq_send_vector, cq_recv_vector;
 	int cqe_size_order, smc_order;
 	long rc;

@@ -815,31 +837,35 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 	smc_order = MAX_ORDER - cqe_size_order - 1;
 	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
 		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
+	cq_send_vector = smc_ib_get_least_used_vector(smcibdev);
+	cqattr.comp_vector = cq_send_vector;
 	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
 					      smc_wr_tx_cq_handler, NULL,
 					      smcibdev, &cqattr);
 	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send);
 	if (IS_ERR(smcibdev->roce_cq_send)) {
 		smcibdev->roce_cq_send = NULL;
-		goto out;
+		goto err_send;
 	}
-	/* spread to different completion vector */
-	if (smcibdev->ibdev->num_comp_vectors > 1)
-		cqattr.comp_vector = 1;
+	cq_recv_vector = smc_ib_get_least_used_vector(smcibdev);
+	cqattr.comp_vector = cq_recv_vector;
 	smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev,
 					      smc_wr_rx_cq_handler, NULL,
 					      smcibdev, &cqattr);
 	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv);
 	if (IS_ERR(smcibdev->roce_cq_recv)) {
 		smcibdev->roce_cq_recv = NULL;
-		goto err;
+		goto err_recv;
 	}
 	smc_wr_add_dev(smcibdev);
 	smcibdev->initialized = 1;
 	goto out;

-err:
+err_recv:
+	smc_ib_put_vector(smcibdev, cq_recv_vector);
 	ib_destroy_cq(smcibdev->roce_cq_send);
+err_send:
+	smc_ib_put_vector(smcibdev, cq_send_vector);
 out:
 	mutex_unlock(&smcibdev->mutex);
 	return rc;
@@ -928,6 +954,11 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
 	INIT_IB_EVENT_HANDLER(&smcibdev->event_handler, smcibdev->ibdev,
 			      smc_ib_global_event_handler);
 	ib_register_event_handler(&smcibdev->event_handler);
+	/* vector's load per ib device */
+	smcibdev->vector_load = kcalloc(ibdev->num_comp_vectors,
+					sizeof(int), GFP_KERNEL);
+	if (!smcibdev->vector_load)
+		return -ENOMEM;

 	/* trigger reading of the port attributes */
 	port_cnt = smcibdev->ibdev->phys_port_cnt;
@@ -968,6 +999,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data)
 	smc_ib_cleanup_per_ibdev(smcibdev);
 	ib_unregister_event_handler(&smcibdev->event_handler);
 	cancel_work_sync(&smcibdev->port_event_work);
+	kfree(smcibdev->vector_load);
 	kfree(smcibdev);
 }

diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index 5d8b49c57f50..a748b74e56e6 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -57,6 +57,7 @@ struct smc_ib_device {	/* ib-device infos for smc */
 	atomic_t		lnk_cnt_by_port[SMC_MAX_PORTS];
 						/* number of links per port */
 	int			ndev_ifidx[SMC_MAX_PORTS]; /* ndev if indexes */
+	int			*vector_load;	/* load of all completion vectors */
 };

 static inline __be32 smc_ib_gid_to_ipv4(u8 gid[SMC_GID_SIZE])

From patchwork Fri Jan 14 05:48:49 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 3/6] net/smc: Introduce smc_ib_cq to bind link
 and cq
Date: Fri, 14 Jan 2022 13:48:49 +0800
Message-Id: <20220114054852.38058-4-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

This patch introduces struct smc_ib_cq as an intermediary between
smc_link and ib_cq. Every smc_link now reaches its CQs through its own
pointers, which unbinds smc_link from smc_ib_device. This allows a
flexible mapping and prepares for multiple CQs support.
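The key consequence is that the CQ handler context becomes the wrapper
instead of the device. A hedged sketch of the pattern (struct trimmed to
the relevant fields; example_cq_handler is a placeholder name):

	struct smc_ib_cq {
		struct smc_ib_device	*smcibdev;	/* parent device */
		struct ib_cq		*roce_cq;	/* the real CQ */
		struct tasklet_struct	tasklet;	/* polls roce_cq */
	};

	static void example_cq_handler(struct ib_cq *ib_cq, void *cq_context)
	{
		/* cq_context was passed to ib_create_cq(); it is now
		 * the wrapper, so each CQ schedules its own tasklet
		 */
		struct smc_ib_cq *smcibcq = cq_context;

		tasklet_schedule(&smcibcq->tasklet);
	}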
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_core.h |  2 ++
 net/smc/smc_ib.c   | 52 +++++++++++++++++++++++++++++++++-------------
 net/smc/smc_ib.h   | 14 +++++++++----
 net/smc/smc_wr.c   | 34 +++++++++++++++---------------
 4 files changed, 67 insertions(+), 35 deletions(-)

diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 521c64a3d8d3..fd10cad8fb77 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -86,6 +86,8 @@ struct smc_link {
 	struct ib_pd		*roce_pd;	/* IB protection domain,
 						 * unique for every RoCE QP
 						 */
+	struct smc_ib_cq	*smcibcq_recv;	/* cq for recv */
+	struct smc_ib_cq	*smcibcq_send;	/* cq for send */
 	struct ib_qp		*roce_qp;	/* IB queue pair */
 	struct ib_qp_attr	qp_attr;	/* IB queue pair attributes */
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 9a162810ed8c..b08b9af4c156 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -133,7 +133,7 @@ int smc_ib_ready_link(struct smc_link *lnk)
 	if (rc)
 		goto out;
 	smc_wr_remember_qp_attr(lnk);
-	rc = ib_req_notify_cq(lnk->smcibdev->roce_cq_recv,
+	rc = ib_req_notify_cq(lnk->smcibcq_recv->roce_cq,
 			      IB_CQ_SOLICITED_MASK);
 	if (rc)
 		goto out;
@@ -672,6 +672,8 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk)
 {
 	if (lnk->roce_qp)
 		ib_destroy_qp(lnk->roce_qp);
+	lnk->smcibcq_send = NULL;
+	lnk->smcibcq_recv = NULL;
 	lnk->roce_qp = NULL;
 }

@@ -682,8 +684,8 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 	struct ib_qp_init_attr qp_attr = {
 		.event_handler = smc_ib_qp_event_handler,
 		.qp_context = lnk,
-		.send_cq = lnk->smcibdev->roce_cq_send,
-		.recv_cq = lnk->smcibdev->roce_cq_recv,
+		.send_cq = lnk->smcibdev->roce_cq_send->roce_cq,
+		.recv_cq = lnk->smcibdev->roce_cq_recv->roce_cq,
 		.srq = NULL,
 		.cap = {
 				/* include unsolicited rdma_writes as well,
@@ -701,10 +703,13 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)

 	lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_qp);
-	if (IS_ERR(lnk->roce_qp))
+	if (IS_ERR(lnk->roce_qp)) {
 		lnk->roce_qp = NULL;
-	else
+	} else {
+		lnk->smcibcq_send = lnk->smcibdev->roce_cq_send;
+		lnk->smcibcq_recv = lnk->smcibdev->roce_cq_recv;
 		smc_wr_remember_qp_attr(lnk);
+	}
 	return rc;
 }

@@ -824,6 +829,7 @@ void smc_ib_buf_unmap_sg(struct smc_link *lnk,
 long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 {
 	struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE };
+	struct smc_ib_cq *smcibcq_send, *smcibcq_recv;
 	int cq_send_vector, cq_recv_vector;
 	int cqe_size_order, smc_order;
 	long rc;
@@ -837,34 +843,52 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 	smc_order = MAX_ORDER - cqe_size_order - 1;
 	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
 		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
+	smcibcq_send = kmalloc(sizeof(*smcibcq_send), GFP_KERNEL);
+	if (!smcibcq_send) {
+		rc = -ENOMEM;
+		goto out;
+	}
 	cq_send_vector = smc_ib_get_least_used_vector(smcibdev);
+	smcibcq_send->smcibdev = smcibdev;
+	smcibcq_send->is_send = 1;
 	cqattr.comp_vector = cq_send_vector;
-	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
-					      smc_wr_tx_cq_handler, NULL,
-					      smcibdev, &cqattr);
+	smcibcq_send->roce_cq = ib_create_cq(smcibdev->ibdev,
+					     smc_wr_tx_cq_handler, NULL,
+					     smcibcq_send, &cqattr);
 	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send);
 	if (IS_ERR(smcibdev->roce_cq_send)) {
 		smcibdev->roce_cq_send = NULL;
 		goto err_send;
 	}
+	smcibdev->roce_cq_send = smcibcq_send;
+	smcibcq_recv = kmalloc(sizeof(*smcibcq_recv), GFP_KERNEL);
+	if (!smcibcq_recv) {
+		rc = -ENOMEM;
+		goto err_send;
+	}
 	cq_recv_vector = smc_ib_get_least_used_vector(smcibdev);
+	smcibcq_recv->smcibdev = smcibdev;
+	smcibcq_recv->is_send = 0;
 	cqattr.comp_vector = cq_recv_vector;
-	smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev,
-					      smc_wr_rx_cq_handler, NULL,
-					      smcibdev, &cqattr);
+	smcibcq_recv->roce_cq = ib_create_cq(smcibdev->ibdev,
+					     smc_wr_rx_cq_handler, NULL,
+					     smcibcq_recv, &cqattr);
 	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv);
 	if (IS_ERR(smcibdev->roce_cq_recv)) {
 		smcibdev->roce_cq_recv = NULL;
 		goto err_recv;
 	}
+	smcibdev->roce_cq_recv = smcibcq_recv;
 	smc_wr_add_dev(smcibdev);
 	smcibdev->initialized = 1;
 	goto out;

 err_recv:
+	kfree(smcibcq_recv);
 	smc_ib_put_vector(smcibdev, cq_recv_vector);
-	ib_destroy_cq(smcibdev->roce_cq_send);
+	ib_destroy_cq(smcibcq_send->roce_cq);
 err_send:
+	kfree(smcibcq_send);
 	smc_ib_put_vector(smcibdev, cq_send_vector);
 out:
 	mutex_unlock(&smcibdev->mutex);
@@ -877,8 +901,8 @@ static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev)
 	if (!smcibdev->initialized)
 		goto out;
 	smcibdev->initialized = 0;
-	ib_destroy_cq(smcibdev->roce_cq_recv);
-	ib_destroy_cq(smcibdev->roce_cq_send);
+	ib_destroy_cq(smcibdev->roce_cq_recv->roce_cq);
+	ib_destroy_cq(smcibdev->roce_cq_send->roce_cq);
 	smc_wr_remove_dev(smcibdev);
 out:
 	mutex_unlock(&smcibdev->mutex);
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index a748b74e56e6..5b34274ecf47 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -32,15 +32,21 @@ struct smc_ib_devices {	/* list of smc ib devices definition */
 extern struct smc_ib_devices	smc_ib_devices; /* list of smc ib devices */
 extern struct smc_lgr_list	smc_lgr_list;	/* list of linkgroups */

+struct smc_ib_cq {		/* ib_cq wrapper for smc */
+	struct list_head	list;
+	struct smc_ib_device	*smcibdev;	/* parent ib device */
+	struct ib_cq		*roce_cq;	/* real ib_cq for link */
+	struct tasklet_struct	tasklet;	/* tasklet for wr */
+	bool			is_send;	/* send or recv cq */
+};
+
 struct smc_ib_device {		/* ib-device infos for smc */
 	struct list_head	list;
 	struct ib_device	*ibdev;
 	struct ib_port_attr	pattr[SMC_MAX_PORTS];
						/* ib dev. port attrs */
 	struct ib_event_handler	event_handler;	/* global ib_event handler */
-	struct ib_cq		*roce_cq_send;	/* send completion queue */
-	struct ib_cq		*roce_cq_recv;	/* recv completion queue */
-	struct tasklet_struct	send_tasklet;	/* called by send cq handler */
-	struct tasklet_struct	recv_tasklet;	/* called by recv cq handler */
+	struct smc_ib_cq	*roce_cq_send;	/* send completion queue */
+	struct smc_ib_cq	*roce_cq_recv;	/* recv completion queue */
 	char			mac[SMC_MAX_PORTS][ETH_ALEN];
 						/* mac address per port*/
 	u8			pnetid[SMC_MAX_PORTS][SMC_MAX_PNETID_LEN];
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 24be1d03fef9..011435efb65b 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -135,7 +135,7 @@ static inline void smc_wr_tx_process_cqe(struct ib_wc *wc)

 static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t)
 {
-	struct smc_ib_device *dev = from_tasklet(dev, t, send_tasklet);
+	struct smc_ib_cq *smcibcq = from_tasklet(smcibcq, t, tasklet);
 	struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
 	int i = 0, rc;
 	int polled = 0;
@@ -144,9 +144,9 @@ static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t)
 	polled++;
 	do {
 		memset(&wc, 0, sizeof(wc));
-		rc = ib_poll_cq(dev->roce_cq_send, SMC_WR_MAX_POLL_CQE, wc);
+		rc = ib_poll_cq(smcibcq->roce_cq, SMC_WR_MAX_POLL_CQE, wc);
 		if (polled == 1) {
-			ib_req_notify_cq(dev->roce_cq_send,
+			ib_req_notify_cq(smcibcq->roce_cq,
 					 IB_CQ_NEXT_COMP |
 					 IB_CQ_REPORT_MISSED_EVENTS);
 		}
@@ -161,9 +161,9 @@ static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t)

 void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
 {
-	struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
+	struct smc_ib_cq *smcibcq = (struct smc_ib_cq *)cq_context;

-	tasklet_schedule(&dev->send_tasklet);
+	tasklet_schedule(&smcibcq->tasklet);
 }

 /*---------------------------- request submission ---------------------------*/
@@ -306,7 +306,7 @@ int smc_wr_tx_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv)
 	struct smc_wr_tx_pend *pend;
 	int rc;

-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
+	ib_req_notify_cq(link->smcibcq_send->roce_cq,
 			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
 	pend = container_of(priv, struct smc_wr_tx_pend, priv);
 	rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx], NULL);
@@ -323,7 +323,7 @@ int smc_wr_tx_v2_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
 	int rc;

 	link->wr_tx_v2_ib->sg_list[0].length = len;
-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
+	ib_req_notify_cq(link->smcibcq_send->roce_cq,
 			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
 	rc = ib_post_send(link->roce_qp, link->wr_tx_v2_ib, NULL);
 	if (rc) {
@@ -367,7 +367,7 @@ int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr)
 {
 	int rc;

-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
+	ib_req_notify_cq(link->smcibcq_send->roce_cq,
 			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
 	link->wr_reg_state = POSTED;
 	link->wr_reg.wr.wr_id = (u64)(uintptr_t)mr;
@@ -476,7 +476,7 @@ static inline void smc_wr_rx_process_cqes(struct ib_wc wc[], int num)

 static void smc_wr_rx_tasklet_fn(struct tasklet_struct *t)
 {
-	struct smc_ib_device *dev = from_tasklet(dev, t, recv_tasklet);
+	struct smc_ib_cq *smcibcq = from_tasklet(smcibcq, t, tasklet);
 	struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
 	int polled = 0;
 	int rc;
@@ -485,9 +485,9 @@ static void smc_wr_rx_tasklet_fn(struct tasklet_struct *t)
 	polled++;
 	do {
 		memset(&wc, 0, sizeof(wc));
-		rc = ib_poll_cq(dev->roce_cq_recv, SMC_WR_MAX_POLL_CQE, wc);
+		rc = ib_poll_cq(smcibcq->roce_cq, SMC_WR_MAX_POLL_CQE, wc);
 		if (polled == 1) {
-			ib_req_notify_cq(dev->roce_cq_recv,
+			ib_req_notify_cq(smcibcq->roce_cq,
 					 IB_CQ_SOLICITED_MASK |
 					 IB_CQ_REPORT_MISSED_EVENTS);
 		}
@@ -501,9 +501,9 @@ static void smc_wr_rx_tasklet_fn(struct tasklet_struct *t)

 void smc_wr_rx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
 {
-	struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
+	struct smc_ib_cq *smcibcq = (struct smc_ib_cq *)cq_context;

-	tasklet_schedule(&dev->recv_tasklet);
+	tasklet_schedule(&smcibcq->tasklet);
 }

 int smc_wr_rx_post_init(struct smc_link *link)
@@ -830,14 +830,14 @@ int smc_wr_alloc_link_mem(struct smc_link *link)

 void smc_wr_remove_dev(struct smc_ib_device *smcibdev)
 {
-	tasklet_kill(&smcibdev->recv_tasklet);
-	tasklet_kill(&smcibdev->send_tasklet);
+	tasklet_kill(&smcibdev->roce_cq_recv->tasklet);
+	tasklet_kill(&smcibdev->roce_cq_send->tasklet);
 }

 void smc_wr_add_dev(struct smc_ib_device *smcibdev)
 {
-	tasklet_setup(&smcibdev->recv_tasklet, smc_wr_rx_tasklet_fn);
-	tasklet_setup(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn);
+	tasklet_setup(&smcibdev->roce_cq_recv->tasklet, smc_wr_rx_tasklet_fn);
+	tasklet_setup(&smcibdev->roce_cq_send->tasklet, smc_wr_tx_tasklet_fn);
 }

 int smc_wr_create_link(struct smc_link *lnk)

From patchwork Fri Jan 14 05:48:50 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 4/6] net/smc: Multiple CQs per IB device
Date: Fri, 14 Jan 2022 13:48:50 +0800
Message-Id: <20220114054852.38058-5-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

This patch allows multiple CQs per IB device, compared to a single
send/recv CQ pair now. During IB device setup, it initializes
ibdev->num_comp_vectors send/recv CQ pairs and their corresponding
tasklets, similar to the multiple queues of a net device. Every
smc_link gets its own send and recv CQ, always assigned from the least
used CQs of the current IB device.
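To make the accounting concrete, a hedged sketch of the intended
per-link CQ lifecycle, using the helpers introduced below (simplified;
QP creation and error handling omitted):

	/* QP creation: pick the two least loaded CQs and pin them */
	lnk->smcibcq_send = smc_ib_get_least_used_cq(lnk->smcibdev, true);
	lnk->smcibcq_recv = smc_ib_get_least_used_cq(lnk->smcibdev, false);

	/* ... create the QP with those CQs ... */

	/* QP destruction: drop the load so later links prefer them */
	smc_ib_put_cq(lnk->smcibcq_send);
	smc_ib_put_cq(lnk->smcibcq_recv);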
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_ib.c | 165 +++++++++++++++++++++++++++++++++--------------
 net/smc/smc_ib.h |   6 +-
 net/smc/smc_wr.c |  16 +++--
 3 files changed, 132 insertions(+), 55 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index b08b9af4c156..19c49184cd03 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -647,6 +647,38 @@ static void smc_ib_put_vector(struct smc_ib_device *smcibdev, int index)
 	smcibdev->vector_load[index]--;
 }

+static struct smc_ib_cq *smc_ib_get_least_used_cq(struct smc_ib_device *smcibdev,
+						  bool is_send)
+{
+	struct smc_ib_cq *smcibcq, *cq;
+	struct list_head *head;
+	int min;
+
+	if (is_send)
+		head = &smcibdev->smcibcq_send;
+	else
+		head = &smcibdev->smcibcq_recv;
+
+	cq = list_first_entry(head, struct smc_ib_cq, list);
+	min = cq->load;
+
+	list_for_each_entry(smcibcq, head, list) {
+		if (smcibcq->load < min) {
+			cq = smcibcq;
+			min = cq->load;
+		}
+	}
+	if (!cq)
+		cq = smcibcq;
+
+	cq->load++;
+	return cq;
+}
+
+static void smc_ib_put_cq(struct smc_ib_cq *smcibcq)
+{
+	smcibcq->load--;
+}
+
 static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 {
 	struct smc_link *lnk = (struct smc_link *)priv;
@@ -670,8 +702,11 @@ static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)

 void smc_ib_destroy_queue_pair(struct smc_link *lnk)
 {
-	if (lnk->roce_qp)
+	if (lnk->roce_qp) {
 		ib_destroy_qp(lnk->roce_qp);
+		smc_ib_put_cq(lnk->smcibcq_send);
+		smc_ib_put_cq(lnk->smcibcq_recv);
+	}
 	lnk->smcibcq_send = NULL;
 	lnk->smcibcq_recv = NULL;
 	lnk->roce_qp = NULL;
@@ -680,12 +715,16 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk)
 /* create a queue pair within the protection domain for a link */
 int smc_ib_create_queue_pair(struct smc_link *lnk)
 {
+	struct smc_ib_cq *smcibcq_send = smc_ib_get_least_used_cq(lnk->smcibdev,
+								  true);
+	struct smc_ib_cq *smcibcq_recv = smc_ib_get_least_used_cq(lnk->smcibdev,
+								  false);
 	int sges_per_buf = (lnk->lgr->smc_version == SMC_V2) ? 2 : 1;
 	struct ib_qp_init_attr qp_attr = {
 		.event_handler = smc_ib_qp_event_handler,
 		.qp_context = lnk,
-		.send_cq = lnk->smcibdev->roce_cq_send->roce_cq,
-		.recv_cq = lnk->smcibdev->roce_cq_recv->roce_cq,
+		.send_cq = smcibcq_send->roce_cq,
+		.recv_cq = smcibcq_recv->roce_cq,
 		.srq = NULL,
 		.cap = {
 				/* include unsolicited rdma_writes as well,
@@ -706,8 +745,8 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 	if (IS_ERR(lnk->roce_qp)) {
 		lnk->roce_qp = NULL;
 	} else {
-		lnk->smcibcq_send = lnk->smcibdev->roce_cq_send;
-		lnk->smcibcq_recv = lnk->smcibdev->roce_cq_recv;
+		lnk->smcibcq_send = smcibcq_send;
+		lnk->smcibcq_recv = smcibcq_recv;
 		smc_wr_remember_qp_attr(lnk);
 	}
 	return rc;
@@ -826,6 +865,24 @@ void smc_ib_buf_unmap_sg(struct smc_link *lnk,
 	buf_slot->sgt[lnk->link_idx].sgl->dma_address = 0;
 }

+static void smc_ib_cleanup_cq(struct smc_ib_device *smcibdev)
+{
+	struct smc_ib_cq *smcibcq, *cq;
+
+	list_for_each_entry_safe(smcibcq, cq, &smcibdev->smcibcq_send, list) {
+		list_del(&smcibcq->list);
+		ib_destroy_cq(smcibcq->roce_cq);
+		smc_ib_put_vector(smcibdev, smcibcq->comp_vector);
+		kfree(smcibcq);
+	}
+	list_for_each_entry_safe(smcibcq, cq, &smcibdev->smcibcq_recv, list) {
+		list_del(&smcibcq->list);
+		ib_destroy_cq(smcibcq->roce_cq);
+		smc_ib_put_vector(smcibdev, smcibcq->comp_vector);
+		kfree(smcibcq);
+	}
+}
+
 long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 {
 	struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE };
@@ -833,6 +890,7 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 	int cq_send_vector, cq_recv_vector;
 	int cqe_size_order, smc_order;
 	long rc;
+	int i;

 	mutex_lock(&smcibdev->mutex);
 	rc = 0;
@@ -843,53 +901,61 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 	smc_order = MAX_ORDER - cqe_size_order - 1;
 	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
 		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
-	smcibcq_send = kmalloc(sizeof(*smcibcq_send), GFP_KERNEL);
-	if (!smcibcq_send) {
-		rc = -ENOMEM;
-		goto out;
-	}
-	cq_send_vector = smc_ib_get_least_used_vector(smcibdev);
-	smcibcq_send->smcibdev = smcibdev;
-	smcibcq_send->is_send = 1;
-	cqattr.comp_vector = cq_send_vector;
-	smcibcq_send->roce_cq = ib_create_cq(smcibdev->ibdev,
-					     smc_wr_tx_cq_handler, NULL,
-					     smcibcq_send, &cqattr);
-	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send);
-	if (IS_ERR(smcibdev->roce_cq_send)) {
-		smcibdev->roce_cq_send = NULL;
-		goto err_send;
-	}
-	smcibdev->roce_cq_send = smcibcq_send;
-	smcibcq_recv = kmalloc(sizeof(*smcibcq_recv), GFP_KERNEL);
-	if (!smcibcq_recv) {
-		rc = -ENOMEM;
-		goto err_send;
-	}
-	cq_recv_vector = smc_ib_get_least_used_vector(smcibdev);
-	smcibcq_recv->smcibdev = smcibdev;
-	smcibcq_recv->is_send = 0;
-	cqattr.comp_vector = cq_recv_vector;
-	smcibcq_recv->roce_cq = ib_create_cq(smcibdev->ibdev,
-					     smc_wr_rx_cq_handler, NULL,
-					     smcibcq_recv, &cqattr);
-	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv);
-	if (IS_ERR(smcibdev->roce_cq_recv)) {
-		smcibdev->roce_cq_recv = NULL;
-		goto err_recv;
+
+	/* initialize send/recv CQs */
+	for (i = 0; i < smcibdev->ibdev->num_comp_vectors; i++) {
+		/* initialize send CQ */
+		smcibcq_send = kmalloc(sizeof(*smcibcq_send), GFP_KERNEL);
+		if (!smcibcq_send) {
+			rc = -ENOMEM;
+			goto err;
+		}
+		cq_send_vector = smc_ib_get_least_used_vector(smcibdev);
+		smcibcq_send->smcibdev = smcibdev;
+		smcibcq_send->load = 0;
+		smcibcq_send->is_send = 1;
+		smcibcq_send->comp_vector = cq_send_vector;
+		INIT_LIST_HEAD(&smcibcq_send->list);
+		cqattr.comp_vector = cq_send_vector;
+		smcibcq_send->roce_cq = ib_create_cq(smcibdev->ibdev,
+						     smc_wr_tx_cq_handler, NULL,
+						     smcibcq_send, &cqattr);
+		rc = PTR_ERR_OR_ZERO(smcibcq_send->roce_cq);
+		if (IS_ERR(smcibcq_send->roce_cq)) {
+			smcibcq_send->roce_cq = NULL;
+			goto err;
+		}
+		list_add_tail(&smcibcq_send->list, &smcibdev->smcibcq_send);
+
+		/* initialize recv CQ */
+		smcibcq_recv = kmalloc(sizeof(*smcibcq_recv), GFP_KERNEL);
+		if (!smcibcq_recv) {
+			rc = -ENOMEM;
+			goto err;
+		}
+		cq_recv_vector = smc_ib_get_least_used_vector(smcibdev);
+		smcibcq_recv->smcibdev = smcibdev;
+		smcibcq_recv->load = 0;
+		smcibcq_recv->is_send = 0;
+		smcibcq_recv->comp_vector = cq_recv_vector;
+		INIT_LIST_HEAD(&smcibcq_recv->list);
+		cqattr.comp_vector = cq_recv_vector;
+		smcibcq_recv->roce_cq = ib_create_cq(smcibdev->ibdev,
+						     smc_wr_rx_cq_handler, NULL,
+						     smcibcq_recv, &cqattr);
+		rc = PTR_ERR_OR_ZERO(smcibcq_recv->roce_cq);
+		if (IS_ERR(smcibcq_recv->roce_cq)) {
+			smcibcq_recv->roce_cq = NULL;
+			goto err;
+		}
+		list_add_tail(&smcibcq_recv->list, &smcibdev->smcibcq_recv);
 	}
-	smcibdev->roce_cq_recv = smcibcq_recv;
 	smc_wr_add_dev(smcibdev);
 	smcibdev->initialized = 1;
 	goto out;

-err_recv:
-	kfree(smcibcq_recv);
-	smc_ib_put_vector(smcibdev, cq_recv_vector);
-	ib_destroy_cq(smcibcq_send->roce_cq);
-err_send:
-	kfree(smcibcq_send);
-	smc_ib_put_vector(smcibdev, cq_send_vector);
+err:
+	smc_ib_cleanup_cq(smcibdev);
 out:
 	mutex_unlock(&smcibdev->mutex);
 	return rc;
@@ -901,8 +967,7 @@ static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev)
 	if (!smcibdev->initialized)
 		goto out;
 	smcibdev->initialized = 0;
-	ib_destroy_cq(smcibdev->roce_cq_recv->roce_cq);
-	ib_destroy_cq(smcibdev->roce_cq_send->roce_cq);
+	smc_ib_cleanup_cq(smcibdev);
 	smc_wr_remove_dev(smcibdev);
 out:
 	mutex_unlock(&smcibdev->mutex);
@@ -978,6 +1043,8 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
 	INIT_IB_EVENT_HANDLER(&smcibdev->event_handler, smcibdev->ibdev,
 			      smc_ib_global_event_handler);
 	ib_register_event_handler(&smcibdev->event_handler);
+	INIT_LIST_HEAD(&smcibdev->smcibcq_send);
+	INIT_LIST_HEAD(&smcibdev->smcibcq_recv);
 	/* vector's load per ib device */
 	smcibdev->vector_load = kcalloc(ibdev->num_comp_vectors,
 					sizeof(int), GFP_KERNEL);
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index 5b34274ecf47..1776627f113d 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -38,6 +38,8 @@ struct smc_ib_cq {	/* ib_cq wrapper for smc */
 	struct ib_cq		*roce_cq;	/* real ib_cq for link */
 	struct tasklet_struct	tasklet;	/* tasklet for wr */
 	bool			is_send;	/* send or recv cq */
+	int			comp_vector;	/* index of completion vector */
+	int			load;		/* load of current cq */
 };

 struct smc_ib_device {		/* ib-device infos for smc */
@@ -45,8 +47,8 @@ struct smc_ib_device {	/* ib-device infos for smc */
 	struct ib_device	*ibdev;
 	struct ib_port_attr	pattr[SMC_MAX_PORTS];
						/* ib dev. port attrs */
 	struct ib_event_handler	event_handler;	/* global ib_event handler */
-	struct smc_ib_cq	*roce_cq_send;	/* send completion queue */
-	struct smc_ib_cq	*roce_cq_recv;	/* recv completion queue */
+	struct list_head	smcibcq_send;	/* all send cqs */
+	struct list_head	smcibcq_recv;	/* all recv cqs */
 	char			mac[SMC_MAX_PORTS][ETH_ALEN];
 						/* mac address per port*/
 	u8			pnetid[SMC_MAX_PORTS][SMC_MAX_PNETID_LEN];
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 011435efb65b..169253e53786 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -830,14 +830,22 @@ int smc_wr_alloc_link_mem(struct smc_link *link)

 void smc_wr_remove_dev(struct smc_ib_device *smcibdev)
 {
-	tasklet_kill(&smcibdev->roce_cq_recv->tasklet);
-	tasklet_kill(&smcibdev->roce_cq_send->tasklet);
+	struct smc_ib_cq *smcibcq;
+
+	list_for_each_entry(smcibcq, &smcibdev->smcibcq_send, list)
+		tasklet_kill(&smcibcq->tasklet);
+	list_for_each_entry(smcibcq, &smcibdev->smcibcq_recv, list)
+		tasklet_kill(&smcibcq->tasklet);
 }

 void smc_wr_add_dev(struct smc_ib_device *smcibdev)
 {
-	tasklet_setup(&smcibdev->roce_cq_recv->tasklet, smc_wr_rx_tasklet_fn);
-	tasklet_setup(&smcibdev->roce_cq_send->tasklet, smc_wr_tx_tasklet_fn);
+	struct smc_ib_cq *smcibcq;
+
+	list_for_each_entry(smcibcq, &smcibdev->smcibcq_send, list)
+		tasklet_setup(&smcibcq->tasklet, smc_wr_tx_tasklet_fn);
+	list_for_each_entry(smcibcq, &smcibdev->smcibcq_recv, list)
+		tasklet_setup(&smcibcq->tasklet, smc_wr_rx_tasklet_fn);
 }

 int smc_wr_create_link(struct smc_link *lnk)

From patchwork Fri Jan 14 05:48:51 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 5/6] net/smc: Unbind buffer size from clcsock
 and make it tunable
Date: Fri, 14 Jan 2022 13:48:51 +0800
Message-Id: <20220114054852.38058-6-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

SMC uses smc->sk.sk_{rcv|snd}buf to size the send buffer and the RMB,
and these values are inherited from the clcsock.
The clcsock is a TCP socket created during SMC connection setup. TCP
provides two sysctl knobs to tune its buffers, net.ipv4.tcp_{r|w}mem,
and SMC simply picked up TCP's defaults. Those defaults are tuned for
TCP and do not always fit SMC: high-throughput applications want larger
SMC buffers, while deployments short on contiguous memory want smaller
ones. The buffer size should be adjustable independently of TCP,
without disturbing TCP.

This unbinds the buffer size from the clcsock and provides sysctl knobs
to adjust it independently. The knobs can be set per net namespace, for
performance and flexibility.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
---
 Documentation/networking/smc-sysctl.rst | 20 ++++++
 include/net/netns/smc.h                 |  5 ++
 net/smc/Makefile                        |  2 +-
 net/smc/af_smc.c                        | 17 +++++-
 net/smc/smc_sysctl.c                    | 81 +++++++++++++++++++++++++
 net/smc/smc_sysctl.h                    | 22 +++++++
 6 files changed, 144 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/smc-sysctl.rst
 create mode 100644 net/smc/smc_sysctl.c
 create mode 100644 net/smc/smc_sysctl.h

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
new file mode 100644
index 000000000000..ba2be59a57dd
--- /dev/null
+++ b/Documentation/networking/smc-sysctl.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+SMC Sysctl
+==========
+
+/proc/sys/net/smc/* Variables
+=============================
+
+wmem_default - INTEGER
+	Initial size of the send buffer used by SMC sockets.
+	The default value inherits from net.ipv4.tcp_wmem[1].
+
+	Default: 16384 bytes.
+
+rmem_default - INTEGER
+	Initial size of the receive buffer (RMB) used by SMC sockets.
+	The default value inherits from net.ipv4.tcp_rmem[1].
+
+	Default: 131072 bytes.
diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index ea8a9cf2619b..f948235e3156 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -12,5 +12,10 @@ struct netns_smc {
 	/* protect fback_rsn */
 	struct mutex			mutex_fback_rsn;
 	struct smc_stats_rsn		*fback_rsn;
+#ifdef CONFIG_SYSCTL
+	struct ctl_table_header		*smc_hdr;
+#endif
+	int				sysctl_wmem_default;
+	int				sysctl_rmem_default;
 };
 #endif
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 196fb6f01b14..640af9a39f9c 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_SMC)	+= smc.o
 obj-$(CONFIG_SMC_DIAG)	+= smc_diag.o
 smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
 smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o
-smc-y += smc_tracepoint.o
+smc-y += smc_tracepoint.o smc_sysctl.o
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ffab9cee747d..0650b5971e0a 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -51,6 +51,7 @@
 #include "smc_close.h"
 #include "smc_stats.h"
 #include "smc_tracepoint.h"
+#include "smc_sysctl.h"

 static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
 						 * creation on server
@@ -2851,8 +2852,8 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 		smc->clcsock = clcsock;
 	}

-	smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
-	smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
+	smc->sk.sk_sndbuf = sock_net(sk)->smc.sysctl_wmem_default;
+	smc->sk.sk_rcvbuf = sock_net(sk)->smc.sysctl_rmem_default;

 out:
 	return rc;
@@ -2932,6 +2933,11 @@ unsigned int smc_net_id;

 static __net_init int smc_net_init(struct net *net)
 {
+	net->smc.sysctl_wmem_default = max(net->ipv4.sysctl_tcp_wmem[1],
+					   SMC_BUF_MIN_SIZE);
+	net->smc.sysctl_rmem_default = max(net->ipv4.sysctl_tcp_rmem[1],
+					   SMC_BUF_MIN_SIZE);
+
 	return smc_pnet_net_init(net);
 }

@@ -3044,6 +3050,12 @@ static int __init smc_init(void)
 		goto out_sock;
 	}

+	rc = smc_sysctl_init();
+	if (rc) {
+		pr_err("%s: sysctl fails with %d\n", __func__, rc);
+		goto out_sock;
+	}
+
 	static_branch_enable(&tcp_have_smc);
 	return 0;

@@ -3085,6 +3097,7 @@ static void __exit smc_exit(void)
 	smc_clc_exit();
 	unregister_pernet_subsys(&smc_net_stat_ops);
 	unregister_pernet_subsys(&smc_net_ops);
+	smc_sysctl_exit();
 	rcu_barrier();
 }

diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
new file mode 100644
index 000000000000..6706fe1bd888
--- /dev/null
+++ b/net/smc/smc_sysctl.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+
+#include "smc_core.h"
+
+static int min_sndbuf = SMC_BUF_MIN_SIZE;
+static int min_rcvbuf = SMC_BUF_MIN_SIZE;
+
+static struct ctl_table smc_table[] = {
+	{
+		.procname = "wmem_default",
+		.data = &init_net.smc.sysctl_wmem_default,
+		.maxlen = sizeof(init_net.smc.sysctl_wmem_default),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &min_sndbuf,
+	},
+	{
+		.procname = "rmem_default",
+		.data = &init_net.smc.sysctl_rmem_default,
+		.maxlen = sizeof(init_net.smc.sysctl_rmem_default),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &min_rcvbuf,
+	},
+	{ }
+};
+
+static __net_init int smc_sysctl_init_net(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = smc_table;
+	if (!net_eq(net, &init_net)) {
+		int i;
+
+		table = kmemdup(table, sizeof(smc_table), GFP_KERNEL);
+		if (!table)
+			goto err_alloc;
+
+		for (i = 0; i < ARRAY_SIZE(smc_table) - 1; i++)
+			table[i].data += (void *)net -
+					 (void *)&init_net;
+	}
+
+	net->smc.smc_hdr = register_net_sysctl(net, "net/smc", table);
+	if (!net->smc.smc_hdr)
+		goto err_reg;
+
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static __net_exit void smc_sysctl_exit_net(struct net *net)
+{
+	unregister_net_sysctl_table(net->smc.smc_hdr);
+}
+
+static struct pernet_operations smc_sysctl_ops __net_initdata = {
+	.init = smc_sysctl_init_net,
+	.exit = smc_sysctl_exit_net,
+};
+
+int __init smc_sysctl_init(void)
+{
+	return register_pernet_subsys(&smc_sysctl_ops);
+}
+
+void smc_sysctl_exit(void)
+{
+	unregister_pernet_subsys(&smc_sysctl_ops);
+}
diff --git a/net/smc/smc_sysctl.h b/net/smc/smc_sysctl.h
new file mode 100644
index 000000000000..c01c5de3a3ea
--- /dev/null
+++ b/net/smc/smc_sysctl.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _SMC_SYSCTL_H
+#define _SMC_SYSCTL_H
+
+#ifdef CONFIG_SYSCTL
+
+int smc_sysctl_init(void);
+void smc_sysctl_exit(void);
+
+#else
+
+static inline int smc_sysctl_init(void)
+{
+	return 0;
+}
+
+static inline void smc_sysctl_exit(void) { }
+
+#endif /* CONFIG_SYSCTL */
+
+#endif /* _SMC_SYSCTL_H */

From patchwork Fri Jan 14 05:48:52 2022
From: Tony Lu <tonylu@linux.alibaba.com>
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
    linux-s390@vger.kernel.org
Subject: [RFC PATCH net-next 6/6] net/smc: Introduce tunable linkgroup max
 connections
Date: Fri, 14 Jan 2022 13:48:52 +0800
Message-Id: <20220114054852.38058-7-tonylu@linux.alibaba.com>
In-Reply-To: <20220114054852.38058-1-tonylu@linux.alibaba.com>
References: <20220114054852.38058-1-tonylu@linux.alibaba.com>

This introduces the sysctl knob max_lgr_conns to tune the maximum
number of connections in one linkgroup. The knob is per net namespace.

Currently a linkgroup is shared by at most SMC_RMBS_PER_LGR_MAX (255)
connections. They all share one QP, so contention on link-level
resources grows with the number of connections, e.g. on the slots
handed out by smc_cdc_get_free_slot(). Environments and workloads
differ, so make the limit tunable: users can raise it to save linkgroup
resources, or lower it to reduce contention and increase performance.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 include/net/netns/smc.h |  1 +
 net/smc/af_smc.c        |  1 +
 net/smc/smc_core.c      |  2 +-
 net/smc/smc_sysctl.c    | 11 +++++++++++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index f948235e3156..4f55d2876d19 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -17,5 +17,6 @@ struct netns_smc {
 #endif
 	int				sysctl_wmem_default;
 	int				sysctl_rmem_default;
+	int				sysctl_max_lgr_conns;
 };
 #endif
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0650b5971e0a..f38e24cbb4a7 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2937,6 +2937,7 @@ static __net_init int smc_net_init(struct net *net)
 					   SMC_BUF_MIN_SIZE);
 	net->smc.sysctl_rmem_default = max(net->ipv4.sysctl_tcp_rmem[1],
 					   SMC_BUF_MIN_SIZE);
+	net->smc.sysctl_max_lgr_conns = SMC_RMBS_PER_LGR_MAX;

 	return smc_pnet_net_init(net);
 }
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 8935ef4811b0..b6e70dd0688d 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1817,7 +1817,7 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		    (ini->smcd_version == SMC_V2 ||
 		     lgr->vlan_id == ini->vlan_id) &&
 		    (role == SMC_CLNT || ini->is_smcd ||
-		     lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
+		     lgr->conns_num < net->smc.sysctl_max_lgr_conns)) {
 			/* link group found */
 			ini->first_contact_local = 0;
 			conn->lgr = lgr;
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 6706fe1bd888..5ffcf6008c20 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -10,6 +10,8 @@

 static int min_sndbuf = SMC_BUF_MIN_SIZE;
 static int min_rcvbuf = SMC_BUF_MIN_SIZE;
+static int min_lgr_conns = 1;
+static int max_lgr_conns = SMC_RMBS_PER_LGR_MAX;

 static struct ctl_table smc_table[] = {
 	{
@@ -28,6 +30,15 @@ static struct ctl_table smc_table[] = {
 		.proc_handler = proc_dointvec_minmax,
 		.extra1 = &min_rcvbuf,
 	},
+	{
+		.procname = "max_lgr_conns",
+		.data = &init_net.smc.sysctl_max_lgr_conns,
+		.maxlen = sizeof(init_net.smc.sysctl_max_lgr_conns),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &min_lgr_conns,
+		.extra2 = &max_lgr_conns,
+	},
 	{ }
 };
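With the whole series applied, the knobs from patches 5 and 6 can be
tuned per net namespace; the values below are examples only, not
recommendations:

	sysctl -w net.smc.wmem_default=262144	# send buffer size
	sysctl -w net.smc.rmem_default=262144	# RMB size
	sysctl -w net.smc.max_lgr_conns=32	# connections per linkgroup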