From patchwork Tue Feb 8 12:53:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 12738757
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 16700C4332F
 for ; Tue, 8 Feb 2022 13:16:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1343863AbiBHNQc (ORCPT ); Tue, 8 Feb 2022 08:16:32 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33904 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1359726AbiBHMxZ (ORCPT ); Tue, 8 Feb 2022 07:53:25 -0500
Received: from out30-56.freemail.mail.aliyun.com
 (out30-56.freemail.mail.aliyun.com [115.124.30.56])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5356C03FECA;
 Tue, 8 Feb 2022 04:53:22 -0800 (PST)
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---0V3wWcNp_1644324797;
Received: from localhost(mailfrom:alibuda@linux.alibaba.com
 fp:SMTPD_---0V3wWcNp_1644324797) by smtp.aliyun-inc.com(127.0.0.1);
 Tue, 08 Feb 2022 20:53:18 +0800
From: "D. Wythe"
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, "D. Wythe"
Subject: [PATCH net-next v5 1/5] net/smc: Make smc_tcp_listen_work() independent
Date: Tue, 8 Feb 2022 20:53:09 +0800
Message-Id: <58c544cb206d94b759ff0546bcffe693c3cbfb98.1644323503.git.alibuda@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To:
References:
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

From: "D. Wythe"

In a multithreaded benchmark with 10K connections, the backend TCP
connections are established very slowly, and many of them stay in the
SYN_SENT state.

Client: smc_run wrk -c 10000 -t 4 http://server

The netstat statistics on the server host show:

    145042 times the listen queue of a socket overflowed
    145042 SYNs to LISTEN sockets dropped

One reason for this is that smc_tcp_listen_work() shares a workqueue
(smc_hs_wq) with smc_listen_work(), and smc_listen_work() blocks while
waiting for the SMC connection to be established. Once the workqueue
becomes congested, it blocks the accept() on the TCP listen socket.

This patch creates an independent workqueue (smc_tcp_ls_wq) for
smc_tcp_listen_work(), separating it from smc_listen_work(). The extra
workqueue is acceptable because smc_tcp_listen_work() runs very quickly.

Signed-off-by: D. Wythe
---
 net/smc/af_smc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 00b2e9d..4969ac8 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,7 @@
 						 * creation on client
 						 */
 
+static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
 
@@ -2227,7 +2228,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 	lsmc->clcsk_data_ready(listen_clcsock);
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */
-		if (!queue_work(smc_hs_wq, &lsmc->tcp_listen_work))
+		if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work))
 			sock_put(&lsmc->sk);
 	}
 }
@@ -3024,9 +3025,14 @@ static int __init smc_init(void)
 		goto out_nl;
 
 	rc = -ENOMEM;
+
+	smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0);
+	if (!smc_tcp_ls_wq)
+		goto out_pnet;
+
 	smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0);
 	if (!smc_hs_wq)
-		goto out_pnet;
+		goto out_alloc_tcp_ls_wq;
 
 	smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0);
 	if (!smc_close_wq)
@@ -3097,6 +3103,8 @@ static int __init smc_init(void)
 	destroy_workqueue(smc_close_wq);
 out_alloc_hs_wq:
 	destroy_workqueue(smc_hs_wq);
+out_alloc_tcp_ls_wq:
+	destroy_workqueue(smc_tcp_ls_wq);
 out_pnet:
 	smc_pnet_exit();
 out_nl:
@@ -3115,6 +3123,7 @@ static void __exit smc_exit(void)
 	smc_core_exit();
 	smc_ib_unregister_client();
 	destroy_workqueue(smc_close_wq);
+	destroy_workqueue(smc_tcp_ls_wq);
 	destroy_workqueue(smc_hs_wq);
 	proto_unregister(&smc_proto6);
 	proto_unregister(&smc_proto);

From patchwork Tue Feb 8 12:53:10 2022
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 12738758
From: "D. Wythe"
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, "D. Wythe"
Subject: [PATCH net-next v5 2/5] net/smc: Limit backlog connections
Date: Tue, 8 Feb 2022 20:53:10 +0800

From: "D. Wythe"

The current implementation does not handle backlog semantics. One risk
is that the server can be flooded by an unbounded number of connections,
even if the client is SMC-incapable.

This patch puts a limit on backlog connections. Following the TCP
implementation, we divide SMC connections into two categories:

1. Half SMC connections: connections for which TCP is established but
   SMC is not yet established.

2. Full SMC connections: connections for which SMC is established.

For half SMC connections, since every half SMC connection starts with an
established TCP connection, we can achieve our goal by applying the
limit before the TCP connection is established. As in the TCP
implementation, this limit is based on both the half SMC connections
and the full connections, so it also constrains full SMC connections.

For full SMC connections, although we know exactly where they start, it
is hard to apply a limit before that point. The easiest way is to block
before receiving the SMC confirm CLC message, but that wait happens
under smc_server_lgr_pending, a global lock, so the limit would apply to
the entire host instead of a single listen socket. Another way is to
drop full connections, but considering the cost of SMC connections, we
prefer to keep them. Even so, a limit on full SMC connections still
exists; see the discussion of half SMC connections above.

After this patch, the limits on backlog connections are as follows:

For SMC:

1. A client with SMC capability can make at most 2 * backlog full SMC
   connections, or 1 * backlog half SMC connections and 1 * backlog
   full SMC connections.

2. A client without SMC capability can make at most 1 * backlog half
   TCP connections and 1 * backlog full TCP connections.

Signed-off-by: D. Wythe
---
 net/smc/af_smc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc.h    |  4 ++++
 2 files changed, 47 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 4969ac8..ebfce3d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -73,6 +73,34 @@ static void smc_set_keepalive(struct sock *sk, int val)
 	smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val);
 }
 
+static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
+					  struct request_sock *req,
+					  struct dst_entry *dst,
+					  struct request_sock *req_unhash,
+					  bool *own_req)
+{
+	struct smc_sock *smc;
+
+	smc = (struct smc_sock *)((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->smc_pendings) >
+				sk->sk_max_ack_backlog)
+		goto drop;
+
+	if (sk_acceptq_is_full(&smc->sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		goto drop;
+	}
+
+	/* passthrough to origin syn recv sock fct */
+	return smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
+
+drop:
+	dst_release(dst);
+	tcp_listendrop(sk);
+	return NULL;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -1595,6 +1623,9 @@ static void smc_listen_out(struct smc_sock *new_smc)
 	struct smc_sock *lsmc = new_smc->listen_smc;
 	struct sock *newsmcsk = &new_smc->sk;
 
+	if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+		atomic_dec(&lsmc->smc_pendings);
+
 	if (lsmc->sk.sk_state == SMC_LISTEN) {
 		lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
 		smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -2200,6 +2231,9 @@ static void smc_tcp_listen_work(struct work_struct *work)
 		if (!new_smc)
 			continue;
 
+		if (tcp_sk(new_smc->clcsock->sk)->syn_smc)
+			atomic_inc(&lsmc->smc_pendings);
+
 		new_smc->listen_smc = lsmc;
 		new_smc->use_fallback = lsmc->use_fallback;
 		new_smc->fallback_rsn = lsmc->fallback_rsn;
@@ -2266,6 +2300,15 @@ static int smc_listen(struct socket *sock, int backlog)
 	smc->clcsock->sk->sk_data_ready = smc_clcsock_data_ready;
 	smc->clcsock->sk->sk_user_data =
 		(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
+
+	/* save origin ops */
+	smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
+
+	smc->af_ops = *smc->ori_af_ops;
+	smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
+
+	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 37b2001..5e5e38d 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -252,6 +252,10 @@ struct smc_sock {				/* smc sock container */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */
+	atomic_t		smc_pendings;	/* pending smc connections */
+	struct inet_connection_sock_af_ops		af_ops;
+	const struct inet_connection_sock_af_ops	*ori_af_ops;
+						/* origin af ops */
 	int			sockopt_defer_accept;
 						/* sockopt TCP_DEFER_ACCEPT
 						 * value

From patchwork Tue Feb 8 12:53:11 2022
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 12738759
From: "D. Wythe"
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, "D. Wythe"
Subject: [PATCH net-next v5 3/5] net/smc: Fallback when handshake workqueue congested
Date: Tue, 8 Feb 2022 20:53:11 +0800
Message-Id: <82a82bc35f0eab8962fe2e6fec801fc5803e07f3.1644323503.git.alibuda@linux.alibaba.com>

From: "D. Wythe"

This patch provides a mechanism for automatic fallback to TCP based on
the pressure on the SMC handshake process.
At present, a high rate of incoming connections causes them to back up
in the SMC handshake queue, which increases connection establishment
time. This is unacceptable for applications that rely on short-lived
connections.

There are two ways to implement this mechanism:

1. Fall back after the TCP connection is established.

2. Fall back before the TCP connection is established.

With the first approach, we need to wait for and receive the CLC
messages that the client may send, and then actively reply with a
decline message. In a sense this is also a kind of SMC handshake, and it
likewise increases connection establishment time.

With the second approach, the only problem is that we need to inject SMC
logic into TCP at the point where it is about to reply to the incoming
SYN. Since we already do that, it is no longer a problem. The advantage
is obvious: few additional steps are required to complete the fallback.

This patch uses the second approach.

Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c |  3 ++-
 net/smc/af_smc.c     | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 78b91bb..1c4ae5d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -394,6 +394,7 @@ struct tcp_sock {
 	bool	is_mptcp;
 #endif
 #if IS_ENABLED(CONFIG_SMC)
+	bool	(*smc_in_limited)(const struct sock *sk);
 	bool	syn_smc;	/* SYN includes SMC */
 #endif
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af94a6d..e817ec6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6703,7 +6703,8 @@ static void tcp_openreq_init(struct request_sock *req,
 	ireq->ir_num = ntohs(tcp_hdr(skb)->dest);
 	ireq->ir_mark = inet_request_mark(sk, skb);
 #if IS_ENABLED(CONFIG_SMC)
-	ireq->smc_ok = rx_opt->smc_ok;
+	ireq->smc_ok = rx_opt->smc_ok && !(tcp_sk(sk)->smc_in_limited &&
+			tcp_sk(sk)->smc_in_limited(sk));
 #endif
 }
 
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ebfce3d..8175f60 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -101,6 +101,22 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff
 	return NULL;
 }
 
+static bool smc_is_in_limited(const struct sock *sk)
+{
+	const struct smc_sock *smc;
+
+	smc = (const struct smc_sock *)
+		((uintptr_t)sk->sk_user_data & ~SK_USER_DATA_NOCOPY);
+
+	if (!smc)
+		return true;
+
+	if (workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq))
+		return true;
+
+	return false;
+}
+
 static struct smc_hashinfo smc_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock),
 };
@@ -2309,6 +2325,8 @@ static int smc_listen(struct socket *sock, int backlog)
 
 	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
 
+	tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
 		smc->clcsock->sk->sk_data_ready = smc->clcsk_data_ready;

From patchwork Tue Feb 8 12:53:12 2022
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 12738756
From: "D. Wythe"
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, "D. Wythe"
Subject: [PATCH net-next v5 4/5] net/smc: Dynamic control auto fallback by socket options
Date: Tue, 8 Feb 2022 20:53:12 +0800
Message-Id: <20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com>

From: "D. Wythe"

This patch adds dynamic control of SMC auto fallback. Since there is no
socket option level for SMC yet, this patch has to introduce one as
well.

This patch does the following:

- add a new socket option level: SOL_SMC.
- add a new SMC socket option: SMC_AUTO_FALLBACK.
- provide a getter and setter for SMC socket options.

Signed-off-by: D. Wythe
---
 include/linux/socket.h   |  1 +
 include/uapi/linux/smc.h |  4 +++
 net/smc/af_smc.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc.h            |  1 +
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d8..6f85f5d 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -366,6 +366,7 @@ struct ucred {
 #define SOL_XDP		283
 #define SOL_MPTCP	284
 #define SOL_MCTP	285
+#define SOL_SMC		286
 
 /* IPX options */
 #define IPX_TYPE	1
diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 6c2874f..9f2cbf8 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -284,4 +284,8 @@ enum {
 	__SMC_NLA_SEID_TABLE_MAX,
 	SMC_NLA_SEID_TABLE_MAX = __SMC_NLA_SEID_TABLE_MAX - 1
 };
+
+/* SMC socket options */
+#define SMC_AUTO_FALLBACK	1	/* allow auto fallback to TCP */
+
 #endif /* _UAPI_LINUX_SMC_H */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8175f60..c313561 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2325,7 +2325,8 @@ static int smc_listen(struct socket *sock, int backlog)
 
 	inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
 
-	tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
+	if (smc->auto_fallback)
+		tcp_sk(smc->clcsock->sk)->smc_in_limited = smc_is_in_limited;
 
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc) {
@@ -2620,6 +2621,67 @@ static int smc_shutdown(struct socket *sock, int how)
 	return rc ? rc : rc1;
 }
 
+static int __smc_getsockopt(struct socket *sock, int level, int optname,
+			    char __user *optval, int __user *optlen)
+{
+	struct smc_sock *smc;
+	int val, len;
+
+	smc = smc_sk(sock->sk);
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	len = min_t(int, len, sizeof(int));
+
+	if (len < 0)
+		return -EINVAL;
+
+	switch (optname) {
+	case SMC_AUTO_FALLBACK:
+		val = smc->auto_fallback;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, &val, len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int __smc_setsockopt(struct socket *sock, int level, int optname,
+			    sockptr_t optval, unsigned int optlen)
+{
+	struct sock *sk = sock->sk;
+	struct smc_sock *smc;
+	int val, rc;
+
+	smc = smc_sk(sk);
+
+	lock_sock(sk);
+	switch (optname) {
+	case SMC_AUTO_FALLBACK:
+		if (optlen < sizeof(int))
+			return -EINVAL;
+		if (copy_from_sockptr(&val, optval, sizeof(int)))
+			return -EFAULT;
+
+		smc->auto_fallback = !!val;
+		rc = 0;
+		break;
+	default:
+		rc = -EOPNOTSUPP;
+		break;
+	}
+	release_sock(sk);
+
+	return rc;
+}
+
 static int smc_setsockopt(struct socket *sock, int level, int optname,
 			  sockptr_t optval, unsigned int optlen)
 {
@@ -2629,6 +2691,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 
 	if (level == SOL_TCP && optname == TCP_ULP)
 		return -EOPNOTSUPP;
+	else if (level == SOL_SMC)
+		return __smc_setsockopt(sock, level, optname, optval, optlen);
 
 	smc = smc_sk(sk);
 
@@ -2711,6 +2775,9 @@ static int smc_getsockopt(struct socket *sock, int level, int optname,
 	struct smc_sock *smc;
 	int rc;
 
+	if (level == SOL_SMC)
+		return __smc_getsockopt(sock, level, optname, optval, optlen);
+
 	smc = smc_sk(sock->sk);
 	mutex_lock(&smc->clcsock_release_lock);
 	if (!smc->clcsock) {
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 5e5e38d..a0bdf75 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -249,6 +249,7 @@ struct smc_sock {				/* smc sock container */
 	struct work_struct	smc_listen_work;/* prepare new accept socket */
 	struct list_head	accept_q;	/* sockets to be accepted */
 	spinlock_t		accept_q_lock;	/* protects accept_q */
+	bool			auto_fallback;	/* auto fallback to tcp */
 	bool			use_fallback;	/* fallback to tcp */
 	int			fallback_rsn;	/* reason for fallback */
 	u32			peer_diagnosis; /* decline reason from peer */

From patchwork Tue Feb 8 12:53:13 2022
X-Patchwork-Submitter: "D. Wythe"
X-Patchwork-Id: 12738754
From: "D. Wythe"
To: kgraul@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, "D. Wythe"
Subject: [PATCH net-next v5 5/5] net/smc: Add global configure for auto fallback by netlink
Date: Tue, 8 Feb 2022 20:53:13 +0800

From: "D. Wythe"

Although SMC auto fallback can be controlled through socket options,
applications that need it must modify their code, which is troublesome
for many existing applications. This patch makes the global default
value of auto fallback configurable through netlink, so that auto
fallback can be enabled without modifying application code.

Suggested-by: Tony Lu
Signed-off-by: D. Wythe
---
 include/uapi/linux/smc.h |  3 +++
 net/smc/af_smc.c         | 17 +++++++++++++++++
 net/smc/smc.h            |  7 +++++++
 net/smc/smc_core.c       |  2 ++
 net/smc/smc_netlink.c    | 10 ++++++++++
 5 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index 9f2cbf8..33f7fb8 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -59,6 +59,8 @@ enum {
 	SMC_NETLINK_DUMP_SEID,
 	SMC_NETLINK_ENABLE_SEID,
 	SMC_NETLINK_DISABLE_SEID,
+	SMC_NETLINK_ENABLE_AUTO_FALLBACK,
+	SMC_NETLINK_DISABLE_AUTO_FALLBACK,
 };
 
 /* SMC_GENL_FAMILY top level attributes */
@@ -85,6 +87,7 @@ enum {
 	SMC_NLA_SYS_LOCAL_HOST,		/* string */
 	SMC_NLA_SYS_SEID,		/* string */
 	SMC_NLA_SYS_IS_SMCR_V2,		/* u8 */
+	SMC_NLA_SYS_AUTO_FALLBACK,	/* u8 */
 	__SMC_NLA_SYS_MAX,
 	SMC_NLA_SYS_MAX = __SMC_NLA_SYS_MAX - 1
 };
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index c313561..4a25ce7 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -59,6 +59,8 @@
 						 * creation on client
 						 */
 
+bool smc_auto_fallback;	/* default behavior for auto fallback, disable by default */
+
 static struct workqueue_struct	*smc_tcp_ls_wq;	/* wq for tcp listen work */
 struct workqueue_struct	*smc_hs_wq;	/* wq for handshake work */
 struct workqueue_struct	*smc_close_wq;	/* wq for close work */
@@ -66,6 +68,18 @@
 static void smc_tcp_listen_work(struct work_struct *);
 static void smc_connect_work(struct work_struct *);
 
+int smc_enable_auto_fallback(struct sk_buff *skb, struct genl_info *info)
+{
+	WRITE_ONCE(smc_auto_fallback, true);
+	return 0;
+}
+
+int smc_disable_auto_fallback(struct sk_buff *skb, struct genl_info *info)
+{
+	WRITE_ONCE(smc_auto_fallback, false);
+	return 0;
+}
+
 static void smc_set_keepalive(struct sock *sk, int val)
 {
 	struct smc_sock *smc = smc_sk(sk);
@@ -3006,6 +3020,9 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 	smc->use_fallback = false; /* assume rdma capability first */
 	smc->fallback_rsn = 0;
 
+	/* default behavior from smc_auto_fallback */
+	smc->auto_fallback = READ_ONCE(smc_auto_fallback);
+
 	rc = 0;
 	if (!clcsock) {
 		rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
diff --git a/net/smc/smc.h b/net/smc/smc.h
index a0bdf75..ac75fe8 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -14,6 +14,7 @@
 #include
 #include
 #include /* __aligned */
+#include
 #include
 
 #include "smc_ib.h"
@@ -336,4 +337,10 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
 		       struct smc_gidlist *gidlist,
 		       struct smc_ib_device *known_dev, u8 *known_gid);
 
+extern bool smc_auto_fallback;	/* default behavior for auto fallback */
+
+/* smc_auto_fallback setter for netlink */
+int smc_enable_auto_fallback(struct sk_buff *skb, struct genl_info *info);
+int smc_disable_auto_fallback(struct sk_buff *skb, struct genl_info *info);
+
 #endif	/* __SMC_H */
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 29525d0..cc9a398 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -248,6 +248,8 @@ int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
 		goto errattr;
 	if (nla_put_u8(skb, SMC_NLA_SYS_IS_SMCR_V2, true))
 		goto errattr;
+	if (nla_put_u8(skb, SMC_NLA_SYS_AUTO_FALLBACK, smc_auto_fallback))
+		goto errattr;
 	smc_clc_get_hostname(&host);
 	if (host) {
 		memcpy(hostname, host, SMC_MAX_HOSTNAME_LEN);
diff --git a/net/smc/smc_netlink.c b/net/smc/smc_netlink.c
index f13ab06..a7de517 100644
--- a/net/smc/smc_netlink.c
+++ b/net/smc/smc_netlink.c
@@ -111,6 +111,16 @@
 		.flags = GENL_ADMIN_PERM,
 		.doit = smc_nl_disable_seid,
 	},
+	{
+		.cmd = SMC_NETLINK_ENABLE_AUTO_FALLBACK,
+		.flags = GENL_ADMIN_PERM,
+		.doit = smc_enable_auto_fallback,
+	},
+	{
+		.cmd = SMC_NETLINK_DISABLE_AUTO_FALLBACK,
+		.flags = GENL_ADMIN_PERM,
+		.doit = smc_disable_auto_fallback,
+	},
 };
 
 static const struct nla_policy smc_gen_nl_policy[2] = {
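
Note for readers who want to try the per-socket control from patch 4/5:
the following is a minimal userspace sketch and is not part of the
series. It assumes the option level and name exactly as they appear in
this v5 posting (SOL_SMC = 286, SMC_AUTO_FALLBACK = 1). SOL_SMC is added
to the kernel-internal include/linux/socket.h, so userspace is assumed
to define it locally. The helper names smc_set_auto_fallback() and
smc_get_auto_fallback() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Values copied from this v5 series; defined locally on the assumption
 * that the installed userspace headers do not provide them.
 */
#ifndef SOL_SMC
#define SOL_SMC 286
#endif
#ifndef SMC_AUTO_FALLBACK
#define SMC_AUTO_FALLBACK 1
#endif

/* Hypothetical helper: enable (on != 0) or disable auto fallback on fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int smc_set_auto_fallback(int fd, int on)
{
	int val = !!on;

	return setsockopt(fd, SOL_SMC, SMC_AUTO_FALLBACK, &val, sizeof(val));
}

/* Hypothetical helper: read the current per-socket flag into *on.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int smc_get_auto_fallback(int fd, int *on)
{
	socklen_t len = sizeof(*on);

	return getsockopt(fd, SOL_SMC, SMC_AUTO_FALLBACK, on, &len);
}
```

A server would call smc_set_auto_fallback(fd, 1) on a socket created
with socket(AF_SMC, SOCK_STREAM, 0) before calling listen(). In this
version smc_listen() reads smc->auto_fallback when it installs the
smc_in_limited hook, so enabling the option after listen() does not
install the hook on that listener.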