From patchwork Wed Jun 5 12:56:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13686860 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB9C6188CDA; Wed, 5 Jun 2024 12:56:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592197; cv=none; b=B0wjMpaBOL11PnKbkW2TLQFg+Auy713Z5yXATpI6GjqmrUVnLhcUrmwi7XMe+CPWQdajUamziWQYfKshasTRXc+k3elJmLM3BbSiKIjv0o3jyaQBiFxXHQS+RfgupgvxOPeSO6jhOG57lOgmYL5uW6fa/92UwhkYrml0p8ErVdE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592197; c=relaxed/simple; bh=8P8ThXXY94tRkQLLw1kgLnY+kEtw1OLzhH8MHZ45XOU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=rHUm1YszIApZ+FxjXPRxMPbOZnLKp6zqWnwLNdUYR4gREjcZMgdKpW+oGZ3bIPb0Xb7ItItLm93GGMDE7RS0dpuTOgcAyEDZtz3I2KXjhzxk6N8fFgXD9wL3kazRANZay7xGXtRC5M3DKW9oEftdS4X+NUk3oTladcIK5adL3QI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=S5mljobM; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="S5mljobM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1717592186; h=From:To:Subject:Date:Message-Id; bh=8UdumLMUh1joJ53RhSTEIGmWSOK8pHVdfmqZ+JtJT34=; b=S5mljobMaXPfk/1T9KsMDQ2AG4lCZnl/yh6M+TpRTcC0pqx/4YnxUeFwhGsyzrIVO9BQt7mduhL/gvGtHW3mcDfbQNrglSPgb4YOWXoe/jRC4pRuibqPWoJIbosqGoZggAJOw4APb83lzfAtYyR7QEXqzkxm64zrXNvSxGdHpHs= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R711e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033037067110;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W7vCi8T_1717592185; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W7vCi8T_1717592185) by smtp.aliyun-inc.com; Wed, 05 Jun 2024 20:56:25 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock Date: Wed, 5 Jun 2024 20:56:18 +0800 Message-Id: <1717592180-66181-2-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> References: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch aims to isolate the shared components of SMC socket allocation by introducing smc_sk_init() for sock initialization and __smc_create_clcsk() for the initialization of clcsock. This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe Reviewed-by: Tony Lu Tested-by: Niklas Schnelle --- net/smc/af_smc.c | 86 +++++++++++++++++++++++++++++++------------------------- net/smc/smc.h | 5 ++++ 2 files changed, 53 insertions(+), 38 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index e50a286..77a9d58 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -361,25 +361,15 @@ static void smc_destruct(struct sock *sk) return; } -static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, - int protocol) +void smc_sk_init(struct net *net, struct sock *sk, int protocol) { - struct smc_sock *smc; - struct proto *prot; - struct sock *sk; - - prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto; - sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0); - if (!sk) - return NULL; + struct smc_sock *smc = smc_sk(sk); - sock_init_data(sock, sk); /* sets sk_refcnt to 1 */ sk->sk_state = SMC_INIT; sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem)); - smc = smc_sk(sk); INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); INIT_WORK(&smc->connect_work, smc_connect_work); INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work); @@ -389,6 +379,24 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, sk->sk_prot->hash(sk); mutex_init(&smc->clcsock_release_lock); smc_init_saved_callbacks(smc); + smc->limit_smc_hs = net->smc.limit_smc_hs; + smc->use_fallback = false; /* assume rdma capability first */ + smc->fallback_rsn = 0; +} + +static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, + int protocol) +{ + struct proto *prot; + struct sock *sk; + + prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto; + sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0); + if (!sk) + return NULL; + + sock_init_data(sock, sk); /* sets sk_refcnt to 1 */ + smc_sk_init(net, sk, protocol); return sk; } @@ -3321,6 +3329,31 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, .splice_read = smc_splice_read, }; +int smc_create_clcsk(struct net *net, struct sock *sk, int family) +{ + struct smc_sock *smc = smc_sk(sk); + int rc; + + rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, + &smc->clcsock); + if (rc) { + sk_common_release(sk); + return rc; + } + + /* smc_clcsock_release() does not wait smc->clcsock->sk's + * destruction; its sk_state might not be TCP_CLOSE after + * smc->sk is close()d, and TCP timers can be fired later, + * which need net ref. + */ + sk = smc->clcsock->sk; + __netns_tracker_free(net, &sk->ns_tracker, false); + sk->sk_net_refcnt = 1; + get_net_track(net, &sk->ns_tracker, GFP_KERNEL); + sock_inuse_add(net, 1); + return 0; +} + static int __smc_create(struct net *net, struct socket *sock, int protocol, int kern, struct socket *clcsock) { @@ -3346,35 +3379,12 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol, /* create internal TCP socket for CLC handshake and fallback */ smc = smc_sk(sk); - smc->use_fallback = false; /* assume rdma capability first */ - smc->fallback_rsn = 0; - - /* default behavior from limit_smc_hs in every net namespace */ - smc->limit_smc_hs = net->smc.limit_smc_hs; rc = 0; - if (!clcsock) { - rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, - &smc->clcsock); - if (rc) { - sk_common_release(sk); - goto out; - } - - /* smc_clcsock_release() does not wait smc->clcsock->sk's - * destruction; its sk_state might not be TCP_CLOSE after - * smc->sk is close()d, and TCP timers can be fired later, - * which need net ref. - */ - sk = smc->clcsock->sk; - __netns_tracker_free(net, &sk->ns_tracker, false); - sk->sk_net_refcnt = 1; - get_net_track(net, &sk->ns_tracker, GFP_KERNEL); - sock_inuse_add(net, 1); - } else { + if (clcsock) smc->clcsock = clcsock; - } - + else + rc = smc_create_clcsk(net, sk, family); out: return rc; } diff --git a/net/smc/smc.h b/net/smc/smc.h index 18c8b78..3edec1e 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -34,6 +34,11 @@ extern struct proto smc_proto; extern struct proto smc_proto6; +/* smc sock initialization */ +void smc_sk_init(struct net *net, struct sock *sk, int protocol); +/* clcsock initialization */ +int smc_create_clcsk(struct net *net, struct sock *sk, int family); + #ifdef ATOMIC64_INIT #define KERNEL_HAS_ATOMIC64 #endif From patchwork Wed Jun 5 12:56:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13686862 Received: from out30-99.freemail.mail.aliyun.com (out30-99.freemail.mail.aliyun.com [115.124.30.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F72D19412D; Wed, 5 Jun 2024 12:56:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.99 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592198; cv=none; b=DDpKI/vB7+m7PUicrQPW1WNJdkIuAoTWilZs7s9nKDeEOsNW1NeM80tJT5+eekI1WLwZS6aAIVAi572NiEm5nBp7r983mBFT9IAqPo7YFnu6XGn8FyafYET2JhnmtsBzV4ojM4hSuV8RXiHwBaReKLHbnkG1iWEdJTg3MTHqNHg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592198; c=relaxed/simple; bh=FSix8fbVytiAYRCjf3U5x+vB1uBinyX+qaThH7i8GyE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=kGlJvbX9qiuPhPfmevBCVhMFs6T3NoqbiG4zHA+MWGEZZZgbtmTneKkumHcrj61G5C5uxmD/EDt63h9J9gHc5ylYTQj/MJix+a2J7VzFQwnmuvOeMuVllFCOlwgCfw5laPPES9IIxrpxjcroVRKIdWcud9b2XeVJV09L9Qvypuc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=n/kyuZ4s; arc=none smtp.client-ip=115.124.30.99 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="n/kyuZ4s" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1717592186; h=From:To:Subject:Date:Message-Id; bh=WccimR8a0Usk8PrBUu0npp4F7atMe4rnyDy55Q4iTA8=; b=n/kyuZ4s4iHtXAH2to3EDghWHMKdHEVztaDeTPqH4nTZZdLfczR9B4r/skgXyAaaJvip1XO4OxH4kI7FvaoBDUgYzuS1kgjIVvj4JOQbQGxxv+zlXnzrchlowpA50vWt3WewrKb0+3KpnG6AqEkIKFKFfiD4w2xCWjySOxCHodM= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033068173054;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W7vCi8f_1717592186; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W7vCi8f_1717592186) by smtp.aliyun-inc.com; Wed, 05 Jun 2024 20:56:26 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net-next v6 2/3] net/smc: expose smc proto operations Date: Wed, 5 Jun 2024 20:56:19 +0800 Message-Id: <1717592180-66181-3-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> References: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" Externalize smc proto operations (smc_xxx) to allow access from files other than af_smc.c This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe Tested-by: Niklas Schnelle --- net/smc/af_smc.c | 60 ++++++++++++++++++++++++++++---------------------------- net/smc/smc.h | 33 +++++++++++++++++++++++++++++++ 2 files changed, 63 insertions(+), 30 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 77a9d58..8e3ce76 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -170,15 +170,15 @@ static bool smc_hs_congested(const struct sock *sk) return false; } -static struct smc_hashinfo smc_v4_hashinfo = { +struct smc_hashinfo smc_v4_hashinfo = { .lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock), }; -static struct smc_hashinfo smc_v6_hashinfo = { +struct smc_hashinfo smc_v6_hashinfo = { .lock = __RW_LOCK_UNLOCKED(smc_v6_hashinfo.lock), }; -static int smc_hash_sk(struct sock *sk) +int smc_hash_sk(struct sock *sk) { struct smc_hashinfo *h = sk->sk_prot->h.smc_hash; struct hlist_head *head; @@ -193,7 +193,7 @@ static int smc_hash_sk(struct sock *sk) return 0; } -static void smc_unhash_sk(struct sock *sk) +void smc_unhash_sk(struct sock *sk) { struct smc_hashinfo *h = sk->sk_prot->h.smc_hash; @@ -207,7 +207,7 @@ static void smc_unhash_sk(struct sock *sk) * work which we didn't do because of user hold the sock_lock in the * BH context */ -static void smc_release_cb(struct sock *sk) +void smc_release_cb(struct sock *sk) { struct smc_sock *smc = smc_sk(sk); @@ -307,7 +307,7 @@ static int __smc_release(struct smc_sock *smc) return rc; } -static int smc_release(struct socket *sock) +int smc_release(struct socket *sock) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -401,8 +401,8 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, return sk; } -static int smc_bind(struct socket *sock, struct sockaddr *uaddr, - int addr_len) +int smc_bind(struct socket *sock, struct sockaddr *uaddr, + int addr_len) { struct sockaddr_in *addr = (struct sockaddr_in *)uaddr; struct sock *sk = sock->sk; @@ -1649,8 +1649,8 @@ static void smc_connect_work(struct work_struct *work) release_sock(&smc->sk); } -static int smc_connect(struct socket *sock, struct sockaddr *addr, - int alen, int flags) +int smc_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -2631,7 +2631,7 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock) read_unlock_bh(&listen_clcsock->sk_callback_lock); } -static int smc_listen(struct socket *sock, int backlog) +int smc_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -2696,8 +2696,8 @@ static int smc_listen(struct socket *sock, int backlog) return rc; } -static int smc_accept(struct socket *sock, struct socket *new_sock, - struct proto_accept_arg *arg) +int smc_accept(struct socket *sock, struct socket *new_sock, + struct proto_accept_arg *arg) { struct sock *sk = sock->sk, *nsk; DECLARE_WAITQUEUE(wait, current); @@ -2766,8 +2766,8 @@ static int smc_accept(struct socket *sock, struct socket *new_sock, return rc; } -static int smc_getname(struct socket *sock, struct sockaddr *addr, - int peer) +int smc_getname(struct socket *sock, struct sockaddr *addr, + int peer) { struct smc_sock *smc; @@ -2780,7 +2780,7 @@ static int smc_getname(struct socket *sock, struct sockaddr *addr, return smc->clcsock->ops->getname(smc->clcsock, addr, peer); } -static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) +int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -2818,8 +2818,8 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) return rc; } -static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, - int flags) +int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -2868,8 +2868,8 @@ static __poll_t smc_accept_poll(struct sock *parent) return mask; } -static __poll_t smc_poll(struct file *file, struct socket *sock, - poll_table *wait) +__poll_t smc_poll(struct file *file, struct socket *sock, + poll_table *wait) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -2921,7 +2921,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, return mask; } -static int smc_shutdown(struct socket *sock, int how) +int smc_shutdown(struct socket *sock, int how) { struct sock *sk = sock->sk; bool do_shutdown = true; @@ -3061,8 +3061,8 @@ static int __smc_setsockopt(struct socket *sock, int level, int optname, return rc; } -static int smc_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int optlen) +int smc_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; struct smc_sock *smc; @@ -3148,8 +3148,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, return rc; } -static int smc_getsockopt(struct socket *sock, int level, int optname, - char __user *optval, int __user *optlen) +int smc_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *optlen) { struct smc_sock *smc; int rc; @@ -3174,8 +3174,8 @@ static int smc_getsockopt(struct socket *sock, int level, int optname, return rc; } -static int smc_ioctl(struct socket *sock, unsigned int cmd, - unsigned long arg) +int smc_ioctl(struct socket *sock, unsigned int cmd, + unsigned long arg) { union smc_host_cursor cons, urg; struct smc_connection *conn; @@ -3261,9 +3261,9 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd, * Note that subsequent recv() calls have to wait till all splice() processing * completed. */ -static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, - struct pipe_inode_info *pipe, size_t len, - unsigned int flags) +ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; diff --git a/net/smc/smc.h b/net/smc/smc.h index 3edec1e..34b781e 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -34,6 +34,39 @@ extern struct proto smc_proto; extern struct proto smc_proto6; +extern struct smc_hashinfo smc_v4_hashinfo; +extern struct smc_hashinfo smc_v6_hashinfo; + +int smc_hash_sk(struct sock *sk); +void smc_unhash_sk(struct sock *sk); +void smc_release_cb(struct sock *sk); + +int smc_release(struct socket *sock); +int smc_bind(struct socket *sock, struct sockaddr *uaddr, + int addr_len); +int smc_connect(struct socket *sock, struct sockaddr *addr, + int alen, int flags); +int smc_accept(struct socket *sock, struct socket *new_sock, + struct proto_accept_arg *arg); +int smc_getname(struct socket *sock, struct sockaddr *addr, + int peer); +__poll_t smc_poll(struct file *file, struct socket *sock, + poll_table *wait); +int smc_ioctl(struct socket *sock, unsigned int cmd, + unsigned long arg); +int smc_listen(struct socket *sock, int backlog); +int smc_shutdown(struct socket *sock, int how); +int smc_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen); +int smc_getsockopt(struct socket *sock, int level, int optname, + char __user *optval, int __user *optlen); +int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len); +int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags); +ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags); + /* smc sock initialization */ void smc_sk_init(struct net *net, struct sock *sk, int protocol); /* clcsock initialization */ From patchwork Wed Jun 5 12:56:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "D. Wythe" X-Patchwork-Id: 13686861 Received: from out30-113.freemail.mail.aliyun.com (out30-113.freemail.mail.aliyun.com [115.124.30.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6825E19309E; Wed, 5 Jun 2024 12:56:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592197; cv=none; b=KpCHLvqfnJNsorbCiXeux9wVfIwbOA/hYxS2Z5GeGTzE8RFCuW8VSyiHDEEVrfQ5RONfvWYm1V5pR3NmkZo4Dep0KpYbjfdfoKg4ol1GEUKdBD9QqZRUnbza1TyXWuXVlaTgmL2DhioaVP/ndSawpw2kqBwfCQttHfyhTXjEzCY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717592197; c=relaxed/simple; bh=qGM5ZUeTkKUKvYat0NvVfuWmfxXAzgAZS8f3k//PF0g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References; b=BwKV5te0RTrbTRUkmiKocp8NyVA4bBaOcQkL97H4tyXGUTF1x6HVvJmK/J1dGPVKCSjcaWnqC1u8KXM2ld9mB94MWp6CsNeomuWbHUAl3sqL71map22lXL8VRXBw2jZ4ehdiBvZI38zcnAxesHaN2OdlrOA17Gn7yhQSplzdsZ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=IdXFpUBn; arc=none smtp.client-ip=115.124.30.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="IdXFpUBn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1717592187; h=From:To:Subject:Date:Message-Id; bh=+D++LLxYuPWxjq7RYug+cI/d4/wX6eRFJ4wJchbjINk=; b=IdXFpUBntwzrkAA2kKAm7mZmfJEL1L4l3oF1/ciFnEszXEGTDNHmdV0kQrLM+0oUQ+qgLvBWOcUkLF/3Nx/iExVA5hXLLGoDf6lXBlyx++3L+mSlfxFKAJI/6cD7+N3AmMVFfrJEr5N9jX0j+CgYVn+8hx0wpMsEMbxuqBPDW6g= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R231e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033045046011;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=13;SR=0;TI=SMTPD_---0W7vCi8t_1717592186; Received: from j66a10360.sqa.eu95.tbsite.net(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0W7vCi8t_1717592186) by smtp.aliyun-inc.com; Wed, 05 Jun 2024 20:56:26 +0800 From: "D. Wythe" To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com, guwen@linux.alibaba.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, tonylu@linux.alibaba.com, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC Date: Wed, 5 Jun 2024 20:56:20 +0800 Message-Id: <1717592180-66181-4-git-send-email-alibuda@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> References: <1717592180-66181-1-git-send-email-alibuda@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: From: "D. Wythe" This patch allows to create smc socket via AF_INET, similar to the following code, /* create v4 smc sock */ v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); /* create v6 smc sock */ v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); There are several reasons why we believe it is appropriate here: 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6) address. There is no AF_SMC address at all. 2. Create smc socket in the AF_INET(6) path, which allows us to reuse the infrastructure of AF_INET(6) path, such as common ebpf hooks. Otherwise, smc have to implement it again in AF_SMC path. Signed-off-by: D. Wythe Tested-by: Niklas Schnelle --- include/uapi/linux/in.h | 2 + net/smc/Makefile | 2 +- net/smc/af_smc.c | 16 ++++- net/smc/smc_inet.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++ net/smc/smc_inet.h | 22 +++++++ 5 files changed, 208 insertions(+), 3 deletions(-) create mode 100644 net/smc/smc_inet.c create mode 100644 net/smc/smc_inet.h diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index e682ab6..0c6322b 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -83,6 +83,8 @@ enum { #define IPPROTO_RAW IPPROTO_RAW IPPROTO_MPTCP = 262, /* Multipath TCP connection */ #define IPPROTO_MPTCP IPPROTO_MPTCP + IPPROTO_SMC = 263, /* Shared Memory Communications */ +#define IPPROTO_SMC IPPROTO_SMC IPPROTO_MAX }; #endif diff --git a/net/smc/Makefile b/net/smc/Makefile index 2c510d54..60f1c87 100644 --- a/net/smc/Makefile +++ b/net/smc/Makefile @@ -4,6 +4,6 @@ obj-$(CONFIG_SMC) += smc.o obj-$(CONFIG_SMC_DIAG) += smc_diag.o smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o -smc-y += smc_tracepoint.o +smc-y += smc_tracepoint.o smc_inet.o smc-$(CONFIG_SYSCTL) += smc_sysctl.o smc-$(CONFIG_SMC_LO) += smc_loopback.o diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 8e3ce76..743c27e 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -54,6 +54,7 @@ #include "smc_tracepoint.h" #include "smc_sysctl.h" #include "smc_loopback.h" +#include "smc_inet.h" static DEFINE_MUTEX(smc_server_lgr_pending); /* serialize link group * creation on server @@ -3593,10 +3594,15 @@ static int __init smc_init(void) pr_err("%s: tcp_ulp_register fails with %d\n", __func__, rc); goto out_lo; } - + rc = smc_inet_init(); + if (rc) { + pr_err("%s: smc_inet_init fails with %d\n", __func__, rc); + goto out_ulp; + } static_branch_enable(&tcp_have_smc); return 0; - +out_ulp: + tcp_unregister_ulp(&smc_ulp_ops); out_lo: smc_loopback_exit(); out_ib: @@ -3633,6 +3639,7 @@ static int __init smc_init(void) static void __exit smc_exit(void) { static_branch_disable(&tcp_have_smc); + smc_inet_exit(); tcp_unregister_ulp(&smc_ulp_ops); sock_unregister(PF_SMC); smc_core_exit(); @@ -3660,4 +3667,9 @@ static void __exit smc_exit(void) MODULE_LICENSE("GPL"); MODULE_ALIAS_NETPROTO(PF_SMC); MODULE_ALIAS_TCP_ULP("smc"); +/* 263 for IPPROTO_SMC and 1 for SOCK_STREAM */ +MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 263, 1); +#if IS_ENABLED(CONFIG_IPV6) +MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 263, 1); +#endif /* CONFIG_IPV6 */ MODULE_ALIAS_GENL_FAMILY(SMC_GENL_FAMILY_NAME); diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c new file mode 100644 index 00000000..bca57ae --- /dev/null +++ b/net/smc/smc_inet.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Shared Memory Communications over RDMA (SMC-R) and RoCE + * + * Definitions for the IPPROTO_SMC (socket related) + * + * Copyright IBM Corp. 2016, 2018 + * Copyright (c) 2024, Alibaba Inc. + * + * Author: D. Wythe + */ + +#include +#include + +#include "smc_inet.h" +#include "smc.h" + +static struct proto smc_inet_prot; +static const struct proto_ops smc_inet_stream_ops; +static struct inet_protosw smc_inet_protosw; + +#if IS_ENABLED(CONFIG_IPV6) +static struct proto smc_inet6_prot; +static const struct proto_ops smc_inet6_stream_ops; +static struct inet_protosw smc_inet6_protosw; +#endif /* CONFIG_IPV6 */ + +static int smc_inet_init_sock(struct sock *sk); + +static struct proto smc_inet_prot = { + .name = "INET_SMC", + .owner = THIS_MODULE, + .init = smc_inet_init_sock, + .hash = smc_hash_sk, + .unhash = smc_unhash_sk, + .release_cb = smc_release_cb, + .obj_size = sizeof(struct smc_sock), + .h.smc_hash = &smc_v4_hashinfo, + .slab_flags = SLAB_TYPESAFE_BY_RCU, +}; + +static const struct proto_ops smc_inet_stream_ops = { + .family = PF_INET, + .owner = THIS_MODULE, + .release = smc_release, + .bind = smc_bind, + .connect = smc_connect, + .socketpair = sock_no_socketpair, + .accept = smc_accept, + .getname = smc_getname, + .poll = smc_poll, + .ioctl = smc_ioctl, + .listen = smc_listen, + .shutdown = smc_shutdown, + .setsockopt = smc_setsockopt, + .getsockopt = smc_getsockopt, + .sendmsg = smc_sendmsg, + .recvmsg = smc_recvmsg, + .mmap = sock_no_mmap, + .splice_read = smc_splice_read, +}; + +static struct inet_protosw smc_inet_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_SMC, + .prot = &smc_inet_prot, + .ops = &smc_inet_stream_ops, + .flags = INET_PROTOSW_ICSK, +}; + +#if IS_ENABLED(CONFIG_IPV6) +static struct proto smc_inet6_prot = { + .name = "INET6_SMC", + .owner = THIS_MODULE, + .init = smc_inet_init_sock, + .hash = smc_hash_sk, + .unhash = smc_unhash_sk, + .release_cb = smc_release_cb, + .obj_size = sizeof(struct smc_sock), + .h.smc_hash = &smc_v6_hashinfo, + .slab_flags = SLAB_TYPESAFE_BY_RCU, +}; + +static const struct proto_ops smc_inet6_stream_ops = { + .family = PF_INET6, + .owner = THIS_MODULE, + .release = smc_release, + .bind = smc_bind, + .connect = smc_connect, + .socketpair = sock_no_socketpair, + .accept = smc_accept, + .getname = smc_getname, + .poll = smc_poll, + .ioctl = smc_ioctl, + .listen = smc_listen, + .shutdown = smc_shutdown, + .setsockopt = smc_setsockopt, + .getsockopt = smc_getsockopt, + .sendmsg = smc_sendmsg, + .recvmsg = smc_recvmsg, + .mmap = sock_no_mmap, + .splice_read = smc_splice_read, +}; + +static struct inet_protosw smc_inet6_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_SMC, + .prot = &smc_inet6_prot, + .ops = &smc_inet6_stream_ops, + .flags = INET_PROTOSW_ICSK, +}; +#endif /* CONFIG_IPV6 */ + +static int smc_inet_init_sock(struct sock *sk) +{ + struct net *net = sock_net(sk); + + /* init common smc sock */ + smc_sk_init(net, sk, IPPROTO_SMC); + /* create clcsock */ + return smc_create_clcsk(net, sk, sk->sk_family); +} + +int __init smc_inet_init(void) +{ + int rc; + + rc = proto_register(&smc_inet_prot, 1); + if (rc) { + pr_err("%s: proto_register smc_inet_prot fails with %d\n", + __func__, rc); + return rc; + } + /* no return value */ + inet_register_protosw(&smc_inet_protosw); + +#if IS_ENABLED(CONFIG_IPV6) + rc = proto_register(&smc_inet6_prot, 1); + if (rc) { + pr_err("%s: proto_register smc_inet6_prot fails with %d\n", + __func__, rc); + goto out_inet6_prot; + } + rc = inet6_register_protosw(&smc_inet6_protosw); + if (rc) { + pr_err("%s: inet6_register_protosw smc_inet6_protosw fails with %d\n", + __func__, rc); + goto out_inet6_protosw; + } + return rc; +out_inet6_protosw: + proto_unregister(&smc_inet6_prot); +out_inet6_prot: + inet_unregister_protosw(&smc_inet_protosw); + proto_unregister(&smc_inet_prot); +#endif /* CONFIG_IPV6 */ + return rc; +} + +void smc_inet_exit(void) +{ +#if IS_ENABLED(CONFIG_IPV6) + inet6_unregister_protosw(&smc_inet6_protosw); + proto_unregister(&smc_inet6_prot); +#endif /* CONFIG_IPV6 */ + inet_unregister_protosw(&smc_inet_protosw); + proto_unregister(&smc_inet_prot); +} diff --git a/net/smc/smc_inet.h b/net/smc/smc_inet.h new file mode 100644 index 00000000..a489c8a --- /dev/null +++ b/net/smc/smc_inet.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Shared Memory Communications over RDMA (SMC-R) and RoCE + * + * Definitions for the IPPROTO_SMC (socket related) + + * Copyright IBM Corp. 2016 + * Copyright (c) 2024, Alibaba Inc. + * + * Author: D. Wythe + */ +#ifndef __INET_SMC +#define __INET_SMC + +/* Initialize protocol registration on IPPROTO_SMC, + * @return 0 on success + */ +int smc_inet_init(void); + +void smc_inet_exit(void); + +#endif /* __INET_SMC */