From patchwork Wed Aug 3 20:46:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935889 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FEFBC19F2D for ; Wed, 3 Aug 2022 20:48:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238598AbiHCUsR (ORCPT ); Wed, 3 Aug 2022 16:48:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52194 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238616AbiHCUsP (ORCPT ); Wed, 3 Aug 2022 16:48:15 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C73D65C964 for ; Wed, 3 Aug 2022 13:48:13 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273FHwPC015128 for ; Wed, 3 Aug 2022 13:48:13 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=KbNPiBvyINSqrjYHIHEGwfeuVoLr7livhup3VUtpPAg=; b=ScmlsFOcUluaOeAGEVy0HeJ6+PWjdytwCQiLVKM8B9h5nDsdzEuLYlrTsw59XPpuF76K KPtcAYy/3UvqZw9Ho9UFRww+Vn4Fl5RWTRbKEIqWO6kxW+OgqG3kvS5r6xjlPyFKr6QX it5hdoae5H4Ihy8oVZf7fwUQej8dl6DhpQI= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hq9d70ev0-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:48:13 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:48:12 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id EBD077A3F6EA; Wed, 3 Aug 2022 13:46:07 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 01/15] net: Add sk_setsockopt() to take the sk ptr instead of the sock ptr Date: Wed, 3 Aug 2022 13:46:07 -0700 Message-ID: <20220803204607.3076434-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: KsJ56aVwWdom7biIyRh_19hkPxcKnpLH X-Proofpoint-ORIG-GUID: KsJ56aVwWdom7biIyRh_19hkPxcKnpLH X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the sock_setsockopt() to avoid code duplication and code drift between the two duplicates. The current sock_setsockopt() takes sock ptr as the argument. The very first thing of this function is to get back the sk ptr by 'sk = sock->sk'. bpf_setsockopt() could be called when the sk does not have the sock ptr created. Meaning sk->sk_socket is NULL. For example, when a passive tcp connection has just been established but has yet been accept()-ed. Thus, it cannot use the sock_setsockopt(sk->sk_socket) or else it will pass a NULL ptr. This patch moves all sock_setsockopt implementation to the newly added sk_setsockopt(). The new sk_setsockopt() takes a sk ptr and immediately gets the sock ptr by 'sock = sk->sk_socket' The existing sock_setsockopt(sock) is changed to call sk_setsockopt(sock->sk). All existing callers have both sock->sk and sk->sk_socket pointer. The latter patch will make bpf_setsockopt(SOL_SOCKET) call sk_setsockopt(sk) directly. The bpf_setsockopt(SOL_SOCKET) does not use the optnames that require sk->sk_socket, so it will be safe. Signed-off-by: Martin KaFai Lau --- net/core/sock.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 4cb957d934a2..20269c37ab3b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1041,12 +1041,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes) * at the socket level. Everything here is generic. */ -int sock_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int optlen) +static int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct so_timestamping timestamping; + struct socket *sock = sk->sk_socket; struct sock_txtime sk_txtime; - struct sock *sk = sock->sk; int val; int valbool; struct linger ling; @@ -1499,6 +1499,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname, release_sock(sk); return ret; } + +int sock_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) +{ + return sk_setsockopt(sock->sk, level, optname, + optval, optlen); +} EXPORT_SYMBOL(sock_setsockopt); static const struct cred *sk_get_peer_cred(struct sock *sk) From patchwork Wed Aug 3 20:46:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935891 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92139C3F6B0 for ; Wed, 3 Aug 2022 20:49:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238699AbiHCUtR (ORCPT ); Wed, 3 Aug 2022 16:49:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238608AbiHCUtQ (ORCPT ); Wed, 3 Aug 2022 16:49:16 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38FE95C94E for ; Wed, 3 Aug 2022 13:49:15 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273FC40a025784 for ; Wed, 3 Aug 2022 13:49:15 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=VUZgVVhMIX53YGS4qTb5dAd/pH/+EGA3XiUK+r4AV0E=; b=THO264EIHWVa7OVh6ZwjNile+fWxqsXVbFxqufYHbxoR6fR/q1V96TGS7JDVdAt/kVp/ hj0SRcsjHLekbDOgUwLWsVSimA8yMnitrSF1CQyoQr/+satd2u1108NSr+vnPEPQOjbU xV84Y2BfMm4te613aFTISjmcU60uTc+aXy0= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hpy32mq6p-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:14 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:14 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 3EED07A3F71E; Wed, 3 Aug 2022 13:46:14 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 02/15] bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf Date: Wed, 3 Aug 2022 13:46:14 -0700 Message-ID: <20220803204614.3077284-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: tiqDtsaw-dXLPc7BZzWpeqcZ888e2pGt X-Proofpoint-GUID: tiqDtsaw-dXLPc7BZzWpeqcZ888e2pGt X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a in_bpf() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the in_bpf() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test in_bpf() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 8 ++++++++ include/net/sock.h | 3 +++ net/core/sock.c | 26 +++++++++++++++++++++++--- 3 files changed, 34 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 20c26aed7896..b905b1b34fe4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1966,6 +1966,10 @@ static inline bool unprivileged_ebpf_enabled(void) return !sysctl_unprivileged_bpf_disabled; } +static inline bool in_bpf(void) +{ + return !!current->bpf_ctx; +} #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -2175,6 +2179,10 @@ static inline bool unprivileged_ebpf_enabled(void) return false; } +static inline bool in_bpf(void) +{ + return false; +} #endif /* CONFIG_BPF_SYSCALL */ void __bpf_free_used_btfs(struct bpf_prog_aux *aux, diff --git a/include/net/sock.h b/include/net/sock.h index a7273b289188..b2ff230860c6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1721,6 +1721,9 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) } } +void sockopt_lock_sock(struct sock *sk); +void sockopt_release_sock(struct sock *sk); + /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming diff --git a/net/core/sock.c b/net/core/sock.c index 20269c37ab3b..82759540ae2c 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -703,7 +703,9 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen) goto out; } - return sock_bindtoindex(sk, index, true); + sockopt_lock_sock(sk); + ret = sock_bindtoindex_locked(sk, index); + sockopt_release_sock(sk); out: #endif @@ -1036,6 +1038,24 @@ static int sock_reserve_memory(struct sock *sk, int bytes) return 0; } +void sockopt_lock_sock(struct sock *sk) +{ + if (in_bpf()) + return; + + lock_sock(sk); +} +EXPORT_SYMBOL(sockopt_lock_sock); + +void sockopt_release_sock(struct sock *sk) +{ + if (in_bpf()) + return; + + release_sock(sk); +} +EXPORT_SYMBOL(sockopt_release_sock); + /* * This is meant for all protocols to use and covers goings on * at the socket level. Everything here is generic. @@ -1067,7 +1087,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, valbool = val ? 1 : 0; - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case SO_DEBUG: @@ -1496,7 +1516,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, ret = -ENOPROTOOPT; break; } - release_sock(sk); + sockopt_release_sock(sk); return ret; } From patchwork Wed Aug 3 20:46:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935894 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42FBFC19F2C for ; Wed, 3 Aug 2022 20:49:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238486AbiHCUt3 (ORCPT ); Wed, 3 Aug 2022 16:49:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238750AbiHCUt1 (ORCPT ); Wed, 3 Aug 2022 16:49:27 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E9AE15C962 for ; Wed, 3 Aug 2022 13:49:25 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 273EnGok001345 for ; Wed, 3 Aug 2022 13:49:25 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=mC+JAYl/FxMi5DwqUDHryY0tLe8yuBp32PcYiqeuywg=; b=qaxEcnh3Bh9BheprE0SmaOlxEL+Oi5fvHWgrqfmOF1TzLFUVz/yAr7w6uocL3gEFI2RA peZOwYtvNDRs2h2BvNIl5WxV8jnX/RzRkI+11lt3SZgcleYS7hYZyupLs8Y5QemzIKCW 2MIO8Ug92jfrWUsE4N/PGZnf0zbw0KzV9F8= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3hqbqx7cs7-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:25 -0700 Received: from twshared16418.24.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:22 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 87BB97A3F73D; Wed, 3 Aug 2022 13:46:20 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 03/15] bpf: net: Consider in_bpf() when testing capable() in sk_setsockopt() Date: Wed, 3 Aug 2022 13:46:20 -0700 Message-ID: <20220803204620.3077641-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: QI9Dt6hpAfmUeSHeW016t-AO5krsTVqy X-Proofpoint-GUID: QI9Dt6hpAfmUeSHeW016t-AO5krsTVqy X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit 8d650cdedaab ("tcp: fix tcp_set_congestion_control() use from bpf hook"), tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. This patch adds sockopt_ns_capable() and sockopt_capable() for the sk_setsockopt() to use. They will consider the in_bpf() before doing the ns_capable() and capable() test. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Suggested-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/sock.h | 2 ++ net/core/sock.c | 38 +++++++++++++++++++++++++------------- 2 files changed, 27 insertions(+), 13 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b2ff230860c6..72b78c2b6f83 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1723,6 +1723,8 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) void sockopt_lock_sock(struct sock *sk); void sockopt_release_sock(struct sock *sk); +bool sockopt_ns_capable(struct user_namespace *ns, int cap); +bool sockopt_capable(int cap); /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it diff --git a/net/core/sock.c b/net/core/sock.c index 82759540ae2c..2d88c06c27b7 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1056,6 +1056,18 @@ void sockopt_release_sock(struct sock *sk) } EXPORT_SYMBOL(sockopt_release_sock); +bool sockopt_ns_capable(struct user_namespace *ns, int cap) +{ + return in_bpf() || ns_capable(ns, cap); +} +EXPORT_SYMBOL(sockopt_ns_capable); + +bool sockopt_capable(int cap) +{ + return in_bpf() || capable(cap); +} +EXPORT_SYMBOL(sockopt_capable); + /* * This is meant for all protocols to use and covers goings on * at the socket level. Everything here is generic. @@ -1091,7 +1103,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, switch (optname) { case SO_DEBUG: - if (val && !capable(CAP_NET_ADMIN)) + if (val && !sockopt_capable(CAP_NET_ADMIN)) ret = -EACCES; else sock_valbool_flag(sk, SOCK_DBG, valbool); @@ -1135,7 +1147,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, break; case SO_SNDBUFFORCE: - if (!capable(CAP_NET_ADMIN)) { + if (!sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1157,7 +1169,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, break; case SO_RCVBUFFORCE: - if (!capable(CAP_NET_ADMIN)) { + if (!sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1184,8 +1196,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, case SO_PRIORITY: if ((val >= 0 && val <= 6) || - ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || - ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) sk->sk_priority = val; else ret = -EPERM; @@ -1330,8 +1342,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, clear_bit(SOCK_PASSSEC, &sock->flags); break; case SO_MARK: - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1339,8 +1351,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, __sock_set_mark(sk, val); break; case SO_RCVMARK: - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1374,7 +1386,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, #ifdef CONFIG_NET_RX_BUSY_POLL case SO_BUSY_POLL: /* allow unprivileged users to decrease the value */ - if ((val > sk->sk_ll_usec) && !capable(CAP_NET_ADMIN)) + if ((val > sk->sk_ll_usec) && !sockopt_capable(CAP_NET_ADMIN)) ret = -EPERM; else { if (val < 0) @@ -1384,13 +1396,13 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, } break; case SO_PREFER_BUSY_POLL: - if (valbool && !capable(CAP_NET_ADMIN)) + if (valbool && !sockopt_capable(CAP_NET_ADMIN)) ret = -EPERM; else WRITE_ONCE(sk->sk_prefer_busy_poll, valbool); break; case SO_BUSY_POLL_BUDGET: - if (val > READ_ONCE(sk->sk_busy_poll_budget) && !capable(CAP_NET_ADMIN)) { + if (val > READ_ONCE(sk->sk_busy_poll_budget) && !sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; } else { if (val < 0 || val > U16_MAX) @@ -1461,7 +1473,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, * scheduler has enough safe guards. */ if (sk_txtime.clockid != CLOCK_MONOTONIC && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } From patchwork Wed Aug 3 20:46:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935892 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC95AC19F2B for ; Wed, 3 Aug 2022 20:49:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238755AbiHCUtV (ORCPT ); Wed, 3 Aug 2022 16:49:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53232 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238751AbiHCUtU (ORCPT ); Wed, 3 Aug 2022 16:49:20 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45CE05C95C for ; Wed, 3 Aug 2022 13:49:19 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 273JhO0T032110 for ; Wed, 3 Aug 2022 13:49:18 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=irPemIS8khOKJfzIOv+jj6pgXo9a6s84I7Qf+nMKSdo=; b=VNDrI8+Hn8KswlUhTC0dgsCGgXEe1vbVGd1eB4VTX+Uh5xjROXENKLHtQdLaxFhRjOZX BiLeC3X++OYjzWArXdyx+3eFalIl3amJGQ+zE2aYb9YNavmKku8YUR+gGZd/7SFf2HM+ seHlm0wWxORXJ8zMCseEv7URgfZ2BKKCGVQ= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3hq51d2vdv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:18 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:17 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id CF6FA7A3F74E; Wed, 3 Aug 2022 13:46:26 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 04/15] bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and capable() Date: Wed, 3 Aug 2022 13:46:26 -0700 Message-ID: <20220803204626.3078012-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: NGBjOtC2YVqmJEZW0ivRP0X2qFpQbwFA X-Proofpoint-ORIG-GUID: NGBjOtC2YVqmJEZW0ivRP0X2qFpQbwFA X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_tcp_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Signed-off-by: Martin KaFai Lau --- net/ipv4/tcp.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 970e9a2cca4a..cfed84b1883f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3202,7 +3202,7 @@ EXPORT_SYMBOL(tcp_disconnect); static inline bool tcp_can_repair_sock(const struct sock *sk) { - return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) && + return sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) && (sk->sk_state != TCP_LISTEN); } @@ -3502,11 +3502,11 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); + sockopt_lock_sock(sk); err = tcp_set_congestion_control(sk, name, true, - ns_capable(sock_net(sk)->user_ns, - CAP_NET_ADMIN)); - release_sock(sk); + sockopt_ns_capable(sock_net(sk)->user_ns, + CAP_NET_ADMIN)); + sockopt_release_sock(sk); return err; } case TCP_ULP: { @@ -3522,9 +3522,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); + sockopt_lock_sock(sk); err = tcp_set_ulp(sk, name); - release_sock(sk); + sockopt_release_sock(sk); return err; } case TCP_FASTOPEN_KEY: { @@ -3557,7 +3557,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case TCP_MAXSEG: @@ -3779,7 +3779,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + sockopt_release_sock(sk); return err; } From patchwork Wed Aug 3 20:46:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935893 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC72FC19F2B for ; Wed, 3 Aug 2022 20:49:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237819AbiHCUt2 (ORCPT ); Wed, 3 Aug 2022 16:49:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238486AbiHCUt1 (ORCPT ); Wed, 3 Aug 2022 16:49:27 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1904B5C970 for ; Wed, 3 Aug 2022 13:49:25 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273Fdw4J016530 for ; Wed, 3 Aug 2022 13:49:24 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=7ZB1FOytkLEwnhffE8QpnA+JIiI8bQIu+b6KKDyktPM=; b=YYc50ML6za7+TUcHWqwLi0U3HLZ0OX8lKAM+dic4HOok81muN1maUX/UNrMWcSbAKva7 Z4YQVXkNhLnffg0ziJSoH4Np0K1hdaH8a1yw/QWJOyieRsu4SjjOMI0+ywcTvl6ZeYkv ZGoHEaaskrx//9/wWDMQjHhjzxkKZU8by2Q= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hqfeb6469-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:24 -0700 Received: from twshared16418.24.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:22 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 2AAB97A3F762; Wed, 3 Aug 2022 13:46:33 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 05/15] bpf: net: Change do_ip_setsockopt() to use the sockopt's lock_sock() and capable() Date: Wed, 3 Aug 2022 13:46:33 -0700 Message-ID: <20220803204633.3078226-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: bfhVAsDq8uxX9pfE9s8gvlaNwWALmmVi X-Proofpoint-GUID: bfhVAsDq8uxX9pfE9s8gvlaNwWALmmVi X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ip_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Signed-off-by: Martin KaFai Lau --- net/ipv4/ip_sockglue.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a8a323ecbb54..a3c496580e6b 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -944,7 +944,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = 0; if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case IP_OPTIONS: @@ -1333,14 +1333,14 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, case IP_IPSEC_POLICY: case IP_XFRM_POLICY: err = -EPERM; - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) break; err = xfrm_user_policy(sk, optname, optval, optlen); break; case IP_TRANSPARENT: - if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!!val && !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { err = -EPERM; break; } @@ -1368,13 +1368,13 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = -ENOPROTOOPT; break; } - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return err; e_inval: - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Aug 3 20:46:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935895 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC9A8C19F2B for ; Wed, 3 Aug 2022 20:49:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238757AbiHCUtc (ORCPT ); Wed, 3 Aug 2022 16:49:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238752AbiHCUtb (ORCPT ); Wed, 3 Aug 2022 16:49:31 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07F1B5C964 for ; Wed, 3 Aug 2022 13:49:27 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273KnOIA017912 for ; Wed, 3 Aug 2022 13:49:27 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=yRFLwGqk94NaXr4KoX+QqXm4ZYNDFsmpZbu64kueqZ4=; b=cqarsKINgjWukgAmJkohWwwGOFJjln4mBM3wf/208f1p1MQTiS8KGaW4DB4Fqvf+Cbwy Xxl3BKtX3zHeoZ4My9XtQTUwYffUhMqHSI24S9Nzp19vvPA6c3sHW66FP59Z05oY2wej CQweViPCEMPqHZHh7sB2eeW2r47x2kxJSGA= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hq4b7u51x-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:27 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:25 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 7E1D67A3F777; Wed, 3 Aug 2022 13:46:39 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 06/15] bpf: net: Change do_ipv6_setsockopt() to use the sockopt's lock_sock() and capable() Date: Wed, 3 Aug 2022 13:46:39 -0700 Message-ID: <20220803204639.3078356-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: VqM7e1Zg1xWeV4QRDnM1cBYpV1lvVH-J X-Proofpoint-GUID: VqM7e1Zg1xWeV4QRDnM1cBYpV1lvVH-J X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ipv6_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Signed-off-by: Martin KaFai Lau --- net/ipv6/ipv6_sockglue.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 222f6bf220ba..d23de48ff612 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -327,7 +327,7 @@ static int ipv6_set_opt_hdr(struct sock *sk, int optname, sockptr_t optval, int err; /* hop-by-hop / destination options are privileged option */ - if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) + if (optname != IPV6_RTHDR && !sockopt_ns_capable(net->user_ns, CAP_NET_RAW)) return -EPERM; /* remove any sticky options header with a zero option @@ -417,7 +417,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { @@ -634,8 +634,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; case IPV6_TRANSPARENT: - if (valbool && !ns_capable(net->user_ns, CAP_NET_RAW) && - !ns_capable(net->user_ns, CAP_NET_ADMIN)) { + if (valbool && !sockopt_ns_capable(net->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(net->user_ns, CAP_NET_ADMIN)) { retv = -EPERM; break; } @@ -946,7 +946,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, case IPV6_IPSEC_POLICY: case IPV6_XFRM_POLICY: retv = -EPERM; - if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + if (!sockopt_ns_capable(net->user_ns, CAP_NET_ADMIN)) break; retv = xfrm_user_policy(sk, optname, optval, optlen); break; @@ -994,14 +994,14 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return retv; e_inval: - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Aug 3 20:46:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935898 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88F4EC19F2B for ; Wed, 3 Aug 2022 20:49:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238750AbiHCUtd (ORCPT ); Wed, 3 Aug 2022 16:49:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238751AbiHCUtb (ORCPT ); Wed, 3 Aug 2022 16:49:31 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3C1E45C97B for ; Wed, 3 Aug 2022 13:49:29 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273G6TM3019609 for ; Wed, 3 Aug 2022 13:49:28 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=jB+A92tTdd2e2Rofg78CH38HLFXpUmHdV5K7L7aTBkw=; b=h/8al6uytRvYG7gQPk3QHi6GwJ6+PkT9FXSa8bAmPASf18L1VtuRuvVp2Us4hGvTLytd kjQT7weeBUuic/7fUjj8w3fbBHzvwFpDvSlX491HtG216uNsYvDrN2B5Y72M2xZgbNq1 1j4o9MbuVZTtO+JV2KA3lQNkPI8biRbHSc0= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hqfeb646j-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:28 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:26 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id C765B7A3F78B; Wed, 3 Aug 2022 13:46:45 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 07/15] bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog() Date: Wed, 3 Aug 2022 13:46:45 -0700 Message-ID: <20220803204645.3079236-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 7efyKlMyS9qd5st1mm32jr8JdrM8BqBJ X-Proofpoint-GUID: 7efyKlMyS9qd5st1mm32jr8JdrM8BqBJ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The bpf-iter-prog for tcp and unix sk can do bpf_setsockopt() which needs in_bpf() to decide if it is called by a bpf prog. This patch initializes the bpf_run_ctx in bpf_iter_run_prog() for the in_bpf() to use. Signed-off-by: Martin KaFai Lau --- kernel/bpf/bpf_iter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 7e8fd49406f6..5d3a982eb553 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -681,19 +681,24 @@ struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop) int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx) { + struct bpf_run_ctx run_ctx, *old_run_ctx; int ret; if (prog->aux->sleepable) { rcu_read_lock_trace(); migrate_disable(); might_fault(); + old_run_ctx = bpf_set_run_ctx(&run_ctx); ret = bpf_prog_run(prog, ctx); + bpf_reset_run_ctx(old_run_ctx); migrate_enable(); rcu_read_unlock_trace(); } else { rcu_read_lock(); migrate_disable(); + old_run_ctx = bpf_set_run_ctx(&run_ctx); ret = bpf_prog_run(prog, ctx); + bpf_reset_run_ctx(old_run_ctx); migrate_enable(); rcu_read_unlock(); } From patchwork Wed Aug 3 20:46:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935899 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1756C19F2D for ; Wed, 3 Aug 2022 20:49:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238774AbiHCUte (ORCPT ); Wed, 3 Aug 2022 16:49:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238767AbiHCUtd (ORCPT ); Wed, 3 Aug 2022 16:49:33 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E42495C9C1 for ; Wed, 3 Aug 2022 13:49:29 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273EMN1P008099 for ; Wed, 3 Aug 2022 13:49:29 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=oBzZsS9YU5IlPp/uz4kK4qHUrMDYMlhpxpusLRYRTog=; b=lusEjG6xwi3od1flVh7VxxbqSxGAPgamtNum+B1zFj61ASbu0J/VBdO0nIuOtDrNTANJ ylKuOer1U6ovVh8gv58Fl+eMKKTwN8HlRZ14LLxaaj6zkh4JhKiwg4klMwiQt8uXVS2W Qq45cwTDsI8bqZHP+IXUrdojiDeRY0I+YIs= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hqtqhaumm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:28 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:28 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 1E8227A3F7A5; Wed, 3 Aug 2022 13:46:52 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 08/15] bpf: Embed kernel CONFIG check into the if statement in bpf_setsockopt Date: Wed, 3 Aug 2022 13:46:52 -0700 Message-ID: <20220803204652.3080296-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: zQlueORZD959qrLOKrwjubxjSnoA1Wlm X-Proofpoint-ORIG-GUID: zQlueORZD959qrLOKrwjubxjSnoA1Wlm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch moves the "#ifdef CONFIG_XXX" check into the "if/else" statement itself. The change is done for the bpf_setsockopt() function only. It will make the latter patches easier to follow without the surrounding ifdef macro. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 5669248aff25..01cb4a01b1c1 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5113,8 +5113,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#ifdef CONFIG_INET - } else if (level == SOL_IP) { + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; @@ -5135,8 +5134,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#if IS_ENABLED(CONFIG_IPV6) - } else if (level == SOL_IPV6) { + } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; @@ -5157,8 +5155,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#endif - } else if (level == SOL_TCP && + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5250,7 +5247,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, ret = -EINVAL; } } -#endif } else { ret = -EINVAL; } From patchwork Wed Aug 3 20:46:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935896 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EB0CC19F2D for ; Wed, 3 Aug 2022 20:49:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238758AbiHCUtc (ORCPT ); Wed, 3 Aug 2022 16:49:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238756AbiHCUtb (ORCPT ); Wed, 3 Aug 2022 16:49:31 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AE715C969 for ; Wed, 3 Aug 2022 13:49:28 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273F77f9013301 for ; Wed, 3 Aug 2022 13:49:28 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=wFguDFVefju9nXZfLjygUUYSjTZJSDSTl+ch5Kym9+s=; b=YEOFe77St0zr+HC8zzssXOhGgd+wmdq1AfFa3Np4LZDERre5Uzh/egA92A2YCBRlPGeg aVcmUMEqCMKqJW/S/anCuELM5QvmVLMZKxqQRqERRsXg2yR8qdnWGe5mRxQfNqCqWJOd NSlHY67fIQGeyQnGywAkQEvl9YMyLErsugc= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hq2bq3quq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:28 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:27 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 67CDD7A3F7CE; Wed, 3 Aug 2022 13:46:58 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 09/15] bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt() Date: Wed, 3 Aug 2022 13:46:58 -0700 Message-ID: <20220803204658.3080795-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: FpsvTmG7B46h3-bKFqpp-ugBDGvJgdmf X-Proofpoint-GUID: FpsvTmG7B46h3-bKFqpp-ugBDGvJgdmf X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sk_setsockopt(). The only exception is SO_RCVLOWAT. The bpf_setsockopt() does not always have the sock ptr (sk->sk_socket) and sock->ops->set_rcvlowat is needed in sk_setsockopt. tcp_set_rcvlowat is the only implementation for set_rcvlowat. bpf_setsockopt() needs one special handling for SO_RCVLOWAT for tcp_sock, so leave SO_RCVLOWAT in bpf_setsockopt() for now. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Signed-off-by: Martin KaFai Lau --- include/net/sock.h | 2 + net/core/filter.c | 130 +++++++++++++-------------------------------- net/core/sock.c | 4 +- 3 files changed, 42 insertions(+), 94 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 72b78c2b6f83..b7e159f9d7bf 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1800,6 +1800,8 @@ void sock_pfree(struct sk_buff *skb); #define sock_edemux sock_efree #endif +int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); diff --git a/net/core/filter.c b/net/core/filter.c index 01cb4a01b1c1..d1f9f8360f60 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5013,106 +5013,52 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +static int sol_socket_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + switch (optname) { + case SO_SNDBUF: + case SO_RCVBUF: + case SO_KEEPALIVE: + case SO_PRIORITY: + case SO_REUSEPORT: + case SO_RCVLOWAT: + case SO_MARK: + case SO_MAX_PACING_RATE: + case SO_BINDTOIFINDEX: + case SO_TXREHASH: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case SO_BINDTODEVICE: + break; + default: + return -EINVAL; + } + + if (optname == SO_RCVLOWAT) { + int val = *(int *)optval; + + if (val < 0) + val = INT_MAX; + WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); + return 0; + } + + return sk_setsockopt(sk, SOL_SOCKET, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - char devname[IFNAMSIZ]; - int val, valbool; - struct net *net; - int ifindex; - int ret = 0; + int val, ret = 0; if (!sk_fullsock(sk)) return -EINVAL; if (level == SOL_SOCKET) { - if (optlen != sizeof(int) && optname != SO_BINDTODEVICE) - return -EINVAL; - val = *((int *)optval); - valbool = val ? 1 : 0; - - /* Only some socketops are supported */ - switch (optname) { - case SO_RCVBUF: - val = min_t(u32, val, sysctl_rmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_RCVBUF_LOCK; - WRITE_ONCE(sk->sk_rcvbuf, - max_t(int, val * 2, SOCK_MIN_RCVBUF)); - break; - case SO_SNDBUF: - val = min_t(u32, val, sysctl_wmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_SNDBUF_LOCK; - WRITE_ONCE(sk->sk_sndbuf, - max_t(int, val * 2, SOCK_MIN_SNDBUF)); - break; - case SO_MAX_PACING_RATE: /* 32bit version */ - if (val != ~0U) - cmpxchg(&sk->sk_pacing_status, - SK_PACING_NONE, - SK_PACING_NEEDED); - sk->sk_max_pacing_rate = (val == ~0U) ? - ~0UL : (unsigned int)val; - sk->sk_pacing_rate = min(sk->sk_pacing_rate, - sk->sk_max_pacing_rate); - break; - case SO_PRIORITY: - sk->sk_priority = val; - break; - case SO_RCVLOWAT: - if (val < 0) - val = INT_MAX; - WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); - break; - case SO_MARK: - if (sk->sk_mark != val) { - sk->sk_mark = val; - sk_dst_reset(sk); - } - break; - case SO_BINDTODEVICE: - optlen = min_t(long, optlen, IFNAMSIZ - 1); - strncpy(devname, optval, optlen); - devname[optlen] = 0; - - ifindex = 0; - if (devname[0] != '\0') { - struct net_device *dev; - - ret = -ENODEV; - - net = sock_net(sk); - dev = dev_get_by_name(net, devname); - if (!dev) - break; - ifindex = dev->ifindex; - dev_put(dev); - } - fallthrough; - case SO_BINDTOIFINDEX: - if (optname == SO_BINDTOIFINDEX) - ifindex = val; - ret = sock_bindtoindex(sk, ifindex, false); - break; - case SO_KEEPALIVE: - if (sk->sk_prot->keepalive) - sk->sk_prot->keepalive(sk, valbool); - sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool); - break; - case SO_REUSEPORT: - sk->sk_reuseport = valbool; - break; - case SO_TXREHASH: - if (val < -1 || val > 1) { - ret = -EINVAL; - break; - } - sk->sk_txrehash = (u8)val; - break; - default: - ret = -EINVAL; - } + return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; diff --git a/net/core/sock.c b/net/core/sock.c index 2d88c06c27b7..1ea320e09794 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1073,8 +1073,8 @@ EXPORT_SYMBOL(sockopt_capable); * at the socket level. Everything here is generic. */ -static int sk_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct so_timestamping timestamping; struct socket *sock = sk->sk_socket; From patchwork Wed Aug 3 20:47:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935897 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 736C1C19F2C for ; Wed, 3 Aug 2022 20:49:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238751AbiHCUtd (ORCPT ); Wed, 3 Aug 2022 16:49:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238754AbiHCUtc (ORCPT ); Wed, 3 Aug 2022 16:49:32 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE9495C97E for ; Wed, 3 Aug 2022 13:49:29 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 273EnGop001345 for ; Wed, 3 Aug 2022 13:49:29 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=ujDSiFNfrZwzZNRg+VBbDUDuH+e7gyExwa8XMjgbvIw=; b=WIR6cbtdT+6uvrelc6DJTVzPUfm4Sm7e3A3V+T/Bg0nhMABiHRSJm0Imv6x5Yd//mbyN 2GQmBXBV83jDnHpDY+Ythp5e5GhGbmLVFUJd27AASLjvytS6E5nMwGqF0wMnOoU9Kznf 18KQlbJgWGAGdypn2d6r+hQJer7Ew1l6kAA= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0001303.ppops.net (PPS) with ESMTPS id 3hqbqx7csr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:49:28 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:49:28 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id B11627A3F7FD; Wed, 3 Aug 2022 13:47:04 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 10/15] bpf: Refactor bpf specific tcp optnames to a new function Date: Wed, 3 Aug 2022 13:47:04 -0700 Message-ID: <20220803204704.3081014-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: lvG4dNMmn2aCYikd4RFTUfs9sERQXrJV X-Proofpoint-GUID: lvG4dNMmn2aCYikd4RFTUfs9sERQXrJV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The patch moves all bpf specific tcp optnames (TCP_BPF_XXX) to a new function bpf_sol_tcp_setsockopt(). This will make the next patch easier to follow. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 79 ++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 29 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index d1f9f8360f60..200e79a1fbfd 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5049,6 +5049,52 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + struct tcp_sock *tp = tcp_sk(sk); + unsigned long timeout; + int val; + + if (optlen != sizeof(int)) + return -EINVAL; + + val = *(int *)optval; + + /* Only some options are supported */ + switch (optname) { + case TCP_BPF_IW: + if (val <= 0 || tp->data_segs_out > tp->syn_data) + return -EINVAL; + tcp_snd_cwnd_set(tp, val); + break; + case TCP_BPF_SNDCWND_CLAMP: + if (val <= 0) + return -EINVAL; + tp->snd_cwnd_clamp = val; + tp->snd_ssthresh = val; + break; + case TCP_BPF_DELACK_MAX: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_DELACK_MAX || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_delack_max = timeout; + break; + case TCP_BPF_RTO_MIN: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_RTO_MIN || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_rto_min = timeout; + break; + default: + return -EINVAL; + } + + return 0; +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5103,6 +5149,10 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { + if (optname >= TCP_BPF_IW) + return bpf_sol_tcp_setsockopt(sk, optname, + optval, optlen); + if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5113,7 +5163,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } else { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - unsigned long timeout; if (optlen != sizeof(int)) return -EINVAL; @@ -5121,34 +5170,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, val = *((int *)optval); /* Only some options are supported */ switch (optname) { - case TCP_BPF_IW: - if (val <= 0 || tp->data_segs_out > tp->syn_data) - ret = -EINVAL; - else - tcp_snd_cwnd_set(tp, val); - break; - case TCP_BPF_SNDCWND_CLAMP: - if (val <= 0) { - ret = -EINVAL; - } else { - tp->snd_cwnd_clamp = val; - tp->snd_ssthresh = val; - } - break; - case TCP_BPF_DELACK_MAX: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_DELACK_MAX || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_delack_max = timeout; - break; - case TCP_BPF_RTO_MIN: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_RTO_MIN || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_rto_min = timeout; - break; case TCP_SAVE_SYN: if (val < 0 || val > 1) ret = -EINVAL; From patchwork Wed Aug 3 20:47:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935901 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFC3EC3F6B0 for ; Wed, 3 Aug 2022 20:50:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238780AbiHCUuk (ORCPT ); Wed, 3 Aug 2022 16:50:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238808AbiHCUug (ORCPT ); Wed, 3 Aug 2022 16:50:36 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89F9525E2 for ; Wed, 3 Aug 2022 13:50:16 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273EqWrC018001 for ; Wed, 3 Aug 2022 13:50:15 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=u/ykZrX+A34eoUkuoWSKF5glpuZhxUXsUdm7O7yHv2g=; b=NaMaDkxyxLlh0n+Z6iU4MWSMrpmTHasEmsrxzSmMdZYmeI5zkCGk601aop1l50IOey9f 2Y6gbDE0f7xSaeJ0xQh2J8fOtWKBHe1Y/77naNW4rikMVihXzeOrwVWon8/HTD23WORq /eVKaQIceCJQuZLE1AF06A4dg/Mm6GbYHlQ= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hq4b7u56u-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:50:15 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:50:14 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 042757A3F811; Wed, 3 Aug 2022 13:47:10 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 11/15] bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt() Date: Wed, 3 Aug 2022 13:47:10 -0700 Message-ID: <20220803204710.3081262-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: wrye9iqlmj90WVrsPvRquqefDzOGIiIF X-Proofpoint-GUID: wrye9iqlmj90WVrsPvRquqefDzOGIiIF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes all the dup code from bpf_setsockopt(SOL_TCP) and reuses the do_tcp_setsockopt(). The existing optname white-list is refactored into a new function sol_tcp_setsockopt(). The sol_tcp_setsockopt() also calls the bpf_sol_tcp_setsockopt() to handle the TCP_BPF_XXX specific optnames. bpf_setsockopt(TCP_SAVE_SYN) now also allows a value 2 to save the eth header also and it comes for free from do_tcp_setsockopt(). Signed-off-by: Martin KaFai Lau --- include/net/tcp.h | 2 + net/core/filter.c | 97 +++++++++++++++-------------------------------- net/ipv4/tcp.c | 4 +- 3 files changed, 34 insertions(+), 69 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index d10962b9f0d0..c03a50c72f40 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -405,6 +405,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); void tcp_set_keepalive(struct sock *sk, int val); diff --git a/net/core/filter.c b/net/core/filter.c index 200e79a1fbfd..0c5361b8906d 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5095,6 +5095,34 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, return 0; } +static int sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_prot->setsockopt != tcp_setsockopt) + return -EINVAL; + + switch (optname) { + case TCP_KEEPIDLE: + case TCP_KEEPINTVL: + case TCP_KEEPCNT: + case TCP_SYNCNT: + case TCP_WINDOW_CLAMP: + case TCP_USER_TIMEOUT: + case TCP_NOTSENT_LOWAT: + case TCP_SAVE_SYN: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case TCP_CONGESTION: + break; + default: + return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen); + } + + return do_tcp_setsockopt(sk, SOL_TCP, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5147,73 +5175,8 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && - sk->sk_prot->setsockopt == tcp_setsockopt) { - if (optname >= TCP_BPF_IW) - return bpf_sol_tcp_setsockopt(sk, optname, - optval, optlen); - - if (optname == TCP_CONGESTION) { - char name[TCP_CA_NAME_MAX]; - - strncpy(name, optval, min_t(long, optlen, - TCP_CA_NAME_MAX-1)); - name[TCP_CA_NAME_MAX-1] = 0; - ret = tcp_set_congestion_control(sk, name, false, true); - } else { - struct inet_connection_sock *icsk = inet_csk(sk); - struct tcp_sock *tp = tcp_sk(sk); - - if (optlen != sizeof(int)) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case TCP_SAVE_SYN: - if (val < 0 || val > 1) - ret = -EINVAL; - else - tp->save_syn = val; - break; - case TCP_KEEPIDLE: - ret = tcp_sock_set_keepidle_locked(sk, val); - break; - case TCP_KEEPINTVL: - if (val < 1 || val > MAX_TCP_KEEPINTVL) - ret = -EINVAL; - else - tp->keepalive_intvl = val * HZ; - break; - case TCP_KEEPCNT: - if (val < 1 || val > MAX_TCP_KEEPCNT) - ret = -EINVAL; - else - tp->keepalive_probes = val; - break; - case TCP_SYNCNT: - if (val < 1 || val > MAX_TCP_SYNCNT) - ret = -EINVAL; - else - icsk->icsk_syn_retries = val; - break; - case TCP_USER_TIMEOUT: - if (val < 0) - ret = -EINVAL; - else - icsk->icsk_user_timeout = val; - break; - case TCP_NOTSENT_LOWAT: - tp->notsent_lowat = val; - sk->sk_write_space(sk); - break; - case TCP_WINDOW_CLAMP: - ret = tcp_set_window_clamp(sk, val); - break; - default: - ret = -EINVAL; - } - } + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + return sol_tcp_setsockopt(sk, optname, optval, optlen); } else { ret = -EINVAL; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cfed84b1883f..a6986f201f92 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3479,8 +3479,8 @@ int tcp_set_window_clamp(struct sock *sk, int val) /* * Socket option code for TCP. */ -static int do_tcp_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); From patchwork Wed Aug 3 20:47:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935903 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6B7DC19F2C for ; Wed, 3 Aug 2022 20:50:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238842AbiHCUup (ORCPT ); Wed, 3 Aug 2022 16:50:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238838AbiHCUui (ORCPT ); Wed, 3 Aug 2022 16:50:38 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E5F11EEC5 for ; Wed, 3 Aug 2022 13:50:18 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 273ErW5b001364 for ; Wed, 3 Aug 2022 13:50:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=oLoq0XAqFBD1KiKoxhQcp05Cajd6NWJw8RAKDt/DL0A=; b=UMWbUMKeEvPgthmXl7Wq3jj98idKjUKTqMjYgjBzgorqHLYfCpVS+r4LRrSwramnm5ib pih6p6FzdDVQL0hi2juqk94U3Cu3JbwdkAM2yP6sSrRcLwD6layggKRsxVMz9sfRFt7C pl1bZmhxMMVA5dVRdCMsSWk8Vyk5Wv9doUE= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3hqbqx7cx2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:50:17 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:50:16 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 4D7007A3F81B; Wed, 3 Aug 2022 13:47:17 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 12/15] bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt() Date: Wed, 3 Aug 2022 13:47:17 -0700 Message-ID: <20220803204717.3081496-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: L7DYJIjcadWdRSHDWMP4lngaIv3TeHo_ X-Proofpoint-GUID: L7DYJIjcadWdRSHDWMP4lngaIv3TeHo_ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IP) and reuses the implementation in do_ip_setsockopt(). The existing optname white-list is refactored into a new function sol_ip_setsockopt(). NOTE, the current bpf_setsockopt(IP_TOS) is quite different from the the do_ip_setsockopt(IP_TOS). For example, it does not take the INET_ECN_MASK into the account for tcp and also does not adjust sk->sk_priority. It looks like the current bpf_setsockopt(IP_TOS) was referencing the IPV6_TCLASS implementation instead of IP_TOS. This patch tries to rectify that by using the do_ip_setsockopt(IP_TOS). While this is a behavior change, the do_ip_setsockopt(IP_TOS) behavior is arguably what the user is expecting. At least, the INET_ECN_MASK bits should be masked out for tcp. Signed-off-by: Martin KaFai Lau --- include/net/ip.h | 2 ++ net/core/filter.c | 40 ++++++++++++++++++++-------------------- net/ipv4/ip_sockglue.c | 4 ++-- 3 files changed, 24 insertions(+), 22 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 1c979fd1904c..34fa5b0f0a0e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -743,6 +743,8 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk, int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, bool allow_ipv6); DECLARE_STATIC_KEY_FALSE(ip4_min_ttl); +int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, diff --git a/net/core/filter.c b/net/core/filter.c index 0c5361b8906d..d236f71ff501 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5123,6 +5123,25 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int sol_ip_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET) + return -EINVAL; + + switch (optname) { + case IP_TOS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return do_ip_setsockopt(sk, SOL_IP, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5134,26 +5153,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, if (level == SOL_SOCKET) { return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IP_TOS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct inet_sock *inet = inet_sk(sk); - - if (val == -1) - val = 0; - inet->tos = val; - } - break; - default: - ret = -EINVAL; - } + return sol_ip_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a3c496580e6b..751fa69cb557 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -888,8 +888,8 @@ static int compat_ip_mcast_join_leave(struct sock *sk, int optname, DEFINE_STATIC_KEY_FALSE(ip4_min_ttl); -static int do_ip_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ip_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct inet_sock *inet = inet_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Aug 3 20:47:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935900 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6AFE6C19F2B for ; Wed, 3 Aug 2022 20:50:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232909AbiHCUui (ORCPT ); Wed, 3 Aug 2022 16:50:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238802AbiHCUug (ORCPT ); Wed, 3 Aug 2022 16:50:36 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A538BCB6 for ; Wed, 3 Aug 2022 13:50:17 -0700 (PDT) Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273FDb8W014191 for ; Wed, 3 Aug 2022 13:50:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=lLEYFwrY6SOSKYk0D+ktc87AGzCJd+or52chN2QpU+o=; b=XB9IBXn6+5nS4NxikSP3Ivtv24TYkoQdWMnb2t/OxKpdp5N1Aw8uyhnAj9zGSnSEXUxf l7K185+lE8L62qwWvgOf0X5HSAsmT720BYr9+6E9fkCyR3qoPbopqq6kVX/Nr9u96Laq xvRqYrJyjtfGpo358K0xGUPjL5MVz9toANY= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hqufmtfq7-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:50:17 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:50:16 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 96B5A7A3F82C; Wed, 3 Aug 2022 13:47:23 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 13/15] bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt() Date: Wed, 3 Aug 2022 13:47:23 -0700 Message-ID: <20220803204723.3081786-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: UoTi7Jflrrg1giELDGG5fuzPctBvPeG7 X-Proofpoint-GUID: UoTi7Jflrrg1giELDGG5fuzPctBvPeG7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IPV6) and reuses the implementation in do_ipv6_setsockopt(). ipv6 could be compiled as a module. Like how other code solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_setsockopt to the ipv6_bpf_stub. The current bpf_setsockopt(IPV6_TCLASS) does not take the INET_ECN_MASK into the account for tcp. The do_ipv6_setsockopt(IPV6_TCLASS) will handle it correctly. The existing optname white-list is refactored into a new function sol_ipv6_setsockopt(). After this last SOL_IPV6 dup code removal, the __bpf_setsockopt() is simplified enough that the extra "{ }" around the if statement can be removed. Signed-off-by: Martin KaFai Lau --- include/net/ipv6.h | 2 ++ include/net/ipv6_stubs.h | 2 ++ net/core/filter.c | 56 +++++++++++++++++++--------------------- net/ipv6/af_inet6.c | 1 + net/ipv6/ipv6_sockglue.c | 4 +-- 5 files changed, 33 insertions(+), 32 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index de9dcc5652c4..c110d9032083 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1156,6 +1156,8 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6, */ DECLARE_STATIC_KEY_FALSE(ip6_min_hopcount); +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ipv6_getsockopt(struct sock *sk, int level, int optname, diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h index 45e0339be6fa..8692698b01cf 100644 --- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -81,6 +81,8 @@ struct ipv6_bpf_stub { const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, struct sk_buff *skb); + int (*ipv6_setsockopt)(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; diff --git a/net/core/filter.c b/net/core/filter.c index d236f71ff501..de6f90a2ec0d 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5142,45 +5142,41 @@ static int sol_ip_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int sol_ipv6_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET6) + return -EINVAL; + + switch (optname) { + case IPV6_TCLASS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return ipv6_bpf_stub->ipv6_setsockopt(sk, SOL_IPV6, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - int val, ret = 0; - if (!sk_fullsock(sk)) return -EINVAL; - if (level == SOL_SOCKET) { + if (level == SOL_SOCKET) return sol_socket_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { + else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) return sol_ip_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET6) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IPV6_TCLASS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct ipv6_pinfo *np = inet6_sk(sk); - - if (val == -1) - val = 0; - np->tclass = val; - } - break; - default: - ret = -EINVAL; - } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) + return sol_ipv6_setsockopt(sk, optname, optval, optlen); + else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) return sol_tcp_setsockopt(sk, optname, optval, optlen); - } else { - ret = -EINVAL; - } - return ret; + + return -EINVAL; } static int _bpf_setsockopt(struct sock *sk, int level, int optname, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 2ce0c44d0081..cadc97852787 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1057,6 +1057,7 @@ static const struct ipv6_stub ipv6_stub_impl = { static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, + .ipv6_setsockopt = do_ipv6_setsockopt, }; static int __init inet6_init(void) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index d23de48ff612..a4535bdbd310 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -391,8 +391,8 @@ static int ipv6_set_opt_hdr(struct sock *sk, int optname, sockptr_t optval, return err; } -static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct ipv6_pinfo *np = inet6_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Aug 3 20:47:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935902 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20CFCC19F2B for ; Wed, 3 Aug 2022 20:50:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238829AbiHCUun (ORCPT ); Wed, 3 Aug 2022 16:50:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238834AbiHCUuh (ORCPT ); Wed, 3 Aug 2022 16:50:37 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E8EE1FCF3 for ; Wed, 3 Aug 2022 13:50:20 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 273F0oWx017861 for ; Wed, 3 Aug 2022 13:50:19 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=hBbc4voybpusoqVF5B6emsnTuWy9LwWZi7yWhqpliL0=; b=au1kH5vW3DyngPX4BRD98HpQo3Vam0nXd8UKGISFfDcn28fvGob2tFfYWilh8sXMtpn7 ECmrl8aBx/kHGqC3HAalCklgjq4NKJrEApsMIkJuDyMfke5NyYuV2SVadxzGCFZj4tvL Uaf6RImEPMtim/x1sMbFzJvOK1KhOtgmh/g= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hq4b7u571-8 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:50:19 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:50:16 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id EC3237A3F83F; Wed, 3 Aug 2022 13:47:29 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 14/15] bpf: Add a few optnames to bpf_setsockopt Date: Wed, 3 Aug 2022 13:47:29 -0700 Message-ID: <20220803204729.3082004-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Kxyv7tnmejCRu_u-h4ZJKKlM2aflXCuK X-Proofpoint-GUID: Kxyv7tnmejCRu_u-h4ZJKKlM2aflXCuK X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds a few optnames for bpf_setsockopt: SO_REUSEADDR, IPV6_AUTOFLOWLABEL, TCP_MAXSEG, TCP_NODELAY, and TCP_THIN_LINEAR_TIMEOUTS. Thanks to the previous patches of this set, all additions can reuse the sk_setsockopt(), do_ipv6_setsockopt(), and do_tcp_setsockopt(). The only change here is to allow them in bpf_setsockopt. The bpf prog has been able to read all members of a sk by using PTR_TO_BTF_ID of a sk. The optname additions here can also be read by the same approach. Meaning there is a way to read the values back. These optnames can also be added to bpf_getsockopt() later with another patch set that makes the bpf_getsockopt() to reuse the sock_getsockopt(), tcp_getsockopt(), and ip[v6]_getsockopt(). Thus, this patch does not add more duplicated code to bpf_getsockopt() now. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index de6f90a2ec0d..52736c833a5f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5017,6 +5017,7 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, char *optval, int optlen) { switch (optname) { + case SO_REUSEADDR: case SO_SNDBUF: case SO_RCVBUF: case SO_KEEPALIVE: @@ -5102,11 +5103,14 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, return -EINVAL; switch (optname) { + case TCP_NODELAY: + case TCP_MAXSEG: case TCP_KEEPIDLE: case TCP_KEEPINTVL: case TCP_KEEPCNT: case TCP_SYNCNT: case TCP_WINDOW_CLAMP: + case TCP_THIN_LINEAR_TIMEOUTS: case TCP_USER_TIMEOUT: case TCP_NOTSENT_LOWAT: case TCP_SAVE_SYN: @@ -5150,6 +5154,7 @@ static int sol_ipv6_setsockopt(struct sock *sk, int optname, switch (optname) { case IPV6_TCLASS: + case IPV6_AUTOFLOWLABEL: if (optlen != sizeof(int)) return -EINVAL; break; From patchwork Wed Aug 3 20:47:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12935904 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68AA3C19F2B for ; Wed, 3 Aug 2022 20:50:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238823AbiHCUuu (ORCPT ); Wed, 3 Aug 2022 16:50:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238832AbiHCUun (ORCPT ); Wed, 3 Aug 2022 16:50:43 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44A823FA2D for ; Wed, 3 Aug 2022 13:50:21 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 273F4KLW031827 for ; Wed, 3 Aug 2022 13:50:20 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=nmI0Z7q28bbzKSEcKIIdQVilnpN7QtMJ0ha66GUBGbE=; b=nsj6WKLmxVM2Jll2CLmzN9p2o05/8K/IbSGSFyNMg88Og8GhKu8WfnchYsQl4irZ8BCM z5JFrORpdpjo2VNRLjBfVsJHWotIdVvCaJjSKyBeMBDlKQMxTBcuCuWI/s1mGzgswOyZ WnsJA5zUSfH7jQEZC+SuL6MidRoq2Yt3a+s= Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3hq51d2vm0-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 03 Aug 2022 13:50:20 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Wed, 3 Aug 2022 13:50:16 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 477237A3F853; Wed, 3 Aug 2022 13:47:36 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v2 bpf-next 15/15] selftests/bpf: bpf_setsockopt tests Date: Wed, 3 Aug 2022 13:47:36 -0700 Message-ID: <20220803204736.3082620-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220803204601.3075863-1-kafai@fb.com> References: <20220803204601.3075863-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: gK-W6WKN-4L4TKTS91mi70JO7t5kv09X X-Proofpoint-ORIG-GUID: gK-W6WKN-4L4TKTS91mi70JO7t5kv09X X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-03_06,2022-08-02_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds tests to exercise optnames that are allowed in bpf_setsockopt(). Signed-off-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/setget_sockopt.c | 125 ++++ .../selftests/bpf/progs/setget_sockopt.c | 547 ++++++++++++++++++ 2 files changed, 672 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/setget_sockopt.c create mode 100644 tools/testing/selftests/bpf/progs/setget_sockopt.c diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c new file mode 100644 index 000000000000..018611e6b248 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c @@ -0,0 +1,125 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#define _GNU_SOURCE +#include +#include +#include + +#include "test_progs.h" +#include "cgroup_helpers.h" +#include "network_helpers.h" + +#include "setget_sockopt.skel.h" + +#define CG_NAME "/setget-sockopt-test" + +static const char addr4_str[] = "127.0.0.1"; +static const char addr6_str[] = "::1"; +static struct setget_sockopt *skel; +static int cg_fd; + +static int create_netns(void) +{ + if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns")) + return -1; + + if (!ASSERT_OK(system("ip link set dev lo up"), "set lo up")) + return -1; + + if (!ASSERT_OK(system("ip link add dev binddevtest1 type veth peer name binddevtest2"), + "add veth")) + return -1; + + if (!ASSERT_OK(system("ip link set dev binddevtest1 up"), + "bring veth up")) + return -1; + + return 0; +} + +static void test_tcp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd, cfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_STREAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + + cfd = connect_to_fd(sfd, 0); + if (!ASSERT_GE(cfd, 0, "connect_to_fd_server")) { + close(sfd); + return; + } + close(sfd); + close(cfd); + + ASSERT_EQ(bss->nr_listen, 1, "nr_listen"); + ASSERT_EQ(bss->nr_connect, 1, "nr_connect"); + ASSERT_EQ(bss->nr_active, 1, "nr_active"); + ASSERT_EQ(bss->nr_passive, 1, "nr_passive"); + ASSERT_EQ(bss->nr_socket_post_create, 2, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 2, "nr_bind"); +} + +static void test_udp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_DGRAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + close(sfd); + + ASSERT_GE(bss->nr_socket_post_create, 1, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 1, "nr_bind"); +} + +void test_setget_sockopt(void) +{ + cg_fd = test__join_cgroup(CG_NAME); + if (cg_fd < 0) + return; + + if (create_netns()) + goto done; + + skel = setget_sockopt__open(); + if (!ASSERT_OK_PTR(skel, "open skel")) + goto done; + + strcpy(skel->rodata->veth, "binddevtest1"); + skel->rodata->veth_ifindex = if_nametoindex("binddevtest1"); + if (!ASSERT_GT(skel->rodata->veth_ifindex, 0, "if_nametoindex")) + goto done; + + if (!ASSERT_OK(setget_sockopt__load(skel), "load skel")) + goto done; + + skel->links.skops_sockopt = + bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd); + if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup")) + goto done; + + skel->links.socket_post_create = + bpf_program__attach_cgroup(skel->progs.socket_post_create, cg_fd); + if (!ASSERT_OK_PTR(skel->links.socket_post_create, "attach_cgroup")) + goto done; + + test_tcp(AF_INET6); + test_tcp(AF_INET); + test_udp(AF_INET6); + test_udp(AF_INET); + +done: + setget_sockopt__destroy(skel); + close(cg_fd); +} diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c new file mode 100644 index 000000000000..560cf4b92d65 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c @@ -0,0 +1,547 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef SO_TXREHASH +#define SO_TXREHASH 74 +#endif + +#ifndef TCP_NAGLE_OFF +#define TCP_NAGLE_OFF 1 +#endif + +#ifndef ARRAY_SIZE +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) +#endif + +extern unsigned long CONFIG_HZ __kconfig; + +const volatile char veth[IFNAMSIZ]; +const volatile int veth_ifindex; + +int nr_listen; +int nr_passive; +int nr_active; +int nr_connect; +int nr_binddev; +int nr_socket_post_create; + +struct sockopt_test { + int opt; + int new; + int restore; + int expected; + int tcp_expected; + int flip:1; +}; + +static const char cubic_cc[] = "cubic"; +static const char reno_cc[] = "reno"; + +static const struct sockopt_test sol_socket_tests[] = { + { .opt = SO_REUSEADDR, .flip = 1, }, + { .opt = SO_SNDBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_RCVBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_KEEPALIVE, .flip = 1, }, + { .opt = SO_PRIORITY, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_REUSEPORT, .flip = 1, }, + { .opt = SO_RCVLOWAT, .new = 8123, .expected = 8123, }, + { .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_TXREHASH, .flip = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_tcp_tests[] = { + { .opt = TCP_NODELAY, .flip = 1, }, + { .opt = TCP_MAXSEG, .new = 1314, .expected = 1314, }, + { .opt = TCP_KEEPIDLE, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPINTVL, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_SYNCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_WINDOW_CLAMP, .new = 8123, .expected = 8123, .restore = 8124, }, + { .opt = TCP_CONGESTION, }, + { .opt = TCP_THIN_LINEAR_TIMEOUTS, .flip = 1, }, + { .opt = TCP_USER_TIMEOUT, .new = 123400, .expected = 123400, }, + { .opt = TCP_NOTSENT_LOWAT, .new = 1314, .expected = 1314, }, + { .opt = TCP_SAVE_SYN, .new = 1, .expected = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ip_tests[] = { + { .opt = IP_TOS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ipv6_tests[] = { + { .opt = IPV6_TCLASS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = IPV6_AUTOFLOWLABEL, .flip = 1, }, + { .opt = 0, }, +}; + +struct sock_common { + unsigned short skc_family; + unsigned long skc_flags; + unsigned char skc_reuse:4; + unsigned char skc_reuseport:1; + unsigned char skc_ipv6only:1; + unsigned char skc_net_refcnt:1; +} __attribute__((preserve_access_index)); + +struct sock { + struct sock_common __sk_common; + __u16 sk_type; + __u16 sk_protocol; + int sk_rcvlowat; + __u32 sk_mark; + unsigned long sk_max_pacing_rate; + unsigned int keepalive_time; + unsigned int keepalive_intvl; +} __attribute__((preserve_access_index)); + +struct tcp_options_received { + __u16 user_mss; +} __attribute__((preserve_access_index)); + +struct ipv6_pinfo { + __u16 recverr:1, + sndflow:1, + repflow:1, + pmtudisc:3, + padding:1, + srcprefs:3, + dontfrag:1, + autoflowlabel:1, + autoflowlabel_set:1, + mc_all:1, + recverr_rfc4884:1, + rtalert_isolate:1; +} __attribute__((preserve_access_index)); + +struct inet_sock { + /* sk and pinet6 has to be the first two members of inet_sock */ + struct sock sk; + struct ipv6_pinfo *pinet6; +} __attribute__((preserve_access_index)); + +struct inet_connection_sock { + __u32 icsk_user_timeout; + __u8 icsk_syn_retries; +} __attribute__((preserve_access_index)); + +struct tcp_sock { + struct inet_connection_sock inet_conn; + struct tcp_options_received rx_opt; + __u8 save_syn:2, + syn_data:1, + syn_fastopen:1, + syn_fastopen_exp:1, + syn_fastopen_ch:1, + syn_data_acked:1, + is_cwnd_limited:1; + __u32 window_clamp; + __u8 nonagle : 4, + thin_lto : 1, + recvmsg_inq : 1, + repair : 1, + frto : 1; + __u32 notsent_lowat; + __u8 keepalive_probes; + unsigned int keepalive_time; + unsigned int keepalive_intvl; +} __attribute__((preserve_access_index)); + +struct socket { + struct sock *sk; +} __attribute__((preserve_access_index)); + +struct loop_ctx { + void *ctx; + struct sock *sk; +}; + +static int __bpf_getsockopt(void *ctx, struct sock *sk, + int level, int opt, int *optval, + int optlen) +{ + if (level == SOL_SOCKET) { + switch (opt) { + case SO_REUSEADDR: + *optval = !!(sk->__sk_common.skc_reuse); + break; + case SO_KEEPALIVE: + *optval = !!(sk->__sk_common.skc_flags & (1UL << 3)); + break; + case SO_RCVLOWAT: + *optval = sk->sk_rcvlowat; + break; + case SO_MARK: + *optval = sk->sk_mark; + break; + case SO_MAX_PACING_RATE: + *optval = sk->sk_max_pacing_rate; + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_TCP) { + struct tcp_sock *tp = bpf_skc_to_tcp_sock(sk); + + if (!tp) + return -1; + + switch (opt) { + case TCP_NODELAY: + *optval = !!(tp->nonagle & TCP_NAGLE_OFF); + break; + case TCP_MAXSEG: + *optval = tp->rx_opt.user_mss; + break; + case TCP_KEEPIDLE: + *optval = tp->keepalive_time / CONFIG_HZ; + break; + case TCP_SYNCNT: + *optval = tp->inet_conn.icsk_syn_retries; + break; + case TCP_KEEPINTVL: + *optval = tp->keepalive_intvl / CONFIG_HZ; + break; + case TCP_KEEPCNT: + *optval = tp->keepalive_probes; + break; + case TCP_WINDOW_CLAMP: + *optval = tp->window_clamp; + break; + case TCP_THIN_LINEAR_TIMEOUTS: + *optval = tp->thin_lto; + break; + case TCP_USER_TIMEOUT: + *optval = tp->inet_conn.icsk_user_timeout; + break; + case TCP_NOTSENT_LOWAT: + *optval = tp->notsent_lowat; + break; + case TCP_SAVE_SYN: + *optval = tp->save_syn; + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_IPV6) { + switch (opt) { + case IPV6_AUTOFLOWLABEL: { + __u16 proto = sk->sk_protocol; + struct inet_sock *inet_sk; + + if (proto == IPPROTO_TCP) + inet_sk = (struct inet_sock *)bpf_skc_to_tcp_sock(sk); + else + inet_sk = (struct inet_sock *)bpf_skc_to_udp6_sock(sk); + + if (!inet_sk) + return -1; + + *optval = !!inet_sk->pinet6->autoflowlabel; + break; + } + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + return bpf_getsockopt(ctx, level, opt, optval, optlen); +} + +static int bpf_test_sockopt_flip(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, opt = t->opt; + + opt = t->opt; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old))) + return 1; + /* kernel initialized txrehash to 255 */ + if (level == SOL_SOCKET && opt == SO_TXREHASH && old != 0 && old != 1) + old = 1; + + new = !old; + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_sockopt_int(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, expected, opt; + + opt = t->opt; + new = t->new; + if (sk->sk_type == SOCK_STREAM && t->tcp_expected) + expected = t->tcp_expected; + else + expected = t->expected; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old)) || + old == new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != expected) + return 1; + + if (t->restore) + old = t->restore; + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_socket_tests)) + return 1; + + t = &sol_socket_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, SOL_SOCKET); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET); +} + +static int bpf_test_ip_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ip_tests)) + return 1; + + t = &sol_ip_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IP); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IP); +} + +static int bpf_test_ipv6_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ipv6_tests)) + return 1; + + t = &sol_ipv6_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IPV6); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IPV6); +} + +static int bpf_test_tcp_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + struct sock *sk; + void *ctx; + + if (i >= ARRAY_SIZE(sol_tcp_tests)) + return 1; + + t = &sol_tcp_tests[i]; + if (!t->opt) + return 1; + + ctx = lc->ctx; + sk = lc->sk; + + if (t->opt == TCP_CONGESTION) { + char old_cc[16], tmp_cc[16]; + const char *new_cc; + + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + if (!bpf_strncmp(old_cc, sizeof(old_cc), cubic_cc)) + new_cc = reno_cc; + else + new_cc = cubic_cc; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, (void *)new_cc, + sizeof(new_cc))) + return 1; + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, tmp_cc, sizeof(tmp_cc))) + return 1; + if (bpf_strncmp(tmp_cc, sizeof(tmp_cc), new_cc)) + return 1; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + return 0; + } + + if (t->flip) + return bpf_test_sockopt_flip(ctx, sk, t, IPPROTO_TCP); + + return bpf_test_sockopt_int(ctx, sk, t, IPPROTO_TCP); +} + +static int bpf_test_sockopt(void *ctx, struct sock *sk) +{ + struct loop_ctx lc = { .ctx = ctx, .sk = sk, }; + __u16 family, proto; + int n; + + family = sk->__sk_common.skc_family; + proto = sk->sk_protocol; + + n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_socket_tests)) + return -1; + + if (proto == IPPROTO_TCP) { + n = bpf_loop(ARRAY_SIZE(sol_tcp_tests), bpf_test_tcp_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_tcp_tests)) + return -1; + } + + if (family == AF_INET) { + n = bpf_loop(ARRAY_SIZE(sol_ip_tests), bpf_test_ip_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ip_tests)) + return -1; + } else { + n = bpf_loop(ARRAY_SIZE(sol_ipv6_tests), bpf_test_ipv6_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ipv6_tests)) + return -1; + } + + return 0; +} + +static int binddev_test(void *ctx) +{ + const char empty_ifname[] = ""; + int ifindex, zero = 0; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)veth, sizeof(veth))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)empty_ifname, sizeof(empty_ifname))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + (void *)&veth_ifindex, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &zero, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + return 0; +} + +SEC("lsm_cgroup/socket_post_create") +int BPF_PROG(socket_post_create, struct socket *sock, int family, + int type, int protocol, int kern) +{ + struct sock *sk = sock->sk; + + if (!sk) + return 1; + + nr_socket_post_create += !bpf_test_sockopt(sk, sk); + nr_binddev += !binddev_test(sk); + + return 1; +} + +SEC("sockops") +int skops_sockopt(struct bpf_sock_ops *skops) +{ + struct bpf_sock *bpf_sk = skops->sk; + struct sock *sk; + + if (!bpf_sk) + return 1; + + sk = (struct sock *)bpf_skc_to_tcp_sock(bpf_sk); + if (!sk) + return 1; + + switch (skops->op) { + case BPF_SOCK_OPS_TCP_LISTEN_CB: + nr_listen += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_TCP_CONNECT_CB: + nr_connect += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: + nr_active += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: + nr_passive += !bpf_test_sockopt(skops, sk); + break; + } + + return 1; +} + +char _license[] SEC("license") = "GPL";