From patchwork Wed Aug 17 06:17:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945510 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E4C2C25B08 for ; Wed, 17 Aug 2022 06:20:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233326AbiHQGT7 (ORCPT ); Wed, 17 Aug 2022 02:19:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231920AbiHQGT7 (ORCPT ); Wed, 17 Aug 2022 02:19:59 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2793967172 for ; Tue, 16 Aug 2022 23:19:58 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0Rqmj001573 for ; Tue, 16 Aug 2022 23:19:57 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=3p2HTvJo/KiB0bwCllk5DnuaCAIxYYmMk/pkI/2jg9o=; b=gUECw0zlpdc4RG19F5PyzAGpmTpDl90eBUoOnf/oDJzjUizQpmAViGfAD1gKQSAvzCO3 m7/mTdQMWg07SRwdWIp7ODN7dgPOQ5xfPXKXSrGr9qclMgLtQmzl5NGh9Q5jtL+rRK5n U70yemN/YABBVnQsiDlukJH8QNdmejkVyLI= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9h6y2-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:19:57 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:19:56 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 4436E825DBF9; Tue, 16 Aug 2022 23:17:11 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 01/15] net: Add sk_setsockopt() to take the sk ptr instead of the sock ptr Date: Tue, 16 Aug 2022 23:17:11 -0700 Message-ID: <20220817061711.4175048-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: dhdPRvIWAbSfdim9_Wk-Y2FE9Pqqxthz X-Proofpoint-ORIG-GUID: dhdPRvIWAbSfdim9_Wk-Y2FE9Pqqxthz X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the sock_setsockopt() to avoid code duplication and code drift between the two duplicates. The current sock_setsockopt() takes sock ptr as the argument. The very first thing of this function is to get back the sk ptr by 'sk = sock->sk'. bpf_setsockopt() could be called when the sk does not have the sock ptr created. Meaning sk->sk_socket is NULL. For example, when a passive tcp connection has just been established but has yet been accept()-ed. Thus, it cannot use the sock_setsockopt(sk->sk_socket) or else it will pass a NULL ptr. This patch moves all sock_setsockopt implementation to the newly added sk_setsockopt(). The new sk_setsockopt() takes a sk ptr and immediately gets the sock ptr by 'sock = sk->sk_socket' The existing sock_setsockopt(sock) is changed to call sk_setsockopt(sock->sk). All existing callers have both sock->sk and sk->sk_socket pointer. The latter patch will make bpf_setsockopt(SOL_SOCKET) call sk_setsockopt(sk) directly. The bpf_setsockopt(SOL_SOCKET) does not use the optnames that require sk->sk_socket, so it will be safe. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/core/sock.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 4cb957d934a2..20269c37ab3b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1041,12 +1041,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes) * at the socket level. Everything here is generic. */ -int sock_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int optlen) +static int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct so_timestamping timestamping; + struct socket *sock = sk->sk_socket; struct sock_txtime sk_txtime; - struct sock *sk = sock->sk; int val; int valbool; struct linger ling; @@ -1499,6 +1499,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname, release_sock(sk); return ret; } + +int sock_setsockopt(struct socket *sock, int level, int optname, + sockptr_t optval, unsigned int optlen) +{ + return sk_setsockopt(sock->sk, level, optname, + optval, optlen); +} EXPORT_SYMBOL(sock_setsockopt); static const struct cred *sk_get_peer_cred(struct sock *sk) From patchwork Wed Aug 17 06:17:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945545 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A947BC25B08 for ; Wed, 17 Aug 2022 06:47:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230101AbiHQGrL (ORCPT ); Wed, 17 Aug 2022 02:47:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233329AbiHQGrK (ORCPT ); Wed, 17 Aug 2022 02:47:10 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2B3D6554B for ; Tue, 16 Aug 2022 23:47:07 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H6kBbG016611 for ; Tue, 16 Aug 2022 23:47:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=wpSbf31KyvOXxKfB48qiXSTnhqjs5yy0+Iiyh4mJkMs=; b=ZYnzHAWWVM3A/KrbuF1DdIytNCaE7VZ3JzFeX0YeQ6ESNy5l3Y9/sz5DoEhFXTp3zS7u i7IFkBoGWXRshnsrcbfNE6lp/gQxWUtZ0oIjN3LOkW+QhhzXwKIO+Ive2nlVKnyIo+D6 aulXoSli6g1b/YQ43WQyMvXiSu2+sEeW5bY= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9h9rf-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:47:07 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:47:06 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 96746825DC1F; Tue, 16 Aug 2022 23:17:17 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 02/15] bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf Date: Tue, 16 Aug 2022 23:17:17 -0700 Message-ID: <20220817061717.4175589-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 7IL0--S1irAM-jYlXw1q0x87vRyTScTE X-Proofpoint-ORIG-GUID: 7IL0--S1irAM-jYlXw1q0x87vRyTScTE X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a has_current_bpf_ctx() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the has_current_bpf_ctx() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test has_current_bpf_ctx() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 13 +++++++++++++ include/net/sock.h | 3 +++ net/core/sock.c | 30 +++++++++++++++++++++++++++--- 3 files changed, 43 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a627a02cf8ab..39bd36359c1e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1966,6 +1966,15 @@ static inline bool unprivileged_ebpf_enabled(void) return !sysctl_unprivileged_bpf_disabled; } +/* Not all bpf prog type has the bpf_ctx. + * For the bpf prog type that has initialized the bpf_ctx, + * this function can be used to decide if a kernel function + * is called by a bpf program. + */ +static inline bool has_current_bpf_ctx(void) +{ + return !!current->bpf_ctx; +} #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -2175,6 +2184,10 @@ static inline bool unprivileged_ebpf_enabled(void) return false; } +static inline bool has_current_bpf_ctx(void) +{ + return false; +} #endif /* CONFIG_BPF_SYSCALL */ void __bpf_free_used_btfs(struct bpf_prog_aux *aux, diff --git a/include/net/sock.h b/include/net/sock.h index a7273b289188..b2ff230860c6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1721,6 +1721,9 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) } } +void sockopt_lock_sock(struct sock *sk); +void sockopt_release_sock(struct sock *sk); + /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming diff --git a/net/core/sock.c b/net/core/sock.c index 20269c37ab3b..d3683228376f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -703,7 +703,9 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen) goto out; } - return sock_bindtoindex(sk, index, true); + sockopt_lock_sock(sk); + ret = sock_bindtoindex_locked(sk, index); + sockopt_release_sock(sk); out: #endif @@ -1036,6 +1038,28 @@ static int sock_reserve_memory(struct sock *sk, int bytes) return 0; } +void sockopt_lock_sock(struct sock *sk) +{ + /* When current->bpf_ctx is set, the setsockopt is called from + * a bpf prog. bpf has ensured the sk lock has been + * acquired before calling setsockopt(). + */ + if (has_current_bpf_ctx()) + return; + + lock_sock(sk); +} +EXPORT_SYMBOL(sockopt_lock_sock); + +void sockopt_release_sock(struct sock *sk) +{ + if (has_current_bpf_ctx()) + return; + + release_sock(sk); +} +EXPORT_SYMBOL(sockopt_release_sock); + /* * This is meant for all protocols to use and covers goings on * at the socket level. Everything here is generic. @@ -1067,7 +1091,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, valbool = val ? 1 : 0; - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case SO_DEBUG: @@ -1496,7 +1520,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, ret = -ENOPROTOOPT; break; } - release_sock(sk); + sockopt_release_sock(sk); return ret; } From patchwork Wed Aug 17 06:17:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945541 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEA98C25B08 for ; Wed, 17 Aug 2022 06:41:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229689AbiHQGlF (ORCPT ); Wed, 17 Aug 2022 02:41:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229751AbiHQGlF (ORCPT ); Wed, 17 Aug 2022 02:41:05 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67A434D4C4 for ; Tue, 16 Aug 2022 23:41:04 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0RxYv001691 for ; Tue, 16 Aug 2022 23:41:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=APAhwg9LK0IYA3HatFKwc1KnwMhuer+fjtfMjTlcRqg=; b=mBgvGHiFDNBQJzdHhXkCsQtW0+RXXKm0uCpG8zPp/fnr4Jdg6Jj9uL1mKM93K/MG70nZ OwiQpWrUVnNS8SOZcPa5JFKYBTyU2nLzMwutAHLSer99hM5nrZ2OxLI8ODxQmW/mh7vl HhXTxZAdHzFJ4nJZUFy6YwcECRc+IKvRRVg= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9h92r-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:41:04 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:41:02 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 82059825DC2C; Tue, 16 Aug 2022 23:17:23 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 03/15] bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt() Date: Tue, 16 Aug 2022 23:17:23 -0700 Message-ID: <20220817061723.4175820-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 8tvJQdNbOFPBmocSJAXeFfvUvgJ0LKFT X-Proofpoint-ORIG-GUID: 8tvJQdNbOFPBmocSJAXeFfvUvgJ0LKFT X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit 8d650cdedaab ("tcp: fix tcp_set_congestion_control() use from bpf hook"), tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. This patch adds sockopt_ns_capable() and sockopt_capable() for the sk_setsockopt() to use. They will consider the has_current_bpf_ctx() before doing the ns_capable() and capable() test. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Suggested-by: Stanislav Fomichev Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/sock.h | 2 ++ net/core/sock.c | 38 +++++++++++++++++++++++++------------- 2 files changed, 27 insertions(+), 13 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index b2ff230860c6..72b78c2b6f83 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1723,6 +1723,8 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) void sockopt_lock_sock(struct sock *sk); void sockopt_release_sock(struct sock *sk); +bool sockopt_ns_capable(struct user_namespace *ns, int cap); +bool sockopt_capable(int cap); /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it diff --git a/net/core/sock.c b/net/core/sock.c index d3683228376f..7ea46e4700fd 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1060,6 +1060,18 @@ void sockopt_release_sock(struct sock *sk) } EXPORT_SYMBOL(sockopt_release_sock); +bool sockopt_ns_capable(struct user_namespace *ns, int cap) +{ + return has_current_bpf_ctx() || ns_capable(ns, cap); +} +EXPORT_SYMBOL(sockopt_ns_capable); + +bool sockopt_capable(int cap) +{ + return has_current_bpf_ctx() || capable(cap); +} +EXPORT_SYMBOL(sockopt_capable); + /* * This is meant for all protocols to use and covers goings on * at the socket level. Everything here is generic. @@ -1095,7 +1107,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, switch (optname) { case SO_DEBUG: - if (val && !capable(CAP_NET_ADMIN)) + if (val && !sockopt_capable(CAP_NET_ADMIN)) ret = -EACCES; else sock_valbool_flag(sk, SOCK_DBG, valbool); @@ -1139,7 +1151,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, break; case SO_SNDBUFFORCE: - if (!capable(CAP_NET_ADMIN)) { + if (!sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1161,7 +1173,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, break; case SO_RCVBUFFORCE: - if (!capable(CAP_NET_ADMIN)) { + if (!sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1188,8 +1200,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, case SO_PRIORITY: if ((val >= 0 && val <= 6) || - ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || - ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) sk->sk_priority = val; else ret = -EPERM; @@ -1334,8 +1346,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, clear_bit(SOCK_PASSSEC, &sock->flags); break; case SO_MARK: - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1343,8 +1355,8 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, __sock_set_mark(sk, val); break; case SO_RCVMARK: - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } @@ -1378,7 +1390,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, #ifdef CONFIG_NET_RX_BUSY_POLL case SO_BUSY_POLL: /* allow unprivileged users to decrease the value */ - if ((val > sk->sk_ll_usec) && !capable(CAP_NET_ADMIN)) + if ((val > sk->sk_ll_usec) && !sockopt_capable(CAP_NET_ADMIN)) ret = -EPERM; else { if (val < 0) @@ -1388,13 +1400,13 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, } break; case SO_PREFER_BUSY_POLL: - if (valbool && !capable(CAP_NET_ADMIN)) + if (valbool && !sockopt_capable(CAP_NET_ADMIN)) ret = -EPERM; else WRITE_ONCE(sk->sk_prefer_busy_poll, valbool); break; case SO_BUSY_POLL_BUDGET: - if (val > READ_ONCE(sk->sk_busy_poll_budget) && !capable(CAP_NET_ADMIN)) { + if (val > READ_ONCE(sk->sk_busy_poll_budget) && !sockopt_capable(CAP_NET_ADMIN)) { ret = -EPERM; } else { if (val < 0 || val > U16_MAX) @@ -1465,7 +1477,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, * scheduler has enough safe guards. */ if (sk_txtime.clockid != CLOCK_MONOTONIC && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } From patchwork Wed Aug 17 06:17:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945523 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B54B4C25B08 for ; Wed, 17 Aug 2022 06:23:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229981AbiHQGXF (ORCPT ); Wed, 17 Aug 2022 02:23:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229936AbiHQGXE (ORCPT ); Wed, 17 Aug 2022 02:23:04 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E5A77B28C for ; Tue, 16 Aug 2022 23:23:03 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27H0Wbx8028672 for ; Tue, 16 Aug 2022 23:23:02 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=rDNYBH7Bv8pfgDHD18fUmBZ0xQFnWCFlQmHSQYNv/5s=; b=m/RNpt0kR0ktEko/zLHF54rl9EHwGpw2N6YiTySq27XIf4rSyxmtiV/4lNjvpVoRM6IF oxzIW5vinXDVx/2t89cxKmBQY2YSdpRujaAD64QMTFu2kz9fEf8e0t2waR8yv7ofj3b8 cLBsTQc6LYt3c+Djju/Y26Yq0FBt0OqvQWs= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3j0nvjh70q-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:23:02 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:23:00 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 4F975825DC41; Tue, 16 Aug 2022 23:17:30 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 04/15] bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and capable() Date: Tue, 16 Aug 2022 23:17:30 -0700 Message-ID: <20220817061730.4176021-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: yFxCZygYMRxH8GrwaP6ud5Wxj6-kGWX0 X-Proofpoint-ORIG-GUID: yFxCZygYMRxH8GrwaP6ud5Wxj6-kGWX0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_tcp_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/ipv4/tcp.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 970e9a2cca4a..cfed84b1883f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3202,7 +3202,7 @@ EXPORT_SYMBOL(tcp_disconnect); static inline bool tcp_can_repair_sock(const struct sock *sk) { - return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) && + return sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) && (sk->sk_state != TCP_LISTEN); } @@ -3502,11 +3502,11 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); + sockopt_lock_sock(sk); err = tcp_set_congestion_control(sk, name, true, - ns_capable(sock_net(sk)->user_ns, - CAP_NET_ADMIN)); - release_sock(sk); + sockopt_ns_capable(sock_net(sk)->user_ns, + CAP_NET_ADMIN)); + sockopt_release_sock(sk); return err; } case TCP_ULP: { @@ -3522,9 +3522,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); + sockopt_lock_sock(sk); err = tcp_set_ulp(sk, name); - release_sock(sk); + sockopt_release_sock(sk); return err; } case TCP_FASTOPEN_KEY: { @@ -3557,7 +3557,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case TCP_MAXSEG: @@ -3779,7 +3779,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + sockopt_release_sock(sk); return err; } From patchwork Wed Aug 17 06:17:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945550 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FB90C3F6B0 for ; Wed, 17 Aug 2022 06:59:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238258AbiHQG7K (ORCPT ); Wed, 17 Aug 2022 02:59:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238408AbiHQG7F (ORCPT ); Wed, 17 Aug 2022 02:59:05 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BDD8E786CB for ; Tue, 16 Aug 2022 23:59:04 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0RqoM001573 for ; Tue, 16 Aug 2022 23:59:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=B0vNEoQIJSp/abWi0itBMd8TCeNRav/Bbth0BGd2aFE=; b=jzjI/HeQI673FuvMsr3Fk/u05nYHHanTJNTA5H2sc1zrS6W/x/8Lx0D5riOaveuTHJY+ GaRti5OPYzrDir5mXPIZT2SMFwH1yu45yF3jQ4m+kBgf8XyEsHsmJZISR02k5gXfNkOt ZzVMagA/qrL4V1/t2aX8+JrroOz203mEVfs= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9haw3-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:59:04 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:59:03 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 5CC9A825DC66; Tue, 16 Aug 2022 23:17:37 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 05/15] bpf: net: Change do_ip_setsockopt() to use the sockopt's lock_sock() and capable() Date: Tue, 16 Aug 2022 23:17:37 -0700 Message-ID: <20220817061737.4176402-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: KHYQzEAz1txswqrYQ-qDADGEo4s5noYP X-Proofpoint-ORIG-GUID: KHYQzEAz1txswqrYQ-qDADGEo4s5noYP X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ip_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/ipv4/ip_sockglue.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a8a323ecbb54..a3c496580e6b 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -944,7 +944,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = 0; if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { case IP_OPTIONS: @@ -1333,14 +1333,14 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, case IP_IPSEC_POLICY: case IP_XFRM_POLICY: err = -EPERM; - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) break; err = xfrm_user_policy(sk, optname, optval, optlen); break; case IP_TRANSPARENT: - if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { + if (!!val && !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { err = -EPERM; break; } @@ -1368,13 +1368,13 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = -ENOPROTOOPT; break; } - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return err; e_inval: - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Aug 17 06:17:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945548 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBDB8C25B08 for ; Wed, 17 Aug 2022 06:53:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230272AbiHQGxJ (ORCPT ); Wed, 17 Aug 2022 02:53:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230473AbiHQGxH (ORCPT ); Wed, 17 Aug 2022 02:53:07 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 53C82246 for ; Tue, 16 Aug 2022 23:53:06 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27H0WrHM028921 for ; Tue, 16 Aug 2022 23:53:05 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=biZpN7ohszoJ3Mskcxzy0Zt3nSobeX/bnUTitHKAEds=; b=jiCoyL43QgYQeugBHnzIJSAAAV/9qVhNnv1YlvDAA5bcQ0wko42nSP18MwTyWxYaz5+f 0UVhr9f5TRAsO3Q+SpD/IO3smSFO3h8FbJvKOD5UHy6V9HABW4lDOFFnQHt/kfZRPd1s JO1Sme7vt23nJxtk+OwDymBLvYqeoRtszEU= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3j0nvjha3t-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:53:05 -0700 Received: from twshared16418.24.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:53:02 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id B509B825DCA4; Tue, 16 Aug 2022 23:17:44 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 06/15] bpf: net: Change do_ipv6_setsockopt() to use the sockopt's lock_sock() and capable() Date: Tue, 16 Aug 2022 23:17:44 -0700 Message-ID: <20220817061744.4176893-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: XwhxsVY2Y1DgN1h-m5AXTVTeUVjxmcms X-Proofpoint-ORIG-GUID: XwhxsVY2Y1DgN1h-m5AXTVTeUVjxmcms X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ipv6_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/ipv6/ipv6_sockglue.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 222f6bf220ba..d23de48ff612 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -327,7 +327,7 @@ static int ipv6_set_opt_hdr(struct sock *sk, int optname, sockptr_t optval, int err; /* hop-by-hop / destination options are privileged option */ - if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) + if (optname != IPV6_RTHDR && !sockopt_ns_capable(net->user_ns, CAP_NET_RAW)) return -EPERM; /* remove any sticky options header with a zero option @@ -417,7 +417,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + sockopt_lock_sock(sk); switch (optname) { @@ -634,8 +634,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; case IPV6_TRANSPARENT: - if (valbool && !ns_capable(net->user_ns, CAP_NET_RAW) && - !ns_capable(net->user_ns, CAP_NET_ADMIN)) { + if (valbool && !sockopt_ns_capable(net->user_ns, CAP_NET_RAW) && + !sockopt_ns_capable(net->user_ns, CAP_NET_ADMIN)) { retv = -EPERM; break; } @@ -946,7 +946,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, case IPV6_IPSEC_POLICY: case IPV6_XFRM_POLICY: retv = -EPERM; - if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + if (!sockopt_ns_capable(net->user_ns, CAP_NET_ADMIN)) break; retv = xfrm_user_policy(sk, optname, optval, optlen); break; @@ -994,14 +994,14 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return retv; e_inval: - release_sock(sk); + sockopt_release_sock(sk); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Aug 17 06:17:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945535 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07A4FC25B08 for ; Wed, 17 Aug 2022 06:35:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238635AbiHQGfB (ORCPT ); Wed, 17 Aug 2022 02:35:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229687AbiHQGfA (ORCPT ); Wed, 17 Aug 2022 02:35:00 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 116A87757F for ; Tue, 16 Aug 2022 23:35:00 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0RnlP001547 for ; Tue, 16 Aug 2022 23:34:59 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=LWvmoWcjp/8fgeoyXvqACqQ+iib8lQuxwzZSCPD/NKY=; b=il8v4ifWT1VD+5w4Zp2ugx59jzL8An8yzRtyhRbhlX9ogWK6txOVgBHRx0JsCnqVmCOu O75ekIQSAB65KHd1rjPM/196uXZnDkOMabz/4zG4Ov+xBRs60ucMAoTRHZNv7oTrfYWa jTHP0ttbzmyVjLSWY5WP4OVD7BcfIxwShZs= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9h8g3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:34:59 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:34:59 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 075D5825DCD6; Tue, 16 Aug 2022 23:17:52 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 07/15] bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog() Date: Tue, 16 Aug 2022 23:17:51 -0700 Message-ID: <20220817061751.4177657-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: y9JEMgj9ytTq0WxTwL2JdLvAuO7RJFyB X-Proofpoint-ORIG-GUID: y9JEMgj9ytTq0WxTwL2JdLvAuO7RJFyB X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The bpf-iter-prog for tcp and unix sk can do bpf_setsockopt() which needs has_current_bpf_ctx() to decide if it is called by a bpf prog. This patch initializes the bpf_run_ctx in bpf_iter_run_prog() for the has_current_bpf_ctx() to use. Acked-by: Andrii Nakryiko Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- kernel/bpf/bpf_iter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 4b112aa8bba3..6476b2c03527 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -685,19 +685,24 @@ struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop) int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx) { + struct bpf_run_ctx run_ctx, *old_run_ctx; int ret; if (prog->aux->sleepable) { rcu_read_lock_trace(); migrate_disable(); might_fault(); + old_run_ctx = bpf_set_run_ctx(&run_ctx); ret = bpf_prog_run(prog, ctx); + bpf_reset_run_ctx(old_run_ctx); migrate_enable(); rcu_read_unlock_trace(); } else { rcu_read_lock(); migrate_disable(); + old_run_ctx = bpf_set_run_ctx(&run_ctx); ret = bpf_prog_run(prog, ctx); + bpf_reset_run_ctx(old_run_ctx); migrate_enable(); rcu_read_unlock(); } From patchwork Wed Aug 17 06:17:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945549 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD57AC25B08 for ; Wed, 17 Aug 2022 06:59:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238685AbiHQG7J (ORCPT ); Wed, 17 Aug 2022 02:59:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233977AbiHQG7F (ORCPT ); Wed, 17 Aug 2022 02:59:05 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 292917548D for ; Tue, 16 Aug 2022 23:59:04 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0RqoK001573 for ; Tue, 16 Aug 2022 23:59:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=6Sd3N0xsVvUYcQdf9po4PyUnPzjitym6YcyiBkA8sQU=; b=bekHfWQf7c0dDiwFJXfxfcayKrvp6ca1FCXxfDuqVBxYKvnpyLe5JA+Zq0d5bNbIJkbo +udaIL5qK0AgD9EAvuduy1aL2cjt/GU82VPgMQtjhq074HKVmPNHSxTIV8U9a+1IZJha qY4kesioWf+9BykMp4H2vb5JrDNgufb4JME= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9haw3-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:59:03 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:59:02 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 66366825DCED; Tue, 16 Aug 2022 23:17:58 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 08/15] bpf: Embed kernel CONFIG check into the if statement in bpf_setsockopt Date: Tue, 16 Aug 2022 23:17:58 -0700 Message-ID: <20220817061758.4178374-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: wad4noaWsPJd4eDIHsgmox8dcjiA-Z8l X-Proofpoint-ORIG-GUID: wad4noaWsPJd4eDIHsgmox8dcjiA-Z8l X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch moves the "#ifdef CONFIG_XXX" check into the "if/else" statement itself. The change is done for the bpf_setsockopt() function only. It will make the latter patches easier to follow without the surrounding ifdef macro. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index e8508aaafd27..a663d7b96bad 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5116,8 +5116,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#ifdef CONFIG_INET - } else if (level == SOL_IP) { + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; @@ -5138,8 +5137,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#if IS_ENABLED(CONFIG_IPV6) - } else if (level == SOL_IPV6) { + } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; @@ -5160,8 +5158,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#endif - } else if (level == SOL_TCP && + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5253,7 +5250,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, ret = -EINVAL; } } -#endif } else { ret = -EINVAL; } From patchwork Wed Aug 17 06:18:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945531 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6406CC25B08 for ; Wed, 17 Aug 2022 06:29:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229746AbiHQG3A (ORCPT ); Wed, 17 Aug 2022 02:29:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229462AbiHQG27 (ORCPT ); Wed, 17 Aug 2022 02:28:59 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A69E31DE6 for ; Tue, 16 Aug 2022 23:28:58 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0UR1D013355 for ; Tue, 16 Aug 2022 23:28:58 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=IpRE4QMT7DpbDi59QcyrlbkiR3lL4fl7L+nqzsfc95U=; b=AtuRiivmOu8Rjsek7xnw9F8dR6kimVh9HhFwInnX1vanpjUQ9d8cU/TzJ1jTwfjzywgN diDrXTYYgOGc1NB++hOnqGJBXC7CGHEcZhtxZw4/CocBLI8qZLcb3eMJa5kagmW37LkR 3g9dJ4UzqoZy+/um/5HVyIZjrqHZUhjthD4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nuds7j0-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:28:58 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:28:57 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 0B87F825DD0D; Tue, 16 Aug 2022 23:18:04 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 09/15] bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt() Date: Tue, 16 Aug 2022 23:18:04 -0700 Message-ID: <20220817061804.4178920-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: y_bS2tEbckPdsf83LLgw91SQ4vvj7JnF X-Proofpoint-ORIG-GUID: y_bS2tEbckPdsf83LLgw91SQ4vvj7JnF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sk_setsockopt(). The sock ptr test is added to the SO_RCVLOWAT because the sk->sk_socket could be NULL in some of the bpf hooks. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/sock.h | 2 + net/core/filter.c | 124 +++++++++++---------------------------------- net/core/sock.c | 6 +-- 3 files changed, 34 insertions(+), 98 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 72b78c2b6f83..b7e159f9d7bf 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1800,6 +1800,8 @@ void sock_pfree(struct sk_buff *skb); #define sock_edemux sock_efree #endif +int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); diff --git a/net/core/filter.c b/net/core/filter.c index a663d7b96bad..6f5bcc8df487 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5013,109 +5013,43 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +static int sol_socket_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + switch (optname) { + case SO_SNDBUF: + case SO_RCVBUF: + case SO_KEEPALIVE: + case SO_PRIORITY: + case SO_REUSEPORT: + case SO_RCVLOWAT: + case SO_MARK: + case SO_MAX_PACING_RATE: + case SO_BINDTOIFINDEX: + case SO_TXREHASH: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case SO_BINDTODEVICE: + break; + default: + return -EINVAL; + } + + return sk_setsockopt(sk, SOL_SOCKET, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - char devname[IFNAMSIZ]; - int val, valbool; - struct net *net; - int ifindex; - int ret = 0; + int val, ret = 0; if (!sk_fullsock(sk)) return -EINVAL; if (level == SOL_SOCKET) { - if (optlen != sizeof(int) && optname != SO_BINDTODEVICE) - return -EINVAL; - val = *((int *)optval); - valbool = val ? 1 : 0; - - /* Only some socketops are supported */ - switch (optname) { - case SO_RCVBUF: - val = min_t(u32, val, sysctl_rmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_RCVBUF_LOCK; - WRITE_ONCE(sk->sk_rcvbuf, - max_t(int, val * 2, SOCK_MIN_RCVBUF)); - break; - case SO_SNDBUF: - val = min_t(u32, val, sysctl_wmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_SNDBUF_LOCK; - WRITE_ONCE(sk->sk_sndbuf, - max_t(int, val * 2, SOCK_MIN_SNDBUF)); - break; - case SO_MAX_PACING_RATE: /* 32bit version */ - if (val != ~0U) - cmpxchg(&sk->sk_pacing_status, - SK_PACING_NONE, - SK_PACING_NEEDED); - sk->sk_max_pacing_rate = (val == ~0U) ? - ~0UL : (unsigned int)val; - sk->sk_pacing_rate = min(sk->sk_pacing_rate, - sk->sk_max_pacing_rate); - break; - case SO_PRIORITY: - sk->sk_priority = val; - break; - case SO_RCVLOWAT: - if (val < 0) - val = INT_MAX; - if (sk->sk_socket && sk->sk_socket->ops->set_rcvlowat) - ret = sk->sk_socket->ops->set_rcvlowat(sk, val); - else - WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); - break; - case SO_MARK: - if (sk->sk_mark != val) { - sk->sk_mark = val; - sk_dst_reset(sk); - } - break; - case SO_BINDTODEVICE: - optlen = min_t(long, optlen, IFNAMSIZ - 1); - strncpy(devname, optval, optlen); - devname[optlen] = 0; - - ifindex = 0; - if (devname[0] != '\0') { - struct net_device *dev; - - ret = -ENODEV; - - net = sock_net(sk); - dev = dev_get_by_name(net, devname); - if (!dev) - break; - ifindex = dev->ifindex; - dev_put(dev); - } - fallthrough; - case SO_BINDTOIFINDEX: - if (optname == SO_BINDTOIFINDEX) - ifindex = val; - ret = sock_bindtoindex(sk, ifindex, false); - break; - case SO_KEEPALIVE: - if (sk->sk_prot->keepalive) - sk->sk_prot->keepalive(sk, valbool); - sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool); - break; - case SO_REUSEPORT: - sk->sk_reuseport = valbool; - break; - case SO_TXREHASH: - if (val < -1 || val > 1) { - ret = -EINVAL; - break; - } - sk->sk_txrehash = (u8)val; - break; - default: - ret = -EINVAL; - } + return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; diff --git a/net/core/sock.c b/net/core/sock.c index 7ea46e4700fd..2a6f84702eb9 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1077,8 +1077,8 @@ EXPORT_SYMBOL(sockopt_capable); * at the socket level. Everything here is generic. */ -static int sk_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct so_timestamping timestamping; struct socket *sock = sk->sk_socket; @@ -1264,7 +1264,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname, case SO_RCVLOWAT: if (val < 0) val = INT_MAX; - if (sock->ops->set_rcvlowat) + if (sock && sock->ops->set_rcvlowat) ret = sock->ops->set_rcvlowat(sk, val); else WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); From patchwork Wed Aug 17 06:18:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945546 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54B9FC25B08 for ; Wed, 17 Aug 2022 06:53:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230228AbiHQGxD (ORCPT ); Wed, 17 Aug 2022 02:53:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229493AbiHQGxC (ORCPT ); Wed, 17 Aug 2022 02:53:02 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD44874E0D for ; Tue, 16 Aug 2022 23:53:01 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0QBaK001306 for ; Tue, 16 Aug 2022 23:53:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=DNQQN563yAZFoTHbkDdn+G55mN0egNmmb+bDRrq/S1M=; b=PUn0VpkSvxD2hztlc7ZOLOqKhBG8XgFO/OALmOsB6+EeQzN27c5qlq3cpzLQOvYjKzau BYnRWjJNYdtTsrHHZCf3VX3wkNqcJHEuAI1uM7ipRH6FOSDBOWuqyj+J7DomVUf0wAwv iWJQQdmGjJPgItI1zbxYam12H1Dsjl682gA= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nsq1abg-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:53:01 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:52:58 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 8CAD7825DD1C; Tue, 16 Aug 2022 23:18:12 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 10/15] bpf: Refactor bpf specific tcp optnames to a new function Date: Tue, 16 Aug 2022 23:18:12 -0700 Message-ID: <20220817061812.4179645-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: qlZ3LDQng0ah4MfRi1WUa4XGA4jqJ-E3 X-Proofpoint-ORIG-GUID: qlZ3LDQng0ah4MfRi1WUa4XGA4jqJ-E3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The patch moves all bpf specific tcp optnames (TCP_BPF_XXX) to a new function bpf_sol_tcp_setsockopt(). This will make the next patch easier to follow. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 79 ++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 29 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 6f5bcc8df487..bb135d456a53 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5040,6 +5040,52 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + struct tcp_sock *tp = tcp_sk(sk); + unsigned long timeout; + int val; + + if (optlen != sizeof(int)) + return -EINVAL; + + val = *(int *)optval; + + /* Only some options are supported */ + switch (optname) { + case TCP_BPF_IW: + if (val <= 0 || tp->data_segs_out > tp->syn_data) + return -EINVAL; + tcp_snd_cwnd_set(tp, val); + break; + case TCP_BPF_SNDCWND_CLAMP: + if (val <= 0) + return -EINVAL; + tp->snd_cwnd_clamp = val; + tp->snd_ssthresh = val; + break; + case TCP_BPF_DELACK_MAX: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_DELACK_MAX || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_delack_max = timeout; + break; + case TCP_BPF_RTO_MIN: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_RTO_MIN || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_rto_min = timeout; + break; + default: + return -EINVAL; + } + + return 0; +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5094,6 +5140,10 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { + if (optname >= TCP_BPF_IW) + return bpf_sol_tcp_setsockopt(sk, optname, + optval, optlen); + if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5104,7 +5154,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } else { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - unsigned long timeout; if (optlen != sizeof(int)) return -EINVAL; @@ -5112,34 +5161,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, val = *((int *)optval); /* Only some options are supported */ switch (optname) { - case TCP_BPF_IW: - if (val <= 0 || tp->data_segs_out > tp->syn_data) - ret = -EINVAL; - else - tcp_snd_cwnd_set(tp, val); - break; - case TCP_BPF_SNDCWND_CLAMP: - if (val <= 0) { - ret = -EINVAL; - } else { - tp->snd_cwnd_clamp = val; - tp->snd_ssthresh = val; - } - break; - case TCP_BPF_DELACK_MAX: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_DELACK_MAX || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_delack_max = timeout; - break; - case TCP_BPF_RTO_MIN: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_RTO_MIN || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_rto_min = timeout; - break; case TCP_SAVE_SYN: if (val < 0 || val > 1) ret = -EINVAL; From patchwork Wed Aug 17 06:18:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945559 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28776C25B08 for ; Wed, 17 Aug 2022 07:11:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238842AbiHQHLR (ORCPT ); Wed, 17 Aug 2022 03:11:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238838AbiHQHLI (ORCPT ); Wed, 17 Aug 2022 03:11:08 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF0605F106 for ; Wed, 17 Aug 2022 00:11:06 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27H0X23N028988 for ; Wed, 17 Aug 2022 00:11:06 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=9Wpd+EP0f4WvrEOEIxmjFxCt/Ic9sj2fpK/gO0U/XtQ=; b=Hs7QiNhJqb7Jt3Ds3rIK6MBt+ll3uyRDA1yVvI3RyhtxKSHFIDLKm3ITrqL+OvK8/nLf KTJ+66Lmhfukz7pSzmW4yX9CX0iGSwlX5LG3O1NeUiHb80b1xZSRMZXxiEclk6pAkVIf A6FS1qhmK4YY7f4gyYr1ErhZY7FBGCUdzPQ= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0001303.ppops.net (PPS) with ESMTPS id 3j0nvjhc45-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 17 Aug 2022 00:11:06 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 17 Aug 2022 00:11:05 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id D909E825DD58; Tue, 16 Aug 2022 23:18:19 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 11/15] bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt() Date: Tue, 16 Aug 2022 23:18:19 -0700 Message-ID: <20220817061819.4180146-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 1CbLneUIEbNmLs_AY4SE5DcG0tjBoNtt X-Proofpoint-ORIG-GUID: 1CbLneUIEbNmLs_AY4SE5DcG0tjBoNtt X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes all the dup code from bpf_setsockopt(SOL_TCP) and reuses the do_tcp_setsockopt(). The existing optname white-list is refactored into a new function sol_tcp_setsockopt(). The sol_tcp_setsockopt() also calls the bpf_sol_tcp_setsockopt() to handle the TCP_BPF_XXX specific optnames. bpf_setsockopt(TCP_SAVE_SYN) now also allows a value 2 to save the eth header also and it comes for free from do_tcp_setsockopt(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/tcp.h | 2 + net/core/filter.c | 97 +++++++++++++++-------------------------------- net/ipv4/tcp.c | 4 +- 3 files changed, 34 insertions(+), 69 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index d10962b9f0d0..c03a50c72f40 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -405,6 +405,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); void tcp_set_keepalive(struct sock *sk, int val); diff --git a/net/core/filter.c b/net/core/filter.c index bb135d456a53..66877605bb78 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5086,6 +5086,34 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, return 0; } +static int sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_prot->setsockopt != tcp_setsockopt) + return -EINVAL; + + switch (optname) { + case TCP_KEEPIDLE: + case TCP_KEEPINTVL: + case TCP_KEEPCNT: + case TCP_SYNCNT: + case TCP_WINDOW_CLAMP: + case TCP_USER_TIMEOUT: + case TCP_NOTSENT_LOWAT: + case TCP_SAVE_SYN: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case TCP_CONGESTION: + break; + default: + return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen); + } + + return do_tcp_setsockopt(sk, SOL_TCP, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5138,73 +5166,8 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && - sk->sk_prot->setsockopt == tcp_setsockopt) { - if (optname >= TCP_BPF_IW) - return bpf_sol_tcp_setsockopt(sk, optname, - optval, optlen); - - if (optname == TCP_CONGESTION) { - char name[TCP_CA_NAME_MAX]; - - strncpy(name, optval, min_t(long, optlen, - TCP_CA_NAME_MAX-1)); - name[TCP_CA_NAME_MAX-1] = 0; - ret = tcp_set_congestion_control(sk, name, false, true); - } else { - struct inet_connection_sock *icsk = inet_csk(sk); - struct tcp_sock *tp = tcp_sk(sk); - - if (optlen != sizeof(int)) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case TCP_SAVE_SYN: - if (val < 0 || val > 1) - ret = -EINVAL; - else - tp->save_syn = val; - break; - case TCP_KEEPIDLE: - ret = tcp_sock_set_keepidle_locked(sk, val); - break; - case TCP_KEEPINTVL: - if (val < 1 || val > MAX_TCP_KEEPINTVL) - ret = -EINVAL; - else - tp->keepalive_intvl = val * HZ; - break; - case TCP_KEEPCNT: - if (val < 1 || val > MAX_TCP_KEEPCNT) - ret = -EINVAL; - else - tp->keepalive_probes = val; - break; - case TCP_SYNCNT: - if (val < 1 || val > MAX_TCP_SYNCNT) - ret = -EINVAL; - else - icsk->icsk_syn_retries = val; - break; - case TCP_USER_TIMEOUT: - if (val < 0) - ret = -EINVAL; - else - icsk->icsk_user_timeout = val; - break; - case TCP_NOTSENT_LOWAT: - tp->notsent_lowat = val; - sk->sk_write_space(sk); - break; - case TCP_WINDOW_CLAMP: - ret = tcp_set_window_clamp(sk, val); - break; - default: - ret = -EINVAL; - } - } + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + return sol_tcp_setsockopt(sk, optname, optval, optlen); } else { ret = -EINVAL; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cfed84b1883f..a6986f201f92 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3479,8 +3479,8 @@ int tcp_set_window_clamp(struct sock *sk, int val) /* * Socket option code for TCP. */ -static int do_tcp_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); From patchwork Wed Aug 17 06:18:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945547 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12423C25B08 for ; Wed, 17 Aug 2022 06:53:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229493AbiHQGxF (ORCPT ); Wed, 17 Aug 2022 02:53:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230272AbiHQGxE (ORCPT ); Wed, 17 Aug 2022 02:53:04 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 19B1474E0D for ; Tue, 16 Aug 2022 23:53:04 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0J8tx007698 for ; Tue, 16 Aug 2022 23:53:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=b9+a8J4nTZQThAx5/Ev65A2t+7Y1GdHEU2hKfWkpFnk=; b=AY9h0JGMST3I7nylTlD00dBCDpAV1LiTvQFTCo+cMW1NI0uCyTA0RqOQz2wNucEVVFve ZsgkC36D0F+6B/GQQaWdvwiP5L5mQEG8IbeDnspb7JYbtocZWi1VNftEMRgJKZMht7on Ti+L6FgxbFcYr4/sNgXDLCp9JOR0Viha67Q= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0np59cad-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:53:03 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:53:02 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id CBE07825DDCF; Tue, 16 Aug 2022 23:18:26 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 12/15] bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt() Date: Tue, 16 Aug 2022 23:18:26 -0700 Message-ID: <20220817061826.4180990-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: l0sYvHiQ4Cxu00wMG4LREiGF48Kzov_v X-Proofpoint-ORIG-GUID: l0sYvHiQ4Cxu00wMG4LREiGF48Kzov_v X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IP) and reuses the implementation in do_ip_setsockopt(). The existing optname white-list is refactored into a new function sol_ip_setsockopt(). NOTE, the current bpf_setsockopt(IP_TOS) is quite different from the the do_ip_setsockopt(IP_TOS). For example, it does not take the INET_ECN_MASK into the account for tcp and also does not adjust sk->sk_priority. It looks like the current bpf_setsockopt(IP_TOS) was referencing the IPV6_TCLASS implementation instead of IP_TOS. This patch tries to rectify that by using the do_ip_setsockopt(IP_TOS). While this is a behavior change, the do_ip_setsockopt(IP_TOS) behavior is arguably what the user is expecting. At least, the INET_ECN_MASK bits should be masked out for tcp. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/ip.h | 2 ++ net/core/filter.c | 40 ++++++++++++++++++++-------------------- net/ipv4/ip_sockglue.c | 4 ++-- 3 files changed, 24 insertions(+), 22 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 1c979fd1904c..34fa5b0f0a0e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -743,6 +743,8 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk, int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, bool allow_ipv6); DECLARE_STATIC_KEY_FALSE(ip4_min_ttl); +int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, diff --git a/net/core/filter.c b/net/core/filter.c index 66877605bb78..4d1b42b8f4a8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5114,6 +5114,25 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int sol_ip_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET) + return -EINVAL; + + switch (optname) { + case IP_TOS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return do_ip_setsockopt(sk, SOL_IP, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5125,26 +5144,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, if (level == SOL_SOCKET) { return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IP_TOS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct inet_sock *inet = inet_sk(sk); - - if (val == -1) - val = 0; - inet->tos = val; - } - break; - default: - ret = -EINVAL; - } + return sol_ip_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a3c496580e6b..751fa69cb557 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -888,8 +888,8 @@ static int compat_ip_mcast_join_leave(struct sock *sk, int optname, DEFINE_STATIC_KEY_FALSE(ip4_min_ttl); -static int do_ip_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ip_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct inet_sock *inet = inet_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Aug 17 06:18:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945540 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77C8CC25B08 for ; Wed, 17 Aug 2022 06:41:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229721AbiHQGlC (ORCPT ); Wed, 17 Aug 2022 02:41:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229689AbiHQGlB (ORCPT ); Wed, 17 Aug 2022 02:41:01 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B2B64D4C4 for ; Tue, 16 Aug 2022 23:40:59 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27H0KMwk005289 for ; Tue, 16 Aug 2022 23:40:59 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=7zuZoJ5JteKK757R5i1xlPN4aMbs7xzbh+2YYd0r4vI=; b=Ro7/akeEIY2atexLvCGa7uTtf91xPkI+d8iIOUd6fcuuP5YzhlmiBjOQs8GobOaaCNM/ QaiSnSchsPQy2IyHHJgw3NRsIyHQBY6C/DqoDwx6H1nshyyqLIEzFwFi7iv26wJV0HJ/ 08Dt4neGNlE0EpNY5ZEc3HLJxXuCOVDn7wM= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3j0npgsagg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:40:59 -0700 Received: from twshared5413.23.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:40:58 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id E94DE825DDDE; Tue, 16 Aug 2022 23:18:34 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 13/15] bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt() Date: Tue, 16 Aug 2022 23:18:34 -0700 Message-ID: <20220817061834.4181198-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: RQIZW6mi5XLKpRxNXpUAVPecm-GWnsVq X-Proofpoint-ORIG-GUID: RQIZW6mi5XLKpRxNXpUAVPecm-GWnsVq X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IPV6) and reuses the implementation in do_ipv6_setsockopt(). ipv6 could be compiled as a module. Like how other code solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_setsockopt to the ipv6_bpf_stub. The current bpf_setsockopt(IPV6_TCLASS) does not take the INET_ECN_MASK into the account for tcp. The do_ipv6_setsockopt(IPV6_TCLASS) will handle it correctly. The existing optname white-list is refactored into a new function sol_ipv6_setsockopt(). After this last SOL_IPV6 dup code removal, the __bpf_setsockopt() is simplified enough that the extra "{ }" around the if statement can be removed. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- include/net/ipv6.h | 2 ++ include/net/ipv6_stubs.h | 2 ++ net/core/filter.c | 56 +++++++++++++++++++--------------------- net/ipv6/af_inet6.c | 1 + net/ipv6/ipv6_sockglue.c | 4 +-- 5 files changed, 33 insertions(+), 32 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index de9dcc5652c4..c110d9032083 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1156,6 +1156,8 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6, */ DECLARE_STATIC_KEY_FALSE(ip6_min_hopcount); +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ipv6_getsockopt(struct sock *sk, int level, int optname, diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h index 45e0339be6fa..8692698b01cf 100644 --- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -81,6 +81,8 @@ struct ipv6_bpf_stub { const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, struct sk_buff *skb); + int (*ipv6_setsockopt)(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; diff --git a/net/core/filter.c b/net/core/filter.c index 4d1b42b8f4a8..23282b8cf61e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5133,45 +5133,41 @@ static int sol_ip_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR(optval), optlen); } +static int sol_ipv6_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET6) + return -EINVAL; + + switch (optname) { + case IPV6_TCLASS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return ipv6_bpf_stub->ipv6_setsockopt(sk, SOL_IPV6, optname, + KERNEL_SOCKPTR(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - int val, ret = 0; - if (!sk_fullsock(sk)) return -EINVAL; - if (level == SOL_SOCKET) { + if (level == SOL_SOCKET) return sol_socket_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { + else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) return sol_ip_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET6) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IPV6_TCLASS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct ipv6_pinfo *np = inet6_sk(sk); - - if (val == -1) - val = 0; - np->tclass = val; - } - break; - default: - ret = -EINVAL; - } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) + return sol_ipv6_setsockopt(sk, optname, optval, optlen); + else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) return sol_tcp_setsockopt(sk, optname, optval, optlen); - } else { - ret = -EINVAL; - } - return ret; + + return -EINVAL; } static int _bpf_setsockopt(struct sock *sk, int level, int optname, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 2ce0c44d0081..cadc97852787 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1057,6 +1057,7 @@ static const struct ipv6_stub ipv6_stub_impl = { static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, + .ipv6_setsockopt = do_ipv6_setsockopt, }; static int __init inet6_init(void) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index d23de48ff612..a4535bdbd310 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -391,8 +391,8 @@ static int ipv6_set_opt_hdr(struct sock *sk, int optname, sockptr_t optval, return err; } -static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct ipv6_pinfo *np = inet6_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Aug 17 06:18:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945532 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6980C25B08 for ; Wed, 17 Aug 2022 06:29:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229462AbiHQG3K (ORCPT ); Wed, 17 Aug 2022 02:29:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39264 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230187AbiHQG3J (ORCPT ); Wed, 17 Aug 2022 02:29:09 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 911BD31DE6 for ; Tue, 16 Aug 2022 23:29:07 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0QJWN017706 for ; Tue, 16 Aug 2022 23:29:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=T73Kh4riqgtkXF0ouQn+38NEgIX47ntvMOV6xn6a1Mc=; b=ILTmugEwW5rWJdaljB9PVsSQuJfnZZn650Iup6GGA2nYAGkKsZTpBiM4l8H4znBb+ULf p/Aw8ee28VMkMHOo9onr5fgHxjoIeM5tqrR2qs1vbegO9ghqQBxMX6W4uBieTbWA7KdF PBzP4FsfE9GsE/dhGyzNz5OD9yln1Bm4Rqk= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nsjh7m6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 16 Aug 2022 23:29:07 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 16 Aug 2022 23:29:05 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 93F03825DE1B; Tue, 16 Aug 2022 23:18:41 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 14/15] bpf: Add a few optnames to bpf_setsockopt Date: Tue, 16 Aug 2022 23:18:41 -0700 Message-ID: <20220817061841.4181642-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: UdRDRfZxnJt8mZ5vJni0GfgQF9_9OvqL X-Proofpoint-ORIG-GUID: UdRDRfZxnJt8mZ5vJni0GfgQF9_9OvqL X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds a few optnames for bpf_setsockopt: SO_REUSEADDR, IPV6_AUTOFLOWLABEL, TCP_MAXSEG, TCP_NODELAY, and TCP_THIN_LINEAR_TIMEOUTS. Thanks to the previous patches of this set, all additions can reuse the sk_setsockopt(), do_ipv6_setsockopt(), and do_tcp_setsockopt(). The only change here is to allow them in bpf_setsockopt. The bpf prog has been able to read all members of a sk by using PTR_TO_BTF_ID of a sk. The optname additions here can also be read by the same approach. Meaning there is a way to read the values back. These optnames can also be added to bpf_getsockopt() later with another patch set that makes the bpf_getsockopt() to reuse the sock_getsockopt(), tcp_getsockopt(), and ip[v6]_getsockopt(). Thus, this patch does not add more duplicated code to bpf_getsockopt() now. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index 23282b8cf61e..1acfaffeaf32 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5017,6 +5017,7 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, char *optval, int optlen) { switch (optname) { + case SO_REUSEADDR: case SO_SNDBUF: case SO_RCVBUF: case SO_KEEPALIVE: @@ -5093,11 +5094,14 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, return -EINVAL; switch (optname) { + case TCP_NODELAY: + case TCP_MAXSEG: case TCP_KEEPIDLE: case TCP_KEEPINTVL: case TCP_KEEPCNT: case TCP_SYNCNT: case TCP_WINDOW_CLAMP: + case TCP_THIN_LINEAR_TIMEOUTS: case TCP_USER_TIMEOUT: case TCP_NOTSENT_LOWAT: case TCP_SAVE_SYN: @@ -5141,6 +5145,7 @@ static int sol_ipv6_setsockopt(struct sock *sk, int optname, switch (optname) { case IPV6_TCLASS: + case IPV6_AUTOFLOWLABEL: if (optlen != sizeof(int)) return -EINVAL; break; From patchwork Wed Aug 17 06:18:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12945557 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBEEAC25B08 for ; Wed, 17 Aug 2022 07:05:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231238AbiHQHFS (ORCPT ); Wed, 17 Aug 2022 03:05:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229688AbiHQHFR (ORCPT ); Wed, 17 Aug 2022 03:05:17 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA58E5722F for ; Wed, 17 Aug 2022 00:05:15 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27H0SCF7001893 for ; Wed, 17 Aug 2022 00:05:15 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=Hkzx1vB93qRPMpqIxhR0perjjQmQy778KyyYs5JcfuE=; b=aJJ+y67ZvtkPWAzYf+Ji+WmcRv+8+zp5gvDgDI0HcKEDbWCCG7gCJK66+e4q19B2K7Il xRwMmC7LDoL/o2j+omdFwheC03scd8pI0Mpp7K/g5GFj2VEceNzc8fVUPN3WE0Q3W5zK 341hsms94rCBYOeIRj7DQ8FmjGyKd4KK2tU= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j0nt9hbtc-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 17 Aug 2022 00:05:15 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 17 Aug 2022 00:05:04 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id DF8BB825DE4D; Tue, 16 Aug 2022 23:18:47 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Stanislav Fomichev Subject: [PATCH v4 bpf-next 15/15] selftests/bpf: bpf_setsockopt tests Date: Tue, 16 Aug 2022 23:18:47 -0700 Message-ID: <20220817061847.4182339-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220817061704.4174272-1-kafai@fb.com> References: <20220817061704.4174272-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 1sfs9sL7sIxCbeT-07o0OVdAT0NAZUKG X-Proofpoint-ORIG-GUID: 1sfs9sL7sIxCbeT-07o0OVdAT0NAZUKG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-17_04,2022-08-16_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds tests to exercise optnames that are allowed in bpf_setsockopt(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/setget_sockopt.c | 125 +++++ .../selftests/bpf/progs/bpf_tracing_net.h | 31 +- .../selftests/bpf/progs/setget_sockopt.c | 451 ++++++++++++++++++ 3 files changed, 606 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/setget_sockopt.c create mode 100644 tools/testing/selftests/bpf/progs/setget_sockopt.c diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c new file mode 100644 index 000000000000..018611e6b248 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c @@ -0,0 +1,125 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#define _GNU_SOURCE +#include +#include +#include + +#include "test_progs.h" +#include "cgroup_helpers.h" +#include "network_helpers.h" + +#include "setget_sockopt.skel.h" + +#define CG_NAME "/setget-sockopt-test" + +static const char addr4_str[] = "127.0.0.1"; +static const char addr6_str[] = "::1"; +static struct setget_sockopt *skel; +static int cg_fd; + +static int create_netns(void) +{ + if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns")) + return -1; + + if (!ASSERT_OK(system("ip link set dev lo up"), "set lo up")) + return -1; + + if (!ASSERT_OK(system("ip link add dev binddevtest1 type veth peer name binddevtest2"), + "add veth")) + return -1; + + if (!ASSERT_OK(system("ip link set dev binddevtest1 up"), + "bring veth up")) + return -1; + + return 0; +} + +static void test_tcp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd, cfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_STREAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + + cfd = connect_to_fd(sfd, 0); + if (!ASSERT_GE(cfd, 0, "connect_to_fd_server")) { + close(sfd); + return; + } + close(sfd); + close(cfd); + + ASSERT_EQ(bss->nr_listen, 1, "nr_listen"); + ASSERT_EQ(bss->nr_connect, 1, "nr_connect"); + ASSERT_EQ(bss->nr_active, 1, "nr_active"); + ASSERT_EQ(bss->nr_passive, 1, "nr_passive"); + ASSERT_EQ(bss->nr_socket_post_create, 2, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 2, "nr_bind"); +} + +static void test_udp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_DGRAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + close(sfd); + + ASSERT_GE(bss->nr_socket_post_create, 1, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 1, "nr_bind"); +} + +void test_setget_sockopt(void) +{ + cg_fd = test__join_cgroup(CG_NAME); + if (cg_fd < 0) + return; + + if (create_netns()) + goto done; + + skel = setget_sockopt__open(); + if (!ASSERT_OK_PTR(skel, "open skel")) + goto done; + + strcpy(skel->rodata->veth, "binddevtest1"); + skel->rodata->veth_ifindex = if_nametoindex("binddevtest1"); + if (!ASSERT_GT(skel->rodata->veth_ifindex, 0, "if_nametoindex")) + goto done; + + if (!ASSERT_OK(setget_sockopt__load(skel), "load skel")) + goto done; + + skel->links.skops_sockopt = + bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd); + if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup")) + goto done; + + skel->links.socket_post_create = + bpf_program__attach_cgroup(skel->progs.socket_post_create, cg_fd); + if (!ASSERT_OK_PTR(skel->links.socket_post_create, "attach_cgroup")) + goto done; + + test_tcp(AF_INET6); + test_tcp(AF_INET); + test_udp(AF_INET6); + test_udp(AF_INET); + +done: + setget_sockopt__destroy(skel); + close(cg_fd); +} diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h index 98dd2c4815f0..5ebc6dabef84 100644 --- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h +++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h @@ -6,13 +6,40 @@ #define AF_INET6 10 #define SOL_SOCKET 1 +#define SO_REUSEADDR 2 #define SO_SNDBUF 7 -#define __SO_ACCEPTCON (1 << 16) +#define SO_RCVBUF 8 +#define SO_KEEPALIVE 9 #define SO_PRIORITY 12 +#define SO_REUSEPORT 15 +#define SO_RCVLOWAT 18 +#define SO_BINDTODEVICE 25 +#define SO_MARK 36 +#define SO_MAX_PACING_RATE 47 +#define SO_BINDTOIFINDEX 62 +#define SO_TXREHASH 74 +#define __SO_ACCEPTCON (1 << 16) + +#define IP_TOS 1 + +#define IPV6_TCLASS 67 +#define IPV6_AUTOFLOWLABEL 70 #define SOL_TCP 6 +#define TCP_NODELAY 1 +#define TCP_MAXSEG 2 +#define TCP_KEEPIDLE 4 +#define TCP_KEEPINTVL 5 +#define TCP_KEEPCNT 6 +#define TCP_SYNCNT 7 +#define TCP_WINDOW_CLAMP 10 #define TCP_CONGESTION 13 +#define TCP_THIN_LINEAR_TIMEOUTS 16 +#define TCP_USER_TIMEOUT 18 +#define TCP_NOTSENT_LOWAT 25 +#define TCP_SAVE_SYN 27 #define TCP_CA_NAME_MAX 16 +#define TCP_NAGLE_OFF 1 #define ICSK_TIME_RETRANS 1 #define ICSK_TIME_PROBE0 3 @@ -49,6 +76,8 @@ #define sk_state __sk_common.skc_state #define sk_v6_daddr __sk_common.skc_v6_daddr #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr +#define sk_flags __sk_common.skc_flags +#define sk_reuse __sk_common.skc_reuse #define s6_addr32 in6_u.u6_addr32 diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c new file mode 100644 index 000000000000..4a4cb44a4a15 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c @@ -0,0 +1,451 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include "bpf_tracing_net.h" +#include +#include +#include + +#ifndef ARRAY_SIZE +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) +#endif + +extern unsigned long CONFIG_HZ __kconfig; + +const volatile char veth[IFNAMSIZ]; +const volatile int veth_ifindex; + +int nr_listen; +int nr_passive; +int nr_active; +int nr_connect; +int nr_binddev; +int nr_socket_post_create; + +struct sockopt_test { + int opt; + int new; + int restore; + int expected; + int tcp_expected; + unsigned int flip:1; +}; + +static const char cubic_cc[] = "cubic"; +static const char reno_cc[] = "reno"; + +static const struct sockopt_test sol_socket_tests[] = { + { .opt = SO_REUSEADDR, .flip = 1, }, + { .opt = SO_SNDBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_RCVBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_KEEPALIVE, .flip = 1, }, + { .opt = SO_PRIORITY, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_REUSEPORT, .flip = 1, }, + { .opt = SO_RCVLOWAT, .new = 8123, .expected = 8123, }, + { .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_TXREHASH, .flip = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_tcp_tests[] = { + { .opt = TCP_NODELAY, .flip = 1, }, + { .opt = TCP_MAXSEG, .new = 1314, .expected = 1314, }, + { .opt = TCP_KEEPIDLE, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPINTVL, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_SYNCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_WINDOW_CLAMP, .new = 8123, .expected = 8123, .restore = 8124, }, + { .opt = TCP_CONGESTION, }, + { .opt = TCP_THIN_LINEAR_TIMEOUTS, .flip = 1, }, + { .opt = TCP_USER_TIMEOUT, .new = 123400, .expected = 123400, }, + { .opt = TCP_NOTSENT_LOWAT, .new = 1314, .expected = 1314, }, + { .opt = TCP_SAVE_SYN, .new = 1, .expected = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ip_tests[] = { + { .opt = IP_TOS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ipv6_tests[] = { + { .opt = IPV6_TCLASS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = IPV6_AUTOFLOWLABEL, .flip = 1, }, + { .opt = 0, }, +}; + +struct loop_ctx { + void *ctx; + struct sock *sk; +}; + +static int __bpf_getsockopt(void *ctx, struct sock *sk, + int level, int opt, int *optval, + int optlen) +{ + if (level == SOL_SOCKET) { + switch (opt) { + case SO_REUSEADDR: + *optval = !!BPF_CORE_READ_BITFIELD(sk, sk_reuse); + break; + case SO_KEEPALIVE: + *optval = !!(sk->sk_flags & (1UL << 3)); + break; + case SO_RCVLOWAT: + *optval = sk->sk_rcvlowat; + break; + case SO_MAX_PACING_RATE: + *optval = sk->sk_max_pacing_rate; + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_TCP) { + struct tcp_sock *tp = bpf_skc_to_tcp_sock(sk); + + if (!tp) + return -1; + + switch (opt) { + case TCP_NODELAY: + *optval = !!(BPF_CORE_READ_BITFIELD(tp, nonagle) & TCP_NAGLE_OFF); + break; + case TCP_MAXSEG: + *optval = tp->rx_opt.user_mss; + break; + case TCP_KEEPIDLE: + *optval = tp->keepalive_time / CONFIG_HZ; + break; + case TCP_SYNCNT: + *optval = tp->inet_conn.icsk_syn_retries; + break; + case TCP_KEEPINTVL: + *optval = tp->keepalive_intvl / CONFIG_HZ; + break; + case TCP_KEEPCNT: + *optval = tp->keepalive_probes; + break; + case TCP_WINDOW_CLAMP: + *optval = tp->window_clamp; + break; + case TCP_THIN_LINEAR_TIMEOUTS: + *optval = !!BPF_CORE_READ_BITFIELD(tp, thin_lto); + break; + case TCP_USER_TIMEOUT: + *optval = tp->inet_conn.icsk_user_timeout; + break; + case TCP_NOTSENT_LOWAT: + *optval = tp->notsent_lowat; + break; + case TCP_SAVE_SYN: + *optval = BPF_CORE_READ_BITFIELD(tp, save_syn); + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_IPV6) { + switch (opt) { + case IPV6_AUTOFLOWLABEL: { + __u16 proto = sk->sk_protocol; + struct inet_sock *inet_sk; + + if (proto == IPPROTO_TCP) + inet_sk = (struct inet_sock *)bpf_skc_to_tcp_sock(sk); + else + inet_sk = (struct inet_sock *)bpf_skc_to_udp6_sock(sk); + + if (!inet_sk) + return -1; + + *optval = !!inet_sk->pinet6->autoflowlabel; + break; + } + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + return bpf_getsockopt(ctx, level, opt, optval, optlen); +} + +static int bpf_test_sockopt_flip(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, opt = t->opt; + + opt = t->opt; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old))) + return 1; + /* kernel initialized txrehash to 255 */ + if (level == SOL_SOCKET && opt == SO_TXREHASH && old != 0 && old != 1) + old = 1; + + new = !old; + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_sockopt_int(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, expected, opt; + + opt = t->opt; + new = t->new; + if (sk->sk_type == SOCK_STREAM && t->tcp_expected) + expected = t->tcp_expected; + else + expected = t->expected; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old)) || + old == new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != expected) + return 1; + + if (t->restore) + old = t->restore; + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_socket_tests)) + return 1; + + t = &sol_socket_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, SOL_SOCKET); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET); +} + +static int bpf_test_ip_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ip_tests)) + return 1; + + t = &sol_ip_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IP); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IP); +} + +static int bpf_test_ipv6_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ipv6_tests)) + return 1; + + t = &sol_ipv6_tests[i]; + if (!t->opt) + return 1; + + if (t->flip) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IPV6); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IPV6); +} + +static int bpf_test_tcp_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + struct sock *sk; + void *ctx; + + if (i >= ARRAY_SIZE(sol_tcp_tests)) + return 1; + + t = &sol_tcp_tests[i]; + if (!t->opt) + return 1; + + ctx = lc->ctx; + sk = lc->sk; + + if (t->opt == TCP_CONGESTION) { + char old_cc[16], tmp_cc[16]; + const char *new_cc; + + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + if (!bpf_strncmp(old_cc, sizeof(old_cc), cubic_cc)) + new_cc = reno_cc; + else + new_cc = cubic_cc; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, (void *)new_cc, + sizeof(new_cc))) + return 1; + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, tmp_cc, sizeof(tmp_cc))) + return 1; + if (bpf_strncmp(tmp_cc, sizeof(tmp_cc), new_cc)) + return 1; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + return 0; + } + + if (t->flip) + return bpf_test_sockopt_flip(ctx, sk, t, IPPROTO_TCP); + + return bpf_test_sockopt_int(ctx, sk, t, IPPROTO_TCP); +} + +static int bpf_test_sockopt(void *ctx, struct sock *sk) +{ + struct loop_ctx lc = { .ctx = ctx, .sk = sk, }; + __u16 family, proto; + int n; + + family = sk->sk_family; + proto = sk->sk_protocol; + + n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_socket_tests)) + return -1; + + if (proto == IPPROTO_TCP) { + n = bpf_loop(ARRAY_SIZE(sol_tcp_tests), bpf_test_tcp_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_tcp_tests)) + return -1; + } + + if (family == AF_INET) { + n = bpf_loop(ARRAY_SIZE(sol_ip_tests), bpf_test_ip_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ip_tests)) + return -1; + } else { + n = bpf_loop(ARRAY_SIZE(sol_ipv6_tests), bpf_test_ipv6_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ipv6_tests)) + return -1; + } + + return 0; +} + +static int binddev_test(void *ctx) +{ + const char empty_ifname[] = ""; + int ifindex, zero = 0; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)veth, sizeof(veth))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)empty_ifname, sizeof(empty_ifname))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + (void *)&veth_ifindex, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &zero, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + return 0; +} + +SEC("lsm_cgroup/socket_post_create") +int BPF_PROG(socket_post_create, struct socket *sock, int family, + int type, int protocol, int kern) +{ + struct sock *sk = sock->sk; + + if (!sk) + return 1; + + nr_socket_post_create += !bpf_test_sockopt(sk, sk); + nr_binddev += !binddev_test(sk); + + return 1; +} + +SEC("sockops") +int skops_sockopt(struct bpf_sock_ops *skops) +{ + struct bpf_sock *bpf_sk = skops->sk; + struct sock *sk; + + if (!bpf_sk) + return 1; + + sk = (struct sock *)bpf_skc_to_tcp_sock(bpf_sk); + if (!sk) + return 1; + + switch (skops->op) { + case BPF_SOCK_OPS_TCP_LISTEN_CB: + nr_listen += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_TCP_CONNECT_CB: + nr_connect += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: + nr_active += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: + nr_passive += !bpf_test_sockopt(skops, sk); + break; + } + + return 1; +} + +char _license[] SEC("license") = "GPL";