From patchwork Wed Jul 27 06:09:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930088 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8981FC19F28 for ; Wed, 27 Jul 2022 06:09:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229924AbiG0GJO (ORCPT ); Wed, 27 Jul 2022 02:09:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230169AbiG0GJJ (ORCPT ); Wed, 27 Jul 2022 02:09:09 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A79ED3FA0B for ; Tue, 26 Jul 2022 23:09:08 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND46A005163 for ; Tue, 26 Jul 2022 23:09:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=9MTYNo4YZsw9yi69NoNQIlU/M5aVblUYD9ww+V4slBs=; b=YR96TdFKlATzOSumbE132pLFjeLX19y9Rubkqi/TyL131Ue9K14BCJuicxzvsGlvOrNn 6eCQEg2FegWX4qXvJypjQMZtxAtYRBndP8VChNI6DSsoWX3qWcDc4CL+/sujhe0rj79m PPnkJ89twxn6sNK9LfEbVu4Wrm6bA2iPBpQ= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjhxaw32d-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:08 -0700 Received: from twshared34609.14.frc2.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:07 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id DB345757CB54; Tue, 26 Jul 2022 23:09:02 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 01/14] net: Change sock_setsockopt from taking sock ptr to sk ptr Date: Tue, 26 Jul 2022 23:09:02 -0700 Message-ID: <20220727060902.2370689-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: xbkuGCwJ4YFEDSrRiSyLXtRvHr0qGwJj X-Proofpoint-GUID: xbkuGCwJ4YFEDSrRiSyLXtRvHr0qGwJj X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the sock_setsockopt() to avoid code duplication and code drift between the two duplicates. The current sock_setsockopt() takes sock ptr as the argument. The very first thing of this function is to get back the sk ptr by 'sk = sock->sk'. bpf_setsockopt() could be called when the sk does not have a userspace owner. Meaning sk->sk_socket is NULL. For example, when a passive tcp connection has just been established. Thus, it cannot use the sock_setsockopt(sk->sk_socket) or else it will pass a NULL sock ptr. All existing callers have both sock->sk and sk->sk_socket pointer. Thus, this patch changes the sock_setsockopt() to take a sk ptr instead of the sock ptr. The bpf_setsockopt() only allows optnames that do not require a sock ptr. Signed-off-by: Martin KaFai Lau --- drivers/nvme/host/tcp.c | 2 +- fs/ksmbd/transport_tcp.c | 2 +- include/net/sock.h | 2 +- net/core/sock.c | 4 ++-- net/mptcp/sockopt.c | 12 ++++++------ net/socket.c | 2 +- 6 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 7a9e6ffa2342..60e14cc39e49 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1555,7 +1555,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, char *iface = nctrl->opts->host_iface; sockptr_t optval = KERNEL_SOCKPTR(iface); - ret = sock_setsockopt(queue->sock, SOL_SOCKET, SO_BINDTODEVICE, + ret = sock_setsockopt(queue->sock->sk, SOL_SOCKET, SO_BINDTODEVICE, optval, strlen(iface)); if (ret) { dev_err(nctrl->device, diff --git a/fs/ksmbd/transport_tcp.c b/fs/ksmbd/transport_tcp.c index 143bba4e4db8..982eed2dd575 100644 --- a/fs/ksmbd/transport_tcp.c +++ b/fs/ksmbd/transport_tcp.c @@ -420,7 +420,7 @@ static int create_socket(struct interface *iface) ksmbd_tcp_nodelay(ksmbd_socket); ksmbd_tcp_reuseaddr(ksmbd_socket); - ret = sock_setsockopt(ksmbd_socket, + ret = sock_setsockopt(ksmbd_socket->sk, SOL_SOCKET, SO_BINDTODEVICE, KERNEL_SOCKPTR(iface->name), diff --git a/include/net/sock.h b/include/net/sock.h index f7ad1a7705e9..9e2539dcc293 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1795,7 +1795,7 @@ void sock_pfree(struct sk_buff *skb); #define sock_edemux sock_efree #endif -int sock_setsockopt(struct socket *sock, int level, int op, +int sock_setsockopt(struct sock *sk, int level, int op, sockptr_t optval, unsigned int optlen); int sock_getsockopt(struct socket *sock, int level, int op, diff --git a/net/core/sock.c b/net/core/sock.c index 4cb957d934a2..18bb4f269cf1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1041,12 +1041,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes) * at the socket level. Everything here is generic. */ -int sock_setsockopt(struct socket *sock, int level, int optname, +int sock_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen) { struct so_timestamping timestamping; + struct socket *sock = sk->sk_socket; struct sock_txtime sk_txtime; - struct sock *sk = sock->sk; int val; int valbool; struct linger ling; diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 423d3826ca1e..5684499b4d39 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -124,7 +124,7 @@ static int mptcp_sol_socket_intval(struct mptcp_sock *msk, int optname, int val) struct sock *sk = (struct sock *)msk; int ret; - ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, + ret = sock_setsockopt(sk, SOL_SOCKET, optname, optval, sizeof(val)); if (ret) return ret; @@ -149,7 +149,7 @@ static int mptcp_setsockopt_sol_socket_tstamp(struct mptcp_sock *msk, int optnam struct sock *sk = (struct sock *)msk; int ret; - ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, + ret = sock_setsockopt(sk, SOL_SOCKET, optname, optval, sizeof(val)); if (ret) return ret; @@ -225,7 +225,7 @@ static int mptcp_setsockopt_sol_socket_timestamping(struct mptcp_sock *msk, return -EINVAL; } - ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, + ret = sock_setsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(×tamping), sizeof(timestamping)); if (ret) @@ -262,7 +262,7 @@ static int mptcp_setsockopt_sol_socket_linger(struct mptcp_sock *msk, sockptr_t return -EFAULT; kopt = KERNEL_SOCKPTR(&ling); - ret = sock_setsockopt(sk->sk_socket, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling)); + ret = sock_setsockopt(sk, SOL_SOCKET, SO_LINGER, kopt, sizeof(ling)); if (ret) return ret; @@ -306,7 +306,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname, return -EINVAL; } - ret = sock_setsockopt(ssock, SOL_SOCKET, optname, optval, optlen); + ret = sock_setsockopt(ssock->sk, SOL_SOCKET, optname, optval, optlen); if (ret == 0) { if (optname == SO_REUSEPORT) sk->sk_reuseport = ssock->sk->sk_reuseport; @@ -349,7 +349,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname, case SO_PREFER_BUSY_POLL: case SO_BUSY_POLL_BUDGET: /* No need to copy: only relevant for msk */ - return sock_setsockopt(sk->sk_socket, SOL_SOCKET, optname, optval, optlen); + return sock_setsockopt(sk, SOL_SOCKET, optname, optval, optlen); case SO_NO_CHECK: case SO_DONTROUTE: case SO_BROADCAST: diff --git a/net/socket.c b/net/socket.c index b6bd4cf44d3f..c6911d613ae2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2245,7 +2245,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, if (kernel_optval) optval = KERNEL_SOCKPTR(kernel_optval); if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock)) - err = sock_setsockopt(sock, level, optname, optval, optlen); + err = sock_setsockopt(sock->sk, level, optname, optval, optlen); else if (unlikely(!sock->ops->setsockopt)) err = -EOPNOTSUPP; else From patchwork Wed Jul 27 06:09:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930089 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5664BC3F6B0 for ; Wed, 27 Jul 2022 06:09:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229725AbiG0GJW (ORCPT ); Wed, 27 Jul 2022 02:09:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230140AbiG0GJS (ORCPT ); Wed, 27 Jul 2022 02:09:18 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4DF73FA16 for ; Tue, 26 Jul 2022 23:09:17 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND7wm008762 for ; Tue, 26 Jul 2022 23:09:17 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=wZoCpmLJMa9fJowR/gtOH34UaLy3DGaHUp0wCMfo4F0=; b=MSnE6+nA+gsvnh4sfdWisJHGZ5YnJ4hSzjIkMPuu3sUnEfK6jKiFGM0Mv2BJPoF1I4Wf pPs0/J+w0uku5kjMnKm803qplLz5to3WyRWo2UHup+ooWvvPHpy6Ok8dTyZfWehn0HPf WGWILGKDtEZe7N0edt7f1yWm227JIoq6F1A= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjdvtphnh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:17 -0700 Received: from twshared5413.23.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:16 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 3243A757CB72; Tue, 26 Jul 2022 23:09:09 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 02/14] bpf: net: Avoid sock_setsockopt() taking sk lock when called from bpf Date: Tue, 26 Jul 2022 23:09:09 -0700 Message-ID: <20220727060909.2371812-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: FF-G2sck-zSCNaBwqju3UoX9CoeZknlM X-Proofpoint-ORIG-GUID: FF-G2sck-zSCNaBwqju3UoX9CoeZknlM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from the sock_setsockopt(). The number of supported options are increasing ever and so as the duplicated codes. One issue in reusing sock_setsockopt() is that the bpf prog has already acquired the sk lock. sockptr_t is useful to handle this. sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user memory copy. This patch adds a 'is_bpf' bit to tell if sk locking has already been ensured by the bpf prog. {lock,release}_sock_sockopt() helpers are added for the sock_setsockopt() to use. These helpers will handle the new 'is_bpf' bit. Note on the change in sock_setbindtodevice(). lock_sock_sockopt() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Signed-off-by: Martin KaFai Lau --- include/linux/sockptr.h | 8 +++++++- include/net/sock.h | 12 ++++++++++++ net/core/sock.c | 8 +++++--- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index d45902fb4cad..787d0be08fb7 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -16,7 +16,8 @@ typedef struct { void *kernel; void __user *user; }; - bool is_kernel : 1; + bool is_kernel : 1, + is_bpf : 1; } sockptr_t; static inline bool sockptr_is_kernel(sockptr_t sockptr) @@ -24,6 +25,11 @@ static inline bool sockptr_is_kernel(sockptr_t sockptr) return sockptr.is_kernel; } +static inline sockptr_t KERNEL_SOCKPTR_BPF(void *p) +{ + return (sockptr_t) { .kernel = p, .is_kernel = true, .is_bpf = true }; +} + static inline sockptr_t KERNEL_SOCKPTR(void *p) { return (sockptr_t) { .kernel = p, .is_kernel = true }; diff --git a/include/net/sock.h b/include/net/sock.h index 9e2539dcc293..2f913bdf0035 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1721,6 +1721,18 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) } } +static inline void lock_sock_sockopt(struct sock *sk, sockptr_t optval) +{ + if (!optval.is_bpf) + lock_sock(sk); +} + +static inline void release_sock_sockopt(struct sock *sk, sockptr_t optval) +{ + if (!optval.is_bpf) + release_sock(sk); +} + /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming diff --git a/net/core/sock.c b/net/core/sock.c index 18bb4f269cf1..61d927a5f6cb 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -703,7 +703,9 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen) goto out; } - return sock_bindtoindex(sk, index, true); + lock_sock_sockopt(sk, optval); + ret = sock_bindtoindex_locked(sk, index); + release_sock_sockopt(sk, optval); out: #endif @@ -1067,7 +1069,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname, valbool = val ? 1 : 0; - lock_sock(sk); + lock_sock_sockopt(sk, optval); switch (optname) { case SO_DEBUG: @@ -1496,7 +1498,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname, ret = -ENOPROTOOPT; break; } - release_sock(sk); + release_sock_sockopt(sk, optval); return ret; } EXPORT_SYMBOL(sock_setsockopt); From patchwork Wed Jul 27 06:09:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930090 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4819C19F21 for ; Wed, 27 Jul 2022 06:09:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230148AbiG0GJb (ORCPT ); Wed, 27 Jul 2022 02:09:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230158AbiG0GJ3 (ORCPT ); Wed, 27 Jul 2022 02:09:29 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7D2D3FA1B for ; Tue, 26 Jul 2022 23:09:27 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND39x023090 for ; Tue, 26 Jul 2022 23:09:27 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=UYajfNksjQ0xDGx8SLc0vrxm7yeny8R81U/q7n7S7dU=; b=BYPl4vnfNDlWStprmMqwsaV20wu74sKOMf28X0jtEya9vNXyilcXxReE8JuHAtUkTeA6 bZSX9FmfVYGwtMFKkhrxx7eDUfs+PeiZSAxjdf0z4iIH53TOGEUxaIKx57ToaeWCZXs9 5xmsfDeds9zXRnU8s88UhcNRm0TYnIEuFzM= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjjnsmqj4-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:27 -0700 Received: from snc-exhub201.TheFacebook.com (2620:10d:c085:21d::7) by snc-exhub202.TheFacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:26 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:25 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 75746757CBDA; Tue, 26 Jul 2022 23:09:15 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 03/14] bpf: net: Consider optval.is_bpf before capable check in sock_setsockopt() Date: Tue, 26 Jul 2022 23:09:15 -0700 Message-ID: <20220727060915.2372520-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: PT_wWggIz53taQtwDKgy5J8p0V85Nc5a X-Proofpoint-GUID: PT_wWggIz53taQtwDKgy5J8p0V85Nc5a X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit 8d650cdedaab ("tcp: fix tcp_set_congestion_control() use from bpf hook") tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. A similar change is done in this patch for SO_MARK, SO_PRIORITY, and SO_BINDTO{DEVICE,IFINDEX} which are the optnames allowed by bpf_setsockopt(SOL_SOCKET). This will allow the sock_setsockopt() to be reused by bpf_setsockopt(SOL_SOCKET) in a latter patch. Signed-off-by: Martin KaFai Lau --- net/core/sock.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 61d927a5f6cb..f2c582491d5f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -620,7 +620,7 @@ struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie) } EXPORT_SYMBOL(sk_dst_check); -static int sock_bindtoindex_locked(struct sock *sk, int ifindex) +static int sock_bindtoindex_locked(struct sock *sk, int ifindex, bool cap_check) { int ret = -ENOPROTOOPT; #ifdef CONFIG_NETDEVICES @@ -628,7 +628,8 @@ static int sock_bindtoindex_locked(struct sock *sk, int ifindex) /* Sorry... */ ret = -EPERM; - if (sk->sk_bound_dev_if && !ns_capable(net->user_ns, CAP_NET_RAW)) + if (sk->sk_bound_dev_if && cap_check && + !ns_capable(net->user_ns, CAP_NET_RAW)) goto out; ret = -EINVAL; @@ -656,7 +657,7 @@ int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk) if (lock_sk) lock_sock(sk); - ret = sock_bindtoindex_locked(sk, ifindex); + ret = sock_bindtoindex_locked(sk, ifindex, true); if (lock_sk) release_sock(sk); @@ -704,7 +705,7 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen) } lock_sock_sockopt(sk, optval); - ret = sock_bindtoindex_locked(sk, index); + ret = sock_bindtoindex_locked(sk, index, !optval.is_bpf); release_sock_sockopt(sk, optval); out: #endif @@ -1166,6 +1167,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname, case SO_PRIORITY: if ((val >= 0 && val <= 6) || + optval.is_bpf || ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) sk->sk_priority = val; @@ -1312,7 +1314,8 @@ int sock_setsockopt(struct sock *sk, int level, int optname, clear_bit(SOCK_PASSSEC, &sock->flags); break; case SO_MARK: - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && + if (!optval.is_bpf && + !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; @@ -1456,7 +1459,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname, break; case SO_BINDTOIFINDEX: - ret = sock_bindtoindex_locked(sk, val); + ret = sock_bindtoindex_locked(sk, val, !optval.is_bpf); break; case SO_BUF_LOCK: From patchwork Wed Jul 27 06:09:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930091 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9196C19F21 for ; Wed, 27 Jul 2022 06:09:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230173AbiG0GJf (ORCPT ); Wed, 27 Jul 2022 02:09:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59064 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230115AbiG0GJc (ORCPT ); Wed, 27 Jul 2022 02:09:32 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 627BA3FA26 for ; Tue, 26 Jul 2022 23:09:31 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QNDGcZ000955 for ; Tue, 26 Jul 2022 23:09:30 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=UKIPQVu6eKMnk726mpApj3QnXSFYtxMB3wm5kiaDCTI=; b=Y/GyFbucYeuaY89Gr/1rYn+OtaxvRAkcYzJ0uc6auH7x9B7WeByk+x0rrlbeofwWZdAN pcLVKPnFyPAHIfxPbcF1gd5a/EsTJBnO1u4jMjfDmcJPWK5OAKEbAVaP8+UVfWkNncXv NLL5MF1Vip3rn74rktgxYbRYjWUHf/YUXEE= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hj592rgq8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:30 -0700 Received: from twshared34609.14.frc2.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:29 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id BEB39757CC81; Tue, 26 Jul 2022 23:09:21 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 04/14] bpf: net: Avoid do_tcp_setsockopt() taking sk lock when called from bpf Date: Tue, 26 Jul 2022 23:09:21 -0700 Message-ID: <20220727060921.2373314-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: IqEzap8C1wnvUaioF_taz76pivWEKbom X-Proofpoint-GUID: IqEzap8C1wnvUaioF_taz76pivWEKbom X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sock_setsockopt() from taking sk lock when called from bpf. This patch changes do_tcp_setsockopt() to use the {lock,release}_sock_sockopt(). This patch also changes do_tcp_setsockopt() to check optval.is_bpf when passing the cap_net_admin arg to tcp_set_congestion_control(..., cap_net_admin). It is the same as how bpf_setsockopt(TCP_CONGESTION) is calling tcp_set_congestion_control() now. Signed-off-by: Martin KaFai Lau --- net/ipv4/tcp.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ba2bdc811374..7f8d81befa8e 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3462,11 +3462,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); - err = tcp_set_congestion_control(sk, name, true, - ns_capable(sock_net(sk)->user_ns, - CAP_NET_ADMIN)); - release_sock(sk); + lock_sock_sockopt(sk, optval); + err = tcp_set_congestion_control(sk, name, !optval.is_bpf, + optval.is_bpf ? + true : ns_capable(sock_net(sk)->user_ns, + CAP_NET_ADMIN)); + release_sock_sockopt(sk, optval); return err; } case TCP_ULP: { @@ -3482,9 +3483,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, return -EFAULT; name[val] = 0; - lock_sock(sk); + lock_sock_sockopt(sk, optval); err = tcp_set_ulp(sk, name); - release_sock(sk); + release_sock_sockopt(sk, optval); return err; } case TCP_FASTOPEN_KEY: { @@ -3517,7 +3518,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; - lock_sock(sk); + lock_sock_sockopt(sk, optval); switch (optname) { case TCP_MAXSEG: @@ -3739,7 +3740,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + release_sock_sockopt(sk, optval); return err; } From patchwork Wed Jul 27 06:09:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930092 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1074DC19F2C for ; Wed, 27 Jul 2022 06:09:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230219AbiG0GJs (ORCPT ); Wed, 27 Jul 2022 02:09:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230115AbiG0GJo (ORCPT ); Wed, 27 Jul 2022 02:09:44 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C294402CA for ; Tue, 26 Jul 2022 23:09:43 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND9Dm029540 for ; Tue, 26 Jul 2022 23:09:43 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=t2qYenmu3xZuB5RYl+s2v8VQSwmD4QgYS86NCfp4PNw=; b=QEMTcEcjfuEtHNovDewxYFaN55wllugJZlI6YWDQH9XvjOlFVi0CBWsObPVXW9oBo5gB 3DQH7g8mHqwvwMPDVzVxVGqpBZwktFdaohSpp4Xi8WbkTik6jfsToqVLBHp13zNtFznE piKLKOzzGNNuoEg7q8xbOAj1gGFtTsZjjXo= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hj1ust335-9 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:43 -0700 Received: from twshared20276.35.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:32 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 13DD3757CC9E; Tue, 26 Jul 2022 23:09:28 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 05/14] bpf: net: Avoid do_ip_setsockopt() taking sk lock when called from bpf Date: Tue, 26 Jul 2022 23:09:28 -0700 Message-ID: <20220727060928.2374257-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 74V9DPQjSvGrYpEfuGoOaaVyBwmcBVG7 X-Proofpoint-ORIG-GUID: 74V9DPQjSvGrYpEfuGoOaaVyBwmcBVG7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sock_setsockopt() from taking sk lock when called from bpf. This patch changes do_ip_setsockopt() to use the {lock,release}_sock_sockopt(). Signed-off-by: Martin KaFai Lau --- net/ipv4/ip_sockglue.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index a8a323ecbb54..8271ad565a3a 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -944,7 +944,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = 0; if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + lock_sock_sockopt(sk, optval); switch (optname) { case IP_OPTIONS: @@ -1368,13 +1368,13 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname, err = -ENOPROTOOPT; break; } - release_sock(sk); + release_sock_sockopt(sk, optval); if (needs_rtnl) rtnl_unlock(); return err; e_inval: - release_sock(sk); + release_sock_sockopt(sk, optval); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Jul 27 06:09:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930093 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42EDDC19F21 for ; Wed, 27 Jul 2022 06:09:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230233AbiG0GJt (ORCPT ); Wed, 27 Jul 2022 02:09:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230171AbiG0GJo (ORCPT ); Wed, 27 Jul 2022 02:09:44 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 825FB402CC for ; Tue, 26 Jul 2022 23:09:43 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND7ws008762 for ; Tue, 26 Jul 2022 23:09:43 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=WPn8tpKwcKUlO9OODePkSgWXsKkPsy3LqflPY1r2EFE=; b=aSB82UMcCZrOvgJZGZSM7/eR3WgJhD4zXATK/QGpQDCqIHPgn2+tEbInX+PiI15AbSxr BWnBVc9bUKZKfR8GJ6onbzoufy/vV7FMZUlAHFy/oxbcxTGm9rrza74rnWM6Z7s8vyky Zis9oFF8fpmfeDuUqIMa6ThcxGKZi4pYBI4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjdvtphq2-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:43 -0700 Received: from twshared7570.37.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:42 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 5F29B757CCBB; Tue, 26 Jul 2022 23:09:34 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 06/14] bpf: net: Avoid do_ipv6_setsockopt() taking sk lock when called from bpf Date: Tue, 26 Jul 2022 23:09:34 -0700 Message-ID: <20220727060934.2374650-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: eepqYqkv8wd7Z4k8Oxdy_NKvYAcmIQ53 X-Proofpoint-ORIG-GUID: eepqYqkv8wd7Z4k8Oxdy_NKvYAcmIQ53 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similar to the earlier patch that avoids sock_setsockopt() from taking sk lock when called from bpf. This patch changes do_ipv6_setsockopt() to use the {lock,release}_sock_sockopt(). Signed-off-by: Martin KaFai Lau --- net/ipv6/ipv6_sockglue.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 222f6bf220ba..4559f02ab4a8 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -417,7 +417,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, if (needs_rtnl) rtnl_lock(); - lock_sock(sk); + lock_sock_sockopt(sk, optval); switch (optname) { @@ -994,14 +994,14 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; } - release_sock(sk); + release_sock_sockopt(sk, optval); if (needs_rtnl) rtnl_unlock(); return retv; e_inval: - release_sock(sk); + release_sock_sockopt(sk, optval); if (needs_rtnl) rtnl_unlock(); return -EINVAL; From patchwork Wed Jul 27 06:09:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930094 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B45E5C19F2B for ; Wed, 27 Jul 2022 06:09:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230183AbiG0GJz (ORCPT ); Wed, 27 Jul 2022 02:09:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230267AbiG0GJu (ORCPT ); Wed, 27 Jul 2022 02:09:50 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71AB5402C1 for ; Tue, 26 Jul 2022 23:09:49 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND0Bu004975 for ; Tue, 26 Jul 2022 23:09:49 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=oBzZsS9YU5IlPp/uz4kK4qHUrMDYMlhpxpusLRYRTog=; b=JJWW+THEpRMbB2koL8w5JrY3Evu0esVMDTiSY+uGg8u4/Rl60+EefbIVVh9PyWQRAaxt ibYk9ZMmsbshiykcgC7RAXmgMJ1qFWHJnkV6WHs8jbt79aBL/X4Kwz3QrCA3xC0DvDNW v5i49nThodhAsm/VzBIP1lzUatJ2Q3sQEQ4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjhxaw34f-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:49 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:48 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id AC832757CCF3; Tue, 26 Jul 2022 23:09:40 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 07/14] bpf: Embed kernel CONFIG check into the if statement in bpf_setsockopt Date: Tue, 26 Jul 2022 23:09:40 -0700 Message-ID: <20220727060940.2376067-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 96E2EmI0n1dS-nsQzrxV5AksitTbjokM X-Proofpoint-GUID: 96E2EmI0n1dS-nsQzrxV5AksitTbjokM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch moves the "#ifdef CONFIG_XXX" check into the "if/else" statement itself. The change is done for the bpf_setsockopt() function only. It will make the latter patches easier to follow without the surrounding ifdef macro. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 5669248aff25..01cb4a01b1c1 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5113,8 +5113,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#ifdef CONFIG_INET - } else if (level == SOL_IP) { + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; @@ -5135,8 +5134,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#if IS_ENABLED(CONFIG_IPV6) - } else if (level == SOL_IPV6) { + } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; @@ -5157,8 +5155,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } -#endif - } else if (level == SOL_TCP && + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5250,7 +5247,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, ret = -EINVAL; } } -#endif } else { ret = -EINVAL; } From patchwork Wed Jul 27 06:09:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930095 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DF01C19F28 for ; Wed, 27 Jul 2022 06:10:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230191AbiG0GKC (ORCPT ); Wed, 27 Jul 2022 02:10:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230174AbiG0GJy (ORCPT ); Wed, 27 Jul 2022 02:09:54 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 472833FA32 for ; Tue, 26 Jul 2022 23:09:53 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QNDDsa000738 for ; Tue, 26 Jul 2022 23:09:52 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=y4jFDHk3Wstaay7AZalAvEfrDZlbDykmCIKY5rB20w8=; b=GExIVPStZ3DjJO1+X8IGiBnUbi2vE3toDXJ1RJmQ9uRaiGG47ODv7gWsK5bccxc7ruMy CYGH6/PxOM9QIJTGyyP8Z1Hke+0GTQ/bTy3JZKNrbHvhRv/mdL/vG71/ytJTlpGsx5nc FIIw5DI6hEeCrtT7qQ64Agthcn+yCP8uBrs= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hj592rgr9-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:09:51 -0700 Received: from twshared5413.23.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:09:50 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 0BCA9757CD5F; Tue, 26 Jul 2022 23:09:46 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 08/14] bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sock_setsockopt() Date: Tue, 26 Jul 2022 23:09:46 -0700 Message-ID: <20220727060946.2377275-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: P-4CsFefq2eHS1NgGTM6RskInxNdXXTv X-Proofpoint-GUID: P-4CsFefq2eHS1NgGTM6RskInxNdXXTv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sock_setsockopt(). The only exception is SO_RCVLOWAT. The bpf_setsockopt() does not always have the sock ptr (sk->sk_socket) and sock->ops->set_rcvlowat is needed in sock_setsockopt. tcp_set_rcvlowat is the only implementation for set_rcvlowat. bpf_setsockopt() needs one special handling for SO_RCVLOWAT for tcp_sock, so leave SO_RCVLOWAT in bpf_setsockopt() for now. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 130 ++++++++++++++-------------------------------- 1 file changed, 38 insertions(+), 92 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 01cb4a01b1c1..77a9aa17a1c0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5013,106 +5013,52 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +static int sol_socket_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + switch (optname) { + case SO_SNDBUF: + case SO_RCVBUF: + case SO_KEEPALIVE: + case SO_PRIORITY: + case SO_REUSEPORT: + case SO_RCVLOWAT: + case SO_MARK: + case SO_MAX_PACING_RATE: + case SO_BINDTOIFINDEX: + case SO_TXREHASH: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case SO_BINDTODEVICE: + break; + default: + return -EINVAL; + } + + if (optname == SO_RCVLOWAT) { + int val = *(int *)optval; + + if (val < 0) + val = INT_MAX; + WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); + return 0; + } + + return sock_setsockopt(sk, SOL_SOCKET, optname, + KERNEL_SOCKPTR_BPF(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - char devname[IFNAMSIZ]; - int val, valbool; - struct net *net; - int ifindex; - int ret = 0; + int val, ret = 0; if (!sk_fullsock(sk)) return -EINVAL; if (level == SOL_SOCKET) { - if (optlen != sizeof(int) && optname != SO_BINDTODEVICE) - return -EINVAL; - val = *((int *)optval); - valbool = val ? 1 : 0; - - /* Only some socketops are supported */ - switch (optname) { - case SO_RCVBUF: - val = min_t(u32, val, sysctl_rmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_RCVBUF_LOCK; - WRITE_ONCE(sk->sk_rcvbuf, - max_t(int, val * 2, SOCK_MIN_RCVBUF)); - break; - case SO_SNDBUF: - val = min_t(u32, val, sysctl_wmem_max); - val = min_t(int, val, INT_MAX / 2); - sk->sk_userlocks |= SOCK_SNDBUF_LOCK; - WRITE_ONCE(sk->sk_sndbuf, - max_t(int, val * 2, SOCK_MIN_SNDBUF)); - break; - case SO_MAX_PACING_RATE: /* 32bit version */ - if (val != ~0U) - cmpxchg(&sk->sk_pacing_status, - SK_PACING_NONE, - SK_PACING_NEEDED); - sk->sk_max_pacing_rate = (val == ~0U) ? - ~0UL : (unsigned int)val; - sk->sk_pacing_rate = min(sk->sk_pacing_rate, - sk->sk_max_pacing_rate); - break; - case SO_PRIORITY: - sk->sk_priority = val; - break; - case SO_RCVLOWAT: - if (val < 0) - val = INT_MAX; - WRITE_ONCE(sk->sk_rcvlowat, val ? : 1); - break; - case SO_MARK: - if (sk->sk_mark != val) { - sk->sk_mark = val; - sk_dst_reset(sk); - } - break; - case SO_BINDTODEVICE: - optlen = min_t(long, optlen, IFNAMSIZ - 1); - strncpy(devname, optval, optlen); - devname[optlen] = 0; - - ifindex = 0; - if (devname[0] != '\0') { - struct net_device *dev; - - ret = -ENODEV; - - net = sock_net(sk); - dev = dev_get_by_name(net, devname); - if (!dev) - break; - ifindex = dev->ifindex; - dev_put(dev); - } - fallthrough; - case SO_BINDTOIFINDEX: - if (optname == SO_BINDTOIFINDEX) - ifindex = val; - ret = sock_bindtoindex(sk, ifindex, false); - break; - case SO_KEEPALIVE: - if (sk->sk_prot->keepalive) - sk->sk_prot->keepalive(sk, valbool); - sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool); - break; - case SO_REUSEPORT: - sk->sk_reuseport = valbool; - break; - case SO_TXREHASH: - if (val < -1 || val > 1) { - ret = -EINVAL; - break; - } - sk->sk_txrehash = (u8)val; - break; - default: - ret = -EINVAL; - } + return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { if (optlen != sizeof(int) || sk->sk_family != AF_INET) return -EINVAL; From patchwork Wed Jul 27 06:09:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930096 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9CACBC19F21 for ; Wed, 27 Jul 2022 06:10:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231179AbiG0GKR (ORCPT ); Wed, 27 Jul 2022 02:10:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59806 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231177AbiG0GKK (ORCPT ); Wed, 27 Jul 2022 02:10:10 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D8D1402DF for ; Tue, 26 Jul 2022 23:10:02 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND8Tl008844 for ; Tue, 26 Jul 2022 23:10:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=Q7ye1uvhTNm1WK39/SGxiNy6FlGNaF1cXm4Kn6i1V3M=; b=XskIJ089+fAg2HoROJ4X0AUx3+jz8v6hkiCqmYLEFX6t/CXjKSKRPX7XwZupH+vaIUYb fayGGqSzSQxFzEc1f6IfcODmgtHvkgQ/siUTMRdvGpWS3HQf7UPsCCQ9HlAzdgj0KnVn zEOQBX4mx554FfaKf9uNfaKi8JPs20P5x38= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjdvtphr5-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:01 -0700 Received: from snc-exhub201.TheFacebook.com (2620:10d:c085:21d::7) by snc-exhub204.TheFacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:00 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:00 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 571DB757CDA8; Tue, 26 Jul 2022 23:09:53 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 09/14] bpf: Refactor bpf specific tcp optnames to a new function Date: Tue, 26 Jul 2022 23:09:53 -0700 Message-ID: <20220727060953.2377778-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: jBO78Io7L99InDS6wszjkeaRx8ojfL9x X-Proofpoint-ORIG-GUID: jBO78Io7L99InDS6wszjkeaRx8ojfL9x X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The patch moves all bpf specific tcp optnames (TCP_BPF_XXX) to a new function bpf_sol_tcp_setsockopt(). This will make the next patch easier to follow. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 79 ++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 29 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 77a9aa17a1c0..8dd195b9b860 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5049,6 +5049,52 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR_BPF(optval), optlen); } +static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + struct tcp_sock *tp = tcp_sk(sk); + unsigned long timeout; + int val; + + if (optlen != sizeof(int)) + return -EINVAL; + + val = *(int *)optval; + + /* Only some options are supported */ + switch (optname) { + case TCP_BPF_IW: + if (val <= 0 || tp->data_segs_out > tp->syn_data) + return -EINVAL; + tcp_snd_cwnd_set(tp, val); + break; + case TCP_BPF_SNDCWND_CLAMP: + if (val <= 0) + return -EINVAL; + tp->snd_cwnd_clamp = val; + tp->snd_ssthresh = val; + break; + case TCP_BPF_DELACK_MAX: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_DELACK_MAX || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_delack_max = timeout; + break; + case TCP_BPF_RTO_MIN: + timeout = usecs_to_jiffies(val); + if (timeout > TCP_RTO_MIN || + timeout < TCP_TIMEOUT_MIN) + return -EINVAL; + inet_csk(sk)->icsk_rto_min = timeout; + break; + default: + return -EINVAL; + } + + return 0; +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5103,6 +5149,10 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && sk->sk_prot->setsockopt == tcp_setsockopt) { + if (optname >= TCP_BPF_IW) + return bpf_sol_tcp_setsockopt(sk, optname, + optval, optlen); + if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -5113,7 +5163,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, } else { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - unsigned long timeout; if (optlen != sizeof(int)) return -EINVAL; @@ -5121,34 +5170,6 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, val = *((int *)optval); /* Only some options are supported */ switch (optname) { - case TCP_BPF_IW: - if (val <= 0 || tp->data_segs_out > tp->syn_data) - ret = -EINVAL; - else - tcp_snd_cwnd_set(tp, val); - break; - case TCP_BPF_SNDCWND_CLAMP: - if (val <= 0) { - ret = -EINVAL; - } else { - tp->snd_cwnd_clamp = val; - tp->snd_ssthresh = val; - } - break; - case TCP_BPF_DELACK_MAX: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_DELACK_MAX || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_delack_max = timeout; - break; - case TCP_BPF_RTO_MIN: - timeout = usecs_to_jiffies(val); - if (timeout > TCP_RTO_MIN || - timeout < TCP_TIMEOUT_MIN) - return -EINVAL; - inet_csk(sk)->icsk_rto_min = timeout; - break; case TCP_SAVE_SYN: if (val < 0 || val > 1) ret = -EINVAL; From patchwork Wed Jul 27 06:09:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930097 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12A4FC19F2B for ; Wed, 27 Jul 2022 06:10:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230313AbiG0GK3 (ORCPT ); Wed, 27 Jul 2022 02:10:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231361AbiG0GKN (ORCPT ); Wed, 27 Jul 2022 02:10:13 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FA84262D for ; Tue, 26 Jul 2022 23:10:10 -0700 (PDT) Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QNDEBT019357 for ; Tue, 26 Jul 2022 23:10:10 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=GMZDiunj/oFdXyWbHzglPsZfKubquR6VD5UPdIw9Gdc=; b=B52LRS9+pJIeR0FNvBQyFHt/DY5gT/5ZL3RRd6qTLBtBy6G3yic3YcubJRUD74JBC54N VLut4D0PBwtohPOmG689sOdeu0rigjvBSiR19Ll+HK0lhv1ONHRMxZrHWbGNZGYS8Mau LNFakdT7JCiKlOid7XRvUcQO3yOr5BjQAjo= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hhxbwutme-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:09 -0700 Received: from snc-exhub201.TheFacebook.com (2620:10d:c085:21d::7) by snc-exhub104.TheFacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:09 -0700 Received: from twshared22413.18.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:09 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id CA7C7757CE1E; Tue, 26 Jul 2022 23:09:59 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 10/14] bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt() Date: Tue, 26 Jul 2022 23:09:59 -0700 Message-ID: <20220727060959.2378252-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: vI3B7vRAXfU3dVcghk8j7tLL3RkzYnnT X-Proofpoint-ORIG-GUID: vI3B7vRAXfU3dVcghk8j7tLL3RkzYnnT X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes all the dup code from bpf_setsockopt(SOL_TCP) and reuses the do_tcp_setsockopt(). The existing optname white-list is refactored into a new function sol_tcp_setsockopt(). The sol_tcp_setsockopt() also calls the bpf_sol_tcp_setsockopt() to handle the TCP_BPF_XXX specific optnames. bpf_setsockopt(TCP_SAVE_SYN) now also allows a value 2 to save the eth header also and it comes for free from do_tcp_setsockopt(). Signed-off-by: Martin KaFai Lau --- include/net/tcp.h | 2 + net/core/filter.c | 97 +++++++++++++++-------------------------------- net/ipv4/tcp.c | 2 +- 3 files changed, 33 insertions(+), 68 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index f9e7c85ea829..06b63a807c33 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -405,6 +405,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); void tcp_set_keepalive(struct sock *sk, int val); diff --git a/net/core/filter.c b/net/core/filter.c index 8dd195b9b860..97aed6575810 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5095,6 +5095,34 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, return 0; } +static int sol_tcp_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_prot->setsockopt != tcp_setsockopt) + return -EINVAL; + + switch (optname) { + case TCP_KEEPIDLE: + case TCP_KEEPINTVL: + case TCP_KEEPCNT: + case TCP_SYNCNT: + case TCP_WINDOW_CLAMP: + case TCP_USER_TIMEOUT: + case TCP_NOTSENT_LOWAT: + case TCP_SAVE_SYN: + if (optlen != sizeof(int)) + return -EINVAL; + break; + case TCP_CONGESTION: + break; + default: + return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen); + } + + return do_tcp_setsockopt(sk, SOL_TCP, optname, + KERNEL_SOCKPTR_BPF(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5147,73 +5175,8 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, default: ret = -EINVAL; } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP && - sk->sk_prot->setsockopt == tcp_setsockopt) { - if (optname >= TCP_BPF_IW) - return bpf_sol_tcp_setsockopt(sk, optname, - optval, optlen); - - if (optname == TCP_CONGESTION) { - char name[TCP_CA_NAME_MAX]; - - strncpy(name, optval, min_t(long, optlen, - TCP_CA_NAME_MAX-1)); - name[TCP_CA_NAME_MAX-1] = 0; - ret = tcp_set_congestion_control(sk, name, false, true); - } else { - struct inet_connection_sock *icsk = inet_csk(sk); - struct tcp_sock *tp = tcp_sk(sk); - - if (optlen != sizeof(int)) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case TCP_SAVE_SYN: - if (val < 0 || val > 1) - ret = -EINVAL; - else - tp->save_syn = val; - break; - case TCP_KEEPIDLE: - ret = tcp_sock_set_keepidle_locked(sk, val); - break; - case TCP_KEEPINTVL: - if (val < 1 || val > MAX_TCP_KEEPINTVL) - ret = -EINVAL; - else - tp->keepalive_intvl = val * HZ; - break; - case TCP_KEEPCNT: - if (val < 1 || val > MAX_TCP_KEEPCNT) - ret = -EINVAL; - else - tp->keepalive_probes = val; - break; - case TCP_SYNCNT: - if (val < 1 || val > MAX_TCP_SYNCNT) - ret = -EINVAL; - else - icsk->icsk_syn_retries = val; - break; - case TCP_USER_TIMEOUT: - if (val < 0) - ret = -EINVAL; - else - icsk->icsk_user_timeout = val; - break; - case TCP_NOTSENT_LOWAT: - tp->notsent_lowat = val; - sk->sk_write_space(sk); - break; - case TCP_WINDOW_CLAMP: - ret = tcp_set_window_clamp(sk, val); - break; - default: - ret = -EINVAL; - } - } + } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + return sol_tcp_setsockopt(sk, optname, optval, optlen); } else { ret = -EINVAL; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 7f8d81befa8e..5a327a0e1af9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3439,7 +3439,7 @@ int tcp_set_window_clamp(struct sock *sk, int val) /* * Socket option code for TCP. */ -static int do_tcp_setsockopt(struct sock *sk, int level, int optname, +int do_tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen) { struct tcp_sock *tp = tcp_sk(sk); From patchwork Wed Jul 27 06:10:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930098 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3EEDC19F21 for ; Wed, 27 Jul 2022 06:10:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230341AbiG0GKi (ORCPT ); Wed, 27 Jul 2022 02:10:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231451AbiG0GKR (ORCPT ); Wed, 27 Jul 2022 02:10:17 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E76711581B for ; Tue, 26 Jul 2022 23:10:15 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND2vx005041 for ; Tue, 26 Jul 2022 23:10:15 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=HIcXugGdjGb3m4nxvq9BIJhx/dKvecQVqj0TazWuD9k=; b=K4xZaEjbgj33f9x6iqWQpwckufvrbIVIBfgZmHscf5hci3zZyF1XW/9nqBW63VIodviz 1GdaeSdxuT5MbKEYCsgTtqscPCPocZ7RjkZqtpZ3imvdDkWCeI0NchMH8gGKbOSrtQmN GW+eHnSiI13I8zU3FT37d2HJEJnc6B6gQ3s= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjhxaw369-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:15 -0700 Received: from twshared5413.23.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:13 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 22443757CF93; Tue, 26 Jul 2022 23:10:06 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 11/14] bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt() Date: Tue, 26 Jul 2022 23:10:06 -0700 Message-ID: <20220727061006.2379501-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: nCt6cC60D4zJJbndzz-fEcCdgikfCHq7 X-Proofpoint-GUID: nCt6cC60D4zJJbndzz-fEcCdgikfCHq7 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IP) and reuses the implementation in do_ip_setsockopt(). The existing optname white-list is refactored into a new function sol_ip_setsockopt(). The current bpf_setsockopt(IP_TOS) does not take the INET_ECN_MASK into the account for tcp. The do_ip_setsockopt(IP_TOS) will handle it correctly. Signed-off-by: Martin KaFai Lau --- include/net/ip.h | 2 ++ net/core/filter.c | 40 ++++++++++++++++++++-------------------- net/ipv4/ip_sockglue.c | 4 ++-- 3 files changed, 24 insertions(+), 22 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 1c979fd1904c..34fa5b0f0a0e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -743,6 +743,8 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk, int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, bool allow_ipv6); DECLARE_STATIC_KEY_FALSE(ip4_min_ttl); +int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, diff --git a/net/core/filter.c b/net/core/filter.c index 97aed6575810..67c87d7acb23 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5123,6 +5123,25 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR_BPF(optval), optlen); } +static int sol_ip_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET) + return -EINVAL; + + switch (optname) { + case IP_TOS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return do_ip_setsockopt(sk, SOL_IP, optname, + KERNEL_SOCKPTR_BPF(optval), optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { @@ -5134,26 +5153,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname, if (level == SOL_SOCKET) { return sol_socket_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IP_TOS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct inet_sock *inet = inet_sk(sk); - - if (val == -1) - val = 0; - inet->tos = val; - } - break; - default: - ret = -EINVAL; - } + return sol_ip_setsockopt(sk, optname, optval, optlen); } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { if (optlen != sizeof(int) || sk->sk_family != AF_INET6) return -EINVAL; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 8271ad565a3a..e70572142d99 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -888,8 +888,8 @@ static int compat_ip_mcast_join_leave(struct sock *sk, int optname, DEFINE_STATIC_KEY_FALSE(ip4_min_ttl); -static int do_ip_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ip_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct inet_sock *inet = inet_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Jul 27 06:10:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930099 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 707D8C19F21 for ; Wed, 27 Jul 2022 06:10:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230322AbiG0GKl (ORCPT ); Wed, 27 Jul 2022 02:10:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231266AbiG0GK0 (ORCPT ); Wed, 27 Jul 2022 02:10:26 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9DE51035 for ; Tue, 26 Jul 2022 23:10:25 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND3GB005118 for ; Tue, 26 Jul 2022 23:10:25 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=L7/Ct8dJb4Le+DgrGi94W6ApfbmILzSfMfxX97vXqrM=; b=JhQx8Lfuwy/DcJ4IW6R/ApwLIkdzdQ5mcnnrRsdPmLDQn4Wiav87/2wFJ1LIlIt1xtMD 8Li/cX1MCRMXU76CDAiH9O0XdrlDcb+IkXyGRknh8SBya6Zy6Ex/wi5DGanA1RHR5cgr +n9LNnMln7aNdzGUZpraVPTB+emHsLXOELY= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjhxaw36v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:25 -0700 Received: from twshared5413.23.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:23 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 6F99C757CFB0; Tue, 26 Jul 2022 23:10:12 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 12/14] bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt() Date: Tue, 26 Jul 2022 23:10:12 -0700 Message-ID: <20220727061012.2380506-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: UUXVjjQpUp_EOqFCoUiSKz_QLw4_ufAe X-Proofpoint-GUID: UUXVjjQpUp_EOqFCoUiSKz_QLw4_ufAe X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IPV6) and reuses the implementation in do_ipv6_setsockopt(). ipv6 could be compiled as a module. Like how other codes solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_setsockopt to the ipv6_bpf_stub. The current bpf_setsockopt(IPV6_TCLASS) does not take the INET_ECN_MASK into the account for tcp. The do_ipv6_setsockopt(IPV6_TCLASS) will handle it correctly. The existing optname white-list is refactored into a new function sol_ipv6_setsockopt(). After this last SOL_IPV6 dup code removal, the __bpf_setsockopt() is simplified enough that the extra "{ }" around the if statement can be removed. Signed-off-by: Martin KaFai Lau --- include/net/ipv6.h | 2 ++ include/net/ipv6_stubs.h | 2 ++ net/core/filter.c | 57 +++++++++++++++++++--------------------- net/ipv6/af_inet6.c | 1 + net/ipv6/ipv6_sockglue.c | 4 +-- 5 files changed, 34 insertions(+), 32 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index de9dcc5652c4..c110d9032083 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1156,6 +1156,8 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6, */ DECLARE_STATIC_KEY_FALSE(ip6_min_hopcount); +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ipv6_getsockopt(struct sock *sk, int level, int optname, diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h index 45e0339be6fa..8692698b01cf 100644 --- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -81,6 +81,8 @@ struct ipv6_bpf_stub { const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, struct sk_buff *skb); + int (*ipv6_setsockopt)(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; diff --git a/net/core/filter.c b/net/core/filter.c index 67c87d7acb23..7b510e009bb3 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5142,45 +5142,42 @@ static int sol_ip_setsockopt(struct sock *sk, int optname, KERNEL_SOCKPTR_BPF(optval), optlen); } +static int sol_ipv6_setsockopt(struct sock *sk, int optname, + char *optval, int optlen) +{ + if (sk->sk_family != AF_INET6) + return -EINVAL; + + switch (optname) { + case IPV6_TCLASS: + if (optlen != sizeof(int)) + return -EINVAL; + break; + default: + return -EINVAL; + } + + return ipv6_bpf_stub->ipv6_setsockopt(sk, SOL_IPV6, optname, + KERNEL_SOCKPTR_BPF(optval), + optlen); +} + static int __bpf_setsockopt(struct sock *sk, int level, int optname, char *optval, int optlen) { - int val, ret = 0; - if (!sk_fullsock(sk)) return -EINVAL; - if (level == SOL_SOCKET) { + if (level == SOL_SOCKET) return sol_socket_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) { + else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) return sol_ip_setsockopt(sk, optname, optval, optlen); - } else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) { - if (optlen != sizeof(int) || sk->sk_family != AF_INET6) - return -EINVAL; - - val = *((int *)optval); - /* Only some options are supported */ - switch (optname) { - case IPV6_TCLASS: - if (val < -1 || val > 0xff) { - ret = -EINVAL; - } else { - struct ipv6_pinfo *np = inet6_sk(sk); - - if (val == -1) - val = 0; - np->tclass = val; - } - break; - default: - ret = -EINVAL; - } - } else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) { + else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) + return sol_ipv6_setsockopt(sk, optname, optval, optlen); + else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) return sol_tcp_setsockopt(sk, optname, optval, optlen); - } else { - ret = -EINVAL; - } - return ret; + + return -EINVAL; } static int _bpf_setsockopt(struct sock *sk, int level, int optname, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 2ce0c44d0081..cadc97852787 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -1057,6 +1057,7 @@ static const struct ipv6_stub ipv6_stub_impl = { static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, + .ipv6_setsockopt = do_ipv6_setsockopt, }; static int __init inet6_init(void) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 4559f02ab4a8..0eef5a11dc3c 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -391,8 +391,8 @@ static int ipv6_set_opt_hdr(struct sock *sk, int optname, sockptr_t optval, return err; } -static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, - sockptr_t optval, unsigned int optlen) +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen) { struct ipv6_pinfo *np = inet6_sk(sk); struct net *net = sock_net(sk); From patchwork Wed Jul 27 06:10:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930100 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 051EAC19F28 for ; Wed, 27 Jul 2022 06:10:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230158AbiG0GKt (ORCPT ); Wed, 27 Jul 2022 02:10:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230262AbiG0GKb (ORCPT ); Wed, 27 Jul 2022 02:10:31 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58322258 for ; Tue, 26 Jul 2022 23:10:30 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 26QND3lT001087 for ; Tue, 26 Jul 2022 23:10:29 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=qCB9pTHH9Z78sg6zce++NMce/xeti5pluLE82LZEVDs=; b=P+CzyG+kAqr9d9oN7WID9JN30YC4wknvaphTLVA4t/DZCLwrDvVLiBAFqRK3QUWPATlS yp1BPt4ajCaql+pGrh3srOkeNfN4FwMxBzbUH3/AvSETJQcKHxJ4x29HG+WHNldRTBKE /N6RigwtSTdRsyBLO+G1jQe0gaqSnCSSDeg= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3hjj4e4whk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:29 -0700 Received: from twshared30313.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:28 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id CF3E4757CFBE; Tue, 26 Jul 2022 23:10:18 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 13/14] bpf: Add a few optnames to bpf_setsockopt Date: Tue, 26 Jul 2022 23:10:18 -0700 Message-ID: <20220727061018.2380664-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: H_xsR9bgfn2xtvQVgdYkiRwhLi8AqnJm X-Proofpoint-GUID: H_xsR9bgfn2xtvQVgdYkiRwhLi8AqnJm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds a few optnames for bpf_setsockopt: SO_REUSEADDR, IPV6_AUTOFLOWLABEL, TCP_MAXSEG, TCP_NODELAY, and TCP_THIN_LINEAR_TIMEOUTS. Thanks to the previous patches of this set, all additions can reuse the sock_setsockopt(), do_ipv6_setsockopt(), and do_tcp_setsockopt(). The only change here is to allow them in bpf_setsockopt. The bpf prog has been able to read all members of a sk by using PTR_TO_BTF_ID of a sk. The optname additions here can also be read by the same approach. Meaning there is a way to read the values back. These optnames can also be added to bpf_getsockopt() later with another patch set that makes the bpf_getsockopt() to reuse the sock_getsockopt(), tcp_getsockopt(), and ip[v6]_getsockopt(). Thus, this patch does not add more duplicated codes to bpf_getsockopt() now. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index 7b510e009bb3..899ee7b4a04a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5017,6 +5017,7 @@ static int sol_socket_setsockopt(struct sock *sk, int optname, char *optval, int optlen) { switch (optname) { + case SO_REUSEADDR: case SO_SNDBUF: case SO_RCVBUF: case SO_KEEPALIVE: @@ -5102,11 +5103,14 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname, return -EINVAL; switch (optname) { + case TCP_NODELAY: + case TCP_MAXSEG: case TCP_KEEPIDLE: case TCP_KEEPINTVL: case TCP_KEEPCNT: case TCP_SYNCNT: case TCP_WINDOW_CLAMP: + case TCP_THIN_LINEAR_TIMEOUTS: case TCP_USER_TIMEOUT: case TCP_NOTSENT_LOWAT: case TCP_SAVE_SYN: @@ -5150,6 +5154,7 @@ static int sol_ipv6_setsockopt(struct sock *sk, int optname, switch (optname) { case IPV6_TCLASS: + case IPV6_AUTOFLOWLABEL: if (optlen != sizeof(int)) return -EINVAL; break; From patchwork Wed Jul 27 06:10:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12930101 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 898A5C19F21 for ; Wed, 27 Jul 2022 06:10:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230162AbiG0GKx (ORCPT ); Wed, 27 Jul 2022 02:10:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229844AbiG0GKi (ORCPT ); Wed, 27 Jul 2022 02:10:38 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06F211759B for ; Tue, 26 Jul 2022 23:10:37 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QND2If005053 for ; Tue, 26 Jul 2022 23:10:36 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=7MCOx1AoF1yOC0onMUZ2eDZTpIuAB22t2U/DWXwO2T8=; b=GyLb8544FimhC+z8G55qw4hBgAciRgu87L4ZKuKbT6Tldhaa8OprCnCtXk+EMTpEjfHI xelpDeYfsUTQOuD1LxZM31/L6zY9oc3nMULmADKc/4mi/hqGDFmcWRdPJMZLBG1uvmgF TWunkjlY0GzSrMkGnc/3XCDMwLQvIG/u5Q4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3hjhxaw380-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 26 Jul 2022 23:10:36 -0700 Received: from snc-exhub201.TheFacebook.com (2620:10d:c085:21d::7) by snc-exhub203.TheFacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:35 -0700 Received: from twshared14818.18.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 26 Jul 2022 23:10:35 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 24605757CFD4; Tue, 26 Jul 2022 23:10:25 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni Subject: [PATCH bpf-next 14/14] selftests/bpf: bpf_setsockopt tests Date: Tue, 26 Jul 2022 23:10:25 -0700 Message-ID: <20220727061025.2380990-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220727060856.2370358-1-kafai@fb.com> References: <20220727060856.2370358-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 51hPcB9kagR_1SZTaPikLB1GdPrIoNdh X-Proofpoint-GUID: 51hPcB9kagR_1SZTaPikLB1GdPrIoNdh X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_07,2022-07-26_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch adds tests to exercise optnames that are allowed in bpf_setsockopt(). Signed-off-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/setget_sockopt.c | 125 ++++ .../selftests/bpf/progs/setget_sockopt.c | 538 ++++++++++++++++++ 2 files changed, 663 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/setget_sockopt.c create mode 100644 tools/testing/selftests/bpf/progs/setget_sockopt.c diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c new file mode 100644 index 000000000000..018611e6b248 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c @@ -0,0 +1,125 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#define _GNU_SOURCE +#include +#include +#include + +#include "test_progs.h" +#include "cgroup_helpers.h" +#include "network_helpers.h" + +#include "setget_sockopt.skel.h" + +#define CG_NAME "/setget-sockopt-test" + +static const char addr4_str[] = "127.0.0.1"; +static const char addr6_str[] = "::1"; +static struct setget_sockopt *skel; +static int cg_fd; + +static int create_netns(void) +{ + if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns")) + return -1; + + if (!ASSERT_OK(system("ip link set dev lo up"), "set lo up")) + return -1; + + if (!ASSERT_OK(system("ip link add dev binddevtest1 type veth peer name binddevtest2"), + "add veth")) + return -1; + + if (!ASSERT_OK(system("ip link set dev binddevtest1 up"), + "bring veth up")) + return -1; + + return 0; +} + +static void test_tcp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd, cfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_STREAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + + cfd = connect_to_fd(sfd, 0); + if (!ASSERT_GE(cfd, 0, "connect_to_fd_server")) { + close(sfd); + return; + } + close(sfd); + close(cfd); + + ASSERT_EQ(bss->nr_listen, 1, "nr_listen"); + ASSERT_EQ(bss->nr_connect, 1, "nr_connect"); + ASSERT_EQ(bss->nr_active, 1, "nr_active"); + ASSERT_EQ(bss->nr_passive, 1, "nr_passive"); + ASSERT_EQ(bss->nr_socket_post_create, 2, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 2, "nr_bind"); +} + +static void test_udp(int family) +{ + struct setget_sockopt__bss *bss = skel->bss; + int sfd; + + memset(bss, 0, sizeof(*bss)); + + sfd = start_server(family, SOCK_DGRAM, + family == AF_INET6 ? addr6_str : addr4_str, 0, 0); + if (!ASSERT_GE(sfd, 0, "start_server")) + return; + close(sfd); + + ASSERT_GE(bss->nr_socket_post_create, 1, "nr_socket_post_create"); + ASSERT_EQ(bss->nr_binddev, 1, "nr_bind"); +} + +void test_setget_sockopt(void) +{ + cg_fd = test__join_cgroup(CG_NAME); + if (cg_fd < 0) + return; + + if (create_netns()) + goto done; + + skel = setget_sockopt__open(); + if (!ASSERT_OK_PTR(skel, "open skel")) + goto done; + + strcpy(skel->rodata->veth, "binddevtest1"); + skel->rodata->veth_ifindex = if_nametoindex("binddevtest1"); + if (!ASSERT_GT(skel->rodata->veth_ifindex, 0, "if_nametoindex")) + goto done; + + if (!ASSERT_OK(setget_sockopt__load(skel), "load skel")) + goto done; + + skel->links.skops_sockopt = + bpf_program__attach_cgroup(skel->progs.skops_sockopt, cg_fd); + if (!ASSERT_OK_PTR(skel->links.skops_sockopt, "attach cgroup")) + goto done; + + skel->links.socket_post_create = + bpf_program__attach_cgroup(skel->progs.socket_post_create, cg_fd); + if (!ASSERT_OK_PTR(skel->links.socket_post_create, "attach_cgroup")) + goto done; + + test_tcp(AF_INET6); + test_tcp(AF_INET); + test_udp(AF_INET6); + test_udp(AF_INET); + +done: + setget_sockopt__destroy(skel); + close(cg_fd); +} diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c new file mode 100644 index 000000000000..e52b96cf85fb --- /dev/null +++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c @@ -0,0 +1,538 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef SO_TXREHASH +#define SO_TXREHASH 74 +#endif + +#ifndef TCP_NAGLE_OFF +#define TCP_NAGLE_OFF 1 +#endif + +#ifndef ARRAY_SIZE +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) +#endif + +extern unsigned long CONFIG_HZ __kconfig; + +const volatile char veth[IFNAMSIZ]; +const volatile int veth_ifindex; +const char cubic_cc[] = "cubic"; +const char reno_cc[] = "reno"; + +int nr_listen; +int nr_passive; +int nr_active; +int nr_connect; +int nr_binddev; +int nr_socket_post_create; + +struct sockopt_test { + int opt; + int new; + int restore; + int expected; + int tcp_expected; + int toggle:1; +}; + +static const struct sockopt_test sol_socket_tests[] = { + { .opt = SO_SNDBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_RCVBUF, .new = 8123, .expected = 8123 * 2, }, + { .opt = SO_KEEPALIVE, .toggle = 1, }, + { .opt = SO_PRIORITY, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_REUSEPORT, .toggle = 1, }, + { .opt = SO_RCVLOWAT, .new = 8123, .expected = 8123, }, + { .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, }, + { .opt = SO_TXREHASH, .toggle = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_tcp_tests[] = { + { .opt = TCP_NODELAY, .toggle = 1, }, + { .opt = TCP_MAXSEG, .new = 1314, .expected = 1314, }, + { .opt = TCP_KEEPIDLE, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPINTVL, .new = 123, .expected = 123, .restore = 321, }, + { .opt = TCP_KEEPCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_SYNCNT, .new = 123, .expected = 123, .restore = 124, }, + { .opt = TCP_WINDOW_CLAMP, .new = 8123, .expected = 8123, .restore = 8124, }, + { .opt = TCP_CONGESTION, }, + { .opt = TCP_THIN_LINEAR_TIMEOUTS, .toggle = 1, }, + { .opt = TCP_USER_TIMEOUT, .new = 123400, .expected = 123400, }, + { .opt = TCP_NOTSENT_LOWAT, .new = 1314, .expected = 1314, }, + { .opt = TCP_SAVE_SYN, .new = 1, .expected = 1, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ip_tests[] = { + { .opt = IP_TOS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = 0, }, +}; + +static const struct sockopt_test sol_ipv6_tests[] = { + { .opt = IPV6_TCLASS, .new = 0xe1, .expected = 0xe1, .tcp_expected = 0xe0, }, + { .opt = IPV6_AUTOFLOWLABEL, .toggle = 1, }, + { .opt = 0, }, +}; + +struct sock_common { + unsigned short skc_family; + unsigned long skc_flags; +} __attribute__((preserve_access_index)); + +struct sock { + struct sock_common __sk_common; + __u16 sk_type; + __u16 sk_protocol; + int sk_rcvlowat; + __u32 sk_mark; + unsigned long sk_max_pacing_rate; + unsigned int keepalive_time; + unsigned int keepalive_intvl; +} __attribute__((preserve_access_index)); + +struct tcp_options_received { + __u16 user_mss; +} __attribute__((preserve_access_index)); + +struct ipv6_pinfo { + __u16 recverr:1, + sndflow:1, + repflow:1, + pmtudisc:3, + padding:1, + srcprefs:3, + dontfrag:1, + autoflowlabel:1, + autoflowlabel_set:1, + mc_all:1, + recverr_rfc4884:1, + rtalert_isolate:1; +} __attribute__((preserve_access_index)); + +struct inet_sock { + /* sk and pinet6 has to be the first two members of inet_sock */ + struct sock sk; + struct ipv6_pinfo *pinet6; +} __attribute__((preserve_access_index)); + +struct inet_connection_sock { + __u32 icsk_user_timeout; + __u8 icsk_syn_retries; +} __attribute__((preserve_access_index)); + +struct tcp_sock { + struct inet_connection_sock inet_conn; + struct tcp_options_received rx_opt; + __u8 save_syn:2, + syn_data:1, + syn_fastopen:1, + syn_fastopen_exp:1, + syn_fastopen_ch:1, + syn_data_acked:1, + is_cwnd_limited:1; + __u32 window_clamp; + __u8 nonagle : 4, + thin_lto : 1, + recvmsg_inq : 1, + repair : 1, + frto : 1; + __u32 notsent_lowat; + __u8 keepalive_probes; + unsigned int keepalive_time; + unsigned int keepalive_intvl; +} __attribute__((preserve_access_index)); + +struct socket { + struct sock *sk; +} __attribute__((preserve_access_index)); + +struct loop_ctx { + void *ctx; + struct sock *sk; +}; + +static int __bpf_getsockopt(void *ctx, struct sock *sk, + int level, int opt, int *optval, + int optlen) +{ + if (level == SOL_SOCKET) { + switch (opt) { + case SO_KEEPALIVE: + *optval = !!(sk->__sk_common.skc_flags & (1UL << 3)); + break; + case SO_RCVLOWAT: + *optval = sk->sk_rcvlowat; + break; + case SO_MARK: + *optval = sk->sk_mark; + break; + case SO_MAX_PACING_RATE: + *optval = sk->sk_max_pacing_rate; + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_TCP) { + struct tcp_sock *tp = bpf_skc_to_tcp_sock(sk); + + if (!tp) + return -1; + + switch (opt) { + case TCP_NODELAY: + *optval = !!(tp->nonagle & TCP_NAGLE_OFF); + break; + case TCP_MAXSEG: + *optval = tp->rx_opt.user_mss; + break; + case TCP_KEEPIDLE: + *optval = tp->keepalive_time / CONFIG_HZ; + break; + case TCP_SYNCNT: + *optval = tp->inet_conn.icsk_syn_retries; + break; + case TCP_KEEPINTVL: + *optval = tp->keepalive_intvl / CONFIG_HZ; + break; + case TCP_KEEPCNT: + *optval = tp->keepalive_probes; + break; + case TCP_WINDOW_CLAMP: + *optval = tp->window_clamp; + break; + case TCP_THIN_LINEAR_TIMEOUTS: + *optval = tp->thin_lto; + break; + case TCP_USER_TIMEOUT: + *optval = tp->inet_conn.icsk_user_timeout; + break; + case TCP_NOTSENT_LOWAT: + *optval = tp->notsent_lowat; + break; + case TCP_SAVE_SYN: + *optval = tp->save_syn; + break; + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + if (level == IPPROTO_IPV6) { + switch (opt) { + case IPV6_AUTOFLOWLABEL: { + __u16 proto = sk->sk_protocol; + struct inet_sock *inet_sk; + + if (proto == IPPROTO_TCP) + inet_sk = (struct inet_sock *)bpf_skc_to_tcp_sock(sk); + else + inet_sk = (struct inet_sock *)bpf_skc_to_udp6_sock(sk); + + if (!inet_sk) + return -1; + + *optval = !!inet_sk->pinet6->autoflowlabel; + break; + } + default: + return bpf_getsockopt(ctx, level, opt, optval, optlen); + } + return 0; + } + + return bpf_getsockopt(ctx, level, opt, optval, optlen); +} + +static int bpf_test_sockopt_flip(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, opt = t->opt; + + opt = t->opt; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old))) + return 1; + /* kernel initialized txrehash to 255 */ + if (level == SOL_SOCKET && opt == SO_TXREHASH && old != 0 && old != 1) + old = 1; + + new = !old; + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_sockopt_int(void *ctx, struct sock *sk, + const struct sockopt_test *t, + int level) +{ + int old, tmp, new, expected, opt; + + opt = t->opt; + new = t->new; + if (sk->sk_type == SOCK_STREAM && t->tcp_expected) + expected = t->tcp_expected; + else + expected = t->expected; + + if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old)) || + old == new) + return 1; + + if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new))) + return 1; + if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) || + tmp != expected) + return 1; + + if (t->restore) + old = t->restore; + if (bpf_setsockopt(ctx, level, opt, &old, sizeof(old))) + return 1; + + return 0; +} + +static int bpf_test_socket_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_socket_tests)) + return 1; + + t = &sol_socket_tests[i]; + if (!t->opt) + return 1; + + if (t->toggle) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, SOL_SOCKET); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, SOL_SOCKET); +} + +static int bpf_test_ip_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ip_tests)) + return 1; + + t = &sol_ip_tests[i]; + if (!t->opt) + return 1; + + if (t->toggle) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IP); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IP); +} + +static int bpf_test_ipv6_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + + if (i >= ARRAY_SIZE(sol_ipv6_tests)) + return 1; + + t = &sol_ipv6_tests[i]; + if (!t->opt) + return 1; + + if (t->toggle) + return bpf_test_sockopt_flip(lc->ctx, lc->sk, t, IPPROTO_IPV6); + + return bpf_test_sockopt_int(lc->ctx, lc->sk, t, IPPROTO_IPV6); +} + +static int bpf_test_tcp_sockopt(__u32 i, struct loop_ctx *lc) +{ + const struct sockopt_test *t; + struct sock *sk; + void *ctx; + + if (i >= ARRAY_SIZE(sol_tcp_tests)) + return 1; + + t = &sol_tcp_tests[i]; + if (!t->opt) + return 1; + + ctx = lc->ctx; + sk = lc->sk; + + if (t->opt == TCP_CONGESTION) { + char old_cc[16], tmp_cc[16]; + const char *new_cc; + + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + if (!bpf_strncmp(old_cc, sizeof(old_cc), cubic_cc)) + new_cc = reno_cc; + else + new_cc = cubic_cc; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, (void *)new_cc, + sizeof(new_cc))) + return 1; + if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, tmp_cc, sizeof(tmp_cc))) + return 1; + if (bpf_strncmp(tmp_cc, sizeof(tmp_cc), new_cc)) + return 1; + if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, old_cc, sizeof(old_cc))) + return 1; + return 0; + } + + if (t->toggle) + return bpf_test_sockopt_flip(ctx, sk, t, IPPROTO_TCP); + + return bpf_test_sockopt_int(ctx, sk, t, IPPROTO_TCP); +} + +static int bpf_test_sockopt(void *ctx, struct sock *sk) +{ + struct loop_ctx lc = { .ctx = ctx, .sk = sk, }; + __u16 family, proto; + int n; + + family = sk->__sk_common.skc_family; + proto = sk->sk_protocol; + + n = bpf_loop(ARRAY_SIZE(sol_socket_tests), bpf_test_socket_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_socket_tests)) + return -1; + + if (proto == IPPROTO_TCP) { + n = bpf_loop(ARRAY_SIZE(sol_tcp_tests), bpf_test_tcp_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_tcp_tests)) + return -1; + } + + if (family == AF_INET) { + n = bpf_loop(ARRAY_SIZE(sol_ip_tests), bpf_test_ip_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ip_tests)) + return -1; + } else { + n = bpf_loop(ARRAY_SIZE(sol_ipv6_tests), bpf_test_ipv6_sockopt, &lc, 0); + if (n != ARRAY_SIZE(sol_ipv6_tests)) + return -1; + } + + return 0; +} + +static int binddev_test(void *ctx) +{ + const char empty_ifname[] = ""; + int ifindex, zero = 0; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)veth, sizeof(veth))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE, + (void *)empty_ifname, sizeof(empty_ifname))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + (void *)&veth_ifindex, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != veth_ifindex) + return -1; + + if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &zero, sizeof(int))) + return -1; + if (bpf_getsockopt(ctx, SOL_SOCKET, SO_BINDTOIFINDEX, + &ifindex, sizeof(int)) || + ifindex != 0) + return -1; + + return 0; +} + +SEC("lsm_cgroup/socket_post_create") +int BPF_PROG(socket_post_create, struct socket *sock, int family, + int type, int protocol, int kern) +{ + struct sock *sk = sock->sk; + + if (!sk) + return 1; + + nr_socket_post_create += !bpf_test_sockopt(sk, sk); + nr_binddev += !binddev_test(sk); + + return 1; +} + +SEC("sockops") +int skops_sockopt(struct bpf_sock_ops *skops) +{ + struct bpf_sock *bpf_sk = skops->sk; + struct sock *sk; + + if (!bpf_sk) + return 1; + + sk = (struct sock *)bpf_skc_to_tcp_sock(bpf_sk); + if (!sk) + return 1; + + switch (skops->op) { + case BPF_SOCK_OPS_TCP_LISTEN_CB: + nr_listen += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_TCP_CONNECT_CB: + nr_connect += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: + nr_active += !bpf_test_sockopt(skops, sk); + break; + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: + nr_passive += !bpf_test_sockopt(skops, sk); + break; + } + + return 1; +} + +char _license[] SEC("license") = "GPL";