From patchwork Fri Sep 23 22:44:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12987261 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 101DFC07E9D for ; Fri, 23 Sep 2022 22:47:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229520AbiIWWrN (ORCPT ); Fri, 23 Sep 2022 18:47:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232003AbiIWWrL (ORCPT ); Fri, 23 Sep 2022 18:47:11 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03F3F5E313 for ; Fri, 23 Sep 2022 15:47:10 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 28NM8xTS030847 for ; Fri, 23 Sep 2022 15:47:10 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=h9D6nJwYPmU+FeFh90ooVRJBT3lkdMbR7mCWzwU2vow=; b=TCzyH7e5c5j2hgRplWXf+P/Aeiyiy3vTm1SdgVn+gy3O8xRXgffnPH73UMf+cIstJkLk eNaWIm0HUs/t2tFu665uLwHwym7smJI0SbeiwuAVrCakvrvGi5K2cN9WA1La1IpiDj4y pF0uuyXR8FdRzZuTZNBCtZDP2FciuHcJQNw= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0001303.ppops.net (PPS) with ESMTPS id 3js3egy36m-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 23 Sep 2022 15:47:10 -0700 Received: from twshared10425.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Fri, 23 Sep 2022 15:47:08 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 6E99A9A4069A; Fri, 23 Sep 2022 15:44:59 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Martin KaFai Lau Subject: [PATCH v2 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Date: Fri, 23 Sep 2022 15:44:59 -0700 Message-ID: <20220923224459.2352143-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220923224453.2351753-1-kafai@fb.com> References: <20220923224453.2351753-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: QVhwpJD_LQQhW3izSF-6RVibhMwEzwfF X-Proofpoint-ORIG-GUID: QVhwpJD_LQQhW3izSF-6RVibhMwEzwfF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-23_10,2022-09-22_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau --- arch/x86/net/bpf_jit_comp.c | 3 +++ include/linux/bpf.h | 4 ++++ kernel/bpf/trampoline.c | 23 +++++++++++++++++++++++ 3 files changed, 30 insertions(+) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index ae89f4143eb4..58a131dec954 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1836,6 +1836,9 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, if (p->aux->sleepable) { enter = __bpf_prog_enter_sleepable; exit = __bpf_prog_exit_sleepable; + } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) { + enter = __bpf_prog_enter_struct_ops; + exit = __bpf_prog_exit_struct_ops; } else if (p->expected_attach_type == BPF_LSM_CGROUP) { enter = __bpf_prog_enter_lsm_cgroup; exit = __bpf_prog_exit_lsm_cgroup; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index edd43edb27d6..6fdbc1398b8a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -864,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 41b67eb83ab3..e6551e4a6064 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -976,6 +976,29 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, rcu_read_unlock_trace(); } +u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) + __acquires(RCU) +{ + rcu_read_lock(); + migrate_disable(); + + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + + return bpf_prog_start_time(); +} + +void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) + __releases(RCU) +{ + bpf_reset_run_ctx(run_ctx->saved_run_ctx); + + update_prog_stats(prog, start); + migrate_enable(); + rcu_read_unlock(); +} + void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr) { percpu_ref_get(&tr->pcref); From patchwork Fri Sep 23 22:45:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12987264 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FA9DC04A95 for ; Fri, 23 Sep 2022 22:48:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232311AbiIWWsP (ORCPT ); Fri, 23 Sep 2022 18:48:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232003AbiIWWsO (ORCPT ); Fri, 23 Sep 2022 18:48:14 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E8845E313 for ; Fri, 23 Sep 2022 15:48:13 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28NM91RH001199 for ; Fri, 23 Sep 2022 15:48:12 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=tWjdpDfLvXhmiu5oXo73H9zGE0MjJCKQaEepAtyFAJw=; b=BYa3leb0QQEQh8rt80uX5bndVBvrtXNlKPn1jucvjL+8IIGwrrPhb9tWgaKUXghUDxKy At6lS5HrQ0ETH9JeDRz9KzmYiT177H6wNc1Uc+ChCdUR1BnJH5SdeccFaXdSHL/asaZN ZurU/QCcQRLMiSr0LBfkLAPg04hOwBcej9Y= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jsf8t37bf-8 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 23 Sep 2022 15:48:12 -0700 Received: from twshared3307.37.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Fri, 23 Sep 2022 15:48:10 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id B8C7F9A406C7; Fri, 23 Sep 2022 15:45:05 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Martin KaFai Lau Subject: [PATCH v2 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() Date: Fri, 23 Sep 2022 15:45:05 -0700 Message-ID: <20220923224505.2352542-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220923224453.2351753-1-kafai@fb.com> References: <20220923224453.2351753-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: rjgJF77gO0XdaPQx5HelcibJDBVdWfAs X-Proofpoint-GUID: rjgJF77gO0XdaPQx5HelcibJDBVdWfAs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-23_10,2022-09-22_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks. However, it is not done for cg sock_ddr, cg sockops, and some of the bpf_lsm_cgroup hooks. The tcp-cc "cdg" should have very limited usage. This patch is to move the "cdg" check to the common sol_tcp_sockopt() so that all hooks have a consistent behavior. The motivation to make this check consistent now is because the latter patch will refactor the bpf_setsockopt(TCP_CONGESTION) into another function, so it is better to take this chance to refactor this piece also. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 2fd9449026aa..f4cea3ff994a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5127,6 +5127,13 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, case TCP_CONGESTION: if (*optlen < 2) return -EINVAL; + /* "cdg" is the only cc that alloc a ptr + * in inet_csk_ca area. The bpf-tcp-cc may + * overwrite this ptr after switching to cdg. + */ + if (!getopt && *optlen >= sizeof("cdg") - 1 && + !strncmp("cdg", optval, *optlen)) + return -ENOTSUPP; break; case TCP_SAVED_SYN: if (*optlen < 1) @@ -5285,12 +5292,6 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname, BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level, int, optname, char *, optval, int, optlen) { - if (level == SOL_TCP && optname == TCP_CONGESTION) { - if (optlen >= sizeof("cdg") - 1 && - !strncmp("cdg", optval, optlen)) - return -ENOTSUPP; - } - return _bpf_setsockopt(sk, level, optname, optval, optlen); } From patchwork Fri Sep 23 22:45:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12987263 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A496FC07E9D for ; Fri, 23 Sep 2022 22:48:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232241AbiIWWsN (ORCPT ); Fri, 23 Sep 2022 18:48:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232240AbiIWWsM (ORCPT ); Fri, 23 Sep 2022 18:48:12 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0BFDAF4B5 for ; Fri, 23 Sep 2022 15:48:11 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28NM91RC001199 for ; Fri, 23 Sep 2022 15:48:11 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=x4Cv6lbYHG9ROYs/8OMrPXB3vbyGKY23tmA4A9L+XXo=; b=FJ6Z/m0IQhmcPbeOqxtEK0xlSWD+ZXwmo3Uz77vbT2sEYLusTLZwwnIuTlT/XDe2JMpo MNzBNpwqh2lOdsHfL/WCnC73gyBvQL9LdoEocec7Nsy8zxnHX2zisaqVVtF8f21Zni5O Xe3y/get2ENzJCGWJLLMCeFAaI3kgpZr8SY= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jsf8t37bf-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 23 Sep 2022 15:48:10 -0700 Received: from twshared10425.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Fri, 23 Sep 2022 15:48:09 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 264509A406DC; Fri, 23 Sep 2022 15:45:12 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Martin KaFai Lau Subject: [PATCH v2 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function Date: Fri, 23 Sep 2022 15:45:12 -0700 Message-ID: <20220923224512.2353244-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220923224453.2351753-1-kafai@fb.com> References: <20220923224453.2351753-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: l2CS743Usq68T4uMR4LXDIEFGTOnAFAx X-Proofpoint-GUID: l2CS743Usq68T4uMR4LXDIEFGTOnAFAx X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-23_10,2022-09-22_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index f4cea3ff994a..96f2f7a65e65 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5102,6 +5102,33 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, return 0; } +static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, + int *optlen, bool getopt) +{ + if (*optlen < 2) + return -EINVAL; + + if (getopt) { + if (!inet_csk(sk)->icsk_ca_ops) + return -EINVAL; + /* BPF expects NULL-terminated tcp-cc string */ + optval[--(*optlen)] = '\0'; + return do_tcp_getsockopt(sk, SOL_TCP, TCP_CONGESTION, + KERNEL_SOCKPTR(optval), + KERNEL_SOCKPTR(optlen)); + } + + /* "cdg" is the only cc that alloc a ptr + * in inet_csk_ca area. The bpf-tcp-cc may + * overwrite this ptr after switching to cdg. + */ + if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen)) + return -ENOTSUPP; + + return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + KERNEL_SOCKPTR(optval), *optlen); +} + static int sol_tcp_sockopt(struct sock *sk, int optname, char *optval, int *optlen, bool getopt) @@ -5125,16 +5152,7 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, return -EINVAL; break; case TCP_CONGESTION: - if (*optlen < 2) - return -EINVAL; - /* "cdg" is the only cc that alloc a ptr - * in inet_csk_ca area. The bpf-tcp-cc may - * overwrite this ptr after switching to cdg. - */ - if (!getopt && *optlen >= sizeof("cdg") - 1 && - !strncmp("cdg", optval, *optlen)) - return -ENOTSUPP; - break; + return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt); case TCP_SAVED_SYN: if (*optlen < 1) return -EINVAL; @@ -5159,13 +5177,6 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, return 0; } - if (optname == TCP_CONGESTION) { - if (!inet_csk(sk)->icsk_ca_ops) - return -EINVAL; - /* BPF expects NULL-terminated tcp-cc string */ - optval[--(*optlen)] = '\0'; - } - return do_tcp_getsockopt(sk, SOL_TCP, optname, KERNEL_SOCKPTR(optval), KERNEL_SOCKPTR(optlen)); From patchwork Fri Sep 23 22:45:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12987262 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97EF4C07E9D for ; Fri, 23 Sep 2022 22:48:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231576AbiIWWsK (ORCPT ); Fri, 23 Sep 2022 18:48:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232003AbiIWWsJ (ORCPT ); Fri, 23 Sep 2022 18:48:09 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41CFDAF0DC for ; Fri, 23 Sep 2022 15:48:08 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 28NM8wPe021099 for ; Fri, 23 Sep 2022 15:48:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=N7DaUqXLRK9CHN/e3+UHlB9NxAW8df9FBbJug88H3+Q=; b=lM2tmRj6RbHZuKs5lugFBrUyMc9paopQ/vfeAWRLg9NjJBd6ncOla0a3fPP+/ITZF9ZW AsisNeAuPCDdwVoou02u0Jmy2r13zhkU1O9CnMHOamKofvCpPn5efErCdDmn3/2qDMf4 O8lvmcAERwv4/fI/txFPZSKJh47Sr/3V5Oc= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3jshcst26d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 23 Sep 2022 15:48:07 -0700 Received: from twshared25017.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Fri, 23 Sep 2022 15:48:05 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id CB1E99A406E9; Fri, 23 Sep 2022 15:45:18 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Martin KaFai Lau Subject: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Date: Fri, 23 Sep 2022 15:45:18 -0700 Message-ID: <20220923224518.2353383-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220923224453.2351753-1-kafai@fb.com> References: <20220923224453.2351753-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: CEFX9f_Q6j32dNVDVIcqeuOvUzM0RSuN X-Proofpoint-GUID: CEFX9f_Q6j32dNVDVIcqeuOvUzM0RSuN X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-23_10,2022-09-22_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau --- include/linux/tcp.h | 6 ++++++ net/core/filter.c | 28 +++++++++++++++++++++++++++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..3bdf687e2fb3 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -388,6 +388,12 @@ struct tcp_sock { u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs * values defined in uapi/linux/tcp.h */ + u8 bpf_chg_cc_inprogress:1; /* In the middle of + * bpf_setsockopt(TCP_CONGESTION), + * it is to avoid the bpf_tcp_cc->init() + * to recur itself by calling + * bpf_setsockopt(TCP_CONGESTION, "itself"). + */ #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG) #else #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0 diff --git a/net/core/filter.c b/net/core/filter.c index 96f2f7a65e65..ac4c45c02da5 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, int *optlen, bool getopt) { + struct tcp_sock *tp; + int ret; + if (*optlen < 2) return -EINVAL; @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen)) return -ENOTSUPP; - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + /* It stops this looping + * + * .init => bpf_setsockopt(tcp_cc) => .init => + * bpf_setsockopt(tcp_cc)" => .init => .... + * + * The second bpf_setsockopt(tcp_cc) is not allowed + * in order to break the loop when both .init + * are the same bpf prog. + * + * This applies even the second bpf_setsockopt(tcp_cc) + * does not cause a loop. This limits only the first + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to + * pick a fallback cc (eg. peer does not support ECN) + * and the second '.init' cannot fallback to + * another. + */ + tp = tcp_sk(sk); + if (tp->bpf_chg_cc_inprogress) + return -EBUSY; + + tp->bpf_chg_cc_inprogress = 1; + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, KERNEL_SOCKPTR(optval), *optlen); + tp->bpf_chg_cc_inprogress = 0; + return ret; } static int sol_tcp_sockopt(struct sock *sk, int optname, From patchwork Fri Sep 23 22:45:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12987265 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A41B4C04A95 for ; Fri, 23 Sep 2022 22:48:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231851AbiIWWsQ (ORCPT ); Fri, 23 Sep 2022 18:48:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232003AbiIWWsP (ORCPT ); Fri, 23 Sep 2022 18:48:15 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CED6112757E for ; Fri, 23 Sep 2022 15:48:14 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 28NM91RN001199 for ; Fri, 23 Sep 2022 15:48:14 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=DRHfaWUXUvuXuvFFgYURqEyVC/6Ya/fIP5ChI7vwJE8=; b=kmfmHzFN9LcVB4T1LPB6iU+UWrqbRpDUAm+KY3KoWCHo/gLheMNJz6kvFyVMlsLmz8YX oyiJNnk0b+KUAC9tI6BTxu+Fn6SW9Za4gH/EzPYH4X09BmWW7VOqKHAgXWbIXARmSmm4 vupnFYUg1xmvvYhK09eGDXSjlf0IYk82GTE= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3jsf8t37bf-12 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 23 Sep 2022 15:48:14 -0700 Received: from twshared10425.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::d) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Fri, 23 Sep 2022 15:48:10 -0700 Received: by devbig933.frc1.facebook.com (Postfix, from userid 6611) id 264E09A4070C; Fri, 23 Sep 2022 15:45:25 -0700 (PDT) From: Martin KaFai Lau To: , CC: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , David Miller , Eric Dumazet , Jakub Kicinski , , Paolo Abeni , Martin KaFai Lau Subject: [PATCH v2 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) Date: Fri, 23 Sep 2022 15:45:25 -0700 Message-ID: <20220923224525.2353722-1-kafai@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220923224453.2351753-1-kafai@fb.com> References: <20220923224453.2351753-1-kafai@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Vp417AWQRt1M2DrBgC9QkASdG17QESLm X-Proofpoint-GUID: Vp417AWQRt1M2DrBgC9QkASdG17QESLm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-09-23_10,2022-09-22_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch changes the bpf_dctcp test to ensure the recurred bpf_setsockopt(TCP_CONGESTION) returns -EBUSY. Signed-off-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 +++ tools/testing/selftests/bpf/progs/bpf_dctcp.c | 25 +++++++++++++------ 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c index 2959a52ced06..e980188d4124 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c @@ -290,6 +290,10 @@ static void test_dctcp_fallback(void) goto done; ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res"); ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res"); + /* All setsockopt(TCP_CONGESTION) in the recurred + * bpf_dctcp->init() should fail with -EBUSY. + */ + ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt"); err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len); if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)")) diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c index 9573be6122be..460682759aed 100644 --- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c +++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include "bpf_tcp_helpers.h" @@ -23,6 +24,7 @@ const char tcp_cdg[] = "cdg"; char cc_res[TCP_CA_NAME_MAX]; int tcp_cdg_res = 0; int stg_result = 0; +int ebusy_cnt = 0; struct { __uint(type, BPF_MAP_TYPE_SK_STORAGE); @@ -64,16 +66,23 @@ void BPF_PROG(dctcp_init, struct sock *sk) if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) { /* Switch to fallback */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)fallback, sizeof(fallback)); - /* Switch back to myself which the bpf trampoline - * stopped calling dctcp_init recursively. + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)fallback, sizeof(fallback)) == -EBUSY) + ebusy_cnt++; + + /* Switch back to myself and the recurred dctcp_init() + * will get -EBUSY for all bpf_setsockopt(TCP_CONGESTION), + * except the last "cdg" one. */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)bpf_dctcp, sizeof(bpf_dctcp)); + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)bpf_dctcp, sizeof(bpf_dctcp)) == -EBUSY) + ebusy_cnt++; + /* Switch back to fallback */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)fallback, sizeof(fallback)); + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)fallback, sizeof(fallback)) == -EBUSY) + ebusy_cnt++; + /* Expecting -ENOTSUPP for tcp_cdg_res */ tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, (void *)tcp_cdg, sizeof(tcp_cdg));