From patchwork Thu Sep 22 22:56:23 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985917
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
Date: Thu, 22 Sep 2022 15:56:23 -0700
Message-ID: <20220922225623.3055902-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

The struct_ops prog allows bpf to be used to implement the functions of
a struct used by the kernel (eg. by a kernel module).  The current
usage is to implement tcp congestion control (struct tcp_congestion_ops).
The kernel does not call a tcp-cc's ops (ie. the bpf prog) recursively,
but struct_ops currently shares the tracing trampoline's enter/exit
functions, which track prog->active to avoid recursion.  That tracking
is only needed for tracing progs.
However, it turns out a struct_ops bpf prog can hit this prog->active
check and be unnecessarily skipped.  eg. The '.ssthresh' op may run
in_task() and then be interrupted by a softirq that runs the same
'.ssthresh'.  Skipping the '.ssthresh' run ends up returning a random
value to the caller.

This patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops
trampoline.  They do not track prog->active to detect recursion.

One remaining exception is when a tcp_congestion_ops '.init' op does
bpf_setsockopt(TCP_CONGESTION) and then recurs into the same '.init'
op.  That case is addressed in the following patches.

Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Martin KaFai Lau
---
 arch/x86/net/bpf_jit_comp.c |  3 +++
 include/linux/bpf.h         |  4 ++++
 kernel/bpf/trampoline.c     | 23 +++++++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index ae89f4143eb4..58a131dec954 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1836,6 +1836,9 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	if (p->aux->sleepable) {
 		enter = __bpf_prog_enter_sleepable;
 		exit = __bpf_prog_exit_sleepable;
+	} else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) {
+		enter = __bpf_prog_enter_struct_ops;
+		exit = __bpf_prog_exit_struct_ops;
 	} else if (p->expected_attach_type == BPF_LSM_CGROUP) {
 		enter = __bpf_prog_enter_lsm_cgroup;
 		exit = __bpf_prog_exit_lsm_cgroup;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index edd43edb27d6..6fdbc1398b8a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -864,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
 					struct bpf_tramp_run_ctx *run_ctx);
 void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
 					struct bpf_tramp_run_ctx *run_ctx);
+u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
+					struct bpf_tramp_run_ctx *run_ctx);
+void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
+					struct bpf_tramp_run_ctx *run_ctx);
 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
 void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
 
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 41b67eb83ab3..e6551e4a6064 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -976,6 +976,29 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
 	rcu_read_unlock_trace();
 }
 
+u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
+					struct bpf_tramp_run_ctx *run_ctx)
+	__acquires(RCU)
+{
+	rcu_read_lock();
+	migrate_disable();
+
+	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+
+	return bpf_prog_start_time();
+}
+
+void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
+					struct bpf_tramp_run_ctx *run_ctx)
+	__releases(RCU)
+{
+	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
+
+	update_prog_stats(prog, start);
+	migrate_enable();
+	rcu_read_unlock();
+}
+
 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
 {
 	percpu_ref_get(&tr->pcref);
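For illustration only (not part of the patch), a minimal bpf-tcp-cc of
the kind affected by the prog->active check could look like the sketch
below.  The "sketch_*" names are made up, and it assumes the selftests'
"bpf_tcp_helpers.h" (as used by bpf_dctcp.c later in this series) for
struct tcp_congestion_ops and tcp_sk().  Its '.ssthresh' can be entered
in_task() and again from softirq, which is exactly the overlap the
shared tracing enter/exit mis-detects as recursion:

/* sketch_cc.bpf.c - illustrative only; a loadable tcp-cc also needs
 * .undo_cwnd and .cong_avoid (or .cong_control), omitted here.
 */
#include <linux/bpf.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_tcp_helpers.h"

char _license[] SEC("license") = "GPL";

SEC("struct_ops/sketch_ssthresh")
__u32 BPF_PROG(sketch_ssthresh, struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	/* If the trampoline skipped this prog because prog->active was
	 * already non-zero (eg. a softirq interrupting in_task()), the
	 * kernel caller would get a random return value here.
	 */
	return tp->snd_cwnd > 4 ? tp->snd_cwnd >> 1 : 2;
}

SEC(".struct_ops")
struct tcp_congestion_ops sketch = {
	.ssthresh	= (void *)sketch_ssthresh,
	.name		= "bpf_sketch",
};

With the patch, the struct_ops trampoline no longer consults
prog->active for such a prog, so both invocations run.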
From patchwork Thu Sep 22 22:56:29 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985918
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
Date: Thu, 22 Sep 2022 15:56:29 -0700
Message-ID: <20220922225629.3056949-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

The check on the tcp-cc "cdg" is done in bpf_sk_setsockopt, which is
used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks.
However, it is not done for cg sock_addr, cg sockops, and some of the
bpf_lsm_cgroup hooks.  The tcp-cc "cdg" should have very limited usage.
This patch moves the "cdg" check to the common sol_tcp_sockopt() so
that all hooks behave consistently.

The motivation for making this check consistent now is that a later
patch will need to expose _bpf_setsockopt() to bpf_tcp_ca, and that
path also requires the "cdg" check.
Signed-off-by: Martin KaFai Lau
---
 net/core/filter.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 2fd9449026aa..f4cea3ff994a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5127,6 +5127,13 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
 	case TCP_CONGESTION:
 		if (*optlen < 2)
 			return -EINVAL;
+		/* "cdg" is the only cc that alloc a ptr
+		 * in inet_csk_ca area. The bpf-tcp-cc may
+		 * overwrite this ptr after switching to cdg.
+		 */
+		if (!getopt && *optlen >= sizeof("cdg") - 1 &&
+		    !strncmp("cdg", optval, *optlen))
+			return -ENOTSUPP;
 		break;
 	case TCP_SAVED_SYN:
 		if (*optlen < 1)
@@ -5285,12 +5292,6 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
 BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level,
 	   int, optname, char *, optval, int, optlen)
 {
-	if (level == SOL_TCP && optname == TCP_CONGESTION) {
-		if (optlen >= sizeof("cdg") - 1 &&
-		    !strncmp("cdg", optval, optlen))
-			return -ENOTSUPP;
-	}
-
 	return _bpf_setsockopt(sk, level, optname, optval, optlen);
 }
 
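As a concrete (hypothetical) illustration of the consistency this buys,
the sketch below is a cg sockops prog that tries to switch a connecting
socket to "cdg".  It is not part of the patch; the prog name is made
up, and SOL_TCP is guard-defined since it may not be provided by these
includes.  Before this change the rejection only happened in the
bpf_sk_setsockopt() callers; with sol_tcp_sockopt() doing the check,
the same call from this hook is also expected to return -ENOTSUPP:

/* set_cdg.bpf.c - illustrative only */
#include <linux/bpf.h>
#include <linux/tcp.h>		/* TCP_CONGESTION */
#include <bpf/bpf_helpers.h>

#ifndef SOL_TCP
#define SOL_TCP 6		/* same value as IPPROTO_TCP */
#endif

char _license[] SEC("license") = "GPL";

int cdg_res;

SEC("sockops")
int set_cdg(struct bpf_sock_ops *skops)
{
	char cdg[] = "cdg";

	if (skops->op == BPF_SOCK_OPS_TCP_CONNECT_CB)
		/* with this patch: expected to be -ENOTSUPP here as well */
		cdg_res = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION,
					 cdg, sizeof(cdg));
	return 1;
}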
From patchwork Thu Sep 22 22:56:36 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985919
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 3/5] bpf: Add bpf_run_ctx_type
Date: Thu, 22 Sep 2022 15:56:36 -0700
Message-ID: <20220922225636.3057567-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

This patch adds a bpf_run_ctx_type to struct bpf_run_ctx.  The next
patch needs to look at the previous run ctx saved in
tramp_run_ctx->saved_run_ctx and check whether it is also changing the
tcp-cc for the same sk (saved in bpf_cookie).  Thus, it needs to know
whether the saved_run_ctx is the bpf_run_ctx type it is looking for
before looking into its members.

Signed-off-by: Martin KaFai Lau
---
 include/linux/bpf.h      | 17 ++++++++++++++---
 kernel/bpf/bpf_iter.c    |  2 +-
 kernel/bpf/cgroup.c      |  2 +-
 kernel/bpf/trampoline.c  |  4 ++++
 kernel/trace/bpf_trace.c |  1 +
 net/bpf/test_run.c       |  2 +-
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6fdbc1398b8a..902b1be047cf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1517,7 +1517,18 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			u64 bpf_cookie,
 			struct bpf_prog_array **new_array);
 
-struct bpf_run_ctx {};
+enum bpf_run_ctx_type {
+	BPF_RUN_CTX_TYPE_NONE,
+	BPF_RUN_CTX_TYPE_CG,
+	BPF_RUN_CTX_TYPE_TRACE,
+	BPF_RUN_CTX_TYPE_TRAMP,
+	BPF_RUN_CTX_TYPE_KPROBE_MULTI,
+	BPF_RUN_CTX_TYPE_STRUCT_OPS,
+};
+
+struct bpf_run_ctx {
+	enum bpf_run_ctx_type type;
+};
 
 struct bpf_cg_run_ctx {
 	struct bpf_run_ctx run_ctx;
@@ -1568,7 +1579,7 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
 	struct bpf_run_ctx *old_run_ctx;
-	struct bpf_trace_run_ctx run_ctx;
+	struct bpf_trace_run_ctx run_ctx = { .run_ctx.type = BPF_RUN_CTX_TYPE_TRACE };
 	u32 ret = 1;
 
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held");
@@ -1607,7 +1618,7 @@ bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
-	struct bpf_trace_run_ctx run_ctx;
+	struct bpf_trace_run_ctx run_ctx = { .run_ctx.type = BPF_RUN_CTX_TYPE_TRACE };
 	u32 ret = 1;
 
 	might_fault();
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 5dc307bdeaeb..65ff0c93b0ba 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -694,7 +694,7 @@ struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop)
 
 int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx)
 {
-	struct bpf_run_ctx run_ctx, *old_run_ctx;
+	struct bpf_run_ctx run_ctx = {}, *old_run_ctx;
 	int ret;
 
 	if (prog->aux->sleepable) {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00c7f864900e..850fd6983b9a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -37,7 +37,7 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
-	struct bpf_cg_run_ctx run_ctx;
+	struct bpf_cg_run_ctx run_ctx = { .run_ctx.type = BPF_RUN_CTX_TYPE_CG };
 	u32 func_ret;
 
 	run_ctx.retval = retval;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index e6551e4a6064..313619012a59 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -882,6 +882,7 @@ u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *ru
 	rcu_read_lock();
 	migrate_disable();
 
+	run_ctx->run_ctx.type = BPF_RUN_CTX_TYPE_TRAMP;
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
@@ -934,6 +935,7 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
 	rcu_read_lock();
 	migrate_disable();
 
+	run_ctx->run_ctx.type = BPF_RUN_CTX_TYPE_TRAMP;
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
 	return NO_START_TIME;
@@ -960,6 +962,7 @@ u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_r
 		return 0;
 	}
 
+	run_ctx->run_ctx.type = BPF_RUN_CTX_TYPE_TRAMP;
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
 	return bpf_prog_start_time();
@@ -983,6 +986,7 @@ u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
 	rcu_read_lock();
 	migrate_disable();
 
+	run_ctx->run_ctx.type = BPF_RUN_CTX_TYPE_STRUCT_OPS;
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
 	return bpf_prog_start_time();
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b05f0310dbd3..7670ca88b721 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2575,6 +2575,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 			   unsigned long entry_ip, struct pt_regs *regs)
 {
 	struct bpf_kprobe_multi_run_ctx run_ctx = {
+		.run_ctx.type = BPF_RUN_CTX_TYPE_KPROBE_MULTI,
 		.link = link,
 		.entry_ip = entry_ip,
 	};
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..1f2a745e8641 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -374,7 +374,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 {
 	struct bpf_prog_array_item item = {.prog = prog};
 	struct bpf_run_ctx *old_ctx;
-	struct bpf_cg_run_ctx run_ctx;
+	struct bpf_cg_run_ctx run_ctx = { .run_ctx.type = BPF_RUN_CTX_TYPE_CG };
 	struct bpf_test_timer t = { NO_MIGRATE };
 	enum bpf_cgroup_storage_type stype;
 	int ret;
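The reason a plain 'struct bpf_run_ctx {}' was not enough can be seen
in a sketch like the one below (illustrative only; the helper name is
made up, and the real consumer is the bpf_tcp_ca change in the next
patch).  current->bpf_ctx may point at a bpf_cg_run_ctx,
bpf_trace_run_ctx, bpf_tramp_run_ctx, etc., so a caller must check the
new type tag before downcasting:

/* illustrative kernel-side sketch, not part of the patch */
#include <linux/bpf.h>
#include <linux/sched.h>

static struct bpf_tramp_run_ctx *cur_tramp_run_ctx_or_null(void)
{
	struct bpf_run_ctx *run_ctx = current->bpf_ctx;

	/* Without the type tag, the container_of() below could be
	 * applied to a run ctx of a different type and read garbage.
	 */
	if (!run_ctx || run_ctx->type != BPF_RUN_CTX_TYPE_TRAMP)
		return NULL;
	return container_of(run_ctx, struct bpf_tramp_run_ctx, run_ctx);
}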
From patchwork Thu Sep 22 22:56:42 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985921
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 4/5] bpf: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
Date: Thu, 22 Sep 2022 15:56:42 -0700
Message-ID: <20220922225642.3058176-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION,
"itself"), it triggers this loop:

.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc)
... ... => .init => bpf_setsockopt(tcp_cc).

This used to be prevented by the prog->active counter, but that
detection cannot be used in struct_ops, as explained in the earlier
patch of this set.

In this patch, the second (nested) bpf_setsockopt(tcp_cc) is not
allowed, in order to break the loop.  This is done by checking whether
the previous bpf_run_ctx has saved the same sk pointer in its
bpf_cookie.

Note that this essentially means only the first '.init' can call
bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. when the peer
does not support ECN), and the second '.init' cannot fall back to yet
another cc.  This applies even when the second
bpf_setsockopt(TCP_CONGESTION) would not cause a loop.
Signed-off-by: Martin KaFai Lau
---
 include/linux/filter.h |  3 +++
 net/core/filter.c      |  4 ++--
 net/ipv4/bpf_tcp_ca.c  | 54 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 98e28126c24b..9942ecc68a45 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -911,6 +911,9 @@ int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
 bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
 void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
 
+int _bpf_setsockopt(struct sock *sk, int level, int optname,
+		    char *optval, int optlen);
+
 u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 #define __bpf_call_base_args \
 	((u64 (*)(u64, u64, u64, u64, u64, const struct bpf_insn *)) \
diff --git a/net/core/filter.c b/net/core/filter.c
index f4cea3ff994a..e56a1ebcf1bc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5244,8 +5244,8 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	return -EINVAL;
 }
 
-static int _bpf_setsockopt(struct sock *sk, int level, int optname,
-			   char *optval, int optlen)
+int _bpf_setsockopt(struct sock *sk, int level, int optname,
+		    char *optval, int optlen)
 {
 	if (sk_fullsock(sk))
 		sock_owned_by_me(sk);
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6da16ae6a962..a9f2cab5ffbc 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -144,6 +144,57 @@ static const struct bpf_func_proto bpf_tcp_send_ack_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_5(bpf_init_ops_setsockopt, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	struct bpf_tramp_run_ctx *run_ctx, *saved_run_ctx;
+	int ret;
+
+	if (optname != TCP_CONGESTION)
+		return _bpf_setsockopt(sk, level, optname, optval, optlen);
+
+	run_ctx = (struct bpf_tramp_run_ctx *)current->bpf_ctx;
+	if (unlikely(run_ctx->saved_run_ctx &&
+		     run_ctx->saved_run_ctx->type == BPF_RUN_CTX_TYPE_STRUCT_OPS)) {
+		saved_run_ctx = (struct bpf_tramp_run_ctx *)run_ctx->saved_run_ctx;
+		/* It stops this looping
+		 *
+		 * .init => bpf_setsockopt(tcp_cc) => .init =>
+		 * bpf_setsockopt(tcp_cc)" => .init => ....
+		 *
+		 * The second bpf_setsockopt(tcp_cc) is not allowed
+		 * in order to break the loop when both .init
+		 * are the same bpf prog.
+		 *
+		 * This applies even the second bpf_setsockopt(tcp_cc)
+		 * does not cause a loop. This limits only the first
+		 * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
+		 * pick a fallback cc (eg. peer does not support ECN)
+		 * and the second '.init' cannot fallback to
+		 * another cc.
+		 */
+		if (saved_run_ctx->bpf_cookie == (uintptr_t)sk)
+			return -EBUSY;
+	}
+
+	run_ctx->bpf_cookie = (uintptr_t)sk;
+	ret = _bpf_setsockopt(sk, level, optname, optval, optlen);
+	run_ctx->bpf_cookie = 0;
+
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_init_ops_setsockopt_proto = {
+	.func		= bpf_init_ops_setsockopt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 static u32 prog_ops_moff(const struct bpf_prog *prog)
 {
 	const struct btf_member *m;
@@ -169,6 +220,9 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
 	case BPF_FUNC_setsockopt:
+		if (prog_ops_moff(prog) ==
+		    offsetof(struct tcp_congestion_ops, init))
+			return &bpf_init_ops_setsockopt_proto;
 		/* Does not allow release() to call setsockopt.
 		 * release() is called when the current bpf-tcp-cc
 		 * is retiring. It is not allowed to call
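To see what remains allowed after this change, the sketch below
(illustrative only, using made-up names and the same selftest headers
as bpf_dctcp.c, with guarded defines in case the headers do not provide
them) is an '.init' that picks a fallback cc when the peer did not
negotiate ECN.  The outermost '.init' may still do this; only a nested
'.init' switching the cc of the same sk now gets -EBUSY, which is what
the selftest in the next patch counts:

/* sketch_init.bpf.c - illustrative only; struct_ops registration omitted */
#include <linux/bpf.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_tcp_helpers.h"

#ifndef SOL_TCP
#define SOL_TCP 6
#endif
#ifndef TCP_ECN_OK
#define TCP_ECN_OK 1	/* value as in net/tcp.h */
#endif

char _license[] SEC("license") = "GPL";

SEC("struct_ops/sketch_init")
void BPF_PROG(sketch_init, struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	char cubic[] = "cubic";

	if (!(tp->ecn_flags & TCP_ECN_OK))
		/* Allowed: this is the outermost '.init'.  If "cubic" were
		 * itself a bpf cc whose '.init' called
		 * bpf_setsockopt(TCP_CONGESTION) on this same sk, that
		 * nested call would now return -EBUSY.
		 */
		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
			       cubic, sizeof(cubic));
}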
From patchwork Thu Sep 22 22:56:49 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985920
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
Date: Thu, 22 Sep 2022 15:56:49 -0700
Message-ID: <20220922225649.3058534-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

This patch changes the bpf_dctcp test to ensure that the recurred
bpf_setsockopt(TCP_CONGESTION) returns -EBUSY.

Signed-off-by: Martin KaFai Lau
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  4 ++++
 tools/testing/selftests/bpf/progs/bpf_dctcp.c | 23 ++++++++++++++-----
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index 2959a52ced06..930d2e6b3d5e 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -290,6 +290,10 @@ static void test_dctcp_fallback(void)
 		goto done;
 	ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res");
 	ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res");
+	/* All setsockopt(TCP_CONGESTION) in the recurred
+	 * bpf_dctcp->init() should fail with -EBUSY.
+	 */
+	ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 4, "ebusy_cnt");
 
 	err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len);
 	if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
index 9573be6122be..0cab241c33b5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "bpf_tcp_helpers.h"
@@ -23,6 +24,7 @@ const char tcp_cdg[] = "cdg";
 char cc_res[TCP_CA_NAME_MAX];
 int tcp_cdg_res = 0;
 int stg_result = 0;
+int ebusy_cnt = 0;
 
 struct {
 	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
@@ -64,19 +66,28 @@ void BPF_PROG(dctcp_init, struct sock *sk)
 
 	if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) {
 		/* Switch to fallback */
-		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
-			       (void *)fallback, sizeof(fallback));
+		if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+				   (void *)fallback, sizeof(fallback)) == -EBUSY)
+			ebusy_cnt++;
+
 		/* Switch back to myself which the bpf trampoline
 		 * stopped calling dctcp_init recursively.
 		 */
-		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
-			       (void *)bpf_dctcp, sizeof(bpf_dctcp));
+		if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+				   (void *)bpf_dctcp, sizeof(bpf_dctcp)) == -EBUSY)
+			ebusy_cnt++;
+
 		/* Switch back to fallback */
-		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
-			       (void *)fallback, sizeof(fallback));
+		if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+				   (void *)fallback, sizeof(fallback)) == -EBUSY)
+			ebusy_cnt++;
+
 		/* Expecting -ENOTSUPP for tcp_cdg_res */
 		tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
 					     (void *)tcp_cdg, sizeof(tcp_cdg));
+		if (tcp_cdg_res == -EBUSY)
+			ebusy_cnt++;
+
 		bpf_getsockopt(sk, SOL_TCP, TCP_CONGESTION,
 			       (void *)cc_res, sizeof(cc_res));
 		return;