From patchwork Thu Sep 22 22:56:42 2022
X-Patchwork-Submitter: Martin KaFai Lau
X-Patchwork-Id: 12985921
X-Patchwork-Delegate: bpf@iogearbox.net
From: Martin KaFai Lau
To:
CC: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau
Subject: [PATCH bpf-next 4/5] bpf: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
Date: Thu, 22 Sep 2022 15:56:42 -0700
Message-ID: <20220922225642.3058176-1-kafai@fb.com>
In-Reply-To: <20220922225616.3054840-1-kafai@fb.com>
References: <20220922225616.3054840-1-kafai@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

When a bad bpf prog '.init' calls
bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:

.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
... => .init => bpf_setsockopt(tcp_cc).

This was previously prevented by the prog->active counter, but the
prog->active detection cannot be used in struct_ops, as explained in
an earlier patch of this set.
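For illustration, a bad '.init' of this shape would trigger the loop
above. This is a hypothetical sketch in bpf selftest style, not code
from this set: the cc name "bpf_bad_cc", the prog name, and the map
name are made up, and the mandatory ops (.ssthresh, .undo_cwnd,
.cong_avoid) are omitted for brevity:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	/* As defined in the selftests' bpf_tcp_helpers.h */
	#define SOL_TCP		6
	#define TCP_CONGESTION	13

	char _license[] SEC("license") = "GPL";

	SEC("struct_ops/bpf_bad_cc_init")
	void BPF_PROG(bpf_bad_cc_init, struct sock *sk)
	{
		/* Sets the cc back to itself: without the fix, this
		 * re-enters this same .init and loops.  With the fix,
		 * the nested .init's bpf_setsockopt() returns -EBUSY
		 * instead.
		 */
		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
			       "bpf_bad_cc", sizeof("bpf_bad_cc"));
	}

	SEC(".struct_ops")
	struct tcp_congestion_ops bad_cc = {
		.init = (void *)bpf_bad_cc_init,
		.name = "bpf_bad_cc",
	};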
In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in
order to break the loop. This is done by checking whether the previous
bpf_run_ctx has saved the same sk pointer in its bpf_cookie. Note that
this essentially means only the first '.init' can call
bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. when the
peer does not support ECN); the second '.init' cannot fall back to yet
another cc. This applies even when the second
bpf_setsockopt(TCP_CONGESTION) would not cause a loop.

Signed-off-by: Martin KaFai Lau
---
 include/linux/filter.h |  3 +++
 net/core/filter.c      |  4 ++--
 net/ipv4/bpf_tcp_ca.c  | 54 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 98e28126c24b..9942ecc68a45 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -911,6 +911,9 @@ int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
 bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
 void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
 
+int _bpf_setsockopt(struct sock *sk, int level, int optname,
+		    char *optval, int optlen);
+
 u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 #define __bpf_call_base_args \
 	((u64 (*)(u64, u64, u64, u64, u64, const struct bpf_insn *)) \
diff --git a/net/core/filter.c b/net/core/filter.c
index f4cea3ff994a..e56a1ebcf1bc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5244,8 +5244,8 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	return -EINVAL;
 }
 
-static int _bpf_setsockopt(struct sock *sk, int level, int optname,
-			   char *optval, int optlen)
+int _bpf_setsockopt(struct sock *sk, int level, int optname,
+		    char *optval, int optlen)
 {
 	if (sk_fullsock(sk))
 		sock_owned_by_me(sk);
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6da16ae6a962..a9f2cab5ffbc 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -144,6 +144,57 @@ static const struct bpf_func_proto bpf_tcp_send_ack_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_5(bpf_init_ops_setsockopt, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	struct bpf_tramp_run_ctx *run_ctx, *saved_run_ctx;
+	int ret;
+
+	if (optname != TCP_CONGESTION)
+		return _bpf_setsockopt(sk, level, optname, optval, optlen);
+
+	run_ctx = (struct bpf_tramp_run_ctx *)current->bpf_ctx;
+	if (unlikely(run_ctx->saved_run_ctx &&
+		     run_ctx->saved_run_ctx->type == BPF_RUN_CTX_TYPE_STRUCT_OPS)) {
+		saved_run_ctx = (struct bpf_tramp_run_ctx *)run_ctx->saved_run_ctx;
+		/* It stops this looping:
+		 *
+		 * .init => bpf_setsockopt(tcp_cc) => .init =>
+		 * bpf_setsockopt(tcp_cc) => .init => ....
+		 *
+		 * The second bpf_setsockopt(tcp_cc) is not allowed
+		 * in order to break the loop when both .init
+		 * are the same bpf prog.
+		 *
+		 * This applies even when the second
+		 * bpf_setsockopt(tcp_cc) does not cause a loop.
+		 * This means only the first '.init' can call
+		 * bpf_setsockopt(TCP_CONGESTION) to pick a fallback
+		 * cc (eg. peer does not support ECN) and the second
+		 * '.init' cannot fall back to another cc.
+		 */
+		if (saved_run_ctx->bpf_cookie == (uintptr_t)sk)
+			return -EBUSY;
+	}
+
+	run_ctx->bpf_cookie = (uintptr_t)sk;
+	ret = _bpf_setsockopt(sk, level, optname, optval, optlen);
+	run_ctx->bpf_cookie = 0;
+
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_init_ops_setsockopt_proto = {
+	.func		= bpf_init_ops_setsockopt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 static u32 prog_ops_moff(const struct bpf_prog *prog)
 {
 	const struct btf_member *m;
@@ -169,6 +220,9 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
 	case BPF_FUNC_setsockopt:
+		if (prog_ops_moff(prog) ==
+		    offsetof(struct tcp_congestion_ops, init))
+			return &bpf_init_ops_setsockopt_proto;
 		/* Does not allow release() to call setsockopt.
 		 * release() is called when the current bpf-tcp-cc
 		 * is retiring. It is not allowed to call