From patchwork Wed Mar 8 00:50:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165036 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04C45C678D5 for ; Wed, 8 Mar 2023 00:51:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229695AbjCHAvM (ORCPT ); Tue, 7 Mar 2023 19:51:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229692AbjCHAvL (ORCPT ); Tue, 7 Mar 2023 19:51:11 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C83DAA9DE2 for ; Tue, 7 Mar 2023 16:51:08 -0800 (PST) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32803NAP010424 for ; Tue, 7 Mar 2023 16:51:08 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=2o78ghm+GH4bSaX4Kxbpkr3zLnMk9Bs2vlvkmd5ifDw=; b=DdqLBrakxAdr2GXTXOHgK3pSqo4X9qAkJ2hYeFkMiQq2/vVMn57piPvMP1tnCzdNCrVf mVXkzF/QY/YeqGCW6tN8n/UOp7rMj4zwiKK1nNezhP8c2/ZOkhrYN3cqwFKWppMwfeUo 3Yl+kueNWvqMUlOdz5LNLRbwBrf6l+EO6qXy0i6Fq3DNmCPJoBe/1cNe1EgkOaTkQF/e pepqLr7HCGh7IJ8PQvbVl+EdQNH0Oc2LEoq141Ld3CBxnmIda6j+pZ11oVlTOLOueL7K a1fIdpPLRuQQUoyj9isgxDfBAyRlqikEc0I+Wx+sEf8gzOHtGuvyPlSG6ljuNm22jJE3 fA== Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6ffpr7hm-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:51:08 -0800 Received: from twshared21709.17.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:51:06 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 71DF76C92E14; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 1/8] bpf: Retire the struct_ops map kvalue->refcnt. Date: Tue, 7 Mar 2023 16:50:43 -0800 Message-ID: <20230308005050.255859-2-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: aC24bi73h7ecqDdORDIerxL0XPUrzhi1 X-Proofpoint-GUID: aC24bi73h7ecqDdORDIerxL0XPUrzhi1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net We have replaced kvalue-refcnt with synchronize_rcu() to wait for an RCU grace period. Maintenance of kvalue->refcnt was a complicated task, as we had to simultaneously keep track of two reference counts: one for the reference count of bpf_map. When the kvalue->refcnt reaches zero, we also have to reduce the reference count on bpf_map - yet these steps are not performed in an atomic manner and require us to be vigilant when managing them. By eliminating kvalue->refcnt, we can make our maintenance more straightforward as the refcount of bpf_map is now solely managed! To prevent the trampoline image of a struct_ops from being released while it is still in use, we wait for an RCU grace period. The setsockopt(TCP_CONGESTION, "...") command allows you to change your socket's congestion control algorithm and can result in releasing the old struct_ops implementation. Moreover, since this function is exposed through bpf_setsockopt(), it may be accessed by BPF programs as well. To ensure that the trampoline image belonging to struct_op can be safely called while its method is in use, struct_ops is safeguarded with rcu_read_lock(). Doing so prevents any destruction of the associated images before returning from a trampoline and requires us to wait for an RCU grace period. Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 2 ++ kernel/bpf/bpf_struct_ops.c | 60 ++++++++++++++++++------------------- kernel/bpf/syscall.c | 2 +- 3 files changed, 32 insertions(+), 32 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6792a7940e1e..50cfc2388cbc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -78,6 +78,7 @@ struct bpf_map_ops { struct bpf_map *(*map_alloc)(union bpf_attr *attr); void (*map_release)(struct bpf_map *map, struct file *map_file); void (*map_free)(struct bpf_map *map); + void (*map_free_rcu)(struct bpf_map *map); int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); void (*map_release_uref)(struct bpf_map *map); void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); @@ -1934,6 +1935,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd); struct bpf_map *__bpf_map_get(struct fd f); void bpf_map_inc(struct bpf_map *map); void bpf_map_inc_with_uref(struct bpf_map *map); +struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref); struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map); void bpf_map_put_with_uref(struct bpf_map *map); void bpf_map_put(struct bpf_map *map); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 38903fb52f98..9e097fcc9cf4 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -58,6 +58,8 @@ struct bpf_struct_ops_map { struct bpf_struct_ops_value kvalue; }; +static DEFINE_MUTEX(update_mutex); + #define VALUE_PREFIX "bpf_struct_ops_" #define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1) @@ -249,6 +251,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; struct bpf_struct_ops_value *uvalue, *kvalue; enum bpf_struct_ops_state state; + s64 refcnt; if (unlikely(*(u32 *)key != 0)) return -ENOENT; @@ -267,7 +270,9 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, uvalue = value; memcpy(uvalue, st_map->uvalue, map->value_size); uvalue->state = state; - refcount_set(&uvalue->refcnt, refcount_read(&kvalue->refcnt)); + + refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt); + refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0)); return 0; } @@ -491,7 +496,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } - refcount_set(&kvalue->refcnt, 1); bpf_map_inc(map); set_memory_rox((long)st_map->image, 1); @@ -536,8 +540,7 @@ static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) switch (prev_state) { case BPF_STRUCT_OPS_STATE_INUSE: st_map->st_ops->unreg(&st_map->kvalue.data); - if (refcount_dec_and_test(&st_map->kvalue.refcnt)) - bpf_map_put(map); + bpf_map_put(map); return 0; case BPF_STRUCT_OPS_STATE_TOBEFREE: return -EINPROGRESS; @@ -574,6 +577,19 @@ static void bpf_struct_ops_map_free(struct bpf_map *map) { struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; + /* The struct_ops's function may switch to another struct_ops. + * + * For example, bpf_tcp_cc_x->init() may switch to + * another tcp_cc_y by calling + * setsockopt(TCP_CONGESTION, "tcp_cc_y"). + * During the switch, bpf_struct_ops_put(tcp_cc_x) is called + * and its refcount may reach 0 which then free its + * trampoline image while tcp_cc_x is still running. + * + * Thus, a rcu grace period is needed here. + */ + synchronize_rcu(); + if (st_map->links) bpf_struct_ops_map_put_progs(st_map); bpf_map_area_free(st_map->links); @@ -676,41 +692,23 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = { bool bpf_struct_ops_get(const void *kdata) { struct bpf_struct_ops_value *kvalue; + struct bpf_struct_ops_map *st_map; + struct bpf_map *map; kvalue = container_of(kdata, struct bpf_struct_ops_value, data); + st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue); - return refcount_inc_not_zero(&kvalue->refcnt); -} - -static void bpf_struct_ops_put_rcu(struct rcu_head *head) -{ - struct bpf_struct_ops_map *st_map; - - st_map = container_of(head, struct bpf_struct_ops_map, rcu); - bpf_map_put(&st_map->map); + map = __bpf_map_inc_not_zero(&st_map->map, false); + return !IS_ERR(map); } void bpf_struct_ops_put(const void *kdata) { struct bpf_struct_ops_value *kvalue; + struct bpf_struct_ops_map *st_map; kvalue = container_of(kdata, struct bpf_struct_ops_value, data); - if (refcount_dec_and_test(&kvalue->refcnt)) { - struct bpf_struct_ops_map *st_map; - - st_map = container_of(kvalue, struct bpf_struct_ops_map, - kvalue); - /* The struct_ops's function may switch to another struct_ops. - * - * For example, bpf_tcp_cc_x->init() may switch to - * another tcp_cc_y by calling - * setsockopt(TCP_CONGESTION, "tcp_cc_y"). - * During the switch, bpf_struct_ops_put(tcp_cc_x) is called - * and its map->refcnt may reach 0 which then free its - * trampoline image while tcp_cc_x is still running. - * - * Thus, a rcu grace period is needed here. - */ - call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu); - } + st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue); + + bpf_map_put(&st_map->map); } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f406dfa13792..03273cddd6bd 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1288,7 +1288,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd) } /* map_idr_lock should have been held */ -static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) +struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) { int refold; From patchwork Wed Mar 8 00:50:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165032 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 016DDC678D5 for ; Wed, 8 Mar 2023 00:51:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229529AbjCHAvI (ORCPT ); Tue, 7 Mar 2023 19:51:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229477AbjCHAvI (ORCPT ); Tue, 7 Mar 2023 19:51:08 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06A735ADD2 for ; Tue, 7 Mar 2023 16:51:03 -0800 (PST) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32803ImT010351 for ; Tue, 7 Mar 2023 16:51:03 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=SLft0s/6wnnZRzpYBgm1TmBZ2QcvTaBoPyzfcaVJIbI=; b=RPfqIHjAplrUIXyN5PPivnfV7gmSZTTj5EEOgzXUp0Fzqr72RKWXvOHyc2kA8i9zX0iF fcbH7C4HegEraXHg+hBOcQwmZQMWI/PvEMLcAvoBj6ZMMWUtTinDy7lNBSJwzW+eOUzW 8xYxLq1hRaUoQ3I/KCdHAgfNOnfRdeeDvxGKY1ztibv0fSSvam7z6nQ4JTWi1knufVnD NS8Mu8leD/WOxnWJLU8XWbqdV8/kmw7b6i3z2pa9Q6CtrW9ZkU6OElwgnDDM+cQy21j7 qLyXWtCfi9nDUdR3vGkdsaGaRE449Y7JTZ1pQ6+x7/r7D6w8eU3/5tfLdnr6+p1hVXqk 8g== Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6ffpr7gv-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:51:03 -0800 Received: from twshared33736.38.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:51:02 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 801B16C92E16; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 2/8] net: Update an existing TCP congestion control algorithm. Date: Tue, 7 Mar 2023 16:50:44 -0800 Message-ID: <20230308005050.255859-3-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: rAp-kyGPWfhfQYTnMDbpfB5tOyRyZGyQ X-Proofpoint-GUID: rAp-kyGPWfhfQYTnMDbpfB5tOyRyZGyQ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This feature lets you immediately transition to another congestion control algorithm or implementation with the same name. Once a name is updated, new connections will apply this new algorithm. The 'validate' function pointer has been added to bpf_struct_ops as well, allowing us to validate a struct_ops without having to go through the registration process. Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 2 ++ include/net/tcp.h | 3 ++ net/bpf/bpf_dummy_struct_ops.c | 6 ++++ net/ipv4/bpf_tcp_ca.c | 14 +++++++-- net/ipv4/tcp_cong.c | 57 +++++++++++++++++++++++++++++----- 5 files changed, 73 insertions(+), 9 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 50cfc2388cbc..00b6e1a2edaf 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1512,6 +1512,8 @@ struct bpf_struct_ops { void *kdata, const void *udata); int (*reg)(void *kdata); void (*unreg)(void *kdata); + int (*update)(void *kdata, void *old_kdata); + int (*validate)(void *kdata); const struct btf_type *type; const struct btf_type *value_type; const char *name; diff --git a/include/net/tcp.h b/include/net/tcp.h index db9f828e9d1e..2abb755e6a3a 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1117,6 +1117,9 @@ struct tcp_congestion_ops { int tcp_register_congestion_control(struct tcp_congestion_ops *type); void tcp_unregister_congestion_control(struct tcp_congestion_ops *type); +int tcp_update_congestion_control(struct tcp_congestion_ops *type, + struct tcp_congestion_ops *old_type); +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca); void tcp_assign_congestion_control(struct sock *sk); void tcp_init_congestion_control(struct sock *sk); diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index ff4f89a2b02a..158f14e240d0 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -222,12 +222,18 @@ static void bpf_dummy_unreg(void *kdata) { } +static int bpf_dummy_update(void *kdata, void *old_kdata) +{ + return -EOPNOTSUPP; +} + struct bpf_struct_ops bpf_bpf_dummy_ops = { .verifier_ops = &bpf_dummy_verifier_ops, .init = bpf_dummy_init, .check_member = bpf_dummy_ops_check_member, .init_member = bpf_dummy_init_member, .reg = bpf_dummy_reg, + .update = bpf_dummy_update, .unreg = bpf_dummy_unreg, .name = "bpf_dummy_ops", }; diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 13fc0c185cd9..e8b27826283e 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -239,8 +239,6 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t, if (bpf_obj_name_cpy(tcp_ca->name, utcp_ca->name, sizeof(tcp_ca->name)) <= 0) return -EINVAL; - if (tcp_ca_find(utcp_ca->name)) - return -EEXIST; return 1; } @@ -266,13 +264,25 @@ static void bpf_tcp_ca_unreg(void *kdata) tcp_unregister_congestion_control(kdata); } +static int bpf_tcp_ca_update(void *kdata, void *old_kdata) +{ + return tcp_update_congestion_control(kdata, old_kdata); +} + +static int bpf_tcp_ca_validate(void *kdata) +{ + return tcp_validate_congestion_control(kdata); +} + struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, .unreg = bpf_tcp_ca_unreg, + .update = bpf_tcp_ca_update, .check_member = bpf_tcp_ca_check_member, .init_member = bpf_tcp_ca_init_member, .init = bpf_tcp_ca_init, + .validate = bpf_tcp_ca_validate, .name = "tcp_congestion_ops", }; diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index db8b4b488c31..24829390e495 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -75,14 +75,8 @@ struct tcp_congestion_ops *tcp_ca_find_key(u32 key) return NULL; } -/* - * Attach new congestion control algorithm to the list - * of available options. - */ -int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca) { - int ret = 0; - /* all algorithms must implement these */ if (!ca->ssthresh || !ca->undo_cwnd || !(ca->cong_avoid || ca->cong_control)) { @@ -90,6 +84,20 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca) return -EINVAL; } + return 0; +} + +/* Attach new congestion control algorithm to the list + * of available options. + */ +int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +{ + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); spin_lock(&tcp_cong_list_lock); @@ -130,6 +138,41 @@ void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca) } EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control); +/* Replace a registered old ca with a new one. + * + * The new ca must have the same name as the old one, that has been + * registered. + */ +int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca) +{ + struct tcp_congestion_ops *existing; + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); + + spin_lock(&tcp_cong_list_lock); + existing = tcp_ca_find_key(old_ca->key); + if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) { + pr_notice("%s not registered or non-unique key\n", + ca->name); + ret = -EINVAL; + } else if (existing != old_ca) { + pr_notice("invalid old congestion control algorithm to replace\n"); + ret = -EINVAL; + } else { + list_add_tail_rcu(&ca->list, &tcp_cong_list); + list_del_rcu(&existing->list); + pr_debug("%s updated\n", ca->name); + } + spin_unlock(&tcp_cong_list_lock); + + return ret; +} + u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca) { const struct tcp_congestion_ops *ca; From patchwork Wed Mar 8 00:50:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165034 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C71CC678D5 for ; Wed, 8 Mar 2023 00:51:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229477AbjCHAvL (ORCPT ); Tue, 7 Mar 2023 19:51:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229605AbjCHAvK (ORCPT ); Tue, 7 Mar 2023 19:51:10 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98F57A9DE6 for ; Tue, 7 Mar 2023 16:51:07 -0800 (PST) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32805C3U001603 for ; Tue, 7 Mar 2023 16:51:07 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=FmFC/fk4R/DY9zzfOK97UWbCCbj8c72kQ/I1CQbIrXE=; b=K59yPESZTMjZTycqy/ffEzrxzPVsU2ENTo8m5Ak4JJXp/9P1RztU3983MZXdY9SVl3oL TCHnb4kbbjZ4ZMNmmTQgeunaCLnS2vChFRJzdbSzqYCbo4ZZa9hBnpgqoMPkoPkYI/l1 iu7Zz8qpEzl/EdnD1VcHfjUtCRyrcvYUnkz7dzanPzSWi4ZELavpO+tS1pWXkMhKgJZs osBK/SHqJ63GE+pL3OvRCUhjEusNybAvXWLrjNJVWETDX2VTO6BZken7xs4QsoigyBK1 XzlAQ3NiOPnvIYBKgnwCjtngdaoSdHP4+XQKOlT4JY24v71C80VfL1IBOOkQi86YLIOV YA== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6fgp071f-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:51:07 -0800 Received: from twshared52565.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:51:05 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 8A4496C92E18; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 3/8] bpf: Create links for BPF struct_ops maps. Date: Tue, 7 Mar 2023 16:50:45 -0800 Message-ID: <20230308005050.255859-4-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: oS69KoqU65lUhApCNs69NE7QLodRCJ_I X-Proofpoint-ORIG-GUID: oS69KoqU65lUhApCNs69NE7QLodRCJ_I X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net BPF struct_ops maps are employed directly to register TCP Congestion Control algorithms. Unlike other BPF programs that terminate when their links gone. The link of a BPF struct_ops map provides a uniform experience akin to other types of BPF programs. bpf_links are responsible for registering their associated struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag set to create a bpf_link, while a structs without this flag behaves in the same manner as before and is registered upon updating its value. The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it used to craft the links for BPF struct_ops programs, but also to create links for BPF struct_ops them-self. Since the links of BPF struct_ops programs are only used to create trampolines internally, they are never seen in other contexts. Thus, they can be reused for struct_ops themself. To maintain a reference to the map supporting this link, we add bpf_struct_ops_link as an additional type. The pointer of the map is RCU and won't be necessary until later in the patchset. Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 11 +++ include/uapi/linux/bpf.h | 12 +++- kernel/bpf/bpf_struct_ops.c | 124 +++++++++++++++++++++++++++++++-- kernel/bpf/syscall.c | 23 +++--- tools/include/uapi/linux/bpf.h | 12 +++- 5 files changed, 168 insertions(+), 14 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 00b6e1a2edaf..afca6c526fe4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1548,6 +1548,7 @@ static inline void bpf_module_put(const void *data, struct module *owner) else module_put(owner); } +int bpf_struct_ops_link_create(union bpf_attr *attr); #ifdef CONFIG_NET /* Define it here to avoid the use of forward declaration */ @@ -1588,6 +1589,11 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, { return -EINVAL; } +static inline int bpf_struct_ops_link_create(union bpf_attr *attr) +{ + return -EOPNOTSUPP; +} + #endif #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM) @@ -2379,6 +2385,11 @@ static inline void bpf_link_put(struct bpf_link *link) { } +static inline int bpf_struct_ops_link_create(union bpf_attr *attr) +{ + return -EOPNOTSUPP; +} + static inline int bpf_obj_get_user(const char __user *pathname, int flags) { return -EOPNOTSUPP; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 976b194eb775..f9fc7b8af3c4 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1033,6 +1033,7 @@ enum bpf_attach_type { BPF_PERF_EVENT, BPF_TRACE_KPROBE_MULTI, BPF_LSM_CGROUP, + BPF_STRUCT_OPS, __MAX_BPF_ATTACH_TYPE }; @@ -1266,6 +1267,9 @@ enum { /* Create a map that is suitable to be an inner map with dynamic max entries */ BPF_F_INNER_MAP = (1U << 12), + +/* Create a map that will be registered/unregesitered by the backed bpf_link */ + BPF_F_LINK = (1U << 13), }; /* Flags for BPF_PROG_QUERY. */ @@ -1507,7 +1511,10 @@ union bpf_attr { } task_fd_query; struct { /* struct used by BPF_LINK_CREATE command */ - __u32 prog_fd; /* eBPF program to attach */ + union { + __u32 prog_fd; /* eBPF program to attach */ + __u32 map_fd; /* struct_ops to attach */ + }; union { __u32 target_fd; /* object to attach to */ __u32 target_ifindex; /* target ifindex */ @@ -6379,6 +6386,9 @@ struct bpf_link_info { struct { __u32 ifindex; } xdp; + struct { + __u32 map_id; + } struct_ops; }; } __attribute__((aligned(8))); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 9e097fcc9cf4..5a7e86cf67b5 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -16,6 +16,7 @@ enum bpf_struct_ops_state { BPF_STRUCT_OPS_STATE_INIT, BPF_STRUCT_OPS_STATE_INUSE, BPF_STRUCT_OPS_STATE_TOBEFREE, + BPF_STRUCT_OPS_STATE_READY, }; #define BPF_STRUCT_OPS_COMMON_VALUE \ @@ -58,6 +59,11 @@ struct bpf_struct_ops_map { struct bpf_struct_ops_value kvalue; }; +struct bpf_struct_ops_link { + struct bpf_link link; + struct bpf_map __rcu *map; +}; + static DEFINE_MUTEX(update_mutex); #define VALUE_PREFIX "bpf_struct_ops_" @@ -496,11 +502,24 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } - bpf_map_inc(map); - set_memory_rox((long)st_map->image, 1); + if (st_map->map.map_flags & BPF_F_LINK) { + if (st_ops->validate) { + err = st_ops->validate(kdata); + if (err) + goto unlock; + } + /* Let bpf_link handle registration & unregistration. + * + * Pair with smp_load_acquire() during lookup_elem(). + */ + smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY); + goto unlock; + } + err = st_ops->reg(kdata); if (likely(!err)) { + bpf_map_inc(map); /* Pair with smp_load_acquire() during lookup_elem(). * It ensures the above udata updates (e.g. prog->aux->id) * can be seen once BPF_STRUCT_OPS_STATE_INUSE is set. @@ -516,7 +535,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, */ set_memory_nx((long)st_map->image, 1); set_memory_rw((long)st_map->image, 1); - bpf_map_put(map); reset_unlock: bpf_struct_ops_map_put_progs(st_map); @@ -534,6 +552,9 @@ static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) struct bpf_struct_ops_map *st_map; st_map = (struct bpf_struct_ops_map *)map; + if (st_map->map.map_flags & BPF_F_LINK) + return -EOPNOTSUPP; + prev_state = cmpxchg(&st_map->kvalue.state, BPF_STRUCT_OPS_STATE_INUSE, BPF_STRUCT_OPS_STATE_TOBEFREE); @@ -601,7 +622,7 @@ static void bpf_struct_ops_map_free(struct bpf_map *map) static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) { if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 || - attr->map_flags || !attr->btf_vmlinux_value_type_id) + (attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id) return -EINVAL; return 0; } @@ -712,3 +733,98 @@ void bpf_struct_ops_put(const void *kdata) bpf_map_put(&st_map->map); } + +static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_struct_ops_map *st_map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + st_map = (struct bpf_struct_ops_map *)st_link->map; + st_map->st_ops->unreg(&st_map->kvalue.data); + bpf_map_put(st_link->map); + kfree(st_link); +} + +static void bpf_struct_ops_map_link_show_fdinfo(const struct bpf_link *link, + struct seq_file *seq) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_map *map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + rcu_read_lock(); + map = rcu_dereference(st_link->map); + if (map) + seq_printf(seq, "map_id:\t%d\n", map->id); + rcu_read_unlock(); +} + +static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link, + struct bpf_link_info *info) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_map *map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + rcu_read_lock(); + map = rcu_dereference(st_link->map); + if (map) + info->struct_ops.map_id = map->id; + rcu_read_unlock(); + return 0; +} + +static const struct bpf_link_ops bpf_struct_ops_map_lops = { + .dealloc = bpf_struct_ops_map_link_dealloc, + .show_fdinfo = bpf_struct_ops_map_link_show_fdinfo, + .fill_link_info = bpf_struct_ops_map_link_fill_link_info, +}; + +int bpf_struct_ops_link_create(union bpf_attr *attr) +{ + struct bpf_struct_ops_link *link = NULL; + struct bpf_link_primer link_primer; + struct bpf_struct_ops_map *st_map; + struct bpf_map *map; + int err; + + map = bpf_map_get(attr->link_create.map_fd); + if (!map) + return -EINVAL; + + st_map = (struct bpf_struct_ops_map *)map; + + if (map->map_type != BPF_MAP_TYPE_STRUCT_OPS || !(map->map_flags & BPF_F_LINK) || + /* Pair with smp_store_release() during map_update */ + smp_load_acquire(&st_map->kvalue.state) != BPF_STRUCT_OPS_STATE_READY) { + err = -EINVAL; + goto err_out; + } + + link = kzalloc(sizeof(*link), GFP_USER); + if (!link) { + err = -ENOMEM; + goto err_out; + } + bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_map_lops, NULL); + RCU_INIT_POINTER(link->map, map); + + err = bpf_link_prime(&link->link, &link_primer); + if (err) + goto err_out; + + err = st_map->st_ops->reg(st_map->kvalue.data); + if (err) { + bpf_link_cleanup(&link_primer); + goto err_out; + } + + return bpf_link_settle(&link_primer); + +err_out: + bpf_map_put(map); + kfree(link); + return err; +} + diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 03273cddd6bd..3a4503987a48 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2806,16 +2806,19 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) const struct bpf_prog *prog = link->prog; char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; - bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); seq_printf(m, "link_type:\t%s\n" - "link_id:\t%u\n" - "prog_tag:\t%s\n" - "prog_id:\t%u\n", + "link_id:\t%u\n", bpf_link_type_strs[link->type], - link->id, - prog_tag, - prog->aux->id); + link->id); + if (prog) { + bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); + seq_printf(m, + "prog_tag:\t%s\n" + "prog_id:\t%u\n", + prog_tag, + prog->aux->id); + } if (link->ops->show_fdinfo) link->ops->show_fdinfo(link, m); } @@ -4290,7 +4293,8 @@ static int bpf_link_get_info_by_fd(struct file *file, info.type = link->type; info.id = link->id; - info.prog_id = link->prog->aux->id; + if (link->prog) + info.prog_id = link->prog->aux->id; if (link->ops->fill_link_info) { err = link->ops->fill_link_info(link, &info); @@ -4553,6 +4557,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) if (CHECK_ATTR(BPF_LINK_CREATE)) return -EINVAL; + if (attr->link_create.attach_type == BPF_STRUCT_OPS) + return bpf_struct_ops_link_create(attr); + prog = bpf_prog_get(attr->link_create.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 976b194eb775..051b85525302 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1033,6 +1033,7 @@ enum bpf_attach_type { BPF_PERF_EVENT, BPF_TRACE_KPROBE_MULTI, BPF_LSM_CGROUP, + BPF_STRUCT_OPS, __MAX_BPF_ATTACH_TYPE }; @@ -1266,6 +1267,9 @@ enum { /* Create a map that is suitable to be an inner map with dynamic max entries */ BPF_F_INNER_MAP = (1U << 12), + +/* Create a map that will be registered/unregesitered by the backed bpf_link */ + BPF_F_LINK = (1U << 13), }; /* Flags for BPF_PROG_QUERY. */ @@ -1507,7 +1511,10 @@ union bpf_attr { } task_fd_query; struct { /* struct used by BPF_LINK_CREATE command */ - __u32 prog_fd; /* eBPF program to attach */ + union { + __u32 prog_fd; /* eBPF program to attach */ + __u32 map_fd; /* eBPF struct_ops to attach */ + }; union { __u32 target_fd; /* object to attach to */ __u32 target_ifindex; /* target ifindex */ @@ -6379,6 +6386,9 @@ struct bpf_link_info { struct { __u32 ifindex; } xdp; + struct { + __u32 map_id; + } struct_ops; }; } __attribute__((aligned(8))); From patchwork Wed Mar 8 00:50:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165035 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B90AAC6FA99 for ; Wed, 8 Mar 2023 00:51:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229605AbjCHAvL (ORCPT ); Tue, 7 Mar 2023 19:51:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229574AbjCHAvK (ORCPT ); Tue, 7 Mar 2023 19:51:10 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 315D8A9DEE for ; Tue, 7 Mar 2023 16:51:08 -0800 (PST) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32805C3V001603 for ; Tue, 7 Mar 2023 16:51:08 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=Vmh4sckhr1JnJXdL7JYiz7yy6wpia1wiGhnKjRG31ng=; b=WutrvywNHIZhmGl30vih/2O41EIDYxjArhehVUTXSS/lChYtcK+12EyCCUyJRMu2lF1o oDbC6OQxqebs7lXZRxnJjwbT6Ing3RQC6jXspVGOme8JbQG29HGj9Nu/foxHU0hmcWkr vlew+Ac1Emy4RQvvgRtbhnh0K+Rdmdz2ZXCg82ffLnSwQZ0f90lEhNLxE3AokfGuZyCK 1ZPTYxMJl6pJcomc8I06i33EpfTI31vnk3lLcw7cQPeBDXHwz+R3dbCqOpPtDRQdXUXD rYCR7E6+zw1Mu0nS84hPhwtmBoohz2bHltHZ3gYda/fi3qwfo3Oe7xe1OGLhlqeDmF0N GA== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6fgp071f-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:51:07 -0800 Received: from twshared52565.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:51:05 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 9066B6C92E1A; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 4/8] libbpf: Create a bpf_link in bpf_map__attach_struct_ops(). Date: Tue, 7 Mar 2023 16:50:46 -0800 Message-ID: <20230308005050.255859-5-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: iTaxcSRCIo8CzB8KpkU-LXJ_VmZTBjoG X-Proofpoint-ORIG-GUID: iTaxcSRCIo8CzB8KpkU-LXJ_VmZTBjoG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net bpf_map__attach_struct_ops() was creating a dummy bpf_link as a placeholder, but now it is constructing an authentic one by calling bpf_link_create() if the map has the BPF_F_LINK flag. You can flag a struct_ops map with BPF_F_LINK by calling bpf_map__set_map_flags(). Signed-off-by: Kui-Feng Lee --- tools/lib/bpf/libbpf.c | 84 +++++++++++++++++++++++++++++++----------- 1 file changed, 62 insertions(+), 22 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index a557718401e4..f70b55c0f40e 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -116,6 +116,7 @@ static const char * const attach_type_name[] = { [BPF_SK_REUSEPORT_SELECT_OR_MIGRATE] = "sk_reuseport_select_or_migrate", [BPF_PERF_EVENT] = "perf_event", [BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi", + [BPF_STRUCT_OPS] = "struct_ops", }; static const char * const link_type_name[] = { @@ -7677,6 +7678,26 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, return 0; } +static void bpf_map_prepare_vdata(const struct bpf_map *map) +{ + struct bpf_struct_ops *st_ops; + __u32 i; + + st_ops = map->st_ops; + for (i = 0; i < btf_vlen(st_ops->type); i++) { + struct bpf_program *prog = st_ops->progs[i]; + void *kern_data; + int prog_fd; + + if (!prog) + continue; + + prog_fd = bpf_program__fd(prog); + kern_data = st_ops->kern_vdata + st_ops->kern_func_off[i]; + *(unsigned long *)kern_data = prog_fd; + } +} + static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const char *target_btf_path) { int err, i; @@ -7728,6 +7749,10 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch btf__free(obj->btf_vmlinux); obj->btf_vmlinux = NULL; + for (i = 0; i < obj->nr_maps; i++) + if (bpf_map__is_struct_ops(&obj->maps[i])) + bpf_map_prepare_vdata(&obj->maps[i]); + obj->loaded = true; /* doesn't matter if successfully or not */ if (err) @@ -11566,22 +11591,34 @@ struct bpf_link *bpf_program__attach(const struct bpf_program *prog) return link; } +struct bpf_link_struct_ops { + struct bpf_link link; + int map_fd; +}; + static int bpf_link__detach_struct_ops(struct bpf_link *link) { + struct bpf_link_struct_ops *st_link; __u32 zero = 0; - if (bpf_map_delete_elem(link->fd, &zero)) - return -errno; + st_link = container_of(link, struct bpf_link_struct_ops, link); - return 0; + if (st_link->map_fd < 0) { + /* Fake bpf_link */ + if (bpf_map_delete_elem(link->fd, &zero)) + return -errno; + return 0; + } + + /* Doesn't support detaching. */ + return -EOPNOTSUPP; } struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) { - struct bpf_struct_ops *st_ops; - struct bpf_link *link; - __u32 i, zero = 0; - int err; + struct bpf_link_struct_ops *link; + __u32 zero = 0; + int err, fd; if (!bpf_map__is_struct_ops(map) || map->fd == -1) return libbpf_err_ptr(-EINVAL); @@ -11590,31 +11627,34 @@ struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) if (!link) return libbpf_err_ptr(-EINVAL); - st_ops = map->st_ops; - for (i = 0; i < btf_vlen(st_ops->type); i++) { - struct bpf_program *prog = st_ops->progs[i]; - void *kern_data; - int prog_fd; + /* kern_vdata should be prepared during the loading phase. */ + err = bpf_map_update_elem(map->fd, &zero, map->st_ops->kern_vdata, 0); + if (err) { + err = -errno; + free(link); + return libbpf_err_ptr(err); + } - if (!prog) - continue; - prog_fd = bpf_program__fd(prog); - kern_data = st_ops->kern_vdata + st_ops->kern_func_off[i]; - *(unsigned long *)kern_data = prog_fd; + if (!(map->def.map_flags & BPF_F_LINK)) { + /* Fake bpf_link */ + link->link.fd = map->fd; + link->map_fd = -1; + link->link.detach = bpf_link__detach_struct_ops; + return &link->link; } - err = bpf_map_update_elem(map->fd, &zero, st_ops->kern_vdata, 0); - if (err) { + fd = bpf_link_create(map->fd, -1, BPF_STRUCT_OPS, NULL); + if (fd < 0) { err = -errno; free(link); return libbpf_err_ptr(err); } - link->detach = bpf_link__detach_struct_ops; - link->fd = map->fd; + link->link.fd = fd; + link->map_fd = map->fd; - return link; + return &link->link; } typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(struct perf_event_header *hdr, From patchwork Wed Mar 8 00:50:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165042 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60B02C678D5 for ; Wed, 8 Mar 2023 00:53:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229706AbjCHAxq (ORCPT ); Tue, 7 Mar 2023 19:53:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229732AbjCHAxp (ORCPT ); Tue, 7 Mar 2023 19:53:45 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F7D1984C5 for ; Tue, 7 Mar 2023 16:53:43 -0800 (PST) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32800e0e023549 for ; Tue, 7 Mar 2023 16:53:43 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=y19padtCiXBhWeTHeZaIQKUnIyNQqGgu8ga+f3l70xY=; b=iB6kwo7eje9fkWXPcdcMIevyKpn/ZkBlTxkqpkKiGh4F9sMSPBvI5BxWAuz/+jGSrbrA khD/3uo7GZmXjJA4Qks9krmlZ4NONgQ4RgSWD+KJEUdt7daJcsivoo5hvOA8OZjdDUty rvxmZzCZzpn4SaZRAYRAWF4sbr+nD9jrgkguaqi1A2htZWAdVwOQMZlyX4QK3Vux0OsW OPted16Faem880ay6yU5WDsdsrWNKu4cuRmJKZzHVM4M06pqoCO99h39lgVLQtHPWyg7 LelvZRGCjSX1FHkG0tlBp5rrFSj3PYYW3/gvdRCqhlp8oftrn5mk4Nu/pMusWRGdWHzt 0Q== Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6fekg8pq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:53:42 -0800 Received: from twshared33736.38.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:53:41 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id A0A716C92E1C; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 5/8] bpf: Update the struct_ops of a bpf_link. Date: Tue, 7 Mar 2023 16:50:47 -0800 Message-ID: <20230308005050.255859-6-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 5Le3zePsQguySn2_aY8JgE290AB9mWz3 X-Proofpoint-ORIG-GUID: 5Le3zePsQguySn2_aY8JgE290AB9mWz3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net By improving the BPF_LINK_UPDATE command of bpf(), it should allow you to conveniently switch between different struct_ops on a single bpf_link. This would enable smoother transitions from one struct_ops to another. The struct_ops maps passing along with BPF_LINK_UPDATE should have the BPF_F_LINK flag. Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 8 ++++-- kernel/bpf/bpf_struct_ops.c | 46 ++++++++++++++++++++++++++++++++++ kernel/bpf/syscall.c | 43 ++++++++++++++++++++++++++++--- tools/include/uapi/linux/bpf.h | 7 +++++- 5 files changed, 98 insertions(+), 7 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index afca6c526fe4..29d555a82bad 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1470,6 +1470,7 @@ struct bpf_link_ops { void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq); int (*fill_link_info)(const struct bpf_link *link, struct bpf_link_info *info); + int (*update_map)(struct bpf_link *link, struct bpf_map *new_map); }; struct bpf_tramp_link { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f9fc7b8af3c4..edef9cf7d596 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1555,8 +1555,12 @@ union bpf_attr { struct { /* struct used by BPF_LINK_UPDATE command */ __u32 link_fd; /* link fd */ - /* new program fd to update link with */ - __u32 new_prog_fd; + union { + /* new program fd to update link with */ + __u32 new_prog_fd; + /* new struct_ops map fd to update link with */ + __u32 new_map_fd; + }; __u32 flags; /* extra flags */ /* expected link's program fd; is specified only if * BPF_F_REPLACE flag is set in flags */ diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 5a7e86cf67b5..79e663869e51 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -775,10 +775,56 @@ static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link, return 0; } +static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map *new_map) +{ + struct bpf_struct_ops_value *kvalue; + struct bpf_struct_ops_map *st_map, *old_st_map; + struct bpf_struct_ops_link *st_link; + struct bpf_map *old_map; + int err = 0; + + if (new_map->map_type != BPF_MAP_TYPE_STRUCT_OPS || + !(new_map->map_flags & BPF_F_LINK)) + return -EINVAL; + + mutex_lock(&update_mutex); + + st_link = container_of(link, struct bpf_struct_ops_link, link); + + /* The new and old struct_ops must be the same type. */ + st_map = container_of(new_map, struct bpf_struct_ops_map, map); + + old_map = st_link->map; + old_st_map = container_of(old_map, struct bpf_struct_ops_map, map); + if (st_map->st_ops != old_st_map->st_ops || + /* Pair with smp_store_release() during map_update */ + smp_load_acquire(&st_map->kvalue.state) != BPF_STRUCT_OPS_STATE_READY) { + err = -EINVAL; + goto err_out; + } + + kvalue = &st_map->kvalue; + + err = st_map->st_ops->update(kvalue->data, old_st_map->kvalue.data); + if (err) + goto err_out; + + bpf_map_inc(new_map); + rcu_assign_pointer(st_link->map, new_map); + + bpf_map_put(old_map); + +err_out: + mutex_unlock(&update_mutex); + + return err; +} + static const struct bpf_link_ops bpf_struct_ops_map_lops = { .dealloc = bpf_struct_ops_map_link_dealloc, .show_fdinfo = bpf_struct_ops_map_link_show_fdinfo, .fill_link_info = bpf_struct_ops_map_link_fill_link_info, + .update_map = bpf_struct_ops_map_link_update, }; int bpf_struct_ops_link_create(union bpf_attr *attr) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 3a4503987a48..c087dd2e2c08 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4658,6 +4658,30 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) return ret; } +static int link_update_map(struct bpf_link *link, union bpf_attr *attr) +{ + struct bpf_map *new_map; + int ret = 0; + + new_map = bpf_map_get(attr->link_update.new_map_fd); + if (IS_ERR(new_map)) + return -EINVAL; + + if (new_map->map_type != BPF_MAP_TYPE_STRUCT_OPS) { + ret = -EINVAL; + goto out_put_map; + } + + if (link->ops->update_map) + ret = link->ops->update_map(link, new_map); + else + ret = -EINVAL; + +out_put_map: + bpf_map_put(new_map); + return ret; +} + #define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd static int link_update(union bpf_attr *attr) @@ -4670,14 +4694,25 @@ static int link_update(union bpf_attr *attr) if (CHECK_ATTR(BPF_LINK_UPDATE)) return -EINVAL; - flags = attr->link_update.flags; - if (flags & ~BPF_F_REPLACE) - return -EINVAL; - link = bpf_link_get_from_fd(attr->link_update.link_fd); if (IS_ERR(link)) return PTR_ERR(link); + flags = attr->link_update.flags; + + if (link->ops->update_map) { + if (flags) /* always replace the existing one */ + ret = -EINVAL; + else + ret = link_update_map(link, attr); + goto out_put_link; + } + + if (flags & ~BPF_F_REPLACE) { + ret = -EINVAL; + goto out_put_link; + } + new_prog = bpf_prog_get(attr->link_update.new_prog_fd); if (IS_ERR(new_prog)) { ret = PTR_ERR(new_prog); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 051b85525302..cfd1d2a9898d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1556,7 +1556,12 @@ union bpf_attr { struct { /* struct used by BPF_LINK_UPDATE command */ __u32 link_fd; /* link fd */ /* new program fd to update link with */ - __u32 new_prog_fd; + union { + /* new program fd to update link with */ + __u32 new_prog_fd; + /* new struct_ops map fd to update link with */ + __u32 new_map_fd; + }; __u32 flags; /* extra flags */ /* expected link's program fd; is specified only if * BPF_F_REPLACE flag is set in flags */ From patchwork Wed Mar 8 00:50:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165041 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFA23C678D5 for ; Wed, 8 Mar 2023 00:53:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229760AbjCHAxm (ORCPT ); Tue, 7 Mar 2023 19:53:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229732AbjCHAxm (ORCPT ); Tue, 7 Mar 2023 19:53:42 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C9BB99245 for ; Tue, 7 Mar 2023 16:53:41 -0800 (PST) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32803UuC006025 for ; Tue, 7 Mar 2023 16:53:40 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=xGyFyd+lQUC+ifSexXOhvgCJY1riQKqULnjP5jvfkeY=; b=cSSiHGpn2hClql9nz4tm7lfs82Kd8xgCZrRnHepK6LJnAQlFV5bIgFMSCOl4aVYpi0Fg xvTj4xG4KXzjTqNblzMQNu0qsaFiq2IAZTIQycRP2F9KGeEekNLdeb/9CyqJJUuZZuw7 h4LrCXNfuF5523xJRbymiokQct8GrrcztmYYUIIx65L6R+x6x2dcsagVZiKJFiAQo8Ha KSiKU4K9tVN/iqxJ6XS30s3KkNMq5EdvjKL4KGUKVVIquQAB93iChcFshqSRWReSi5/c r8oRAvKCsAKe/LSoU0MN0NUlWgAE42s/VdeK+2kYIpwidLX63FCxuTqBVsZ2VIgWAGDl 2w== Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6fft87v5-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:53:40 -0800 Received: from twshared24004.14.frc2.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:53:39 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id A6C6C6C92E1E; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 6/8] libbpf: Update a bpf_link with another struct_ops. Date: Tue, 7 Mar 2023 16:50:48 -0800 Message-ID: <20230308005050.255859-7-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: ckTmwa2L7ajxZy-SmLOEpTugHT2I4JNA X-Proofpoint-ORIG-GUID: ckTmwa2L7ajxZy-SmLOEpTugHT2I4JNA X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Introduce bpf_link__update_struct_ops(), which will allow you to effortlessly transition the struct_ops map of any given bpf_link into an alternative. Signed-off-by: Kui-Feng Lee --- tools/lib/bpf/libbpf.c | 36 ++++++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf.h | 1 + tools/lib/bpf/libbpf.map | 1 + 3 files changed, 38 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index f70b55c0f40e..0406d0e00e1f 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -11657,6 +11657,42 @@ struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) return &link->link; } +/* + * Swap the back struct_ops of a link with a new struct_ops map. + */ +int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map) +{ + struct bpf_link_struct_ops *st_ops_link; + __u32 zero = 0; + int err, fd; + + if (!bpf_map__is_struct_ops(map) || map->fd == -1) + return -EINVAL; + + st_ops_link = container_of(link, struct bpf_link_struct_ops, link); + /* Ensure the type of a link is correct */ + if (st_ops_link->map_fd < 0) + return -EINVAL; + + err = bpf_map_update_elem(map->fd, &zero, map->st_ops->kern_vdata, 0); + if (err && errno != EBUSY) { + err = -errno; + free(link); + return err; + } + + fd = bpf_link_update(link->fd, map->fd, NULL); + if (fd < 0) { + err = -errno; + free(link); + return err; + } + + st_ops_link->map_fd = map->fd; + + return 0; +} + typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(struct perf_event_header *hdr, void *private_data); diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index db4992a036f8..1615e55e2e79 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -719,6 +719,7 @@ bpf_program__attach_freplace(const struct bpf_program *prog, struct bpf_map; LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map); +LIBBPF_API int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map); struct bpf_iter_attach_opts { size_t sz; /* size of this struct for forward/backward compatibility */ diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 50dde1f6521e..cc05be376257 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -387,6 +387,7 @@ LIBBPF_1.2.0 { global: bpf_btf_get_info_by_fd; bpf_link_get_info_by_fd; + bpf_link__update_map; bpf_map_get_info_by_fd; bpf_prog_get_info_by_fd; } LIBBPF_1.1.0; From patchwork Wed Mar 8 00:50:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165037 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88A32C678D5 for ; Wed, 8 Mar 2023 00:51:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229574AbjCHAvR (ORCPT ); Tue, 7 Mar 2023 19:51:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229917AbjCHAvQ (ORCPT ); Tue, 7 Mar 2023 19:51:16 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E94EA9094 for ; Tue, 7 Mar 2023 16:51:14 -0800 (PST) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32803Tds005982 for ; Tue, 7 Mar 2023 16:51:14 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=xBtLHRi/fE39rmtAIXz9SbXkIFMqtXCh5jvOhDGCg74=; b=fBvMUziRSQUJJ6Oht6OXijt72SKkOaWnqYnGLLaKGBzP+Thg1bJgvykMEd86bhLZsQx5 x6V/dRNK3cey9qbBxIMDY9x2ess1kY/znlX7kh3PVGPg+EhihCY/mLKf0Jn9x7YtVgtp B+mZfkiJpmgQcAvzcnvfeeziE6pmRNGgUMCQk20CnXH/fT+dIwtCiZuEd+2b4GtXfUFX 1f/qlFEPWZ/ysP1348OcuElRCg0MZgePAMklKZNRHul6hJuDFZA75TOlyztSwjNUWH2R o7OcP8rNOXHV6LxnaLivAsKsIcen93HoEiQQRrR4nniWfAgqS9uRsZYflI86I3lm593H Ew== Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6fft87a2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:51:14 -0800 Received: from twshared19568.39.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:51:13 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id B50716C92E20; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 7/8] libbpf: Use .struct_ops.link section to indicate a struct_ops with a link. Date: Tue, 7 Mar 2023 16:50:49 -0800 Message-ID: <20230308005050.255859-8-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: wBBWptg1MlvesFQ9TbAecinu8z78T9D_ X-Proofpoint-ORIG-GUID: wBBWptg1MlvesFQ9TbAecinu8z78T9D_ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Flags a struct_ops is to back a bpf_link by putting it to the ".struct_ops.link" section. Once it is flagged, the created struct_ops can be used to create a bpf_link or update a bpf_link that has been backed by another struct_ops. Signed-off-by: Kui-Feng Lee --- tools/lib/bpf/libbpf.c | 64 +++++++++++++++++++++++++++++++++--------- 1 file changed, 50 insertions(+), 14 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 0406d0e00e1f..589eea158d95 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -468,6 +468,7 @@ struct bpf_struct_ops { #define KCONFIG_SEC ".kconfig" #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" +#define STRUCT_OPS_LINK_SEC ".struct_ops.link" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -597,6 +598,7 @@ struct elf_state { Elf64_Ehdr *ehdr; Elf_Data *symbols; Elf_Data *st_ops_data; + Elf_Data *st_ops_link_data; size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct elf_sec_desc *secs; @@ -606,6 +608,7 @@ struct elf_state { int text_shndx; int symbols_shndx; int st_ops_shndx; + int st_ops_link_shndx; }; struct usdt_manager; @@ -1119,7 +1122,7 @@ static int bpf_object__init_kern_struct_ops_maps(struct bpf_object *obj) return 0; } -static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) +static int bpf_object__init_struct_ops_maps_link(struct bpf_object *obj, bool link) { const struct btf_type *type, *datasec; const struct btf_var_secinfo *vsi; @@ -1127,18 +1130,33 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) const char *tname, *var_name; __s32 type_id, datasec_id; const struct btf *btf; + const char *sec_name; struct bpf_map *map; - __u32 i; + __u32 i, map_flags; + Elf_Data *data; + int shndx; - if (obj->efile.st_ops_shndx == -1) + if (link) { + sec_name = STRUCT_OPS_LINK_SEC; + shndx = obj->efile.st_ops_link_shndx; + data = obj->efile.st_ops_link_data; + map_flags = BPF_F_LINK; + } else { + sec_name = STRUCT_OPS_SEC; + shndx = obj->efile.st_ops_shndx; + data = obj->efile.st_ops_data; + map_flags = 0; + } + + if (shndx == -1) return 0; btf = obj->btf; - datasec_id = btf__find_by_name_kind(btf, STRUCT_OPS_SEC, + datasec_id = btf__find_by_name_kind(btf, sec_name, BTF_KIND_DATASEC); if (datasec_id < 0) { pr_warn("struct_ops init: DATASEC %s not found\n", - STRUCT_OPS_SEC); + sec_name); return -EINVAL; } @@ -1151,7 +1169,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) type_id = btf__resolve_type(obj->btf, vsi->type); if (type_id < 0) { pr_warn("struct_ops init: Cannot resolve var type_id %u in DATASEC %s\n", - vsi->type, STRUCT_OPS_SEC); + vsi->type, sec_name); return -EINVAL; } @@ -1170,7 +1188,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) if (IS_ERR(map)) return PTR_ERR(map); - map->sec_idx = obj->efile.st_ops_shndx; + map->sec_idx = shndx; map->sec_offset = vsi->offset; map->name = strdup(var_name); if (!map->name) @@ -1180,6 +1198,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) map->def.key_size = sizeof(int); map->def.value_size = type->size; map->def.max_entries = 1; + map->def.map_flags = map_flags; map->st_ops = calloc(1, sizeof(*map->st_ops)); if (!map->st_ops) @@ -1192,14 +1211,14 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) if (!st_ops->data || !st_ops->progs || !st_ops->kern_func_off) return -ENOMEM; - if (vsi->offset + type->size > obj->efile.st_ops_data->d_size) { + if (vsi->offset + type->size > data->d_size) { pr_warn("struct_ops init: var %s is beyond the end of DATASEC %s\n", - var_name, STRUCT_OPS_SEC); + var_name, sec_name); return -EINVAL; } memcpy(st_ops->data, - obj->efile.st_ops_data->d_buf + vsi->offset, + data->d_buf + vsi->offset, type->size); st_ops->tname = tname; st_ops->type = type; @@ -1212,6 +1231,15 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) return 0; } +static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) +{ + int err; + + err = bpf_object__init_struct_ops_maps_link(obj, false); + err = err ?: bpf_object__init_struct_ops_maps_link(obj, true); + return err; +} + static struct bpf_object *bpf_object__new(const char *path, const void *obj_buf, size_t obj_buf_sz, @@ -1248,6 +1276,7 @@ static struct bpf_object *bpf_object__new(const char *path, obj->efile.obj_buf_sz = obj_buf_sz; obj->efile.btf_maps_shndx = -1; obj->efile.st_ops_shndx = -1; + obj->efile.st_ops_link_shndx = -1; obj->kconfig_map_idx = -1; obj->kern_version = get_kernel_version(); @@ -1265,6 +1294,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj) obj->efile.elf = NULL; obj->efile.symbols = NULL; obj->efile.st_ops_data = NULL; + obj->efile.st_ops_link_data = NULL; zfree(&obj->efile.secs); obj->efile.sec_cnt = 0; @@ -2753,12 +2783,13 @@ static bool libbpf_needs_btf(const struct bpf_object *obj) { return obj->efile.btf_maps_shndx >= 0 || obj->efile.st_ops_shndx >= 0 || + obj->efile.st_ops_link_shndx >= 0 || obj->nr_extern > 0; } static bool kernel_needs_btf(const struct bpf_object *obj) { - return obj->efile.st_ops_shndx >= 0; + return obj->efile.st_ops_shndx >= 0 || obj->efile.st_ops_link_shndx >= 0; } static int bpf_object__init_btf(struct bpf_object *obj, @@ -3451,6 +3482,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, STRUCT_OPS_SEC) == 0) { obj->efile.st_ops_data = data; obj->efile.st_ops_shndx = idx; + } else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) { + obj->efile.st_ops_link_data = data; + obj->efile.st_ops_link_shndx = idx; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -3465,6 +3499,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj) /* Only do relo for section with exec instructions */ if (!section_have_execinstr(obj, targ_sec_idx) && strcmp(name, ".rel" STRUCT_OPS_SEC) && + strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) && strcmp(name, ".rel" MAPS_ELF_SEC)) { pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", idx, name, targ_sec_idx, @@ -6611,7 +6646,7 @@ static int bpf_object__collect_relos(struct bpf_object *obj) return -LIBBPF_ERRNO__INTERNAL; } - if (idx == obj->efile.st_ops_shndx) + if (idx == obj->efile.st_ops_shndx || idx == obj->efile.st_ops_link_shndx) err = bpf_object__collect_st_ops_relos(obj, shdr, data); else if (idx == obj->efile.btf_maps_shndx) err = bpf_object__collect_map_relos(obj, shdr, data); @@ -8954,8 +8989,9 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, } /* struct_ops BPF prog can be re-used between multiple - * .struct_ops as long as it's the same struct_ops struct - * definition and the same function pointer field + * .struct_ops & .struct_ops.link as long as it's the + * same struct_ops struct definition and the same + * function pointer field */ if (prog->attach_btf_id != st_ops->type_id || prog->expected_attach_type != member_idx) { From patchwork Wed Mar 8 00:50:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13165040 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2363DC678D5 for ; Wed, 8 Mar 2023 00:53:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229707AbjCHAxj (ORCPT ); Tue, 7 Mar 2023 19:53:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229484AbjCHAxi (ORCPT ); Tue, 7 Mar 2023 19:53:38 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25ACF984C5 for ; Tue, 7 Mar 2023 16:53:37 -0800 (PST) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 328031bZ010098 for ; Tue, 7 Mar 2023 16:53:37 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=geIAKPSU6XMI+auXswOSbzwZRLvdp28+u4JQppatNfY=; b=c7r5eEk9vF2UCM6bF+feV1pUcMbPNNAb94e1QXoiUYBFyaLZHvG+opcK3zwSLUsCeZSZ MQqqeyraEaDDURPzG0unJDJOwBpHdpSFiWpZpclTyXa98fVO/2S0xygi88bxzYZrKvtT dAbE9xyyGz9DTrf07oXiz6E7non4jgwBC3jf6Cw0divKDGIrPLjZLWVYVn/5sLlLCgtK AYkbl2M5Bp3ChXMjdxw8ZbAw7EgVD7NEJInBNfj8po0AYQJw4q3MQidMYxJJKHbQUbKi vX7OQ65cED1H/SqDiVb8j+40pzyjpJ44EGJgl15itwQ2YIOLDhmXF675HQc5X+MZKWJf 1A== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p6ffpr85d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 07 Mar 2023 16:53:36 -0800 Received: from twshared52565.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 7 Mar 2023 16:53:35 -0800 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id C33EF6C92E22; Tue, 7 Mar 2023 16:50:54 -0800 (PST) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v5 8/8] selftests/bpf: Test switching TCP Congestion Control algorithms. Date: Tue, 7 Mar 2023 16:50:50 -0800 Message-ID: <20230308005050.255859-9-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230308005050.255859-1-kuifeng@meta.com> References: <20230308005050.255859-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: GuYmRml1E--KZ_CN05O7Hd2rxlx_tjP1 X-Proofpoint-GUID: GuYmRml1E--KZ_CN05O7Hd2rxlx_tjP1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-07_18,2023-03-07_01,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Create a pair of sockets that utilize the congestion control algorithm under a particular name. Then switch up this congestion control algorithm to another implementation and check whether newly created connections using the same cc name now run the new implementation. Signed-off-by: Kui-Feng Lee --- .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 38 ++++++++++++ .../selftests/bpf/progs/tcp_ca_update.c | 62 +++++++++++++++++++ 2 files changed, 100 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_update.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c index e980188d4124..caaa9175ee36 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c @@ -8,6 +8,7 @@ #include "bpf_dctcp.skel.h" #include "bpf_cubic.skel.h" #include "bpf_tcp_nogpl.skel.h" +#include "tcp_ca_update.skel.h" #include "bpf_dctcp_release.skel.h" #include "tcp_ca_write_sk_pacing.skel.h" #include "tcp_ca_incompl_cong_ops.skel.h" @@ -381,6 +382,41 @@ static void test_unsupp_cong_op(void) libbpf_set_print(old_print_fn); } +static void test_update_ca(void) +{ + struct tcp_ca_update *skel; + struct bpf_link *link; + int saved_ca1_cnt; + int err; + + skel = tcp_ca_update__open(); + if (!ASSERT_OK_PTR(skel, "open")) + return; + + err = tcp_ca_update__load(skel); + if (!ASSERT_OK(err, "load")) { + tcp_ca_update__destroy(skel); + return; + } + + link = bpf_map__attach_struct_ops(skel->maps.ca_update_1); + ASSERT_OK_PTR(link, "attach_struct_ops"); + + do_test("tcp_ca_update", NULL); + saved_ca1_cnt = skel->bss->ca1_cnt; + ASSERT_GT(saved_ca1_cnt, 0, "ca1_ca1_cnt"); + + err = bpf_link__update_map(link, skel->maps.ca_update_2); + ASSERT_OK(err, "update_struct_ops"); + + do_test("tcp_ca_update", NULL); + ASSERT_EQ(skel->bss->ca1_cnt, saved_ca1_cnt, "ca2_ca1_cnt"); + ASSERT_GT(skel->bss->ca2_cnt, 0, "ca2_ca2_cnt"); + + bpf_link__destroy(link); + tcp_ca_update__destroy(skel); +} + void test_bpf_tcp_ca(void) { if (test__start_subtest("dctcp")) @@ -399,4 +435,6 @@ void test_bpf_tcp_ca(void) test_incompl_cong_ops(); if (test__start_subtest("unsupp_cong_op")) test_unsupp_cong_op(); + if (test__start_subtest("update_ca")) + test_update_ca(); } diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_update.c b/tools/testing/selftests/bpf/progs/tcp_ca_update.c new file mode 100644 index 000000000000..36a04be95df5 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/tcp_ca_update.c @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" + +#include +#include + +char _license[] SEC("license") = "GPL"; + +int ca1_cnt = 0; +int ca2_cnt = 0; + +#define USEC_PER_SEC 1000000UL + +#define min(a, b) ((a) < (b) ? (a) : (b)) + +static inline struct tcp_sock *tcp_sk(const struct sock *sk) +{ + return (struct tcp_sock *)sk; +} + +SEC("struct_ops/ca_update_1_cong_control") +void BPF_PROG(ca_update_1_cong_control, struct sock *sk, + const struct rate_sample *rs) +{ + ca1_cnt++; +} + +SEC("struct_ops/ca_update_2_cong_control") +void BPF_PROG(ca_update_2_cong_control, struct sock *sk, + const struct rate_sample *rs) +{ + ca2_cnt++; +} + +SEC("struct_ops/ca_update_ssthresh") +__u32 BPF_PROG(ca_update_ssthresh, struct sock *sk) +{ + return tcp_sk(sk)->snd_ssthresh; +} + +SEC("struct_ops/ca_update_undo_cwnd") +__u32 BPF_PROG(ca_update_undo_cwnd, struct sock *sk) +{ + return tcp_sk(sk)->snd_cwnd; +} + +SEC(".struct_ops.link") +struct tcp_congestion_ops ca_update_1 = { + .cong_control = (void *)ca_update_1_cong_control, + .ssthresh = (void *)ca_update_ssthresh, + .undo_cwnd = (void *)ca_update_undo_cwnd, + .name = "tcp_ca_update", +}; + +SEC(".struct_ops.link") +struct tcp_congestion_ops ca_update_2 = { + .cong_control = (void *)ca_update_2_cong_control, + .ssthresh = (void *)ca_update_ssthresh, + .undo_cwnd = (void *)ca_update_undo_cwnd, + .name = "tcp_ca_update", +};