From patchwork Mon Mar 20 19:56:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181798 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0F6DC6FD1D for ; Mon, 20 Mar 2023 19:57:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229940AbjCTT5L (ORCPT ); Mon, 20 Mar 2023 15:57:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59336 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjCTT5K (ORCPT ); Mon, 20 Mar 2023 15:57:10 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2701522DDE for ; Mon, 20 Mar 2023 12:57:07 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH743f008139 for ; Mon, 20 Mar 2023 12:57:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=yCIuXsuNrCBmzuSiAZsGrmHGqerpaTnqPUs3YtArSl4=; b=nUEW8yC46wXAC7qTuQRkaFXlEPGTTmExGAIFQbq1LVdbp/bcorVrZZpdUT9fB+lD+t68 48M+PA/FtNMeJNwSmqlJE4BFBVrSLDRQE6kwh8JaDYFJK7vX2n7SweT0eyIUazhb82r8 PwKuJSMD8NogNBrNxw+Ucnuj4ELg55CeCmY7SDl4h+kh5RB3P61fq9S0QyHJZUVQ1wvq frEIEKKW9gfLY7HYZVvpLzk9lBCYdOWwDppkUlOmAseAral18MMr4b4+sF13+3rkWsYo sAwtquQ5rCxoYaAT19/1arXVEnawSoo6RklaV/b48ITwcO3nEbY2MNccCULPxYGWdM0e +w== Received: from nam12-bn8-obe.outbound.protection.outlook.com (mail-bn8nam12lp2168.outbound.protection.outlook.com [104.47.55.168]) by 
mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3pdae1kuxx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 20 Mar 2023 12:57:07 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hYgbVCvc3ceKipVPC7n9elBqk/36TOeGneerGrfXolfID2n4N/OOsscnoPsjpxdGg1CrTlkYBW8QII9w82Kw+QA6cQT4p6EyyJ3GgtuI4hq98KU7ryjGV1Ush79MYwKK3A2WY1rH7VWQ/jLgmHZqUaSavhJNvd+4QPKqosPldZcO8iluhfXXjNOTFS9pBtZGOK/Sn0iAOZB2dip6OEbpwWb/hG+Qhezf/EIsAHuknH91lizlge1s6t0rRIBh/ww1YKDn2ftB69G3/8kGW1X6QfZ1Twn56RZ961Vd/3hN0jlioKG0uh5TrQCuEtG7nvSA7QaYdszHiw5IHUs01oQ85g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=yCIuXsuNrCBmzuSiAZsGrmHGqerpaTnqPUs3YtArSl4=; b=nwVKjjPzVI2VY9Wv6NI+SzV5TPIyOWdGvu/JxD5NASbCf0gcjOrTNXRWzwUa0o9og5ZAVhNs+PcCUvf5w7wJcqEGSzMu+xpHoJrTsEPyS/kFe80eGNBs3HSis3987FlVKaecGTqgp2grZcSvH7AHKTjcvuyjd0mEjpMRXNyUy4RFNiCDMJdAGRmz1pOMuL1iTQqtIoj0n3DN2FYCAcdeQL9T1D995hmf3dmB7ng47vIcsMVJGKOOqTN9erg2kXUYdKRj2RugI7idWcAS1VS4X6yhQH4ak1vWy6TLNHYHY2hRCmMCtwIN/ljCEEjmCIHfnkZBNPY4ato6C1fCRRrzNQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender ip is 69.171.232.181) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=meta.com; dmarc=fail (p=reject sp=reject pct=100) action=oreject header.from=meta.com; dkim=none (message not signed); arc=none Received: from DS7PR03CA0176.namprd03.prod.outlook.com (2603:10b6:5:3b2::31) by BY1PR15MB5957.namprd15.prod.outlook.com (2603:10b6:a03:52b::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37; Mon, 20 Mar 2023 19:57:04 +0000 Received: from DM6NAM12FT079.eop-nam12.prod.protection.outlook.com (2603:10b6:5:3b2:cafe::40) by DS7PR03CA0176.outlook.office365.com (2603:10b6:5:3b2::31) with Microsoft SMTP Server 
(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37 via Frontend Transport; Mon, 20 Mar 2023 19:57:00 +0000 X-MS-Exchange-Authentication-Results: spf=fail (sender IP is 69.171.232.181) smtp.mailfrom=meta.com; dkim=none (message not signed) header.d=none;dmarc=fail action=oreject header.from=meta.com; Received-SPF: Fail (protection.outlook.com: domain of meta.com does not designate 69.171.232.181 as permitted sender) receiver=protection.outlook.com; client-ip=69.171.232.181; helo=69-171-232-181.mail-mxout.facebook.com; Received: from 69-171-232-181.mail-mxout.facebook.com (69.171.232.181) by DM6NAM12FT079.mail.protection.outlook.com (10.13.178.135) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6222.11 via Frontend Transport; Mon, 20 Mar 2023 19:57:04 +0000 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 46A7E7D4C175; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com Cc: Kui-Feng Lee Subject: [PATCH bpf-next v9 1/8] bpf: Retire the struct_ops map kvalue->refcnt. 
Date: Mon, 20 Mar 2023 12:56:37 -0700 Message-Id: <20230320195644.1953096-2-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6NAM12FT079:EE_|BY1PR15MB5957:EE_ X-MS-Office365-Filtering-Correlation-Id: a2757c9c-35d7-4db1-27b1-08db297d4717 X-ETR: Bypass spam filtering X-FB-Source: Internal X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 6ijWJrVDf7r46NqYc1DVgffHrDExVzJrpTjUlixFVnN5tY3ZVk7u1IliJh1JMI5IMXzsN/U0C3EhSaVZzlt67SqyXaI3SUZrjn2yoOPRInbEmKxWSzAU+96Q3lnZg6AxjNjsKcYi3F+mWeIyOy5UVfCwfWv996RD8//YnKbk9YADSRi0XsT+ttHwqq/Rg5dWFPH1Qs2N95e+qPTOrt2HdIOA7iXXPGUPzL9mhVO9L9WAx98R0oMov/dydFfnudUNzzjeKiYF2K0S06PispP8FP2J81FJOFSA1a/xoCCOwlOk2BDsFv+7wHG5SQOTj2Meo4/+0sK2B+RynoLBHABfep7FpO3AnCZqux7dlDXcMu9o78R9VoiJBWKHDDMcHmlpobTQyhajr+cJv3RMFtdCHounCkuJ5MCdr4pn8IDo70XPbKRQIcyBIwTwcFgAuiEfKUNDD2aF8BrqvlxmtmoW3zl2AwFWEAqcDAY/dD097W8BqV1ZpD+DXWHTrygF6a/b9Q/+S+pI8ILR2HqnhQ2U+Gl48/3zyz7vabwJ7h/yClFgkhiD2fcTPfBDj2Xb3bzHymqmADZGjm6vKpEcdJGjmRhLHx7/l8U6DVONp3jlDOgz/Y3CT+RQTqMF1yvRa6q8yOM2/pGibCJRzkxQcW5VaCJtc1aOXWEBcbR6L/3iICafuOm1sIc+BbF6BIGdJ9+3QgFR6FQf2+MgGuPncttozQ== X-Forefront-Antispam-Report: 
CIP:69.171.232.181;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:69-171-232-181.mail-mxout.facebook.com;PTR:69-171-232-181.mail-mxout.facebook.com;CAT:NONE;SFS:(13230025)(4636009)(136003)(376002)(39860400002)(346002)(396003)(451199018)(46966006)(40470700004)(36840700001)(66899018)(33570700077)(47076005)(2616005)(83380400001)(82310400005)(42186006)(316002)(6666004)(478600001)(26005)(6266002)(107886003)(186003)(5660300002)(1076003)(36860700001)(7636003)(7596003)(356005)(40460700003)(86362001)(336012)(8936002)(70206006)(2906002)(40480700001)(4326008)(82740400003)(41300700001)(36756003)(8676002);DIR:OUT;SFP:1501; X-OriginatorOrg: meta.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Mar 2023 19:57:04.2314 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: a2757c9c-35d7-4db1-27b1-08db297d4717 X-MS-Exchange-CrossTenant-Id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=8ae927fe-1255-47a7-a2af-5f3a069daaa2;Ip=[69.171.232.181];Helo=[69-171-232-181.mail-mxout.facebook.com] X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: TreatMessagesAsInternal-DM6NAM12FT079.eop-nam12.prod.protection.outlook.com X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY1PR15MB5957 X-Proofpoint-GUID: JCDIeBYPJBUo-_wydNO6kGFWcZ0Vjdxa X-Proofpoint-ORIG-GUID: JCDIeBYPJBUo-_wydNO6kGFWcZ0Vjdxa X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net

We have replaced kvalue->refcnt with a wait for an RCU grace period (synchronize_rcu_mult() in bpf_struct_ops_map_free()). Maintaining kvalue->refcnt was complicated, as we had to keep track of two reference counts at the same time: kvalue->refcnt itself and the reference count of bpf_map.
When kvalue->refcnt reaches zero, we also have to drop a reference on bpf_map, yet the two steps are not performed atomically, so managing them required care. Eliminating kvalue->refcnt simplifies maintenance: the refcount of bpf_map is now the only one to manage.

To prevent the trampoline image of a struct_ops from being released while it is still in use, we wait for an RCU grace period. setsockopt(TCP_CONGESTION, "...") changes a socket's congestion control algorithm and can release the old struct_ops implementation. That is fine on its own. However, the same function is exposed through bpf_setsockopt(), so BPF programs may call it as well. To ensure that the trampoline image of a struct_ops is not freed while one of its methods is running, the trampoline guards the BPF program with rcu_read_lock(). This prevents the associated images from being destroyed before the trampoline returns, which is why we have to wait for an RCU grace period before freeing them.
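The single-refcount scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions: it uses C11 atomics instead of the kernel's atomic64_t, and demo_map, demo_map_get() and demo_map_put() are hypothetical names, not kernel APIs. The real __bpf_map_inc_not_zero() additionally expects map_idr_lock to be held or the map to be protected by the RCU read lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_map: only the refcount matters here. */
struct demo_map {
	atomic_long refcnt;
};

/* Take a reference only if the object is still alive (refcnt > 0).
 * Models the idea behind "increment if not zero"; not the kernel code.
 */
static bool demo_map_get(struct demo_map *map)
{
	long old = atomic_load(&map->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&map->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the object.
 */
static bool demo_map_put(struct demo_map *map)
{
	long old = atomic_fetch_sub(&map->refcnt, 1);

	assert(old > 0); /* an underflow would be a bug in the caller */
	return old == 1;
}
```

With a single counter, a get that races with the final put simply fails instead of reviving an object that is about to be freed. In the patch, the freeing side then still has to wait for an RCU grace period before releasing the trampoline image.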
Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 1 + kernel/bpf/bpf_struct_ops.c | 73 ++++++++++++++++++++----------------- kernel/bpf/syscall.c | 6 ++- 3 files changed, 45 insertions(+), 35 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3ef98fb92987..3304c84fe021 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1945,6 +1945,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd); struct bpf_map *__bpf_map_get(struct fd f); void bpf_map_inc(struct bpf_map *map); void bpf_map_inc_with_uref(struct bpf_map *map); +struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref); struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map); void bpf_map_put_with_uref(struct bpf_map *map); void bpf_map_put(struct bpf_map *map); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 38903fb52f98..ca87258b42e9 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -11,6 +11,7 @@ #include #include #include +#include enum bpf_struct_ops_state { BPF_STRUCT_OPS_STATE_INIT, @@ -249,6 +250,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; struct bpf_struct_ops_value *uvalue, *kvalue; enum bpf_struct_ops_state state; + s64 refcnt; if (unlikely(*(u32 *)key != 0)) return -ENOENT; @@ -267,7 +269,14 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, uvalue = value; memcpy(uvalue, st_map->uvalue, map->value_size); uvalue->state = state; - refcount_set(&uvalue->refcnt, refcount_read(&kvalue->refcnt)); + + /* This value offers the user space a general estimate of how + * many sockets are still utilizing this struct_ops for TCP + * congestion control. The number might not be exact, but it + * should sufficiently meet our present goals. 
+ */ + refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt); + refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0)); return 0; } @@ -491,7 +500,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } - refcount_set(&kvalue->refcnt, 1); bpf_map_inc(map); set_memory_rox((long)st_map->image, 1); @@ -536,8 +544,7 @@ static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) switch (prev_state) { case BPF_STRUCT_OPS_STATE_INUSE: st_map->st_ops->unreg(&st_map->kvalue.data); - if (refcount_dec_and_test(&st_map->kvalue.refcnt)) - bpf_map_put(map); + bpf_map_put(map); return 0; case BPF_STRUCT_OPS_STATE_TOBEFREE: return -EINPROGRESS; @@ -570,7 +577,7 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, kfree(value); } -static void bpf_struct_ops_map_free(struct bpf_map *map) +static void __bpf_struct_ops_map_free(struct bpf_map *map) { struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; @@ -582,6 +589,24 @@ static void bpf_struct_ops_map_free(struct bpf_map *map) bpf_map_area_free(st_map); } +static void bpf_struct_ops_map_free(struct bpf_map *map) +{ + /* The struct_ops's function may switch to another struct_ops. + * + * For example, bpf_tcp_cc_x->init() may switch to + * another tcp_cc_y by calling + * setsockopt(TCP_CONGESTION, "tcp_cc_y"). + * During the switch, bpf_struct_ops_put(tcp_cc_x) is called + * and its refcount may reach 0 which then free its + * trampoline image while tcp_cc_x is still running. + * + * Thus, a rcu grace period is needed here. 
+ */ + synchronize_rcu_mult(call_rcu, call_rcu_tasks); + + __bpf_struct_ops_map_free(map); +} + static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) { if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 || @@ -630,7 +655,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) NUMA_NO_NODE); st_map->image = bpf_jit_alloc_exec(PAGE_SIZE); if (!st_map->uvalue || !st_map->links || !st_map->image) { - bpf_struct_ops_map_free(map); + __bpf_struct_ops_map_free(map); return ERR_PTR(-ENOMEM); } @@ -676,41 +701,23 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = { bool bpf_struct_ops_get(const void *kdata) { struct bpf_struct_ops_value *kvalue; + struct bpf_struct_ops_map *st_map; + struct bpf_map *map; kvalue = container_of(kdata, struct bpf_struct_ops_value, data); + st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue); - return refcount_inc_not_zero(&kvalue->refcnt); -} - -static void bpf_struct_ops_put_rcu(struct rcu_head *head) -{ - struct bpf_struct_ops_map *st_map; - - st_map = container_of(head, struct bpf_struct_ops_map, rcu); - bpf_map_put(&st_map->map); + map = __bpf_map_inc_not_zero(&st_map->map, false); + return !IS_ERR(map); } void bpf_struct_ops_put(const void *kdata) { struct bpf_struct_ops_value *kvalue; + struct bpf_struct_ops_map *st_map; kvalue = container_of(kdata, struct bpf_struct_ops_value, data); - if (refcount_dec_and_test(&kvalue->refcnt)) { - struct bpf_struct_ops_map *st_map; - - st_map = container_of(kvalue, struct bpf_struct_ops_map, - kvalue); - /* The struct_ops's function may switch to another struct_ops. - * - * For example, bpf_tcp_cc_x->init() may switch to - * another tcp_cc_y by calling - * setsockopt(TCP_CONGESTION, "tcp_cc_y"). - * During the switch, bpf_struct_ops_put(tcp_cc_x) is called - * and its map->refcnt may reach 0 which then free its - * trampoline image while tcp_cc_x is still running. - * - * Thus, a rcu grace period is needed here. 
- */ - call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu); - } + st_map = container_of(kvalue, struct bpf_struct_ops_map, kvalue); + + bpf_map_put(&st_map->map); } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 099e9068bcdd..cff0348a2871 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1303,8 +1303,10 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd) return map; } -/* map_idr_lock should have been held */ -static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) +/* map_idr_lock should have been held or the map should have been + * protected by rcu read lock. + */ +struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) { int refold; From patchwork Mon Mar 20 19:56:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181797 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4CBD1C7619A for ; Mon, 20 Mar 2023 19:57:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230098AbjCTT5E (ORCPT ); Mon, 20 Mar 2023 15:57:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229940AbjCTT5B (ORCPT ); Mon, 20 Mar 2023 15:57:01 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A006F211C3 for ; Mon, 20 Mar 2023 12:56:59 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH7X45023803 for ; Mon, 20 Mar 2023 12:56:59 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=BjQy3aUUufJ+026SPDGyGiUVo6uCwdZYOChPF0HQ8Ko=; b=B52XEYk0R77gbQ8KOIIrtstN8oV9oy6frHaunRGCoUEfOT9xNPPx+TlcmJkAJEODbEFW crYoNm4bkFy6GbtmstKy8dmKTqlYlgnaGpPmTk20mZyUsR6XMVkJ0vrz4TnvDsQb0rqE hpnWnKTyQByTTmZHucbLZV8mJJ0zIfDmiJ1+oGVxxwLAqx83nhljiXPpuICGCKM3b9fx vHYQ/tZu2QgxzLp2hDVxqTFkWw1sIFijyi5EKTAM7x5sNF6J8GR3tZiYascOKtHrPwAu clS/OoLZo4t365zg7xrwOOeLrHxfPQIg02PphYiIlWufnUKeAQih8+Ys0s7/1xDQGOrv tg== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3pd8mrv0t0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 20 Mar 2023 12:56:59 -0700 Received: from twshared21760.39.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Mon, 20 Mar 2023 12:56:56 -0700 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 54E5D7D4C178; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee , , Eric Dumazet Subject: [PATCH bpf-next v9 2/8] net: Update an existing TCP congestion control algorithm. 
Date: Mon, 20 Mar 2023 12:56:38 -0700 Message-ID: <20230320195644.1953096-3-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: jziiItVaZk0dfvvsoPQrZDAoPGNkrCPR X-Proofpoint-ORIG-GUID: jziiItVaZk0dfvvsoPQrZDAoPGNkrCPR X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net

This feature lets you immediately switch to another congestion control algorithm or implementation with the same name. Once a name is updated, new connections will use the new algorithm.

The purpose is to update a customized algorithm implemented in BPF struct_ops with a new version on the fly. The following is an example of using the userspace API implemented in later BPF patches.

   link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
   .......
   err = bpf_link__update_map(link, skel->maps.ca_update_2);

We first load and register an algorithm implemented in BPF struct_ops, then swap it out with a new one using the same name. After that, newly created connections will use the updated algorithm, while existing connections keep the version they were already using.

This patch also takes the opportunity to refactor the ca validation into the new tcp_validate_congestion_control() function.
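The replacement rule described above (same name, and the caller must name the implementation that is currently registered) can be modeled in userspace C. This is a rough sketch only: demo_ca, demo_registry, demo_find(), demo_register() and demo_update() are hypothetical names, the fixed-size array stands in for the kernel's RCU-protected list under tcp_cong_list_lock, and the in-place slot swap replaces the kernel's add-then-delete sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CA 8

/* Hypothetical stand-in for struct tcp_congestion_ops. */
struct demo_ca {
	const char *name;
	int version;
};

/* Toy registry; the kernel uses an RCU-protected list and a spinlock. */
static struct demo_ca *demo_registry[DEMO_MAX_CA];

/* Return the registered entry with this name, or NULL. */
static struct demo_ca *demo_find(const char *name)
{
	for (size_t i = 0; i < DEMO_MAX_CA; i++)
		if (demo_registry[i] && !strcmp(demo_registry[i]->name, name))
			return demo_registry[i];
	return NULL;
}

/* Register a new algorithm; names must be unique. */
static int demo_register(struct demo_ca *ca)
{
	assert(ca && ca->name);
	if (demo_find(ca->name))
		return -EEXIST;
	for (size_t i = 0; i < DEMO_MAX_CA; i++) {
		if (!demo_registry[i]) {
			demo_registry[i] = ca;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Replace old_ca with ca. The names must match and old_ca must be the
 * implementation that is currently registered.
 */
static int demo_update(struct demo_ca *ca, struct demo_ca *old_ca)
{
	assert(ca && ca->name && old_ca && old_ca->name);
	if (strcmp(ca->name, old_ca->name))
		return -EINVAL;
	for (size_t i = 0; i < DEMO_MAX_CA; i++) {
		if (demo_registry[i] == old_ca) {
			demo_registry[i] = ca;
			return 0;
		}
	}
	return -EINVAL; /* old_ca is not the registered implementation */
}
```

Lookups by name performed after a successful demo_update() return the new implementation, which mirrors how new connections pick up the updated algorithm while existing ones keep the pointer they already hold.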
Cc: netdev@vger.kernel.org, Eric Dumazet Signed-off-by: Kui-Feng Lee --- include/net/tcp.h | 3 +++ net/ipv4/tcp_cong.c | 65 ++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 61 insertions(+), 7 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index db9f828e9d1e..2abb755e6a3a 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1117,6 +1117,9 @@ struct tcp_congestion_ops { int tcp_register_congestion_control(struct tcp_congestion_ops *type); void tcp_unregister_congestion_control(struct tcp_congestion_ops *type); +int tcp_update_congestion_control(struct tcp_congestion_ops *type, + struct tcp_congestion_ops *old_type); +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca); void tcp_assign_congestion_control(struct sock *sk); void tcp_init_congestion_control(struct sock *sk); diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index db8b4b488c31..e677d0bc12ad 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -75,14 +75,8 @@ struct tcp_congestion_ops *tcp_ca_find_key(u32 key) return NULL; } -/* - * Attach new congestion control algorithm to the list - * of available options. - */ -int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +int tcp_validate_congestion_control(struct tcp_congestion_ops *ca) { - int ret = 0; - /* all algorithms must implement these */ if (!ca->ssthresh || !ca->undo_cwnd || !(ca->cong_avoid || ca->cong_control)) { @@ -90,6 +84,20 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca) return -EINVAL; } + return 0; +} + +/* Attach new congestion control algorithm to the list + * of available options. 
+ */ +int tcp_register_congestion_control(struct tcp_congestion_ops *ca) +{ + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); spin_lock(&tcp_cong_list_lock); @@ -130,6 +138,49 @@ void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca) } EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control); +/* Replace a registered old ca with a new one. + * + * The new ca must have the same name as the old one, that has been + * registered. + */ +int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca) +{ + struct tcp_congestion_ops *existing; + int ret; + + ret = tcp_validate_congestion_control(ca); + if (ret) + return ret; + + ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); + + spin_lock(&tcp_cong_list_lock); + existing = tcp_ca_find_key(old_ca->key); + if (ca->key == TCP_CA_UNSPEC || !existing || strcmp(existing->name, ca->name)) { + pr_notice("%s not registered or non-unique key\n", + ca->name); + ret = -EINVAL; + } else if (existing != old_ca) { + pr_notice("invalid old congestion control algorithm to replace\n"); + ret = -EINVAL; + } else { + /* Add the new one before removing the old one to keep + * one implementation available all the time. + */ + list_add_tail_rcu(&ca->list, &tcp_cong_list); + list_del_rcu(&existing->list); + pr_debug("%s updated\n", ca->name); + } + spin_unlock(&tcp_cong_list_lock); + + /* Wait for outstanding readers to complete before the + * module or struct_ops gets removed entirely. 
+ */ + synchronize_rcu(); + + return ret; +} + u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca) { const struct tcp_congestion_ops *ca; From patchwork Mon Mar 20 19:56:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181801 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2762C7619A for ; Mon, 20 Mar 2023 19:57:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229912AbjCTT5O (ORCPT ); Mon, 20 Mar 2023 15:57:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230244AbjCTT5M (ORCPT ); Mon, 20 Mar 2023 15:57:12 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E4A622785 for ; Mon, 20 Mar 2023 12:57:09 -0700 (PDT) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH7dhN021673 for ; Mon, 20 Mar 2023 12:57:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=ApcoQyEJMJHl3eVsyLv62eZLnphmfRwe1cxINOQ0yQM=; b=DXHOK/u4dAa2dLGjBKeQQgVfgph/TtasY8/AmxTzHdugfcKQxoNAc9fjLaAkg82H8Xzq 0zkgDKy7Tp21gPBqsETodZr9+Ic9GgunhRaIfppJjm3FpIEB3LKbETx+W1KDHdhQIEHh LA/xD3VS6aLgwdNDBtglGnvoENNMwV76kga0rV/jKSahHpC7+gm3mtvePuT6Gat15FNB ijvaigJkeI+0ZCfvpyY+HfNoQtEKKI2U4wd4BNC5NiFiFpSw5jy/Kk8VsXuqSs9PkUQ9 
IYiw46z0LcKca+GCR8W/oCbcOjkXj48NoICWl+jrqBMfqLedHT6IUImlBixvsrDEOFq0 2g== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3peq8hb4x8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 20 Mar 2023 12:57:08 -0700 Received: from twshared21760.39.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Mon, 20 Mar 2023 12:57:06 -0700 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 631D17D4C17A; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v9 3/8] bpf: Create links for BPF struct_ops maps. Date: Mon, 20 Mar 2023 12:56:39 -0700 Message-ID: <20230320195644.1953096-4-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: pnRqVCBE9B5f00CBllS6ffND0YCipJs3 X-Proofpoint-ORIG-GUID: pnRqVCBE9B5f00CBllS6ffND0YCipJs3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net

Make bpf_link support struct_ops. Previously, struct_ops were always used alone, without any associated links: a struct_ops was activated automatically when its value was updated. Other BPF program types, by contrast, had to create a bpf_link for their instances before they could become active. Now you can create an inactive struct_ops and create a link to activate it later. With bpf_links, struct_ops behaves like other BPF program types.
You can pin and unpin a struct_ops through its link, and the struct_ops is deactivated when its link is removed. Previously, someone had to delete the map value to deactivate it.

bpf_links are responsible for registering their associated struct_ops. You can only create a bpf_link from a struct_ops that has the BPF_F_LINK flag set; a struct_ops without this flag behaves as before and is registered when its value is updated.

BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. It is used not only to create the links for BPF struct_ops programs, but also to create links for BPF struct_ops themselves. Since the links of BPF struct_ops programs are only used internally to create trampolines, they are never seen in other contexts. Thus, the link type can be reused for struct_ops themselves.

To hold a reference to the map backing this link, we add bpf_struct_ops_link as an additional type. The map pointer is RCU-protected, which won't be necessary until later in the patchset.

Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 7 ++ include/uapi/linux/bpf.h | 12 ++- kernel/bpf/bpf_struct_ops.c | 143 ++++++++++++++++++++++++++++++++- kernel/bpf/syscall.c | 23 ++++-- net/ipv4/bpf_tcp_ca.c | 8 +- tools/include/uapi/linux/bpf.h | 12 ++- 6 files changed, 190 insertions(+), 15 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3304c84fe021..2faf01fa3f04 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1518,6 +1518,7 @@ struct bpf_struct_ops { void *kdata, const void *udata); int (*reg)(void *kdata); void (*unreg)(void *kdata); + int (*validate)(void *kdata); const struct btf_type *type; const struct btf_type *value_type; const char *name; @@ -1552,6 +1553,7 @@ static inline void bpf_module_put(const void *data, struct module *owner) else module_put(owner); } +int bpf_struct_ops_link_create(union bpf_attr *attr); #ifdef CONFIG_NET /* Define it here to avoid the use of forward declaration */ @@ -1592,6 +1594,11 @@ static inline int 
bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, { return -EINVAL; } +static inline int bpf_struct_ops_link_create(union bpf_attr *attr) +{ + return -EOPNOTSUPP; +} + #endif #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 13129df937cd..42f40ee083bf 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1033,6 +1033,7 @@ enum bpf_attach_type { BPF_PERF_EVENT, BPF_TRACE_KPROBE_MULTI, BPF_LSM_CGROUP, + BPF_STRUCT_OPS, __MAX_BPF_ATTACH_TYPE }; @@ -1266,6 +1267,9 @@ enum { /* Create a map that is suitable to be an inner map with dynamic max entries */ BPF_F_INNER_MAP = (1U << 12), + +/* Create a map that will be registered/unregesitered by the backed bpf_link */ + BPF_F_LINK = (1U << 13), }; /* Flags for BPF_PROG_QUERY. */ @@ -1507,7 +1511,10 @@ union bpf_attr { } task_fd_query; struct { /* struct used by BPF_LINK_CREATE command */ - __u32 prog_fd; /* eBPF program to attach */ + union { + __u32 prog_fd; /* eBPF program to attach */ + __u32 map_fd; /* struct_ops to attach */ + }; union { __u32 target_fd; /* object to attach to */ __u32 target_ifindex; /* target ifindex */ @@ -6379,6 +6386,9 @@ struct bpf_link_info { struct { __u32 ifindex; } xdp; + struct { + __u32 map_id; + } struct_ops; }; } __attribute__((aligned(8))); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index ca87258b42e9..5e77d1d4a7f5 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -17,6 +17,7 @@ enum bpf_struct_ops_state { BPF_STRUCT_OPS_STATE_INIT, BPF_STRUCT_OPS_STATE_INUSE, BPF_STRUCT_OPS_STATE_TOBEFREE, + BPF_STRUCT_OPS_STATE_READY, }; #define BPF_STRUCT_OPS_COMMON_VALUE \ @@ -59,6 +60,11 @@ struct bpf_struct_ops_map { struct bpf_struct_ops_value kvalue; }; +struct bpf_struct_ops_link { + struct bpf_link link; + struct bpf_map __rcu *map; +}; + #define VALUE_PREFIX "bpf_struct_ops_" #define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1) @@ 
-500,11 +506,29 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } - bpf_map_inc(map); + if (st_map->map.map_flags & BPF_F_LINK) { + err = st_ops->validate(kdata); + if (err) + goto reset_unlock; + set_memory_rox((long)st_map->image, 1); + /* Let bpf_link handle registration & unregistration. + * + * Pair with smp_load_acquire() during lookup_elem(). + */ + smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY); + goto unlock; + } set_memory_rox((long)st_map->image, 1); err = st_ops->reg(kdata); if (likely(!err)) { + /* This refcnt increment on the map here after + * 'st_ops->reg()' is secure since the state of the + * map must be set to INIT at this moment, and thus + * bpf_struct_ops_map_delete_elem() can't unregister + * or transition it to TOBEFREE concurrently. + */ + bpf_map_inc(map); /* Pair with smp_load_acquire() during lookup_elem(). * It ensures the above udata updates (e.g. prog->aux->id) * can be seen once BPF_STRUCT_OPS_STATE_INUSE is set. 
@@ -520,7 +544,6 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, */ set_memory_nx((long)st_map->image, 1); set_memory_rw((long)st_map->image, 1); - bpf_map_put(map); reset_unlock: bpf_struct_ops_map_put_progs(st_map); @@ -538,6 +561,9 @@ static int bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) struct bpf_struct_ops_map *st_map; st_map = (struct bpf_struct_ops_map *)map; + if (st_map->map.map_flags & BPF_F_LINK) + return -EOPNOTSUPP; + prev_state = cmpxchg(&st_map->kvalue.state, BPF_STRUCT_OPS_STATE_INUSE, BPF_STRUCT_OPS_STATE_TOBEFREE); @@ -610,7 +636,7 @@ static void bpf_struct_ops_map_free(struct bpf_map *map) static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) { if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 || - attr->map_flags || !attr->btf_vmlinux_value_type_id) + (attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id) return -EINVAL; return 0; } @@ -634,6 +660,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) if (attr->value_size != vt->size) return ERR_PTR(-EINVAL); + if (attr->map_flags & BPF_F_LINK && !st_ops->validate) + return ERR_PTR(-EOPNOTSUPP); + t = st_ops->type; st_map_size = sizeof(*st_map) + @@ -721,3 +750,111 @@ void bpf_struct_ops_put(const void *kdata) bpf_map_put(&st_map->map); } + +static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map) +{ + struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; + + return map->map_type == BPF_MAP_TYPE_STRUCT_OPS && + map->map_flags & BPF_F_LINK && + /* Pair with smp_store_release() during map_update */ + smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY; +} + +static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_struct_ops_map *st_map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + st_map = (struct bpf_struct_ops_map *) + rcu_dereference_protected(st_link->map, 
true); + if (st_map) { + /* st_link->map can be NULL if + * bpf_struct_ops_link_create() fails to register. + */ + st_map->st_ops->unreg(&st_map->kvalue.data); + bpf_map_put(&st_map->map); + } + kfree(st_link); +} + +static void bpf_struct_ops_map_link_show_fdinfo(const struct bpf_link *link, + struct seq_file *seq) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_map *map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + rcu_read_lock(); + map = rcu_dereference(st_link->map); + seq_printf(seq, "map_id:\t%d\n", map->id); + rcu_read_unlock(); +} + +static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link, + struct bpf_link_info *info) +{ + struct bpf_struct_ops_link *st_link; + struct bpf_map *map; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + rcu_read_lock(); + map = rcu_dereference(st_link->map); + info->struct_ops.map_id = map->id; + rcu_read_unlock(); + return 0; +} + +static const struct bpf_link_ops bpf_struct_ops_map_lops = { + .dealloc = bpf_struct_ops_map_link_dealloc, + .show_fdinfo = bpf_struct_ops_map_link_show_fdinfo, + .fill_link_info = bpf_struct_ops_map_link_fill_link_info, +}; + +int bpf_struct_ops_link_create(union bpf_attr *attr) +{ + struct bpf_struct_ops_link *link = NULL; + struct bpf_link_primer link_primer; + struct bpf_struct_ops_map *st_map; + struct bpf_map *map; + int err; + + map = bpf_map_get(attr->link_create.map_fd); + if (!map) + return -EINVAL; + + st_map = (struct bpf_struct_ops_map *)map; + + if (!bpf_struct_ops_valid_to_reg(map)) { + err = -EINVAL; + goto err_out; + } + + link = kzalloc(sizeof(*link), GFP_USER); + if (!link) { + err = -ENOMEM; + goto err_out; + } + bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_map_lops, NULL); + + err = bpf_link_prime(&link->link, &link_primer); + if (err) + goto err_out; + + err = st_map->st_ops->reg(st_map->kvalue.data); + if (err) { + bpf_link_cleanup(&link_primer); + link = NULL; + goto err_out; + } 
+ RCU_INIT_POINTER(link->map, map); + + return bpf_link_settle(&link_primer); + +err_out: + bpf_map_put(map); + kfree(link); + return err; +} + diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cff0348a2871..21f76698875c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2825,16 +2825,19 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) const struct bpf_prog *prog = link->prog; char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; - bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); seq_printf(m, "link_type:\t%s\n" - "link_id:\t%u\n" - "prog_tag:\t%s\n" - "prog_id:\t%u\n", + "link_id:\t%u\n", bpf_link_type_strs[link->type], - link->id, - prog_tag, - prog->aux->id); + link->id); + if (prog) { + bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); + seq_printf(m, + "prog_tag:\t%s\n" + "prog_id:\t%u\n", + prog_tag, + prog->aux->id); + } if (link->ops->show_fdinfo) link->ops->show_fdinfo(link, m); } @@ -4314,7 +4317,8 @@ static int bpf_link_get_info_by_fd(struct file *file, info.type = link->type; info.id = link->id; - info.prog_id = link->prog->aux->id; + if (link->prog) + info.prog_id = link->prog->aux->id; if (link->ops->fill_link_info) { err = link->ops->fill_link_info(link, &info); @@ -4577,6 +4581,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) if (CHECK_ATTR(BPF_LINK_CREATE)) return -EINVAL; + if (attr->link_create.attach_type == BPF_STRUCT_OPS) + return bpf_struct_ops_link_create(attr); + prog = bpf_prog_get(attr->link_create.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 13fc0c185cd9..bbbd5eb94db2 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -239,8 +239,6 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t, if (bpf_obj_name_cpy(tcp_ca->name, utcp_ca->name, sizeof(tcp_ca->name)) <= 0) return -EINVAL; - if (tcp_ca_find(utcp_ca->name)) - return -EEXIST; return 1; } @@ -266,6 +264,11 @@ static void 
bpf_tcp_ca_unreg(void *kdata) tcp_unregister_congestion_control(kdata); } +static int bpf_tcp_ca_validate(void *kdata) +{ + return tcp_validate_congestion_control(kdata); +} + struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, @@ -273,6 +276,7 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = { .check_member = bpf_tcp_ca_check_member, .init_member = bpf_tcp_ca_init_member, .init = bpf_tcp_ca_init, + .validate = bpf_tcp_ca_validate, .name = "tcp_congestion_ops", }; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 13129df937cd..9cf1deaf21f2 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1033,6 +1033,7 @@ enum bpf_attach_type { BPF_PERF_EVENT, BPF_TRACE_KPROBE_MULTI, BPF_LSM_CGROUP, + BPF_STRUCT_OPS, __MAX_BPF_ATTACH_TYPE }; @@ -1266,6 +1267,9 @@ enum { /* Create a map that is suitable to be an inner map with dynamic max entries */ BPF_F_INNER_MAP = (1U << 12), + +/* Create a map that will be registered/unregistered by the backing bpf_link */ + BPF_F_LINK = (1U << 13), }; /* Flags for BPF_PROG_QUERY. 
*/ @@ -1507,7 +1511,10 @@ union bpf_attr { } task_fd_query; struct { /* struct used by BPF_LINK_CREATE command */ - __u32 prog_fd; /* eBPF program to attach */ + union { + __u32 prog_fd; /* eBPF program to attach */ + __u32 map_fd; /* eBPF struct_ops to attach */ + }; union { __u32 target_fd; /* object to attach to */ __u32 target_ifindex; /* target ifindex */ @@ -6379,6 +6386,9 @@ struct bpf_link_info { struct { __u32 ifindex; } xdp; + struct { + __u32 map_id; + } struct_ops; }; } __attribute__((aligned(8))); From patchwork Mon Mar 20 19:56:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181795 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B8E1C6FD1C for ; Mon, 20 Mar 2023 19:57:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229967AbjCTT5B (ORCPT ); Mon, 20 Mar 2023 15:57:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229453AbjCTT5A (ORCPT ); Mon, 20 Mar 2023 15:57:00 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2955D211DA for ; Mon, 20 Mar 2023 12:56:58 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH7aBk012664 for ; Mon, 20 Mar 2023 12:56:57 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; 
bh=+vvs/l+gkw85c6C8w//C3e34B5mEhSjVJPXVhKGKCeI=; b=FsCUNrQdCTsjD1FHn6PwJbEKb9a4K0D3Bo+ENlJ7OvEfMfPp33r1Seu/RLSYObblAvlw sTOaXEDBChh8AiHhHgjHl2SDaSl+R2XzIJQgHBdvUBtUGR9iqbT4U/V+NAC7XOnLT+/z yMnw0ZlLyjgEFdu+K6PY8AkDntspGvunz2HO92Yb9YG+g7/Nr6zMXXc7QgxH7fdCCNZD 1gIMY0Ek6s/HrEcLwxGOlgMe5PwLLnKxKm4tB2akCqznz1FI5wsBzluSH6CtN3f+yA1d mGB4CO7IlfQ+fEueIBOBoAFJLrFoU1kfa0iD8gn1lcDKkgJQi126GhEyDbKJPgVwy/C+ JA== Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3pdau1key8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 20 Mar 2023 12:56:57 -0700 Received: from twshared21760.39.frc1.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Mon, 20 Mar 2023 12:56:56 -0700 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 6B4C47D4C17C; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: , , , , , , CC: Kui-Feng Lee Subject: [PATCH bpf-next v9 4/8] libbpf: Create a bpf_link in bpf_map__attach_struct_ops(). 
Date: Mon, 20 Mar 2023 12:56:40 -0700 Message-ID: <20230320195644.1953096-5-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 91Yi6EW0dvjdpHsO1VgS8HlLZ2aRASgh X-Proofpoint-GUID: 91Yi6EW0dvjdpHsO1VgS8HlLZ2aRASgh X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net bpf_map__attach_struct_ops() used to create a dummy bpf_link as a placeholder. It now creates a real bpf_link by calling bpf_link_create() when the map has the BPF_F_LINK flag. A struct_ops map can be flagged with BPF_F_LINK by calling bpf_map__set_map_flags(). Signed-off-by: Kui-Feng Lee --- tools/lib/bpf/libbpf.c | 90 +++++++++++++++++++++++++++++++----------- 1 file changed, 66 insertions(+), 24 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 4c34fbd7b5be..56a60ab2ca8f 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -116,6 +116,7 @@ static const char * const attach_type_name[] = { [BPF_SK_REUSEPORT_SELECT_OR_MIGRATE] = "sk_reuseport_select_or_migrate", [BPF_PERF_EVENT] = "perf_event", [BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi", + [BPF_STRUCT_OPS] = "struct_ops", }; static const char * const link_type_name[] = { @@ -7683,6 +7684,37 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, return 0; } +static void bpf_map_prepare_vdata(const struct bpf_map *map) +{ + struct bpf_struct_ops *st_ops; + __u32 i; + + st_ops = map->st_ops; + for (i = 0; i < btf_vlen(st_ops->type); i++) { + struct bpf_program *prog = st_ops->progs[i]; + void *kern_data; + int prog_fd; + + if (!prog) + continue; + + prog_fd = bpf_program__fd(prog); + kern_data = 
st_ops->kern_vdata + st_ops->kern_func_off[i]; + *(unsigned long *)kern_data = prog_fd; + } +} + +static int bpf_object_prepare_struct_ops(struct bpf_object *obj) +{ + int i; + + for (i = 0; i < obj->nr_maps; i++) + if (bpf_map__is_struct_ops(&obj->maps[i])) + bpf_map_prepare_vdata(&obj->maps[i]); + + return 0; +} + static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const char *target_btf_path) { int err, i; @@ -7708,6 +7740,7 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch err = err ? : bpf_object__relocate(obj, obj->btf_custom_path ? : target_btf_path); err = err ? : bpf_object__load_progs(obj, extra_log_level); err = err ? : bpf_object_init_prog_arrays(obj); + err = err ? : bpf_object_prepare_struct_ops(obj); if (obj->gen_loader) { /* reset FDs */ @@ -11572,22 +11605,30 @@ struct bpf_link *bpf_program__attach(const struct bpf_program *prog) return link; } +struct bpf_link_struct_ops { + struct bpf_link link; + int map_fd; +}; + static int bpf_link__detach_struct_ops(struct bpf_link *link) { + struct bpf_link_struct_ops *st_link; __u32 zero = 0; - if (bpf_map_delete_elem(link->fd, &zero)) - return -errno; + st_link = container_of(link, struct bpf_link_struct_ops, link); - return 0; + if (st_link->map_fd < 0) + /* w/o a real link */ + return bpf_map_delete_elem(link->fd, &zero); + + return close(link->fd); } struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) { - struct bpf_struct_ops *st_ops; - struct bpf_link *link; - __u32 i, zero = 0; - int err; + struct bpf_link_struct_ops *link; + __u32 zero = 0; + int err, fd; if (!bpf_map__is_struct_ops(map) || map->fd == -1) return libbpf_err_ptr(-EINVAL); @@ -11596,31 +11637,32 @@ struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) if (!link) return libbpf_err_ptr(-EINVAL); - st_ops = map->st_ops; - for (i = 0; i < btf_vlen(st_ops->type); i++) { - struct bpf_program *prog = st_ops->progs[i]; - void *kern_data; - int prog_fd; + 
/* kern_vdata should be prepared during the loading phase. */ + err = bpf_map_update_elem(map->fd, &zero, map->st_ops->kern_vdata, 0); + if (err && err != -EBUSY) { + free(link); + return libbpf_err_ptr(err); + } - if (!prog) - continue; + link->link.detach = bpf_link__detach_struct_ops; - prog_fd = bpf_program__fd(prog); - kern_data = st_ops->kern_vdata + st_ops->kern_func_off[i]; - *(unsigned long *)kern_data = prog_fd; + if (!(map->def.map_flags & BPF_F_LINK)) { + /* w/o a real link */ + link->link.fd = map->fd; + link->map_fd = -1; + return &link->link; } - err = bpf_map_update_elem(map->fd, &zero, st_ops->kern_vdata, 0); - if (err) { - err = -errno; + fd = bpf_link_create(map->fd, 0, BPF_STRUCT_OPS, NULL); + if (fd < 0) { free(link); - return libbpf_err_ptr(err); + return libbpf_err_ptr(fd); } - link->detach = bpf_link__detach_struct_ops; - link->fd = map->fd; + link->link.fd = fd; + link->map_fd = map->fd; - return link; + return &link->link; } typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(struct perf_event_header *hdr, From patchwork Mon Mar 20 19:56:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181800 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B8C8C761A6 for ; Mon, 20 Mar 2023 19:57:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230032AbjCTT5P (ORCPT ); Mon, 20 Mar 2023 15:57:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230284AbjCTT5O (ORCPT ); Mon, 20 Mar 2023 15:57:14 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com 
[67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93811222D9 for ; Mon, 20 Mar 2023 12:57:10 -0700 (PDT) Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) 
id 776F47D4C17E; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com Cc: Kui-Feng Lee Subject: [PATCH bpf-next v9 5/8] bpf: Update the struct_ops of a bpf_link. Date: Mon, 20 Mar 2023 12:56:41 -0700 Message-Id: <20230320195644.1953096-6-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net
Extend the BPF_LINK_UPDATE command of bpf() so that the struct_ops attached to a bpf_link can be replaced with another one. This allows a smooth transition from one struct_ops to another. 
A struct_ops map passed with BPF_LINK_UPDATE must have the BPF_F_LINK flag set. Signed-off-by: Kui-Feng Lee --- include/linux/bpf.h | 3 +++ include/uapi/linux/bpf.h | 21 +++++++++++---- kernel/bpf/bpf_struct_ops.c | 48 +++++++++++++++++++++++++++++++++- kernel/bpf/syscall.c | 34 ++++++++++++++++++++++++ net/ipv4/bpf_tcp_ca.c | 6 +++++ tools/include/uapi/linux/bpf.h | 21 +++++++++++---- 6 files changed, 122 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2faf01fa3f04..29287a2d8b1b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1476,6 +1476,8 @@ struct bpf_link_ops { void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq); int (*fill_link_info)(const struct bpf_link *link, struct bpf_link_info *info); + int (*update_map)(struct bpf_link *link, struct bpf_map *new_map, + struct bpf_map *old_map); }; struct bpf_tramp_link { @@ -1518,6 +1520,7 @@ struct bpf_struct_ops { void *kdata, const void *udata); int (*reg)(void *kdata); void (*unreg)(void *kdata); + int (*update)(void *kdata, void *old_kdata); int (*validate)(void *kdata); const struct btf_type *type; const struct btf_type *value_type; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 42f40ee083bf..e3d3b5160d26 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1555,12 +1555,23 @@ union bpf_attr { struct { /* struct used by BPF_LINK_UPDATE command */ __u32 link_fd; /* link fd */ - /* new program fd to update link with */ - __u32 new_prog_fd; + union { + /* new program fd to update link with */ + __u32 new_prog_fd; + /* new struct_ops map fd to update link with */ + __u32 new_map_fd; + }; __u32 flags; /* extra flags */ - /* expected link's program fd; is specified only if - * BPF_F_REPLACE flag is set in flags */ - __u32 old_prog_fd; + union { + /* expected link's program fd; is specified only if + * BPF_F_REPLACE flag is set in flags. 
+ */ + __u32 old_prog_fd; + /* expected link's map fd; is specified only + * if BPF_F_REPLACE flag is set. + */ + __u32 old_map_fd; + }; } link_update; struct { diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 5e77d1d4a7f5..71dd5d5d3ce5 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -65,6 +65,8 @@ struct bpf_struct_ops_link { struct bpf_map __rcu *map; }; +static DEFINE_MUTEX(update_mutex); + #define VALUE_PREFIX "bpf_struct_ops_" #define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1) @@ -660,7 +662,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) if (attr->value_size != vt->size) return ERR_PTR(-EINVAL); - if (attr->map_flags & BPF_F_LINK && !st_ops->validate) + if (attr->map_flags & BPF_F_LINK && (!st_ops->validate || !st_ops->update)) return ERR_PTR(-EOPNOTSUPP); t = st_ops->type; @@ -806,10 +808,54 @@ static int bpf_struct_ops_map_link_fill_link_info(const struct bpf_link *link, return 0; } +static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map *new_map, + struct bpf_map *expected_old_map) +{ + struct bpf_struct_ops_map *st_map, *old_st_map; + struct bpf_map *old_map; + struct bpf_struct_ops_link *st_link; + int err = 0; + + st_link = container_of(link, struct bpf_struct_ops_link, link); + st_map = container_of(new_map, struct bpf_struct_ops_map, map); + + if (!bpf_struct_ops_valid_to_reg(new_map)) + return -EINVAL; + + mutex_lock(&update_mutex); + + old_map = rcu_dereference_protected(st_link->map, lockdep_is_held(&update_mutex)); + if (expected_old_map && old_map != expected_old_map) { + err = -EINVAL; + goto err_out; + } + + old_st_map = container_of(old_map, struct bpf_struct_ops_map, map); + /* The new and old struct_ops must be the same type. 
*/ + if (st_map->st_ops != old_st_map->st_ops) { + err = -EINVAL; + goto err_out; + } + + err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data); + if (err) + goto err_out; + + bpf_map_inc(new_map); + rcu_assign_pointer(st_link->map, new_map); + bpf_map_put(old_map); + +err_out: + mutex_unlock(&update_mutex); + + return err; +} + static const struct bpf_link_ops bpf_struct_ops_map_lops = { .dealloc = bpf_struct_ops_map_link_dealloc, .show_fdinfo = bpf_struct_ops_map_link_show_fdinfo, .fill_link_info = bpf_struct_ops_map_link_fill_link_info, + .update_map = bpf_struct_ops_map_link_update, }; int bpf_struct_ops_link_create(union bpf_attr *attr) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 21f76698875c..b4d758fa5981 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4682,6 +4682,35 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) return ret; } +static int link_update_map(struct bpf_link *link, union bpf_attr *attr) +{ + struct bpf_map *new_map, *old_map = NULL; + int ret; + + new_map = bpf_map_get(attr->link_update.new_map_fd); + if (IS_ERR(new_map)) + return -EINVAL; + + if (attr->link_update.flags & BPF_F_REPLACE) { + old_map = bpf_map_get(attr->link_update.old_map_fd); + if (IS_ERR(old_map)) { + ret = -EINVAL; + goto out_put; + } + } else if (attr->link_update.old_map_fd) { + ret = -EINVAL; + goto out_put; + } + + ret = link->ops->update_map(link, new_map, old_map); + + if (old_map) + bpf_map_put(old_map); +out_put: + bpf_map_put(new_map); + return ret; +} + #define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd static int link_update(union bpf_attr *attr) @@ -4702,6 +4731,11 @@ static int link_update(union bpf_attr *attr) if (IS_ERR(link)) return PTR_ERR(link); + if (link->ops->update_map) { + ret = link_update_map(link, attr); + goto out_put_link; + } + new_prog = bpf_prog_get(attr->link_update.new_prog_fd); if (IS_ERR(new_prog)) { ret = PTR_ERR(new_prog); diff --git a/net/ipv4/bpf_tcp_ca.c 
b/net/ipv4/bpf_tcp_ca.c index bbbd5eb94db2..e8b27826283e 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -264,6 +264,11 @@ static void bpf_tcp_ca_unreg(void *kdata) tcp_unregister_congestion_control(kdata); } +static int bpf_tcp_ca_update(void *kdata, void *old_kdata) +{ + return tcp_update_congestion_control(kdata, old_kdata); +} + static int bpf_tcp_ca_validate(void *kdata) { return tcp_validate_congestion_control(kdata); @@ -273,6 +278,7 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, .unreg = bpf_tcp_ca_unreg, + .update = bpf_tcp_ca_update, .check_member = bpf_tcp_ca_check_member, .init_member = bpf_tcp_ca_init_member, .init = bpf_tcp_ca_init, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 9cf1deaf21f2..d6c5a022ae28 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1555,12 +1555,23 @@ union bpf_attr { struct { /* struct used by BPF_LINK_UPDATE command */ __u32 link_fd; /* link fd */ - /* new program fd to update link with */ - __u32 new_prog_fd; + union { + /* new program fd to update link with */ + __u32 new_prog_fd; + /* new struct_ops map fd to update link with */ + __u32 new_map_fd; + }; __u32 flags; /* extra flags */ - /* expected link's program fd; is specified only if - * BPF_F_REPLACE flag is set in flags */ - __u32 old_prog_fd; + union { + /* expected link's program fd; is specified only if + * BPF_F_REPLACE flag is set in flags. + */ + __u32 old_prog_fd; + /* expected link's map fd; is specified only + * if BPF_F_REPLACE flag is set. 
+ */ + __u32 old_map_fd; + }; } link_update; struct { From patchwork Mon Mar 20 19:56:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181799 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B83D5C6FD1C for ; Mon, 20 Mar 2023 19:57:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230219AbjCTT5O (ORCPT ); Mon, 20 Mar 2023 15:57:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjCTT5M (ORCPT ); Mon, 20 Mar 2023 15:57:12 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 272DB24BF3 for ; Mon, 20 Mar 2023 12:57:08 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH743g008139 for ; Mon, 20 Mar 2023 12:57:07 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=1FqxCMUrgUat4OB/sYZfAflSMoMxq5LRpdm3BCF4Mrw=; b=KwuN4FhIhF1zipBw6hT2dumGtezq1iOQ4AxXV5OxrR3Khn7DxjM3+sfoAY+9MK53U12s qfL2HLMLAqInDRdfRP1Z9qXl96LBeUyt8jOMuswqEB3riLBgkk6QqZbsq6NfjPPbqFoi p4yahyOBrClow4dqH2ARKDGLSbiCTenOl5jWMZ5qedhB+HNFveWO5PXJkCftHQTPmcBs IqFNMxUJqGgV9TzsxJLvypEFCeZg0JND2tU764mjG5bPBL9XlnwNKLxoR+69/wf1Md54 RNBlVdtQv2Qpx6cm+YeEbDTyEsLfjLUdYduZnNlpYkehnb8YCneqKJdP4/66Of7QSYNG 0g== Received: from nam12-bn8-obe.outbound.protection.outlook.com 
(mail-bn8nam12lp2171.outbound.protection.outlook.com [104.47.55.171]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3pdae1kuy2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 20 Mar 2023 12:57:07 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=GhlZGyblJ3sAjjPcxq3lMXLUrkW8Mqg6Z1gmVcVw+ckhIBbm0Y+klYeVvTXo0bWN7X8VkYA9yJnOm5ck2IRDu/wpOA8dke8BB27HncL6sp+Bhn8kABOhZRsKLxdJOZokSrOuSsVo5SDso98P6VcLRcKxEJcMxyH5jXWWE35GBEyQgiRLsZSURoda74aUK5mxCPnB8ik57MQNxB0M1UnKO4TJq3iV1pJzQG+wsMfK8uJvpMRWjQbP07SnV6tkfYSMW3tvI32AfJ/KdsxDkzeEF21uRJ+7f79aNeOK4M+/zrP5zKOUvhr5yHkow7fcVwbla0OyRJLhqCHHZjXNrRAFIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1FqxCMUrgUat4OB/sYZfAflSMoMxq5LRpdm3BCF4Mrw=; b=Ne7IDNox6NUvM1Lf/uzENhcbJGaT0Q7F5+hWeJkB6C82omxqL2nae59c2Fl4Bi79VmQ2xZo1TnvjB5Mv/lWs+vInJ0FyBdTF75QuSP1srGT9HKBzf36BqMxCR+y42wBvXCYB4Fi1lMQV24JGTQ9J37P0JpBpTMIaJ4QjExsYSyKob2D6ydB6SKGLNZPwvbD3si9KsxZE/4fwtLVosVV+fgxUAjn+XgFrM/bcKp4wFTtDsXC941m0g7/6KWiS2/wiB0khiN+lkKNF3Kdb4ef3uXY+kklLkf1zuVtrTffM0nLl8+l4qBK2sqkT6piwVWqsgdp8nEGgSJCy4oZsxnwNRw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender ip is 69.171.232.181) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=meta.com; dmarc=fail (p=reject sp=reject pct=100) action=oreject header.from=meta.com; dkim=none (message not signed); arc=none Received: from BN8PR07CA0035.namprd07.prod.outlook.com (2603:10b6:408:ac::48) by MN6PR15MB6050.namprd15.prod.outlook.com (2603:10b6:208:475::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.36; Mon, 20 Mar 2023 19:57:05 +0000 Received: from BN8NAM12FT063.eop-nam12.prod.protection.outlook.com (2603:10b6:408:ac:cafe::e1) by 
BN8PR07CA0035.outlook.office365.com (2603:10b6:408:ac::48) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37 via Frontend Transport; Mon, 20 Mar 2023 19:57:05 +0000 X-MS-Exchange-Authentication-Results: spf=fail (sender IP is 69.171.232.181) smtp.mailfrom=meta.com; dkim=none (message not signed) header.d=none;dmarc=fail action=oreject header.from=meta.com; Received-SPF: Fail (protection.outlook.com: domain of meta.com does not designate 69.171.232.181 as permitted sender) receiver=protection.outlook.com; client-ip=69.171.232.181; helo=69-171-232-181.mail-mxout.facebook.com; Received: from 69-171-232-181.mail-mxout.facebook.com (69.171.232.181) by BN8NAM12FT063.mail.protection.outlook.com (10.13.182.194) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6222.11 via Frontend Transport; Mon, 20 Mar 2023 19:57:05 +0000 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id 7D9027D4C180; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com Cc: Kui-Feng Lee Subject: [PATCH bpf-next v9 6/8] libbpf: Update a bpf_link with another struct_ops. 
Date: Mon, 20 Mar 2023 12:56:42 -0700 Message-Id: <20230320195644.1953096-7-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM12FT063:EE_|MN6PR15MB6050:EE_ X-MS-Office365-Filtering-Correlation-Id: 84dbc7f4-e3be-445b-49fe-08db297d47dc X-ETR: Bypass spam filtering X-FB-Source: Internal X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: wv32a+BQi6crfgIGdRO5pYVzdDJH4IGomnMJ6V5OfHv6tu+A3k8L6xbq7R/krKHS8TdNwzgivV9ntg0NCRYdBFDoOf5EpPI0FGTnCg6GPhTE1COx265/GACf/jEcLnKZUbNZTLmHdrD+rhZsSoSmgDI8+5A0VMdBcSbW7z1Qzl3fsDHlj/iCajDqPk75ca/IGBftAFB384iQQ/wnkJ7ANjDjfo7X+VFJo0zPdRrKMTpqFlDgb64NC4Iwxkt1hUuGPhpsdejP5rbUG3Q7NCkQOEkq84kZqoQYaqqwZJ38nFZ8pXF2GAlRca7lh8zM4WKH1neaQdS4VPLmydbeHdNE7f9GdkTVXLmk7dM4iJVYXKqpROBbE03I1XIFSYpIkzKqnWX3Pl3IjBdwuAREuG7sVg1APFT6tgqGlpi3twzr2wOBry+DpTVf969RZHGHKUaufS6jI15VFgt2J/HxmYj0wL0TNEpg7AU8xJ2HWBo9ufRznH6UDy5ADWdHzCNuX/lAFl1her8WUUSu/VXEEGtX54CfVn2D7D01BOozfdc50rnoeyswr5/IT+BI9ziuyjhcYpKACxzZnMAO78SQbXTrrXJm7BhPxYDIuNx+qjBEVxbFdoLPPl5bf9BLgoFJIFnGsv9SKQAsfSJ28HrlQhzq5WZ/tcB6VVe0l7zUNCv4Ri5J35NIZs9TS8IRwCeNeWj9xZH8MSOiy0ENp+T3RKSrQw== X-Forefront-Antispam-Report: 
CIP:69.171.232.181;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:69-171-232-181.mail-mxout.facebook.com;PTR:69-171-232-181.mail-mxout.facebook.com;CAT:NONE;SFS:(13230025)(4636009)(39860400002)(346002)(376002)(396003)(136003)(451199018)(36840700001)(46966006)(40470700004)(86362001)(33570700077)(40460700003)(82310400005)(40480700001)(36756003)(316002)(42186006)(83380400001)(4326008)(8676002)(478600001)(6266002)(26005)(107886003)(6666004)(2616005)(336012)(1076003)(47076005)(186003)(70206006)(356005)(5660300002)(41300700001)(82740400003)(7636003)(36860700001)(8936002)(2906002)(7596003)(15650500001);DIR:OUT;SFP:1501; X-OriginatorOrg: meta.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Mar 2023 19:57:05.5858 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 84dbc7f4-e3be-445b-49fe-08db297d47dc X-MS-Exchange-CrossTenant-Id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=8ae927fe-1255-47a7-a2af-5f3a069daaa2;Ip=[69.171.232.181];Helo=[69-171-232-181.mail-mxout.facebook.com] X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: TreatMessagesAsInternal-BN8NAM12FT063.eop-nam12.prod.protection.outlook.com X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN6PR15MB6050 X-Proofpoint-GUID: xi3wTEE4EqBzwv0_cmO7uyly764XBnTw X-Proofpoint-ORIG-GUID: xi3wTEE4EqBzwv0_cmO7uyly764XBnTw X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net

Introduce bpf_link__update_map(), which atomically updates the underlying struct_ops implementation of a given struct_ops BPF link.

Signed-off-by: Kui-Feng Lee
---
 tools/lib/bpf/libbpf.c | 40 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h | 1 +
tools/lib/bpf/libbpf.map | 1 + 3 files changed, 42 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 56a60ab2ca8f..f84d68c049e3 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -11639,6 +11639,11 @@ struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) /* kern_vdata should be prepared during the loading phase. */ err = bpf_map_update_elem(map->fd, &zero, map->st_ops->kern_vdata, 0); + /* It can be EBUSY if the map has been used to create or + * update a link before. We don't allow updating the value of + * a struct_ops once it is set. That ensures that the value + * never changed. So, it is safe to skip EBUSY. + */ if (err && err != -EBUSY) { free(link); return libbpf_err_ptr(err); @@ -11665,6 +11670,41 @@ struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map) return &link->link; } +/* + * Swap the back struct_ops of a link with a new struct_ops map. + */ +int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map) +{ + struct bpf_link_struct_ops *st_ops_link; + __u32 zero = 0; + int err; + + if (!bpf_map__is_struct_ops(map) || map->fd < 0) + return -EINVAL; + + st_ops_link = container_of(link, struct bpf_link_struct_ops, link); + /* Ensure the type of a link is correct */ + if (st_ops_link->map_fd < 0) + return -EINVAL; + + err = bpf_map_update_elem(map->fd, &zero, map->st_ops->kern_vdata, 0); + /* It can be EBUSY if the map has been used to create or + * update a link before. We don't allow updating the value of + * a struct_ops once it is set. That ensures that the value + * never changed. So, it is safe to skip EBUSY. 
+ */ + if (err && err != -EBUSY) + return err; + + err = bpf_link_update(link->fd, map->fd, NULL); + if (err < 0) + return err; + + st_ops_link->map_fd = map->fd; + + return 0; +} + typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(struct perf_event_header *hdr, void *private_data); diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index db4992a036f8..1615e55e2e79 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -719,6 +719,7 @@ bpf_program__attach_freplace(const struct bpf_program *prog, struct bpf_map; LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map); +LIBBPF_API int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map); struct bpf_iter_attach_opts { size_t sz; /* size of this struct for forward/backward compatibility */ diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 50dde1f6521e..a5aa3a383d69 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -386,6 +386,7 @@ LIBBPF_1.1.0 { LIBBPF_1.2.0 { global: bpf_btf_get_info_by_fd; + bpf_link__update_map; bpf_link_get_info_by_fd; bpf_map_get_info_by_fd; bpf_prog_get_info_by_fd; From patchwork Mon Mar 20 19:56:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181802 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68B03C6FD1D for ; Mon, 20 Mar 2023 19:57:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230436AbjCTT52 (ORCPT ); Mon, 20 Mar 2023 15:57:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjCTT5W (ORCPT ); 
Mon, 20 Mar 2023 15:57:22 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48B6F22126 for ; Mon, 20 Mar 2023 12:57:14 -0700 (PDT) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32KH7Tir023560 for ; Mon, 20 Mar 2023 12:57:13 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=250zDJIaOCI/0NSrmRzvQLYeEZCXgM1J8p6NoY5yvAo=; b=fBdW2MgNg8LoxF97kh3Mv087L57IVW2z02gXTYPjuSDk0qb/lKyUAesDo+lWiRCJ+Mwy bqAx/d6VnxbvbsAthkktWzMezo5n83cSDvzIxT5aYLDZaDY5u8ug2TXBbG7N9HfOdEaK KdAlprHDeAXLHh8QZqyHG+XUF/KTfkG99NmC7zWK2MUXN/i5slfc3cRZOvuYnZam2XK0 QFuFVRqzQZAU48Duszo70DWWNDcAnmzG4NvP2uhxuWzGRM3YNN40rFnyP9ecLNedL1nr UMN/1d6FomM6nL8Vg4ErbjM2XYYh8buUtFm+bqhKEDUt3+TgVrgjJ3B3KtgJTeo/K84D Eg== Received: from nam10-dm6-obe.outbound.protection.outlook.com (mail-dm6nam10lp2102.outbound.protection.outlook.com [104.47.58.102]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3pd8mrv0up-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 20 Mar 2023 12:57:12 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XGbFXmWv9kDCf+Nj+iCDW+TFML+IFtu7isjtuDAollwR8RoK5WaOmZAuDGdB+YnhprJs9jB51C5GyRXPi7yGAGa9OpeVO2JMaQGKoqHeRrISUYGhvNJrdfVGTp6z6BSpVd+OVjcreOFwkvl891e8jG71Ujn366Jr8strtEcCAwjHVf3INzYUMlRTr8pSkgSIhhjxjSx3dI7PtCUHdJCjyfPKbW6ln3kFPBbPesAVU6X9j1ksNvKxEoG3KVwVct48JvcZ71LVw/M9o8rksPD597rpqZ4hyfzDNXGB1ImP7dqmOhurRCLsJYjrpk+wnZJ8L1rkdeRWb9lwPh5farku5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=250zDJIaOCI/0NSrmRzvQLYeEZCXgM1J8p6NoY5yvAo=; b=F5fS0FImP1QZz236R/o+pgX/tEDjO5HFQq9LP2Nl5AnE1X8Ywz3/X1Mfzbfh8jVJD812OOD/6kBWJ3plAdKmHcP8eV/H25JVsZ5i1/Fb/cux+OSN7VBK6vcbFi41+7Vsvq6jph9atfG3JIRqC5XwC3ahC7ro1nRvQERFKFw9QaQpnaLs4UdvN9FabObssFJFihwW7+A7MuR38GOaiIWPnskErtmxP+0gcfz48Ax63mjCI9HPfMiM8478idy8/R6NXe2ltEVAsT7EpOe6QpZA1I9sozXeLeopvXGMqh7oM+TCo2FFHSGPhg+/NockuKy+r5ngBCCLQdHy7zeOlTsejQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender ip is 69.171.232.181) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=meta.com; dmarc=fail (p=reject sp=reject pct=100) action=oreject header.from=meta.com; dkim=none (message not signed); arc=none Received: from BN0PR04CA0045.namprd04.prod.outlook.com (2603:10b6:408:e8::20) by IA1PR15MB5394.namprd15.prod.outlook.com (2603:10b6:208:38a::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37; Mon, 20 Mar 2023 19:57:10 +0000 Received: from BN8NAM12FT113.eop-nam12.prod.protection.outlook.com (2603:10b6:408:e8:cafe::7d) by BN0PR04CA0045.outlook.office365.com (2603:10b6:408:e8::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37 via Frontend Transport; Mon, 20 Mar 2023 19:57:10 +0000 X-MS-Exchange-Authentication-Results: spf=fail (sender IP is 69.171.232.181) smtp.mailfrom=meta.com; dkim=none (message not signed) header.d=none;dmarc=fail action=oreject header.from=meta.com; Received-SPF: Fail (protection.outlook.com: domain of meta.com does not designate 69.171.232.181 as permitted sender) receiver=protection.outlook.com; client-ip=69.171.232.181; helo=69-171-232-181.mail-mxout.facebook.com; Received: from 69-171-232-181.mail-mxout.facebook.com (69.171.232.181) by BN8NAM12FT113.mail.protection.outlook.com (10.13.183.27) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6254.2 via Frontend Transport; Mon, 20 Mar 2023 19:57:10 +0000 Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id A63DF7D4C182; Mon, 20 Mar 2023 12:56:46 -0700 (PDT) From: Kui-Feng Lee To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com Cc: Kui-Feng Lee Subject: [PATCH bpf-next v9 7/8] libbpf: Use .struct_ops.link section to indicate a struct_ops with a link. Date: Mon, 20 Mar 2023 12:56:43 -0700 Message-Id: <20230320195644.1953096-8-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com> References: <20230320195644.1953096-1-kuifeng@meta.com> MIME-Version: 1.0 X-FB-Internal: Safe X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM12FT113:EE_|IA1PR15MB5394:EE_ X-MS-Office365-Filtering-Correlation-Id: 7859305b-e767-4519-9f9c-08db297d4ad4 X-ETR: Bypass spam filtering X-FB-Source: Internal X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: KJufiE4lOeOZH5RCUrUHZAI3KkSB7JrbkP1T1c+fft6IkOkDB2tO9d0JBphjKvqUnY33HSNE6Pz/hHystCAKiLWwZ04FB//W+LqJQSsIKFdNHB2kFz4RY10R6d89R73skBtPG/3/NGzG1hgAOQ5jzySxoxoDTxx1gM7cm87yg2z29q7OjbHjUHKFYK579D5z+5KBGB5B/uwcodXw+jORJqHLAka3VOpl0BjozCEnYPdXR/5Yd2OeGYdQWJCOpHFJ4XJTQR0GoFcSNxT8sjpnesIpMKwx8ycDPHcCNU9pt57mmxmMmdClgI3Bt3+Ebl9El8snGwLDTkLAzYPxgqRAu4JlQXxWOe7y3i8yzi63nG5pqoWn9LRbQNz3IwajRVYG28YEtK6DWirdTFODEU9LDkeB+NON+crC3B6pEMOoGl/eXTKj/ZOmPuxzEqq5z+s0i/PG+YpXyCNB/r+m7AeFpzqsIt0HuAzb8fJY9JcqZgYv9IVZofDyOn3PEOhMptSBi70PE4ivPldnRfHJm/bz5zO7ue9MzTE1pMZK7ZIaYCYzgKNkX2fMjFT0qJ+r1h8jzhbIu7eDonbD1Xz/vrGSrboy/ckDUgh16ZsjmuRsN+JZVI18cMFnBBRoZ9nVVBk4GHWM4rUNlfj02t2A/x53zIuVJEU4h2xKp7nczTt6mdbK7no/rdW5xL+Ly9cjICBaeFos3z0qwYSeI8ML+f92BZynn0fwo5lE0x2+BembXG8= X-Forefront-Antispam-Report: 
CIP:69.171.232.181;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:69-171-232-181.mail-mxout.facebook.com;PTR:69-171-232-181.mail-mxout.facebook.com;CAT:NONE;SFS:(13230025)(4636009)(376002)(396003)(39860400002)(136003)(346002)(451199018)(40470700004)(46966006)(36840700001)(36756003)(8936002)(5660300002)(40480700001)(40460700003)(47076005)(83380400001)(2616005)(36860700001)(336012)(2906002)(478600001)(82310400005)(33570700077)(42186006)(316002)(86362001)(82740400003)(41300700001)(26005)(1076003)(7596003)(70206006)(4326008)(356005)(8676002)(6666004)(107886003)(6266002)(186003)(7636003)(142923001);DIR:OUT;SFP:1501; X-OriginatorOrg: meta.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Mar 2023 19:57:10.5687 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7859305b-e767-4519-9f9c-08db297d4ad4 X-MS-Exchange-CrossTenant-Id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=8ae927fe-1255-47a7-a2af-5f3a069daaa2;Ip=[69.171.232.181];Helo=[69-171-232-181.mail-mxout.facebook.com] X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: TreatMessagesAsInternal-BN8NAM12FT113.eop-nam12.prod.protection.outlook.com X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: IA1PR15MB5394 X-Proofpoint-GUID: WlYMLDeuSKP-FOFE7I1_-0c9H6pe4Ut1 X-Proofpoint-ORIG-GUID: WlYMLDeuSKP-FOFE7I1_-0c9H6pe4Ut1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_16,2023-03-20_02,2023-02-09_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net

Flag a struct_ops as backing a bpf_link by placing it in the ".struct_ops.link" section. Once flagged, the resulting struct_ops map can be used to create a bpf_link or to update a bpf_link that is currently backed by another struct_ops.
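As a rough illustration of the section rule described above, the following standalone C sketch maps an ELF section name to the map flags a loader might assign. It does not use libbpf. The names sketch_sec_to_map_flags(), sketch_check_sections() and SKETCH_F_LINK are hypothetical and exist only in this example, and the flag value is a placeholder, not taken from the UAPI header.

```c
#include <assert.h>
#include <string.h>

/* Section names as they appear in the patch. */
#define STRUCT_OPS_SEC      ".struct_ops"
#define STRUCT_OPS_LINK_SEC ".struct_ops.link"

/* Hypothetical stand-in for BPF_F_LINK; the value is a placeholder. */
#define SKETCH_F_LINK (1U << 13)

/*
 * Hypothetical helper: store the map flags implied by a struct_ops
 * section name in *flags and return 0. Return -1 for any other
 * section name and leave *flags untouched.
 */
static int sketch_sec_to_map_flags(const char *sec_name, unsigned int *flags)
{
	if (strcmp(sec_name, STRUCT_OPS_LINK_SEC) == 0) {
		*flags = SKETCH_F_LINK;	/* map is meant to back a bpf_link */
		return 0;
	}
	if (strcmp(sec_name, STRUCT_OPS_SEC) == 0) {
		*flags = 0;		/* plain struct_ops map, no link flag */
		return 0;
	}
	return -1;
}

/* Exercise the three cases; call this from a test or from main(). */
static void sketch_check_sections(void)
{
	unsigned int flags = 0;

	assert(sketch_sec_to_map_flags(".struct_ops.link", &flags) == 0);
	assert(flags == SKETCH_F_LINK);
	assert(sketch_sec_to_map_flags(".struct_ops", &flags) == 0);
	assert(flags == 0);
	assert(sketch_sec_to_map_flags(".maps", &flags) == -1);
}
```

The sketch keeps the two section names as separate exact matches, mirroring how the patch treats ".struct_ops" and ".struct_ops.link" as distinct sections with distinct map flags.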
Signed-off-by: Kui-Feng Lee Acked-by: Andrii Nakryiko --- tools/lib/bpf/libbpf.c | 60 +++++++++++++++++++++++++++++++----------- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index f84d68c049e3..d801ab3a46d8 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -468,6 +468,7 @@ struct bpf_struct_ops { #define KCONFIG_SEC ".kconfig" #define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" +#define STRUCT_OPS_LINK_SEC ".struct_ops.link" enum libbpf_map_type { LIBBPF_MAP_UNSPEC, @@ -597,6 +598,7 @@ struct elf_state { Elf64_Ehdr *ehdr; Elf_Data *symbols; Elf_Data *st_ops_data; + Elf_Data *st_ops_link_data; size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct elf_sec_desc *secs; @@ -606,6 +608,7 @@ struct elf_state { int text_shndx; int symbols_shndx; int st_ops_shndx; + int st_ops_link_shndx; }; struct usdt_manager; @@ -1119,7 +1122,8 @@ static int bpf_object__init_kern_struct_ops_maps(struct bpf_object *obj) return 0; } -static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) +static int init_struct_ops_maps(struct bpf_object *obj, const char *sec_name, + int shndx, Elf_Data *data, __u32 map_flags) { const struct btf_type *type, *datasec; const struct btf_var_secinfo *vsi; @@ -1130,15 +1134,15 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) struct bpf_map *map; __u32 i; - if (obj->efile.st_ops_shndx == -1) + if (shndx == -1) return 0; btf = obj->btf; - datasec_id = btf__find_by_name_kind(btf, STRUCT_OPS_SEC, + datasec_id = btf__find_by_name_kind(btf, sec_name, BTF_KIND_DATASEC); if (datasec_id < 0) { pr_warn("struct_ops init: DATASEC %s not found\n", - STRUCT_OPS_SEC); + sec_name); return -EINVAL; } @@ -1151,7 +1155,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) type_id = btf__resolve_type(obj->btf, vsi->type); if (type_id < 0) { pr_warn("struct_ops init: Cannot resolve var type_id %u in 
DATASEC %s\n", - vsi->type, STRUCT_OPS_SEC); + vsi->type, sec_name); return -EINVAL; } @@ -1170,7 +1174,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) if (IS_ERR(map)) return PTR_ERR(map); - map->sec_idx = obj->efile.st_ops_shndx; + map->sec_idx = shndx; map->sec_offset = vsi->offset; map->name = strdup(var_name); if (!map->name) @@ -1180,6 +1184,7 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) map->def.key_size = sizeof(int); map->def.value_size = type->size; map->def.max_entries = 1; + map->def.map_flags = map_flags; map->st_ops = calloc(1, sizeof(*map->st_ops)); if (!map->st_ops) @@ -1192,14 +1197,14 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) if (!st_ops->data || !st_ops->progs || !st_ops->kern_func_off) return -ENOMEM; - if (vsi->offset + type->size > obj->efile.st_ops_data->d_size) { + if (vsi->offset + type->size > data->d_size) { pr_warn("struct_ops init: var %s is beyond the end of DATASEC %s\n", - var_name, STRUCT_OPS_SEC); + var_name, sec_name); return -EINVAL; } memcpy(st_ops->data, - obj->efile.st_ops_data->d_buf + vsi->offset, + data->d_buf + vsi->offset, type->size); st_ops->tname = tname; st_ops->type = type; @@ -1212,6 +1217,19 @@ static int bpf_object__init_struct_ops_maps(struct bpf_object *obj) return 0; } +static int bpf_object_init_struct_ops(struct bpf_object *obj) +{ + int err; + + err = init_struct_ops_maps(obj, STRUCT_OPS_SEC, obj->efile.st_ops_shndx, + obj->efile.st_ops_data, 0); + err = err ?: init_struct_ops_maps(obj, STRUCT_OPS_LINK_SEC, + obj->efile.st_ops_link_shndx, + obj->efile.st_ops_link_data, + BPF_F_LINK); + return err; +} + static struct bpf_object *bpf_object__new(const char *path, const void *obj_buf, size_t obj_buf_sz, @@ -1248,6 +1266,7 @@ static struct bpf_object *bpf_object__new(const char *path, obj->efile.obj_buf_sz = obj_buf_sz; obj->efile.btf_maps_shndx = -1; obj->efile.st_ops_shndx = -1; + obj->efile.st_ops_link_shndx = -1; 
obj->kconfig_map_idx = -1; obj->kern_version = get_kernel_version(); @@ -1265,6 +1284,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj) obj->efile.elf = NULL; obj->efile.symbols = NULL; obj->efile.st_ops_data = NULL; + obj->efile.st_ops_link_data = NULL; zfree(&obj->efile.secs); obj->efile.sec_cnt = 0; @@ -2619,7 +2639,7 @@ static int bpf_object__init_maps(struct bpf_object *obj, err = bpf_object__init_user_btf_maps(obj, strict, pin_root_path); err = err ?: bpf_object__init_global_data_maps(obj); err = err ?: bpf_object__init_kconfig_map(obj); - err = err ?: bpf_object__init_struct_ops_maps(obj); + err = err ?: bpf_object_init_struct_ops(obj); return err; } @@ -2753,12 +2773,13 @@ static bool libbpf_needs_btf(const struct bpf_object *obj) { return obj->efile.btf_maps_shndx >= 0 || obj->efile.st_ops_shndx >= 0 || + obj->efile.st_ops_link_shndx >= 0 || obj->nr_extern > 0; } static bool kernel_needs_btf(const struct bpf_object *obj) { - return obj->efile.st_ops_shndx >= 0; + return obj->efile.st_ops_shndx >= 0 || obj->efile.st_ops_link_shndx >= 0; } static int bpf_object__init_btf(struct bpf_object *obj, @@ -3451,6 +3472,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, STRUCT_OPS_SEC) == 0) { obj->efile.st_ops_data = data; obj->efile.st_ops_shndx = idx; + } else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) { + obj->efile.st_ops_link_data = data; + obj->efile.st_ops_link_shndx = idx; } else { pr_info("elf: skipping unrecognized data section(%d) %s\n", idx, name); @@ -3465,6 +3489,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj) /* Only do relo for section with exec instructions */ if (!section_have_execinstr(obj, targ_sec_idx) && strcmp(name, ".rel" STRUCT_OPS_SEC) && + strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) && strcmp(name, ".rel" MAPS_ELF_SEC)) { pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", idx, name, targ_sec_idx, @@ -6611,7 +6636,7 @@ static int bpf_object__collect_relos(struct 
bpf_object *obj) return -LIBBPF_ERRNO__INTERNAL; } - if (idx == obj->efile.st_ops_shndx) + if (idx == obj->efile.st_ops_shndx || idx == obj->efile.st_ops_link_shndx) err = bpf_object__collect_st_ops_relos(obj, shdr, data); else if (idx == obj->efile.btf_maps_shndx) err = bpf_object__collect_map_relos(obj, shdr, data); @@ -8850,6 +8875,7 @@ const char *libbpf_bpf_prog_type_str(enum bpf_prog_type t) } static struct bpf_map *find_struct_ops_map_by_offset(struct bpf_object *obj, + int sec_idx, size_t offset) { struct bpf_map *map; @@ -8859,7 +8885,8 @@ static struct bpf_map *find_struct_ops_map_by_offset(struct bpf_object *obj, map = &obj->maps[i]; if (!bpf_map__is_struct_ops(map)) continue; - if (map->sec_offset <= offset && + if (map->sec_idx == sec_idx && + map->sec_offset <= offset && offset - map->sec_offset < map->def.value_size) return map; } @@ -8901,7 +8928,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, } name = elf_sym_str(obj, sym->st_name) ?: ""; - map = find_struct_ops_map_by_offset(obj, rel->r_offset); + map = find_struct_ops_map_by_offset(obj, shdr->sh_info, rel->r_offset); if (!map) { pr_warn("struct_ops reloc: cannot find map at rel->r_offset %zu\n", (size_t)rel->r_offset); @@ -8968,8 +8995,9 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, } /* struct_ops BPF prog can be re-used between multiple - * .struct_ops as long as it's the same struct_ops struct - * definition and the same function pointer field + * .struct_ops & .struct_ops.link as long as it's the + * same struct_ops struct definition and the same + * function pointer field */ if (prog->attach_btf_id != st_ops->type_id || prog->expected_attach_type != member_idx) { From patchwork Mon Mar 20 19:56:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kui-Feng Lee X-Patchwork-Id: 13181803 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 
3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEFA9C6FD1C for ; Mon, 20 Mar 2023 19:59:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229635AbjCTT7j (ORCPT ); Mon, 20 Mar 2023 15:59:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbjCTT7i (ORCPT ); Mon, 20 Mar 2023 15:59:38 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D999B21966 for ; Mon, 20 Mar 2023 12:59:35 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.19/8.17.1.19) with ESMTP id 32KH7dCS016505 for ; Mon, 20 Mar 2023 12:59:34 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=s2048-2021-q4; bh=cLGbrYiSP2GxYN6pv6+F/7L8SNzfXLZ0+yVDwT2Ok/w=; b=BHjeYHi7vFj1uqkTSXJOfVRbMA7h2vFHWx3BdbudT5pgRRZztwcF0qZLmCRBMw2OY7Pl 3THbZFFHEc+lI/3f0uw8ZhLp+w6kj+KjDcXXyOalCJ2GGHlBppDU8qEiy0hZEZQHtx5y 680/i0bVMwATdJBDDrTuI4ja3WziMgqgT/ckNlvdnBh8fR5Uf8IPnpTtjU8l+6WBGcmP Cyj2JFFG/ZVeEQCHd4GO8/8Fq2Wu8Fdv83kgHItQryjiy8voKhR0mhk4jyb2vruc/MC1 mNU1/pifY6Z9b8GnMSw3okGEXPiAZCDY6AybIDRkKjfu41tQAmIEPdJg7qBXNd2M7PaN 1A== Received: from nam04-dm6-obe.outbound.protection.outlook.com (mail-dm6nam04lp2045.outbound.protection.outlook.com [104.47.73.45]) by m0001303.ppops.net (PPS) with ESMTPS id 3pd8yyuwyy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 20 Mar 2023 12:59:34 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=eU6W65pyvEOvjEcVzriKWCQ5t24edPaeWGwk/e9wwWNILEWVavMCxV58yODyqGeWXkN/Hx+e1yjmzsa4OpbeCaPjCpJ9sfCqDgrWU38pO2T1op+ztZqmZztM+cdMCGqxYI+/V48xNZWIT7vBFeKXh5fJ6t6OfQnJYLXboWjC+juI5MTn6C9eOgXnKcsahqW8UQlfEz7yOrOLYmFTNCemFA2UATx+Lwpim6hZNLUUU1N2QtDii2H+9uv1IQ3weBN5KkYvWwS47y8ce1uUnwg6HjKmPcY+74eI4Kap+d47y16ZIc9+T+ueZLUezUGgd9h2MweJEbdeZ+wrpUJCBg1Q3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=cLGbrYiSP2GxYN6pv6+F/7L8SNzfXLZ0+yVDwT2Ok/w=; b=HO90eOCR356YjkCt2g3xpSnFkFb5fuhfaUTmA+93OGYB265HoQ0Rj/y/04hrdaG51OE9fi2aJWZfqyilZkD0eJ+TNVA4Z+6/zpcEwcmNgvdyA533UYU+RrnCfGZywUy5P2B1chMAtO9yYqFERYaBydd3F4YeONXBEACvfeHuQ48pTG7mZefhBJ73WuPTAtkbkd2LHHBqjr5NgRcaviEpRDDbxWgXQ7t6r9BDdYFVeXc6t9iVJWPLhYintOnlhU2LOohyBRI+9IilD5vtPyorRlb7ndqCTvDVl9a6WIRJDTFDLOTGuBWiR8Q/1gEFic9IOYUIT9ZZEIClHBqZ/uOH/A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=fail (sender ip is 69.171.232.181) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=meta.com; dmarc=fail (p=reject sp=reject pct=100) action=oreject header.from=meta.com; dkim=none (message not signed); arc=none Received: from DM6PR06CA0006.namprd06.prod.outlook.com (2603:10b6:5:120::19) by SJ0PR15MB4201.namprd15.prod.outlook.com (2603:10b6:a03:2ab::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37; Mon, 20 Mar 2023 19:59:32 +0000 Received: from DM6NAM12FT032.eop-nam12.prod.protection.outlook.com (2603:10b6:5:120:cafe::72) by DM6PR06CA0006.outlook.office365.com (2603:10b6:5:120::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.37 via Frontend Transport; Mon, 20 Mar 2023 19:59:32 +0000 X-MS-Exchange-Authentication-Results: spf=fail (sender IP is 69.171.232.181) smtp.mailfrom=meta.com; dkim=none 
(message not signed) header.d=none;dmarc=fail action=oreject header.from=meta.com;
Received-SPF: Fail (protection.outlook.com: domain of meta.com does not designate 69.171.232.181 as permitted sender) receiver=protection.outlook.com; client-ip=69.171.232.181; helo=69-171-232-181.mail-mxout.facebook.com;
Received: from 69-171-232-181.mail-mxout.facebook.com (69.171.232.181) by DM6NAM12FT032.mail.protection.outlook.com (10.13.178.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6222.16 via Frontend Transport; Mon, 20 Mar 2023 19:59:32 +0000
Received: by devbig931.frc1.facebook.com (Postfix, from userid 460691) id AC5C17D4C184; Mon, 20 Mar 2023 12:56:46 -0700 (PDT)
From: Kui-Feng Lee
To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, sdf@google.com
Cc: Kui-Feng Lee
Subject: [PATCH bpf-next v9 8/8] selftests/bpf: Test switching TCP Congestion Control algorithms.
Date: Mon, 20 Mar 2023 12:56:44 -0700
Message-Id: <20230320195644.1953096-9-kuifeng@meta.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230320195644.1953096-1-kuifeng@meta.com>
References: <20230320195644.1953096-1-kuifeng@meta.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Create a pair of sockets that use the congestion control algorithm
registered under a particular name. Then switch this congestion control
algorithm to another implementation and check whether newly created
connections using the same cc name now run the new implementation.

Also, try to update a link with a struct_ops that lacks BPF_F_LINK or
that has a different name. These updates should fail because they
violate the requirements for updating a link: to update a bpf_link of a
struct_ops, it must be replaced with another struct_ops that is
identical in type and name and has the BPF_F_LINK flag.

The other test case creates links from the same struct_ops more than
once. It makes sure a struct_ops can be used repeatedly.

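The update rule described above can be modeled in plain C. This is only
a userspace sketch under stated assumptions, not kernel or libbpf code:
model_struct_ops, model_link, model_link_update() and MODEL_F_LINK are
hypothetical names invented for illustration, and the flag value is
arbitrary. It shows only the three conditions named in this message
(same type, same name, link flag set on the replacement).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the link flag; the bit value is arbitrary. */
#define MODEL_F_LINK (1u << 0)

/* Hypothetical model of a struct_ops map: only the fields the rule needs. */
struct model_struct_ops {
	const char *type;	/* e.g. "tcp_congestion_ops" */
	const char *name;	/* congestion control name */
	unsigned int flags;	/* MODEL_F_LINK or 0 */
};

/* Hypothetical model of a link that currently points at one struct_ops. */
struct model_link {
	const struct model_struct_ops *cur;
};

/*
 * Replace the struct_ops behind a link.
 * Returns 0 on success and -1 if the replacement is rejected; on
 * rejection the link keeps pointing at its current struct_ops.
 */
static int model_link_update(struct model_link *link,
			     const struct model_struct_ops *new_ops)
{
	const struct model_struct_ops *old = link->cur;

	/* The replacement must have been created with the link flag. */
	if (!(new_ops->flags & MODEL_F_LINK))
		return -1;

	/* The replacement must be of the same struct_ops type. */
	if (strcmp(old->type, new_ops->type) != 0)
		return -1;

	/* The replacement must carry the same name. */
	if (strcmp(old->name, new_ops->name) != 0)
		return -1;

	link->cur = new_ops;
	return 0;
}
```

In this model, replacing "tcp_ca_update" with another flagged
"tcp_ca_update" succeeds, while a replacement named "tcp_ca_wrong" or
one without the flag is rejected and leaves the link unchanged, which
mirrors the expectations of the update_ca, update_wrong and mixed_links
subtests.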
Signed-off-by: Kui-Feng Lee
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 116 ++++++++++++++++++
 .../selftests/bpf/progs/tcp_ca_update.c       |  80 ++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_update.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index e980188d4124..5f3602326bbc 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -8,6 +8,7 @@
 #include "bpf_dctcp.skel.h"
 #include "bpf_cubic.skel.h"
 #include "bpf_tcp_nogpl.skel.h"
+#include "tcp_ca_update.skel.h"
 #include "bpf_dctcp_release.skel.h"
 #include "tcp_ca_write_sk_pacing.skel.h"
 #include "tcp_ca_incompl_cong_ops.skel.h"
@@ -381,6 +382,113 @@ static void test_unsupp_cong_op(void)
 	libbpf_set_print(old_print_fn);
 }
 
+static void test_update_ca(void)
+{
+	struct tcp_ca_update *skel;
+	struct bpf_link *link;
+	int saved_ca1_cnt;
+	int err;
+
+	skel = tcp_ca_update__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
+	ASSERT_OK_PTR(link, "attach_struct_ops");
+
+	do_test("tcp_ca_update", NULL);
+	saved_ca1_cnt = skel->bss->ca1_cnt;
+	ASSERT_GT(saved_ca1_cnt, 0, "ca1_ca1_cnt");
+
+	err = bpf_link__update_map(link, skel->maps.ca_update_2);
+	ASSERT_OK(err, "update_map");
+
+	do_test("tcp_ca_update", NULL);
+	ASSERT_EQ(skel->bss->ca1_cnt, saved_ca1_cnt, "ca2_ca1_cnt");
+	ASSERT_GT(skel->bss->ca2_cnt, 0, "ca2_ca2_cnt");
+
+	bpf_link__destroy(link);
+	tcp_ca_update__destroy(skel);
+}
+
+static void test_update_wrong(void)
+{
+	struct tcp_ca_update *skel;
+	struct bpf_link *link;
+	int saved_ca1_cnt;
+	int err;
+
+	skel = tcp_ca_update__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
+	ASSERT_OK_PTR(link, "attach_struct_ops");
+
+	do_test("tcp_ca_update", NULL);
+	saved_ca1_cnt = skel->bss->ca1_cnt;
+	ASSERT_GT(saved_ca1_cnt, 0, "ca1_ca1_cnt");
+
+	err = bpf_link__update_map(link, skel->maps.ca_wrong);
+	ASSERT_ERR(err, "update_map");
+
+	do_test("tcp_ca_update", NULL);
+	ASSERT_GT(skel->bss->ca1_cnt, saved_ca1_cnt, "ca2_ca1_cnt");
+
+	bpf_link__destroy(link);
+	tcp_ca_update__destroy(skel);
+}
+
+static void test_mixed_links(void)
+{
+	struct tcp_ca_update *skel;
+	struct bpf_link *link, *link_nl;
+	int err;
+
+	skel = tcp_ca_update__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	link_nl = bpf_map__attach_struct_ops(skel->maps.ca_no_link);
+	ASSERT_OK_PTR(link_nl, "attach_struct_ops_nl");
+
+	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
+	ASSERT_OK_PTR(link, "attach_struct_ops");
+
+	do_test("tcp_ca_update", NULL);
+	ASSERT_GT(skel->bss->ca1_cnt, 0, "ca1_ca1_cnt");
+
+	err = bpf_link__update_map(link, skel->maps.ca_no_link);
+	ASSERT_ERR(err, "update_map");
+
+	bpf_link__destroy(link);
+	bpf_link__destroy(link_nl);
+	tcp_ca_update__destroy(skel);
+}
+
+static void test_multi_links(void)
+{
+	struct tcp_ca_update *skel;
+	struct bpf_link *link;
+
+	skel = tcp_ca_update__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
+	ASSERT_OK_PTR(link, "attach_struct_ops_1st");
+	bpf_link__destroy(link);
+
+	/* A map should be able to be used to create links multiple
+	 * times.
+	 */
+	link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
+	ASSERT_OK_PTR(link, "attach_struct_ops_2nd");
+	bpf_link__destroy(link);
+
+	tcp_ca_update__destroy(skel);
+}
+
 void test_bpf_tcp_ca(void)
 {
 	if (test__start_subtest("dctcp"))
@@ -399,4 +507,12 @@ void test_bpf_tcp_ca(void)
 		test_incompl_cong_ops();
 	if (test__start_subtest("unsupp_cong_op"))
 		test_unsupp_cong_op();
+	if (test__start_subtest("update_ca"))
+		test_update_ca();
+	if (test__start_subtest("update_wrong"))
+		test_update_wrong();
+	if (test__start_subtest("mixed_links"))
+		test_mixed_links();
+	if (test__start_subtest("multi_links"))
+		test_multi_links();
 }
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_update.c b/tools/testing/selftests/bpf/progs/tcp_ca_update.c
new file mode 100644
index 000000000000..b93a0ed33057
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_update.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+int ca1_cnt = 0;
+int ca2_cnt = 0;
+
+static inline struct tcp_sock *tcp_sk(const struct sock *sk)
+{
+	return (struct tcp_sock *)sk;
+}
+
+SEC("struct_ops/ca_update_1_init")
+void BPF_PROG(ca_update_1_init, struct sock *sk)
+{
+	ca1_cnt++;
+}
+
+SEC("struct_ops/ca_update_2_init")
+void BPF_PROG(ca_update_2_init, struct sock *sk)
+{
+	ca2_cnt++;
+}
+
+SEC("struct_ops/ca_update_cong_control")
+void BPF_PROG(ca_update_cong_control, struct sock *sk,
+	      const struct rate_sample *rs)
+{
+}
+
+SEC("struct_ops/ca_update_ssthresh")
+__u32 BPF_PROG(ca_update_ssthresh, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_ssthresh;
+}
+
+SEC("struct_ops/ca_update_undo_cwnd")
+__u32 BPF_PROG(ca_update_undo_cwnd, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_cwnd;
+}
+
+SEC(".struct_ops.link")
+struct tcp_congestion_ops ca_update_1 = {
+	.init = (void *)ca_update_1_init,
+	.cong_control = (void *)ca_update_cong_control,
+	.ssthresh = (void *)ca_update_ssthresh,
+	.undo_cwnd = (void *)ca_update_undo_cwnd,
+	.name = "tcp_ca_update",
+};
+
+SEC(".struct_ops.link")
+struct tcp_congestion_ops ca_update_2 = {
+	.init = (void *)ca_update_2_init,
+	.cong_control = (void *)ca_update_cong_control,
+	.ssthresh = (void *)ca_update_ssthresh,
+	.undo_cwnd = (void *)ca_update_undo_cwnd,
+	.name = "tcp_ca_update",
+};
+
+SEC(".struct_ops.link")
+struct tcp_congestion_ops ca_wrong = {
+	.cong_control = (void *)ca_update_cong_control,
+	.ssthresh = (void *)ca_update_ssthresh,
+	.undo_cwnd = (void *)ca_update_undo_cwnd,
+	.name = "tcp_ca_wrong",
+};
+
+SEC(".struct_ops")
+struct tcp_congestion_ops ca_no_link = {
+	.cong_control = (void *)ca_update_cong_control,
+	.ssthresh = (void *)ca_update_ssthresh,
+	.undo_cwnd = (void *)ca_update_undo_cwnd,
+	.name = "tcp_ca_no_link",
+};