From patchwork Tue Jan 24 17:05:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jamal Hadi Salim X-Patchwork-Id: 13114398 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C437DC54E94 for ; Tue, 24 Jan 2023 17:07:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234524AbjAXRHB (ORCPT ); Tue, 24 Jan 2023 12:07:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234714AbjAXRGZ (ORCPT ); Tue, 24 Jan 2023 12:06:25 -0500 Received: from mail-yw1-x1130.google.com (mail-yw1-x1130.google.com [IPv6:2607:f8b0:4864:20::1130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 681064DCF9 for ; Tue, 24 Jan 2023 09:05:41 -0800 (PST) Received: by mail-yw1-x1130.google.com with SMTP id 00721157ae682-4ff07dae50dso182326327b3.2 for ; Tue, 24 Jan 2023 09:05:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mojatatu-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RuBHVrg+LlprbV7alk+KeiZPCcYFTXG+R4LePQFieo0=; b=YqskR9KSx44vQkKmht7papn4xyGAhaMN9Fu951/MVvjScmz++08FA9q35eQ4Ve1P7/ B1OsZA7V1YOU1t0bGb8dJBLkqXdLdKteRTOQytKWkT4NieBkjbrQ8U+oZzAvzd+QUjs+ fIxyc0Mk9Jnv5m4envyGHz9jAXmZqyO+LSrZ+5eU2o4xRvBHsV2gfKsC5+k3wMtEZv25 MeWM85RUuFdn8Bjz6uc5aov7MjKsiPUf/gB4R5smows/OGjlyWfq2Lk307F++YrP8tkb WHKucnVyZ8kTb2avivzxteVTzKUC9Rk2nXHRjvmEXa3NfzEqP42f/ancXlmMoR3oeHWI 7ftg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RuBHVrg+LlprbV7alk+KeiZPCcYFTXG+R4LePQFieo0=; b=gPDqOAvkTJqG65Cg+bKNBTD4eqw9yCcGqnSDvOjagldUxM9u4NuhF7xUtpLDLx4ybh xKFWGii0ZzJx+VDymARsjlx/HfXgMvDT7XlMe26r/LiKMtFG9bi+CnTWTkrC5LSit6DR hVkl5poF2H9gnSRDGOTCj/WjLpKvJCd61DwQwc9CAfe94MlXfhsBIIL+oNFRShxgbySc 4ZUSfsaOeihjIYnHBSCT1LWPqygHkRmLHgfUbnLHZqsXS5kMAmqloU9wQzzJd/g17bWP RBuYcO+5MFtA0Ko49slnJ0/oWMbYc+opzLX8glJrgUkH17sp/bD8rd2NLkynwtnesiWw 9rVw== X-Gm-Message-State: AFqh2kqlfkolPLTyOjzF2lCmo7zFWnRVjDwGH5EocCzsJ57V3gicBfpM SULTNOek2kENs8380LLiITVqxAcKGviv1bvY X-Google-Smtp-Source: AMrXdXsdTXk9Rp4Bfo3eWBxqgBSharLMQfJoUO+Z9ozvvUXAS0vfkkkbPcM6pMMAO62oRjlvO90wVQ== X-Received: by 2002:a05:7500:374b:b0:f0:52c4:5de2 with SMTP id gx11-20020a057500374b00b000f052c45de2mr2099890gab.38.1674579933700; Tue, 24 Jan 2023 09:05:33 -0800 (PST) Received: from localhost.localdomain (bras-base-kntaon1618w-grc-10-184-145-9-64.dsl.bell.ca. 
[184.145.9.64]) by smtp.gmail.com with ESMTPSA id t5-20020a05620a0b0500b007063036cb03sm1700208qkg.126.2023.01.24.09.05.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Jan 2023 09:05:33 -0800 (PST)
From: Jamal Hadi Salim
To: netdev@vger.kernel.org
Cc: kernel@mojatatu.com, deb.chatterjee@intel.com, anjali.singhai@intel.com, namrata.limaye@intel.com, khalidm@nvidia.com, tom@sipanda.io, pratyush@sipanda.io, jiri@resnulli.us, xiyou.wangcong@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vladbu@nvidia.com, simon.horman@corigine.com
Subject: [PATCH net-next RFC 17/20] p4tc: add table entry create, update, get, delete, flush and dump
Date: Tue, 24 Jan 2023 12:05:07 -0500
Message-Id: <20230124170510.316970-17-jhs@mojatatu.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230124170510.316970-1-jhs@mojatatu.com>
References: <20230124170510.316970-1-jhs@mojatatu.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC

Tables are conceptually similar to TCAMs and this implementation could be
labelled as an "algorithmic" TCAM. Tables have keys of a specific size, a
maximum number of entries and a maximum number of masks allowed. The basic
P4 key types are supported (exact, LPM, ternary and range), although the
kernel side is oblivious to all that and sees only bit blobs, which it
masks before a lookup is performed.

This commit allows users to create, update, delete, get, flush and dump
table _entries_ (templates were described in an earlier patch).

For example, a user issuing the following command:

  tc p4runtime create myprog/table/cb/tname \
    dstAddr 10.10.10.0/24 srcAddr 192.168.0.0/16 prio 16 \
    action myprog/cb/send param port type dev port1

indicates a pipeline named "myprog" with a table "tname" whose entry we are
creating.
User space tc will create a key with a value of 0x0a0a0a00c0a80000
(10.10.10.0 concatenated with 192.168.0.0) and a mask value of
0xffffff00ffff0000 (/24 concatenated with /16), both of which are sent to
the kernel. In addition, a priority field of 16 is passed to the kernel, as
well as the action definition. The priority field is needed to disambiguate
in case two entries match. In that case, the kernel will choose the one
with the lowest priority number.

Note that table entries can only be created once the pipeline template is
sealed.

If the user wanted to, for example, change the action of the entry we just
created, they'd issue the following command:

  tc p4runtime update myprog/table/cb/tname dstAddr 10.10.10.0/24 \
    srcAddr 192.168.0.0/16 prio 16 action myprog/cb/send param port type dev port5

In this case, the user needs to specify the pipeline name, the table name,
the keys and the priority, so that we can locate the table entry.

If the user wanted to, for example, get the table entry that we just
updated, they'd issue the following command:

  tc p4runtime get myprog/table/cb/tname dstAddr 10.10.10.0/24 \
    srcAddr 192.168.0.0/16 prio 16

Note that, again, we need to specify the pipeline name, the table name, the
keys and the priority, so that we can locate the table entry.

If the user wanted to delete the table entry we created, they'd issue the
following command:

  tc p4runtime del myprog/table/cb/tname dstAddr 10.10.10.0/24 \
    srcAddr 192.168.0.0/16 prio 16

Note that, again, we need to specify the pipeline name, the table name, the
keys and the priority, so that we can locate the table entry.

We can also flush all the table entries from a specific table. To flush the
entries of table tname in pipeline myprog, the user would issue the
following command:

  tc p4runtime del myprog/table/cb/tname

We can also dump all the table entries from a specific table.
To dump the entries of table tname in pipeline myprog, the user would issue
the following command:

  tc p4runtime get myprog/table/cb/tname

__Table Entry Permissions__

Table entries can have permissions specified when they are being added.
Caveat: we are doing a lot more than what P4 defines because we feel it is
necessary.

Table entry permissions build on the table permissions provided when a
table is created via the template (see earlier patch).

We have two types of permissions: control path and datapath. The template
definition can set either one. For example, one could allow the datapath to
add table entries in case PNA add-on-miss is needed. By default, table
entries have control plane RUD, meaning the control plane can Read, Update
or Delete entries. By default, as well, the control plane can create new
entries unless specified otherwise by the template.

Let's see an example of defining a table "tname" at template time:

  $TC p4template create table/ptables/cb/tname tblid 1 keysz 64 permissions 0x3C9 ...

The above sets the permissions of table tname to 0x3C9, which is equivalent
to CRUD--R--X, meaning:

  the control plane can Create, Read, Update and Delete entries
  the datapath can only Read and Execute table entries

If one was to dump this table with:

  $TC p4template get table/ptables/cb/tname

The output would be the following:

  pipeline name ptables id 22
  table id 1
  table name cb/tname
  key_sz 64
  max entries 256
  masks 8
  default key 1
  table entries 0
  permissions CRUD--R--X

The permissions expressed above are probably the most practical for most
use cases.

__Constant Tables And P4-programmed Defined Entries__

If one wanted to restrict the table to be the equivalent of a "const"
table, then the permissions would be set to:

  -R----R--X

In such a case, typically the P4 program will have some entries defined
(see the famous P4 calc example).
The "initial entries" specified in the P4 program will have to be added by
the template (as generated by the compiler), as such:

  $TC p4template update table/ptables/cb/tname entry srcAddr 10.10.10.10/24 dstAddr 1.1.1.0/24 prio 17

This table cannot be updated at runtime. Any attempt to add an entry to a
table which is read-only at runtime will get a permission denied response
back from the kernel.

Note: If one was to create an equivalent of the PNA add-on-miss feature for
this table, then the template would issue table permissions as:

  -R---CR--X

PNA doesn't specify whether the datapath can also delete or update entries,
but if it did then more appropriate permissions would be:

  -R---CRUDX

__Mix And Match of RW vs Constant Entries__

Let's look at other scenarios; let's say the table has CRUD--R--X
permissions as defined by the template...

At runtime the user could add entries which are "const" by specifying the
entry's permissions as -R----R--X, for example:

  $TC p4runtime create ptables/table/cb/tname srcAddr 10.10.10.10/24 \
    dstAddr 1.1.1.0/24 prio 17 permissions 0x109 action drop

or not specify permissions at all, as such:

  $TC p4runtime create ptables/table/cb/tname srcAddr 10.10.10.10/24 \
    dstAddr 1.1.1.0/24 prio 17 \
    action drop

in which case the table's permissions defined at template time
(CRUD--R--X) are assumed, meaning the table entry can be deleted or updated
by the control plane.

__Entry Permissions Allowed On Table Entry Creation At Runtime__

When an entry is added with explicit permissions, it can have at most the
permissions that the template table definition expressed, but it can ask
for fewer. For example, assuming a table with template-specified
permissions of CR-D--R--X: an entry created at runtime with permissions of
-R----R--X is allowed, but an entry with -RUD--R--X will be rejected.
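To make the bit layout and the key/mask arithmetic concrete, here is a small
user-space C sketch. It is an illustration only and not part of this patch.
It assumes the 10-bit layout implied by the examples in this message:
control plane CRUDX in the upper five bits and datapath CRUDX in the lower
five, so that CRUD--R--X maps to 0x3C9 and -R----R--X to 0x109. The helper
names perm_parse(), entry_perm_allowed(), prefix_mask() and concat32() are
hypothetical. The key and mask helpers rebuild the blobs from the first
example and ignore the mask id that the kernel prepends internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: turn a 10-character permission string such as
 * "CRUD--R--X" into a bitmask. The leftmost character maps to bit 9
 * (layout assumed from the examples, not taken from the kernel headers).
 * Returns -1 if the string is malformed.
 */
static int perm_parse(const char *s)
{
	static const char tmpl[] = "CRUDXCRUDX";
	int bits = 0;
	size_t i;

	if (strlen(s) != 10)
		return -1;

	for (i = 0; i < 10; i++) {
		bits <<= 1;
		if (s[i] == tmpl[i])
			bits |= 1;
		else if (s[i] != '-')
			return -1;
	}

	return bits;
}

/* An entry may ask for fewer permissions than its table, never more:
 * every bit set in the entry must also be set in the table.
 */
static bool entry_perm_allowed(uint16_t table_perm, uint16_t entry_perm)
{
	return (entry_perm & ~table_perm) == 0;
}

/* Netmask for an IPv4 prefix length; only valid for len in 0..32. */
static uint32_t prefix_mask(unsigned int len)
{
	return len ? 0xffffffffu << (32 - len) : 0;
}

/* Concatenate two 32-bit fields, first field in the upper 32 bits. */
static uint64_t concat32(uint32_t first, uint32_t second)
{
	return ((uint64_t)first << 32) | second;
}
```

With these helpers, perm_parse("CR-D--R--X") yields 0x349, so an entry
asking for -R----R--X (0x109) passes entry_perm_allowed() while one asking
for -RUD--R--X (0x1C9) does not. Likewise, concat32(0x0a0a0a00, 0xc0a80000)
gives the key 0x0a0a0a00c0a80000 and concat32(prefix_mask(24),
prefix_mask(16)) gives the mask 0xffffff00ffff0000.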

Co-developed-by: Victor Nogueira
Signed-off-by: Victor Nogueira
Co-developed-by: Pedro Tammela
Signed-off-by: Pedro Tammela
Signed-off-by: Jamal Hadi Salim
---
 include/net/p4tc.h             |   60 +
 include/uapi/linux/p4tc.h      |   32 +
 include/uapi/linux/rtnetlink.h |    7 +
 net/sched/p4tc/Makefile        |    3 +-
 net/sched/p4tc/p4tc_pipeline.c |   12 +
 net/sched/p4tc/p4tc_table.c    |   45 +
 net/sched/p4tc/p4tc_tbl_api.c  | 1898 ++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |    5 +-
 8 files changed, 2060 insertions(+), 2 deletions(-)
 create mode 100644 net/sched/p4tc/p4tc_tbl_api.c

diff --git a/include/net/p4tc.h b/include/net/p4tc.h
index 58be4f96f..9a7942992 100644
--- a/include/net/p4tc.h
+++ b/include/net/p4tc.h
@@ -123,6 +123,7 @@ struct p4tc_pipeline {
 	u32 num_created_acts;
 	refcount_t p_ref;
 	refcount_t p_ctrl_ref;
+	refcount_t p_entry_deferal_ref;
 	u16 num_tables;
 	u16 curr_tables;
 	u8 p_state;
@@ -234,6 +235,7 @@ struct p4tc_table {
 	struct rhltable tbl_entries;
 	struct tc_action **tbl_preacts;
 	struct tc_action **tbl_postacts;
+	struct p4tc_table_entry *tbl_const_entry;
 	struct p4tc_table_defact __rcu *tbl_default_hitact;
 	struct p4tc_table_defact __rcu *tbl_default_missact;
 	struct p4tc_table_perm __rcu *tbl_permissions;
@@ -321,6 +323,54 @@ extern const struct rhashtable_params p4tc_label_ht_params;
 extern const struct rhashtable_params acts_params;
 void p4tc_label_ht_destroy(void *ptr, void *arg);
 
+extern const struct rhashtable_params entry_hlt_params;
+
+struct p4tc_table_entry;
+struct p4tc_table_entry_work {
+	struct work_struct work;
+	struct p4tc_pipeline *pipeline;
+	struct p4tc_table_entry *entry;
+	bool defer_deletion;
+};
+
+struct p4tc_table_entry_key {
+	u8 *value;
+	u8 *unmasked_key;
+	u16 keysz;
+};
+
+struct p4tc_table_entry_mask {
+	struct rcu_head rcu;
+	u32 sz;
+	u32 mask_id;
+	refcount_t mask_ref;
+	u8 *value;
+};
+
+struct p4tc_table_entry {
+	struct p4tc_table_entry_key key;
+	struct work_struct work;
+	struct p4tc_table_entry_tm __rcu *tm;
+	u32 prio;
+	u32 mask_id;
+	struct tc_action **acts;
+	struct p4tc_table_entry_work *entry_work;
+	int num_acts;
+	struct rhlist_head ht_node;
+	struct list_head list;
+	struct rcu_head rcu;
+	refcount_t entries_ref;
+	u16 who_created;
+	u16 who_updated;
+	u16 permissions;
+};
+
+extern const struct nla_policy p4tc_root_policy[P4TC_ROOT_MAX + 1];
+extern const struct nla_policy p4tc_policy[P4TC_MAX + 1];
+struct p4tc_table_entry *p4tc_table_entry_lookup(struct sk_buff *skb,
+						 struct p4tc_table *table,
+						 u32 keysz);
+
 struct p4tc_parser {
 	char parser_name[PARSERNAMSIZ];
 	struct idr hdr_fields_idr;
@@ -445,6 +495,16 @@ struct p4tc_table *tcf_table_get(struct p4tc_pipeline *pipeline,
 				 struct netlink_ext_ack *extack);
 void tcf_table_put_ref(struct p4tc_table *table);
 
+void tcf_table_entry_destroy_hash(void *ptr, void *arg);
+
+int tcf_table_const_entry_cu(struct net *net, struct nlattr *arg,
+			     struct p4tc_table_entry *entry,
+			     struct p4tc_pipeline *pipeline,
+			     struct p4tc_table *table,
+			     struct netlink_ext_ack *extack);
+int p4tca_table_get_entry_fill(struct sk_buff *skb, struct p4tc_table *table,
+			       struct p4tc_table_entry *entry, u32 tbl_id);
+
 struct p4tc_parser *tcf_parser_create(struct p4tc_pipeline *pipeline,
 				      const char *parser_name,
 				      u32 parser_inst_id,
diff --git a/include/uapi/linux/p4tc.h b/include/uapi/linux/p4tc.h
index 678ee20cd..727fdcfe5 100644
--- a/include/uapi/linux/p4tc.h
+++ b/include/uapi/linux/p4tc.h
@@ -119,6 +119,7 @@ enum {
 	P4TC_OBJ_HDR_FIELD,
 	P4TC_OBJ_ACT,
 	P4TC_OBJ_TABLE,
+	P4TC_OBJ_TABLE_ENTRY,
 	__P4TC_OBJ_MAX,
 };
 #define P4TC_OBJ_MAX __P4TC_OBJ_MAX
@@ -321,6 +322,37 @@ struct tc_act_dyna {
 	tc_gen;
 };
 
+struct p4tc_table_entry_tm {
+	__u64 created;
+	__u64 lastused;
+	__u64 firstused;
+};
+
+/* Table entry attributes */
+enum {
+	P4TC_ENTRY_UNSPEC,
+	P4TC_ENTRY_TBLNAME, /* string */
+	P4TC_ENTRY_KEY_BLOB, /* Key blob */
+	P4TC_ENTRY_MASK_BLOB, /* Mask blob */
+	P4TC_ENTRY_PRIO, /* u32 */
+	P4TC_ENTRY_ACT, /* nested actions */
+	P4TC_ENTRY_TM, /* entry data path timestamps */
+	P4TC_ENTRY_WHODUNNIT, /* tells who's modifying the entry */
+	P4TC_ENTRY_CREATE_WHODUNNIT, /* tells who created the entry */
+	P4TC_ENTRY_UPDATE_WHODUNNIT, /* tells who updated the entry last */
+	P4TC_ENTRY_PERMISSIONS, /* entry CRUDX permissions */
+	P4TC_ENTRY_PAD,
+	__P4TC_ENTRY_MAX
+};
+#define P4TC_ENTRY_MAX (__P4TC_ENTRY_MAX - 1)
+
+enum {
+	P4TC_ENTITY_UNSPEC,
+	P4TC_ENTITY_KERNEL,
+	P4TC_ENTITY_TC,
+	P4TC_ENTITY_MAX
+};
+
 #define P4TC_RTA(r) \
 	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct p4tcmsg))))
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 62f0f5c90..dc061ddb8 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -201,6 +201,13 @@ enum {
 	RTM_GETP4TEMPLATE,
 #define RTM_GETP4TEMPLATE RTM_GETP4TEMPLATE
 
+	RTM_CREATEP4TBENT = 128,
+#define RTM_CREATEP4TBENT RTM_CREATEP4TBENT
+	RTM_DELP4TBENT,
+#define RTM_DELP4TBENT RTM_DELP4TBENT
+	RTM_GETP4TBENT,
+#define RTM_GETP4TBENT RTM_GETP4TBENT
+
 	__RTM_MAX,
 #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/sched/p4tc/Makefile b/net/sched/p4tc/Makefile
index de3a7b833..0d2c20223 100644
--- a/net/sched/p4tc/Makefile
+++ b/net/sched/p4tc/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-y := p4tc_types.o p4tc_pipeline.o p4tc_tmpl_api.o p4tc_meta.o \
-	p4tc_parser_api.o p4tc_hdrfield.o p4tc_action.o p4tc_table.o
+	p4tc_parser_api.o p4tc_hdrfield.o p4tc_action.o p4tc_table.o \
+	p4tc_tbl_api.o
diff --git a/net/sched/p4tc/p4tc_pipeline.c b/net/sched/p4tc/p4tc_pipeline.c
index 854fc5b57..f8fcde20b 100644
--- a/net/sched/p4tc/p4tc_pipeline.c
+++ b/net/sched/p4tc/p4tc_pipeline.c
@@ -328,7 +328,16 @@ static int tcf_pipeline_put(struct net *net,
 	struct p4tc_metadata *meta;
 	struct p4tc_table *table;
 
+	if (!refcount_dec_if_one(&pipeline->p_ctrl_ref)) {
+		if (pipeline_net) {
+			put_net(pipeline_net);
+			NL_SET_ERR_MSG(extack, "Can't delete referenced pipeline");
+			return -EBUSY;
+		}
+	}
+
 	if (pipeline_net && !refcount_dec_if_one(&pipeline->p_ref)) {
+		refcount_set(&pipeline->p_ctrl_ref, 1);
 		NL_SET_ERR_MSG(extack, "Can't delete referenced pipeline");
 		return -EBUSY;
 	}
@@ -567,6 +576,9 @@ static struct p4tc_pipeline *tcf_pipeline_create(struct net *net,
 	pipeline->net = net;
 
 	refcount_set(&pipeline->p_ref, 1);
+	refcount_set(&pipeline->p_ctrl_ref, 1);
+	refcount_set(&pipeline->p_hdrs_used, 1);
+	refcount_set(&pipeline->p_entry_deferal_ref, 1);
 
 	pipeline->common.ops = (struct p4tc_template_ops *)&p4tc_pipeline_ops;
 
diff --git a/net/sched/p4tc/p4tc_table.c b/net/sched/p4tc/p4tc_table.c
index f793c70bc..491e44396 100644
--- a/net/sched/p4tc/p4tc_table.c
+++ b/net/sched/p4tc/p4tc_table.c
@@ -234,6 +234,17 @@ static int _tcf_table_fill_nlmsg(struct sk_buff *skb, struct p4tc_table *table)
 	}
 	nla_nest_end(skb, nested_tbl_acts);
 
+	if (table->tbl_const_entry) {
+		struct nlattr *const_nest;
+
+		const_nest = nla_nest_start(skb, P4TC_TABLE_OPT_ENTRY);
+		p4tca_table_get_entry_fill(skb, table, table->tbl_const_entry,
+					   table->tbl_id);
+		nla_nest_end(skb, const_nest);
+	}
+	kfree(table->tbl_const_entry);
+	table->tbl_const_entry = NULL;
+
 	if (nla_put(skb, P4TC_TABLE_INFO, sizeof(parm), &parm))
 		goto out_nlmsg_trim;
 	nla_nest_end(skb, nest);
@@ -381,6 +392,9 @@ static inline int _tcf_table_put(struct net *net, struct nlattr **tb,
 
 	tcf_table_acts_list_destroy(&table->tbl_acts_list);
 
+	rhltable_free_and_destroy(&table->tbl_entries,
+				  tcf_table_entry_destroy_hash, table);
+
 	idr_destroy(&table->tbl_masks_idr);
 	idr_destroy(&table->tbl_prio_idr);
 
@@ -1075,6 +1089,11 @@ static struct p4tc_table *tcf_table_create(struct net *net, struct nlattr **tb,
 	spin_lock_init(&table->tbl_masks_idr_lock);
 	spin_lock_init(&table->tbl_prio_idr_lock);
 
+	if (rhltable_init(&table->tbl_entries, &entry_hlt_params) < 0) {
+		ret = -EINVAL;
+		goto defaultacts_destroy;
+	}
+
 	table->tbl_key = key;
 
 	pipeline->curr_tables += 1;
@@ -1083,6 +1102,10 @@ static struct p4tc_table *tcf_table_create(struct net *net, struct nlattr **tb,
 
 	return table;
 
+defaultacts_destroy:
+	p4tc_table_defact_destroy(table->tbl_default_missact);
+	p4tc_table_defact_destroy(table->tbl_default_hitact);
+
 key_put:
 	if (key)
 		tcf_table_key_put(key);
@@ -1279,6 +1302,25 @@ static struct p4tc_table *tcf_table_update(struct net *net, struct nlattr **tb,
 		}
 	}
 
+	if (tb[P4TC_TABLE_OPT_ENTRY]) {
+		struct p4tc_table_entry *entry;
+
+		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+		if (!entry) {
+			ret = -ENOMEM;
+			goto free_perm;
+		}
+
+		/* Workaround to make this work */
+		ret = tcf_table_const_entry_cu(net, tb[P4TC_TABLE_OPT_ENTRY],
+					       entry, pipeline, table, extack);
+		if (ret < 0) {
+			kfree(entry);
+			goto free_perm;
+		}
+		table->tbl_const_entry = entry;
+	}
+
 	if (preacts) {
 		p4tc_action_destroy(table->tbl_preacts);
 		table->tbl_preacts = preacts;
@@ -1326,6 +1368,9 @@ static struct p4tc_table *tcf_table_update(struct net *net, struct nlattr **tb,
 
 	return table;
 
+free_perm:
+	kfree(perm);
+
 key_destroy:
 	if (key)
 		tcf_table_key_put(key);
diff --git a/net/sched/p4tc/p4tc_tbl_api.c b/net/sched/p4tc/p4tc_tbl_api.c
new file mode 100644
index 000000000..4523ec09b
--- /dev/null
+++ b/net/sched/p4tc/p4tc_tbl_api.c
@@ -0,0 +1,1898 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * net/sched/p4tc_tbl_api.c TC P4 TABLE API
+ *
+ * Copyright (c) 2022, Mojatatu Networks
+ * Copyright (c) 2022, Intel Corporation.
+ * Authors: Jamal Hadi Salim
+ *          Victor Nogueira
+ *          Pedro Tammela
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define KEY_MASK_ID_SZ (sizeof(u32))
+#define KEY_MASK_ID_SZ_BITS (KEY_MASK_ID_SZ * BITS_PER_BYTE)
+
+static u32 p4tc_entry_hash_fn(const void *data, u32 len, u32 seed)
+{
+	const struct p4tc_table_entry_key *key = data;
+
+	return jhash(key->value, key->keysz >> 3, seed);
+}
+
+static int p4tc_entry_hash_cmp(struct rhashtable_compare_arg *arg,
+			       const void *ptr)
+{
+	const struct p4tc_table_entry_key *key = arg->key;
+	const struct p4tc_table_entry *entry = ptr;
+
+	return memcmp(entry->key.value, key->value, entry->key.keysz >> 3);
+}
+
+static u32 p4tc_entry_obj_hash_fn(const void *data, u32 len, u32 seed)
+{
+	const struct p4tc_table_entry *entry = data;
+
+	return p4tc_entry_hash_fn(&entry->key, 0, seed);
+}
+
+const struct rhashtable_params entry_hlt_params = {
+	.obj_cmpfn = p4tc_entry_hash_cmp,
+	.obj_hashfn = p4tc_entry_obj_hash_fn,
+	.hashfn = p4tc_entry_hash_fn,
+	.head_offset = offsetof(struct p4tc_table_entry, ht_node),
+	.key_offset = offsetof(struct p4tc_table_entry, key),
+	.automatic_shrinking = true,
+};
+
+static struct p4tc_table_entry *
+p4tc_entry_lookup(struct p4tc_table *table, struct p4tc_table_entry_key *key,
+		  u32 prio) __must_hold(RCU)
+{
+	struct p4tc_table_entry *entry;
+	struct rhlist_head *tmp, *bucket_list;
+
+	bucket_list =
+		rhltable_lookup(&table->tbl_entries, key, entry_hlt_params);
+	if (!bucket_list)
+		return NULL;
+
+	rhl_for_each_entry_rcu(entry, tmp, bucket_list, ht_node)
+		if (entry->prio == prio)
+			return entry;
+
+	return NULL;
+}
+
+static struct p4tc_table_entry *
+__p4tc_entry_lookup(struct p4tc_table *table, struct p4tc_table_entry_key *key)
+	__must_hold(RCU)
+{
+	struct p4tc_table_entry *entry = NULL;
+	u32 smallest_prio = U32_MAX;
+	struct rhlist_head *tmp, *bucket_list;
+	struct p4tc_table_entry *entry_curr;
+
+	bucket_list =
+		rhltable_lookup(&table->tbl_entries, key, entry_hlt_params);
+	if (!bucket_list)
+		return NULL;
+
+	rhl_for_each_entry_rcu(entry_curr, tmp, bucket_list, ht_node) {
+		if (entry_curr->prio <= smallest_prio) {
+			smallest_prio = entry_curr->prio;
+			entry = entry_curr;
+		}
+	}
+
+	return entry;
+}
+
+static void mask_key(struct p4tc_table_entry_mask *mask, u8 *masked_key,
+		     u8 *skb_key)
+{
+	int i;
+	__u32 *mask_id;
+
+	mask_id = (u32 *)&masked_key[0];
+	*mask_id = mask->mask_id;
+
+	for (i = KEY_MASK_ID_SZ; i < BITS_TO_BYTES(mask->sz); i++)
+		masked_key[i] = skb_key[i - KEY_MASK_ID_SZ] & mask->value[i];
+}
+
+struct p4tc_table_entry *p4tc_table_entry_lookup(struct sk_buff *skb,
+						 struct p4tc_table *table,
+						 u32 keysz)
+{
+	struct p4tc_table_entry *entry_curr = NULL;
+	u8 masked_key[KEY_MASK_ID_SZ + BITS_TO_BYTES(P4TC_MAX_KEYSZ)] = { 0 };
+	u32 smallest_prio = U32_MAX;
+	struct p4tc_table_entry_mask *mask;
+	struct p4tc_table_entry *entry = NULL;
+	struct p4tc_skb_ext *p4tc_skb_ext;
+	unsigned long tmp, mask_id;
+
+	p4tc_skb_ext = skb_ext_find(skb, P4TC_SKB_EXT);
+	if (unlikely(!p4tc_skb_ext))
+		return ERR_PTR(-ENOENT);
+
+	idr_for_each_entry_ul(&table->tbl_masks_idr, mask, tmp, mask_id) {
+		struct p4tc_table_entry_key key = {};
+
+		mask_key(mask, masked_key, p4tc_skb_ext->p4tc_ext->key);
+
+		key.value = masked_key;
+		key.keysz = keysz + KEY_MASK_ID_SZ_BITS;
+
+		entry_curr = __p4tc_entry_lookup(table, &key);
+		if (entry_curr) {
+			if (entry_curr->prio <= smallest_prio) {
+				smallest_prio = entry_curr->prio;
+				entry = entry_curr;
+			}
+		}
+	}
+
+	return entry;
+}
+
+#define tcf_table_entry_mask_find_byid(table, id) \
+	(idr_find(&(table)->tbl_masks_idr, id))
+
+static int p4tca_table_get_entry_keys(struct sk_buff *skb,
+				      struct p4tc_table *table,
+				      struct p4tc_table_entry *entry)
+{
+	unsigned char *b = nlmsg_get_pos(skb);
+	int ret = -ENOMEM;
+	struct p4tc_table_entry_mask *mask;
+	u32 key_sz_bytes;
+
+	key_sz_bytes = (entry->key.keysz - KEY_MASK_ID_SZ_BITS) / BITS_PER_BYTE;
+	if (nla_put(skb, P4TC_ENTRY_KEY_BLOB, key_sz_bytes,
+		    entry->key.unmasked_key + KEY_MASK_ID_SZ))
+		goto out_nlmsg_trim;
+
+	mask = tcf_table_entry_mask_find_byid(table, entry->mask_id);
+	if (nla_put(skb, P4TC_ENTRY_MASK_BLOB, key_sz_bytes,
+		    mask->value + KEY_MASK_ID_SZ))
+		goto out_nlmsg_trim;
+
+	return 0;
+
+out_nlmsg_trim:
+	nlmsg_trim(skb, b);
+	return ret;
+}
+
+static void p4tc_table_entry_tm_dump(struct p4tc_table_entry_tm *dtm,
+				     struct p4tc_table_entry_tm *stm)
+{
+	unsigned long now = jiffies;
+
+	dtm->created = stm->created ?
+		jiffies_to_clock_t(now - stm->created) : 0;
+	dtm->lastused = stm->lastused ?
+		jiffies_to_clock_t(now - stm->lastused) : 0;
+	dtm->firstused = stm->firstused ?
+		jiffies_to_clock_t(now - stm->firstused) : 0;
+}
+
+#define P4TC_ENTRY_MAX_IDS (P4TC_PATH_MAX - 1)
+
+int p4tca_table_get_entry_fill(struct sk_buff *skb, struct p4tc_table *table,
+			       struct p4tc_table_entry *entry, u32 tbl_id)
+{
+	unsigned char *b = nlmsg_get_pos(skb);
+	int ret = -ENOMEM;
+	struct nlattr *nest, *nest_acts;
+	struct p4tc_table_entry_tm dtm, *tm;
+	u32 ids[P4TC_ENTRY_MAX_IDS];
+
+	ids[P4TC_TBLID_IDX - 1] = tbl_id;
+
+	if (nla_put(skb, P4TC_PATH, P4TC_ENTRY_MAX_IDS * sizeof(u32), ids))
+		goto out_nlmsg_trim;
+
+	nest = nla_nest_start(skb, P4TC_PARAMS);
+	if (!nest)
+		goto out_nlmsg_trim;
+
+	if (nla_put_u32(skb, P4TC_ENTRY_PRIO, entry->prio))
+		goto out_nlmsg_trim;
+
+	if (p4tca_table_get_entry_keys(skb, table, entry) < 0)
+		goto out_nlmsg_trim;
+
+	if (entry->acts) {
+		nest_acts = nla_nest_start(skb, P4TC_ENTRY_ACT);
+		if (tcf_action_dump(skb, entry->acts, 0, 0, false) < 0)
+			goto out_nlmsg_trim;
+		nla_nest_end(skb, nest_acts);
+	}
+
+	if (nla_put_u8(skb, P4TC_ENTRY_CREATE_WHODUNNIT, entry->who_created))
+		goto out_nlmsg_trim;
+
+	if (entry->who_updated) {
+		if (nla_put_u8(skb, P4TC_ENTRY_UPDATE_WHODUNNIT,
+			       entry->who_updated))
+			goto out_nlmsg_trim;
+	}
+
+	if (nla_put_u16(skb, P4TC_ENTRY_PERMISSIONS, entry->permissions))
+		goto out_nlmsg_trim;
+
+	tm = rtnl_dereference(entry->tm);
+	p4tc_table_entry_tm_dump(&dtm, tm);
+	if (nla_put_64bit(skb, P4TC_ENTRY_TM, sizeof(dtm), &dtm,
+			  P4TC_ENTRY_PAD))
+		goto out_nlmsg_trim;
+
+	nla_nest_end(skb, nest);
+
+	return skb->len;
+
+out_nlmsg_trim:
+	nlmsg_trim(skb, b);
+	return ret;
+}
+
+static const struct nla_policy p4tc_entry_policy[P4TC_ENTRY_MAX + 1] = {
+	[P4TC_ENTRY_TBLNAME] = { .type = NLA_STRING },
+	[P4TC_ENTRY_KEY_BLOB] = { .type = NLA_BINARY },
+	[P4TC_ENTRY_MASK_BLOB] = { .type = NLA_BINARY },
+	[P4TC_ENTRY_PRIO] = { .type = NLA_U32 },
+	[P4TC_ENTRY_ACT] = { .type = NLA_NESTED },
+	[P4TC_ENTRY_TM] = { .type = NLA_BINARY,
+			    .len = sizeof(struct p4tc_table_entry_tm) },
+	[P4TC_ENTRY_WHODUNNIT] = { .type = NLA_U8 },
+	[P4TC_ENTRY_CREATE_WHODUNNIT] = { .type = NLA_U8 },
+	[P4TC_ENTRY_UPDATE_WHODUNNIT] = { .type = NLA_U8 },
+	[P4TC_ENTRY_PERMISSIONS] = { .type = NLA_U16 },
+};
+
+static void __tcf_table_entry_mask_destroy(struct p4tc_table_entry_mask *mask)
+{
+	kfree(mask->value);
+	kfree(mask);
+}
+
+static void tcf_table_entry_mask_destroy(struct rcu_head *rcu)
+{
+	struct p4tc_table_entry_mask *mask;
+
+	mask = container_of(rcu, struct p4tc_table_entry_mask, rcu);
+
+	__tcf_table_entry_mask_destroy(mask);
+}
+
+static struct p4tc_table_entry_mask *
+tcf_table_entry_mask_find_byvalue(struct p4tc_table *table,
+				  struct p4tc_table_entry_mask *mask)
+{
+	struct p4tc_table_entry_mask *mask_cur;
+	unsigned long mask_id, tmp;
+
+	idr_for_each_entry_ul(&table->tbl_masks_idr, mask_cur, tmp, mask_id) {
+		if (mask_cur->sz == mask->sz) {
+			u32 mask_sz_bytes = mask->sz / BITS_PER_BYTE - KEY_MASK_ID_SZ;
+			void *curr_mask_value = mask_cur->value + KEY_MASK_ID_SZ;
+			void *mask_value = mask->value + KEY_MASK_ID_SZ;
+
+			if (memcmp(curr_mask_value, mask_value, mask_sz_bytes) == 0)
+				return mask_cur;
+		}
+	}
+
+	return NULL;
+}
+
+static void tcf_table_entry_mask_del(struct p4tc_table *table,
+				     struct p4tc_table_entry *entry)
+{
+	const u32 mask_id = entry->mask_id;
+	struct p4tc_table_entry_mask *mask_found;
+
+	/* Will always be found*/
+	mask_found = tcf_table_entry_mask_find_byid(table, mask_id);
+
+	/* Last reference, can delete*/
+	if (refcount_dec_if_one(&mask_found->mask_ref)) {
+		spin_lock_bh(&table->tbl_masks_idr_lock);
+		idr_remove(&table->tbl_masks_idr, mask_found->mask_id);
+		spin_unlock_bh(&table->tbl_masks_idr_lock);
+		call_rcu(&mask_found->rcu, tcf_table_entry_mask_destroy);
+	} else {
+		if (!refcount_dec_not_one(&mask_found->mask_ref))
+			pr_warn("Mask was deleted in parallel");
+	}
+}
+
+/* TODO: Ordering optimisation for LPM */
+static struct p4tc_table_entry_mask *
+tcf_table_entry_mask_add(struct p4tc_table *table,
+			 struct p4tc_table_entry *entry,
+			 struct p4tc_table_entry_mask *mask)
+{
+	struct p4tc_table_entry_mask *mask_found;
+	int ret;
+
+	mask_found = tcf_table_entry_mask_find_byvalue(table, mask);
+	/* Only add mask if it was not already added */
+	if (!mask_found) {
+		struct p4tc_table_entry_mask *mask_allocated;
+
+		mask_allocated = kzalloc(sizeof(*mask_allocated), GFP_ATOMIC);
+		if (!mask_allocated)
+			return ERR_PTR(-ENOMEM);
+
+		mask_allocated->value =
+			kzalloc(BITS_TO_BYTES(mask->sz), GFP_ATOMIC);
+		if (!mask_allocated->value) {
+			kfree(mask_allocated);
+			return ERR_PTR(-ENOMEM);
+		}
+		memcpy(mask_allocated->value, mask->value,
+		       BITS_TO_BYTES(mask->sz));
+
+		mask_allocated->mask_id = 1;
+		refcount_set(&mask_allocated->mask_ref, 1);
+		mask_allocated->sz = mask->sz;
+
+		spin_lock_bh(&table->tbl_masks_idr_lock);
+		ret = idr_alloc_u32(&table->tbl_masks_idr, mask_allocated,
+				    &mask_allocated->mask_id, UINT_MAX,
+				    GFP_ATOMIC);
+		spin_unlock_bh(&table->tbl_masks_idr_lock);
+		if (ret < 0) {
+			kfree(mask_allocated->value);
+			kfree(mask_allocated);
+			return ERR_PTR(ret);
+		}
+		entry->mask_id = mask_allocated->mask_id;
+		mask_found = mask_allocated;
+	} else {
+		if (!refcount_inc_not_zero(&mask_found->mask_ref))
+			return ERR_PTR(-EBUSY);
+		entry->mask_id = mask_found->mask_id;
+	}
+
+	return mask_found;
+}
+
+static void tcf_table_entry_del_act(struct p4tc_table_entry *entry)
+{
+	p4tc_action_destroy(entry->acts);
+	kfree(entry);
+}
+
+static void tcf_table_entry_del_act_work(struct work_struct *work)
+{
+	struct p4tc_table_entry_work *entry_work =
+		container_of(work, typeof(*entry_work), work);
+	struct p4tc_pipeline *pipeline = entry_work->pipeline;
+
+	tcf_table_entry_del_act(entry_work->entry);
+	put_net(pipeline->net);
+
+	refcount_dec(&entry_work->pipeline->p_entry_deferal_ref);
+
+	kfree(entry_work);
+}
+
+static void tcf_table_entry_put(struct p4tc_table_entry *entry)
+{
+	struct p4tc_table_entry_tm *tm;
+
+	tm = rcu_dereference(entry->tm);
+	kfree(tm);
+
+	kfree(entry->key.unmasked_key);
+	kfree(entry->key.value);
+
+	if (entry->acts) {
+		struct p4tc_table_entry_work *entry_work = entry->entry_work;
+		struct p4tc_pipeline *pipeline = entry_work->pipeline;
+		struct net *net;
+
+		if (entry_work->defer_deletion) {
+			net = get_net(pipeline->net);
+			refcount_inc(&entry_work->pipeline->p_entry_deferal_ref);
+			schedule_work(&entry_work->work);
+		} else {
+			kfree(entry_work);
+			tcf_table_entry_del_act(entry);
+		}
+	} else {
+		kfree(entry->entry_work);
+		kfree(entry);
+	}
+}
+
+static void tcf_table_entry_put_rcu(struct rcu_head *rcu)
+{
+	struct p4tc_table_entry *entry;
+
+	entry = container_of(rcu, struct p4tc_table_entry, rcu);
+
+	tcf_table_entry_put(entry);
+}
+
+static int tcf_table_entry_destroy(struct p4tc_table *table,
+				   struct p4tc_table_entry *entry,
+				   bool remove_from_hash)
+{
+	/* Entry was deleted in parallel */
+	if (!refcount_dec_if_one(&entry->entries_ref))
+		return -EBUSY;
+
+	if (remove_from_hash)
+		rhltable_remove(&table->tbl_entries, &entry->ht_node,
+				entry_hlt_params);
+
+	tcf_table_entry_mask_del(table, entry);
+	if (entry->entry_work->defer_deletion) {
+		call_rcu(&entry->rcu, tcf_table_entry_put_rcu);
+	} else {
+		synchronize_rcu();
+		tcf_table_entry_put(entry);
+	}
+
+	return 0;
+}
+
+/* Only deletes entries when called from pipeline delete, which means
+ * pipeline->p_ref will already be 0, so no need to use that refcount.
+ */
+void tcf_table_entry_destroy_hash(void *ptr, void *arg)
+{
+	struct p4tc_table *table = arg;
+	struct p4tc_table_entry *entry = ptr;
+
+	refcount_dec(&table->tbl_entries_ref);
+
+	entry->entry_work->defer_deletion = false;
+	tcf_table_entry_destroy(table, entry, false);
+}
+
+static void tcf_table_entry_put_table(struct p4tc_pipeline *pipeline,
+				      struct p4tc_table *table)
+{
+	/* If we are here, it means that this was just incremented, so it should be > 1 */
+	WARN_ON(!refcount_dec_not_one(&table->tbl_ctrl_ref));
+	WARN_ON(!refcount_dec_not_one(&pipeline->p_ctrl_ref));
+}
+
+static int tcf_table_entry_get_table(struct net *net,
+				     struct p4tc_pipeline **pipeline,
+				     struct p4tc_table **table,
+				     struct nlattr **tb, u32 *ids, char *p_name,
+				     struct netlink_ext_ack *extack)
+	__must_hold(RCU)
+{
+	u32 pipeid, tbl_id;
+	char *tblname;
+	int ret;
+
+	pipeid = ids[P4TC_PID_IDX];
+
+	*pipeline = tcf_pipeline_find_byany(net, p_name, pipeid, extack);
+	if (IS_ERR(*pipeline)) {
+		ret = PTR_ERR(*pipeline);
+		goto out;
+	}
+
+	if (!refcount_inc_not_zero(&((*pipeline)->p_ctrl_ref))) {
+		NL_SET_ERR_MSG(extack, "Pipeline is stale");
+		ret = -EBUSY;
+		goto out;
+	}
+
+	tbl_id = ids[P4TC_TBLID_IDX];
+
+	tblname = tb[P4TC_ENTRY_TBLNAME] ? nla_data(tb[P4TC_ENTRY_TBLNAME]) : NULL;
+	*table = tcf_table_find_byany(*pipeline, tblname, tbl_id, extack);
+	if (IS_ERR(*table)) {
+		ret = PTR_ERR(*table);
+		goto dec_pipeline_refcount;
+	}
+	if (!refcount_inc_not_zero(&((*table)->tbl_ctrl_ref))) {
+		NL_SET_ERR_MSG(extack, "Table is marked for deletion");
+		ret = -EBUSY;
+		goto dec_pipeline_refcount;
+	}
+
+	return 0;
+
+/* If we are here, it means that this was just incremented, so it should be > 1 */
+dec_pipeline_refcount:
+	WARN_ON(!refcount_dec_not_one(&((*pipeline)->p_ctrl_ref)));
+
+out:
+	return ret;
+}
+
+static void tcf_table_entry_assign_key(struct p4tc_table_entry_key *key,
+				       struct p4tc_table_entry_mask *mask,
+				       u8 *keyblob, u8 *maskblob, u32 keysz)
+{
+	/* Don't assign mask_id to key yet, because it has not been allocated */
+	memcpy(key->unmasked_key + KEY_MASK_ID_SZ, keyblob, keysz);
+
+	/* Don't assign mask_id to value yet, because it has not been allocated */
+	memcpy(mask->value + KEY_MASK_ID_SZ, maskblob, keysz);
+}
+
+static int tcf_table_entry_extract_key(struct nlattr **tb,
+				       struct p4tc_table_entry_key *key,
+				       struct p4tc_table_entry_mask *mask,
+				       struct netlink_ext_ack *extack)
+{
+	u32 internal_keysz;
+	u32 keysz;
+
+	if (!tb[P4TC_ENTRY_KEY_BLOB] || !tb[P4TC_ENTRY_MASK_BLOB]) {
+		NL_SET_ERR_MSG(extack, "Must specify key and mask blobs");
+		return -EINVAL;
+	}
+
+	keysz = nla_len(tb[P4TC_ENTRY_KEY_BLOB]);
+	internal_keysz = (keysz + KEY_MASK_ID_SZ) * BITS_PER_BYTE;
+	if (key->keysz != internal_keysz) {
+		NL_SET_ERR_MSG(extack,
+			       "Key blob size and table key size differ");
+		return -EINVAL;
+	}
+
+	if (keysz != nla_len(tb[P4TC_ENTRY_MASK_BLOB])) {
+		NL_SET_ERR_MSG(extack,
+			       "Key and mask blob must have the same length");
+		return -EINVAL;
+	}
+
+	tcf_table_entry_assign_key(key, mask, nla_data(tb[P4TC_ENTRY_KEY_BLOB]),
+				   nla_data(tb[P4TC_ENTRY_MASK_BLOB]), keysz);
+
+	return 0;
+}
+
+static void tcf_table_entry_build_key(struct p4tc_table_entry_key *key,
+				      struct p4tc_table_entry_mask *mask)
+{
+	u32 *mask_id;
+	int i;
+
+	mask_id = (u32 *)&key->unmasked_key[0];
+	*mask_id = mask->mask_id;
+
+	mask_id = (u32 *)&mask->value[0];
+	*mask_id = mask->mask_id;
+
+	for (i = 0; i < BITS_TO_BYTES(key->keysz); i++)
+		key->value[i] = key->unmasked_key[i] & mask->value[i];
+}
+
+static int ___tcf_table_entry_del(struct p4tc_pipeline *pipeline,
+				  struct p4tc_table *table,
+				  struct p4tc_table_entry *entry,
+				  bool from_control)
+	__must_hold(RCU)
+{
+	int ret = 0;
+
+	if (from_control) {
+		if (!p4tc_ctrl_delete_ok(entry->permissions))
+			return -EPERM;
+	} else {
+		if (!p4tc_data_delete_ok(entry->permissions))
+			return -EPERM;
+	}
+
+	if (!refcount_dec_not_one(&table->tbl_entries_ref))
+		return -EBUSY;
+
+	spin_lock_bh(&table->tbl_prio_idr_lock);
+	idr_remove(&table->tbl_prio_idr, entry->prio);
+	spin_unlock_bh(&table->tbl_prio_idr_lock);
+
+	if (tcf_table_entry_destroy(table, entry, true) < 0) {
+		ret = -EBUSY;
+		goto inc_entries_ref;
+	}
+
+	goto out;
+
+inc_entries_ref:
+	WARN_ON(!refcount_dec_not_one(&table->tbl_entries_ref));
+
+out:
+	return ret;
+}
+
+/* Internal function which will be called by the data path */
+static int __tcf_table_entry_del(struct p4tc_pipeline *pipeline,
+				 struct p4tc_table *table,
+				 struct p4tc_table_entry_key *key,
+				 struct p4tc_table_entry_mask *mask, u32 prio,
+				 struct netlink_ext_ack *extack)
+{
+	struct p4tc_table_entry *entry;
+	int ret;
+
+	tcf_table_entry_build_key(key, mask);
+
+	entry = p4tc_entry_lookup(table, key, prio);
+	if (!entry) {
+		rcu_read_unlock();
+		NL_SET_ERR_MSG(extack, "Unable to find entry");
+		return -EINVAL;
+	}
+
+	entry->entry_work->defer_deletion = true;
+	ret = ___tcf_table_entry_del(pipeline, table, entry, false);
+
+	return ret;
+}
+
+static int tcf_table_entry_gd(struct net *net, struct sk_buff *skb,
+			      struct nlmsghdr *n, struct nlattr *arg, u32 *ids,
+			      struct p4tc_nl_pname *nl_pname,
+			      struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[P4TC_ENTRY_MAX + 1] = { NULL };
+	struct
p4tc_table_entry *entry = NULL; + struct p4tc_pipeline *pipeline = NULL; + struct p4tc_table_entry_mask *mask, *new_mask; + struct p4tc_table_entry_key *key; + struct p4tc_table *table; + u32 keysz_bytes; + u32 prio; + int ret; + + if (arg) { + ret = nla_parse_nested(tb, P4TC_ENTRY_MAX, arg, + p4tc_entry_policy, extack); + + if (ret < 0) + return ret; + } + + if (!tb[P4TC_ENTRY_PRIO]) { + NL_SET_ERR_MSG(extack, "Must specify table entry priority"); + return -EINVAL; + } + prio = *((u32 *)nla_data(tb[P4TC_ENTRY_PRIO])); + + rcu_read_lock(); + ret = tcf_table_entry_get_table(net, &pipeline, &table, tb, ids, + nl_pname->data, extack); + rcu_read_unlock(); + if (ret < 0) + return ret; + + if (n->nlmsg_type == RTM_DELP4TBENT && !pipeline_sealed(pipeline)) { + NL_SET_ERR_MSG(extack, + "Unable to delete table entry in unsealed pipeline"); + ret = -EINVAL; + goto table_put; + } + + key = kzalloc(sizeof(*key), GFP_KERNEL); + if (!key) { + NL_SET_ERR_MSG(extack, "Unable to allocate key"); + ret = -ENOMEM; + goto table_put; + } + key->keysz = table->tbl_keysz + KEY_MASK_ID_SZ_BITS; + keysz_bytes = (key->keysz / BITS_PER_BYTE); + + mask = kzalloc(sizeof(*mask), GFP_KERNEL); + if (!mask) { + NL_SET_ERR_MSG(extack, "Failed to allocate mask"); + ret = -ENOMEM; + goto free_key; + } + mask->value = kzalloc(keysz_bytes, GFP_KERNEL); + if (!mask->value) { + NL_SET_ERR_MSG(extack, "Failed to allocate mask value"); + ret = -ENOMEM; + kfree(mask); + goto free_key; + } + mask->sz = key->keysz; + + key->value = kzalloc(keysz_bytes, GFP_KERNEL); + if (!key->value) { + ret = -ENOMEM; + kfree(mask->value); + kfree(mask); + goto free_key; + } + + key->unmasked_key = kzalloc(keysz_bytes, GFP_KERNEL); + if (!key->unmasked_key) { + ret = -ENOMEM; + kfree(mask->value); + kfree(mask); + goto free_key_value; + } + + ret = tcf_table_entry_extract_key(tb, key, mask, extack); + if (ret < 0) { + kfree(mask->value); + kfree(mask); + goto free_key_unmasked; + } + + new_mask = 
tcf_table_entry_mask_find_byvalue(table, mask); + kfree(mask->value); + kfree(mask); + if (!new_mask) { + NL_SET_ERR_MSG(extack, "Unable to find entry"); + ret = -ENOENT; + goto free_key_unmasked; + } else { + mask = new_mask; + } + + tcf_table_entry_build_key(key, mask); + + rcu_read_lock(); + entry = p4tc_entry_lookup(table, key, prio); + if (!entry) { + NL_SET_ERR_MSG(extack, "Unable to find entry"); + ret = -EINVAL; + goto unlock; + } + + if (n->nlmsg_type == RTM_GETP4TBENT) { + if (!p4tc_ctrl_read_ok(entry->permissions)) { + NL_SET_ERR_MSG(extack, + "Permission denied: Unable to read table entry"); + ret = -EINVAL; + goto unlock; + } + } + + if (p4tca_table_get_entry_fill(skb, table, entry, table->tbl_id) <= 0) { + NL_SET_ERR_MSG(extack, "Unable to fill table entry attributes"); + ret = -EINVAL; + goto unlock; + } + + if (n->nlmsg_type == RTM_DELP4TBENT) { + entry->entry_work->defer_deletion = true; + ret = ___tcf_table_entry_del(pipeline, table, entry, true); + if (ret < 0) + goto unlock; + } + + if (!ids[P4TC_PID_IDX]) + ids[P4TC_PID_IDX] = pipeline->common.p_id; + + if (!nl_pname->passed) + strscpy(nl_pname->data, pipeline->common.name, PIPELINENAMSIZ); + + ret = 0; + + goto unlock; + +unlock: + rcu_read_unlock(); + +free_key_unmasked: + kfree(key->unmasked_key); + +free_key_value: + kfree(key->value); + +free_key: + kfree(key); + +table_put: + tcf_table_entry_put_table(pipeline, table); + + return ret; +} + +static int tcf_table_entry_flush(struct net *net, struct sk_buff *skb, + struct nlmsghdr *n, struct nlattr *arg, + u32 *ids, struct p4tc_nl_pname *nl_pname, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[P4TC_ENTRY_MAX + 1] = { NULL }; + unsigned char *b = nlmsg_get_pos(skb); + int ret = 0; + int i = 0; + struct p4tc_pipeline *pipeline; + struct p4tc_table_entry *entry; + struct p4tc_table *table; + u32 arg_ids[P4TC_PATH_MAX - 1]; + struct rhashtable_iter iter; + + if (arg) { + ret = nla_parse_nested(tb, P4TC_ENTRY_MAX, arg, + 
p4tc_entry_policy, extack); + if (ret < 0) + return ret; + } + + rcu_read_lock(); + ret = tcf_table_entry_get_table(net, &pipeline, &table, tb, ids, + nl_pname->data, extack); + rcu_read_unlock(); + if (ret < 0) + return ret; + + if (!ids[P4TC_TBLID_IDX]) + arg_ids[P4TC_TBLID_IDX - 1] = table->tbl_id; + + if (nla_put(skb, P4TC_PATH, sizeof(arg_ids), arg_ids)) { + ret = -ENOMEM; + goto out_nlmsg_trim; + } + + rhltable_walk_enter(&table->tbl_entries, &iter); + do { + rhashtable_walk_start(&iter); + + while ((entry = rhashtable_walk_next(&iter)) && !IS_ERR(entry)) { + if (!p4tc_ctrl_delete_ok(entry->permissions)) { + ret = -EPERM; + continue; + } + + if (!refcount_dec_not_one(&table->tbl_entries_ref)) { + NL_SET_ERR_MSG(extack, "Table entry is stale"); + ret = -EBUSY; + rhashtable_walk_stop(&iter); + goto walk_exit; + } + + entry->entry_work->defer_deletion = true; + if (tcf_table_entry_destroy(table, entry, true) < 0) { + ret = -EBUSY; + continue; + } + i++; + } + + rhashtable_walk_stop(&iter); + } while (entry == ERR_PTR(-EAGAIN)); + +walk_exit: + rhashtable_walk_exit(&iter); + + nla_put_u32(skb, P4TC_COUNT, i); + + if (ret < 0) { + if (i == 0) { + if (!extack->_msg) + NL_SET_ERR_MSG(extack, + "Unable to flush any entries"); + goto out_nlmsg_trim; + } else { + if (!extack->_msg) + NL_SET_ERR_MSG(extack, + "Unable to flush all entries"); + } + } + + if (!ids[P4TC_PID_IDX]) + ids[P4TC_PID_IDX] = pipeline->common.p_id; + + if (!nl_pname->passed) + strscpy(nl_pname->data, pipeline->common.name, PIPELINENAMSIZ); + + ret = 0; + goto table_put; + +out_nlmsg_trim: + nlmsg_trim(skb, b); + +/* If we are here, it means that this was just incremented, so it should be > 1 */ +table_put: + tcf_table_entry_put_table(pipeline, table); + + return ret; +} + +/* Invoked from both control and data path */ +static int __tcf_table_entry_create(struct p4tc_pipeline *pipeline, + struct p4tc_table *table, + struct p4tc_table_entry *entry, + struct p4tc_table_entry_mask *mask, + u16 
whodunnit, bool from_control) + __must_hold(RCU) +{ + struct p4tc_table_perm *tbl_perm; + struct p4tc_table_entry_mask *mask_found; + struct p4tc_table_entry_work *entry_work; + struct p4tc_table_entry_tm *dtm; + u16 permissions; + int ret; + + refcount_set(&entry->entries_ref, 1); + + tbl_perm = rcu_dereference(table->tbl_permissions); + permissions = tbl_perm->permissions; + if (from_control) { + if (!p4tc_ctrl_create_ok(permissions)) + return -EPERM; + } else { + if (!p4tc_data_create_ok(permissions)) + return -EPERM; + } + + mask_found = tcf_table_entry_mask_add(table, entry, mask); + if (IS_ERR(mask_found)) { + ret = PTR_ERR(mask_found); + goto out; + } + + tcf_table_entry_build_key(&entry->key, mask_found); + + if (!refcount_inc_not_zero(&table->tbl_entries_ref)) { + ret = -EBUSY; + goto rm_masks_idr; + } + + if (p4tc_entry_lookup(table, &entry->key, entry->prio)) { + ret = -EEXIST; + goto dec_entries_ref; + } + + dtm = kzalloc(sizeof(*dtm), GFP_ATOMIC); + if (!dtm) { + ret = -ENOMEM; + goto dec_entries_ref; + } + + entry->who_created = whodunnit; + + dtm->created = jiffies; + dtm->firstused = 0; + dtm->lastused = jiffies; + rcu_assign_pointer(entry->tm, dtm); + + entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC); + if (!entry_work) { + ret = -ENOMEM; + goto free_tm; + } + + entry_work->pipeline = pipeline; + entry_work->entry = entry; + entry->entry_work = entry_work; + + INIT_WORK(&entry_work->work, tcf_table_entry_del_act_work); + + if (rhltable_insert(&table->tbl_entries, &entry->ht_node, + entry_hlt_params) < 0) { + ret = -EBUSY; + goto free_entry_work; + } + + return 0; + +free_entry_work: + kfree(entry_work); + +free_tm: + kfree(dtm); +/*If we are here, it means that this was just incremented, so it should be > 1 */ +dec_entries_ref: + WARN_ON(!refcount_dec_not_one(&table->tbl_entries_ref)); + +rm_masks_idr: + tcf_table_entry_mask_del(table, entry); + +out: + return ret; +} + +/* Invoked from both control and data path */ +static int 
__tcf_table_entry_update(struct p4tc_pipeline *pipeline, + struct p4tc_table *table, + struct p4tc_table_entry *entry, + struct p4tc_table_entry_mask *mask, + u16 whodunnit, bool from_control) + __must_hold(RCU) +{ + struct p4tc_table_entry_mask *mask_found; + struct p4tc_table_entry_work *entry_work; + struct p4tc_table_entry *entry_old; + struct p4tc_table_entry_tm *tm_old; + struct p4tc_table_entry_tm *tm; + int ret; + + refcount_set(&entry->entries_ref, 1); + + mask_found = tcf_table_entry_mask_add(table, entry, mask); + if (IS_ERR(mask_found)) { + ret = PTR_ERR(mask_found); + goto out; + } + + tcf_table_entry_build_key(&entry->key, mask_found); + + entry_old = p4tc_entry_lookup(table, &entry->key, entry->prio); + if (!entry_old) { + ret = -ENOENT; + goto rm_masks_idr; + } + + if (from_control) { + if (!p4tc_ctrl_update_ok(entry_old->permissions)) { + ret = -EPERM; + goto rm_masks_idr; + } + } else { + if (!p4tc_data_update_ok(entry_old->permissions)) { + ret = -EPERM; + goto rm_masks_idr; + } + } + + if (refcount_read(&entry_old->entries_ref) > 1) { + ret = -EBUSY; + goto rm_masks_idr; + } + + tm = kzalloc(sizeof(*tm), GFP_ATOMIC); + if (!tm) { + ret = -ENOMEM; + goto rm_masks_idr; + } + + tm_old = rcu_dereference_protected(entry_old->tm, 1); + tm->created = tm_old->created; + tm->firstused = tm_old->firstused; + tm->lastused = jiffies; + + entry->who_updated = whodunnit; + + entry->who_created = entry_old->who_created; + + if (entry->permissions == P4TC_PERMISSIONS_UNINIT) + entry->permissions = entry_old->permissions; + + rcu_assign_pointer(entry->tm, tm); + + entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC); + if (!entry_work) { + ret = -ENOMEM; + goto free_tm; + } + + entry_work->pipeline = pipeline; + entry_work->entry = entry; + entry->entry_work = entry_work; + + INIT_WORK(&entry_work->work, tcf_table_entry_del_act_work); + + if (rhltable_insert(&table->tbl_entries, &entry->ht_node, + entry_hlt_params) < 0) { + ret = -EEXIST; + goto 
free_entry_work; + } + + entry_old->entry_work->defer_deletion = true; + if (tcf_table_entry_destroy(table, entry_old, true) < 0) { + ret = -EBUSY; + goto out; + } + + return 0; + +free_entry_work: + kfree(entry_work); + +free_tm: + kfree(tm); + +rm_masks_idr: + tcf_table_entry_mask_del(table, entry); + +out: + return ret; +} + +#define P4TC_DEFAULT_TENTRY_PERMISSIONS \ + (P4TC_CTRL_PERM_R | P4TC_CTRL_PERM_U | P4TC_CTRL_PERM_D | \ + P4TC_DATA_PERM_R | P4TC_DATA_PERM_X) + +static bool tcf_table_check_entry_acts(struct p4tc_table *table, + struct tc_action *entry_acts[], + struct list_head *allowed_acts, + int num_entry_acts) +{ + struct p4tc_table_act *table_act; + int i; + + for (i = 0; i < num_entry_acts; i++) { + const struct tc_action *entry_act = entry_acts[i]; + + list_for_each_entry(table_act, allowed_acts, node) { + if (table_act->ops->id == entry_act->ops->id && + !(table_act->flags & BIT(P4TC_TABLE_ACTS_DEFAULT_ONLY))) + return true; + } + } + + return false; +} + +static int __tcf_table_entry_cu(struct net *net, u32 flags, struct nlattr **tb, + struct p4tc_table_entry *entry_cpy, + struct p4tc_pipeline *pipeline, + struct p4tc_table *table, + struct netlink_ext_ack *extack) +{ + u8 mask_value[KEY_MASK_ID_SZ + BITS_TO_BYTES(P4TC_MAX_KEYSZ)] = { 0 }; + struct p4tc_table_entry_mask mask = { 0 }; + u8 whodunnit = P4TC_ENTITY_UNSPEC; + int ret = 0; + struct p4tc_table_entry *entry; + u32 keysz_bytes; + u32 prio; + + prio = tb[P4TC_ENTRY_PRIO] ? 
*((u32 *)nla_data(tb[P4TC_ENTRY_PRIO])) : 0; + if (flags & NLM_F_REPLACE) { + if (!prio) { + NL_SET_ERR_MSG(extack, "Must specify entry priority"); + return -EINVAL; + } + } else { + if (!prio) { + prio = 1; + spin_lock(&table->tbl_prio_idr_lock); + ret = idr_alloc_u32(&table->tbl_prio_idr, + ERR_PTR(-EBUSY), &prio, UINT_MAX, + GFP_ATOMIC); + spin_unlock(&table->tbl_prio_idr_lock); + if (ret < 0) { + NL_SET_ERR_MSG(extack, + "Unable to allocate priority"); + return ret; + } + } else { + rcu_read_lock(); + if (idr_find(&table->tbl_prio_idr, prio)) { + rcu_read_unlock(); + NL_SET_ERR_MSG(extack, + "Priority already in use"); + return -EBUSY; + } + rcu_read_unlock(); + } + + if (refcount_read(&table->tbl_entries_ref) > table->tbl_max_entries) { + NL_SET_ERR_MSG(extack, + "Table instance max entries reached"); + return -EINVAL; + } + } + if (tb[P4TC_ENTRY_WHODUNNIT]) { + whodunnit = *((u8 *)nla_data(tb[P4TC_ENTRY_WHODUNNIT])); + } else { + NL_SET_ERR_MSG(extack, "Must specify whodunnit attribute"); + ret = -EINVAL; + goto idr_rm; + } + + entry = kzalloc(sizeof(*entry), GFP_KERNEL); + if (!entry) { + NL_SET_ERR_MSG(extack, "Unable to allocate table entry"); + ret = -ENOMEM; + goto idr_rm; + } + entry->prio = prio; + + entry->key.keysz = table->tbl_keysz + KEY_MASK_ID_SZ_BITS; + keysz_bytes = entry->key.keysz / BITS_PER_BYTE; + + mask.sz = entry->key.keysz; + mask.value = mask_value; + + entry->key.value = kzalloc(keysz_bytes, GFP_KERNEL); + if (!entry->key.value) { + ret = -ENOMEM; + goto free_entry; + } + + entry->key.unmasked_key = kzalloc(keysz_bytes, GFP_KERNEL); + if (!entry->key.unmasked_key) { + ret = -ENOMEM; + goto free_key_value; + } + + ret = tcf_table_entry_extract_key(tb, &entry->key, &mask, extack); + if (ret < 0) + goto free_key_unmasked; + + if (tb[P4TC_ENTRY_PERMISSIONS]) { + const u16 tblperm = + rcu_dereference(table->tbl_permissions)->permissions; + u16 nlperm; + + nlperm = *((u16 *)nla_data(tb[P4TC_ENTRY_PERMISSIONS])); + if (nlperm > 
P4TC_MAX_PERMISSION) { + NL_SET_ERR_MSG(extack, + "Permission may only have 10 bits turned on"); + ret = -EINVAL; + goto free_key_unmasked; + } + if (p4tc_ctrl_create_ok(nlperm) || + p4tc_data_create_ok(nlperm)) { + NL_SET_ERR_MSG(extack, + "Create permission for table entry doesn't make sense"); + ret = -EINVAL; + goto free_key_unmasked; + } + if (!p4tc_data_read_ok(nlperm)) { + NL_SET_ERR_MSG(extack, + "Data path read permission must be set"); + ret = -EINVAL; + goto free_key_unmasked; + } + if (!p4tc_data_exec_ok(nlperm)) { + NL_SET_ERR_MSG(extack, + "Data path execute permissions for entry must be set"); + ret = -EINVAL; + goto free_key_unmasked; + } + + if (~tblperm & nlperm) { + NL_SET_ERR_MSG(extack, + "Trying to set permission bits which aren't allowed by table"); + ret = -EINVAL; + goto free_key_unmasked; + } + entry->permissions = nlperm; + } else { + if (flags & NLM_F_REPLACE) + entry->permissions = P4TC_PERMISSIONS_UNINIT; + else + entry->permissions = P4TC_DEFAULT_TENTRY_PERMISSIONS; + } + + if (tb[P4TC_ENTRY_ACT]) { + entry->acts = kcalloc(TCA_ACT_MAX_PRIO, + sizeof(struct tc_action *), GFP_KERNEL); + if (!entry->acts) { + ret = -ENOMEM; + goto free_key_unmasked; + } + + ret = p4tc_action_init(net, tb[P4TC_ENTRY_ACT], entry->acts, + table->common.p_id, + TCA_ACT_FLAGS_NO_RTNL, extack); + if (ret < 0) { + kfree(entry->acts); + entry->acts = NULL; + goto free_key_unmasked; + } + entry->num_acts = ret; + + if (!tcf_table_check_entry_acts(table, entry->acts, + &table->tbl_acts_list, ret)) { + ret = -EPERM; + NL_SET_ERR_MSG(extack, + "Action is not allowed as entry action"); + goto free_acts; + } + } + + rcu_read_lock(); + if (flags & NLM_F_REPLACE) + ret = __tcf_table_entry_update(pipeline, table, entry, &mask, + whodunnit, true); + else + ret = __tcf_table_entry_create(pipeline, table, entry, &mask, + whodunnit, true); + if (ret < 0) { + rcu_read_unlock(); + goto free_acts; + } + + memcpy(entry_cpy, entry, sizeof(*entry)); + + rcu_read_unlock(); + + 
return 0; + +free_acts: + p4tc_action_destroy(entry->acts); + +free_key_unmasked: + kfree(entry->key.unmasked_key); + +free_key_value: + kfree(entry->key.value); + +free_entry: + kfree(entry); + +idr_rm: + if (!(flags & NLM_F_REPLACE)) { + spin_lock(&table->tbl_prio_idr_lock); + idr_remove(&table->tbl_prio_idr, prio); + spin_unlock(&table->tbl_prio_idr_lock); + } + + return ret; +} + +static int tcf_table_entry_cu(struct sk_buff *skb, struct net *net, u32 flags, + struct nlattr *arg, u32 *ids, + struct p4tc_nl_pname *nl_pname, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[P4TC_ENTRY_MAX + 1] = { NULL }; + struct p4tc_table_entry entry = { 0 }; + struct p4tc_pipeline *pipeline; + struct p4tc_table *table; + int ret; + + ret = nla_parse_nested(tb, P4TC_ENTRY_MAX, arg, p4tc_entry_policy, + extack); + if (ret < 0) + return ret; + + rcu_read_lock(); + ret = tcf_table_entry_get_table(net, &pipeline, &table, tb, ids, + nl_pname->data, extack); + rcu_read_unlock(); + if (ret < 0) + return ret; + + if (!pipeline_sealed(pipeline)) { + NL_SET_ERR_MSG(extack, + "Need to seal pipeline before issuing runtime command"); + ret = -EINVAL; + goto table_put; + } + + ret = __tcf_table_entry_cu(net, flags, tb, &entry, pipeline, table, + extack); + if (ret < 0) + goto table_put; + + if (p4tca_table_get_entry_fill(skb, table, &entry, table->tbl_id) <= 0) + NL_SET_ERR_MSG(extack, "Unable to fill table entry attributes"); + + if (!nl_pname->passed) + strscpy(nl_pname->data, pipeline->common.name, PIPELINENAMSIZ); + + if (!ids[P4TC_PID_IDX]) + ids[P4TC_PID_IDX] = pipeline->common.p_id; + +table_put: + tcf_table_entry_put_table(pipeline, table); + return ret; +} + +int tcf_table_const_entry_cu(struct net *net, struct nlattr *arg, + struct p4tc_table_entry *entry, + struct p4tc_pipeline *pipeline, + struct p4tc_table *table, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[P4TC_ENTRY_MAX + 1] = { NULL }; + int ret; + + ret = nla_parse_nested(tb, P4TC_ENTRY_MAX, arg, 
p4tc_entry_policy, + extack); + if (ret < 0) + return ret; + + return __tcf_table_entry_cu(net, 0, tb, entry, pipeline, table, extack); +} + +static int tc_ctl_p4_get_1(struct net *net, struct sk_buff *skb, + struct nlmsghdr *n, u32 *ids, struct nlattr *arg, + struct p4tc_nl_pname *nl_pname, + struct netlink_ext_ack *extack) +{ + int ret = 0; + struct nlattr *tb[P4TC_MAX + 1]; + u32 *arg_ids; + + ret = nla_parse_nested(tb, P4TC_MAX, arg, NULL, extack); + if (ret < 0) + return ret; + + if (!tb[P4TC_PATH]) { + NL_SET_ERR_MSG(extack, "Must specify object path"); + return -EINVAL; + } + + if (nla_len(tb[P4TC_PATH]) > (P4TC_PATH_MAX - 1) * sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Path is too big"); + return -E2BIG; + } + + arg_ids = nla_data(tb[P4TC_PATH]); + memcpy(&ids[P4TC_TBLID_IDX], arg_ids, nla_len(tb[P4TC_PATH])); + + return tcf_table_entry_gd(net, skb, n, tb[P4TC_PARAMS], ids, nl_pname, + extack); +} + +static int tc_ctl_p4_delete_1(struct net *net, struct sk_buff *skb, + struct nlmsghdr *n, struct nlattr *arg, u32 *ids, + struct p4tc_nl_pname *nl_pname, + struct netlink_ext_ack *extack) +{ + int ret = 0; + struct nlattr *tb[P4TC_MAX + 1]; + u32 *arg_ids; + + ret = nla_parse_nested(tb, P4TC_MAX, arg, NULL, extack); + if (ret < 0) + return ret; + + if (!tb[P4TC_PATH]) { + NL_SET_ERR_MSG(extack, "Must specify object path"); + return -EINVAL; + } + + if ((nla_len(tb[P4TC_PATH])) > (P4TC_PATH_MAX - 1) * sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Path is too big"); + return -E2BIG; + } + + arg_ids = nla_data(tb[P4TC_PATH]); + memcpy(&ids[P4TC_TBLID_IDX], arg_ids, nla_len(tb[P4TC_PATH])); + if (n->nlmsg_flags & NLM_F_ROOT) + ret = tcf_table_entry_flush(net, skb, n, tb[P4TC_PARAMS], ids, + nl_pname, extack); + else + ret = tcf_table_entry_gd(net, skb, n, tb[P4TC_PARAMS], ids, + nl_pname, extack); + + return ret; +} + +static int tc_ctl_p4_cu_1(struct net *net, struct sk_buff *skb, + struct nlmsghdr *n, u32 *ids, struct nlattr *nla, + struct p4tc_nl_pname *nl_pname, + 
struct netlink_ext_ack *extack) +{ + int ret = 0; + struct nlattr *p4tca[P4TC_MAX + 1]; + u32 *arg_ids; + + ret = nla_parse_nested(p4tca, P4TC_MAX, nla, NULL, extack); + if (ret < 0) + return ret; + + if (!p4tca[P4TC_PATH]) { + NL_SET_ERR_MSG(extack, "Must specify object path"); + return -EINVAL; + } + + if (nla_len(p4tca[P4TC_PATH]) > (P4TC_PATH_MAX - 1) * sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Path is too big"); + return -E2BIG; + } + + if (!p4tca[P4TC_PARAMS]) { + NL_SET_ERR_MSG(extack, "Must specify object attributes"); + return -EINVAL; + } + + arg_ids = nla_data(p4tca[P4TC_PATH]); + memcpy(&ids[P4TC_TBLID_IDX], arg_ids, nla_len(p4tca[P4TC_PATH])); + + return tcf_table_entry_cu(skb, net, n->nlmsg_flags, p4tca[P4TC_PARAMS], + ids, nl_pname, extack); +} + +static int tc_ctl_p4_table_n(struct sk_buff *skb, struct nlmsghdr *n, int cmd, + char *p_name, struct nlattr *nla, + struct netlink_ext_ack *extack) +{ + struct p4tcmsg *t = (struct p4tcmsg *)nlmsg_data(n); + struct net *net = sock_net(skb->sk); + u32 portid = NETLINK_CB(skb).portid; + u32 ids[P4TC_PATH_MAX] = { 0 }; + int ret = 0, ret_send; + struct nlattr *p4tca[P4TC_MSGBATCH_SIZE + 1]; + struct p4tc_nl_pname nl_pname; + struct sk_buff *new_skb; + struct p4tcmsg *t_new; + struct nlmsghdr *nlh; + struct nlattr *pnatt; + struct nlattr *root; + int i; + + ret = nla_parse_nested(p4tca, P4TC_MSGBATCH_SIZE, nla, NULL, extack); + if (ret < 0) + return ret; + + if (!p4tca[1]) { + NL_SET_ERR_MSG(extack, "No elements in root table array"); + return -EINVAL; + } + + new_skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!new_skb) + return -ENOBUFS; + + nlh = nlmsg_put(new_skb, portid, n->nlmsg_seq, cmd, sizeof(*t), + n->nlmsg_flags); + if (!nlh) + goto out; + + t_new = nlmsg_data(nlh); + t_new->pipeid = t->pipeid; + t_new->obj = t->obj; + ids[P4TC_PID_IDX] = t_new->pipeid; + + pnatt = nla_reserve(new_skb, P4TC_ROOT_PNAME, PIPELINENAMSIZ); + if (!pnatt) { + ret = -ENOMEM; + goto out; + } + + nl_pname.data = 
nla_data(pnatt); + if (!p_name) { + /* Filled up by the operation or forced failure */ + memset(nl_pname.data, 0, PIPELINENAMSIZ); + nl_pname.passed = false; + } else { + strscpy(nl_pname.data, p_name, PIPELINENAMSIZ); + nl_pname.passed = true; + } + + net = maybe_get_net(net); + if (!net) { + NL_SET_ERR_MSG(extack, "Net namespace is going down"); + ret = -EBUSY; + goto out; + } + + root = nla_nest_start(new_skb, P4TC_ROOT); + for (i = 1; i < P4TC_MSGBATCH_SIZE + 1 && p4tca[i]; i++) { + struct nlattr *nest = nla_nest_start(new_skb, i); + + if (cmd == RTM_GETP4TBENT) + ret = tc_ctl_p4_get_1(net, new_skb, nlh, ids, p4tca[i], + &nl_pname, extack); + else if (cmd == RTM_CREATEP4TBENT) + ret = tc_ctl_p4_cu_1(net, new_skb, nlh, ids, p4tca[i], + &nl_pname, extack); + else if (cmd == RTM_DELP4TBENT) + ret = tc_ctl_p4_delete_1(net, new_skb, nlh, p4tca[i], + ids, &nl_pname, extack); + + if (ret < 0) { + if (i == 1) { + goto put_net; + } else { + nla_nest_cancel(new_skb, nest); + break; + } + } + nla_nest_end(new_skb, nest); + } + nla_nest_end(new_skb, root); + + if (!t_new->pipeid) + t_new->pipeid = ids[P4TC_PID_IDX]; + + nlmsg_end(new_skb, nlh); + + if (cmd == RTM_GETP4TBENT) + ret_send = rtnl_unicast(new_skb, net, portid); + else + ret_send = rtnetlink_send(new_skb, net, portid, RTNLGRP_TC, + n->nlmsg_flags & NLM_F_ECHO); + + put_net(net); + + return ret_send ? 
ret_send : ret;
+
+put_net:
+	put_net(net);
+
+out:
+	kfree_skb(new_skb);
+	return ret;
+}
+
+static int tc_ctl_p4_root(struct sk_buff *skb, struct nlmsghdr *n, int cmd,
+			  struct netlink_ext_ack *extack)
+{
+	char *p_name = NULL;
+	int ret = 0;
+	struct nlattr *p4tca[P4TC_ROOT_MAX + 1];
+
+	ret = nlmsg_parse(n, sizeof(struct p4tcmsg), p4tca, P4TC_ROOT_MAX,
+			  p4tc_root_policy, extack);
+	if (ret < 0)
+		return ret;
+
+	if (!p4tca[P4TC_ROOT]) {
+		NL_SET_ERR_MSG(extack, "Netlink P4TC table attributes missing");
+		return -EINVAL;
+	}
+
+	if (p4tca[P4TC_ROOT_PNAME])
+		p_name = nla_data(p4tca[P4TC_ROOT_PNAME]);
+
+	return tc_ctl_p4_table_n(skb, n, cmd, p_name, p4tca[P4TC_ROOT], extack);
+}
+
+static int tc_ctl_p4_get(struct sk_buff *skb, struct nlmsghdr *n,
+			 struct netlink_ext_ack *extack)
+{
+	return tc_ctl_p4_root(skb, n, RTM_GETP4TBENT, extack);
+}
+
+static int tc_ctl_p4_delete(struct sk_buff *skb, struct nlmsghdr *n,
+			    struct netlink_ext_ack *extack)
+{
+	if (!netlink_capable(skb, CAP_NET_ADMIN))
+		return -EPERM;
+
+	return tc_ctl_p4_root(skb, n, RTM_DELP4TBENT, extack);
+}
+
+static int tc_ctl_p4_cu(struct sk_buff *skb, struct nlmsghdr *n,
+			struct netlink_ext_ack *extack)
+{
+	int ret;
+
+	if (!netlink_capable(skb, CAP_NET_ADMIN))
+		return -EPERM;
+
+	ret = tc_ctl_p4_root(skb, n, RTM_CREATEP4TBENT, extack);
+
+	return ret;
+}
+
+static int tcf_table_entry_dump(struct sk_buff *skb, struct nlattr *arg,
+				u32 *ids, struct netlink_callback *cb,
+				char **p_name, struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[P4TC_ENTRY_MAX + 1] = { NULL };
+	struct p4tc_dump_ctx *ctx = (void *)cb->ctx;
+	unsigned char *b = nlmsg_get_pos(skb);
+	struct p4tc_pipeline *pipeline = NULL;
+	struct p4tc_table_entry *entry = NULL;
+	struct net *net = sock_net(skb->sk);
+	int i = 0;
+	struct p4tc_table *table;
+	int ret;
+
+	net = maybe_get_net(net);
+	if (!net) {
+		NL_SET_ERR_MSG(extack, "Net namespace is going down");
+		return -EBUSY;
+	}
+
+	if (arg) {
+		ret = nla_parse_nested(tb,
P4TC_ENTRY_MAX, arg, + p4tc_entry_policy, extack); + if (ret < 0) { + kfree(ctx->iter); + goto net_put; + } + } + + rcu_read_lock(); + ret = tcf_table_entry_get_table(net, &pipeline, &table, tb, ids, + *p_name, extack); + rcu_read_unlock(); + if (ret < 0) { + kfree(ctx->iter); + goto net_put; + } + + if (!ctx->iter) { + ctx->iter = kzalloc(sizeof(*ctx->iter), GFP_KERNEL); + if (!ctx->iter) { + ret = -ENOMEM; + goto table_put; + } + + rhltable_walk_enter(&table->tbl_entries, ctx->iter); + } + + ret = -ENOMEM; + rhashtable_walk_start(ctx->iter); + do { + for (i = 0; i < P4TC_MSGBATCH_SIZE && + (entry = rhashtable_walk_next(ctx->iter)) && + !IS_ERR(entry); i++) { + struct nlattr *count; + + if (!p4tc_ctrl_read_ok(entry->permissions)) { + i--; + continue; + } + + count = nla_nest_start(skb, i + 1); + if (!count) { + rhashtable_walk_stop(ctx->iter); + goto table_put; + } + ret = p4tca_table_get_entry_fill(skb, table, entry, + table->tbl_id); + if (ret == 0) { + NL_SET_ERR_MSG(extack, + "Failed to fill notification attributes for table entry"); + goto walk_done; + } else if (ret == -ENOMEM) { + ret = 1; + nla_nest_cancel(skb, count); + rhashtable_walk_stop(ctx->iter); + goto table_put; + } + nla_nest_end(skb, count); + } + } while (entry == ERR_PTR(-EAGAIN)); + rhashtable_walk_stop(ctx->iter); + + if (!i) { + rhashtable_walk_exit(ctx->iter); + + ret = 0; + kfree(ctx->iter); + + goto table_put; + } + + if (!*p_name) + *p_name = pipeline->common.name; + + if (!ids[P4TC_PID_IDX]) + ids[P4TC_PID_IDX] = pipeline->common.p_id; + + ret = skb->len; + + goto table_put; + +walk_done: + rhashtable_walk_stop(ctx->iter); + rhashtable_walk_exit(ctx->iter); + kfree(ctx->iter); + + nlmsg_trim(skb, b); + +table_put: + tcf_table_entry_put_table(pipeline, table); + +net_put: + put_net(net); + + return ret; +} + +static int tc_ctl_p4_dump_1(struct sk_buff *skb, struct netlink_callback *cb, + struct nlattr *arg, char *p_name) +{ + struct netlink_ext_ack *extack = cb->extack; + u32 portid = 
NETLINK_CB(cb->skb).portid; + const struct nlmsghdr *n = cb->nlh; + u32 ids[P4TC_PATH_MAX] = { 0 }; + struct nlattr *tb[P4TC_MAX + 1]; + struct p4tcmsg *t_new; + struct nlmsghdr *nlh; + struct nlattr *root; + struct p4tcmsg *t; + u32 *arg_ids; + int ret; + + ret = nla_parse_nested(tb, P4TC_MAX, arg, p4tc_policy, extack); + if (ret < 0) + return ret; + + nlh = nlmsg_put(skb, portid, n->nlmsg_seq, RTM_GETP4TBENT, sizeof(*t), + n->nlmsg_flags); + if (!nlh) + return -ENOSPC; + + t = (struct p4tcmsg *)nlmsg_data(n); + t_new = nlmsg_data(nlh); + t_new->pipeid = t->pipeid; + t_new->obj = t->obj; + + if (!tb[P4TC_PATH]) { + NL_SET_ERR_MSG(extack, "Must specify object path"); + return -EINVAL; + } + + if ((nla_len(tb[P4TC_PATH])) > (P4TC_PATH_MAX - 1) * sizeof(u32)) { + NL_SET_ERR_MSG(extack, "Path is too big"); + return -E2BIG; + } + + ids[P4TC_PID_IDX] = t_new->pipeid; + arg_ids = nla_data(tb[P4TC_PATH]); + memcpy(&ids[P4TC_TBLID_IDX], arg_ids, nla_len(tb[P4TC_PATH])); + + root = nla_nest_start(skb, P4TC_ROOT); + ret = tcf_table_entry_dump(skb, tb[P4TC_PARAMS], ids, cb, &p_name, + extack); + if (ret <= 0) + goto out; + nla_nest_end(skb, root); + + if (p_name) { + if (nla_put_string(skb, P4TC_ROOT_PNAME, p_name)) { + ret = -1; + goto out; + } + } + + if (!t_new->pipeid) + t_new->pipeid = ids[P4TC_PID_IDX]; + + nlmsg_end(skb, nlh); + + return skb->len; + +out: + nlmsg_cancel(skb, nlh); + return ret; +} + +static int tc_ctl_p4_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + char *p_name = NULL; + int ret = 0; + struct nlattr *p4tca[P4TC_ROOT_MAX + 1]; + + ret = nlmsg_parse(cb->nlh, sizeof(struct p4tcmsg), p4tca, P4TC_ROOT_MAX, + p4tc_root_policy, cb->extack); + if (ret < 0) + return ret; + + if (!p4tca[P4TC_ROOT]) { + NL_SET_ERR_MSG(cb->extack, + "Netlink P4TC table attributes missing"); + return -EINVAL; + } + + if (p4tca[P4TC_ROOT_PNAME]) + p_name = nla_data(p4tca[P4TC_ROOT_PNAME]); + + return tc_ctl_p4_dump_1(skb, cb, p4tca[P4TC_ROOT], p_name); +} + +static 
int __init p4tc_tbl_init(void)
+{
+	rtnl_register(PF_UNSPEC, RTM_CREATEP4TBENT, tc_ctl_p4_cu, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+	rtnl_register(PF_UNSPEC, RTM_DELP4TBENT, tc_ctl_p4_delete, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+	rtnl_register(PF_UNSPEC, RTM_GETP4TBENT, tc_ctl_p4_get, tc_ctl_p4_dump,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+
+	return 0;
+}
+
+subsys_initcall(p4tc_tbl_init);
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 0a8daf2f8..3c26d4dc4 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -97,6 +97,9 @@ static const struct nlmsg_perm nlmsg_route_perms[] = {
 	{ RTM_CREATEP4TEMPLATE, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELP4TEMPLATE, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETP4TEMPLATE, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+	{ RTM_CREATEP4TBENT, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_DELP4TBENT, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_GETP4TBENT, NETLINK_ROUTE_SOCKET__NLMSG_READ },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] = {
@@ -179,7 +182,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_CREATEP4TEMPLATE + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_CREATEP4TBENT + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;