From patchwork Tue Aug 15 16:25:28 2023
X-Patchwork-Submitter: Jamal Hadi Salim
X-Patchwork-Id: 13353966
X-Patchwork-State: RFC
From: Jamal Hadi Salim
To: jiri@resnulli.us
Cc: xiyou.wangcong@gmail.com, netdev@vger.kernel.org, vladbu@nvidia.com,
    mleitner@redhat.com, Jamal Hadi Salim, Jiri Pirko, Victor Nogueira,
    Pedro Tammela
Subject: [PATCH RFC net-next 1/3] Introduce tc block netdev tracking infra
Date: Tue, 15 Aug 2023 12:25:28 -0400
Message-Id: <20230815162530.150994-2-jhs@mojatatu.com>
In-Reply-To: <20230815162530.150994-1-jhs@mojatatu.com>
References: <20230815162530.150994-1-jhs@mojatatu.com>

The tc block is a collection of netdevs/ports that allows qdiscs to share
filter block instances (as opposed to the traditional tc filter per
port). Example:

  $ tc qdisc add dev ens7 ingress block 22
  $ tc qdisc add dev ens8 ingress block 22

Now we can add a filter using the block index:

  $ tc filter add block 22 protocol ip pref 25 \
      flower dst_ip 192.168.0.0/16 action drop

Up to this point, the block is unaware of its ports. This patch fixes that
and makes the tc block ports available to the datapath, as well as to the
control path on offloading.

Suggested-by: Jiri Pirko
Co-developed-by: Victor Nogueira
Signed-off-by: Victor Nogueira
Co-developed-by: Pedro Tammela
Signed-off-by: Pedro Tammela
Signed-off-by: Jamal Hadi Salim
---
 include/net/sch_generic.h |  4 ++
 net/sched/cls_api.c       |  1 +
 net/sched/sch_api.c       | 82 +++++++++++++++++++++++++++++++++++++--
 net/sched/sch_generic.c   | 40 ++++++++++++++++++-
 4 files changed, 121 insertions(+), 6 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f232512505f8..f002b0423efc 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include

 struct Qdisc_ops;
 struct qdisc_walker;
@@ -126,6 +127,8 @@ struct Qdisc {
 	struct rcu_head rcu;
 	netdevice_tracker dev_tracker;
+	netdevice_tracker in_block_tracker;
+	netdevice_tracker eg_block_tracker;
 	/* private data */
 	long privdata[] ____cacheline_aligned;
 };
@@ -458,6 +461,7 @@ struct tcf_chain {
 };

 struct tcf_block {
+	struct xarray ports; /* datapath accessible */
 	/* Lock protects tcf_block and lifetime-management data of chains
 	 * attached to the block (refcnt, action_refcnt, explicitly_created).
	 */

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a193cc7b3241..a976792ef02f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1003,6 +1003,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	refcount_set(&block->refcnt, 1);
 	block->net = net;
 	block->index = block_index;
+	xa_init(&block->ports);

 	/* Don't store q pointer for blocks which are shared */
 	if (!tcf_block_shared(block))
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index aa6b1fe65151..744db6d50f77 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1180,6 +1180,73 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 	return 0;
 }

+//XXX: Does not seem necessary
+static void qdisc_block_undo_set(struct Qdisc *sch, struct nlattr **tca)
+{
+	if (tca[TCA_INGRESS_BLOCK])
+		sch->ops->ingress_block_set(sch, 0);
+
+	if (tca[TCA_EGRESS_BLOCK])
+		sch->ops->egress_block_set(sch, 0);
+}
+
+static int qdisc_block_add_dev(struct Qdisc *sch, struct net_device *dev,
+			       struct nlattr **tca,
+			       struct netlink_ext_ack *extack)
+{
+	const struct Qdisc_class_ops *cl_ops = sch->ops->cl_ops;
+	struct tcf_block *in_block = NULL;
+	struct tcf_block *eg_block = NULL;
+	unsigned long cl = 0;
+	int err;
+
+	if (tca[TCA_INGRESS_BLOCK]) {
+		/* works for both ingress and clsact */
+		cl = TC_H_MIN_INGRESS;
+		in_block = cl_ops->tcf_block(sch, cl, NULL);
+		if (!in_block) {
+			NL_SET_ERR_MSG(extack, "Shared ingress block missing");
+			return -EINVAL;
+		}
+
+		err = xa_insert(&in_block->ports, dev->ifindex, dev, GFP_KERNEL);
+		if (err) {
+			NL_SET_ERR_MSG(extack, "Ingress block dev insert failed");
+			return err;
+		}
+
+		netdev_hold(dev, &sch->in_block_tracker, GFP_KERNEL);
+	}
+
+	if (tca[TCA_EGRESS_BLOCK]) {
+		cl = TC_H_MIN_EGRESS;
+		eg_block = cl_ops->tcf_block(sch, cl, NULL);
+		if (!eg_block) {
+			NL_SET_ERR_MSG(extack, "Shared egress block missing");
+			err = -EINVAL;
+			goto err_out;
+		}
+
+		err = xa_insert(&eg_block->ports, dev->ifindex, dev,
				GFP_KERNEL);
+		if (err) {
+			NL_SET_ERR_MSG(extack, "Egress block dev insert failed");
+			goto err_out;
+		}
+		netdev_hold(dev, &sch->eg_block_tracker, GFP_KERNEL);
+	}
+
+	return 0;
+err_out:
+	if (in_block) {
+		xa_erase(&in_block->ports, dev->ifindex);
+		netdev_put(dev, &sch->in_block_tracker);
+	}
+	return err;
+}
+
+//XXX: Should we reset INGRESS if EGRESS fails?
 static int qdisc_block_indexes_set(struct Qdisc *sch, struct nlattr **tca,
 				   struct netlink_ext_ack *extack)
 {
@@ -1270,7 +1337,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	sch = qdisc_alloc(dev_queue, ops, extack);
 	if (IS_ERR(sch)) {
 		err = PTR_ERR(sch);
-		goto err_out2;
+		goto err_out1;
 	}

 	sch->parent = parent;
@@ -1289,7 +1356,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 		if (handle == 0) {
 			NL_SET_ERR_MSG(extack, "Maximum number of qdisc handles was exceeded");
 			err = -ENOSPC;
-			goto err_out3;
+			goto err_out2;
 		}
 	}
 	if (!netif_is_multiqueue(dev))
@@ -1311,7 +1378,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,

 	err = qdisc_block_indexes_set(sch, tca, extack);
 	if (err)
-		goto err_out3;
+		goto err_out2;

 	if (tca[TCA_STAB]) {
 		stab = qdisc_get_stab(tca[TCA_STAB], extack);
@@ -1350,6 +1417,10 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	qdisc_hash_add(sch, false);
 	trace_qdisc_create(ops, dev, parent);

+	err = qdisc_block_add_dev(sch, dev, tca, extack);
+	if (err)
+		goto err_out4;
+
 	return sch;

 err_out4:
@@ -1360,9 +1431,12 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	ops->destroy(sch);
 	qdisc_put_stab(rtnl_dereference(sch->stab));
 err_out3:
+	//XXX: not sure if we need to do this
+	qdisc_block_undo_set(sch, tca);
+err_out2:
 	netdev_put(dev, &sch->dev_tracker);
 	qdisc_free(sch);
-err_out2:
+err_out1:
 	module_put(ops->owner);
 err_out:
 	*errp = err;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 5d7e23f4cc0e..c62583bba06f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1048,7 +1048,12 @@ static void qdisc_free_cb(struct rcu_head *head)

 static void __qdisc_destroy(struct Qdisc *qdisc)
 {
-	const struct Qdisc_ops *ops = qdisc->ops;
+	struct net_device *dev = qdisc_dev(qdisc);
+	const struct Qdisc_ops *ops = qdisc->ops;
+	const struct Qdisc_class_ops *cops;
+	struct tcf_block *block;
+	unsigned long cl;
+	u32 block_index;

 #ifdef CONFIG_NET_SCHED
 	qdisc_hash_del(qdisc);
@@ -1059,11 +1064,42 @@ static void __qdisc_destroy(struct Qdisc *qdisc)

 	qdisc_reset(qdisc);

+	cops = ops->cl_ops;
+	if (ops->ingress_block_get) {
+		block_index = ops->ingress_block_get(qdisc);
+		if (block_index) {
+			/* XXX: will only work for clsact and ingress, we need
+			 * a flag for qdiscs instead of depending on this hack
+			 */
+			cl = TC_H_MIN_INGRESS;
+			block = cops->tcf_block(qdisc, cl, NULL);
+			if (block) {
+				if (xa_erase(&block->ports, dev->ifindex))
+					netdev_put(dev, &qdisc->in_block_tracker);
+			}
+		}
+	}
+
+	if (ops->egress_block_get) {
+		block_index = ops->egress_block_get(qdisc);
+		if (block_index) {
+			/* XXX: will only work for clsact, we need a flag for
+			 * qdiscs instead of depending on this hack
+			 */
+			cl = TC_H_MIN_EGRESS;
+			block = cops->tcf_block(qdisc, cl, NULL);
+			if (block) {
+				if (xa_erase(&block->ports, dev->ifindex))
+					netdev_put(dev, &qdisc->eg_block_tracker);
+			}
+		}
+	}
+
 	if (ops->destroy)
 		ops->destroy(qdisc);

 	module_put(ops->owner);
-	netdev_put(qdisc_dev(qdisc), &qdisc->dev_tracker);
+	netdev_put(dev, &qdisc->dev_tracker);

 	trace_qdisc_destroy(qdisc);

From patchwork Tue Aug 15 16:25:29 2023
X-Patchwork-Submitter: Jamal Hadi Salim
X-Patchwork-Id: 13353967
X-Patchwork-State: RFC
From: Jamal Hadi Salim
To: jiri@resnulli.us
Cc: xiyou.wangcong@gmail.com, netdev@vger.kernel.org, vladbu@nvidia.com,
    mleitner@redhat.com, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela
Subject: [PATCH RFC net-next 2/3] Expose tc block ports to the datapath
Date: Tue, 15 Aug 2023 12:25:29 -0400
Message-Id: <20230815162530.150994-3-jhs@mojatatu.com>
In-Reply-To: <20230815162530.150994-1-jhs@mojatatu.com>
References: <20230815162530.150994-1-jhs@mojatatu.com>

The datapath can now find the block of the port on which the packet
arrived, and can then use it for various activities. The next patch
introduces a simple action that multicasts to all ports in the block
except the one on which the packet arrived.
Co-developed-by: Victor Nogueira
Signed-off-by: Victor Nogueira
Co-developed-by: Pedro Tammela
Signed-off-by: Pedro Tammela
Signed-off-by: Jamal Hadi Salim
---
 include/net/sch_generic.h | 4 ++++
 net/sched/cls_api.c       | 6 +++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f002b0423efc..a99ac60426b3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -440,6 +440,8 @@ struct qdisc_skb_cb {
 	};
 #define QDISC_CB_PRIV_LEN 20
 	unsigned char data[QDISC_CB_PRIV_LEN];
+	/* This should allow eBPF to continue to align */
+	u32 block_index;
 };

 typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
@@ -488,6 +490,8 @@ struct tcf_block {
 	struct mutex proto_destroy_lock; /* Lock for proto_destroy hashtable. */
 };

+struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index);
+
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return lockdep_is_held(&chain->filter_chain_lock);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a976792ef02f..be4555714519 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1011,12 +1011,13 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	return block;
 }

-static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
+struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);

 	return idr_find(&tn->idr, block_index);
 }
+EXPORT_SYMBOL(tcf_block_lookup);

 static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
 {
@@ -1737,9 +1738,12 @@ int tcf_classify(struct sk_buff *skb,
 		 const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode)
 {
+	struct qdisc_skb_cb *qdisc_cb = qdisc_skb_cb(skb);
 #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	u32 last_executed_chain = 0;

+	qdisc_cb->block_index = block->index;
+
 	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL,
			      0, &last_executed_chain);
 #else

From patchwork Tue Aug 15 16:25:30 2023
X-Patchwork-Submitter: Jamal Hadi Salim
X-Patchwork-Id: 13353968
X-Patchwork-State: RFC
From: Jamal Hadi Salim
To: jiri@resnulli.us
Cc: xiyou.wangcong@gmail.com, netdev@vger.kernel.org, vladbu@nvidia.com,
    mleitner@redhat.com, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela
Subject: [PATCH RFC net-next 3/3] Introduce blockcast tc action
Date: Tue, 15 Aug 2023 12:25:30 -0400
Message-Id: <20230815162530.150994-4-jhs@mojatatu.com>
In-Reply-To: <20230815162530.150994-1-jhs@mojatatu.com>
References: <20230815162530.150994-1-jhs@mojatatu.com>

This action takes advantage of the tc block port set present in the
datapath and broadcasts a packet to all ports in that set except the port on
which it arrived. Example usage:

  $ tc qdisc add dev ens7 ingress block 22
  $ tc qdisc add dev ens8 ingress block 22

Now we can add a filter using the block index:

  $ tc filter add block 22 protocol ip pref 25 \
      flower dst_ip 192.168.0.0/16 action blockcast

Co-developed-by: Victor Nogueira
Signed-off-by: Victor Nogueira
Co-developed-by: Pedro Tammela
Signed-off-by: Pedro Tammela
Signed-off-by: Jamal Hadi Salim
---
 include/net/tc_wrapper.h  |   5 +
 net/sched/Kconfig         |  13 ++
 net/sched/Makefile        |   1 +
 net/sched/act_blockcast.c | 302 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 321 insertions(+)
 create mode 100644 net/sched/act_blockcast.c

diff --git a/include/net/tc_wrapper.h b/include/net/tc_wrapper.h
index a6d481b5bcbc..8ef848968be7 100644
--- a/include/net/tc_wrapper.h
+++ b/include/net/tc_wrapper.h
@@ -28,6 +28,7 @@ TC_INDIRECT_ACTION_DECLARE(tcf_csum_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ct_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ctinfo_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_gact_act);
+TC_INDIRECT_ACTION_DECLARE(tcf_blockcast_run);
 TC_INDIRECT_ACTION_DECLARE(tcf_gate_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ife_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ipt_act);
@@ -57,6 +58,10 @@ static inline int tc_act(struct sk_buff *skb, const struct tc_action *a,
 	if (a->ops->act == tcf_mirred_act)
 		return tcf_mirred_act(skb, a, res);
 #endif
+#if IS_BUILTIN(CONFIG_NET_ACT_BLOCKCAST)
+	if (a->ops->act == tcf_blockcast_run)
+		return tcf_blockcast_run(skb, a, res);
+#endif
 #if IS_BUILTIN(CONFIG_NET_ACT_PEDIT)
 	if (a->ops->act == tcf_pedit_act)
 		return tcf_pedit_act(skb, a, res);
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 470c70deffe2..abf26f0c921f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -780,6 +780,19 @@ config NET_ACT_SIMP
 	  To compile this code as a module, choose M here: the
 	  module will be called act_simple.
+config NET_ACT_BLOCKCAST
+	tristate "TC block Multicast"
+	depends on NET_CLS_ACT
+	help
+	  Say Y here to add an action that will multicast an skb to the
+	  egress of all netdevs that belong to a tc block, except for the
+	  netdev on which the skb arrived.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_blockcast.
+
 config NET_ACT_SKBEDIT
 	tristate "SKB Editing"
 	depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index b5fd49641d91..2cdcf30645eb 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
+obj-$(CONFIG_NET_ACT_BLOCKCAST)	+= act_blockcast.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_MPLS)	+= act_mpls.o
diff --git a/net/sched/act_blockcast.c b/net/sched/act_blockcast.c
new file mode 100644
index 000000000000..1c9e49d68540
--- /dev/null
+++ b/net/sched/act_blockcast.c
@@ -0,0 +1,302 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* net/sched/act_blockcast.c  Block Cast action
+ * Copyright (c) 2023, Mojatatu Networks
+ * Authors:	Jamal Hadi Salim
+ *		Victor Nogueira
+ *		Pedro Tammela
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+static struct tc_action_ops act_blockcast_ops;
+
+struct tcf_blockcast_act {
+	struct tc_action common;
+};
+
+#define to_blockcast_act(a) ((struct tcf_blockcast_act *)a)
+
+#define TCA_ID_BLOCKCAST 123
+#define CAST_RECURSION_LIMIT 4
+
+static DEFINE_PER_CPU(unsigned int, redirect_rec_level);
+
+//XXX: Refactor mirred code and reuse here before final version
+static int cast_one(struct sk_buff *skb, const u32 ifindex)
+{
+	struct sk_buff *skb2 = skb;
+	int retval = TC_ACT_PIPE;
+	struct net_device *dev;
+	unsigned int rec_level;
+	bool expects_nh;
+	int mac_len;
+	bool at_nh;
+	int err;
+
+	rec_level = __this_cpu_inc_return(redirect_rec_level);
+	if (unlikely(rec_level > CAST_RECURSION_LIMIT)) {
+		net_warn_ratelimited("blockcast: exceeded redirect recursion limit on dev %s\n",
+				     netdev_name(skb->dev));
+		__this_cpu_dec(redirect_rec_level);
+		return TC_ACT_SHOT;
+	}
+
+	dev = dev_get_by_index_rcu(dev_net(skb->dev), ifindex);
+	if (unlikely(!dev)) {
+		pr_notice_once("blockcast: target device with ifindex %u is gone\n",
+			       ifindex);
+		__this_cpu_dec(redirect_rec_level);
+		return TC_ACT_SHOT;
+	}
+
+	if (unlikely(!(dev->flags & IFF_UP))) {
+		net_notice_ratelimited("blockcast: device %s is down\n",
+				       dev->name);
+		__this_cpu_dec(redirect_rec_level);
+		return TC_ACT_SHOT;
+	}
+
+	skb2 = skb_clone(skb, GFP_ATOMIC);
+	if (!skb2) {
+		__this_cpu_dec(redirect_rec_level);
+		return retval;
+	}
+
+	nf_reset_ct(skb2);
+
+	expects_nh = !dev_is_mac_header_xmit(dev);
+	at_nh = skb->data == skb_network_header(skb);
+	if (at_nh != expects_nh) {
+		mac_len = skb_at_tc_ingress(skb) ? skb->mac_len :
+			  skb_network_header(skb) - skb_mac_header(skb);
+
+		if (expects_nh) {
+			/* target device/action expect data at nh */
+			skb_pull_rcsum(skb2, mac_len);
+		} else {
+			/* target device/action expect data at mac */
+			skb_push_rcsum(skb2, mac_len);
+		}
+	}
+
+	skb2->skb_iif = skb->dev->ifindex;
+	skb2->dev = dev;
+
+	err = dev_queue_xmit(skb2);
+	if (err)
+		retval = TC_ACT_SHOT;
+
+	__this_cpu_dec(redirect_rec_level);
+
+	return retval;
+}
+
+TC_INDIRECT_SCOPE int tcf_blockcast_run(struct sk_buff *skb,
+					const struct tc_action *a,
+					struct tcf_result *res)
+{
+	u32 block_index = qdisc_skb_cb(skb)->block_index;
+	struct tcf_blockcast_act *p = to_blockcast_act(a);
+	int action = READ_ONCE(p->tcf_action);
+	struct net *net = dev_net(skb->dev);
+	struct tcf_block *block;
+	struct net_device *dev;
+	u32 exception_ifindex;
+	unsigned long index;
+
+	block = tcf_block_lookup(net, block_index);
+	exception_ifindex = skb->dev->ifindex;
+
+	tcf_action_update_bstats(&p->common, skb);
+	tcf_lastuse_update(&p->tcf_tm);
+
+	if (!block || xa_empty(&block->ports))
+		goto act_done;
+
+	/* we are already under rcu protection, so iterating block is safe */
+	xa_for_each(&block->ports, index, dev) {
+		int err;
+
+		if (index == exception_ifindex)
+			continue;
+
+		err = cast_one(skb, dev->ifindex);
+		if (err != TC_ACT_PIPE)
+			net_warn_ratelimited("blockcast: (%d) failed to send to dev %d: %s\n",
+					     err, dev->ifindex, dev->name);
+	}
+
+act_done:
+	if (action == TC_ACT_SHOT)
+		tcf_action_inc_drop_qstats(&p->common);
+	return action;
+}
+
+static const struct nla_policy blockcast_policy[TCA_DEF_MAX + 1] = {
+	[TCA_DEF_PARMS]	= { .len = sizeof(struct tc_defact) },
+};
+
+static int tcf_blockcast_init(struct net *net, struct nlattr *nla,
+			      struct nlattr *est, struct tc_action **a,
+			      struct tcf_proto *tp, u32 flags,
+			      struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, act_blockcast_ops.net_id);
+	bool bind = flags & TCA_ACT_FLAGS_BIND;
+	struct nlattr *tb[TCA_DEF_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_blockcast_act *p;
+	struct tc_defact *parm;
+	bool exists = false;
+	int ret = 0, err;
+	u32 index;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested_deprecated(tb, TCA_DEF_MAX, nla,
+					  blockcast_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_DEF_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_DEF_PARMS]);
+	index = parm->index;
+
+	err = tcf_idr_check_alloc(tn, &index, a, bind);
+	if (err < 0)
+		return err;
+
+	exists = err;
+	if (exists && bind)
+		return 0;
+
+	if (!exists) {
+		ret = tcf_idr_create_from_flags(tn, index, est, a,
+						&act_blockcast_ops, bind,
+						flags);
+		if (ret) {
+			tcf_idr_cleanup(tn, index);
+			return ret;
+		}
+
+		ret = ACT_P_CREATED;
+	} else {
+		if (!(flags & TCA_ACT_FLAGS_REPLACE)) {
+			err = -EEXIST;
+			goto release_idr;
+		}
+	}
+
+	err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	p = to_blockcast_act(*a);
+	if (exists)
+		spin_lock_bh(&p->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+	if (exists)
+		spin_unlock_bh(&p->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+
+	return ret;
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static int tcf_blockcast_dump(struct sk_buff *skb, struct tc_action *a,
+			      int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_blockcast_act *p = to_blockcast_act(a);
+	struct tc_defact opt = {
+		.index = p->tcf_index,
+		.refcnt = refcount_read(&p->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&p->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&p->tcf_lock);
+	opt.action = p->tcf_action;
+	if (nla_put(skb, TCA_DEF_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &p->tcf_tm);
+	if (nla_put_64bit(skb, TCA_DEF_TM, sizeof(t), &t, TCA_DEF_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&p->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&p->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_blockcast_ops = {
+	.kind		= "blockcast",
+	.id		= TCA_ID_BLOCKCAST,
+	.owner		= THIS_MODULE,
+	.act		= tcf_blockcast_run,
+	.dump		= tcf_blockcast_dump,
+	.init		= tcf_blockcast_init,
+	.size		= sizeof(struct tcf_blockcast_act),
+};
+
+static __net_init int blockcast_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, act_blockcast_ops.net_id);
+
+	return tc_action_net_init(net, tn, &act_blockcast_ops);
+}
+
+static void __net_exit blockcast_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, act_blockcast_ops.net_id);
+}
+
+static struct pernet_operations blockcast_net_ops = {
+	.init = blockcast_init_net,
+	.exit_batch = blockcast_exit_net,
+	.id   = &act_blockcast_ops.net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+MODULE_AUTHOR("Mojatatu Networks, Inc");
+MODULE_LICENSE("GPL");
+
+static int __init blockcast_init_module(void)
+{
+	int ret = tcf_register_action(&act_blockcast_ops, &blockcast_net_ops);
+
+	if (!ret)
+		pr_info("blockcast TC action Loaded\n");
+	return ret;
+}
+
+static void __exit blockcast_cleanup_module(void)
+{
+	tcf_unregister_action(&act_blockcast_ops, &blockcast_net_ops);
+}
+
+module_init(blockcast_init_module);
+module_exit(blockcast_cleanup_module);