From patchwork Wed Dec 22 12:08:09 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tonghao Zhang
X-Patchwork-Id: 12691461
X-Patchwork-Delegate: kuba@kernel.org
From: xiangxia.m.yue@gmail.com
To: netdev@vger.kernel.org
Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
 "David S. Miller", Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
 Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
 Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi, Antoine Tenart,
 Wei Wang, Arnd Bergmann
Subject: [net-next v6 2/2] net: sched: support hash/classid/cpuid selecting tx queue
Date: Wed, 22 Dec 2021 20:08:09 +0800
Message-Id: <20211222120809.2222-3-xiangxia.m.yue@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
In-Reply-To: <20211222120809.2222-1-xiangxia.m.yue@gmail.com>
References: <20211222120809.2222-1-xiangxia.m.yue@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Tonghao Zhang

This patch allows the user to select a queue_mapping range, from A to B.
The user can also use the skb hash, cgroup classid, or cpuid to select
Tx queues, so that packets are load balanced across queues A to B.
A and B are unsigned 16-bit values in decimal format.

$ tc filter ... action skbedit queue_mapping skbhash A B

"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit")
is enhanced with the flags:
* SKBEDIT_F_TXQ_SKBHASH
* SKBEDIT_F_TXQ_CLASSID
* SKBEDIT_F_TXQ_CPUID

Use skb->hash, cgroup classid, or cpuid to distribute packets.
The same range of Tx queues can then be shared by different flows,
cgroups, or CPUs in a variety of scenarios.

For example, flow F1 may share range R1 with flow F2. The best way to
do that is to set the SKBEDIT_F_TXQ_SKBHASH flag and use skb->hash to
share the queues. If cgroup C1 wants to share R1 with cgroups C2 .. Cn,
use SKBEDIT_F_TXQ_CLASSID. In other scenarios, C1 can use R1 while Cn
uses Rn.

Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Jonathan Lemon
Cc: Eric Dumazet
Cc: Alexander Lobakin
Cc: Paolo Abeni
Cc: Talal Ahmad
Cc: Kevin Hao
Cc: Ilias Apalodimas
Cc: Kees Cook
Cc: Kumar Kartikeya Dwivedi
Cc: Antoine Tenart
Cc: Wei Wang
Cc: Arnd Bergmann
Signed-off-by: Tonghao Zhang
---
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
 net/sched/act_skbedit.c                | 78 +++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..ee96e0fa6566 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@ struct tcf_skbedit_params {
 	u32 mark;
 	u32 mask;
 	u16 queue_mapping;
+	u16 mapping_mod;
 	u16 ptype;
 	struct rcu_head rcu;
 };
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..5ea1438a4d88 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,13 @@
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_TXQ_SKBHASH		0x40
+#define SKBEDIT_F_TXQ_CLASSID		0x80
+#define SKBEDIT_F_TXQ_CPUID		0x100
+
+#define SKBEDIT_F_TXQ_HASH_MASK (SKBEDIT_F_TXQ_SKBHASH | \
+				 SKBEDIT_F_TXQ_CLASSID | \
+				 SKBEDIT_F_TXQ_CPUID)
 
 struct tc_skbedit {
 	tc_gen;
@@ -45,6 +52,7 @@ enum {
 	TCA_SKBEDIT_PTYPE,
 	TCA_SKBEDIT_MASK,
 	TCA_SKBEDIT_FLAGS,
+	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 325991080a8a..9493b3102923 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -23,6 +24,38 @@
 static unsigned int skbedit_net_id;
 static struct tc_action_ops act_skbedit_ops;
 
+static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
+			    struct sk_buff *skb)
+{
+	u32 mapping_hash_type = params->flags & SKBEDIT_F_TXQ_HASH_MASK;
+	u16 queue_mapping = params->queue_mapping;
+	u16 mapping_mod = params->mapping_mod;
+	u32 hash = 0;
+
+	switch (mapping_hash_type) {
+	case SKBEDIT_F_TXQ_CLASSID:
+		hash = task_get_classid(skb);
+		break;
+	case SKBEDIT_F_TXQ_SKBHASH:
+		hash = skb_get_hash(skb);
+		break;
+	case SKBEDIT_F_TXQ_CPUID:
+		hash = raw_smp_processor_id();
+		break;
+	case 0:
+		/* Hash type isn't specified. In this case:
+		 * hash % mapping_mod == 0
+		 */
+		break;
+	default:
+		net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
+				     mapping_hash_type);
+	}
+
+	queue_mapping = queue_mapping + hash % mapping_mod;
+	return netdev_cap_txqueue(skb->dev, queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -62,7 +95,7 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 #ifdef CONFIG_NET_EGRESS
 		netdev_xmit_skip_txqueue(true);
 #endif
-		skb_set_queue_mapping(skb, params->queue_mapping);
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
@@ -96,6 +129,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
 	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
+	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -112,6 +146,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u16 mapping_mod = 1;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -156,7 +191,34 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 
 	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
 		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
-
+		u64 mapping_hash_type;
+
+		mapping_hash_type = *pure_flags & SKBEDIT_F_TXQ_HASH_MASK;
+		if (mapping_hash_type) {
+			u16 *queue_mapping_max;
+
+			/* Hash types are mutually exclusive. */
+			if (mapping_hash_type & (mapping_hash_type - 1)) {
+				NL_SET_ERR_MSG_MOD(extack, "Multi types of hash are specified.");
+				return -EINVAL;
+			}
+
+			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING] ||
+			    !tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]) {
+				NL_SET_ERR_MSG_MOD(extack, "Missing required range of queue_mapping.");
+				return -EINVAL;
+			}
+
+			queue_mapping_max =
+				nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
+			if (*queue_mapping_max < *queue_mapping) {
+				NL_SET_ERR_MSG_MOD(extack, "The range of queue_mapping is invalid, max < min.");
+				return -EINVAL;
+			}
+
+			mapping_mod = *queue_mapping_max - *queue_mapping + 1;
+			flags |= mapping_hash_type;
+		}
 		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
 			flags |= SKBEDIT_F_INHERITDSFIELD;
 	}
@@ -208,8 +270,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->flags = flags;
 	if (flags & SKBEDIT_F_PRIORITY)
 		params_new->priority = *priority;
-	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
 		params_new->queue_mapping = *queue_mapping;
+		params_new->mapping_mod = mapping_mod;
+	}
 	if (flags & SKBEDIT_F_MARK)
 		params_new->mark = *mark;
 	if (flags & SKBEDIT_F_PTYPE)
@@ -281,6 +345,13 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
 		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (params->flags & SKBEDIT_F_TXQ_HASH_MASK) {
+		if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
+				params->queue_mapping + params->mapping_mod - 1))
+			goto nla_put_failure;
+
+		pure_flags |= params->flags & SKBEDIT_F_TXQ_HASH_MASK;
+	}
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;
@@ -335,6 +406,7 @@ static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
 	return nla_total_size(sizeof(struct tc_skbedit))
 		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_PRIORITY */
 		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_QUEUE_MAPPING */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_QUEUE_MAPPING_MAX */
 		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
 		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
 		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
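
Illustration only, not part of the patch: the queue selection described in
the commit message (pick a queue in [A, B] from a hash) can be tried out in
userspace with the rough sketch below. The helper names txq_hash_type_valid(),
txq_range_size() and txq_select() are hypothetical; only the flag values come
from the uapi hunk above. The sketch assumes that an out-of-range queue index
falls back to queue 0, as netdev_cap_txqueue() does, and it keeps the range
size in 32 bits so that the full range 0..65535 does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the uapi hunk of the patch. */
#define SKBEDIT_F_TXQ_SKBHASH	0x40
#define SKBEDIT_F_TXQ_CLASSID	0x80
#define SKBEDIT_F_TXQ_CPUID	0x100
#define SKBEDIT_F_TXQ_HASH_MASK	(SKBEDIT_F_TXQ_SKBHASH | \
				 SKBEDIT_F_TXQ_CLASSID | \
				 SKBEDIT_F_TXQ_CPUID)

/* Hypothetical helper: returns 1 when at most one hash-type bit is set.
 * x & (x - 1) clears the lowest set bit, so the result is zero only for
 * zero or a single-bit value.
 */
static int txq_hash_type_valid(uint64_t flags)
{
	uint64_t type = flags & SKBEDIT_F_TXQ_HASH_MASK;

	return (type & (type - 1)) == 0;
}

/* Hypothetical helper: number of queues in [min, max], 0 if max < min. */
static uint32_t txq_range_size(uint16_t min, uint16_t max)
{
	if (max < min)
		return 0;
	return (uint32_t)max - min + 1;
}

/* Hypothetical helper: min + hash % mod, truncated to 16 bits like the
 * u16 assignment in the patch, then capped the way netdev_cap_txqueue()
 * caps it (fall back to queue 0 when the index is out of range).
 */
static uint16_t txq_select(uint16_t min, uint32_t mod, uint32_t hash,
			   uint16_t real_num_tx_queues)
{
	uint16_t q;

	assert(mod != 0);	/* callers pass txq_range_size() > 0 */
	q = (uint16_t)(min + hash % mod);
	return q >= real_num_tx_queues ? 0 : q;
}
```

With A = 2 and B = 5 the range size is 4, so a hash of 10 selects queue
2 + 10 % 4 = 4. When no hash type is set, the hash stays 0 and the action
behaves like a plain "queue_mapping A".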