From patchwork Mon Dec 6 08:05:11 2021
X-Patchwork-Submitter: Tonghao Zhang
X-Patchwork-Id: 12657751
X-Patchwork-Delegate: kuba@kernel.org
From: xiangxia.m.yue@gmail.com
To: netdev@vger.kernel.org
Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
    "David S. Miller", Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
    Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
    Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
    Antoine Tenart, Wei Wang, Arnd Bergmann
Subject: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
Date: Mon, 6 Dec 2021 16:05:11 +0800
Message-Id: <20211206080512.36610-2-xiangxia.m.yue@gmail.com>
In-Reply-To: <20211206080512.36610-1-xiangxia.m.yue@gmail.com>
References: <20211206080512.36610-1-xiangxia.m.yue@gmail.com>
List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Tonghao Zhang

This patch fixes an issue:
* If we install tc filters with act_skbedit at the clsact hook, the
  action does not take effect, because netdev_core_pick_tx() will
  overwrite the queue_mapping it sets.
  $ tc filter add dev $NETDEV egress .. action skbedit queue_mapping 1

And this patch is useful:
* In a container networking environment, one kind of pod/container/
  net-namespace (e.g. P1, P2) whose outbound traffic is rate limited
  can use a specific tx queue managed by an HTB/TBF qdisc, while other
  kinds of pods (e.g. Pn) can use other tx queues managed by a FIFO
  qdisc. The lock contention on the HTB/TBF qdisc then does not
  affect Pn.

      +----+      +----+      +----+
      | P1 |      | P2 |      | Pn |
      +----+      +----+      +----+
         |           |           |
         +-----------+-----------+
                     |
                     | clsact/skbedit
                     |      MQ
                     v
         +-----------+-----------+
         | q0        | q1        | qn
         v           v           v
        HTB         HTB   ...   FIFO

Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Jonathan Lemon
Cc: Eric Dumazet
Cc: Alexander Lobakin
Cc: Paolo Abeni
Cc: Talal Ahmad
Cc: Kevin Hao
Cc: Ilias Apalodimas
Cc: Kees Cook
Cc: Kumar Kartikeya Dwivedi
Cc: Antoine Tenart
Cc: Wei Wang
Cc: Arnd Bergmann
Signed-off-by: Tonghao Zhang
---
 include/linux/skbuff.h  |  1 +
 net/core/dev.c          | 12 +++++++++---
 net/sched/act_skbedit.c |  4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eae4bd3237a4..b6ea4b920409 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -856,6 +856,7 @@ struct sk_buff {
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
+	__u8			tc_skip_txqueue:1;
 	__u8			tc_at_ingress:1;
 #endif
 	__u8			redirected:1;

diff --git a/net/core/dev.c b/net/core/dev.c
index aba8acc1238c..fb9d4eee29ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3975,10 +3975,16 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
 {
 	int queue_index = 0;

-#ifdef CONFIG_XPS
-	u32 sender_cpu = skb->sender_cpu - 1;
+#ifdef CONFIG_NET_CLS_ACT
+	if (skb->tc_skip_txqueue) {
+		queue_index = netdev_cap_txqueue(dev,
+						 skb_get_queue_mapping(skb));
+		return netdev_get_tx_queue(dev, queue_index);
+	}
+#endif

-	if (sender_cpu >= (u32)NR_CPUS)
+#ifdef CONFIG_XPS
+	if ((skb->sender_cpu - 1) >= (u32)NR_CPUS)
 		skb->sender_cpu = raw_smp_processor_id() + 1;
 #endif

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index d30ecbfc8f84..940091a7c7f0 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -58,8 +58,10 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping)
+	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+		skb->tc_skip_txqueue = 1;
 		skb_set_queue_mapping(skb, params->queue_mapping);
+	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
 		skb->mark |= params->mark & params->mask;

From patchwork Mon Dec 6 08:05:12 2021
X-Patchwork-Submitter: Tonghao Zhang
X-Patchwork-Id: 12657753
X-Patchwork-Delegate: kuba@kernel.org
From: xiangxia.m.yue@gmail.com
To: netdev@vger.kernel.org
Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
    "David S. Miller", Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
    Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
    Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
    Antoine Tenart, Wei Wang, Arnd Bergmann
Subject: [net-next v1 2/2] net: sched: support hash/classid selecting queue_mapping
Date: Mon, 6 Dec 2021 16:05:12 +0800
Message-Id: <20211206080512.36610-3-xiangxia.m.yue@gmail.com>
In-Reply-To: <20211206080512.36610-1-xiangxia.m.yue@gmail.com>
References: <20211206080512.36610-1-xiangxia.m.yue@gmail.com>
List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Tonghao Zhang

This patch allows users to select a queue_mapping range, from A to B,
and to use the skb hash or the cgroup classid to pick a queue within
that range, so packets can be load-balanced across queues A..B.

  $ tc filter ... action skbedit queue_mapping hash-type normal 0 4

"skbedit queue_mapping QUEUE_MAPPING" [0] is enhanced with two flags:
SKBEDIT_F_QUEUE_MAPPING_HASH and SKBEDIT_F_QUEUE_MAPPING_CLASSID.
Each bound of the range is an unsigned 8-bit value in decimal format.

[0]: https://man7.org/linux/man-pages/man8/tc-skbedit.8.html

Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Jonathan Lemon
Cc: Eric Dumazet
Cc: Alexander Lobakin
Cc: Paolo Abeni
Cc: Talal Ahmad
Cc: Kevin Hao
Cc: Ilias Apalodimas
Cc: Kees Cook
Cc: Kumar Kartikeya Dwivedi
Cc: Antoine Tenart
Cc: Wei Wang
Cc: Arnd Bergmann
Signed-off-by: Tonghao Zhang
---
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  5 +++
 net/sched/act_skbedit.c                | 48 +++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..ee96e0fa6566 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@ struct tcf_skbedit_params {
 	u32 mark;
 	u32 mask;
 	u16 queue_mapping;
+	u16 mapping_mod;
 	u16 ptype;
 	struct rcu_head rcu;
 };

diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..badb58ec84ef 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,11 @@
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
+#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
+
+#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
+					   SKBEDIT_F_QUEUE_MAPPING_CLASSID)

 struct tc_skbedit {
 	tc_gen;

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 940091a7c7f0..9cb65bcce001 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -23,6 +24,25 @@ static unsigned int skbedit_net_id;

 static struct tc_action_ops act_skbedit_ops;

+static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
+			    struct sk_buff *skb)
+{
+	u16 queue_mapping = params->queue_mapping;
+	u16 mapping_mod = params->mapping_mod;
+	u32 hash = 0;
+
+	if (!(params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK))
+		return netdev_cap_txqueue(skb->dev, queue_mapping);
+
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_CLASSID)
+		hash = jhash_1word(task_get_classid(skb), 0);
+	else if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH)
+		hash = skb_get_hash(skb);
+
+	queue_mapping = (queue_mapping & 0xff) + hash % mapping_mod;
+	return netdev_cap_txqueue(skb->dev, queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -57,10 +77,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			break;
 		}
 	}
-	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
 		skb->tc_skip_txqueue = 1;
-		skb_set_queue_mapping(skb, params->queue_mapping);
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
@@ -110,6 +129,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u16 mapping_mod = 0;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -157,6 +177,21 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
 			flags |= SKBEDIT_F_INHERITDSFIELD;
+		if (*pure_flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
+			u16 max, min;
+
+			if (!queue_mapping)
+				return -EINVAL;
+
+			max = *queue_mapping >> 8;
+			min = *queue_mapping & 0xff;
+			if (max < min)
+				return -EINVAL;
+
+			mapping_mod = max - min + 1;
+			flags |= *pure_flags &
+				 SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+		}
 	}

 	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
@@ -206,8 +241,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->flags = flags;
 	if (flags & SKBEDIT_F_PRIORITY)
 		params_new->priority = *priority;
-	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
 		params_new->queue_mapping = *queue_mapping;
+		params_new->mapping_mod = mapping_mod;
+	}
 	if (flags & SKBEDIT_F_MARK)
 		params_new->mark = *mark;
 	if (flags & SKBEDIT_F_PTYPE)
@@ -274,6 +311,9 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
 		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK)
+		pure_flags |= params->flags &
+			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;