net: sched: check tc_skip_classify as far as possible

Message ID	20211028135644.2258-1-xiangxia.m.yue@gmail.com (mailing list archive)
State	Superseded
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <netdev-owner@kernel.org> From: xiangxia.m.yue@gmail.com To: netdev@vger.kernel.org Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>, Willem de Bruijn <willemb@google.com>, Cong Wang <xiyou.wangcong@gmail.com>, Jakub Kicinski <kuba@kernel.org> Subject: [PATCH] net: sched: check tc_skip_classify as far as possible Date: Thu, 28 Oct 2021 21:56:44 +0800 Message-Id: <20211028135644.2258-1-xiangxia.m.yue@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	net: sched: check tc_skip_classify as far as possible \| expand net: sched: check tc_skip_classify as far as possible

Message ID

20211028135644.2258-1-xiangxia.m.yue@gmail.com (mailing list archive)

State

Superseded

Delegated to:

Netdev Maintainers

Headers

From: xiangxia.m.yue@gmail.com
To: netdev@vger.kernel.org
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>,
        Willem de Bruijn <willemb@google.com>,
        Cong Wang <xiyou.wangcong@gmail.com>,
        Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH] net: sched: check tc_skip_classify as far as possible
Date: Thu, 28 Oct 2021 21:56:44 +0800
Message-Id: <20211028135644.2258-1-xiangxia.m.yue@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

net: sched: check tc_skip_classify as far as possible | expand

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply

Context

Check

Description

netdev/tree_selection

success

Guessing tree name failed - patch did not apply

Commit Message

Tonghao Zhang Oct. 28, 2021, 1:56 p.m. UTC

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

We look up and then check tc_skip_classify flag in net
sched layer, even though skb don't want to be classified.
That case may consume a lot of cpu cycles.

Install the rules as below:
$ for id in $(seq 1 100); do
$ 	tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
$ done

netperf:
$ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
$ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32

Without this patch:
10662.33 tps
108.95 Mbit/s

With this patch:
12434.48 tps
145.89 Mbit/s

For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c      | 3 ++-
 net/sched/act_api.c | 3 ---
 2 files changed, 2 insertions(+), 4 deletions(-)

Comments

Daniel Borkmann Oct. 28, 2021, 2:28 p.m. UTC | #1

On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> We look up and then check tc_skip_classify flag in net
> sched layer, even though skb don't want to be classified.
> That case may consume a lot of cpu cycles.
> 
> Install the rules as below:
> $ for id in $(seq 1 100); do
> $ 	tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> $ done

Do you actually have such a case in practice or is this just hypothetical?
Asking as this feels rather broken to begin with.

> netperf:
> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> 
> Without this patch:
> 10662.33 tps
> 108.95 Mbit/s
> 
> With this patch:
> 12434.48 tps
> 145.89 Mbit/s
> 
> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
> 
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>   net/core/dev.c      | 3 ++-
>   net/sched/act_api.c | 3 ---
>   2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index eb61a8821b3a..856ac1fb75b4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>   #ifdef CONFIG_NET_CLS_ACT
>   	skb->tc_at_ingress = 0;
>   # ifdef CONFIG_NET_EGRESS
> -	if (static_branch_unlikely(&egress_needed_key)) {
> +	if (static_branch_unlikely(&egress_needed_key) &&
> +	    !skb_skip_tc_classify(skb)) {
>   		skb = sch_handle_egress(skb, &rc, dev);
>   		if (!skb)
>   			goto out;
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 7dd3a2dc5fa4..bd66f27178be 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>   	int i;
>   	int ret = TC_ACT_OK;
>   
> -	if (skb_skip_tc_classify(skb))
> -		return TC_ACT_OK;
> -

I think this might imply a change in behavior which could have the potential
to break setups in the wild.

>   restart_act_graph:
>   	for (i = 0; i < nr_actions; i++) {
>   		const struct tc_action *a = actions[i];
>

Tonghao Zhang Oct. 29, 2021, 12:04 a.m. UTC | #2

On Thu, Oct 28, 2021 at 10:28 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > We look up and then check tc_skip_classify flag in net
> > sched layer, even though skb don't want to be classified.
> > That case may consume a lot of cpu cycles.
> >
> > Install the rules as below:
> > $ for id in $(seq 1 100); do
> > $     tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > $ done
>
> Do you actually have such a case in practice or is this just hypothetical?
Hi Daniel, I did some research about this for k8s in production. There
are not so many tc prio(~5 different prio).
butg in this test, I use the 100 prio.

I reviewed the code, for the tx path, I think we check the
tc_skip_classify too later. In the rx path, we check it
in __netif_receive_skb_core.

> Asking as this feels rather broken to begin with.
> > netperf:
> > $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> > $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> >
> > Without this patch:
> > 10662.33 tps
> > 108.95 Mbit/s
> >
> > With this patch:
> > 12434.48 tps
> > 145.89 Mbit/s
> >
> > For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Cong Wang <xiyou.wangcong@gmail.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >   net/core/dev.c      | 3 ++-
> >   net/sched/act_api.c | 3 ---
> >   2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index eb61a8821b3a..856ac1fb75b4 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   #ifdef CONFIG_NET_CLS_ACT
> >       skb->tc_at_ingress = 0;
> >   # ifdef CONFIG_NET_EGRESS
> > -     if (static_branch_unlikely(&egress_needed_key)) {
> > +     if (static_branch_unlikely(&egress_needed_key) &&
> > +         !skb_skip_tc_classify(skb)) {
> >               skb = sch_handle_egress(skb, &rc, dev);
> >               if (!skb)
> >                       goto out;
> > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > index 7dd3a2dc5fa4..bd66f27178be 100644
> > --- a/net/sched/act_api.c
> > +++ b/net/sched/act_api.c
> > @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
> >       int i;
> >       int ret = TC_ACT_OK;
> >
> > -     if (skb_skip_tc_classify(skb))
> > -             return TC_ACT_OK;
> > -
>
> I think this might imply a change in behavior which could have the potential
> to break setups in the wild.
we may not change this code, i will send v2, if not comment.
> >   restart_act_graph:
> >       for (i = 0; i < nr_actions; i++) {
> >               const struct tc_action *a = actions[i];
> >
>

Daniel Borkmann Oct. 29, 2021, 3:35 p.m. UTC | #3

On 10/29/21 2:04 AM, Tonghao Zhang wrote:
> On Thu, Oct 28, 2021 at 10:28 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>
>>> We look up and then check tc_skip_classify flag in net
>>> sched layer, even though skb don't want to be classified.
>>> That case may consume a lot of cpu cycles.
>>>
>>> Install the rules as below:
>>> $ for id in $(seq 1 100); do
>>> $     tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>>> $ done
>>
>> Do you actually have such a case in practice or is this just hypothetical?
> Hi Daniel, I did some research about this for k8s in production. There
> are not so many tc prio(~5 different prio).
> butg in this test, I use the 100 prio.
> 
> I reviewed the code, for the tx path, I think we check the
> tc_skip_classify too later. In the rx path, we check it
> in __netif_receive_skb_core.
> 
>> Asking as this feels rather broken to begin with.
>>> netperf:
>>> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>>> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>>>
>>> Without this patch:
>>> 10662.33 tps
>>> 108.95 Mbit/s
>>>
>>> With this patch:
>>> 12434.48 tps
>>> 145.89 Mbit/s
>>>
>>> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
>>>
>>> Cc: Willem de Bruijn <willemb@google.com>
>>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>> ---
>>>    net/core/dev.c      | 3 ++-
>>>    net/sched/act_api.c | 3 ---
>>>    2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index eb61a8821b3a..856ac1fb75b4 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>    #ifdef CONFIG_NET_CLS_ACT
>>>        skb->tc_at_ingress = 0;
>>>    # ifdef CONFIG_NET_EGRESS
>>> -     if (static_branch_unlikely(&egress_needed_key)) {
>>> +     if (static_branch_unlikely(&egress_needed_key) &&
>>> +         !skb_skip_tc_classify(skb)) {
>>>                skb = sch_handle_egress(skb, &rc, dev);
>>>                if (!skb)
>>>                        goto out;
>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>> index 7dd3a2dc5fa4..bd66f27178be 100644
>>> --- a/net/sched/act_api.c
>>> +++ b/net/sched/act_api.c
>>> @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>>>        int i;
>>>        int ret = TC_ACT_OK;
>>>
>>> -     if (skb_skip_tc_classify(skb))
>>> -             return TC_ACT_OK;
>>> -
>>
>> I think this might imply a change in behavior which could have the potential
>> to break setups in the wild.
> we may not change this code, i will send v2, if not comment.

Well none of it I'm afraid, the sch_handle_egress() is out for a very long time by
now and your change could have the potential to break setups in the wild.

>>>    restart_act_graph:
>>>        for (i = 0; i < nr_actions; i++) {
>>>                const struct tc_action *a = actions[i];
>>>
>>
> 
>

diff --git a/net/core/dev.c b/net/core/dev.c
index eb61a8821b3a..856ac1fb75b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4155,7 +4155,8 @@  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
-	if (static_branch_unlikely(&egress_needed_key)) {
+	if (static_branch_unlikely(&egress_needed_key) &&
+	    !skb_skip_tc_classify(skb)) {
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 7dd3a2dc5fa4..bd66f27178be 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -722,9 +722,6 @@  int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 	int i;
 	int ret = TC_ACT_OK;
 
-	if (skb_skip_tc_classify(skb))
-		return TC_ACT_OK;
-
 restart_act_graph:
 	for (i = 0; i < nr_actions; i++) {
 		const struct tc_action *a = actions[i];

net: sched: check tc_skip_classify as far as possible

Checks

Commit Message

Comments

Patch