Message ID | 20211028135644.2258-1-xiangxia.m.yue@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | net: sched: check tc_skip_classify as far as possible |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> We look up and then check the tc_skip_classify flag in the net
> sched layer, even when the skb does not want to be classified.
> That case may consume a lot of cpu cycles.
>
> Install the rules as below:
> $ for id in $(seq 1 100); do
> $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> $ done

Do you actually have such a case in practice or is this just hypothetical?
Asking as this feels rather broken to begin with.

> netperf:
> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>
> Without this patch:
> 10662.33 tps
> 108.95 Mbit/s
>
> With this patch:
> 12434.48 tps
> 145.89 Mbit/s
>
> For TCP_RR, that is a 16.6% improvement; for TCP_STREAM, 33.9%.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  net/core/dev.c      | 3 ++-
>  net/sched/act_api.c | 3 ---
>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index eb61a8821b3a..856ac1fb75b4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>  #ifdef CONFIG_NET_CLS_ACT
>  	skb->tc_at_ingress = 0;
>  # ifdef CONFIG_NET_EGRESS
> -	if (static_branch_unlikely(&egress_needed_key)) {
> +	if (static_branch_unlikely(&egress_needed_key) &&
> +	    !skb_skip_tc_classify(skb)) {
>  		skb = sch_handle_egress(skb, &rc, dev);
>  		if (!skb)
>  			goto out;
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 7dd3a2dc5fa4..bd66f27178be 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>  	int i;
>  	int ret = TC_ACT_OK;
>
> -	if (skb_skip_tc_classify(skb))
> -		return TC_ACT_OK;
> -

I think this might imply a change in behavior which could have the potential
to break setups in the wild.

> restart_act_graph:
> 	for (i = 0; i < nr_actions; i++) {
> 		const struct tc_action *a = actions[i];
>
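For reference, skb_skip_tc_classify() is just a one-bit test on the skb. A sketch of the existing helper as it reads around this kernel version (paraphrased from include/net/sch_generic.h, if memory serves; not part of this patch):

```c
/* Sketch of the existing helper, paraphrased from
 * include/net/sch_generic.h; location and exact form may
 * differ by kernel version. Not part of this patch.
 */
static inline bool skb_skip_tc_classify(struct sk_buff *skb)
{
#ifdef CONFIG_NET_CLS_ACT
	return skb->tc_skip_classify;
#else
	return false;
#endif
}
```

So the patch's early test in __dev_queue_xmit() reads a single skb bit before deciding whether to enter sch_handle_egress() and the per-prio filter walk at all.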
On Thu, Oct 28, 2021 at 10:28 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > We look up and then check the tc_skip_classify flag in the net
> > sched layer, even when the skb does not want to be classified.
> > That case may consume a lot of cpu cycles.
> >
> > Install the rules as below:
> > $ for id in $(seq 1 100); do
> > $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > $ done
>
> Do you actually have such a case in practice or is this just hypothetical?

Hi Daniel, I did some research on this for k8s in production. There are
not that many tc prios in practice (~5 different prios), but in this test
I use 100 prios.

I reviewed the code; on the tx path, I think we check tc_skip_classify
too late. On the rx path, we check it in __netif_receive_skb_core.

> Asking as this feels rather broken to begin with.
> > netperf:
> > $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> > $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> >
> > Without this patch:
> > 10662.33 tps
> > 108.95 Mbit/s
> >
> > With this patch:
> > 12434.48 tps
> > 145.89 Mbit/s
> >
> > For TCP_RR, that is a 16.6% improvement; for TCP_STREAM, 33.9%.
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Cong Wang <xiyou.wangcong@gmail.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >  net/core/dev.c      | 3 ++-
> >  net/sched/act_api.c | 3 ---
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index eb61a8821b3a..856ac1fb75b4 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >  #ifdef CONFIG_NET_CLS_ACT
> >  	skb->tc_at_ingress = 0;
> >  # ifdef CONFIG_NET_EGRESS
> > -	if (static_branch_unlikely(&egress_needed_key)) {
> > +	if (static_branch_unlikely(&egress_needed_key) &&
> > +	    !skb_skip_tc_classify(skb)) {
> >  		skb = sch_handle_egress(skb, &rc, dev);
> >  		if (!skb)
> >  			goto out;
> > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > index 7dd3a2dc5fa4..bd66f27178be 100644
> > --- a/net/sched/act_api.c
> > +++ b/net/sched/act_api.c
> > @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
> >  	int i;
> >  	int ret = TC_ACT_OK;
> >
> > -	if (skb_skip_tc_classify(skb))
> > -		return TC_ACT_OK;
> > -
>
> I think this might imply a change in behavior which could have the potential
> to break setups in the wild.

We can leave this code unchanged; I will send a v2 if there are no
further comments.

> > restart_act_graph:
> > 	for (i = 0; i < nr_actions; i++) {
> > 		const struct tc_action *a = actions[i];
> >
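For context, the rx-path check Tonghao refers to sits near the top of the ingress classification section of __netif_receive_skb_core(); roughly like this (paraphrased from net/core/dev.c, with the exact surroundings varying by kernel version):

```c
/* Paraphrased from __netif_receive_skb_core() in net/core/dev.c;
 * exact surroundings vary by kernel version. A packet flagged with
 * tc_skip_classify (e.g. one re-injected by ifb) jumps straight
 * past ingress tc classification.
 */
#ifdef CONFIG_NET_CLS_ACT
	if (skb_skip_tc_classify(skb))
		goto skip_classify;
#endif
```

The patch makes the tx path symmetric with this: the flag is consulted once, up front, instead of after the filter walk has already begun.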
On 10/29/21 2:04 AM, Tonghao Zhang wrote:
> On Thu, Oct 28, 2021 at 10:28 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 10/28/21 3:56 PM, xiangxia.m.yue@gmail.com wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>
>>> We look up and then check the tc_skip_classify flag in the net
>>> sched layer, even when the skb does not want to be classified.
>>> That case may consume a lot of cpu cycles.
>>>
>>> Install the rules as below:
>>> $ for id in $(seq 1 100); do
>>> $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>>> $ done
>>
>> Do you actually have such a case in practice or is this just hypothetical?
> Hi Daniel, I did some research on this for k8s in production. There are
> not that many tc prios in practice (~5 different prios), but in this test
> I use 100 prios.
>
> I reviewed the code; on the tx path, I think we check tc_skip_classify
> too late. On the rx path, we check it in __netif_receive_skb_core.
>
>> Asking as this feels rather broken to begin with.
>>> netperf:
>>> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>>> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>>>
>>> Without this patch:
>>> 10662.33 tps
>>> 108.95 Mbit/s
>>>
>>> With this patch:
>>> 12434.48 tps
>>> 145.89 Mbit/s
>>>
>>> For TCP_RR, that is a 16.6% improvement; for TCP_STREAM, 33.9%.
>>>
>>> Cc: Willem de Bruijn <willemb@google.com>
>>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>> ---
>>>  net/core/dev.c      | 3 ++-
>>>  net/sched/act_api.c | 3 ---
>>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index eb61a8821b3a..856ac1fb75b4 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>  #ifdef CONFIG_NET_CLS_ACT
>>>  	skb->tc_at_ingress = 0;
>>>  # ifdef CONFIG_NET_EGRESS
>>> -	if (static_branch_unlikely(&egress_needed_key)) {
>>> +	if (static_branch_unlikely(&egress_needed_key) &&
>>> +	    !skb_skip_tc_classify(skb)) {
>>>  		skb = sch_handle_egress(skb, &rc, dev);
>>>  		if (!skb)
>>>  			goto out;
>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>> index 7dd3a2dc5fa4..bd66f27178be 100644
>>> --- a/net/sched/act_api.c
>>> +++ b/net/sched/act_api.c
>>> @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>>>  	int i;
>>>  	int ret = TC_ACT_OK;
>>>
>>> -	if (skb_skip_tc_classify(skb))
>>> -		return TC_ACT_OK;
>>> -
>>
>> I think this might imply a change in behavior which could have the potential
>> to break setups in the wild.
> We can leave this code unchanged; I will send a v2 if there are no
> further comments.

Well, none of it, I'm afraid: sch_handle_egress() has been out for a very
long time by now, and your change could have the potential to break setups
in the wild.

>>> restart_act_graph:
>>> 	for (i = 0; i < nr_actions; i++) {
>>> 		const struct tc_action *a = actions[i];
>>>
diff --git a/net/core/dev.c b/net/core/dev.c
index eb61a8821b3a..856ac1fb75b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
-	if (static_branch_unlikely(&egress_needed_key)) {
+	if (static_branch_unlikely(&egress_needed_key) &&
+	    !skb_skip_tc_classify(skb)) {
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 7dd3a2dc5fa4..bd66f27178be 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 	int i;
 	int ret = TC_ACT_OK;
 
-	if (skb_skip_tc_classify(skb))
-		return TC_ACT_OK;
-
 restart_act_graph:
 	for (i = 0; i < nr_actions; i++) {
 		const struct tc_action *a = actions[i];
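For context on who sets this flag in the mirred-to-ifb scenario from the commit message: the ifb driver marks every packet it re-injects so tc does not classify it a second time. A rough sketch of that spot (paraphrased from the ifb transmit tasklet in drivers/net/ifb.c; names and details may differ by kernel version):

```c
/* Rough sketch, paraphrased from the ifb transmit tasklet in
 * drivers/net/ifb.c; names and details may differ by kernel version.
 * Each packet dequeued for re-injection is flagged so that tc does
 * not classify it again on the way back out (or back in).
 */
while ((skb = __skb_dequeue(&txp->tq)) != NULL) {
	skb->tc_skip_classify = 1;
	/* ... stats and redirect-target bookkeeping elided ... */
	if (!skb->from_ingress)
		dev_queue_xmit(skb);    /* egress: would hit the new early check */
	else
		netif_receive_skb(skb); /* ingress: hits __netif_receive_skb_core */
}
```

This also illustrates Daniel's concern: with the check removed from tcf_action_exec(), a flagged packet would presumably still execute actions reached through classification paths other than the early egress hook, e.g. filters attached to a classful qdisc.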