diff mbox series

[net,v4,2/3] net: sched: add check tc_skip_classify in sch egress

Message ID 20211202024723.76257-3-xiangxia.m.yue@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Fix bpf_redirect to ifb netdev | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers warning 1 maintainers not CCed: bpf@vger.kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 9 this patch: 9
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next success VM_Test

Commit Message

Tonghao Zhang Dec. 2, 2021, 2:47 a.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Try to resolve two issues:
* We look up filters and check the tc_skip_classify flag in the
  net sched layer, even when the skb does not want to be
  classified. That can consume a lot of CPU cycles.

  Install the rules as below:
  $ for id in $(seq 1 10000); do
  $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
  $ done

  netperf:
  $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
  $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32

  Before: 152.04 tps, 0.58 Mbit/s
  After:  303.07 tps, 1.51 Mbit/s
  For TCP_RR this is a 99.3% improvement; for TCP_STREAM, 160.3%.

* bpf_redirect may be invoked in the egress path. If we don't
  check the flag and return early when it is set, the packets
  will loop back.

  $ tc filter add dev eth0 egress bpf direct-action obj \
	ifb.o sec ifb
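  (Editorial note: ifb.o is not included in the series; a hypothetical
  sketch of what its "ifb" section could look like is below. The ifindex
  is a placeholder — in practice it would be resolved with
  if_nametoindex("ifb0") — and the program name is invented for
  illustration.)

```c
/* Hypothetical sketch of ifb.o, not part of this series.
 * Build roughly as: clang -O2 -target bpf -c ifb.c -o ifb.o
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define IFB0_IFINDEX 5	/* placeholder: real ifindex of ifb0 */

SEC("ifb")
int ifb_redirect(struct __sk_buff *skb)
{
	/* Redirect to ifb0's transmit path, analogous to
	 * "action mirred egress redirect dev ifb0". Without the
	 * tc_skip_classify check in sch_handle_egress(), the packet
	 * re-enters the egress hook when ifb0 re-injects it and is
	 * redirected again: a loop.
	 */
	return bpf_redirect(IFB0_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";
```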

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Daniel Borkmann Dec. 3, 2021, 9:35 p.m. UTC | #1
On 12/2/21 3:47 AM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> Try to resolve the issues as below:
> * We look up and then check tc_skip_classify flag in net
>    sched layer, even though skb don't want to be classified.
>    That case may consume a lot of cpu cycles.
> 
>    Install the rules as below:
>    $ for id in $(seq 1 10000); do
>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>    $ done
> 
>    netperf:
>    $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>    $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> 
>    Before: 152.04 tps, 0.58 Mbit/s
>    After:  303.07 tps, 1.51 Mbit/s
>    For TCP_RR, there are 99.3% improvement, TCP_STREAM 160.3%.

As it was pointed out earlier by Eric in v3, these numbers are moot since noone
is realistically running such a setup in practice with 10k linear rules.
Eric Dumazet Dec. 3, 2021, 9:41 p.m. UTC | #2
On Fri, Dec 3, 2021 at 1:35 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/2/21 3:47 AM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > [...]
>
> As it was pointed out earlier by Eric in v3, these numbers are moot since noone
> is realistically running such a setup in practice with 10k linear rules.

Yes, I am so sorry that I used a sarcastic comment.

I really should have asked if a real world case was using a lot of filters.
If so, maybe we can do something about that, for packets actually
going through these filters.

Thanks
Tonghao Zhang Dec. 4, 2021, 1:42 a.m. UTC | #3
On Sat, Dec 4, 2021 at 5:35 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/2/21 3:47 AM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > [...]
>
> As it was pointed out earlier by Eric in v3, these numbers are moot since noone
> is realistically running such a setup in practice with 10k linear rules.
Yes. As I said in v1, in production we use 5+ prios. With this
patch the improvement there is small, around 1%.

This patch also fixes the packet loopback when bpf_redirect to
ifb is used in the egress path.
Tonghao Zhang Dec. 7, 2021, 2:16 a.m. UTC | #4
On Sat, Dec 4, 2021 at 9:42 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 4, 2021 at 5:35 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/2/21 3:47 AM, xiangxia.m.yue@gmail.com wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > [...]
> >
> > As it was pointed out earlier by Eric in v3, these numbers are moot since noone
> > is realistically running such a setup in practice with 10k linear rules.
> Yes. As I said in v1, in production we use 5+ prios. With this
> patch the improvement there is small, around 1%.
>
> This patch also fixes the packet loopback when bpf_redirect to
> ifb is used in the egress path.
Hi Daniel, Eric,
What should I do next? This patch tries to fix the bug and
improve performance (~1% in production).
Should I update the commit message and send a v5?
> --
> Best regards, Tonghao
Eric Dumazet Dec. 7, 2021, 2:38 a.m. UTC | #5
On Mon, Dec 6, 2021 at 6:17 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 4, 2021 at 9:42 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sat, Dec 4, 2021 at 5:35 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 12/2/21 3:47 AM, xiangxia.m.yue@gmail.com wrote:
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > [...]
> > >
> > > As it was pointed out earlier by Eric in v3, these numbers are moot since noone
> > > is realistically running such a setup in practice with 10k linear rules.
> > Yes. As I said in v1, in production we use 5+ prios. With this
> > patch the improvement there is small, around 1%.
> >
> > This patch also fixes the packet loopback when bpf_redirect to
> > ifb is used in the egress path.
> Hi Daniel, Eric,
> What should I do next? This patch tries to fix the bug and
> improve performance (~1% in production).
> Should I update the commit message and send a v5?


This is adding yet another bit in the skb :/

We also have another patch series today adding one bit in the skb.
For me, this is really an issue. This should be a last resort.

For example, xmit_more is no longer a field in the skb. Why?
Because most probably this property does not need to stick to the
skb, but to the context of execution.

Also, it is not clear how stacked devices (bonding/team/tunnels)
will be handled.
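(Editorial note: for context on Eric's xmit_more example — xmit_more
used to be a bool in struct sk_buff and is now derived from per-CPU
softnet state instead. The mainline accessor, quoted from memory and
possibly differing in detail, looks roughly like:)

```c
/* include/linux/netdevice.h (approximate) */
static inline bool netdev_xmit_more(void)
{
	/* The "more packets queued behind this one" hint lives in
	 * per-CPU execution context, not in the skb itself.
	 */
	return __this_cpu_read(softnet_data.xmit.more);
}
```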

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index d30adecc2bb2..10bad44e2ec4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3823,6 +3823,9 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;
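
(Editorial note: skb_skip_tc_classify() is a pre-existing helper, not
added by this patch. Its definition in include/net/pkt_sched.h is
roughly as below — quoted from memory, so check the tree for the exact
form. Note it clears the flag as a side effect, so the check consumes
the skip request.)

```c
/* include/net/pkt_sched.h (approximate) */
#ifdef CONFIG_NET_CLS_ACT
static inline bool skb_skip_tc_classify(struct sk_buff *skb)
{
	if (skb->tc_skip_classify) {
		/* Consume the flag: skip classification once, then
		 * classify normally on any later traversal.
		 */
		skb->tc_skip_classify = 0;
		return true;
	}
	return false;
}
#else
static inline bool skb_skip_tc_classify(struct sk_buff *skb)
{
	return false;
}
#endif
```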