diff mbox series

[net-next] bpf, net: Support redirecting to ifb with bpf

Message ID 20230413025350.79809-1-laoar.shao@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [net-next] bpf, net: Support redirecting to ifb with bpf | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 26 this patch: 26
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 26 this patch: 26
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on s390x with gcc

Commit Message

Yafang Shao April 13, 2023, 2:53 a.m. UTC
In our container environment, we are using EDT-bpf to limit the egress
bandwidth. EDT-bpf can be used to limit egress only, but can't be used
to limit ingress. Some of our users also want to limit the ingress
bandwidth. But after applying EDT-bpf, which is based on clsact qdisc,
it is impossible to limit the ingress bandwidth currently, due to some
reasons,
1). We can't add ingress qdisc
The ingress qdisc can't coexist with clsact qdisc as clsact has both
ingress and egress handler. So our traditional method to limit ingress
bandwidth can't work any more.
2). We can't redirect ingress packet to ifb with bpf
By trying to analyze if it is possible to redirect the ingress packet to
ifb with a bpf program, we find that the ifb device is not supported by
bpf redirect yet.

This patch tries to resolve it by supporting redirecting to ifb with bpf
program.

Ingress bandwidth limit is useful in some scenarios, for example, for the
TCP-based service, there may be lots of clients connecting it, so it is
not wise to limit the clients' egress. After limiting the server-side's
ingress, it will lower the send rate of the client by lowering the TCP
cwnd if the ingress bandwidth limit is reached. If we don't limit it,
the clients will continue sending requests at a high rate.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Daniel Borkmann April 13, 2023, 11:47 a.m. UTC | #1
On 4/13/23 4:53 AM, Yafang Shao wrote:
> In our container environment, we are using EDT-bpf to limit the egress
> bandwidth. EDT-bpf can be used to limit egress only, but can't be used
> to limit ingress. Some of our users also want to limit the ingress
> bandwidth. But after applying EDT-bpf, which is based on clsact qdisc,
> it is impossible to limit the ingress bandwidth currently, due to some
> reasons,
> 1). We can't add ingress qdisc
> The ingress qdisc can't coexist with clsact qdisc as clsact has both
> ingress and egress handler. So our traditional method to limit ingress
> bandwidth can't work any more.

I'm not following, the latter is a super set of the former, why do you
need it to co-exist?

> 2). We can't redirect ingress packet to ifb with bpf
> By trying to analyze if it is possible to redirect the ingress packet to
> ifb with a bpf program, we find that the ifb device is not supported by
> bpf redirect yet.

You actually can: Just let BPF program return TC_ACT_UNSPEC for this
case and then add a matchall with higher prio (so it runs after bpf)
that contains an action with mirred egress redirect that pushes to ifb
dev - there is no change needed.

> This patch tries to resolve it by supporting redirecting to ifb with bpf
> program.
> 
> Ingress bandwidth limit is useful in some scenarios, for example, for the
> TCP-based service, there may be lots of clients connecting it, so it is
> not wise to limit the clients' egress. After limiting the server-side's
> ingress, it will lower the send rate of the client by lowering the TCP
> cwnd if the ingress bandwidth limit is reached. If we don't limit it,
> the clients will continue sending requests at a high rate.

Adding artificial queueing for the inbound traffic, aren't you worried
about DoS'ing your node? If you need to tell the sender to slow down,
have you looked at hbm (https://lpc.events/event/4/contributions/486/,
samples/bpf/hbm_out_kern.c) which uses ECN CE marking to tell the TCP
sender to slow down? (Fwiw, for UDP https://github.com/cloudflare/rakelimit
would be an option.)

Thanks,
Daniel
Yafang Shao April 13, 2023, 2:20 p.m. UTC | #2
On Thu, Apr 13, 2023 at 7:47 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/13/23 4:53 AM, Yafang Shao wrote:
> > In our container environment, we are using EDT-bpf to limit the egress
> > bandwidth. EDT-bpf can be used to limit egress only, but can't be used
> > to limit ingress. Some of our users also want to limit the ingress
> > bandwidth. But after applying EDT-bpf, which is based on clsact qdisc,
> > it is impossible to limit the ingress bandwidth currently, due to some
> > reasons,
> > 1). We can't add ingress qdisc
> > The ingress qdisc can't coexist with clsact qdisc as clsact has both
> > ingress and egress handler. So our traditional method to limit ingress
> > bandwidth can't work any more.
>
> I'm not following, the latter is a super set of the former, why do you
> need it to co-exist?
>

It seems that I have a misunderstanding.

> > 2). We can't redirect ingress packet to ifb with bpf
> > By trying to analyze if it is possible to redirect the ingress packet to
> > ifb with a bpf program, we find that the ifb device is not supported by
> > bpf redirect yet.
>
> You actually can: Just let BPF program return TC_ACT_UNSPEC for this
> case and then add a matchall with higher prio (so it runs after bpf)
> that contains an action with mirred egress redirect that pushes to ifb
> dev - there is no change needed.
>

Ah, indeed, it works.
Many thanks for your help.

> > This patch tries to resolve it by supporting redirecting to ifb with bpf
> > program.
> >
> > Ingress bandwidth limit is useful in some scenarios, for example, for the
> > TCP-based service, there may be lots of clients connecting it, so it is
> > not wise to limit the clients' egress. After limiting the server-side's
> > ingress, it will lower the send rate of the client by lowering the TCP
> > cwnd if the ingress bandwidth limit is reached. If we don't limit it,
> > the clients will continue sending requests at a high rate.
>
> Adding artificial queueing for the inbound traffic, aren't you worried
> about DoS'ing your node?

Yes, we worried about it, but we haven't observed it in our data center.

> If you need to tell the sender to slow down,
> have you looked at hbm (https://lpc.events/event/4/contributions/486/,
> samples/bpf/hbm_out_kern.c) which uses ECN CE marking to tell the TCP
> sender to slow down? (Fwiw, for UDP https://github.com/cloudflare/rakelimit
> would be an option.)
>

We're considering using ECN. Thanks for your information, I will analyze it.
Toke Høiland-Jørgensen April 13, 2023, 2:43 p.m. UTC | #3
Daniel Borkmann <daniel@iogearbox.net> writes:

>> 2). We can't redirect ingress packet to ifb with bpf
>> By trying to analyze if it is possible to redirect the ingress packet to
>> ifb with a bpf program, we find that the ifb device is not supported by
>> bpf redirect yet.
>
> You actually can: Just let BPF program return TC_ACT_UNSPEC for this
> case and then add a matchall with higher prio (so it runs after bpf)
> that contains an action with mirred egress redirect that pushes to ifb
> dev - there is no change needed.

I wasn't aware that BPF couldn't redirect directly to an IFB; any reason
why we shouldn't merge this patch in any case?

>> This patch tries to resolve it by supporting redirecting to ifb with bpf
>> program.
>> 
>> Ingress bandwidth limit is useful in some scenarios, for example, for the
>> TCP-based service, there may be lots of clients connecting it, so it is
>> not wise to limit the clients' egress. After limiting the server-side's
>> ingress, it will lower the send rate of the client by lowering the TCP
>> cwnd if the ingress bandwidth limit is reached. If we don't limit it,
>> the clients will continue sending requests at a high rate.
>
> Adding artificial queueing for the inbound traffic, aren't you worried
> about DoS'ing your node?

Just as an aside, the ingress filter -> ifb -> qdisc on the ifb
interface does work surprisingly well, and we've been using that over in
OpenWrt land for years[0]. It does have some overhead associated with it,
but I wouldn't expect it to be a source of self-DoS in itself (assuming
well-behaved TCP traffic).

-Toke

[0] https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm
Daniel Borkmann April 14, 2023, 9:34 a.m. UTC | #4
On 4/13/23 4:43 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>>> 2). We can't redirect ingress packet to ifb with bpf
>>> By trying to analyze if it is possible to redirect the ingress packet to
>>> ifb with a bpf program, we find that the ifb device is not supported by
>>> bpf redirect yet.
>>
>> You actually can: Just let BPF program return TC_ACT_UNSPEC for this
>> case and then add a matchall with higher prio (so it runs after bpf)
>> that contains an action with mirred egress redirect that pushes to ifb
>> dev - there is no change needed.
> 
> I wasn't aware that BPF couldn't redirect directly to an IFB; any reason
> why we shouldn't merge this patch in any case?
> 
>>> This patch tries to resolve it by supporting redirecting to ifb with bpf
>>> program.
>>>
>>> Ingress bandwidth limit is useful in some scenarios, for example, for the
>>> TCP-based service, there may be lots of clients connecting it, so it is
>>> not wise to limit the clients' egress. After limiting the server-side's
>>> ingress, it will lower the send rate of the client by lowering the TCP
>>> cwnd if the ingress bandwidth limit is reached. If we don't limit it,
>>> the clients will continue sending requests at a high rate.
>>
>> Adding artificial queueing for the inbound traffic, aren't you worried
>> about DoS'ing your node?
> 
> Just as an aside, the ingress filter -> ifb -> qdisc on the ifb
> interface does work surprisingly well, and we've been using that over in
> OpenWrt land for years[0]. It does have some overhead associated with it,
> but I wouldn't expect it to be a source of self-DoS in itself (assuming
> well-behaved TCP traffic).

Out of curiosity, wrt OpenWrt case, can you elaborate on the use case to why
choosing to do this on ingress via ifb rather than on the egress side? I
presume in this case it's regular router, so pkts would be forwarded anyway,
and in your case traversing qdisc layer / queuing twice (ingress phys dev ->
ifb, egress phys dev), right? What is the rationale that would justify such
setup aka why it cannot be solved differently?

Thanks,
Daniel

> -Toke
> 
> [0] https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm
Toke Høiland-Jørgensen April 14, 2023, 4:07 p.m. UTC | #5
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/13/23 4:43 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>> 
>>>> 2). We can't redirect ingress packet to ifb with bpf
>>>> By trying to analyze if it is possible to redirect the ingress packet to
>>>> ifb with a bpf program, we find that the ifb device is not supported by
>>>> bpf redirect yet.
>>>
>>> You actually can: Just let BPF program return TC_ACT_UNSPEC for this
>>> case and then add a matchall with higher prio (so it runs after bpf)
>>> that contains an action with mirred egress redirect that pushes to ifb
>>> dev - there is no change needed.
>> 
>> I wasn't aware that BPF couldn't redirect directly to an IFB; any reason
>> why we shouldn't merge this patch in any case?
>> 
>>>> This patch tries to resolve it by supporting redirecting to ifb with bpf
>>>> program.
>>>>
>>>> Ingress bandwidth limit is useful in some scenarios, for example, for the
>>>> TCP-based service, there may be lots of clients connecting it, so it is
>>>> not wise to limit the clients' egress. After limiting the server-side's
>>>> ingress, it will lower the send rate of the client by lowering the TCP
>>>> cwnd if the ingress bandwidth limit is reached. If we don't limit it,
>>>> the clients will continue sending requests at a high rate.
>>>
>>> Adding artificial queueing for the inbound traffic, aren't you worried
>>> about DoS'ing your node?
>> 
>> Just as an aside, the ingress filter -> ifb -> qdisc on the ifb
>> interface does work surprisingly well, and we've been using that over in
>> OpenWrt land for years[0]. It does have some overhead associated with it,
>> but I wouldn't expect it to be a source of self-DoS in itself (assuming
>> well-behaved TCP traffic).
>
> Out of curiosity, wrt OpenWrt case, can you elaborate on the use case to why
> choosing to do this on ingress via ifb rather than on the egress side? I
> presume in this case it's regular router, so pkts would be forwarded anyway,
> and in your case traversing qdisc layer / queuing twice (ingress phys dev ->
> ifb, egress phys dev), right? What is the rationale that would justify such
> setup aka why it cannot be solved differently?

Because there's not always a single egress on the other side. These are
mainly home routers, which tend to have one or more WiFi devices bridged
to one or more ethernet ports on the LAN side, and a single upstream WAN
port. And the objective is to control the total amount of traffic going
over the WAN link (in both directions), to deal with bufferbloat in the
ISP network (which is sadly still all too prevalent).

In this setup, the traffic can be split arbitrarily between the links on
the LAN side, and the only "single bottleneck" is the WAN link. So we
install both egress and ingress shapers on this, configured to something
like 95-98% of the true link bandwidth, thus moving the queues into the
qdisc layer in the router. It's usually necessary to set the ingress
bandwidth shaper a bit lower than the egress due to being "downstream"
of the bottleneck link, but it does work surprisingly well.

We usually use something like a matchall filter to put all ingress
traffic on the ifb, so doing the redirect from BPF has not been an
immediate requirement thus far. However, it does seem a bit odd that
this is not possible, and we do have a BPF-based filter that layers on
top of this kind of setup, which currently uses u32 as the ingress
filter and so it could presumably be improved to use BPF instead if that
was available:
https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README

-Toke
Daniel Borkmann April 14, 2023, 10:57 p.m. UTC | #6
On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
[...]
> https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README

Thanks for the explanation, that sounds reasonable and this should ideally
be part of the commit msg! Yafang, Toke, how about we craft it the following
way then to support this case:

 From f6c83e5e55c5eb9da8acd19369c688acf53951db Mon Sep 17 00:00:00 2001
Message-Id: <f6c83e5e55c5eb9da8acd19369c688acf53951db.1681512637.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 15 Apr 2023 00:30:27 +0200
Subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There are some use-cases where it is desirable to use bpf_redirect()
in combination with ifb device, which currently is not supported, for
example, around filtering inbound traffic with BPF to then push it to
ifb which holds the qdisc for shaping in contrast to doing that on the
egress device.

Toke mentions the following case related to OpenWrt:

   Because there's not always a single egress on the other side. These are
   mainly home routers, which tend to have one or more WiFi devices bridged
   to one or more ethernet ports on the LAN side, and a single upstream WAN
   port. And the objective is to control the total amount of traffic going
   over the WAN link (in both directions), to deal with bufferbloat in the
   ISP network (which is sadly still all too prevalent).

   In this setup, the traffic can be split arbitrarily between the links
   on the LAN side, and the only "single bottleneck" is the WAN link. So we
   install both egress and ingress shapers on this, configured to something
   like 95-98% of the true link bandwidth, thus moving the queues into the
   qdisc layer in the router. It's usually necessary to set the ingress
   bandwidth shaper a bit lower than the egress due to being "downstream"
   of the bottleneck link, but it does work surprisingly well.

   We usually use something like a matchall filter to put all ingress
   traffic on the ifb, so doing the redirect from BPF has not been an
   immediate requirement thus far. However, it does seem a bit odd that this
   is not possible, and we do have a BPF-based filter that layers on top of
   this kind of setup, which currently uses u32 as the ingress filter and
   so it could presumably be improved to use BPF instead if that was
   available.

Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reported-by: Yafang Shao <laoar.shao@gmail.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk
---
  include/linux/skbuff.h | 9 +++++++++
  net/core/filter.c      | 1 +
  2 files changed, 10 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ff7ad331fb82..2bbf9245640a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -5049,6 +5049,15 @@ static inline void skb_reset_redirect(struct sk_buff *skb)
  	skb->redirected = 0;
  }

+static inline void skb_set_redirected_noclear(struct sk_buff *skb,
+					      bool from_ingress)
+{
+	skb->redirected = 1;
+#ifdef CONFIG_NET_REDIRECT
+	skb->from_ingress = from_ingress;
+#endif
+}
+
  static inline bool skb_csum_is_sctp(struct sk_buff *skb)
  {
  	return skb->csum_not_inet;
diff --git a/net/core/filter.c b/net/core/filter.c
index 1d6f165923bf..27ba616aaa1a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2111,6 +2111,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
  	}

  	skb->dev = dev;
+	skb_set_redirected_noclear(skb, skb->tc_at_ingress);
  	skb_clear_tstamp(skb);

  	dev_xmit_recursion_inc();
kernel test robot April 15, 2023, 12:43 a.m. UTC | #7
Hi Daniel,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/c68bf723-3406-d177-49b4-6d5b485048de%40iogearbox.net
patch subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb
config: x86_64-randconfig-a002-20230410 (https://download.01.org/0day-ci/archive/20230415/202304150814.os34v8BI-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/91c0d1d0b071c23d95bf9747b89daf5ae378ad1a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912
        git checkout 91c0d1d0b071c23d95bf9747b89daf5ae378ad1a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304150814.os34v8BI-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/filter.c:2125:39: error: no member named 'tc_at_ingress' in 'struct sk_buff'
           skb_set_redirected_noclear(skb, skb->tc_at_ingress);
                                           ~~~  ^
   1 error generated.


vim +2125 net/core/filter.c

  2113	
  2114	static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
  2115	{
  2116		int ret;
  2117	
  2118		if (dev_xmit_recursion()) {
  2119			net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n");
  2120			kfree_skb(skb);
  2121			return -ENETDOWN;
  2122		}
  2123	
  2124		skb->dev = dev;
> 2125		skb_set_redirected_noclear(skb, skb->tc_at_ingress);
  2126		skb_clear_tstamp(skb);
  2127	
  2128		dev_xmit_recursion_inc();
  2129		ret = dev_queue_xmit(skb);
  2130		dev_xmit_recursion_dec();
  2131	
  2132		return ret;
  2133	}
  2134
kernel test robot April 15, 2023, 1:04 a.m. UTC | #8
Hi Daniel,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/c68bf723-3406-d177-49b4-6d5b485048de%40iogearbox.net
patch subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb
config: x86_64-randconfig-a014-20230410 (https://download.01.org/0day-ci/archive/20230415/202304150811.bzx9niRq-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/91c0d1d0b071c23d95bf9747b89daf5ae378ad1a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912
        git checkout 91c0d1d0b071c23d95bf9747b89daf5ae378ad1a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304150811.bzx9niRq-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/core/filter.c: In function '__bpf_tx_skb':
>> net/core/filter.c:2125:44: error: 'struct sk_buff' has no member named 'tc_at_ingress'
    2125 |         skb_set_redirected_noclear(skb, skb->tc_at_ingress);
         |                                            ^~


vim +2125 net/core/filter.c

  2113	
  2114	static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
  2115	{
  2116		int ret;
  2117	
  2118		if (dev_xmit_recursion()) {
  2119			net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n");
  2120			kfree_skb(skb);
  2121			return -ENETDOWN;
  2122		}
  2123	
  2124		skb->dev = dev;
> 2125		skb_set_redirected_noclear(skb, skb->tc_at_ingress);
  2126		skb_clear_tstamp(skb);
  2127	
  2128		dev_xmit_recursion_inc();
  2129		ret = dev_queue_xmit(skb);
  2130		dev_xmit_recursion_dec();
  2131	
  2132		return ret;
  2133	}
  2134
Yafang Shao April 15, 2023, 1:38 p.m. UTC | #9
On Sat, Apr 15, 2023 at 6:57 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote:
> > Daniel Borkmann <daniel@iogearbox.net> writes:
> [...]
> > https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
>
> Thanks for the explanation, that sounds reasonable and this should ideally
> be part of the commit msg! Yafang, Toke, how about we craft it the following
> way then to support this case:
>

LGTM. With the issue reported by kernel test robot [1] fixed,

Acked-by: Yafang Shao <laoar.shao@gmail.com>

[1]. https://lore.kernel.org/bpf/202304150811.bzx9niRq-lkp@intel.com/

>  From f6c83e5e55c5eb9da8acd19369c688acf53951db Mon Sep 17 00:00:00 2001
> Message-Id: <f6c83e5e55c5eb9da8acd19369c688acf53951db.1681512637.git.daniel@iogearbox.net>
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Sat, 15 Apr 2023 00:30:27 +0200
> Subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> There are some use-cases where it is desirable to use bpf_redirect()
> in combination with ifb device, which currently is not supported, for
> example, around filtering inbound traffic with BPF to then push it to
> ifb which holds the qdisc for shaping in contrast to doing that on the
> egress device.
>
> Toke mentions the following case related to OpenWrt:
>
>    Because there's not always a single egress on the other side. These are
>    mainly home routers, which tend to have one or more WiFi devices bridged
>    to one or more ethernet ports on the LAN side, and a single upstream WAN
>    port. And the objective is to control the total amount of traffic going
>    over the WAN link (in both directions), to deal with bufferbloat in the
>    ISP network (which is sadly still all too prevalent).
>
>    In this setup, the traffic can be split arbitrarily between the links
>    on the LAN side, and the only "single bottleneck" is the WAN link. So we
>    install both egress and ingress shapers on this, configured to something
>    like 95-98% of the true link bandwidth, thus moving the queues into the
>    qdisc layer in the router. It's usually necessary to set the ingress
>    bandwidth shaper a bit lower than the egress due to being "downstream"
>    of the bottleneck link, but it does work surprisingly well.
>
>    We usually use something like a matchall filter to put all ingress
>    traffic on the ifb, so doing the redirect from BPF has not been an
>    immediate requirement thus far. However, it does seem a bit odd that this
>    is not possible, and we do have a BPF-based filter that layers on top of
>    this kind of setup, which currently uses u32 as the ingress filter and
>    so it could presumably be improved to use BPF instead if that was
>    available.
>
> Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reported-by: Yafang Shao <laoar.shao@gmail.com>
> Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
> Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk
> ---
>   include/linux/skbuff.h | 9 +++++++++
>   net/core/filter.c      | 1 +
>   2 files changed, 10 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ff7ad331fb82..2bbf9245640a 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -5049,6 +5049,15 @@ static inline void skb_reset_redirect(struct sk_buff *skb)
>         skb->redirected = 0;
>   }
>
> +static inline void skb_set_redirected_noclear(struct sk_buff *skb,
> +                                             bool from_ingress)
> +{
> +       skb->redirected = 1;
> +#ifdef CONFIG_NET_REDIRECT
> +       skb->from_ingress = from_ingress;
> +#endif
> +}
> +
>   static inline bool skb_csum_is_sctp(struct sk_buff *skb)
>   {
>         return skb->csum_not_inet;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1d6f165923bf..27ba616aaa1a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2111,6 +2111,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>         }
>
>         skb->dev = dev;
> +       skb_set_redirected_noclear(skb, skb->tc_at_ingress);
>         skb_clear_tstamp(skb);
>
>         dev_xmit_recursion_inc();
> --
> 2.21.0
Toke Høiland-Jørgensen April 15, 2023, 2 p.m. UTC | #10
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
> [...]
>> https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
>
> Thanks for the explanation, that sounds reasonable and this should ideally
> be part of the commit msg! Yafang, Toke, how about we craft it the following
> way then to support this case:

SGTM! With the kbot complaint fixed:

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Daniel Borkmann April 17, 2023, 1:50 p.m. UTC | #11
On 4/15/23 4:00 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>> On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>> [...]
>>> https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
>>
>> Thanks for the explanation, that sounds reasonable and this should ideally
>> be part of the commit msg! Yafang, Toke, how about we craft it the following
>> way then to support this case:
> 
> SGTM! With the kbot complaint fixed:

Sounds good, thanks both; sent out now.
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index c785319..da6b196 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3956,6 +3956,7 @@  int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, skb->tc_at_ingress);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -5138,6 +5139,7 @@  static __latent_entropy void net_tx_action(struct softirq_action *h)
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, skb->tc_at_ingress);
 		if (skb_do_redirect(skb) == -EAGAIN) {
 			__skb_pull(skb, skb->mac_len);
 			*another = true;