Message ID | 20230413025350.79809-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | [net-next] bpf, net: Support redirecting to ifb with bpf | expand |
On 4/13/23 4:53 AM, Yafang Shao wrote: > In our container environment, we are using EDT-bpf to limit the egress > bandwidth. EDT-bpf can be used to limit egress only, but can't be used > to limit ingress. Some of our users also want to limit the ingress > bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, > it is impossible to limit the ingress bandwidth currently, due to some > reasons, > 1). We can't add ingress qdisc > The ingress qdisc can't coexist with clsact qdisc as clsact has both > ingress and egress handler. So our traditional method to limit ingress > bandwidth can't work any more. I'm not following, the latter is a super set of the former, why do you need it to co-exist? > 2). We can't redirect ingress packet to ifb with bpf > By trying to analyze if it is possible to redirect the ingress packet to > ifb with a bpf program, we find that the ifb device is not supported by > bpf redirect yet. You actually can: Just let BPF program return TC_ACT_UNSPEC for this case and then add a matchall with higher prio (so it runs after bpf) that contains an action with mirred egress redirect that pushes to ifb dev - there is no change needed. > This patch tries to resolve it by supporting redirecting to ifb with bpf > program. > > Ingress bandwidth limit is useful in some scenarios, for example, for the > TCP-based service, there may be lots of clients connecting it, so it is > not wise to limit the clients' egress. After limiting the server-side's > ingress, it will lower the send rate of the client by lowering the TCP > cwnd if the ingress bandwidth limit is reached. If we don't limit it, > the clients will continue sending requests at a high rate. Adding artificial queueing for the inbound traffic, aren't you worried about DoS'ing your node? If you need to tell the sender to slow down, have you looked at hbm (https://lpc.events/event/4/contributions/486/, samples/bpf/hbm_out_kern.c) which uses ECN CE marking to tell the TCP sender to slow down? (Fwiw, for UDP https://github.com/cloudflare/rakelimit would be an option.) Thanks, Daniel
On Thu, Apr 13, 2023 at 7:47 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 4/13/23 4:53 AM, Yafang Shao wrote: > > In our container environment, we are using EDT-bpf to limit the egress > > bandwidth. EDT-bpf can be used to limit egress only, but can't be used > > to limit ingress. Some of our users also want to limit the ingress > > bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, > > it is impossible to limit the ingress bandwidth currently, due to some > > reasons, > > 1). We can't add ingress qdisc > > The ingress qdisc can't coexist with clsact qdisc as clsact has both > > ingress and egress handler. So our traditional method to limit ingress > > bandwidth can't work any more. > > I'm not following, the latter is a super set of the former, why do you > need it to co-exist? > It seems that I have a misunderstanding. > > 2). We can't redirect ingress packet to ifb with bpf > > By trying to analyze if it is possible to redirect the ingress packet to > > ifb with a bpf program, we find that the ifb device is not supported by > > bpf redirect yet. > > You actually can: Just let BPF program return TC_ACT_UNSPEC for this > case and then add a matchall with higher prio (so it runs after bpf) > that contains an action with mirred egress redirect that pushes to ifb > dev - there is no change needed. > Ah, indeed, it works. Many thanks for your help. > > This patch tries to resolve it by supporting redirecting to ifb with bpf > > program. > > > > Ingress bandwidth limit is useful in some scenarios, for example, for the > > TCP-based service, there may be lots of clients connecting it, so it is > > not wise to limit the clients' egress. After limiting the server-side's > > ingress, it will lower the send rate of the client by lowering the TCP > > cwnd if the ingress bandwidth limit is reached. If we don't limit it, > > the clients will continue sending requests at a high rate. > > Adding artificial queueing for the inbound traffic, aren't you worried > about DoS'ing your node? Yes, we worried about it, but we haven't observed it in our data center. > If you need to tell the sender to slow down, > have you looked at hbm (https://lpc.events/event/4/contributions/486/, > samples/bpf/hbm_out_kern.c) which uses ECN CE marking to tell the TCP > sender to slow down? (Fwiw, for UDP https://github.com/cloudflare/rakelimit > would be an option.) > We're considering using ECN. Thanks for your information, I will analyze it.
Daniel Borkmann <daniel@iogearbox.net> writes: >> 2). We can't redirect ingress packet to ifb with bpf >> By trying to analyze if it is possible to redirect the ingress packet to >> ifb with a bpf program, we find that the ifb device is not supported by >> bpf redirect yet. > > You actually can: Just let BPF program return TC_ACT_UNSPEC for this > case and then add a matchall with higher prio (so it runs after bpf) > that contains an action with mirred egress redirect that pushes to ifb > dev - there is no change needed. I wasn't aware that BPF couldn't redirect directly to an IFB; any reason why we shouldn't merge this patch in any case? >> This patch tries to resolve it by supporting redirecting to ifb with bpf >> program. >> >> Ingress bandwidth limit is useful in some scenarios, for example, for the >> TCP-based service, there may be lots of clients connecting it, so it is >> not wise to limit the clients' egress. After limiting the server-side's >> ingress, it will lower the send rate of the client by lowering the TCP >> cwnd if the ingress bandwidth limit is reached. If we don't limit it, >> the clients will continue sending requests at a high rate. > > Adding artificial queueing for the inbound traffic, aren't you worried > about DoS'ing your node? Just as an aside, the ingress filter -> ifb -> qdisc on the ifb interface does work surprisingly well, and we've been using that over in OpenWrt land for years[0]. It does have some overhead associated with it, but I wouldn't expect it to be a source of self-DoS in itself (assuming well-behaved TCP traffic). -Toke [0] https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm
On 4/13/23 4:43 PM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann <daniel@iogearbox.net> writes: > >>> 2). We can't redirect ingress packet to ifb with bpf >>> By trying to analyze if it is possible to redirect the ingress packet to >>> ifb with a bpf program, we find that the ifb device is not supported by >>> bpf redirect yet. >> >> You actually can: Just let BPF program return TC_ACT_UNSPEC for this >> case and then add a matchall with higher prio (so it runs after bpf) >> that contains an action with mirred egress redirect that pushes to ifb >> dev - there is no change needed. > > I wasn't aware that BPF couldn't redirect directly to an IFB; any reason > why we shouldn't merge this patch in any case? > >>> This patch tries to resolve it by supporting redirecting to ifb with bpf >>> program. >>> >>> Ingress bandwidth limit is useful in some scenarios, for example, for the >>> TCP-based service, there may be lots of clients connecting it, so it is >>> not wise to limit the clients' egress. After limiting the server-side's >>> ingress, it will lower the send rate of the client by lowering the TCP >>> cwnd if the ingress bandwidth limit is reached. If we don't limit it, >>> the clients will continue sending requests at a high rate. >> >> Adding artificial queueing for the inbound traffic, aren't you worried >> about DoS'ing your node? > > Just as an aside, the ingress filter -> ifb -> qdisc on the ifb > interface does work surprisingly well, and we've been using that over in > OpenWrt land for years[0]. It does have some overhead associated with it, > but I wouldn't expect it to be a source of self-DoS in itself (assuming > well-behaved TCP traffic). Out of curiosity, wrt OpenWrt case, can you elaborate on the use case to why choosing to do this on ingress via ifb rather than on the egress side? I presume in this case it's regular router, so pkts would be forwarded anyway, and in your case traversing qdisc layer / queuing twice (ingress phys dev -> ifb, egress phys dev), right? What is the rationale that would justify such setup aka why it cannot be solved differently? Thanks, Daniel > -Toke > > [0] https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm
Daniel Borkmann <daniel@iogearbox.net> writes: > On 4/13/23 4:43 PM, Toke Høiland-Jørgensen wrote: >> Daniel Borkmann <daniel@iogearbox.net> writes: >> >>>> 2). We can't redirect ingress packet to ifb with bpf >>>> By trying to analyze if it is possible to redirect the ingress packet to >>>> ifb with a bpf program, we find that the ifb device is not supported by >>>> bpf redirect yet. >>> >>> You actually can: Just let BPF program return TC_ACT_UNSPEC for this >>> case and then add a matchall with higher prio (so it runs after bpf) >>> that contains an action with mirred egress redirect that pushes to ifb >>> dev - there is no change needed. >> >> I wasn't aware that BPF couldn't redirect directly to an IFB; any reason >> why we shouldn't merge this patch in any case? >> >>>> This patch tries to resolve it by supporting redirecting to ifb with bpf >>>> program. >>>> >>>> Ingress bandwidth limit is useful in some scenarios, for example, for the >>>> TCP-based service, there may be lots of clients connecting it, so it is >>>> not wise to limit the clients' egress. After limiting the server-side's >>>> ingress, it will lower the send rate of the client by lowering the TCP >>>> cwnd if the ingress bandwidth limit is reached. If we don't limit it, >>>> the clients will continue sending requests at a high rate. >>> >>> Adding artificial queueing for the inbound traffic, aren't you worried >>> about DoS'ing your node? >> >> Just as an aside, the ingress filter -> ifb -> qdisc on the ifb >> interface does work surprisingly well, and we've been using that over in >> OpenWrt land for years[0]. It does have some overhead associated with it, >> but I wouldn't expect it to be a source of self-DoS in itself (assuming >> well-behaved TCP traffic). > > Out of curiosity, wrt OpenWrt case, can you elaborate on the use case to why > choosing to do this on ingress via ifb rather than on the egress side? I > presume in this case it's regular router, so pkts would be forwarded anyway, > and in your case traversing qdisc layer / queuing twice (ingress phys dev -> > ifb, egress phys dev), right? What is the rationale that would justify such > setup aka why it cannot be solved differently? Because there's not always a single egress on the other side. These are mainly home routers, which tend to have one or more WiFi devices bridged to one or more ethernet ports on the LAN side, and a single upstream WAN port. And the objective is to control the total amount of traffic going over the WAN link (in both directions), to deal with bufferbloat in the ISP network (which is sadly still all too prevalent). In this setup, the traffic can be split arbitrarily between the links on the LAN side, and the only "single bottleneck" is the WAN link. So we install both egress and ingress shapers on this, configured to something like 95-98% of the true link bandwidth, thus moving the queues into the qdisc layer in the router. It's usually necessary to set the ingress bandwidth shaper a bit lower than the egress due to being "downstream" of the bottleneck link, but it does work surprisingly well. We usually use something like a matchall filter to put all ingress traffic on the ifb, so doing the redirect from BPF has not been an immediate requirement thus far. However, it does seem a bit odd that this is not possible, and we do have a BPF-based filter that layers on top of this kind of setup, which currently uses u32 as the ingress filter and so it could presumably be improved to use BPF instead if that was available: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README -Toke
On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann <daniel@iogearbox.net> writes: [...] > https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README Thanks for the explanation, that sounds reasonable and this should ideally be part of the commit msg! Yafang, Toke, how about we craft it the following way then to support this case: From f6c83e5e55c5eb9da8acd19369c688acf53951db Mon Sep 17 00:00:00 2001 Message-Id: <f6c83e5e55c5eb9da8acd19369c688acf53951db.1681512637.git.daniel@iogearbox.net> From: Daniel Borkmann <daniel@iogearbox.net> Date: Sat, 15 Apr 2023 00:30:27 +0200 Subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are some use-cases where it is desirable to use bpf_redirect() in combination with ifb device, which currently is not supported, for example, around filtering inbound traffic with BPF to then push it to ifb which holds the qdisc for shaping in contrast to doing that on the egress device. Toke mentions the following case related to OpenWrt: Because there's not always a single egress on the other side. These are mainly home routers, which tend to have one or more WiFi devices bridged to one or more ethernet ports on the LAN side, and a single upstream WAN port. And the objective is to control the total amount of traffic going over the WAN link (in both directions), to deal with bufferbloat in the ISP network (which is sadly still all too prevalent). In this setup, the traffic can be split arbitrarily between the links on the LAN side, and the only "single bottleneck" is the WAN link. So we install both egress and ingress shapers on this, configured to something like 95-98% of the true link bandwidth, thus moving the queues into the qdisc layer in the router. It's usually necessary to set the ingress bandwidth shaper a bit lower than the egress due to being "downstream" of the bottleneck link, but it does work surprisingly well. We usually use something like a matchall filter to put all ingress traffic on the ifb, so doing the redirect from BPF has not been an immediate requirement thus far. However, it does seem a bit odd that this is not possible, and we do have a BPF-based filter that layers on top of this kind of setup, which currently uses u32 as the ingress filter and so it could presumably be improved to use BPF instead if that was available. Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Reported-by: Yafang Shao <laoar.shao@gmail.com> Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk --- include/linux/skbuff.h | 9 +++++++++ net/core/filter.c | 1 + 2 files changed, 10 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ff7ad331fb82..2bbf9245640a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -5049,6 +5049,15 @@ static inline void skb_reset_redirect(struct sk_buff *skb) skb->redirected = 0; } +static inline void skb_set_redirected_noclear(struct sk_buff *skb, + bool from_ingress) +{ + skb->redirected = 1; +#ifdef CONFIG_NET_REDIRECT + skb->from_ingress = from_ingress; +#endif +} + static inline bool skb_csum_is_sctp(struct sk_buff *skb) { return skb->csum_not_inet; diff --git a/net/core/filter.c b/net/core/filter.c index 1d6f165923bf..27ba616aaa1a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2111,6 +2111,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) } skb->dev = dev; + skb_set_redirected_noclear(skb, skb->tc_at_ingress); skb_clear_tstamp(skb); dev_xmit_recursion_inc();
Hi Daniel, kernel test robot noticed the following build errors: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/c68bf723-3406-d177-49b4-6d5b485048de%40iogearbox.net patch subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb config: x86_64-randconfig-a002-20230410 (https://download.01.org/0day-ci/archive/20230415/202304150814.os34v8BI-lkp@intel.com/config) compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/91c0d1d0b071c23d95bf9747b89daf5ae378ad1a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912 git checkout 91c0d1d0b071c23d95bf9747b89daf5ae378ad1a # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/core/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Link: https://lore.kernel.org/oe-kbuild-all/202304150814.os34v8BI-lkp@intel.com/ All errors (new ones prefixed by >>): >> net/core/filter.c:2125:39: error: no member named 'tc_at_ingress' in 'struct sk_buff' skb_set_redirected_noclear(skb, skb->tc_at_ingress); ~~~ ^ 1 error generated. vim +2125 net/core/filter.c 2113 2114 static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) 2115 { 2116 int ret; 2117 2118 if (dev_xmit_recursion()) { 2119 net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n"); 2120 kfree_skb(skb); 2121 return -ENETDOWN; 2122 } 2123 2124 skb->dev = dev; > 2125 skb_set_redirected_noclear(skb, skb->tc_at_ingress); 2126 skb_clear_tstamp(skb); 2127 2128 dev_xmit_recursion_inc(); 2129 ret = dev_queue_xmit(skb); 2130 dev_xmit_recursion_dec(); 2131 2132 return ret; 2133 } 2134
Hi Daniel, kernel test robot noticed the following build errors: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master patch link: https://lore.kernel.org/r/c68bf723-3406-d177-49b4-6d5b485048de%40iogearbox.net patch subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb config: x86_64-randconfig-a014-20230410 (https://download.01.org/0day-ci/archive/20230415/202304150811.bzx9niRq-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-8) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/91c0d1d0b071c23d95bf9747b89daf5ae378ad1a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Daniel-Borkmann/bpf-Set-skb-redirect-and-from_ingress-info-in-__bpf_tx_skb/20230415-065912 git checkout 91c0d1d0b071c23d95bf9747b89daf5ae378ad1a # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 olddefconfig make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/core/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Link: https://lore.kernel.org/oe-kbuild-all/202304150811.bzx9niRq-lkp@intel.com/ All errors (new ones prefixed by >>): net/core/filter.c: In function '__bpf_tx_skb': >> net/core/filter.c:2125:44: error: 'struct sk_buff' has no member named 'tc_at_ingress' 2125 | skb_set_redirected_noclear(skb, skb->tc_at_ingress); | ^~ vim +2125 net/core/filter.c 2113 2114 static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) 2115 { 2116 int ret; 2117 2118 if (dev_xmit_recursion()) { 2119 net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n"); 2120 kfree_skb(skb); 2121 return -ENETDOWN; 2122 } 2123 2124 skb->dev = dev; > 2125 skb_set_redirected_noclear(skb, skb->tc_at_ingress); 2126 skb_clear_tstamp(skb); 2127 2128 dev_xmit_recursion_inc(); 2129 ret = dev_queue_xmit(skb); 2130 dev_xmit_recursion_dec(); 2131 2132 return ret; 2133 } 2134
On Sat, Apr 15, 2023 at 6:57 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote: > > Daniel Borkmann <daniel@iogearbox.net> writes: > [...] > > https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README > > Thanks for the explanation, that sounds reasonable and this should ideally > be part of the commit msg! Yafang, Toke, how about we craft it the following > way then to support this case: > LGTM. With the issue reported by kernel test robot [1] fixed, Acked-by: Yafang Shao <laoar.shao@gmail.com> [1]. https://lore.kernel.org/bpf/202304150811.bzx9niRq-lkp@intel.com/ > From f6c83e5e55c5eb9da8acd19369c688acf53951db Mon Sep 17 00:00:00 2001 > Message-Id: <f6c83e5e55c5eb9da8acd19369c688acf53951db.1681512637.git.daniel@iogearbox.net> > From: Daniel Borkmann <daniel@iogearbox.net> > Date: Sat, 15 Apr 2023 00:30:27 +0200 > Subject: [PATCH bpf-next] bpf: Set skb redirect and from_ingress info in __bpf_tx_skb > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > There are some use-cases where it is desirable to use bpf_redirect() > in combination with ifb device, which currently is not supported, for > example, around filtering inbound traffic with BPF to then push it to > ifb which holds the qdisc for shaping in contrast to doing that on the > egress device. > > Toke mentions the following case related to OpenWrt: > > Because there's not always a single egress on the other side. These are > mainly home routers, which tend to have one or more WiFi devices bridged > to one or more ethernet ports on the LAN side, and a single upstream WAN > port. And the objective is to control the total amount of traffic going > over the WAN link (in both directions), to deal with bufferbloat in the > ISP network (which is sadly still all too prevalent). > > In this setup, the traffic can be split arbitrarily between the links > on the LAN side, and the only "single bottleneck" is the WAN link. So we > install both egress and ingress shapers on this, configured to something > like 95-98% of the true link bandwidth, thus moving the queues into the > qdisc layer in the router. It's usually necessary to set the ingress > bandwidth shaper a bit lower than the egress due to being "downstream" > of the bottleneck link, but it does work surprisingly well. > > We usually use something like a matchall filter to put all ingress > traffic on the ifb, so doing the redirect from BPF has not been an > immediate requirement thus far. However, it does seem a bit odd that this > is not possible, and we do have a BPF-based filter that layers on top of > this kind of setup, which currently uses u32 as the ingress filter and > so it could presumably be improved to use BPF instead if that was > available. > > Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> > Reported-by: Yafang Shao <laoar.shao@gmail.com> > Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README > Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk > --- > include/linux/skbuff.h | 9 +++++++++ > net/core/filter.c | 1 + > 2 files changed, 10 insertions(+) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index ff7ad331fb82..2bbf9245640a 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -5049,6 +5049,15 @@ static inline void skb_reset_redirect(struct sk_buff *skb) > skb->redirected = 0; > } > > +static inline void skb_set_redirected_noclear(struct sk_buff *skb, > + bool from_ingress) > +{ > + skb->redirected = 1; > +#ifdef CONFIG_NET_REDIRECT > + skb->from_ingress = from_ingress; > +#endif > +} > + > static inline bool skb_csum_is_sctp(struct sk_buff *skb) > { > return skb->csum_not_inet; > diff --git a/net/core/filter.c b/net/core/filter.c > index 1d6f165923bf..27ba616aaa1a 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2111,6 +2111,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) > } > > skb->dev = dev; > + skb_set_redirected_noclear(skb, skb->tc_at_ingress); > skb_clear_tstamp(skb); > > dev_xmit_recursion_inc(); > -- > 2.21.0
Daniel Borkmann <daniel@iogearbox.net> writes: > On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote: >> Daniel Borkmann <daniel@iogearbox.net> writes: > [...] >> https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README > > Thanks for the explanation, that sounds reasonable and this should ideally > be part of the commit msg! Yafang, Toke, how about we craft it the following > way then to support this case: SGTM! With the kbot complaint fixed: Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
On 4/15/23 4:00 PM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann <daniel@iogearbox.net> writes: > >> On 4/14/23 6:07 PM, Toke Høiland-Jørgensen wrote: >>> Daniel Borkmann <daniel@iogearbox.net> writes: >> [...] >>> https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README >> >> Thanks for the explanation, that sounds reasonable and this should ideally >> be part of the commit msg! Yafang, Toke, how about we craft it the following >> way then to support this case: > > SGTM! With the kbot complaint fixed: Sounds good, thanks both; sent out now.
diff --git a/net/core/dev.c b/net/core/dev.c index c785319..da6b196 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3956,6 +3956,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) return NULL; case TC_ACT_REDIRECT: /* No need to push/pop skb's mac_header here on egress! */ + skb_set_redirected(skb, skb->tc_at_ingress); skb_do_redirect(skb); *ret = NET_XMIT_SUCCESS; return NULL; @@ -5138,6 +5139,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) * redirecting to another netdev */ __skb_push(skb, skb->mac_len); + skb_set_redirected(skb, skb->tc_at_ingress); if (skb_do_redirect(skb) == -EAGAIN) { __skb_pull(skb, skb->mac_len); *another = true;
In our container environment, we are using EDT-bpf to limit the egress bandwidth. EDT-bpf can be used to limit egress only, but can't be used to limit ingress. Some of our users also want to limit the ingress bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, it is impossible to limit the ingress bandwidth currently, due to some reasons, 1). We can't add ingress qdisc The ingress qdisc can't coexist with clsact qdisc as clsact has both ingress and egress handler. So our traditional method to limit ingress bandwidth can't work any more. 2). We can't redirect ingress packet to ifb with bpf By trying to analyze if it is possible to redirect the ingress packet to ifb with a bpf program, we find that the ifb device is not supported by bpf redirect yet. This patch tries to resolve it by supporting redirecting to ifb with bpf program. Ingress bandwidth limit is useful in some scenarios, for example, for the TCP-based service, there may be lots of clients connecting it, so it is not wise to limit the clients' egress. After limiting the server-side's ingress, it will lower the send rate of the client by lowering the TCP cwnd if the ingress bandwidth limit is reached. If we don't limit it, the clients will continue sending requests at a high rate. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> --- net/core/dev.c | 2 ++ 1 file changed, 2 insertions(+)