Message ID | 20210809070455.21051-1-liuhangbin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | d09c548dbf3b31cb07bba562e0f452edfa01efe3 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | fail | Series targets non-next tree, but doesn't contain any Fixes tags |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 9 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
On 2021-08-09 10:04 AM, Hangbin Liu wrote: > When mirror/redirect a skb to a different port, the ct info should be reset > for reclassification. Or the pkts will match unexpected rules. For example, > with following topology and commands: > > ----------- > | > veth0 -+------- > | > veth1 -+------- > | > ------------ > > tc qdisc add dev veth0 clsact > # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1" > tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1 > tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1 > tc qdisc add dev veth1 clsact > tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop > > ping <remove ip via veth0> & > tc -s filter show dev veth1 ingress > > With command 'tc -s filter show', we can find the pkts were dropped on > veth1. > We can add a fixes line on the commit added ct support Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") > Signed-off-by: Roi Dayan <roid@nvidia.com> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > --- > net/sched/act_mirred.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c > index 7153c67f641e..2ef4cd2c848b 100644 > --- a/net/sched/act_mirred.c > +++ b/net/sched/act_mirred.c > @@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, > goto out; > } > > + /* All mirred/redirected skbs should clear previous ct info */ > + nf_reset_ct(skb2); > + > want_ingress = tcf_mirred_act_wants_ingress(m_eaction); > > expects_nh = want_ingress || !m_mac_header_xmit; >
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > When mirror/redirect a skb to a different port, the ct info should be reset > for reclassification. Or the pkts will match unexpected rules. For example, > with following topology and commands: > > ----------- > | > veth0 -+------- > | > veth1 -+------- > | > > [...] Here is the summary with links: - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb https://git.kernel.org/netdev/net/c/d09c548dbf3b You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Hi, On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote: > > Hello: > > This patch was applied to netdev/net.git (refs/heads/master): > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > When mirror/redirect a skb to a different port, the ct info should be reset > > for reclassification. Or the pkts will match unexpected rules. For example, > > with following topology and commands: > > > > ----------- > > | > > veth0 -+------- > > | > > veth1 -+------- > > | > > > > [...] > > Here is the summary with links: > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > https://git.kernel.org/netdev/net/c/d09c548dbf3b Unfortunately this commit breaks DNAT when performed before going via mirred egress->ingress. The reason is that connection tracking is lost and therefore a new state is created on ingress. This breaks existing setups. See below a simplified script reproducing this issue. Therefore I suggest this commit be reverted and a knob is introduced to mirred for clearing ct as needed. Eyal. Reproduction script: #!/bin/bash ip netns add a ip netns add b ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1 ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1 ip link add veth0 netns a type veth peer name veth0 netns b ip -net a link set veth0 up ip -net a addr add dev veth0 198.51.100.1/30 ip -net a link add dum0 type dummy ip -net a link set dev dum0 up ip -net a addr add dev dum0 198.51.100.2/32 ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT --to-destination 10.0.0.2 ip -net a route add default dev dum0 ip -net a rule add pref 50 iif dum0 lookup 1000 ip -net a route add table 1000 default dev veth0 ip netns exec a tc qdisc add dev dum0 clsact ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic action mirred ingress redirect dev dum0 ip -net b link set veth0 up ip -net b addr add 10.0.0.2/32 dev veth0 ip -net b addr add 198.51.100.3/30 dev veth0 ip netns exec a ping 10.0.0.1 > > You are awesome, thank you! > -- > Deet-doot-dot, I am a bot. > https://korg.docs.kernel.org/patchwork/pwbot.html > >
Hi, On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote: > Hi, > > On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > Hello: > > > > This patch was applied to netdev/net.git (refs/heads/master): > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > > When mirror/redirect a skb to a different port, the ct info should be reset > > > for reclassification. Or the pkts will match unexpected rules. For example, > > > with following topology and commands: > > > > > > ----------- > > > | > > > veth0 -+------- > > > | > > > veth1 -+------- > > > | > > > > > > [...] > > > > Here is the summary with links: > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > > https://git.kernel.org/netdev/net/c/d09c548dbf3b > > Unfortunately this commit breaks DNAT when performed before going via mirred > egress->ingress. > > The reason is that connection tracking is lost and therefore a new state > is created on ingress. > > This breaks existing setups. > > See below a simplified script reproducing this issue. I guess I can understand why the reproducer triggers it, but I fail to see the actual use case you have behind it. Can you please elaborate on it? > > Therefore I suggest this commit be reverted and a knob is introduced to mirred > for clearing ct as needed. > > Eyal. > > Reproduction script: > > #!/bin/bash > > ip netns add a > ip netns add b > > ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1 > ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1 > > ip link add veth0 netns a type veth peer name veth0 netns b > ip -net a link set veth0 up > ip -net a addr add dev veth0 198.51.100.1/30 > > ip -net a link add dum0 type dummy > ip -net a link set dev dum0 up > ip -net a addr add dev dum0 198.51.100.2/32 > > ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT > --to-destination 10.0.0.2 > ip -net a route add default dev dum0 > ip -net a rule add pref 50 iif dum0 lookup 1000 > ip -net a route add table 1000 default dev veth0 > > ip netns exec a tc qdisc add dev dum0 clsact > ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic > action mirred ingress redirect dev dum0 > > ip -net b link set veth0 up > ip -net b addr add 10.0.0.2/32 dev veth0 > ip -net b addr add 198.51.100.3/30 dev veth0 > > ip netns exec a ping 10.0.0.1 > > > > You are awesome, thank you! > > -- > > Deet-doot-dot, I am a bot. > > https://korg.docs.kernel.org/patchwork/pwbot.html > > > > >
Hi, On Tue, Apr 19, 2022 at 8:26 PM Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > > Hi, > > On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote: > > Hi, > > > > On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > > > Hello: > > > > > > This patch was applied to netdev/net.git (refs/heads/master): > > > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > > > When mirror/redirect a skb to a different port, the ct info should be reset > > > > for reclassification. Or the pkts will match unexpected rules. For example, > > > > with following topology and commands: > > > > > > > > ----------- > > > > | > > > > veth0 -+------- > > > > | > > > > veth1 -+------- > > > > | > > > > > > > > [...] > > > > > > Here is the summary with links: > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b > > > > Unfortunately this commit breaks DNAT when performed before going via mirred > > egress->ingress. > > > > The reason is that connection tracking is lost and therefore a new state > > is created on ingress. > > > > This breaks existing setups. > > > > See below a simplified script reproducing this issue. > > I guess I can understand why the reproducer triggers it, but I fail to > see the actual use case you have behind it. Can you please elaborate > on it? One use case we use mirred egress->ingress redirect for is when we want to reroute a packet after applying some change to the packet which would affect its routing. for example consider a bpf program running on tc ingress (after mirred) setting the skb->mark based on some criteria. So you have something like: packet routed to dummy device based on some criteria -> mirred redirect to ingress -> classification by ebpf logic at tc ingress -> packet routed again We have a setup where DNAT is performed before this flow in that case the ebpf logic needs to see the packet after the NAT. Eyal. > > > > > Therefore I suggest this commit be reverted and a knob is introduced to mirred > > for clearing ct as needed. > > > > Eyal. > > > > Reproduction script: > > > > #!/bin/bash > > > > ip netns add a > > ip netns add b > > > > ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1 > > ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1 > > > > ip link add veth0 netns a type veth peer name veth0 netns b > > ip -net a link set veth0 up > > ip -net a addr add dev veth0 198.51.100.1/30 > > > > ip -net a link add dum0 type dummy > > ip -net a link set dev dum0 up > > ip -net a addr add dev dum0 198.51.100.2/32 > > > > ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT > > --to-destination 10.0.0.2 > > ip -net a route add default dev dum0 > > ip -net a rule add pref 50 iif dum0 lookup 1000 > > ip -net a route add table 1000 default dev veth0 > > > > ip netns exec a tc qdisc add dev dum0 clsact > > ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic > > action mirred ingress redirect dev dum0 > > > > ip -net b link set veth0 up > > ip -net b addr add 10.0.0.2/32 dev veth0 > > ip -net b addr add 198.51.100.3/30 dev veth0 > > > > ip netns exec a ping 10.0.0.1 > > > > > > You are awesome, thank you! > > > -- > > > Deet-doot-dot, I am a bot. > > > https://korg.docs.kernel.org/patchwork/pwbot.html > > > > > > > > >
On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote: > > > > I guess I can understand why the reproducer triggers it, but I fail to > > see the actual use case you have behind it. Can you please elaborate > > on it? > > One use case we use mirred egress->ingress redirect for is when we want to > reroute a packet after applying some change to the packet which would affect > its routing. for example consider a bpf program running on tc ingress (after > mirred) setting the skb->mark based on some criteria. > > So you have something like: > > packet routed to dummy device based on some criteria -> > mirred redirect to ingress -> > classification by ebpf logic at tc ingress -> > packet routed again > > We have a setup where DNAT is performed before this flow in that case the > ebpf logic needs to see the packet after the NAT. Hi Marcelo, Thanks for taking care of this. Would you help following up this issue as you are more familiar with net sched? Thanks Hangbin
Hi Eyal, On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote: > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > > > > When mirror/redirect a skb to a different port, the ct info should be reset > > > > > for reclassification. Or the pkts will match unexpected rules. For example, > > > > > with following topology and commands: > > > > > > > > > > ----------- > > > > > | > > > > > veth0 -+------- > > > > > | > > > > > veth1 -+------- > > > > > | > > > > > > > > > > [...] > > > > > > > > Here is the summary with links: > > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b > > > > > > Unfortunately this commit breaks DNAT when performed before going via mirred > > > egress->ingress. > > > > > > The reason is that connection tracking is lost and therefore a new state > > > is created on ingress. > > > > > > This breaks existing setups. > > > > > > See below a simplified script reproducing this issue. I think we come in to a paradox state. Some user don't want to have previous ct info after mirror, while others would like to keep. In my understanding, when we receive a pkt from a interface, the skb should be clean and no ct info at first. But I may wrong. Jamal, Wang Cong, Jiri, do you have any comments? > > > > I guess I can understand why the reproducer triggers it, but I fail to > > see the actual use case you have behind it. Can you please elaborate > > on it? > > One use case we use mirred egress->ingress redirect for is when we want to > reroute a packet after applying some change to the packet which would affect > its routing. for example consider a bpf program running on tc ingress (after > mirred) setting the skb->mark based on some criteria. > > So you have something like: > > packet routed to dummy device based on some criteria -> > mirred redirect to ingress -> > classification by ebpf logic at tc ingress -> > packet routed again > > We have a setup where DNAT is performed before this flow in that case the > ebpf logic needs to see the packet after the NAT. Is it possible to check whether it's need to set the skb->mark before DNAT? So we can update it before egress and no need to re-route. Thanks Hangbin
On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote: > Hi Eyal, > On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote: > > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > > > > > When mirror/redirect a skb to a different port, the ct info should be reset > > > > > > for reclassification. Or the pkts will match unexpected rules. For example, > > > > > > with following topology and commands: > > > > > > > > > > > > ----------- > > > > > > | > > > > > > veth0 -+------- > > > > > > | > > > > > > veth1 -+------- > > > > > > | > > > > > > > > > > > > [...] > > > > > > > > > > Here is the summary with links: > > > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > > > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b > > > > > > > > Unfortunately this commit breaks DNAT when performed before going via mirred > > > > egress->ingress. > > > > > > > > The reason is that connection tracking is lost and therefore a new state > > > > is created on ingress. > > > > > > > > This breaks existing setups. > > > > > > > > See below a simplified script reproducing this issue. > > I think we come in to a paradox state. Some user don't want to have previous > ct info after mirror, while others would like to keep. In my understanding, > when we receive a pkt from a interface, the skb should be clean and no ct info > at first. But I may wrong. Makes sense to me. Moreover, there were a couple of fixes on this on mirred around that time frame/area (like f799ada6bf23 ("net: sched: act_mirred: drop dst for the direction from egress to ingress")). That's because we are seeing that mirred xmit action when switching to ingress direction should be as close skb_scrub_packet. OVS needs this scrubbing as well, btw. This ct information could be easily stale if there were other packet changes after it. Point being, if we really need the knob for backwards compatibility here, it may have to be a broader one. > > Jamal, Wang Cong, Jiri, do you have any comments? > > > > > > > I guess I can understand why the reproducer triggers it, but I fail to > > > see the actual use case you have behind it. Can you please elaborate > > > on it? > > > > One use case we use mirred egress->ingress redirect for is when we want to > > reroute a packet after applying some change to the packet which would affect > > its routing. for example consider a bpf program running on tc ingress (after > > mirred) setting the skb->mark based on some criteria. > > > > So you have something like: > > > > packet routed to dummy device based on some criteria -> > > mirred redirect to ingress -> > > classification by ebpf logic at tc ingress -> > > packet routed again > > > > We have a setup where DNAT is performed before this flow in that case the > > ebpf logic needs to see the packet after the NAT. > > Is it possible to check whether it's need to set the skb->mark before DNAT? > So we can update it before egress and no need to re-route. > > Thanks > Hangbin >
On Fri, Apr 22, 2022 at 4:41 PM Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: > > On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote: > > Hi Eyal, > > On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote: > > > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote: > > > > > > > When mirror/redirect a skb to a different port, the ct info should be reset > > > > > > > for reclassification. Or the pkts will match unexpected rules. For example, > > > > > > > with following topology and commands: > > > > > > > > > > > > > > ----------- > > > > > > > | > > > > > > > veth0 -+------- > > > > > > > | > > > > > > > veth1 -+------- > > > > > > > | > > > > > > > > > > > > > > [...] > > > > > > > > > > > > Here is the summary with links: > > > > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb > > > > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b > > > > > > > > > > Unfortunately this commit breaks DNAT when performed before going via mirred > > > > > egress->ingress. > > > > > > > > > > The reason is that connection tracking is lost and therefore a new state > > > > > is created on ingress. > > > > > > > > > > This breaks existing setups. > > > > > > > > > > See below a simplified script reproducing this issue. > > > > I think we come in to a paradox state. Some user don't want to have previous > > ct info after mirror, while others would like to keep. In my understanding, > > when we receive a pkt from a interface, the skb should be clean and no ct info > > at first. But I may wrong. > > Makes sense to me. Moreover, there were a couple of fixes on this on > mirred around that time frame/area (like f799ada6bf23 ("net: sched: > act_mirred: drop dst for the direction from egress to ingress")). That's > because we are seeing that mirred xmit action when switching to > ingress direction should be as close skb_scrub_packet. OVS needs this > scrubbing as well, btw. This ct information could be easily stale if > there were other packet changes after it. Makes sense to me too. The main reason for bringing this up was that it's a subtle change and wasn't trivial to figure out. > > Point being, if we really need the knob for backwards compatibility > here, it may have to be a broader one. FWIW the dst change was ok in our setups. > > > > > Jamal, Wang Cong, Jiri, do you have any comments? > > > > > > > > > > I guess I can understand why the reproducer triggers it, but I fail to > > > > see the actual use case you have behind it. Can you please elaborate > > > > on it? > > > > > > One use case we use mirred egress->ingress redirect for is when we want to > > > reroute a packet after applying some change to the packet which would affect > > > its routing. for example consider a bpf program running on tc ingress (after > > > mirred) setting the skb->mark based on some criteria. > > > > > > So you have something like: > > > > > > packet routed to dummy device based on some criteria -> > > > mirred redirect to ingress -> > > > classification by ebpf logic at tc ingress -> > > > packet routed again > > > > > > We have a setup where DNAT is performed before this flow in that case the > > > ebpf logic needs to see the packet after the NAT. > > > > Is it possible to check whether it's need to set the skb->mark before DNAT? > > So we can update it before egress and no need to re-route. For future reference, we worked around this issue by moving some of the relevant ebpf functionality to the lwt output hook which allows classification and rerouting. Eyal.
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 7153c67f641e..2ef4cd2c848b 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, goto out; } + /* All mirred/redirected skbs should clear previous ct info */ + nf_reset_ct(skb2); + want_ingress = tcf_mirred_act_wants_ingress(m_eaction); expects_nh = want_ingress || !m_mac_header_xmit;