Message ID | 1636457577-43305-1-git-send-email-changliang.wu@smartx.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ipv4: add sysctl knob to control the discarding of skb from local in ip_forward | expand |
On 11/9/21 12:32 PM, Changliang Wu wrote: > This change is meant to add a control for forwarding skb from local. > By default, ip forward will not receive the pakcet to/from the local. > But in some special cases, for example: > - > | ovs-bridge gw-port | <----> kube-proxy(iptables) | > - > Ovs sends the packet to the gateway, which requires iptables for nat, > such as kube-proxy (iptables), and then sends it back to the gateway > through routing for further processing in ovs. > > Signed-off-by: Changliang Wu <changliang.wu@smartx.com> > --- > include/net/netns/ipv4.h | 1 + > net/ipv4/af_inet.c | 1 + > net/ipv4/ip_forward.c | 6 +++--- > net/ipv4/sysctl_net_ipv4.c | 7 +++++++ > 4 files changed, 12 insertions(+), 3 deletions(-) > > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > index 2f65701..0dbe0d6 100644 > --- a/include/net/netns/ipv4.h > +++ b/include/net/netns/ipv4.h > @@ -94,6 +94,7 @@ struct netns_ipv4 { > u8 sysctl_ip_no_pmtu_disc; > u8 sysctl_ip_fwd_use_pmtu; > u8 sysctl_ip_fwd_update_priority; > + u8 sysctl_ip_fwd_accept_local; > u8 sysctl_ip_nonlocal_bind; > u8 sysctl_ip_autobind_reuse; > /* Shall we try to damage output packets if routing dev changes? */ > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 0189e3c..b5dc205 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1844,6 +1844,7 @@ static __net_init int inet_init_net(struct net *net) > */ > net->ipv4.sysctl_ip_default_ttl = IPDEFTTL; > net->ipv4.sysctl_ip_fwd_update_priority = 1; > + net->ipv4.sysctl_ip_fwd_accept_local = 0; > net->ipv4.sysctl_ip_dynaddr = 0; > net->ipv4.sysctl_ip_early_demux = 1; > net->ipv4.sysctl_udp_early_demux = 1; > diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c > index 00ec819..06b7e00 100644 > --- a/net/ipv4/ip_forward.c > +++ b/net/ipv4/ip_forward.c > @@ -95,9 +95,6 @@ int ip_forward(struct sk_buff *skb) > if (skb->pkt_type != PACKET_HOST) > goto drop; > > - if (unlikely(skb->sk)) > - goto drop; > - > if (skb_warn_if_lro(skb)) > goto drop; > > @@ -110,6 +107,9 @@ int ip_forward(struct sk_buff *skb) > skb_forward_csum(skb); > net = dev_net(skb->dev); > > + if (unlikely(!net->ipv4.sysctl_ip_fwd_accept_local && skb->sk)) > + goto drop; > + Why moving the check further down instead of initializing dev_net(skb->dev) earlier? What about ip6_forward()? Same issue exists there too I presume? > /* > * According to the RFC, we must first decrease the TTL field. If > * that reaches zero, we must reply an ICMP control message telling > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > index 97eb547..d95e2e3 100644 > --- a/net/ipv4/sysctl_net_ipv4.c > +++ b/net/ipv4/sysctl_net_ipv4.c > @@ -756,6 +756,13 @@ static int proc_fib_multipath_hash_fields(struct ctl_table *table, int write, > .extra2 = SYSCTL_ONE, > }, > { > + .procname = "ip_forward_accept_local", > + .data = &init_net.ipv4.sysctl_ip_fwd_accept_local, > + .maxlen = sizeof(u8), > + .mode = 0644, > + .proc_handler = proc_dou8vec_minmax, > + }, > + { > .procname = "ip_nonlocal_bind", > .data = &init_net.ipv4.sysctl_ip_nonlocal_bind, > .maxlen = sizeof(u8), >
Changliang Wu <changliang.wu@smartx.com> wrote: > This change is meant to add a control for forwarding skb from local. > By default, ip forward will not receive the pakcet to/from the local. > But in some special cases, for example: > - > | ovs-bridge gw-port | <----> kube-proxy(iptables) | > - > Ovs sends the packet to the gateway, which requires iptables for nat, > such as kube-proxy (iptables), and then sends it back to the gateway > through routing for further processing in ovs. This a very terse description. How does packet end up in forward after skb->sk assignment? > diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c > index 00ec819..06b7e00 100644 > --- a/net/ipv4/ip_forward.c > +++ b/net/ipv4/ip_forward.c > @@ -95,9 +95,6 @@ int ip_forward(struct sk_buff *skb) > if (skb->pkt_type != PACKET_HOST) > goto drop; > > - if (unlikely(skb->sk)) > - goto drop; > - Please have a look at 2ab957492d13bb819400ac29ae55911d50a82a13 you need to explain why this is now ok. Without explanation i have to assume stack will now crash again when net->ipv4.sysctl_ip_fwd_accept_local=1 and TW socket is assigned.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 2f65701..0dbe0d6 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -94,6 +94,7 @@ struct netns_ipv4 { u8 sysctl_ip_no_pmtu_disc; u8 sysctl_ip_fwd_use_pmtu; u8 sysctl_ip_fwd_update_priority; + u8 sysctl_ip_fwd_accept_local; u8 sysctl_ip_nonlocal_bind; u8 sysctl_ip_autobind_reuse; /* Shall we try to damage output packets if routing dev changes? */ diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 0189e3c..b5dc205 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1844,6 +1844,7 @@ static __net_init int inet_init_net(struct net *net) */ net->ipv4.sysctl_ip_default_ttl = IPDEFTTL; net->ipv4.sysctl_ip_fwd_update_priority = 1; + net->ipv4.sysctl_ip_fwd_accept_local = 0; net->ipv4.sysctl_ip_dynaddr = 0; net->ipv4.sysctl_ip_early_demux = 1; net->ipv4.sysctl_udp_early_demux = 1; diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index 00ec819..06b7e00 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -95,9 +95,6 @@ int ip_forward(struct sk_buff *skb) if (skb->pkt_type != PACKET_HOST) goto drop; - if (unlikely(skb->sk)) - goto drop; - if (skb_warn_if_lro(skb)) goto drop; @@ -110,6 +107,9 @@ int ip_forward(struct sk_buff *skb) skb_forward_csum(skb); net = dev_net(skb->dev); + if (unlikely(!net->ipv4.sysctl_ip_fwd_accept_local && skb->sk)) + goto drop; + /* * According to the RFC, we must first decrease the TTL field. If * that reaches zero, we must reply an ICMP control message telling diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 97eb547..d95e2e3 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -756,6 +756,13 @@ static int proc_fib_multipath_hash_fields(struct ctl_table *table, int write, .extra2 = SYSCTL_ONE, }, { + .procname = "ip_forward_accept_local", + .data = &init_net.ipv4.sysctl_ip_fwd_accept_local, + .maxlen = sizeof(u8), + .mode = 0644, + .proc_handler = proc_dou8vec_minmax, + }, + { .procname = "ip_nonlocal_bind", .data = &init_net.ipv4.sysctl_ip_nonlocal_bind, .maxlen = sizeof(u8),
This change is meant to add a control for forwarding skb from local. By default, ip forward will not receive the pakcet to/from the local. But in some special cases, for example: - | ovs-bridge gw-port | <----> kube-proxy(iptables) | - Ovs sends the packet to the gateway, which requires iptables for nat, such as kube-proxy (iptables), and then sends it back to the gateway through routing for further processing in ovs. Signed-off-by: Changliang Wu <changliang.wu@smartx.com> --- include/net/netns/ipv4.h | 1 + net/ipv4/af_inet.c | 1 + net/ipv4/ip_forward.c | 6 +++--- net/ipv4/sysctl_net_ipv4.c | 7 +++++++ 4 files changed, 12 insertions(+), 3 deletions(-)