Message ID | 20210708072109.1241563-1-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c7bb4b89033b764eb07db4e060548a6311d801ee |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v3,net] ipv6: tcp: drop silly ICMPv6 packet too big messages | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 3 this patch: 3 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 48 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 3 this patch: 3 |
netdev/header_inline | success | Link |
On Thu, Jul 08, 2021 at 12:21:09AM -0700, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > While TCP stack scales reasonably well, there is still one part that > can be used to DDOS it. > > IPv6 Packet too big messages have to lookup/insert a new route, > and if abused by attackers, can easily put hosts under high stress, > with many cpus contending on a spinlock while one is stuck in fib6_run_gc() > > ip6_protocol_deliver_rcu() > icmpv6_rcv() > icmpv6_notify() > tcp_v6_err() > tcp_v6_mtu_reduced() > inet6_csk_update_pmtu() > ip6_rt_update_pmtu() > __ip6_rt_update_pmtu() > ip6_rt_cache_alloc() > ip6_dst_alloc() > dst_alloc() > ip6_dst_gc() > fib6_run_gc() > spin_lock_bh() ... > > Some of our servers have been hit by malicious ICMPv6 packets > trying to _increase_ the MTU/MSS of TCP flows. > > We believe these ICMPv6 packets are a result of a bug in one ISP stack, > since they were blindly sent back for _every_ (small) packet sent to them. > > These packets are for one TCP flow: > 09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 > 09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 > 09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 > 09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 > 09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 > > TCP stack can filter some silly requests : > > 1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err() > 2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS. > > This tests happen before the IPv6 routing stack is entered, thus > removing the potential contention and route exhaustion. > > Note that IPv6 stack was performing these checks, but too late > (ie : after the route has been added, and after the potential > garbage collect war) > > v2: fix typo caught by Martin, thanks ! > v3: exports tcp_mtu_to_mss(), caught by David, thanks ! Acked-by: Martin KaFai Lau <kafai@fb.com>
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Thu, 8 Jul 2021 00:21:09 -0700 you wrote: > From: Eric Dumazet <edumazet@google.com> > > While TCP stack scales reasonably well, there is still one part that > can be used to DDOS it. > > IPv6 Packet too big messages have to lookup/insert a new route, > and if abused by attackers, can easily put hosts under high stress, > with many cpus contending on a spinlock while one is stuck in fib6_run_gc() > > [...] Here is the summary with links: - [v3,net] ipv6: tcp: drop silly ICMPv6 packet too big messages https://git.kernel.org/netdev/net/c/c7bb4b89033b You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
On Thu, 8 Jul 2021 00:21:09 -0700 Eric Dumazet wrote: > + /* Drop requests trying to increase our current mss. > + * Check done in __ip6_rt_update_pmtu() is too late. > + */ > + if (tcp_mtu_to_mss(sk, mtu) >= tcp_sk(sk)->mss_cache) > + return; Hi Eric! I think this breaks TFO + pMTU. mss_cache is 1208 but TFO seems to use tp->rx_opt.mss_clamp to size the packet. So we ignore the incoming PTB. Should we init mss_cache for TFO?
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index bde781f46b41a5dd9eb8db3fb65b45d73e592b4b..29553fce8502861087830b94cc4fbebfce6e60dc 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1732,6 +1732,7 @@ int tcp_mtu_to_mss(struct sock *sk, int pmtu) return __tcp_mtu_to_mss(sk, pmtu) - (tcp_sk(sk)->tcp_header_len - sizeof(struct tcphdr)); } +EXPORT_SYMBOL(tcp_mtu_to_mss); /* Inverse of above */ int tcp_mss_to_mtu(struct sock *sk, int mss) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 593c32fe57ed13a218492fd6056f2593e601ec79..323989927a0a6a2274bcbc1cd0ac72e9d49b24ad 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -348,11 +348,20 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, static void tcp_v6_mtu_reduced(struct sock *sk) { struct dst_entry *dst; + u32 mtu; if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) return; - dst = inet6_csk_update_pmtu(sk, READ_ONCE(tcp_sk(sk)->mtu_info)); + mtu = READ_ONCE(tcp_sk(sk)->mtu_info); + + /* Drop requests trying to increase our current mss. + * Check done in __ip6_rt_update_pmtu() is too late. + */ + if (tcp_mtu_to_mss(sk, mtu) >= tcp_sk(sk)->mss_cache) + return; + + dst = inet6_csk_update_pmtu(sk, mtu); if (!dst) return; @@ -433,6 +442,8 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, } if (type == ICMPV6_PKT_TOOBIG) { + u32 mtu = ntohl(info); + /* We are not interested in TCP_LISTEN and open_requests * (SYN-ACKs send out by Linux are always <576bytes so * they should go through unfragmented). @@ -443,7 +454,11 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, if (!ip6_sk_accept_pmtu(sk)) goto out; - WRITE_ONCE(tp->mtu_info, ntohl(info)); + if (mtu < IPV6_MIN_MTU) + goto out; + + WRITE_ONCE(tp->mtu_info, mtu); + if (!sock_owned_by_user(sk)) tcp_v6_mtu_reduced(sk); else if (!test_and_set_bit(TCP_MTU_REDUCED_DEFERRED,