
[net] tcp: reduce accepted window in NEW_SYN_RECV state

Message ID 20240523130528.60376-1-edumazet@google.com
State Accepted
Commit f4dca95fc0f6350918f2e6727e35b41f7f86fcce
Delegated to: Netdev Maintainers
Series [net] tcp: reduce accepted window in NEW_SYN_RECV state

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2923 this patch: 2923
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 950 this patch: 950
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3107 this patch: 3107
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 60 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-24--15-00 (tests: 1037)

Commit Message

Eric Dumazet May 23, 2024, 1:05 p.m. UTC
Jason's commit 378979e94e95 ("tcp: remove 64 KByte limit for initial
tp->rcv_wnd value") made checks against the ACK sequence less strict,
and this can be exploited by attackers to establish spoofed flows
with fewer probes.

Innocent users might set tcp_rmem[1] (the default value in the
net.ipv4.tcp_rmem sysctl) to 1,000,000,000, or to something more
reasonable.

An attacker can use a regular TCP connection to learn the server's
initial tp->rcv_wnd, and use it to optimize the attack.

If we make sure that only the announced window (at most 65535)
is used for ACK validation, we force an attacker to use about
65537 packets to complete the 3WHS (three-way handshake), assuming
the server ISN is unknown.

Fixes: 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value")
Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 include/net/request_sock.h | 12 ++++++++++++
 net/ipv4/tcp_ipv4.c        |  7 +------
 net/ipv4/tcp_minisocks.c   |  7 +++++--
 net/ipv6/tcp_ipv6.c        |  7 +------
 4 files changed, 19 insertions(+), 14 deletions(-)

Comments

Neal Cardwell May 23, 2024, 3:37 p.m. UTC | #1
On Thu, May 23, 2024 at 9:05 AM Eric Dumazet <edumazet@google.com> wrote:
> [...]

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal
Jason Xing May 24, 2024, 8:50 a.m. UTC | #2
On Thu, May 23, 2024 at 9:07 PM Eric Dumazet <edumazet@google.com> wrote:
> [...]

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thank you, Eric!
patchwork-bot+netdevbpf@kernel.org May 28, 2024, midnight UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 23 May 2024 13:05:27 +0000 you wrote:
> Jason commit made checks against ACK sequence less strict
> and can be exploited by attackers to establish spoofed flows
> with less probes.
> 
> Innocent users might use tcp_rmem[1] == 1,000,000,000,
> or something more reasonable.
> 
> [...]

Here is the summary with links:
  - [net] tcp: reduce accepted window in NEW_SYN_RECV state
    https://git.kernel.org/netdev/net/c/f4dca95fc0f6

You are awesome, thank you!

Patch

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index bdc737832da66a1eb5c50928e67d45d8b58d7b8e..68c1c5a5444c2c73a5e2209012b0f985f4361704 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -284,4 +284,16 @@  static inline int reqsk_queue_len_young(const struct request_sock_queue *queue)
 	return atomic_read(&queue->young);
 }
 
+/* RFC 7323 2.3 Using the Window Scale Option
+ *  The window field (SEG.WND) of every outgoing segment, with the
+ *  exception of <SYN> segments, MUST be right-shifted by
+ *  Rcv.Wind.Shift bits.
+ *
+ * This means the SEG.WND carried in SYNACK can not exceed 65535.
+ * We use this property to harden TCP stack while in NEW_SYN_RECV state.
+ */
+static inline u32 tcp_synack_window(const struct request_sock *req)
+{
+	return min(req->rsk_rcv_wnd, 65535U);
+}
 #endif /* _REQUEST_SOCK_H */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 30ef0c8f5e92d301c31ea1a05f662c1fc4cf37af..b710958393e64e2278c088018c87ac97a1291a23 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1144,14 +1144,9 @@  static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 #endif
 	}
 
-	/* RFC 7323 2.3
-	 * The window field (SEG.WND) of every outgoing segment, with the
-	 * exception of <SYN> segments, MUST be right-shifted by
-	 * Rcv.Wind.Shift bits:
-	 */
 	tcp_v4_send_ack(sk, skb, seq,
 			tcp_rsk(req)->rcv_nxt,
-			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
+			tcp_synack_window(req) >> inet_rsk(req)->rcv_wscale,
 			tcp_rsk_tsval(tcp_rsk(req)),
 			READ_ONCE(req->ts_recent),
 			0, &key,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b93619b2384b3735ecb6e40238f8367d9afb7e15..538c06f95918dedf29e0f4790795fcc417f2516f 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -783,8 +783,11 @@  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 	/* RFC793: "first check sequence number". */
 
-	if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
-					  tcp_rsk(req)->rcv_nxt, tcp_rsk(req)->rcv_nxt + req->rsk_rcv_wnd)) {
+	if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq,
+					  TCP_SKB_CB(skb)->end_seq,
+					  tcp_rsk(req)->rcv_nxt,
+					  tcp_rsk(req)->rcv_nxt +
+					  tcp_synack_window(req))) {
 		/* Out of window: send ACK and drop. */
 		if (!(flg & TCP_FLAG_RST) &&
 		    !tcp_oow_rate_limited(sock_net(sk), skb,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4c3605485b68e7c333a0144df3d685b3db9ff45d..8c577b651bfcd2f94b45e339ed4a2b47e93ff17a 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1272,15 +1272,10 @@  static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 	/* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV
 	 * sk->sk_state == TCP_SYN_RECV -> for Fast Open.
 	 */
-	/* RFC 7323 2.3
-	 * The window field (SEG.WND) of every outgoing segment, with the
-	 * exception of <SYN> segments, MUST be right-shifted by
-	 * Rcv.Wind.Shift bits:
-	 */
 	tcp_v6_send_ack(sk, skb, (sk->sk_state == TCP_LISTEN) ?
 			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
 			tcp_rsk(req)->rcv_nxt,
-			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
+			tcp_synack_window(req) >> inet_rsk(req)->rcv_wscale,
 			tcp_rsk_tsval(tcp_rsk(req)),
 			READ_ONCE(req->ts_recent), sk->sk_bound_dev_if,
 			&key, ipv6_get_dsfield(ipv6_hdr(skb)), 0,