diff mbox series

[net] tcp: do not accept ACK of bytes we never sent

Message ID 20231205161841.2702925-1-edumazet@google.com (mailing list archive)
State Accepted
Commit 3d501dd326fb1c73f1b8206d4c6e1d7b15c07e27
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: do not accept ACK of bytes we never sent | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1118 this patch: 1118
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 1143 this patch: 1143
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1145 this patch: 1145
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet Dec. 5, 2023, 4:18 p.m. UTC
This patch is based on a detailed report and ideas from Yepeng Pan
and Christian Rossow.

ACK seq validation is currently following RFC 5961 5.2 guidelines:

   The ACK value is considered acceptable only if
   it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
   SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
   above condition MUST be discarded and an ACK sent back.  It needs to
   be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
   duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
   acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
   ACK, drop the segment, and return".  The "ignored" above implies that
   the processing of the incoming data segment continues, which means
   the ACK value is treated as acceptable.  This mitigation makes the
   ACK check more stringent since any ACK < SND.UNA wouldn't be
   accepted, instead only ACKs that are in the range ((SND.UNA -
   MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.

This can be refined for new (and possibly spoofed) flows,
by not accepting ACK for bytes that were never sent.

This greatly improves TCP security at a little cost.

I added a Fixes: tag to make sure this patch will reach stable trees,
even if the 'blamed' patch was adhering to the RFC.

tp->bytes_acked was added in linux-4.2

Following packetdrill test (courtesy of Yepeng Pan) shows
the issue at hand:

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1024) = 0

// ---------------- Handshake ------------------- //

// when window scale is set to 14 the window size can be extended to
// 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
// with ack number in (Server_ISN+1-1073725440. Server_ISN+1)
// ,though this ack number acknowledges some data never
// sent by the server.

+0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
+0 > S. 0:0(0) ack 1 <...>
+0 < . 1:1(0) ack 1 win 65535
+0 accept(3, ..., ...) = 4

// For the established connection, we send an ACK packet,
// the ack packet uses ack number 1 - 1073725300 + 2^32,
// where 2^32 is used to wrap around.
// Note: we used 1073725300 instead of 1073725440 to avoid possible
// edge cases.
// 1 - 1073725300 + 2^32 = 3221241997

// Oops, old kernels happily accept this packet.
+0 < . 1:1001(1000) ack 3221241997 win 65535

// After the kernel fix the following will be replaced by a challenge ACK,
// and prior malicious frame would be dropped.
+0 > . 1:1(0) ack 1001

Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
Reported-by: Christian Rossow <rossow@cispa.de>
---
 net/ipv4/tcp_input.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Yuchung Cheng Dec. 5, 2023, 4:47 p.m. UTC | #1
On Tue, Dec 5, 2023 at 8:18 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This patch is based on a detailed report and ideas from Yepeng Pan
> and Christian Rossow.
>
> ACK seq validation is currently following RFC 5961 5.2 guidelines:
>
>    The ACK value is considered acceptable only if
>    it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
>    SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
>    above condition MUST be discarded and an ACK sent back.  It needs to
>    be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
>    duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
>    acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
>    ACK, drop the segment, and return".  The "ignored" above implies that
>    the processing of the incoming data segment continues, which means
>    the ACK value is treated as acceptable.  This mitigation makes the
>    ACK check more stringent since any ACK < SND.UNA wouldn't be
>    accepted, instead only ACKs that are in the range ((SND.UNA -
>    MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
Thank you Eric for the fix. It appears the newer RFC
https://www.rfc-editor.org/rfc/rfc9293.html also has this issue that
needs a revision?

>
> This can be refined for new (and possibly spoofed) flows,
> by not accepting ACK for bytes that were never sent.
>
> This greatly improves TCP security at a little cost.
>
> I added a Fixes: tag to make sure this patch will reach stable trees,
> even if the 'blamed' patch was adhering to the RFC.
>
> tp->bytes_acked was added in linux-4.2
>
> Following packetdrill test (courtesy of Yepeng Pan) shows
> the issue at hand:
>
> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1024) = 0
>
> // ---------------- Handshake ------------------- //
>
> // when window scale is set to 14 the window size can be extended to
> // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
> // with ack number in (Server_ISN+1-1073725440. Server_ISN+1)
> // ,though this ack number acknowledges some data never
> // sent by the server.
>
> +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
> +0 > S. 0:0(0) ack 1 <...>
> +0 < . 1:1(0) ack 1 win 65535
> +0 accept(3, ..., ...) = 4
>
> // For the established connection, we send an ACK packet,
> // the ack packet uses ack number 1 - 1073725300 + 2^32,
> // where 2^32 is used to wrap around.
> // Note: we used 1073725300 instead of 1073725440 to avoid possible
> // edge cases.
> // 1 - 1073725300 + 2^32 = 3221241997
>
> // Oops, old kernels happily accept this packet.
> +0 < . 1:1001(1000) ack 3221241997 win 65535
>
> // After the kernel fix the following will be replaced by a challenge ACK,
> // and prior malicious frame would be dropped.
> +0 > . 1:1(0) ack 1001
>
> Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
> Reported-by: Christian Rossow <rossow@cispa.de>
> ---
>  net/ipv4/tcp_input.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bcb55d98004c5213f0095613124d5193b15b2793..62cccc2e89ec68b3badae03168f1bfcd2698e0b7 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3871,8 +3871,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
>          * then we can probably ignore it.
>          */
>         if (before(ack, prior_snd_una)) {
> +               u32 max_window;
> +
> +               /* do not accept ACK for bytes we never sent. */
> +               max_window = min_t(u64, tp->max_window, tp->bytes_acked);
>                 /* RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] */
> -               if (before(ack, prior_snd_una - tp->max_window)) {
> +               if (before(ack, prior_snd_una - max_window)) {
>                         if (!(flag & FLAG_NO_CHALLENGE_ACK))
>                                 tcp_send_challenge_ack(sk);
>                         return -SKB_DROP_REASON_TCP_TOO_OLD_ACK;
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
Eric Dumazet Dec. 5, 2023, 5:19 p.m. UTC | #2
On Tue, Dec 5, 2023 at 5:47 PM Yuchung Cheng <ycheng@google.com> wrote:

> Thank you Eric for the fix. It appears the newer RFC
> https://www.rfc-editor.org/rfc/rfc9293.html also has this issue that
> needs a revision?

I do not think RFC 9293 made any refinement about RFC 5961 guidelines.

Perhaps Yepeng Pan and Christian Rossow have plans to bring this issue to IETF.
Neal Cardwell Dec. 5, 2023, 5:48 p.m. UTC | #3
On Tue, Dec 5, 2023 at 11:18 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This patch is based on a detailed report and ideas from Yepeng Pan
> and Christian Rossow.
>
> ACK seq validation is currently following RFC 5961 5.2 guidelines:
>
>    The ACK value is considered acceptable only if
>    it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
>    SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
>    above condition MUST be discarded and an ACK sent back.  It needs to
>    be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
>    duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
>    acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
>    ACK, drop the segment, and return".  The "ignored" above implies that
>    the processing of the incoming data segment continues, which means
>    the ACK value is treated as acceptable.  This mitigation makes the
>    ACK check more stringent since any ACK < SND.UNA wouldn't be
>    accepted, instead only ACKs that are in the range ((SND.UNA -
>    MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
>
> This can be refined for new (and possibly spoofed) flows,
> by not accepting ACK for bytes that were never sent.
>
> This greatly improves TCP security at a little cost.
>
> I added a Fixes: tag to make sure this patch will reach stable trees,
> even if the 'blamed' patch was adhering to the RFC.
>
> tp->bytes_acked was added in linux-4.2
>
> Following packetdrill test (courtesy of Yepeng Pan) shows
> the issue at hand:
>
> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1024) = 0
>
> // ---------------- Handshake ------------------- //
>
> // when window scale is set to 14 the window size can be extended to
> // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
> // with ack number in (Server_ISN+1-1073725440. Server_ISN+1)
> // ,though this ack number acknowledges some data never
> // sent by the server.
>
> +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
> +0 > S. 0:0(0) ack 1 <...>
> +0 < . 1:1(0) ack 1 win 65535
> +0 accept(3, ..., ...) = 4
>
> // For the established connection, we send an ACK packet,
> // the ack packet uses ack number 1 - 1073725300 + 2^32,
> // where 2^32 is used to wrap around.
> // Note: we used 1073725300 instead of 1073725440 to avoid possible
> // edge cases.
> // 1 - 1073725300 + 2^32 = 3221241997
>
> // Oops, old kernels happily accept this packet.
> +0 < . 1:1001(1000) ack 3221241997 win 65535
>
> // After the kernel fix the following will be replaced by a challenge ACK,
> // and prior malicious frame would be dropped.
> +0 > . 1:1(0) ack 1001
>
> Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
> Reported-by: Christian Rossow <rossow@cispa.de>
> ---
>  net/ipv4/tcp_input.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bcb55d98004c5213f0095613124d5193b15b2793..62cccc2e89ec68b3badae03168f1bfcd2698e0b7 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3871,8 +3871,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
>          * then we can probably ignore it.
>          */
>         if (before(ack, prior_snd_una)) {
> +               u32 max_window;
> +
> +               /* do not accept ACK for bytes we never sent. */
> +               max_window = min_t(u64, tp->max_window, tp->bytes_acked);
>                 /* RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] */
> -               if (before(ack, prior_snd_una - tp->max_window)) {
> +               if (before(ack, prior_snd_una - max_window)) {
>                         if (!(flag & FLAG_NO_CHALLENGE_ACK))
>                                 tcp_send_challenge_ack(sk);
>                         return -SKB_DROP_REASON_TCP_TOO_OLD_ACK;
> --

Thanks, Eric!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
Christian Rossow Dec. 5, 2023, 7:44 p.m. UTC | #4
> Perhaps Yepeng Pan and Christian Rossow have plans to bring this issue to IETF.
We already brought this up to the IETF TCP WG chairs a few weeks ago. 
It's still unclear if the TCP WG wants to work on an Internet Draft that 
mitigates "ghost ACKs". We'll follow up with them.

Cheers,
Christian
patchwork-bot+netdevbpf@kernel.org Dec. 7, 2023, 3:10 a.m. UTC | #5
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  5 Dec 2023 16:18:41 +0000 you wrote:
> This patch is based on a detailed report and ideas from Yepeng Pan
> and Christian Rossow.
> 
> ACK seq validation is currently following RFC 5961 5.2 guidelines:
> 
>    The ACK value is considered acceptable only if
>    it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
>    SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
>    above condition MUST be discarded and an ACK sent back.  It needs to
>    be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
>    duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
>    acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
>    ACK, drop the segment, and return".  The "ignored" above implies that
>    the processing of the incoming data segment continues, which means
>    the ACK value is treated as acceptable.  This mitigation makes the
>    ACK check more stringent since any ACK < SND.UNA wouldn't be
>    accepted, instead only ACKs that are in the range ((SND.UNA -
>    MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
> 
> [...]

Here is the summary with links:
  - [net] tcp: do not accept ACK of bytes we never sent
    https://git.kernel.org/netdev/net/c/3d501dd326fb

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bcb55d98004c5213f0095613124d5193b15b2793..62cccc2e89ec68b3badae03168f1bfcd2698e0b7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3871,8 +3871,12 @@  static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	 * then we can probably ignore it.
 	 */
 	if (before(ack, prior_snd_una)) {
+		u32 max_window;
+
+		/* do not accept ACK for bytes we never sent. */
+		max_window = min_t(u64, tp->max_window, tp->bytes_acked);
 		/* RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] */
-		if (before(ack, prior_snd_una - tp->max_window)) {
+		if (before(ack, prior_snd_una - max_window)) {
 			if (!(flag & FLAG_NO_CHALLENGE_ACK))
 				tcp_send_challenge_ack(sk);
 			return -SKB_DROP_REASON_TCP_TOO_OLD_ACK;