diff mbox series

[net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO

Message ID 20220903121023.866900-1-ncardwell.kernel@gmail.com (mailing list archive)
State Accepted
Commit 686dc2db2a0fdc1d34b424ec2c0a735becd8d62b
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers warning 4 maintainers not CCed: edumazet@google.com dsahern@kernel.org yoshfuji@linux-ipv6.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch warning WARNING: Block comments use a trailing */ on a separate line
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Neal Cardwell Sept. 3, 2022, 12:10 p.m. UTC
From: Neal Cardwell <ncardwell@google.com>

Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.

Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:

(*1) Send one data packet on a non-SACK connection

(*2) Because no ACK packet is received, the packet is retransmitted
     and we enter CA_Loss; but this retransmission is spurious.

(*3) The ACK for the original data is received. The transmitted packet
     is acknowledged.  The TCP timestamp is before the retrans_stamp,
     so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
     true without changing state to Open (because tcp_is_sack() is
     false), and tcp_process_loss() returns without calling
     tcp_try_undo_recovery().  Normally after undoing a CA_Loss
     episode, tcp_fastretrans_alert() would see that the connection
     has returned to CA_Open and fall through and call
     tcp_try_to_open(), which would set retrans_stamp to 0.  However,
     for non-SACK connections we hold the connection in CA_Loss, so do
     not fall through to call tcp_try_to_open() and do not set
     retrans_stamp to 0. So retrans_stamp is (erroneously) still
     non-zero.

     At this point the first "retransmission event" has passed and
     been recovered from. Any future retransmission is a completely
     new "event". However, retrans_stamp is erroneously still
     set. (And we are still in CA_Loss, which is correct.)

(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
     packet is sent. Note: No data is transmitted between (*3) and
     (*4) and we disabled keep alives.

     The socket's timeout SHOULD be calculated from this point in
     time, but instead it's calculated from the prior "event" 16
     minutes ago (step (*2)).

(*5) Because no ACK packet is received, the packet is retransmitted.

(*6) At the time of the 2nd retransmission, the socket returns
     ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
     too far in the past (set at the time of (*2)).

This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.

Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_input.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Comments

Eric Dumazet Sept. 3, 2022, 5:03 p.m. UTC | #1
On 9/3/22 05:10, Neal Cardwell wrote:
> From: Neal Cardwell <ncardwell@google.com>
>
> Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
> of a spurious non-SACK RTO could cause a connection to fail to clear
> retrans_stamp, causing a later RTO to very prematurely time out the
> connection with ETIMEDOUT.
>
> Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
> report:
>
> (*1) Send one data packet on a non-SACK connection
>
> (*2) Because no ACK packet is received, the packet is retransmitted
>       and we enter CA_Loss; but this retransmission is spurious.
>
> (*3) The ACK for the original data is received. The transmitted packet
>       is acknowledged.  The TCP timestamp is before the retrans_stamp,
>       so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
>       true without changing state to Open (because tcp_is_sack() is
>       false), and tcp_process_loss() returns without calling
>       tcp_try_undo_recovery().  Normally after undoing a CA_Loss
>       episode, tcp_fastretrans_alert() would see that the connection
>       has returned to CA_Open and fall through and call
>       tcp_try_to_open(), which would set retrans_stamp to 0.  However,
>       for non-SACK connections we hold the connection in CA_Loss, so do
>       not fall through to call tcp_try_to_open() and do not set
>       retrans_stamp to 0. So retrans_stamp is (erroneously) still
>       non-zero.
>
>       At this point the first "retransmission event" has passed and
>       been recovered from. Any future retransmission is a completely
>       new "event". However, retrans_stamp is erroneously still
>       set. (And we are still in CA_Loss, which is correct.)
>
> (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
>       packet is sent. Note: No data is transmitted between (*3) and
>       (*4) and we disabled keep alives.
>
>       The socket's timeout SHOULD be calculated from this point in
>       time, but instead it's calculated from the prior "event" 16
>       minutes ago (step (*2)).
>
> (*5) Because no ACK packet is received, the packet is retransmitted.
>
> (*6) At the time of the 2nd retransmission, the socket returns
>       ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
>       too far in the past (set at the time of (*2)).
>
> This commit fixes this bug by ensuring that we reuse in
> tcp_try_undo_loss() the same careful logic for non-SACK connections
> that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
> we factor out that logic into a new
> tcp_is_non_sack_preventing_reopen() helper and call that helper from
> both undo functions.
>
> Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
> Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
> Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>


Reviewed-by: Eric Dumazet <edumazet@google.com>

Amazing that some folks still do not enable SACK...

Thanks !
patchwork-bot+netdevbpf@kernel.org Sept. 6, 2022, 9:20 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni@redhat.com>:

On Sat,  3 Sep 2022 08:10:23 -0400 you wrote:
> From: Neal Cardwell <ncardwell@google.com>
> 
> Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
> of a spurious non-SACK RTO could cause a connection to fail to clear
> retrans_stamp, causing a later RTO to very prematurely time out the
> connection with ETIMEDOUT.
> 
> [...]

Here is the summary with links:
  - [net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO
    https://git.kernel.org/netdev/net/c/686dc2db2a0f

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b85a9f755da41..bc2ea12221f95 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2513,6 +2513,21 @@  static inline bool tcp_may_undo(const struct tcp_sock *tp)
 	return tp->undo_marker && (!tp->undo_retrans || tcp_packet_delayed(tp));
 }
 
+static bool tcp_is_non_sack_preventing_reopen(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
+		/* Hold old state until something *above* high_seq
+		 * is ACKed. For Reno it is MUST to prevent false
+		 * fast retransmits (RFC2582). SACK TCP is safe. */
+		if (!tcp_any_retrans_done(sk))
+			tp->retrans_stamp = 0;
+		return true;
+	}
+	return false;
+}
+
 /* People celebrate: "We love our President!" */
 static bool tcp_try_undo_recovery(struct sock *sk)
 {
@@ -2535,14 +2550,8 @@  static bool tcp_try_undo_recovery(struct sock *sk)
 	} else if (tp->rack.reo_wnd_persist) {
 		tp->rack.reo_wnd_persist--;
 	}
-	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
-		/* Hold old state until something *above* high_seq
-		 * is ACKed. For Reno it is MUST to prevent false
-		 * fast retransmits (RFC2582). SACK TCP is safe. */
-		if (!tcp_any_retrans_done(sk))
-			tp->retrans_stamp = 0;
+	if (tcp_is_non_sack_preventing_reopen(sk))
 		return true;
-	}
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tp->is_sack_reneg = 0;
 	return false;
@@ -2578,6 +2587,8 @@  static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo)
 			NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPSPURIOUSRTOS);
 		inet_csk(sk)->icsk_retransmits = 0;
+		if (tcp_is_non_sack_preventing_reopen(sk))
+			return true;
 		if (frto_undo || tcp_is_sack(tp)) {
 			tcp_set_ca_state(sk, TCP_CA_Open);
 			tp->is_sack_reneg = 0;