diff mbox series

[net] tcp: fix indefinite deferral of RTO with SACK reneging

Message ID 20221021170821.1093930-1-ncardwell.kernel@gmail.com (mailing list archive)
State Accepted
Commit 3d2af9cce3133b3bc596a9d065c6f9d93419ccfb
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: fix indefinite deferral of RTO with SACK reneging | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers warning 3 maintainers not CCed: dsahern@kernel.org yoshfuji@linux-ipv6.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Neal Cardwell Oct. 21, 2022, 5:08 p.m. UTC
From: Neal Cardwell <ncardwell@google.com>

This commit fixes a bug that can cause a TCP data sender to repeatedly
defer RTOs when encountering SACK reneging.

The bug is that when we're in fast recovery in a scenario with SACK
reneging, every time we get an ACK we call tcp_check_sack_reneging()
and it can note the apparent SACK reneging and rearm the RTO timer for
srtt/2 into the future. In some SACK reneging scenarios that can
happen repeatedly until the receive window fills up, at which point
the sender can't send any more, the ACKs stop arriving, and the RTO
fires at srtt/2 after the last ACK. But that can take far too long
(O(10 secs)), since the connection is stuck in fast recovery with a
low cwnd that cannot grow beyond ssthresh, even if more bandwidth is
available.

This fix changes the logic in tcp_check_sack_reneging() to only rearm
the RTO timer if data is cumulatively ACKed, indicating forward
progress. This avoids this kind of nearly infinite loop of RTO timer
re-arming. In addition, this meets the goals of
tcp_check_sack_reneging() in handling Windows TCP behavior that looks
temporarily like SACK reneging but is not really.

Many thanks to Jakub Kicinski and Neil Spring, who reported this issue
and provided critical packet traces that enabled root-causing this
issue. Also, many thanks to Jakub Kicinski for testing this fix.

Fixes: 5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK reneging")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Neil Spring <ntspring@fb.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

patchwork-bot+netdevbpf@kernel.org Oct. 24, 2022, 6:20 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 21 Oct 2022 17:08:21 +0000 you wrote:
> From: Neal Cardwell <ncardwell@google.com>
> 
> This commit fixes a bug that can cause a TCP data sender to repeatedly
> defer RTOs when encountering SACK reneging.
> 
> The bug is that when we're in fast recovery in a scenario with SACK
> reneging, every time we get an ACK we call tcp_check_sack_reneging()
> and it can note the apparent SACK reneging and rearm the RTO timer for
> srtt/2 into the future. In some SACK reneging scenarios that can
> happen repeatedly until the receive window fills up, at which point
> the sender can't send any more, the ACKs stop arriving, and the RTO
> fires at srtt/2 after the last ACK. But that can take far too long
> (O(10 secs)), since the connection is stuck in fast recovery with a
> low cwnd that cannot grow beyond ssthresh, even if more bandwidth is
> available.
> 
> [...]

Here is the summary with links:
  - [net] tcp: fix indefinite deferral of RTO with SACK reneging
    https://git.kernel.org/netdev/net/c/3d2af9cce313

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bc2ea12221f95..0640453fce54b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2192,7 +2192,8 @@  void tcp_enter_loss(struct sock *sk)
  */
 static bool tcp_check_sack_reneging(struct sock *sk, int flag)
 {
-	if (flag & FLAG_SACK_RENEGING) {
+	if (flag & FLAG_SACK_RENEGING &&
+	    flag & FLAG_SND_UNA_ADVANCED) {
 		struct tcp_sock *tp = tcp_sk(sk);
 		unsigned long delay = max(usecs_to_jiffies(tp->srtt_us >> 4),
 					  msecs_to_jiffies(10));