[net,2/3] tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe

From: Neal Cardwell <ncardwell@google.com>

From: Neal Cardwell <ncardwell@google.com>

Fix tcp_enter_recovery() so that if there are no retransmits out then
we zero retrans_stamp when entering fast recovery. This is necessary
to fix two buggy behaviors.

Currently a non-zero retrans_stamp value can persist across multiple
back-to-back loss recovery episodes. This is because we generally only
clears retrans_stamp if we are completely done with loss recoveries,
and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
behavior causes two bugs:

(1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
immediately by a new CA_Recovery, the retrans_stamp value can persist
and can be a time before this new CA_Recovery episode starts. That
means that timestamp-based undo will be using the wrong retrans_stamp
(a value that is too old) when comparing incoming TS ecr values to
retrans_stamp to see if the current fast recovery episode can be
undone.

(2) If there is a roughly minutes-long sequence of back-to-back fast
recovery episodes, one after another (e.g. in a shallow-buffered or
policed bottleneck), where each fast recovery successfully makes
forward progress and recovers one window of sequence space (but leaves
at least one retransmit in flight at the end of the recovery),
followed by several RTOs, then the ETIMEDOUT check may be using the
wrong retrans_stamp (a value set at the start of the first fast
recovery in the sequence). This can cause a very premature ETIMEDOUT,
killing the connection prematurely.

This commit changes the code to zero retrans_stamp when entering fast
recovery, when this is known to be safe (no retransmits are out in the
network). That ensures that when starting a fast recovery episode, and
it is safe to do so, retrans_stamp is set when we send the fast
retransmit packet. That addresses both bug (1) and bug (2) by ensuring
that (if no retransmits are out when we start a fast recovery) we use
the initial fast retransmit of this fast recovery as the time value
for undo and ETIMEDOUT calculations.

This makes intuitive sense, since the start of a new fast recovery
episode (in a scenario where no lost packets are out in the network)
means that the connection has made forward progress since the last RTO
or fast recovery, and we should thus "restart the clock" used for both
undo and ETIMEDOUT logic.

Note that if when we start fast recovery there *are* retransmits out
in the network, there can still be undesirable (1)/(2) issues. For
example, after this patch we can still have the (1) and (2) problems
in cases like this:

+ round 1: sender sends flight 1

+ round 2: sender receives SACKs and enters fast recovery 1,
  retransmits some packets in flight 1 and then sends some new data as
  flight 2

+ round 3: sender receives some SACKs for flight 2, notes losses, and
  retransmits some packets to fill the holes in flight 2

+ fast recovery has some lost retransmits in flight 1 and continues
  for one or more rounds sending retransmits for flight 1 and flight 2

+ fast recovery 1 completes when snd_una reaches high_seq at end of
  flight 1

+ there are still holes in the SACK scoreboard in flight 2, so we
  enter fast recovery 2, but some retransmits in the flight 2 sequence
  range are still in flight (retrans_out > 0), so we can't execute the
  new retrans_stamp=0 added here to clear retrans_stamp

It's not yet clear how to fix these remaining (1)/(2) issues in an
efficient way without breaking undo behavior, given that retrans_stamp
is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
expensive) strategy would be to set retrans_stamp to the timestamp of
the earliest outstanding retransmit when entering fast recovery. But
at least this commit makes things better.

Note that this does not change the semantics of retrans_stamp; it
simply makes retrans_stamp accurate in some cases where it was not
before:

(1) Some loss recovery, followed by an immediate entry into a fast
recovery, where there are no retransmits out when entering the fast
recovery.

(2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
and then the ACK that completes the 3-way handshake has SACK blocks
that trigger a fast recovery. In this case when entering fast recovery
we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
and set the retrans_stamp based on the timestamp of the fast recovery.

We introduce a tcp_retrans_stamp_cleanup() helper, because this
two-line sequence already appears in 3 places and is about to appear
in 2 more as a result of this bug fix patch series. Once this bug fix
patches series in the net branch makes it into the net-next branch
we'll update the 3 other call sites to use the new helper.

This is a long-standing issue. The Fixes tag below is chosen to be the
oldest commit at which the patch will apply cleanly, which is from
Linux v3.5 in 2012.

Fixes: 1fbc340514fc ("tcp: early retransmit: tcp_enter_recovery()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Message ID	20241001200517.2756803-3-ncardwell.sw@gmail.com (mailing list archive)
State	Accepted
Commit	b41b4cbd9655bcebcce941bef3601db8110335be
Delegated to:	Netdev Maintainers
Headers	show Received: from mail-vs1-f44.google.com (mail-vs1-f44.google.com [209.85.217.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 922981CDFD9 for <netdev@vger.kernel.org>; Tue, 1 Oct 2024 20:05:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727813129; cv=none; b=Qzzn2Ac5kLMoM/Z9UIuuEDyqW3dM4I8v6Qu2+dQ+lTqWUf23N55I98hHI+UN/q2YCSQaFqsed1thcMfehfRqiFukCMBDR16wPO3QEryNapDNao9w7AbYbosWdZflLbNXZM+nSUynZRHGI6LUA0XLmoWyQP6bm9KYd06eoZzAJm0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727813129; c=relaxed/simple; bh=54uoi+2cFLXhqmfwK+PlsKWY6RtmgqNpGxTh5EzmW2M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zto4QmDdp6ftPXhIeHx1Cr7+PVK2FAOI//WUDBLzVPVUHMmawJoWWDJDPtjA9LWrn+UFSx9UHAx/DIW/fqcJ2mjRaZjx9fb/JSVOdHnizIg1v4PT7vAe5NDlKRB6djKhdpWH5uHaehsCs/PUlU7e22QbjDAeEfYCDlwhTV9JLjg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=KdkXJ+/H; arc=none smtp.client-ip=209.85.217.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="KdkXJ+/H" Received: by mail-vs1-f44.google.com with SMTP id ada2fe7eead31-4a3b5547b5dso186124137.1 for <netdev@vger.kernel.org>; Tue, 01 Oct 2024 13:05:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727813126; x=1728417926; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wmxk7j+3byLItJyf/Z7Lt+wp1NJBL8MowUBgAdHx/dQ=; b=KdkXJ+/HuQXRPjpol0Pz9VLz+KNNuq6t2yS+CcoU77YZ4iCsEQk5a8WfqlNVLHk3QV 2OmGwVukbwsxKMoyhiI484e3F+KlBWIgDdbxg7AFbqZxn6FFJuMLphOU1bKSLlx+pJnd idsShvzQTYsmEfCsWeiUyBu0SQfx9o5YQj4uKvOrkyUR88TvDQcKBF1E1IMLQmwgsujk nWB9ZIJIfQ4BbFAPl0BYmDY9kj0+wIrrUJOIxhxUF3wDrg7Q/MawZCE8LEW/PMQ2iQhd iMl31Btsghun2Cgc7v+axQQ9zH3TU6jasSMofgzRWMx5xV7tEdTT7v1ojLgi8rhLV+1B 78wA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727813126; x=1728417926; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wmxk7j+3byLItJyf/Z7Lt+wp1NJBL8MowUBgAdHx/dQ=; b=hoea4wf7G2600APZz5ZfPTA/9PERYVu24UAg4V6M87BIFEFuNfoL6vcA1oTZmhlxaX fE0TLeHXk/mlZH2uGVmbq3l7YPcLUwvfKBkaDkIJDydKUGHcKV3b5MhPIjxepPjmm9Ps 5z+/vu9AIFnogdgsVY5eRFlOrnYjb6Fyl4IbqywxuqEUprtb1j86bb4nXzKrF365L+fC qghXFDgDiJGtPLyPjNIq2mGOb/8GZOiK9SCdTNo335JyZ6I7Oh+8G19ox1L8dTVLeLeZ rpzLOxuakqY+psrVfJdsRCfOZkhnRvo/njquA9YNE8KB9m5HZ+/HUu3IXxZW8ThgqA57 uP1A== X-Gm-Message-State: AOJu0YyiRSliN3S5tY2Za2UyNgdLC4NGvSxrdffD3bmtysYg0ixcsKVz ruu+fTSubNSvOva6IKC1aVg0cZmlxXfmeQXYFG/6FEPpI+KkKNCN X-Google-Smtp-Source: AGHT+IH40pxfJ2N8usqwZtlYgkwJMpPNBd11Tfp36vzE5tVzxFyEAIDb1fbHy6bP3JoixMWMTAtfQA== X-Received: by 2002:a05:6102:5121:b0:493:cc30:35f3 with SMTP id ada2fe7eead31-4a3e66180c2mr534776137.0.1727813126385; Tue, 01 Oct 2024 13:05:26 -0700 (PDT) Received: from ahoy.c.googlers.com.com (182.71.196.35.bc.googleusercontent.com. [35.196.71.182]) by smtp.gmail.com with ESMTPSA id a1e0cc1a2514c-84eb504217bsm1237447241.14.2024.10.01.13.05.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2024 13:05:25 -0700 (PDT) From: Neal Cardwell <ncardwell.sw@gmail.com> To: David Miller <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Eric Dumazet <edumazet@google.com> Cc: netdev@vger.kernel.org, Neal Cardwell <ncardwell@google.com>, Yuchung Cheng <ycheng@google.com> Subject: [PATCH net 2/3] tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe Date: Tue, 1 Oct 2024 20:05:16 +0000 Message-ID: <20241001200517.2756803-3-ncardwell.sw@gmail.com> X-Mailer: git-send-email 2.46.1.824.gd892dcdcdd-goog In-Reply-To: <20241001200517.2756803-1-ncardwell.sw@gmail.com> References: <20241001200517.2756803-1-ncardwell.sw@gmail.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: <netdev.vger.kernel.org> List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Delegate: kuba@kernel.org
Series	tcp: 3 fixes for retrans_stamp and undo logic \| expand [net,0/3] tcp: 3 fixes for retrans_stamp and undo logic [net,1/3] tcp: fix to allow timestamp undo if no retransmits were sent [net,2/3] tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe [net,3/3] tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits out

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for net
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 9 this patch: 9
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	warning	2 maintainers not CCed: pabeni@redhat.com dsahern@kernel.org
netdev/build_clang	success	Errors and warnings before: 9 this patch: 9
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	fail	Errors and warnings before: 12 this patch: 12
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 25 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 1 this patch: 1
netdev/source_inline	success	Was 0 now: 0

[net,2/3] tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe

Checks

Commit Message

Patch