Message ID | 20210220005447.GA93678@localhost.localdomain (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: fix keepalive when data remain undelivered | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | fail | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 43 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Fri, Feb 19, 2021 at 4:54 PM Enke Chen <enkechen2020@gmail.com> wrote: > > From: Enke Chen <enchen@paloaltonetworks.com> > > TCP keepalive does not timeout under the condition that network connection > is lost and data remain undelivered (incl. retransmit). A very simple > scenarios of the failure is to write data to a tcp socket after the network > connection is lost. AFAIK current Linux TCP KA implementation does not violate the RFC793bis (Section 3.7.4 Keep-Alives) https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-20#section-3.7.4 We have even raised that in IETF tcpm list to get more clarity https://mailarchive.ietf.org/arch/msg/tcpm/KxcEsLtDuDNhcP8UjbyPkJqpzsE/ I believe you interpret the keep-alive differently -- so this is arguably a subjective "bug-fix". As Neal and I have expressed in our private communications, current Linux KA has been implemented for more than a decade. Changing this behavior may break many existing applications even if it may benefit certain. > > Under the specified condition the keepalive timeout is not evaluated in > the keepalive timer. That is the primary cause of the failure. In addition, > the keepalive probe is not sent out in the keepalive timer. Although packet > retransmit or 0-window probe can serve a similar purpose, they have their > own timers and backoffs that are generally not aligned with the keepalive > parameters for probes and timeout. > > As the timing and conditions of the events involved are random, the tcp > keepalive can fail randomly. Given the randomness of the failures, fixing > the issue would not cause any backward compatibility issues. As was well > said, "Determinism is a special case of randomness". > > The fix in this patch consists of the following: > > a. Always evaluate the keepalive timeout in the keepalive timer. > > b. Always send out the keepalive probe in the keepalive timer (post the > keepalive idle time). Given that the keepalive intervals are usually > in the range of 30 - 60 seconds, there is no need for an optimization > to further reduce the number of keepalive probes in the presence of > packet retransmit. > > c. Use the elapsed time (instead of the 0-window probe counter) in > evaluating tcp keepalive timeout. > > Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful > discussions about the issue and options for fixing it. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build") > Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> > --- > net/ipv4/tcp_timer.c | 20 ++++++-------------- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 4ef08079ccfa..16a044da20db 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t) > ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) > goto out; > > - elapsed = keepalive_time_when(tp); > - > - /* It is alive without keepalive 8) */ > - if (tp->packets_out || !tcp_write_queue_empty(sk)) > - goto resched; > - > elapsed = keepalive_time_elapsed(tp); > > if (elapsed >= keepalive_time_when(tp)) { > /* If the TCP_USER_TIMEOUT option is enabled, use that > * to determine when to timeout instead. > */ > - if ((icsk->icsk_user_timeout != 0 && > - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && > - icsk->icsk_probes_out > 0) || > - (icsk->icsk_user_timeout == 0 && > - icsk->icsk_probes_out >= keepalive_probes(tp))) { > + u32 timeout = icsk->icsk_user_timeout ? > + msecs_to_jiffies(icsk->icsk_user_timeout) : > + keepalive_intvl_when(tp) * keepalive_probes(tp) + > + keepalive_time_when(tp); > + > + if (elapsed >= timeout) { > tcp_send_active_reset(sk, GFP_ATOMIC); > tcp_write_err(sk); > goto out; > } > if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { > - icsk->icsk_probes_out++; > elapsed = keepalive_intvl_when(tp); > } else { > /* If keepalive was lost due to local congestion, > @@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > } > > sk_mem_reclaim(sk); > - > -resched: > inet_csk_reset_keepalive_timer (sk, elapsed); > goto out; > > -- > 2.29.2 >
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4ef08079ccfa..16a044da20db 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t) ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) goto out; - elapsed = keepalive_time_when(tp); - - /* It is alive without keepalive 8) */ - if (tp->packets_out || !tcp_write_queue_empty(sk)) - goto resched; - elapsed = keepalive_time_elapsed(tp); if (elapsed >= keepalive_time_when(tp)) { /* If the TCP_USER_TIMEOUT option is enabled, use that * to determine when to timeout instead. */ - if ((icsk->icsk_user_timeout != 0 && - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && - icsk->icsk_probes_out > 0) || - (icsk->icsk_user_timeout == 0 && - icsk->icsk_probes_out >= keepalive_probes(tp))) { + u32 timeout = icsk->icsk_user_timeout ? + msecs_to_jiffies(icsk->icsk_user_timeout) : + keepalive_intvl_when(tp) * keepalive_probes(tp) + + keepalive_time_when(tp); + + if (elapsed >= timeout) { tcp_send_active_reset(sk, GFP_ATOMIC); tcp_write_err(sk); goto out; } if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { - icsk->icsk_probes_out++; elapsed = keepalive_intvl_when(tp); } else { /* If keepalive was lost due to local congestion, @@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t) } sk_mem_reclaim(sk); - -resched: inet_csk_reset_keepalive_timer (sk, elapsed); goto out;