Message ID | 20210112192544.GA12209@localhost.localdomain (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: keepalive fixes | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 39 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/header_inline | success | Link |
netdev/stable | fail | Stable CC detected: Cc: stable@vger.kernel.org |
On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote: > > From: Enke Chen <enchen@paloaltonetworks.com> > > In this patch two issues with TCP keepalives are fixed: > > 1) TCP keepalive does not timeout when there are data waiting to be > delivered and then the connection got broken. The TCP keepalive > timeout is not evaluated in that condition. hi enke Do you have an example to demonstrate this issue -- in theory when there is data inflight, an RTO timer should be pending (which considers user-timeout setting). based on the user-timeout description (man tcp), the user timeout should abort the socket per the specified time after data commences. some data would help to understand the issue. > > The fix is to remove the code that prevents TCP keepalive from > being evaluated for timeout. > > 2) With the fix for #1, TCP keepalive can erroneously timeout after > the 0-window probe kicks in. The 0-window probe counter is wrongly > applied to TCP keepalives. > > The fix is to use the elapsed time instead of the 0-window probe > counter in evaluating TCP keepalive timeout. > > Cc: stable@vger.kernel.org > Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> > --- > net/ipv4/tcp_timer.c | 15 +++------------ > 1 file changed, 3 insertions(+), 12 deletions(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 6c62b9ea1320..40953aa40d53 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -696,12 +696,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) > goto out; > > - elapsed = keepalive_time_when(tp); > - > - /* It is alive without keepalive 8) */ > - if (tp->packets_out || !tcp_write_queue_empty(sk)) > - goto resched; > - > elapsed = keepalive_time_elapsed(tp); > > if (elapsed >= keepalive_time_when(tp)) { > @@ -709,16 +703,15 @@ static void tcp_keepalive_timer (struct timer_list *t) > * to determine when to timeout instead. > */ > if ((icsk->icsk_user_timeout != 0 && > - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && > - icsk->icsk_probes_out > 0) || > + elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout)) || > (icsk->icsk_user_timeout == 0 && > - icsk->icsk_probes_out >= keepalive_probes(tp))) { > + (elapsed >= keepalive_time_when(tp) + > + keepalive_intvl_when(tp) * keepalive_probes(tp)))) { > tcp_send_active_reset(sk, GFP_ATOMIC); > tcp_write_err(sk); > goto out; > } > if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { > - icsk->icsk_probes_out++; > elapsed = keepalive_intvl_when(tp); > } else { > /* If keepalive was lost due to local congestion, > @@ -732,8 +725,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > } > > sk_mem_reclaim(sk); > - > -resched: > inet_csk_reset_keepalive_timer (sk, elapsed); > goto out; > > -- > 2.29.2 >
On Tue, Jan 12, 2021 at 11:48 PM Yuchung Cheng <ycheng@google.com> wrote: > > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote: > > > > From: Enke Chen <enchen@paloaltonetworks.com> > > > > In this patch two issues with TCP keepalives are fixed: > > > > 1) TCP keepalive does not timeout when there are data waiting to be > > delivered and then the connection got broken. The TCP keepalive > > timeout is not evaluated in that condition. > hi enke > Do you have an example to demonstrate this issue -- in theory when > there is data inflight, an RTO timer should be pending (which > considers user-timeout setting). based on the user-timeout description > (man tcp), the user timeout should abort the socket per the specified > time after data commences. some data would help to understand the > issue. > +1 A packetdrill test would be ideal. Also, given that there is this ongoing issue with TCP_USER_TIMEOUT, lets not mix things or risk added work for backports to stable versions.
Hi, Yuchung: I have attached the python script that reproduces the keepalive issues. The script is a slight modification of the one written by Marek Majkowski: https://github.com/cloudflare/cloudflare-blog/blob/master/2019-09-tcp-keepalives/test-zero.py Please note that only the TCP keepalive is configured, and not the user timeout. Thanks. -- Enke On Tue, Jan 12, 2021 at 02:48:01PM -0800, Yuchung Cheng wrote: > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote: > > > > From: Enke Chen <enchen@paloaltonetworks.com> > > > > In this patch two issues with TCP keepalives are fixed: > > > > 1) TCP keepalive does not timeout when there are data waiting to be > > delivered and then the connection got broken. The TCP keepalive > > timeout is not evaluated in that condition. > hi enke > Do you have an example to demonstrate this issue -- in theory when > there is data inflight, an RTO timer should be pending (which > considers user-timeout setting). based on the user-timeout description > (man tcp), the user timeout should abort the socket per the specified > time after data commences. some data would help to understand the > issue. > ------ #! /usr/bin/python import io import os import select import socket import time import utils import ctypes utils.new_ns() port = 1 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) s.bind(('127.0.0.1', port)) s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) s.listen(16) tcpdump = utils.tcpdump_start(port) c = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) c.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) c.connect(('127.0.0.1', port)) x, _ = s.accept() if False: c.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 90*1000) if True: c.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5) c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10) c.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10) time.sleep(0.2) print("[ ] c.send()") import fcntl TIOCOUTQ=0x5411 c.setblocking(False) while True: bytes_avail = ctypes.c_int() fcntl.ioctl(c.fileno(), TIOCOUTQ, bytes_avail) if bytes_avail.value > 64*1024: break try: c.send(b"A" * 16384 * 4) except io.BlockingIOError: break c.setblocking(True) time.sleep(0.2) utils.ss(port) utils.check_buffer(c) t0 = time.time() if True: utils.drop_start(dport=port) utils.drop_start(sport=port) poll = select.poll() poll.register(c, select.POLLIN) poll.poll() utils.ss(port) e = c.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) print("[ ] SO_ERROR = %s" % (e,)) t1 = time.time() print("[ ] took: %f seconds" % (t1-t0,))
Hi, Eric: Just to clarify: the issues for tcp keepalive and TCP_USER_TIMEOUT are separate isues, and the fixes would not conflict afaik. Thanks. -- Enke On Tue, Jan 12, 2021 at 11:52:43PM +0100, Eric Dumazet wrote: > On Tue, Jan 12, 2021 at 11:48 PM Yuchung Cheng <ycheng@google.com> wrote: > > > > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote: > > > > > > From: Enke Chen <enchen@paloaltonetworks.com> > > > > > > In this patch two issues with TCP keepalives are fixed: > > > > > > 1) TCP keepalive does not timeout when there are data waiting to be > > > delivered and then the connection got broken. The TCP keepalive > > > timeout is not evaluated in that condition. > > hi enke > > Do you have an example to demonstrate this issue -- in theory when > > there is data inflight, an RTO timer should be pending (which > > considers user-timeout setting). based on the user-timeout description > > (man tcp), the user timeout should abort the socket per the specified > > time after data commences. some data would help to understand the > > issue. > > > > +1 > > A packetdrill test would be ideal. > > Also, given that there is this ongoing issue with TCP_USER_TIMEOUT, > lets not mix things > or risk added work for backports to stable versions.
On Wed, Jan 13, 2021 at 12:06:27PM -0800, Enke Chen wrote: > Hi, Eric: > > Just to clarify: the issues for tcp keepalive and TCP_USER_TIMEOUT are > separate isues, and the fixes would not conflict afaik. > > Thanks. -- Enke I have posted patches for both issues, and there is no conflict between the patches. Thanks. -- Enke > > On Tue, Jan 12, 2021 at 11:52:43PM +0100, Eric Dumazet wrote: > > On Tue, Jan 12, 2021 at 11:48 PM Yuchung Cheng <ycheng@google.com> wrote: > > > > > > On Tue, Jan 12, 2021 at 2:31 PM Enke Chen <enkechen2020@gmail.com> wrote: > > > > > > > > From: Enke Chen <enchen@paloaltonetworks.com> > > > > > > > > In this patch two issues with TCP keepalives are fixed: > > > > > > > > 1) TCP keepalive does not timeout when there are data waiting to be > > > > delivered and then the connection got broken. The TCP keepalive > > > > timeout is not evaluated in that condition. > > > hi enke > > > Do you have an example to demonstrate this issue -- in theory when > > > there is data inflight, an RTO timer should be pending (which > > > considers user-timeout setting). based on the user-timeout description > > > (man tcp), the user timeout should abort the socket per the specified > > > time after data commences. some data would help to understand the > > > issue. > > > > > > > +1 > > > > A packetdrill test would be ideal. > > > > Also, given that there is this ongoing issue with TCP_USER_TIMEOUT, > > lets not mix things > > or risk added work for backports to stable versions.
Hi, Folks: Please ignore this patch. I will split it into separate ones as suggested off-list by Neal Cardwell <ncardwell@google.com>. Thanks. -- Enke On Tue, Jan 12, 2021 at 11:25:44AM -0800, Enke Chen wrote: > From: Enke Chen <enchen@paloaltonetworks.com> > > In this patch two issues with TCP keepalives are fixed: > > 1) TCP keepalive does not timeout when there are data waiting to be > delivered and then the connection got broken. The TCP keepalive > timeout is not evaluated in that condition. > > The fix is to remove the code that prevents TCP keepalive from > being evaluated for timeout. > > 2) With the fix for #1, TCP keepalive can erroneously timeout after > the 0-window probe kicks in. The 0-window probe counter is wrongly > applied to TCP keepalives. > > The fix is to use the elapsed time instead of the 0-window probe > counter in evaluating TCP keepalive timeout. > > Cc: stable@vger.kernel.org > Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> > --- > net/ipv4/tcp_timer.c | 15 +++------------ > 1 file changed, 3 insertions(+), 12 deletions(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 6c62b9ea1320..40953aa40d53 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -696,12 +696,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) > goto out; > > - elapsed = keepalive_time_when(tp); > - > - /* It is alive without keepalive 8) */ > - if (tp->packets_out || !tcp_write_queue_empty(sk)) > - goto resched; > - > elapsed = keepalive_time_elapsed(tp); > > if (elapsed >= keepalive_time_when(tp)) { > @@ -709,16 +703,15 @@ static void tcp_keepalive_timer (struct timer_list *t) > * to determine when to timeout instead. > */ > if ((icsk->icsk_user_timeout != 0 && > - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && > - icsk->icsk_probes_out > 0) || > + elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout)) || > (icsk->icsk_user_timeout == 0 && > - icsk->icsk_probes_out >= keepalive_probes(tp))) { > + (elapsed >= keepalive_time_when(tp) + > + keepalive_intvl_when(tp) * keepalive_probes(tp)))) { > tcp_send_active_reset(sk, GFP_ATOMIC); > tcp_write_err(sk); > goto out; > } > if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { > - icsk->icsk_probes_out++; > elapsed = keepalive_intvl_when(tp); > } else { > /* If keepalive was lost due to local congestion, > @@ -732,8 +725,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > } > > sk_mem_reclaim(sk); > - > -resched: > inet_csk_reset_keepalive_timer (sk, elapsed); > goto out; > > -- > 2.29.2 >
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 6c62b9ea1320..40953aa40d53 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -696,12 +696,6 @@ static void tcp_keepalive_timer (struct timer_list *t) ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) goto out; - elapsed = keepalive_time_when(tp); - - /* It is alive without keepalive 8) */ - if (tp->packets_out || !tcp_write_queue_empty(sk)) - goto resched; - elapsed = keepalive_time_elapsed(tp); if (elapsed >= keepalive_time_when(tp)) { @@ -709,16 +703,15 @@ static void tcp_keepalive_timer (struct timer_list *t) * to determine when to timeout instead. */ if ((icsk->icsk_user_timeout != 0 && - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && - icsk->icsk_probes_out > 0) || + elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout)) || (icsk->icsk_user_timeout == 0 && - icsk->icsk_probes_out >= keepalive_probes(tp))) { + (elapsed >= keepalive_time_when(tp) + + keepalive_intvl_when(tp) * keepalive_probes(tp)))) { tcp_send_active_reset(sk, GFP_ATOMIC); tcp_write_err(sk); goto out; } if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { - icsk->icsk_probes_out++; elapsed = keepalive_intvl_when(tp); } else { /* If keepalive was lost due to local congestion, @@ -732,8 +725,6 @@ static void tcp_keepalive_timer (struct timer_list *t) } sk_mem_reclaim(sk); - -resched: inet_csk_reset_keepalive_timer (sk, elapsed); goto out;