Message ID | 20240607125652.1472540-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 36534d3c54537bf098224a32dc31397793d4594d |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() | expand |
On Fri, Jun 7, 2024 at 8:56 AM Eric Dumazet <edumazet@google.com> wrote: > > Due to timer wheel implementation, a timer will usually fire > after its schedule. > > For instance, for HZ=1000, a timeout between 512ms and 4s > has a granularity of 64ms. > For this range of values, the extra delay could be up to 63ms. > > For TCP, this means that tp->rcv_tstamp may be after > inet_csk(sk)->icsk_timeout whenever the timer interrupt > finally triggers, if one packet came during the extra delay. > > We need to make sure tcp_rtx_probe0_timed_out() handles this case. > > Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Menglong Dong <imagedong@tencent.com> > --- > net/ipv4/tcp_timer.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 83fe7f62f7f10ab111512a3ef15a97a04c79cb4a..5bfd76a31af6da6473d306d95c296180141f54e0 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -485,8 +485,12 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, > { > const struct tcp_sock *tp = tcp_sk(sk); > const int timeout = TCP_RTO_MAX * 2; > - u32 rcv_delta; > + s32 rcv_delta; > > + /* Note: timer interrupt might have been delayed by at least one jiffy, > + * and tp->rcv_tstamp might very well have been written recently. > + * rcv_delta can thus be negative. > + */ > rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; > if (rcv_delta <= timeout) > return false; Nice catch! Is this a sufficient fix? The icsk_timeout field is unsigned long and rcv_tstamp is u32. So on 64-bit architectures icsk_timeout is u64 and rcv_tstamp is u32. AFAICT it is not safe to subtract a u32 jiffies timestamp from a u64 jiffies timestamp and expect to get an answer we can use in this simple way (at least in general, after a few weeks of uptime when the u32 jiffies value has wrapped and the u64 value has not). I wonder if we also need something like this for a complete fix: - rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + rcv_delta = (u32)inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; thanks, neal
On Fri, Jun 7, 2024 at 5:12 PM Neal Cardwell <ncardwell@google.com> wrote: > > On Fri, Jun 7, 2024 at 8:56 AM Eric Dumazet <edumazet@google.com> wrote: > > > > Due to timer wheel implementation, a timer will usually fire > > after its schedule. > > > > For instance, for HZ=1000, a timeout between 512ms and 4s > > has a granularity of 64ms. > > For this range of values, the extra delay could be up to 63ms. > > > > For TCP, this means that tp->rcv_tstamp may be after > > inet_csk(sk)->icsk_timeout whenever the timer interrupt > > finally triggers, if one packet came during the extra delay. > > > > We need to make sure tcp_rtx_probe0_timed_out() handles this case. > > > > Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Menglong Dong <imagedong@tencent.com> > > --- > > net/ipv4/tcp_timer.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > > index 83fe7f62f7f10ab111512a3ef15a97a04c79cb4a..5bfd76a31af6da6473d306d95c296180141f54e0 100644 > > --- a/net/ipv4/tcp_timer.c > > +++ b/net/ipv4/tcp_timer.c > > @@ -485,8 +485,12 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, > > { > > const struct tcp_sock *tp = tcp_sk(sk); > > const int timeout = TCP_RTO_MAX * 2; > > - u32 rcv_delta; > > + s32 rcv_delta; > > > > + /* Note: timer interrupt might have been delayed by at least one jiffy, > > + * and tp->rcv_tstamp might very well have been written recently. > > + * rcv_delta can thus be negative. > > + */ > > rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; > > if (rcv_delta <= timeout) > > return false; > > Nice catch! > > Is this a sufficient fix? The icsk_timeout field is unsigned long and > rcv_tstamp is u32. So on 64-bit architectures icsk_timeout is u64 and > rcv_tstamp is u32. AFAICT it is not safe to subtract a u32 jiffies > timestamp from a u64 jiffies timestamp and expect to get an answer we > can use in this simple way (at least in general, after a few weeks of > uptime when the u32 jiffies value has wrapped and the u64 value has > not). > > I wonder if we also need something like this for a complete fix: > > - rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; > + rcv_delta = (u32)inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; We do this kind of (unsigned long) -> u32 conversions in many other places, and they work if we store the result in a 32bit field (we do not care of upper 32bits on 64bit kernels) rcv_delta is s32, it does not matter if we cast (u32) to inet_csk(sk)->icsk_timeout Compiler generates the same code (and result) : mov 0x430(%rdi),%eax // low 32bit of inet_csk(sk)->icsk_timeout sub 0x5b4(%rdi),%eax // tp->rcv_tstamp cmp $0x3a980,%eax jle aa8 <tcp_rtx_probe0_timed_out+0x38> Thanks !
On Fri, Jun 7, 2024 at 8:56 AM Eric Dumazet <edumazet@google.com> wrote: > > Due to timer wheel implementation, a timer will usually fire > after its schedule. > > For instance, for HZ=1000, a timeout between 512ms and 4s > has a granularity of 64ms. > For this range of values, the extra delay could be up to 63ms. > > For TCP, this means that tp->rcv_tstamp may be after > inet_csk(sk)->icsk_timeout whenever the timer interrupt > finally triggers, if one packet came during the extra delay. > > We need to make sure tcp_rtx_probe0_timed_out() handles this case. > > Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Menglong Dong <imagedong@tencent.com> > --- OK, makes sense. Thanks, Eric! Acked-by: Neal Cardwell <ncardwell@google.com> neal
On Fri, Jun 7, 2024 at 8:58 PM Eric Dumazet <edumazet@google.com> wrote: > > Due to timer wheel implementation, a timer will usually fire > after its schedule. > > For instance, for HZ=1000, a timeout between 512ms and 4s > has a granularity of 64ms. > For this range of values, the extra delay could be up to 63ms. > > For TCP, this means that tp->rcv_tstamp may be after > inet_csk(sk)->icsk_timeout whenever the timer interrupt > finally triggers, if one packet came during the extra delay. > > We need to make sure tcp_rtx_probe0_timed_out() handles this case. > > Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Thanks!
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Fri, 7 Jun 2024 12:56:52 +0000 you wrote: > Due to timer wheel implementation, a timer will usually fire > after its schedule. > > For instance, for HZ=1000, a timeout between 512ms and 4s > has a granularity of 64ms. > For this range of values, the extra delay could be up to 63ms. > > [...] Here is the summary with links: - [net] tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() https://git.kernel.org/netdev/net/c/36534d3c5453 You are awesome, thank you!
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 83fe7f62f7f10ab111512a3ef15a97a04c79cb4a..5bfd76a31af6da6473d306d95c296180141f54e0 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -485,8 +485,12 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, { const struct tcp_sock *tp = tcp_sk(sk); const int timeout = TCP_RTO_MAX * 2; - u32 rcv_delta; + s32 rcv_delta; + /* Note: timer interrupt might have been delayed by at least one jiffy, + * and tp->rcv_tstamp might very well have been written recently. + * rcv_delta can thus be negative. + */ rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
Due to timer wheel implementation, a timer will usually fire after its schedule. For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms. For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay. We need to make sure tcp_rtx_probe0_timed_out() handles this case. Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Menglong Dong <imagedong@tencent.com> --- net/ipv4/tcp_timer.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)