Message ID | 20240823033444.1257321-2-johunt@akamai.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | NULL ptr dereference in tcp_rearm_rto | expand |
On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote: > > There have been multiple occassions where we have crashed in this path > because packets_out suggested there were packets on the write or retransmit > queues, but in fact there weren't leading to a NULL skb being dereferenced. > While we should fix that root cause we should also just make sure the skb > is not NULL before dereferencing it. Also add a warn once here to capture > some information if/when the problem case is hit again. > > Signed-off-by: Josh Hunt <johunt@akamai.com> Hi Josh We do not want a patch series of one patch, with the stack trace in the cover letter. Please send a standalone patch, with all the information in its changelog. 1) Add Neal Cardwell in the CC list. 2) Are you using TCP_REPAIR by any chance ? 3) Please double check your kernel has these fixes. commit 1f85e6267caca44b30c54711652b0726fadbb131 tcp: do not send empty skb from tcp_write_xmit() commit 0c175da7b0378445f5ef53904247cfbfb87e0b78 tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent Thanks.
On 8/22/24 11:55 PM, Eric Dumazet wrote: > !-------------------------------------------------------------------| > This Message Is From an External Sender > This message came from outside your organization. > |-------------------------------------------------------------------! > > On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote: >> >> There have been multiple occassions where we have crashed in this path >> because packets_out suggested there were packets on the write or retransmit >> queues, but in fact there weren't leading to a NULL skb being dereferenced. >> While we should fix that root cause we should also just make sure the skb >> is not NULL before dereferencing it. Also add a warn once here to capture >> some information if/when the problem case is hit again. >> >> Signed-off-by: Josh Hunt <johunt@akamai.com> > > Hi Josh > > We do not want a patch series of one patch, with the stack trace in > the cover letter. > Please send a standalone patch, with all the information in its changelog. > > 1) Add Neal Cardwell in the CC list. > > 2) Are you using TCP_REPAIR by any chance ? > > 3) Please double check your kernel has these fixes. > > commit 1f85e6267caca44b30c54711652b0726fadbb131 tcp: do not send > empty skb from tcp_write_xmit() > commit 0c175da7b0378445f5ef53904247cfbfb87e0b78 tcp: prohibit > TCP_REPAIR_OPTIONS if data was already sent > Thanks Eric. I will resend and also check the commits you mentioned. I didn't include the writeup in the patch submission b/c it was rather long and detailed, but will include it in a v2. Josh
On Fri, 23 Aug 2024 12:43:36 -0700 Josh Hunt wrote: > Thanks Eric. I will resend and also check the commits you mentioned. I > didn't include the writeup in the patch submission b/c it was rather > long and detailed, but will include it in a v2. FWIW in linux networking we use the cover letter as the merge message, no matter how many patches there are. So both end up in git logs.
On 8/22/24 11:55 PM, Eric Dumazet wrote: > On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote: >> >> There have been multiple occassions where we have crashed in this path >> because packets_out suggested there were packets on the write or retransmit >> queues, but in fact there weren't leading to a NULL skb being dereferenced. >> While we should fix that root cause we should also just make sure the skb >> is not NULL before dereferencing it. Also add a warn once here to capture >> some information if/when the problem case is hit again. >> >> Signed-off-by: Josh Hunt <johunt@akamai.com> > > Hi Josh > > We do not want a patch series of one patch, with the stack trace in > the cover letter. > Please send a standalone patch, with all the information in its changelog. > > 1) Add Neal Cardwell in the CC list. Sending v2 now with Neal included. > > 2) Are you using TCP_REPAIR by any chance ? > No, we're not using TCP_REPAIR on these machines. > 3) Please double check your kernel has these fixes. > > commit 1f85e6267caca44b30c54711652b0726fadbb131 tcp: do not send > empty skb from tcp_write_xmit() > commit 0c175da7b0378445f5ef53904247cfbfb87e0b78 tcp: prohibit > TCP_REPAIR_OPTIONS if data was already sent > We have the first commit, but not the second. Josh
diff --git a/include/net/tcp.h b/include/net/tcp.h index 2aac11e7e1cc..19ea6ed87880 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2433,10 +2433,19 @@ void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb); static inline s64 tcp_rto_delta_us(const struct sock *sk) { const struct sk_buff *skb = tcp_rtx_queue_head(sk); - u32 rto = inet_csk(sk)->icsk_rto; - u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto); + u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto); + + if (likely(skb)) { + u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto; + + return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp; + } else { + WARN_ONCE(1, + "rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n", + tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state); + return rto; + } - return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp; } /*
There have been multiple occassions where we have crashed in this path because packets_out suggested there were packets on the write or retransmit queues, but in fact there weren't leading to a NULL skb being dereferenced. While we should fix that root cause we should also just make sure the skb is not NULL before dereferencing it. Also add a warn once here to capture some information if/when the problem case is hit again. Signed-off-by: Josh Hunt <johunt@akamai.com> --- include/net/tcp.h | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)