diff mbox series

[net,1/1] tcp: check skb is non-NULL in tcp_rto_delta_us()

Message ID 20240823033444.1257321-2-johunt@akamai.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series NULL ptr dereference in tcp_rearm_rto | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 23 this patch: 23
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 28 this patch: 28
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 927 this patch: 927
netdev/checkpatch warning CHECK: Alignment should match open parenthesis WARNING: else is not generally useful after a break or return WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-25--21-00 (tests: 714)

Commit Message

Josh Hunt Aug. 23, 2024, 3:34 a.m. UTC
There have been multiple occassions where we have crashed in this path
because packets_out suggested there were packets on the write or retransmit
queues, but in fact there weren't leading to a NULL skb being dereferenced.
While we should fix that root cause we should also just make sure the skb
is not NULL before dereferencing it. Also add a warn once here to capture
some information if/when the problem case is hit again.

Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 include/net/tcp.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Aug. 23, 2024, 6:55 a.m. UTC | #1
On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote:
>
> There have been multiple occassions where we have crashed in this path
> because packets_out suggested there were packets on the write or retransmit
> queues, but in fact there weren't leading to a NULL skb being dereferenced.
> While we should fix that root cause we should also just make sure the skb
> is not NULL before dereferencing it. Also add a warn once here to capture
> some information if/when the problem case is hit again.
>
> Signed-off-by: Josh Hunt <johunt@akamai.com>

Hi Josh

We do not want a patch series of one patch, with the stack trace in
the cover letter.
Please send a standalone patch, with all the information in its changelog.

1) Add Neal Cardwell in the CC list.

2) Are you using TCP_REPAIR by any chance ?

3) Please double check your kernel has these fixes.

commit 1f85e6267caca44b30c54711652b0726fadbb131    tcp: do not send
empty skb from tcp_write_xmit()
commit 0c175da7b0378445f5ef53904247cfbfb87e0b78     tcp: prohibit
TCP_REPAIR_OPTIONS if data was already sent


Thanks.
Josh Hunt Aug. 23, 2024, 7:43 p.m. UTC | #2
On 8/22/24 11:55 PM, Eric Dumazet wrote:
> !-------------------------------------------------------------------|
>    This Message Is From an External Sender
>    This message came from outside your organization.
> |-------------------------------------------------------------------!
> 
> On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote:
>>
>> There have been multiple occassions where we have crashed in this path
>> because packets_out suggested there were packets on the write or retransmit
>> queues, but in fact there weren't leading to a NULL skb being dereferenced.
>> While we should fix that root cause we should also just make sure the skb
>> is not NULL before dereferencing it. Also add a warn once here to capture
>> some information if/when the problem case is hit again.
>>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
> 
> Hi Josh
> 
> We do not want a patch series of one patch, with the stack trace in
> the cover letter.
> Please send a standalone patch, with all the information in its changelog.
> 
> 1) Add Neal Cardwell in the CC list.
> 
> 2) Are you using TCP_REPAIR by any chance ?
> 
> 3) Please double check your kernel has these fixes.
> 
> commit 1f85e6267caca44b30c54711652b0726fadbb131    tcp: do not send
> empty skb from tcp_write_xmit()
> commit 0c175da7b0378445f5ef53904247cfbfb87e0b78     tcp: prohibit
> TCP_REPAIR_OPTIONS if data was already sent
> 

Thanks Eric. I will resend and also check the commits you mentioned. I 
didn't include the writeup in the patch submission b/c it was rather 
long and detailed, but will include it in a v2.

Josh
Jakub Kicinski Aug. 26, 2024, 8:32 p.m. UTC | #3
On Fri, 23 Aug 2024 12:43:36 -0700 Josh Hunt wrote:
> Thanks Eric. I will resend and also check the commits you mentioned. I 
> didn't include the writeup in the patch submission b/c it was rather 
> long and detailed, but will include it in a v2.

FWIW in linux networking we use the cover letter as the merge message,
no matter how many patches there are. So both end up in git logs.
Josh Hunt Sept. 6, 2024, 11:16 p.m. UTC | #4
On 8/22/24 11:55 PM, Eric Dumazet wrote:
> On Fri, Aug 23, 2024 at 5:34 AM Josh Hunt <johunt@akamai.com> wrote:
>>
>> There have been multiple occassions where we have crashed in this path
>> because packets_out suggested there were packets on the write or retransmit
>> queues, but in fact there weren't leading to a NULL skb being dereferenced.
>> While we should fix that root cause we should also just make sure the skb
>> is not NULL before dereferencing it. Also add a warn once here to capture
>> some information if/when the problem case is hit again.
>>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
> 
> Hi Josh
> 
> We do not want a patch series of one patch, with the stack trace in
> the cover letter.
> Please send a standalone patch, with all the information in its changelog.
> 
> 1) Add Neal Cardwell in the CC list.

Sending v2 now with Neal included.

> 
> 2) Are you using TCP_REPAIR by any chance ?
> 

No, we're not using TCP_REPAIR on these machines.

> 3) Please double check your kernel has these fixes.
> 
> commit 1f85e6267caca44b30c54711652b0726fadbb131    tcp: do not send
> empty skb from tcp_write_xmit()
> commit 0c175da7b0378445f5ef53904247cfbfb87e0b78     tcp: prohibit
> TCP_REPAIR_OPTIONS if data was already sent
> 

We have the first commit, but not the second.

Josh
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aac11e7e1cc..19ea6ed87880 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2433,10 +2433,19 @@  void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
 static inline s64 tcp_rto_delta_us(const struct sock *sk)
 {
 	const struct sk_buff *skb = tcp_rtx_queue_head(sk);
-	u32 rto = inet_csk(sk)->icsk_rto;
-	u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
+	u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
+
+	if (likely(skb)) {
+		u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
+
+		return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+	} else {
+		WARN_ONCE(1,
+			"rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
+			tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
+		return rto;
+	}
 
-	return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
 }
 
 /*