diff mbox series

[net,1/1] tcp: check skb is non-NULL in tcp_rto_delta_us()

Message ID 20240823021333.1252272-2-johunt@akamai.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series NULL ptr dereference in tcp_rearm_rto | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 23 this patch: 23
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 28 this patch: 28
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 927 this patch: 927
netdev/checkpatch warning CHECK: Alignment should match open parenthesis WARNING: else is not generally useful after a break or return WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0

Commit Message

Josh Hunt Aug. 23, 2024, 2:13 a.m. UTC
There have been multiple occassions where we have crashed in this path
because packets_out suggested there were packets on the write or retransmit
queues, but in fact there weren't leading to a NULL skb being dereferenced.
While we should fix that root cause we should also just make sure the skb
is not NULL before dereferencing it. Also add a warn once here to capture
some information if/when the problem case is hit again.

Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 include/net/tcp.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Jason Xing Aug. 23, 2024, 3:27 a.m. UTC | #1
Hello Josh,

On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>
> There have been multiple occassions where we have crashed in this path
> because packets_out suggested there were packets on the write or retransmit
> queues, but in fact there weren't leading to a NULL skb being dereferenced.

Could you show us the detailed splats and more information about it so
that we can know what exactly happened?

Thanks,
Jason
Josh Hunt Aug. 23, 2024, 3:33 a.m. UTC | #2
On 8/22/24 8:27 PM, Jason Xing wrote:
> 
> Hello Josh,
> 
> On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>>
>> There have been multiple occassions where we have crashed in this path
>> because packets_out suggested there were packets on the write or retransmit
>> queues, but in fact there weren't leading to a NULL skb being dereferenced.
> 
> Could you show us the detailed splats and more information about it so
> that we can know what exactly happened?

Hey Jason

Yeah for some reason my cover letter did not come through which has the 
oops info that we hit. I'll resend it now. Fingers crossed it goes 
through this time :)

Josh
Josh Hunt Aug. 23, 2024, 4:21 a.m. UTC | #3
On 8/22/24 8:33 PM, Josh Hunt wrote:
> On 8/22/24 8:27 PM, Jason Xing wrote:
>>
>> Hello Josh,
>>
>> On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>>>
>>> There have been multiple occassions where we have crashed in this path
>>> because packets_out suggested there were packets on the write or 
>>> retransmit
>>> queues, but in fact there weren't leading to a NULL skb being 
>>> dereferenced.
>>
>> Could you show us the detailed splats and more information about it so
>> that we can know what exactly happened?
> 
> Hey Jason
> 
> Yeah for some reason my cover letter did not come through which has the 
> oops info that we hit. I'll resend it now. Fingers crossed it goes 
> through this time :)
> 
> Josh
Seems like our mail server is block the cover letter for some reason 
right now. I'll have to figure out why tomorrow. I filed a bug with 
Ubuntu as well as sending this patch upstream b/c the kernel we're 
running is a stock Ubuntu kernel. The bug report there has most of what 
I put in the cover letter:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657

Josh
Eric Dumazet Aug. 23, 2024, 6:42 a.m. UTC | #4
On Fri, Aug 23, 2024 at 4:14 AM Josh Hunt <johunt@akamai.com> wrote:
>
> There have been multiple occassions where we have crashed in this path
> because packets_out suggested there were packets on the write or retransmit
> queues, but in fact there weren't leading to a NULL skb being dereferenced.
> While we should fix that root cause we should also just make sure the skb
> is not NULL before dereferencing it. Also add a warn once here to capture
> some information if/when the problem case is hit again.
>
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> ---
>  include/net/tcp.h | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2aac11e7e1cc..19ea6ed87880 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2433,10 +2433,19 @@ void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
>  static inline s64 tcp_rto_delta_us(const struct sock *sk)
>  {
>         const struct sk_buff *skb = tcp_rtx_queue_head(sk);
> -       u32 rto = inet_csk(sk)->icsk_rto;
> -       u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
> +       u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
> +
> +       if (likely(skb)) {
> +               u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
> +
> +               return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> +       } else {
> +               WARN_ONCE(1,
> +                       "rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
> +                       tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
> +               return rto;
> +       }
>
> -       return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
>  }
>
>  /*
> --
> 2.34.1
>

Are you using a recent linux kernel version ?

I am asking because sometimes patches are submitted while the authors
are using very old kernels,
and they do not state this clearly.

I have never seen such a state.

Please CC Neal on your next submission.
Jason Xing Aug. 23, 2024, 7:36 a.m. UTC | #5
On Fri, Aug 23, 2024 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Aug 23, 2024 at 4:14 AM Josh Hunt <johunt@akamai.com> wrote:
> >
> > There have been multiple occassions where we have crashed in this path
> > because packets_out suggested there were packets on the write or retransmit
> > queues, but in fact there weren't leading to a NULL skb being dereferenced.
> > While we should fix that root cause we should also just make sure the skb
> > is not NULL before dereferencing it. Also add a warn once here to capture
> > some information if/when the problem case is hit again.
> >
> > Signed-off-by: Josh Hunt <johunt@akamai.com>
> > ---
> >  include/net/tcp.h | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 2aac11e7e1cc..19ea6ed87880 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -2433,10 +2433,19 @@ void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
> >  static inline s64 tcp_rto_delta_us(const struct sock *sk)
> >  {
> >         const struct sk_buff *skb = tcp_rtx_queue_head(sk);
> > -       u32 rto = inet_csk(sk)->icsk_rto;
> > -       u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
> > +       u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
> > +
> > +       if (likely(skb)) {
> > +               u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
> > +
> > +               return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> > +       } else {
> > +               WARN_ONCE(1,
> > +                       "rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
> > +                       tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
> > +               return rto;
> > +       }
> >
> > -       return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> >  }
> >
> >  /*
> > --
> > 2.34.1
> >
>
> Are you using a recent linux kernel version ?

I'm afraid that he doesn't. What that link shows to me is the
5.4.0-174-generic ubuntu kernel.
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aac11e7e1cc..19ea6ed87880 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2433,10 +2433,19 @@  void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
 static inline s64 tcp_rto_delta_us(const struct sock *sk)
 {
 	const struct sk_buff *skb = tcp_rtx_queue_head(sk);
-	u32 rto = inet_csk(sk)->icsk_rto;
-	u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
+	u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
+
+	if (likely(skb)) {
+		u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
+
+		return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+	} else {
+		WARN_ONCE(1,
+			"rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
+			tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
+		return rto;
+	}
 
-	return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
 }
 
 /*