[net,1/1] tcp: check skb is non-NULL in tcp_rto_delta_us()

Message ID	20240823021333.1252272-2-johunt@akamai.com (mailing list archive)
State	Superseded
Delegated to:	Netdev Maintainers
Headers	show Received: from mx0a-00190b01.pphosted.com (mx0a-00190b01.pphosted.com [67.231.149.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B1B4171C2; Fri, 23 Aug 2024 03:01:56 +0000 (UTC) From: Josh Hunt <johunt@akamai.com> To: <edumazet@google.com>, <davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>, <netdev@vger.kernel.org> CC: <linux-kernel@vger.kernel.org>, <johunt@akamai.com> Subject: [PATCH net 1/1] tcp: check skb is non-NULL in tcp_rto_delta_us() Date: Thu, 22 Aug 2024 22:13:33 -0400 Message-ID: <20240823021333.1252272-2-johunt@akamai.com> In-Reply-To: <20240823021333.1252272-1-johunt@akamai.com> References: <20240823021333.1252272-1-johunt@akamai.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain
Series	NULL ptr dereference in tcp_rearm_rto \| expand [net,0/1] NULL ptr dereference in tcp_rearm_rto [net,1/1] tcp: check skb is non-NULL in tcp_rto_delta_us()

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net, async
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	fail	Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 23 this patch: 23
netdev/build_tools	success	Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers	success	CCed 4 of 4 maintainers
netdev/build_clang	success	Errors and warnings before: 28 this patch: 28
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 927 this patch: 927
netdev/checkpatch	warning	CHECK: Alignment should match open parenthesis WARNING: else is not generally useful after a break or return WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 2 this patch: 2
netdev/source_inline	success	Was 0 now: 0

Josh Hunt Aug. 23, 2024, 2:13 a.m. UTC

There have been multiple occassions where we have crashed in this path
because packets_out suggested there were packets on the write or retransmit
queues, but in fact there weren't leading to a NULL skb being dereferenced.
While we should fix that root cause we should also just make sure the skb
is not NULL before dereferencing it. Also add a warn once here to capture
some information if/when the problem case is hit again.

Signed-off-by: Josh Hunt <johunt@akamai.com>
---
 include/net/tcp.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Jason Xing Aug. 23, 2024, 3:27 a.m. UTC | #1

Hello Josh,

On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>
> There have been multiple occassions where we have crashed in this path
> because packets_out suggested there were packets on the write or retransmit
> queues, but in fact there weren't leading to a NULL skb being dereferenced.

Could you show us the detailed splats and more information about it so
that we can know what exactly happened?

Thanks,
Jason

Josh Hunt Aug. 23, 2024, 3:33 a.m. UTC | #2

On 8/22/24 8:27 PM, Jason Xing wrote:
> 
> Hello Josh,
> 
> On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>>
>> There have been multiple occassions where we have crashed in this path
>> because packets_out suggested there were packets on the write or retransmit
>> queues, but in fact there weren't leading to a NULL skb being dereferenced.
> 
> Could you show us the detailed splats and more information about it so
> that we can know what exactly happened?

Hey Jason

Yeah for some reason my cover letter did not come through which has the 
oops info that we hit. I'll resend it now. Fingers crossed it goes 
through this time :)

Josh

Josh Hunt Aug. 23, 2024, 4:21 a.m. UTC | #3

On 8/22/24 8:33 PM, Josh Hunt wrote:
> On 8/22/24 8:27 PM, Jason Xing wrote:
>>
>> Hello Josh,
>>
>> On Fri, Aug 23, 2024 at 11:02 AM Josh Hunt <johunt@akamai.com> wrote:
>>>
>>> There have been multiple occassions where we have crashed in this path
>>> because packets_out suggested there were packets on the write or 
>>> retransmit
>>> queues, but in fact there weren't leading to a NULL skb being 
>>> dereferenced.
>>
>> Could you show us the detailed splats and more information about it so
>> that we can know what exactly happened?
> 
> Hey Jason
> 
> Yeah for some reason my cover letter did not come through which has the 
> oops info that we hit. I'll resend it now. Fingers crossed it goes 
> through this time :)
> 
> Josh
Seems like our mail server is block the cover letter for some reason 
right now. I'll have to figure out why tomorrow. I filed a bug with 
Ubuntu as well as sending this patch upstream b/c the kernel we're 
running is a stock Ubuntu kernel. The bug report there has most of what 
I put in the cover letter:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657

Josh

Eric Dumazet Aug. 23, 2024, 6:42 a.m. UTC | #4

On Fri, Aug 23, 2024 at 4:14 AM Josh Hunt <johunt@akamai.com> wrote:
>
> There have been multiple occassions where we have crashed in this path
> because packets_out suggested there were packets on the write or retransmit
> queues, but in fact there weren't leading to a NULL skb being dereferenced.
> While we should fix that root cause we should also just make sure the skb
> is not NULL before dereferencing it. Also add a warn once here to capture
> some information if/when the problem case is hit again.
>
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> ---
>  include/net/tcp.h | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2aac11e7e1cc..19ea6ed87880 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2433,10 +2433,19 @@ void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
>  static inline s64 tcp_rto_delta_us(const struct sock *sk)
>  {
>         const struct sk_buff *skb = tcp_rtx_queue_head(sk);
> -       u32 rto = inet_csk(sk)->icsk_rto;
> -       u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
> +       u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
> +
> +       if (likely(skb)) {
> +               u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
> +
> +               return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> +       } else {
> +               WARN_ONCE(1,
> +                       "rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
> +                       tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
> +               return rto;
> +       }
>
> -       return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
>  }
>
>  /*
> --
> 2.34.1
>

Are you using a recent linux kernel version ?

I am asking because sometimes patches are submitted while the authors
are using very old kernels,
and they do not state this clearly.

I have never seen such a state.

Please CC Neal on your next submission.

Jason Xing Aug. 23, 2024, 7:36 a.m. UTC | #5

On Fri, Aug 23, 2024 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Aug 23, 2024 at 4:14 AM Josh Hunt <johunt@akamai.com> wrote:
> >
> > There have been multiple occassions where we have crashed in this path
> > because packets_out suggested there were packets on the write or retransmit
> > queues, but in fact there weren't leading to a NULL skb being dereferenced.
> > While we should fix that root cause we should also just make sure the skb
> > is not NULL before dereferencing it. Also add a warn once here to capture
> > some information if/when the problem case is hit again.
> >
> > Signed-off-by: Josh Hunt <johunt@akamai.com>
> > ---
> >  include/net/tcp.h | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 2aac11e7e1cc..19ea6ed87880 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -2433,10 +2433,19 @@ void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb);
> >  static inline s64 tcp_rto_delta_us(const struct sock *sk)
> >  {
> >         const struct sk_buff *skb = tcp_rtx_queue_head(sk);
> > -       u32 rto = inet_csk(sk)->icsk_rto;
> > -       u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
> > +       u32 rto = jiffies_to_usecs(inet_csk(sk)->icsk_rto);
> > +
> > +       if (likely(skb)) {
> > +               u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + rto;
> > +
> > +               return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> > +       } else {
> > +               WARN_ONCE(1,
> > +                       "rtx queue emtpy: inflight %u tlp_high_seq %u state %u\n",
> > +                       tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
> > +               return rto;
> > +       }
> >
> > -       return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> >  }
> >
> >  /*
> > --
> > 2.34.1
> >
>
> Are you using a recent linux kernel version ?

I'm afraid that he doesn't. What that link shows to me is the
5.4.0-174-generic ubuntu kernel.

[net,1/1] tcp: check skb is non-NULL in tcp_rto_delta_us()

Checks

Commit Message

Comments

Patch