
[net-next] net: tcp: Define tcp_listendrop() as noinline

Message ID 20241231055207.9208-1-laoar.shao@gmail.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Series [net-next] net: tcp: Define tcp_listendrop() as noinline

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 0 (+23) this patch: 0 (+23)
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 541 this patch: 541
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 909 this patch: 909
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 39 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-12-31--15-00 (tests: 881)

Commit Message

Yafang Shao Dec. 31, 2024, 5:52 a.m. UTC
The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
such as:
- A full SYN backlog queue
- SYN flood attacks
- Memory exhaustion
- Other resource constraints

Recently, one of our services experienced frequent ListenDrops. While
attempting to trace the root cause, we discovered that tracing the function
tcp_listendrop() was not straightforward because it is currently inlined.
To overcome this, we had to create a livepatch that defined a non-inlined
version of the function, which we then traced using BPF programs.

  $ grep tcp_listendrop /proc/kallsyms
  ffffffffc093fba0 t tcp_listendrop_x     [livepatch_tmp]

Through this process, we eventually determined that the ListenDrops were
caused by SYN attacks.
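
For illustration, a rough bpftrace one-liner along the following lines can be
attached to that livepatched symbol (untested sketch; it assumes kernel BTF is
available so bpftrace can resolve struct sock):

  # Count drops per listener address:port and per kernel call path
  $ bpftrace -e 'kprobe:tcp_listendrop_x {
        $sk = (struct sock *)arg0;
        @listeners[ntop($sk->__sk_common.skc_rcv_saddr),
                   $sk->__sk_common.skc_num] = count();
        @stacks[kstack] = count();
    }'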

Since tcp_listendrop() is not part of the critical path, defining it as
noinline would make it significantly easier to trace and diagnose issues
without requiring workarounds such as livepatching. This function can be
used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().

After that change, the result is as follows,

  $ grep tcp_listendrop /proc/kallsyms
  ffffffffb718eaa0 T __pfx_tcp_listendrop
  ffffffffb718eab0 T tcp_listendrop
  ffffffffb7e636b8 r __ksymtab_tcp_listendrop

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/net/tcp.h    | 13 +------------
 net/ipv4/tcp_input.c | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 12 deletions(-)

Comments

Eric Dumazet Dec. 31, 2024, 9:26 a.m. UTC | #1
On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> such as:
> - A full SYN backlog queue
> - SYN flood attacks
> - Memory exhaustion
> - Other resource constraints
>
> Recently, one of our services experienced frequent ListenDrops. While
> attempting to trace the root cause, we discovered that tracing the function
> tcp_listendrop() was not straightforward because it is currently inlined.
> To overcome this, we had to create a livepatch that defined a non-inlined
> version of the function, which we then traced using BPF programs.
>
>   $ grep tcp_listendrop /proc/kallsyms
>   ffffffffc093fba0 t tcp_listendrop_x     [livepatch_tmp]
>
> Through this process, we eventually determined that the ListenDrops were
> caused by SYN attacks.
>
> Since tcp_listendrop() is not part of the critical path, defining it as
> noinline would make it significantly easier to trace and diagnose issues
> without requiring workarounds such as livepatching. This function can be
> used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
>
> After that change, the result is as follows,
>
>   $ grep tcp_listendrop /proc/kallsyms
>   ffffffffb718eaa0 T __pfx_tcp_listendrop
>   ffffffffb718eab0 T tcp_listendrop
>   ffffffffb7e636b8 r __ksymtab_tcp_listendrop
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> --

Are we going to accept one patch at a time to un-inline all TCP
related functions in the kernel ?

My understanding is that current tools work fine. You may need to upgrade yours.

# perf probe -a tcp_listendrop
Added new events:
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)
  probe:tcp_listendrop (on tcp_listendrop)

You can now use it in all perf tools, such as:

perf record -e probe:tcp_listendrop -aR sleep 1
Yafang Shao Dec. 31, 2024, 11:49 a.m. UTC | #2
On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> > such as:
> > - A full SYN backlog queue
> > - SYN flood attacks
> > - Memory exhaustion
> > - Other resource constraints
> >
> > Recently, one of our services experienced frequent ListenDrops. While
> > attempting to trace the root cause, we discovered that tracing the function
> > tcp_listendrop() was not straightforward because it is currently inlined.
> > To overcome this, we had to create a livepatch that defined a non-inlined
> > version of the function, which we then traced using BPF programs.
> >
> >   $ grep tcp_listendrop /proc/kallsyms
> >   ffffffffc093fba0 t tcp_listendrop_x     [livepatch_tmp]
> >
> > Through this process, we eventually determined that the ListenDrops were
> > caused by SYN attacks.
> >
> > Since tcp_listendrop() is not part of the critical path, defining it as
> > noinline would make it significantly easier to trace and diagnose issues
> > without requiring workarounds such as livepatching. This function can be
> > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
> >
> > After that change, the result is as follows,
> >
> >   $ grep tcp_listendrop /proc/kallsyms
> >   ffffffffb718eaa0 T __pfx_tcp_listendrop
> >   ffffffffb718eab0 T tcp_listendrop
> >   ffffffffb7e636b8 r __ksymtab_tcp_listendrop
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > --
>
> Are we going to accept one patch at a time to un-inline all TCP
> related functions in the kernel ?

I don't have an in-depth understanding of the TCP/IP stack, so I can't
consolidate all related TCP functions in the error paths into a single
patch. I would greatly appreciate it if you could help ensure these
functions are marked as non-inlined based on your expertise. If you
don't have time to do it directly, could you provide some guidance?

The inlining of TCP functions in error paths often complicates tracing
efforts. For instance, we recently encountered issues with the inlined
tcp_write_err(), although we were fortunate to have an alternative
tracepoint available: inet_sk_error_report.
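
For example, something along these lines can hang off that tracepoint (a
sketch; the field name is assumed from recent kernels' trace/events/sock.h,
which also exposes the ports and addresses):

  $ bpftrace -e 'tracepoint:sock:inet_sk_error_report
    { @errors[args->error] = count(); }'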

Unfortunately, no such alternative exists for tcp_listendrop(), which
is why I submitted this patch.

>
> My understanding is that current tools work fine. You may need to upgrade yours.
>
> # perf probe -a tcp_listendrop
> Added new events:
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>   probe:tcp_listendrop (on tcp_listendrop)
>
> You can now use it in all perf tools, such as:
>
> perf record -e probe:tcp_listendrop -aR sleep 1

The issue is that we can't extract the `struct sock *sk` argument from
tcp_listendrop() using perf. Please correct me if I’m mistaken.

In a containerized environment, it's common to have many pods running
on a single host, each assigned a unique IP. Our goal is to filter for
a specific local_ip:local_port combination while also capturing the
corresponding remote IP. This task is straightforward with bpftrace or
other BPF-based tools, but I’m unsure how to accomplish it using perf.
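
For reference, this is roughly the kind of filter we have in mind, written as
a bpftrace sketch (it assumes tcp_listendrop() is a probe-able symbol, i.e.
this patch or the livepatch mentioned above is applied, and that kernel BTF is
available; 8080 stands in for one pod's listening port). Note that the sk
passed to tcp_listendrop() is the listening socket, so the remote address of
the offending SYN would still have to be taken from the skb in the caller:

  $ bpftrace -e 'kprobe:tcp_listendrop {
        $sk = (struct sock *)arg0;
        if ($sk->__sk_common.skc_num == 8080) {
            printf("listen drop on %s:%d\n",
                   ntop($sk->__sk_common.skc_rcv_saddr),
                   $sk->__sk_common.skc_num);
        }
    }'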
Eric Dumazet Dec. 31, 2024, 12:34 p.m. UTC | #3
On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> > > such as:
> > > - A full SYN backlog queue
> > > - SYN flood attacks
> > > - Memory exhaustion
> > > - Other resource constraints
> > >
> > > Recently, one of our services experienced frequent ListenDrops. While
> > > attempting to trace the root cause, we discovered that tracing the function
> > > tcp_listendrop() was not straightforward because it is currently inlined.
> > > To overcome this, we had to create a livepatch that defined a non-inlined
> > > version of the function, which we then traced using BPF programs.
> > >
> > >   $ grep tcp_listendrop /proc/kallsyms
> > >   ffffffffc093fba0 t tcp_listendrop_x     [livepatch_tmp]
> > >
> > > Through this process, we eventually determined that the ListenDrops were
> > > caused by SYN attacks.
> > >
> > > Since tcp_listendrop() is not part of the critical path, defining it as
> > > noinline would make it significantly easier to trace and diagnose issues
> > > without requiring workarounds such as livepatching. This function can be
> > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
> > >
> > > After that change, the result is as follows,
> > >
> > >   $ grep tcp_listendrop /proc/kallsyms
> > >   ffffffffb718eaa0 T __pfx_tcp_listendrop
> > >   ffffffffb718eab0 T tcp_listendrop
> > >   ffffffffb7e636b8 r __ksymtab_tcp_listendrop
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > --
> >
> > Are we going to accept one patch at a time to un-inline all TCP
> > related functions in the kernel ?
>
> I don't have an in-depth understanding of the TCP/IP stack, so I can't
> consolidate all related TCP functions in the error paths into a single
> patch. I would greatly appreciate it if you could help ensure these
> functions are marked as non-inlined based on your expertise. If you
> don't have time to do it directly, could you provide some guidance?
>
> The inlining of TCP functions in error paths often complicates tracing
> efforts. For instance, we recently encountered issues with the inlined
> tcp_write_err(), although we were fortunate to have an alternative
> tracepoint available: inet_sk_error_report.
>
> Unfortunately, no such alternative exists for tcp_listendrop(), which
> is why I submitted this patch.
>
> >
> > My understanding is that current tools work fine. You may need to upgrade yours.
> >
> > # perf probe -a tcp_listendrop
> > Added new events:
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >   probe:tcp_listendrop (on tcp_listendrop)
> >
> > You can now use it in all perf tools, such as:
> >
> > perf record -e probe:tcp_listendrop -aR sleep 1
>
> The issue is that we can't extract the `struct sock *sk` argument from
> tcp_listendrop() using perf. Please correct me if I’m mistaken.
>
> In a containerized environment, it's common to have many pods running
> on a single host, each assigned a unique IP. Our goal is to filter for
> a specific local_ip:local_port combination while also capturing the
> corresponding remote IP. This task is straightforward with bpftrace or
> other BPF-based tools, but I’m unsure how to accomplish it using perf.

I suggest you ask tracing experts' opinions, why is perf able to add a
probe, but not bpftrace ?

Again, accepting thousands of patches just because of tracing tools
seems not good to me.
Yafang Shao Dec. 31, 2024, 1:05 p.m. UTC | #4
On Tue, Dec 31, 2024 at 8:34 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> > > > such as:
> > > > - A full SYN backlog queue
> > > > - SYN flood attacks
> > > > - Memory exhaustion
> > > > - Other resource constraints
> > > >
> > > > Recently, one of our services experienced frequent ListenDrops. While
> > > > attempting to trace the root cause, we discovered that tracing the function
> > > > tcp_listendrop() was not straightforward because it is currently inlined.
> > > > To overcome this, we had to create a livepatch that defined a non-inlined
> > > > version of the function, which we then traced using BPF programs.
> > > >
> > > >   $ grep tcp_listendrop /proc/kallsyms
> > > >   ffffffffc093fba0 t tcp_listendrop_x     [livepatch_tmp]
> > > >
> > > > Through this process, we eventually determined that the ListenDrops were
> > > > caused by SYN attacks.
> > > >
> > > > Since tcp_listendrop() is not part of the critical path, defining it as
> > > > noinline would make it significantly easier to trace and diagnose issues
> > > > without requiring workarounds such as livepatching. This function can be
> > > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
> > > >
> > > > After that change, the result is as follows,
> > > >
> > > >   $ grep tcp_listendrop /proc/kallsyms
> > > >   ffffffffb718eaa0 T __pfx_tcp_listendrop
> > > >   ffffffffb718eab0 T tcp_listendrop
> > > >   ffffffffb7e636b8 r __ksymtab_tcp_listendrop
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > --
> > >
> > > Are we going to accept one patch at a time to un-inline all TCP
> > > related functions in the kernel ?
> >
> > I don't have an in-depth understanding of the TCP/IP stack, so I can't
> > consolidate all related TCP functions in the error paths into a single
> > patch. I would greatly appreciate it if you could help ensure these
> > functions are marked as non-inlined based on your expertise. If you
> > don't have time to do it directly, could you provide some guidance?
> >
> > The inlining of TCP functions in error paths often complicates tracing
> > efforts. For instance, we recently encountered issues with the inlined
> > tcp_write_err(), although we were fortunate to have an alternative
> > tracepoint available: inet_sk_error_report.
> >
> > Unfortunately, no such alternative exists for tcp_listendrop(), which
> > is why I submitted this patch.
> >
> > >
> > > My understanding is that current tools work fine. You may need to upgrade yours.
> > >
> > > # perf probe -a tcp_listendrop
> > > Added new events:
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >   probe:tcp_listendrop (on tcp_listendrop)
> > >
> > > You can now use it in all perf tools, such as:
> > >
> > > perf record -e probe:tcp_listendrop -aR sleep 1
> >
> > The issue is that we can't extract the `struct sock *sk` argument from
> > tcp_listendrop() using perf. Please correct me if I’m mistaken.
> >
> > In a containerized environment, it's common to have many pods running
> > on a single host, each assigned a unique IP. Our goal is to filter for
> > a specific local_ip:local_port combination while also capturing the
> > corresponding remote IP. This task is straightforward with bpftrace or
> > other BPF-based tools, but I’m unsure how to accomplish it using perf.
>
> I suggest you ask tracing experts' opinions, why is perf able to add a
> probe, but not bpftrace ?
>
> Again, accepting thousands of patches just because of tracing tools
> seems not good to me.

Thank you for your guidance on using perf. I now understand how to
achieve similar tracing with bpftrace.

I agree with your point that it’s not ideal to send thousands of
patches solely for the sake of tracing tools. It would indeed be much
better if this could be addressed in a single patch.
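
For the record, here is a rough sketch of how the sk fields could be pulled
out with perf probe itself (assuming kernel debuginfo is installed; whether
sk is visible at every inlined site depends on the DWARF the compiler
emitted):

  # perf probe -a 'tcp_listendrop sk->__sk_common.skc_num'
  # perf record -e probe:tcp_listendrop -aR sleep 10
  # perf script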

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e9b37b76e894..65e6357e893b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2537,18 +2537,7 @@  static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff *skb)
 		WRITE_ONCE(tp->data_segs_in, tp->data_segs_in + segs_in);
 }
 
-/*
- * TCP listen path runs lockless.
- * We forced "struct sock" to be const qualified to make sure
- * we don't modify one of its field by mistake.
- * Here, we increment sk_drops which is an atomic_t, so we can safely
- * make sock writable again.
- */
-static inline void tcp_listendrop(const struct sock *sk)
-{
-	atomic_inc(&((struct sock *)sk)->sk_drops);
-	__NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS);
-}
+void tcp_listendrop(const struct sock *sk);
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5bdf13ac26ef..0fcea815860b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7174,6 +7174,20 @@  u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
 }
 EXPORT_SYMBOL_GPL(tcp_get_syncookie_mss);
 
+/*
+ * TCP listen path runs lockless.
+ * We forced "struct sock" to be const qualified to make sure
+ * we don't modify one of its field by mistake.
+ * Here, we increment sk_drops which is an atomic_t, so we can safely
+ * make sock writable again.
+ */
+void tcp_listendrop(const struct sock *sk)
+{
+	atomic_inc(&((struct sock *)sk)->sk_drops);
+	__NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS);
+}
+EXPORT_SYMBOL_GPL(tcp_listendrop);
+
 int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     const struct tcp_request_sock_ops *af_ops,
 		     struct sock *sk, struct sk_buff *skb)