diff mbox series

[bpf-next,v8,11/12] bpf: add a new callback in tcp_tx_timestamp()

Message ID 20250204183024.87508-12-kerneljasonxing@gmail.com (mailing list archive)
State New
Delegated to: BPF
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-11 success Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-18 success Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-19 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-gcc / veristat-kernel / x86_64-gcc veristat_kernel
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-gcc / veristat-meta / x86_64-gcc veristat_meta
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-43 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-44 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-45 success Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-46 success Logs for x86_64-llvm-18 / veristat-meta
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 193 this patch: 193
netdev/build_tools success Errors and warnings before: 27 (+1) this patch: 27 (+1)
netdev/cc_maintainers warning 1 maintainers not CCed: ncardwell@google.com
netdev/build_clang success Errors and warnings before: 9361 this patch: 9361
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 7017 this patch: 7017
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 33 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Xing Feb. 4, 2025, 6:30 p.m. UTC
Introduce the callback to correlate tcp_sendmsg timestamp with other
points, like SND/SW/ACK. let bpf prog trace the beginning of
tcp_sendmsg_locked() and then store the sendmsg timestamp at
the bpf_sk_storage, so that in tcp_tx_timestamp() we can correlate
the timestamp with tskey which can be found in other sending points.

More details can be found in the selftest:
The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp
(i.e. BPF_SOCK_OPS_TS_SND_CB added in this patch).

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/uapi/linux/bpf.h       | 7 +++++++
 net/ipv4/tcp.c                 | 1 +
 tools/include/uapi/linux/bpf.h | 7 +++++++
 3 files changed, 15 insertions(+)

Comments

Jason Xing Feb. 5, 2025, 5:28 a.m. UTC | #1
On Wed, Feb 5, 2025 at 2:31 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Introduce the callback to correlate tcp_sendmsg timestamp with other
> points, like SND/SW/ACK. let bpf prog trace the beginning of
> tcp_sendmsg_locked() and then store the sendmsg timestamp at
> the bpf_sk_storage, so that in tcp_tx_timestamp() we can correlate
> the timestamp with tskey which can be found in other sending points.
>
> More details can be found in the selftest:
> The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp
> (i.e. BPF_SOCK_OPS_TS_SND_CB added in this patch).
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 7 +++++++
>  net/ipv4/tcp.c                 | 1 +
>  tools/include/uapi/linux/bpf.h | 7 +++++++
>  3 files changed, 15 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 800122a8abe5..accb3b314fff 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7052,6 +7052,13 @@ enum {
>                                          * when SK_BPF_CB_TX_TIMESTAMPING
>                                          * feature is on.
>                                          */
> +       BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> +                                        * is triggered. For TCP, it stays
> +                                        * in the last send process to
> +                                        * correlate with tcp_sendmsg timestamp
> +                                        * with other timestamping callbacks,
> +                                        * like SND/SW/ACK.
> +                                        */
>  };

In case the use of the new flag is buried in many threads, I decide to
rephrase here to manifest how UDP would use it:
1. introduce a field ts_opt_id_bpf which works like ts_opt_id[1] to allow
the bpf program to fully take control of the management of tskey.
2. use fentry hook udp_sendmsg(), and introduce a callback function
like BPF_SOCK_OPS_TIMEOUT_INIT in kernel to initialize the
ts_opt_id_bpf with tskey that bpf prog generates. We can directly use
BPF_SOCK_OPS_TS_SND_CB.
3. modify the SCM_TS_OPT_ID logic to support bpf extension so that the
newly added field ts_opt_id_bpf can be passed to the
skb_shinfo(skb)->tskey in __ip_append_data().

In this way, this approach can also be extended for other protocols.

[1]
commit 4aecca4c76808f3736056d18ff510df80424bc9f
Author: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Date:   Tue Oct 1 05:57:14 2024 -0700

    net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message

    SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
    timestamps and packets sent via socket. Unfortunately, there is no way
    to reliably predict socket timestamp ID value in case of error returned
    by sendmsg. For UDP sockets it's impossible because of lockless
    nature of UDP transmit, several threads may send packets in parallel. In
    case of RAW sockets MSG_MORE option makes things complicated. More
    details are in the conversation [1].
    This patch adds new control message type to give user-space
    software an opportunity to control the mapping between packets and
    values by providing ID with each sendmsg for UDP sockets.
    The documentation is also added in this patch.

    [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/

Thanks,
Jason

>
>  /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3df802410ebf..a2ac57543b6d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -501,6 +501,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>                 tcb->txstamp_ack_bpf = 1;
>                 shinfo->tx_flags |= SKBTX_BPF;
>                 shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> +               bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB);
>         }
>  }
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 06e68d772989..384502996cdd 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7045,6 +7045,13 @@ enum {
>                                          * when SK_BPF_CB_TX_TIMESTAMPING
>                                          * feature is on.
>                                          */
> +       BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> +                                        * is triggered. For TCP, it stays
> +                                        * in the last send process to
> +                                        * correlate with tcp_sendmsg timestamp
> +                                        * with other timestamping callbacks,
> +                                        * like SND/SW/ACK.
> +                                        */
>  };
>
>  /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> --
> 2.43.5
>
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 800122a8abe5..accb3b314fff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7052,6 +7052,13 @@  enum {
 					 * when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SND_CB,		/* Called when every sendmsg syscall
+					 * is triggered. For TCP, it stays
+					 * in the last send process to
+					 * correlate with tcp_sendmsg timestamp
+					 * with other timestamping callbacks,
+					 * like SND/SW/ACK.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3df802410ebf..a2ac57543b6d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -501,6 +501,7 @@  static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
 		tcb->txstamp_ack_bpf = 1;
 		shinfo->tx_flags |= SKBTX_BPF;
 		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+		bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB);
 	}
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 06e68d772989..384502996cdd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7045,6 +7045,13 @@  enum {
 					 * when SK_BPF_CB_TX_TIMESTAMPING
 					 * feature is on.
 					 */
+	BPF_SOCK_OPS_TS_SND_CB,		/* Called when every sendmsg syscall
+					 * is triggered. For TCP, it stays
+					 * in the last send process to
+					 * correlate with tcp_sendmsg timestamp
+					 * with other timestamping callbacks,
+					 * like SND/SW/ACK.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect