Message ID | 20250208103220.72294-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
Series | net-timestamp: bpf extension to equip applications transparently |
On 2/8/25 2:32 AM, Jason Xing wrote:
> "Timestamping is key to debugging network stack latency. With
> SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> network issues can be attributed to the kernel." This is extracted
> from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> addressed by Willem de Bruijn at netdevconf 0x17).
>
> There are a few areas that need optimization with the consideration of
> easier use and less performance impact, which I highlighted and mainly
> discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> uAPI compatibility, extra system call overhead, and the need for
> application modification. I initially managed to solve these issues
> by writing a kernel module that hooks various key functions. However,
> this approach is not suitable for the next kernel release. Therefore,
> a BPF extension was proposed. During recent period, Martin KaFai Lau
> provides invaluable suggestions about BPF along the way. Many thanks
> here!
>
> In this series, only support foundamental codes and tx for TCP.

typo: fundamental.... This had been brought up before (in v7?).

By fundamental, I suspect you meant (?) bpf timestamping infrastructure, like:
"This series adds the BPF networking timestamping infrastructure. This series
also adds TX timestamping support for TCP. The RX timestamping and UDP support
will be added in the future."

> This approach mostly relies on existing SO_TIMESTAMPING feature, users

It reuses most of the tx timestamping callback that is currently enabled by the
SO_TIMESTAMPING. However, I don't think there is a lot of overlap in term of the
SO_TIMESTAMPING api which does feel like API reuse when first reading this comment.

> only needs to pass certain flags through bpf_setsocktopt() to a separate
> tsflags. Please see the last selftest patch in this series.
>
> ---
> v8
> Link: https://lore.kernel.org/all/20250128084620.57547-1-kerneljasonxing@gmail.com/
> 1. adjust some commit messages and titles
> 2. add sk cookie in selftests
> 3. handle the NULL pointer in hwstamp
> 4. use kfunc to do selective sampling
>
> v7
> Link: https://lore.kernel.org/all/20250121012901.87763-1-kerneljasonxing@gmail.com/
> 1. target bpf-next tree
> 2. simplely and directly stop timestamping callbacks calling a few BPF
> CALLS due to safety concern.
> 3. add more new testcases and adjust the existing testcases
> 4. revise some comments of new timestamping callbacks
> 5. remove a few BPF CGROUP locks
>
> RFC v6
> In the meantime, any suggestions and reviews are welcome!
> Link: https://lore.kernel.org/all/20250112113748.73504-1-kerneljasonxing@gmail.com/
> 1. handle those safety problem by using the correct method.
> 2. support bpf_getsockopt.
> 3. adjust the position of BPF_SOCK_OPS_TS_TCP_SND_CB
> 4. fix mishandling the hardware timestamp error
> 5. add more corresponding tests
>
> v5
> Link: https://lore.kernel.org/all/20241207173803.90744-1-kerneljasonxing@gmail.com/
> 1. handle the safety issus when someone tries to call unrelated bpf
> helpers.
> 2. avoid adding direct function call in the hot path like
> __dev_queue_xmit()
> 3. remove reporting the hardware timestamp and tskey since they can be
> fetched through the existing helper with the help of
> bpf_skops_init_skb(), please see the selftest.
> 4. add new sendmsg callback in tcp_sendmsg, and introduce tskey_bpf used
> by bpf program to correlate tcp_sendmsg with other hook points in patch [13/15].
>
> v4
> Link: https://lore.kernel.org/all/20241028110535.82999-1-kerneljasonxing@gmail.com/
> 1. introduce sk->sk_bpf_cb_flags to let user use bpf_setsockopt() (Martin)
> 2. introduce SKBTX_BPF to enable the bpf SO_TIMESTAMPING feature (Martin)
> 3. introduce bpf map in tests (Martin)
> 4. I choose to make this series as simple as possible, so I only support
> most cases in the tx path for TCP protocol.
>
> v3
> Link: https://lore.kernel.org/all/20241012040651.95616-1-kerneljasonxing@gmail.com/
> 1. support UDP proto by introducing a new generation point.
> 2. for OPT_ID, introducing sk_tskey_bpf_offset to compute the delta
> between the current socket key and bpf socket key. It is desiged for
> UDP, which also applies to TCP.
> 3. support bpf_getsockopt()
> 4. use cgroup static key instead.
> 5. add one simple bpf selftest to show how it can be used.
> 6. remove the rx support from v2 because the number of patches could
> exceed the limit of one series.
>
> V2
> Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> 1. Introduce tsflag requestors so that we are able to extend more in the
> future. Besides, it enables TX flags for bpf extension feature separately
> without breaking users. It is suggested by Vadim Fedorenko.
> 2. introduce a static key to control the whole feature. (Willem)
> 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> some TX/RX cases, not all the cases.
>
> Jason Xing (12):
>   bpf: add support for bpf_setsockopt()
>   bpf: prepare for timestamping callbacks use
>   bpf: stop unsafely accessing TCP fields in bpf callbacks
>   bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks
>   net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
>   bpf: support SCM_TSTAMP_SCHED of SO_TIMESTAMPING
>   bpf: support sw SCM_TSTAMP_SND of SO_TIMESTAMPING
>   bpf: support hw SCM_TSTAMP_SND of SO_TIMESTAMPING
>   bpf: support SCM_TSTAMP_ACK of SO_TIMESTAMPING
>   bpf: add a new callback in tcp_tx_timestamp()
>   bpf: support selective sampling for bpf timestamping
>   selftests/bpf: add simple bpf tests in the tx path for timestamping
>     feature
>
>  include/linux/filter.h                        |   1 +
>  include/linux/skbuff.h                        |  12 +-
>  include/net/sock.h                            |  10 +
>  include/net/tcp.h                             |   5 +-
>  include/uapi/linux/bpf.h                      |  30 ++
>  kernel/bpf/btf.c                              |   1 +
>  net/core/dev.c                                |   3 +-
>  net/core/filter.c                             |  75 ++++-
>  net/core/skbuff.c                             |  48 ++-
>  net/core/sock.c                               |  15 +
>  net/dsa/user.c                                |   2 +-
>  net/ipv4/tcp.c                                |   4 +
>  net/ipv4/tcp_input.c                          |   2 +
>  net/ipv4/tcp_output.c                         |   2 +
>  net/socket.c                                  |   2 +-
>  tools/include/uapi/linux/bpf.h                |  23 ++
>  .../bpf/prog_tests/so_timestamping.c          |  79 +++++
>  .../selftests/bpf/progs/so_timestamping.c     | 312 ++++++++++++++++++
>  18 files changed, 612 insertions(+), 14 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
>  create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c
>
On Tue, Feb 11, 2025 at 7:37 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 2/8/25 2:32 AM, Jason Xing wrote:
> > "Timestamping is key to debugging network stack latency. With
> > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > network issues can be attributed to the kernel." This is extracted
> > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > addressed by Willem de Bruijn at netdevconf 0x17).
> >
> > There are a few areas that need optimization with the consideration of
> > easier use and less performance impact, which I highlighted and mainly
> > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > uAPI compatibility, extra system call overhead, and the need for
> > application modification. I initially managed to solve these issues
> > by writing a kernel module that hooks various key functions. However,
> > this approach is not suitable for the next kernel release. Therefore,
> > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > provides invaluable suggestions about BPF along the way. Many thanks
> > here!
> >
> > In this series, only support foundamental codes and tx for TCP.
>
> typo: fundamental.... This had been brought up before (in v7?).

Oh, right!

>
> By fundamental, I suspect you meant (?) bpf timestamping infrastructure, like:
> "This series adds the BPF networking timestamping infrastructure. This series
> also adds TX timestamping support for TCP. The RX timestamping and UDP support
> will be added in the future."

Right!

>
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
>
> It reuses most of the tx timestamping callback that is currently enabled by the
> SO_TIMESTAMPING. However, I don't think there is a lot of overlap in term of the
> SO_TIMESTAMPING api which does feel like API reuse when first reading this comment.

I'm going to refine them. Thanks for the review!

Thanks,
Jason
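
For readers unfamiliar with the existing interface: the "need for application modification" and "extra system call overhead" named in the cover letter come from the fact that, today, every application has to opt in to TX timestamping itself with setsockopt(SO_TIMESTAMPING) and then read the timestamps back from the socket error queue. A minimal userspace sketch of that opt-in step follows, assuming a Linux build environment with the kernel uAPI headers installed. The function names tx_tstamp_flags() and enable_tx_timestamping() are made up for illustration and do not come from the series; the BPF side of the series is not shown here.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Illustrative helper (not from the series): the flag mask an application
 * would pass to request software TX timestamps at the three points the
 * series also covers (SCHED, SND, ACK). */
unsigned int tx_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_SCHED |    /* stamp on entering the packet scheduler */
	       SOF_TIMESTAMPING_TX_SOFTWARE | /* stamp when handed to the device driver */
	       SOF_TIMESTAMPING_TX_ACK |      /* stamp when the data is acked (TCP) */
	       SOF_TIMESTAMPING_SOFTWARE;     /* report software timestamps */
}

/* Illustrative helper (not from the series): enable the flags on one socket.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_tx_timestamping(int fd)
{
	unsigned int flags = tx_tstamp_flags();

	assert(fd >= 0); /* caller must pass an open socket */
	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

An application would call enable_tx_timestamping(fd) on each socket it wants traced and later fetch the timestamps with recvmsg() using MSG_ERRQUEUE. That per-application change is what the series aims to avoid by letting a BPF program enable timestamping through bpf_setsockopt() instead.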