Message ID | 20241012040651.95616-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | net-timestamp: bpf extension to equip applications transparently | expand |
Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > tracepoint to print information (say, tstamp) so that we can > transparently equip applications with this feature and require no > modification in user side. > > Later, we discussed at netconf and agreed that we can use bpf for better > extension, which is mainly suggested by John Fastabend and Willem de > Bruijn. Many thanks here! So I post this series to see if we have a > better solution to extend. My feeling is BPF is a good place to provide > a way to add timestamping by administrators, without having to rebuild > applications. > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > only needs to pass certain flags through bpf_setsocktop() to a separate > tsflags. For TX timestamps, they will be printed during generation > phase. For RX timestamps, we will wait for the moment when recvmsg() is > called. > > After this series, we could step by step implement more advanced > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > In this series, I only support TCP protocol which is widely used in > SO_TIMESTAMPING feature. > > --- > V2 > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > 1. Introduce tsflag requestors so that we are able to extend more in the > future. Besides, it enables TX flags for bpf extension feature separately > without breaking users. It is suggested by Vadim Fedorenko. > 2. introduce a static key to control the whole feature. (Willem) > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > some TX/RX cases, not all the cases. > > Note: > The main concern we've discussion in V1 thread is how to deal with the > applications using SO_TIMESTAMPING feature? In this series, I allow both > cases to happen at the same time, which indicates that even one > applications setting SO_TIMESTAMPING can still be traced through BPF > program. Please see patch [04/12]. This revision does not address the main concern. An administrator installed BPF program can affect results of a process using SO_TIMESTAMPING in ways that break it. My halfway suggestion was to only enable this if the process has not enabled timestamping on a socket, and to hard fail the application if it does enable it while BPF timestamping is active. You pushed back, entirely reasonably. But if anything we need a stronger method of isolation, not just ignore the issue.
On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > tracepoint to print information (say, tstamp) so that we can > > transparently equip applications with this feature and require no > > modification in user side. > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > extension, which is mainly suggested by John Fastabend and Willem de > > Bruijn. Many thanks here! So I post this series to see if we have a > > better solution to extend. My feeling is BPF is a good place to provide > > a way to add timestamping by administrators, without having to rebuild > > applications. > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > only needs to pass certain flags through bpf_setsocktop() to a separate > > tsflags. For TX timestamps, they will be printed during generation > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > called. > > > > After this series, we could step by step implement more advanced > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > In this series, I only support TCP protocol which is widely used in > > SO_TIMESTAMPING feature. > > > > --- > > V2 > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > 1. Introduce tsflag requestors so that we are able to extend more in the > > future. Besides, it enables TX flags for bpf extension feature separately > > without breaking users. It is suggested by Vadim Fedorenko. > > 2. introduce a static key to control the whole feature. (Willem) > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > some TX/RX cases, not all the cases. > > > > Note: > > The main concern we've discussion in V1 thread is how to deal with the > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > cases to happen at the same time, which indicates that even one > > applications setting SO_TIMESTAMPING can still be traced through BPF > > program. Please see patch [04/12]. > > This revision does not address the main concern. > > An administrator installed BPF program can affect results of a process > using SO_TIMESTAMPING in ways that break it. Sorry, I didn't get it. How the following code snippet would break users? void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb, struct skb_shared_hwtstamps *hwtstamps, struct sock *sk, int tstype) { if (!sk) return; if (static_branch_unlikely(&bpf_tstamp_control)) bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); } You can see, the application shipped with SO_TIMESTAMPING still prints timestamps even when the application stays in the attached cgroup directory. Thanks, Jason
On Sun, Oct 13, 2024 at 11:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > tracepoint to print information (say, tstamp) so that we can > > > transparently equip applications with this feature and require no > > > modification in user side. > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > extension, which is mainly suggested by John Fastabend and Willem de > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > better solution to extend. My feeling is BPF is a good place to provide > > > a way to add timestamping by administrators, without having to rebuild > > > applications. > > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > > only needs to pass certain flags through bpf_setsocktop() to a separate > > > tsflags. For TX timestamps, they will be printed during generation > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > > called. > > > > > > After this series, we could step by step implement more advanced > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > > > In this series, I only support TCP protocol which is widely used in > > > SO_TIMESTAMPING feature. > > > > > > --- > > > V2 > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > > 1. Introduce tsflag requestors so that we are able to extend more in the > > > future. Besides, it enables TX flags for bpf extension feature separately > > > without breaking users. It is suggested by Vadim Fedorenko. > > > 2. introduce a static key to control the whole feature. (Willem) > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > > some TX/RX cases, not all the cases. > > > > > > Note: > > > The main concern we've discussion in V1 thread is how to deal with the > > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > > cases to happen at the same time, which indicates that even one > > > applications setting SO_TIMESTAMPING can still be traced through BPF > > > program. Please see patch [04/12]. > > > > This revision does not address the main concern. > > > > An administrator installed BPF program can affect results of a process > > using SO_TIMESTAMPING in ways that break it. > > Sorry, I didn't get it. How the following code snippet would break users? > > void __skb_tstamp_tx(struct sk_buff *orig_skb, > const struct sk_buff *ack_skb, > struct skb_shared_hwtstamps *hwtstamps, > struct sock *sk, int tstype) > { > if (!sk) > return; > > if (static_branch_unlikely(&bpf_tstamp_control)) > bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); > > skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, > tstype); > } > > You can see, the application shipped with SO_TIMESTAMPING still prints > timestamps even when the application stays in the attached cgroup > directory. I tested this by running "./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 5" in the bpf attached directory and it can correctly print the timestamp. So it would not break users. And surprisingly I found the key is not that right (ERROR: key 1000, expected 999). I will investigate and fix it. Thanks, Jason
> I tested this by running "./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 5" > in the bpf attached directory and it can correctly print the > timestamp. So it would not break users. > > And surprisingly I found the key is not that right (ERROR: key 1000, > expected 999). I will investigate and fix it. Ah, I think I know the reason. In this series, the BPF extension allows setting before sending SYN packet in the beginning of tcp_connect(), which is different from the original design that allows setting after sending the SYN packet. It causes the unexpected key. They are different. The reason why the failure is triggered is because I reuse the tskey logic in the BPF extension... ==== Back to the question on how to solve the conflicts, if we finally reckon that the original feature has the first priority, I can change the order in the next version. void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb, struct skb_shared_hwtstamps *hwtstamps, struct sock *sk, int tstype) { if (!sk) return; ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); if (ret) /* Apps does set the SO_TIMESTAMPING flag, return directly */ return; if (static_branch_unlikely(&bpf_tstamp_control)) bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); } In this way, it will allow either of two features to work. Willem, do you think it is fine with you? Thanks, Jason
Jason Xing wrote: > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > tracepoint to print information (say, tstamp) so that we can > > > transparently equip applications with this feature and require no > > > modification in user side. > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > extension, which is mainly suggested by John Fastabend and Willem de > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > better solution to extend. My feeling is BPF is a good place to provide > > > a way to add timestamping by administrators, without having to rebuild > > > applications. > > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > > only needs to pass certain flags through bpf_setsocktop() to a separate > > > tsflags. For TX timestamps, they will be printed during generation > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > > called. > > > > > > After this series, we could step by step implement more advanced > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > > > In this series, I only support TCP protocol which is widely used in > > > SO_TIMESTAMPING feature. > > > > > > --- > > > V2 > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > > 1. Introduce tsflag requestors so that we are able to extend more in the > > > future. Besides, it enables TX flags for bpf extension feature separately > > > without breaking users. It is suggested by Vadim Fedorenko. > > > 2. introduce a static key to control the whole feature. (Willem) > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > > some TX/RX cases, not all the cases. > > > > > > Note: > > > The main concern we've discussion in V1 thread is how to deal with the > > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > > cases to happen at the same time, which indicates that even one > > > applications setting SO_TIMESTAMPING can still be traced through BPF > > > program. Please see patch [04/12]. > > > > This revision does not address the main concern. > > > > An administrator installed BPF program can affect results of a process > > using SO_TIMESTAMPING in ways that break it. > > Sorry, I didn't get it. How the following code snippet would break users? The state between user and bpf timestamping needs to be separate to avoid interference. Introducing a new sk_tsflags for bpf goes a long way. Though I prefer a separate sk_tsflags_bpf and not touching existing sk_tsflags over the array approach of patch 1. Also need to check pahole and maybe move sk_tsflags_bpf elsewhere in the struct. Other state is sk_tskey. The current approach can initialize the key in bpf before the user attempts it for the same socket. Admittedly unlikely. But hard to reach states creates hard to debug issues. This field cannot easily be duplicated, because the key is tracked in skb_shinfo. Where there is not sufficient room for two keys. The same goes for txflags. The current approach is to set those flags if either user or bpf requestss them, then on __skb_tstamp_tx detect if the user did not set them, and if so skip output to the user. Need to take a closer look, but seems to work. So getting closer.
On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > > tracepoint to print information (say, tstamp) so that we can > > > > transparently equip applications with this feature and require no > > > > modification in user side. > > > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > > extension, which is mainly suggested by John Fastabend and Willem de > > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > > better solution to extend. My feeling is BPF is a good place to provide > > > > a way to add timestamping by administrators, without having to rebuild > > > > applications. > > > > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > > > only needs to pass certain flags through bpf_setsocktop() to a separate > > > > tsflags. For TX timestamps, they will be printed during generation > > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > > > called. > > > > > > > > After this series, we could step by step implement more advanced > > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > > > > > In this series, I only support TCP protocol which is widely used in > > > > SO_TIMESTAMPING feature. > > > > > > > > --- > > > > V2 > > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > > > 1. Introduce tsflag requestors so that we are able to extend more in the > > > > future. Besides, it enables TX flags for bpf extension feature separately > > > > without breaking users. It is suggested by Vadim Fedorenko. > > > > 2. introduce a static key to control the whole feature. (Willem) > > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > > > some TX/RX cases, not all the cases. > > > > > > > > Note: > > > > The main concern we've discussion in V1 thread is how to deal with the > > > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > > > cases to happen at the same time, which indicates that even one > > > > applications setting SO_TIMESTAMPING can still be traced through BPF > > > > program. Please see patch [04/12]. > > > > > > This revision does not address the main concern. > > > > > > An administrator installed BPF program can affect results of a process > > > using SO_TIMESTAMPING in ways that break it. > > > > Sorry, I didn't get it. How the following code snippet would break users? > > The state between user and bpf timestamping needs to be separate to > avoid interference. Do you agree that we will use this method as following, only allow either of them to work? void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb, struct skb_shared_hwtstamps *hwtstamps, struct sock *sk, int tstype) { if (!sk) return; ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); if (ret) /* Apps does set the SO_TIMESTAMPING flag, return directly */ return; if (static_branch_unlikely(&bpf_tstamp_control)) bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); } which means if the apps using non-bpf method, we will not see the output even if we load bpf program. > > Introducing a new sk_tsflags for bpf goes a long way. Though I prefer > a separate sk_tsflags_bpf and not touching existing sk_tsflags over > the array approach of patch 1. Also need to check pahole and maybe > move sk_tsflags_bpf elsewhere in the struct. Yes, I will use this instead. > > Other state is sk_tskey. The current approach can initialize the key > in bpf before the user attempts it for the same socket. Admittedly > unlikely. But hard to reach states creates hard to debug issues. > > This field cannot easily be duplicated, because the key is tracked > in skb_shinfo. Where there is not sufficient room for two keys. > > The same goes for txflags. They are not that easy to handle in a proper way. That's the reason why I chose to use the same logic, so that there is no side effect. If we expect to separate them as well, it seems a little bit weird to introduce another similar flags in struct sk_buff. > > The current approach is to set those flags if either user or bpf > requestss them, then on __skb_tstamp_tx detect if the user did not set > them, and if so skip output to the user. Need to take a closer look, > but seems to work. Let me keep this current approach, it will not affect each other. > > So getting closer. Thanks for the careful review. Thanks, Jason
Jason Xing wrote: > On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Jason Xing wrote: > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > > > tracepoint to print information (say, tstamp) so that we can > > > > > transparently equip applications with this feature and require no > > > > > modification in user side. > > > > > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > > > extension, which is mainly suggested by John Fastabend and Willem de > > > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > > > better solution to extend. My feeling is BPF is a good place to provide > > > > > a way to add timestamping by administrators, without having to rebuild > > > > > applications. > > > > > > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > > > > only needs to pass certain flags through bpf_setsocktop() to a separate > > > > > tsflags. For TX timestamps, they will be printed during generation > > > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > > > > called. > > > > > > > > > > After this series, we could step by step implement more advanced > > > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > > > > > > > In this series, I only support TCP protocol which is widely used in > > > > > SO_TIMESTAMPING feature. > > > > > > > > > > --- > > > > > V2 > > > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > > > > 1. Introduce tsflag requestors so that we are able to extend more in the > > > > > future. Besides, it enables TX flags for bpf extension feature separately > > > > > without breaking users. It is suggested by Vadim Fedorenko. > > > > > 2. introduce a static key to control the whole feature. (Willem) > > > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > > > > some TX/RX cases, not all the cases. > > > > > > > > > > Note: > > > > > The main concern we've discussion in V1 thread is how to deal with the > > > > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > > > > cases to happen at the same time, which indicates that even one > > > > > applications setting SO_TIMESTAMPING can still be traced through BPF > > > > > program. Please see patch [04/12]. > > > > > > > > This revision does not address the main concern. > > > > > > > > An administrator installed BPF program can affect results of a process > > > > using SO_TIMESTAMPING in ways that break it. > > > > > > Sorry, I didn't get it. How the following code snippet would break users? > > > > The state between user and bpf timestamping needs to be separate to > > avoid interference. > > Do you agree that we will use this method as following, only allow > either of them to work? > > void __skb_tstamp_tx(struct sk_buff *orig_skb, > const struct sk_buff *ack_skb, > struct skb_shared_hwtstamps *hwtstamps, > struct sock *sk, int tstype) > { > if (!sk) > return; > > ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); > if (ret) > /* Apps does set the SO_TIMESTAMPING flag, return directly */ > return; > > if (static_branch_unlikely(&bpf_tstamp_control)) > bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); > } > > which means if the apps using non-bpf method, we will not see the > output even if we load bpf program. Could the bpf setsockopt fail hard in that case? Your current patch tries to make them work at the same time. It mostly does work. There are just a few concerning edge cases that may result in hard to understand bugs. Making only one method work per socket and fail hard if both try it is crude, but at least the failure will be clear: the setsockopt fails. I think that's safer. And in practice, the use cases for BPF timestamping probably are exactly when application timestamping is missing?
On Tue, Oct 15, 2024 at 10:59 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Jason Xing wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > > > > tracepoint to print information (say, tstamp) so that we can > > > > > > transparently equip applications with this feature and require no > > > > > > modification in user side. > > > > > > > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > > > > extension, which is mainly suggested by John Fastabend and Willem de > > > > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > > > > better solution to extend. My feeling is BPF is a good place to provide > > > > > > a way to add timestamping by administrators, without having to rebuild > > > > > > applications. > > > > > > > > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users > > > > > > only needs to pass certain flags through bpf_setsocktop() to a separate > > > > > > tsflags. For TX timestamps, they will be printed during generation > > > > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is > > > > > > called. > > > > > > > > > > > > After this series, we could step by step implement more advanced > > > > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > > > > > > > > > In this series, I only support TCP protocol which is widely used in > > > > > > SO_TIMESTAMPING feature. > > > > > > > > > > > > --- > > > > > > V2 > > > > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ > > > > > > 1. Introduce tsflag requestors so that we are able to extend more in the > > > > > > future. Besides, it enables TX flags for bpf extension feature separately > > > > > > without breaking users. It is suggested by Vadim Fedorenko. > > > > > > 2. introduce a static key to control the whole feature. (Willem) > > > > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in > > > > > > some TX/RX cases, not all the cases. > > > > > > > > > > > > Note: > > > > > > The main concern we've discussion in V1 thread is how to deal with the > > > > > > applications using SO_TIMESTAMPING feature? In this series, I allow both > > > > > > cases to happen at the same time, which indicates that even one > > > > > > applications setting SO_TIMESTAMPING can still be traced through BPF > > > > > > program. Please see patch [04/12]. > > > > > > > > > > This revision does not address the main concern. > > > > > > > > > > An administrator installed BPF program can affect results of a process > > > > > using SO_TIMESTAMPING in ways that break it. > > > > > > > > Sorry, I didn't get it. How the following code snippet would break users? > > > > > > The state between user and bpf timestamping needs to be separate to > > > avoid interference. > > > > Do you agree that we will use this method as following, only allow > > either of them to work? > > > > void __skb_tstamp_tx(struct sk_buff *orig_skb, > > const struct sk_buff *ack_skb, > > struct skb_shared_hwtstamps *hwtstamps, > > struct sock *sk, int tstype) > > { > > if (!sk) > > return; > > > > ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); > > if (ret) > > /* Apps does set the SO_TIMESTAMPING flag, return directly */ > > return; > > > > if (static_branch_unlikely(&bpf_tstamp_control)) > > bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps); > > } > > > > which means if the apps using non-bpf method, we will not see the > > output even if we load bpf program. > > Could the bpf setsockopt fail hard in that case? We can do this. I think I will add some if test statements to see if sk_tsflags is initialized before. > > Your current patch tries to make them work at the same time. It mostly > does work. There are just a few concerning edge cases that may result > in hard to understand bugs. Agree. > > Making only one method work per socket and fail hard if both try it is > crude, but at least the failure will be clear: the setsockopt fails. > > I think that's safer. And in practice, the use cases for BPF > timestamping probably are exactly when application timestamping is > missing? Fair enough. Let me try this way:) Thanks, Jason
From: Jason Xing <kernelxing@tencent.com> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using tracepoint to print information (say, tstamp) so that we can transparently equip applications with this feature and require no modification in user side. Later, we discussed at netconf and agreed that we can use bpf for better extension, which is mainly suggested by John Fastabend and Willem de Bruijn. Many thanks here! So I post this series to see if we have a better solution to extend. My feeling is BPF is a good place to provide a way to add timestamping by administrators, without having to rebuild applications. This approach mostly relies on existing SO_TIMESTAMPING feature, users only needs to pass certain flags through bpf_setsocktop() to a separate tsflags. For TX timestamps, they will be printed during generation phase. For RX timestamps, we will wait for the moment when recvmsg() is called. After this series, we could step by step implement more advanced functions/flags already in SO_TIMESTAMPING feature for bpf extension. In this series, I only support TCP protocol which is widely used in SO_TIMESTAMPING feature. --- V2 Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/ 1. Introduce tsflag requestors so that we are able to extend more in the future. Besides, it enables TX flags for bpf extension feature separately without breaking users. It is suggested by Vadim Fedorenko. 2. introduce a static key to control the whole feature. (Willem) 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in some TX/RX cases, not all the cases. Note: The main concern we've discussion in V1 thread is how to deal with the applications using SO_TIMESTAMPING feature? In this series, I allow both cases to happen at the same time, which indicates that even one applications setting SO_TIMESTAMPING can still be traced through BPF program. Please see patch [04/12]. Here is the test output: 1) receive path iperf3-987305 [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0 2) xmit path iperf3-19765 [013] ...11 2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584 iperf3-19765 [013] b..11 2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976 iperf3-19765 [013] ...11 2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739 Here is the full bpf program: #include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <bpf/bpf_endian.h> #include <uapi/linux/net_tstamp.h> int _version SEC("version") = 1; char _license[] SEC("license") = "GPL"; # define SO_TIMESTAMPING 37 __section("sockops") int set_initial_rto(struct bpf_sock_ops *skops) { int op = (int) skops->op; u32 sport = 0, dport = 0; int flags; switch (op) { //case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: case BPF_SOCK_OPS_TCP_CONNECT_CB: case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: flags = SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_TX_SCHED | SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP; bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)); bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG|BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG); break; case BPF_SOCK_OPS_TS_SCHED_OPT_CB: case BPF_SOCK_OPS_TS_SW_OPT_CB: case BPF_SOCK_OPS_TS_ACK_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2]); break; case BPF_SOCK_OPS_TS_RX_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u, hwtimestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2], skops->args[3]); break; } return 1; } Jason Xing (12): net-timestamp: introduce socket tsflag requestors net-timestamp: open gate for bpf_setsockopt net-timestamp: reorganize in skb_tstamp_tx_output() net-timestamp: add static key to control the whole bpf extension net-timestamp: add bpf infrastructure to allow exposing timestamp later net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp net-timestamp: add tx OPT_ID_TCP support for bpf case net-timestamp: make bpf for tx timestamp work net-timestamp: add bpf framework for rx timestamps net-timestamp: add bpf support for rx software/hardware timestamp include/linux/tcp.h | 2 +- include/net/ip.h | 2 +- include/net/sock.h | 19 ++++-- include/net/tcp.h | 16 +++++- include/uapi/linux/bpf.h | 28 ++++++++- net/can/j1939/socket.c | 2 +- net/core/filter.c | 39 +++++++++++++ net/core/skbuff.c | 102 ++++++++++++++++++++++++++++++--- net/core/sock.c | 68 +++++++++++++++------- net/ipv4/ip_output.c | 2 +- net/ipv4/ip_sockglue.c | 2 +- net/ipv4/tcp.c | 60 ++++++++++++++++++- net/ipv6/ip6_output.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/udp.c | 2 +- net/sctp/socket.c | 2 +- net/socket.c | 4 +- tools/include/uapi/linux/bpf.h | 28 ++++++++- 19 files changed, 333 insertions(+), 51 deletions(-)