Message ID | 20241008095109.99918-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | net-timestamp: bpf extension to equip applications transparently | expand |
Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > tracepoint to print information (say, tstamp) so that we can > transparently equip applications with this feature and require no > modification in user side. > > Later, we discussed at netconf and agreed that we can use bpf for better > extension, which is mainly suggested by John Fastabend and Willem de > Bruijn. Many thanks here! So I post this series to see if we have a > better solution to extend. > > This approach relies on existing SO_TIMESTAMPING feature, for tx path, > users only needs to pass certain flags through bpf program to make sure > the last skb from each sendmsg() has timestamp related controlled flag. > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > and wait for the moment when recvmsg() is called. As you mention, overall I am very supportive of having a way to add timestamping by adminstrators, without having to rebuild applications. BPF hooks seem to be the right place for this. There is existing kprobe/kretprobe/kfunc support. Supporting SO_TIMESTAMPING directly may be useful due to its targeted feature set, and correlation between measurements for the same data in the stream. > After this series, we could step by step implement more advanced > functions/flags already in SO_TIMESTAMPING feature for bpf extension. My main implementation concern is where this API overlaps with the existing user API, and how they might conflict. A few questions in the patches. Also, I'm not the best person to give in-depth BPF feedback. > Here is the test output: > 1) receive path > iperf3-987305 [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0 > 2) xmit path > iperf3-19765 [013] ...11 2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584 > iperf3-19765 [013] b..11 2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976 > iperf3-19765 [013] ...11 2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739
On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > tracepoint to print information (say, tstamp) so that we can > > transparently equip applications with this feature and require no > > modification in user side. > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > extension, which is mainly suggested by John Fastabend and Willem de > > Bruijn. Many thanks here! So I post this series to see if we have a > > better solution to extend. > > > > This approach relies on existing SO_TIMESTAMPING feature, for tx path, > > users only needs to pass certain flags through bpf program to make sure > > the last skb from each sendmsg() has timestamp related controlled flag. > > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > > and wait for the moment when recvmsg() is called. > > As you mention, overall I am very supportive of having a way to add > timestamping by adminstrators, without having to rebuild applications. > BPF hooks seem to be the right place for this. > > There is existing kprobe/kretprobe/kfunc support. Supporting > SO_TIMESTAMPING directly may be useful due to its targeted feature > set, and correlation between measurements for the same data in the > stream. > > > After this series, we could step by step implement more advanced > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > My main implementation concern is where this API overlaps with the > existing user API, and how they might conflict. A few questions in the > patches. Agreed. That's also what I'm concerned about. So I decided to ask for related experts' help. How to deal with it without interfering with the existing apps in the right way is the key problem.
On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > > tracepoint to print information (say, tstamp) so that we can > > > transparently equip applications with this feature and require no > > > modification in user side. > > > > > > Later, we discussed at netconf and agreed that we can use bpf for better > > > extension, which is mainly suggested by John Fastabend and Willem de > > > Bruijn. Many thanks here! So I post this series to see if we have a > > > better solution to extend. > > > > > > This approach relies on existing SO_TIMESTAMPING feature, for tx path, > > > users only needs to pass certain flags through bpf program to make sure > > > the last skb from each sendmsg() has timestamp related controlled flag. > > > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > > > and wait for the moment when recvmsg() is called. > > > > As you mention, overall I am very supportive of having a way to add > > timestamping by adminstrators, without having to rebuild applications. > > BPF hooks seem to be the right place for this. > > > > There is existing kprobe/kretprobe/kfunc support. Supporting > > SO_TIMESTAMPING directly may be useful due to its targeted feature > > set, and correlation between measurements for the same data in the > > stream. > > > > > After this series, we could step by step implement more advanced > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > > > My main implementation concern is where this API overlaps with the > > existing user API, and how they might conflict. A few questions in the > > patches. > > Agreed. That's also what I'm concerned about. So I decided to ask for > related experts' help. > > How to deal with it without interfering with the existing apps in the > right way is the key problem. What I try to implement is let the bpf program have the highest precedence. It's similar to RTO min, see the commit as an example: commit f086edef71be7174a16c1ed67ac65a085cda28b1 Author: Kevin Yang <yyd@google.com> Date: Mon Jun 3 21:30:54 2024 +0000 tcp: add sysctl_tcp_rto_min_us Adding a sysctl knob to allow user to specify a default rto_min at socket init time, other than using the hard coded 200ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. It includes three cases, 1) route option, 2) bpf option, 3) sysctl. The first priority can override others. It doesn't have a good chance/point to restore the icsk_rto_min field if users want to shutdown the bpf program because it is set in bpf_sol_tcp_setsockopt().
On 09/10/2024 02:05, Jason Xing wrote: > On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >> >> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn >> <willemdebruijn.kernel@gmail.com> wrote: >>> >>> Jason Xing wrote: >>>> From: Jason Xing <kernelxing@tencent.com> >>>> >>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using >>>> tracepoint to print information (say, tstamp) so that we can >>>> transparently equip applications with this feature and require no >>>> modification in user side. >>>> >>>> Later, we discussed at netconf and agreed that we can use bpf for better >>>> extension, which is mainly suggested by John Fastabend and Willem de >>>> Bruijn. Many thanks here! So I post this series to see if we have a >>>> better solution to extend. >>>> >>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, >>>> users only needs to pass certain flags through bpf program to make sure >>>> the last skb from each sendmsg() has timestamp related controlled flag. >>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags >>>> and wait for the moment when recvmsg() is called. >>> >>> As you mention, overall I am very supportive of having a way to add >>> timestamping by adminstrators, without having to rebuild applications. >>> BPF hooks seem to be the right place for this. >>> >>> There is existing kprobe/kretprobe/kfunc support. Supporting >>> SO_TIMESTAMPING directly may be useful due to its targeted feature >>> set, and correlation between measurements for the same data in the >>> stream. >>> >>>> After this series, we could step by step implement more advanced >>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. >>> >>> My main implementation concern is where this API overlaps with the >>> existing user API, and how they might conflict. A few questions in the >>> patches. >> >> Agreed. That's also what I'm concerned about. So I decided to ask for >> related experts' help. >> >> How to deal with it without interfering with the existing apps in the >> right way is the key problem. > > What I try to implement is let the bpf program have the highest > precedence. It's similar to RTO min, see the commit as an example: > > commit f086edef71be7174a16c1ed67ac65a085cda28b1 > Author: Kevin Yang <yyd@google.com> > Date: Mon Jun 3 21:30:54 2024 +0000 > > tcp: add sysctl_tcp_rto_min_us > > Adding a sysctl knob to allow user to specify a default > rto_min at socket init time, other than using the hard > coded 200ms default rto_min. > > Note that the rto_min route option has the highest precedence > for configuring this setting, followed by the TCP_BPF_RTO_MIN > socket option, followed by the tcp_rto_min_us sysctl. > > It includes three cases, 1) route option, 2) bpf option, 3) sysctl. > The first priority can override others. It doesn't have a good > chance/point to restore the icsk_rto_min field if users want to > shutdown the bpf program because it is set in > bpf_sol_tcp_setsockopt(). rto_min example is slightly different. With tcp_rto_min the doesn't expect any data to come back to user space while for timestamping the app may be confused directly by providing more data, or by not providing expected data. I believe some hint about requestor of the data is needed here. It will also help to solve the problem of populating sk_err_queue mentioned by Martin.
On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > > On 09/10/2024 02:05, Jason Xing wrote: > > On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >> > >> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn > >> <willemdebruijn.kernel@gmail.com> wrote: > >>> > >>> Jason Xing wrote: > >>>> From: Jason Xing <kernelxing@tencent.com> > >>>> > >>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > >>>> tracepoint to print information (say, tstamp) so that we can > >>>> transparently equip applications with this feature and require no > >>>> modification in user side. > >>>> > >>>> Later, we discussed at netconf and agreed that we can use bpf for better > >>>> extension, which is mainly suggested by John Fastabend and Willem de > >>>> Bruijn. Many thanks here! So I post this series to see if we have a > >>>> better solution to extend. > >>>> > >>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, > >>>> users only needs to pass certain flags through bpf program to make sure > >>>> the last skb from each sendmsg() has timestamp related controlled flag. > >>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > >>>> and wait for the moment when recvmsg() is called. > >>> > >>> As you mention, overall I am very supportive of having a way to add > >>> timestamping by adminstrators, without having to rebuild applications. > >>> BPF hooks seem to be the right place for this. > >>> > >>> There is existing kprobe/kretprobe/kfunc support. Supporting > >>> SO_TIMESTAMPING directly may be useful due to its targeted feature > >>> set, and correlation between measurements for the same data in the > >>> stream. > >>> > >>>> After this series, we could step by step implement more advanced > >>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. > >>> > >>> My main implementation concern is where this API overlaps with the > >>> existing user API, and how they might conflict. A few questions in the > >>> patches. > >> > >> Agreed. That's also what I'm concerned about. So I decided to ask for > >> related experts' help. > >> > >> How to deal with it without interfering with the existing apps in the > >> right way is the key problem. > > > > What I try to implement is let the bpf program have the highest > > precedence. It's similar to RTO min, see the commit as an example: > > > > commit f086edef71be7174a16c1ed67ac65a085cda28b1 > > Author: Kevin Yang <yyd@google.com> > > Date: Mon Jun 3 21:30:54 2024 +0000 > > > > tcp: add sysctl_tcp_rto_min_us > > > > Adding a sysctl knob to allow user to specify a default > > rto_min at socket init time, other than using the hard > > coded 200ms default rto_min. > > > > Note that the rto_min route option has the highest precedence > > for configuring this setting, followed by the TCP_BPF_RTO_MIN > > socket option, followed by the tcp_rto_min_us sysctl. > > > > It includes three cases, 1) route option, 2) bpf option, 3) sysctl. > > The first priority can override others. It doesn't have a good > > chance/point to restore the icsk_rto_min field if users want to > > shutdown the bpf program because it is set in > > bpf_sol_tcp_setsockopt(). > > rto_min example is slightly different. With tcp_rto_min the doesn't > expect any data to come back to user space while for timestamping the > app may be confused directly by providing more data, or by not providing > expected data. I believe some hint about requestor of the data is needed > here. It will also help to solve the problem of populating sk_err_queue > mentioned by Martin. Sorry, I don't fully get it. In this patch series, this bpf extension feature will not rely on sk_err_queue any more to report tx timestamps to userspace. Bpf program can do that printing. Do you mean that it could be wrong if one skb carries the tsflags that are previously set due to the bpf program and then suddenly users detach the program? It indeed will put a new/cloned skb into the error queue. Interesting corner case. It seems I have to re-implement a totally independent tsflags for bpf extension feature. Do you have a better idea on this? Thanks, Jason
On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko > <vadim.fedorenko@linux.dev> wrote: > > > > On 09/10/2024 02:05, Jason Xing wrote: > > > On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > >> > > >> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn > > >> <willemdebruijn.kernel@gmail.com> wrote: > > >>> > > >>> Jason Xing wrote: > > >>>> From: Jason Xing <kernelxing@tencent.com> > > >>>> > > >>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > > >>>> tracepoint to print information (say, tstamp) so that we can > > >>>> transparently equip applications with this feature and require no > > >>>> modification in user side. > > >>>> > > >>>> Later, we discussed at netconf and agreed that we can use bpf for better > > >>>> extension, which is mainly suggested by John Fastabend and Willem de > > >>>> Bruijn. Many thanks here! So I post this series to see if we have a > > >>>> better solution to extend. > > >>>> > > >>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, > > >>>> users only needs to pass certain flags through bpf program to make sure > > >>>> the last skb from each sendmsg() has timestamp related controlled flag. > > >>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > > >>>> and wait for the moment when recvmsg() is called. > > >>> > > >>> As you mention, overall I am very supportive of having a way to add > > >>> timestamping by adminstrators, without having to rebuild applications. > > >>> BPF hooks seem to be the right place for this. > > >>> > > >>> There is existing kprobe/kretprobe/kfunc support. Supporting > > >>> SO_TIMESTAMPING directly may be useful due to its targeted feature > > >>> set, and correlation between measurements for the same data in the > > >>> stream. > > >>> > > >>>> After this series, we could step by step implement more advanced > > >>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. > > >>> > > >>> My main implementation concern is where this API overlaps with the > > >>> existing user API, and how they might conflict. A few questions in the > > >>> patches. > > >> > > >> Agreed. That's also what I'm concerned about. So I decided to ask for > > >> related experts' help. > > >> > > >> How to deal with it without interfering with the existing apps in the > > >> right way is the key problem. > > > > > > What I try to implement is let the bpf program have the highest > > > precedence. It's similar to RTO min, see the commit as an example: > > > > > > commit f086edef71be7174a16c1ed67ac65a085cda28b1 > > > Author: Kevin Yang <yyd@google.com> > > > Date: Mon Jun 3 21:30:54 2024 +0000 > > > > > > tcp: add sysctl_tcp_rto_min_us > > > > > > Adding a sysctl knob to allow user to specify a default > > > rto_min at socket init time, other than using the hard > > > coded 200ms default rto_min. > > > > > > Note that the rto_min route option has the highest precedence > > > for configuring this setting, followed by the TCP_BPF_RTO_MIN > > > socket option, followed by the tcp_rto_min_us sysctl. > > > > > > It includes three cases, 1) route option, 2) bpf option, 3) sysctl. > > > The first priority can override others. It doesn't have a good > > > chance/point to restore the icsk_rto_min field if users want to > > > shutdown the bpf program because it is set in > > > bpf_sol_tcp_setsockopt(). > > > > rto_min example is slightly different. With tcp_rto_min the doesn't > > expect any data to come back to user space while for timestamping the > > app may be confused directly by providing more data, or by not providing > > expected data. I believe some hint about requestor of the data is needed > > here. It will also help to solve the problem of populating sk_err_queue > > mentioned by Martin. > > Sorry, I don't fully get it. In this patch series, this bpf extension > feature will not rely on sk_err_queue any more to report tx timestamps > to userspace. Bpf program can do that printing. > > Do you mean that it could be wrong if one skb carries the tsflags that > are previously set due to the bpf program and then suddenly users > detach the program? It indeed will put a new/cloned skb into the error > queue. Interesting corner case. It seems I have to re-implement a > totally independent tsflags for bpf extension feature. Do you have a > better idea on this? I feel that if I could introduce bpf new flags like SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch series, then it will not populate skb in sk_err_queue even users remove the bpf program all of sudden. With this kind of specific bpf flags, we can also avoid conflicting with the apps using SO_TIEMSTAMPING feature. Let me give it a shot unless a better solution shows up.
On 09/10/2024 12:48, Jason Xing wrote: > On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: >> >> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko >> <vadim.fedorenko@linux.dev> wrote: >>> >>> On 09/10/2024 02:05, Jason Xing wrote: >>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>>> >>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn >>>>> <willemdebruijn.kernel@gmail.com> wrote: >>>>>> >>>>>> Jason Xing wrote: >>>>>>> From: Jason Xing <kernelxing@tencent.com> >>>>>>> >>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using >>>>>>> tracepoint to print information (say, tstamp) so that we can >>>>>>> transparently equip applications with this feature and require no >>>>>>> modification in user side. >>>>>>> >>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better >>>>>>> extension, which is mainly suggested by John Fastabend and Willem de >>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a >>>>>>> better solution to extend. >>>>>>> >>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, >>>>>>> users only needs to pass certain flags through bpf program to make sure >>>>>>> the last skb from each sendmsg() has timestamp related controlled flag. >>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags >>>>>>> and wait for the moment when recvmsg() is called. >>>>>> >>>>>> As you mention, overall I am very supportive of having a way to add >>>>>> timestamping by adminstrators, without having to rebuild applications. >>>>>> BPF hooks seem to be the right place for this. >>>>>> >>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting >>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature >>>>>> set, and correlation between measurements for the same data in the >>>>>> stream. >>>>>> >>>>>>> After this series, we could step by step implement more advanced >>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. >>>>>> >>>>>> My main implementation concern is where this API overlaps with the >>>>>> existing user API, and how they might conflict. A few questions in the >>>>>> patches. >>>>> >>>>> Agreed. That's also what I'm concerned about. So I decided to ask for >>>>> related experts' help. >>>>> >>>>> How to deal with it without interfering with the existing apps in the >>>>> right way is the key problem. >>>> >>>> What I try to implement is let the bpf program have the highest >>>> precedence. It's similar to RTO min, see the commit as an example: >>>> >>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1 >>>> Author: Kevin Yang <yyd@google.com> >>>> Date: Mon Jun 3 21:30:54 2024 +0000 >>>> >>>> tcp: add sysctl_tcp_rto_min_us >>>> >>>> Adding a sysctl knob to allow user to specify a default >>>> rto_min at socket init time, other than using the hard >>>> coded 200ms default rto_min. >>>> >>>> Note that the rto_min route option has the highest precedence >>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN >>>> socket option, followed by the tcp_rto_min_us sysctl. >>>> >>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl. >>>> The first priority can override others. It doesn't have a good >>>> chance/point to restore the icsk_rto_min field if users want to >>>> shutdown the bpf program because it is set in >>>> bpf_sol_tcp_setsockopt(). >>> >>> rto_min example is slightly different. With tcp_rto_min the doesn't >>> expect any data to come back to user space while for timestamping the >>> app may be confused directly by providing more data, or by not providing >>> expected data. I believe some hint about requestor of the data is needed >>> here. It will also help to solve the problem of populating sk_err_queue >>> mentioned by Martin. >> >> Sorry, I don't fully get it. In this patch series, this bpf extension >> feature will not rely on sk_err_queue any more to report tx timestamps >> to userspace. Bpf program can do that printing. >> >> Do you mean that it could be wrong if one skb carries the tsflags that >> are previously set due to the bpf program and then suddenly users >> detach the program? It indeed will put a new/cloned skb into the error >> queue. Interesting corner case. It seems I have to re-implement a >> totally independent tsflags for bpf extension feature. Do you have a >> better idea on this? > > I feel that if I could introduce bpf new flags like > SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch > series, then it will not populate skb in sk_err_queue even users > remove the bpf program all of sudden. With this kind of specific bpf > flags, we can also avoid conflicting with the apps using > SO_TIEMSTAMPING feature. Let me give it a shot unless a better > solution shows up. It doesn't look great to have duplicate flags just to indicate that this particular timestamp was asked by a bpf program, even though it looks like a straight forward solution. Sounds like we have to re-think the interface for timestamping requests, but I don't have proper suggestion right now.
On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > > On 09/10/2024 12:48, Jason Xing wrote: > > On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > >> > >> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko > >> <vadim.fedorenko@linux.dev> wrote: > >>> > >>> On 09/10/2024 02:05, Jason Xing wrote: > >>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >>>>> > >>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn > >>>>> <willemdebruijn.kernel@gmail.com> wrote: > >>>>>> > >>>>>> Jason Xing wrote: > >>>>>>> From: Jason Xing <kernelxing@tencent.com> > >>>>>>> > >>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > >>>>>>> tracepoint to print information (say, tstamp) so that we can > >>>>>>> transparently equip applications with this feature and require no > >>>>>>> modification in user side. > >>>>>>> > >>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better > >>>>>>> extension, which is mainly suggested by John Fastabend and Willem de > >>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a > >>>>>>> better solution to extend. > >>>>>>> > >>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, > >>>>>>> users only needs to pass certain flags through bpf program to make sure > >>>>>>> the last skb from each sendmsg() has timestamp related controlled flag. > >>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > >>>>>>> and wait for the moment when recvmsg() is called. > >>>>>> > >>>>>> As you mention, overall I am very supportive of having a way to add > >>>>>> timestamping by adminstrators, without having to rebuild applications. > >>>>>> BPF hooks seem to be the right place for this. > >>>>>> > >>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting > >>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature > >>>>>> set, and correlation between measurements for the same data in the > >>>>>> stream. > >>>>>> > >>>>>>> After this series, we could step by step implement more advanced > >>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. > >>>>>> > >>>>>> My main implementation concern is where this API overlaps with the > >>>>>> existing user API, and how they might conflict. A few questions in the > >>>>>> patches. > >>>>> > >>>>> Agreed. That's also what I'm concerned about. So I decided to ask for > >>>>> related experts' help. > >>>>> > >>>>> How to deal with it without interfering with the existing apps in the > >>>>> right way is the key problem. > >>>> > >>>> What I try to implement is let the bpf program have the highest > >>>> precedence. It's similar to RTO min, see the commit as an example: > >>>> > >>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1 > >>>> Author: Kevin Yang <yyd@google.com> > >>>> Date: Mon Jun 3 21:30:54 2024 +0000 > >>>> > >>>> tcp: add sysctl_tcp_rto_min_us > >>>> > >>>> Adding a sysctl knob to allow user to specify a default > >>>> rto_min at socket init time, other than using the hard > >>>> coded 200ms default rto_min. > >>>> > >>>> Note that the rto_min route option has the highest precedence > >>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN > >>>> socket option, followed by the tcp_rto_min_us sysctl. > >>>> > >>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl. > >>>> The first priority can override others. It doesn't have a good > >>>> chance/point to restore the icsk_rto_min field if users want to > >>>> shutdown the bpf program because it is set in > >>>> bpf_sol_tcp_setsockopt(). > >>> > >>> rto_min example is slightly different. With tcp_rto_min the doesn't > >>> expect any data to come back to user space while for timestamping the > >>> app may be confused directly by providing more data, or by not providing > >>> expected data. I believe some hint about requestor of the data is needed > >>> here. It will also help to solve the problem of populating sk_err_queue > >>> mentioned by Martin. > >> > >> Sorry, I don't fully get it. In this patch series, this bpf extension > >> feature will not rely on sk_err_queue any more to report tx timestamps > >> to userspace. Bpf program can do that printing. > >> > >> Do you mean that it could be wrong if one skb carries the tsflags that > >> are previously set due to the bpf program and then suddenly users > >> detach the program? It indeed will put a new/cloned skb into the error > >> queue. Interesting corner case. It seems I have to re-implement a > >> totally independent tsflags for bpf extension feature. Do you have a > >> better idea on this? > > > > I feel that if I could introduce bpf new flags like > > SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch > > series, then it will not populate skb in sk_err_queue even users > > remove the bpf program all of sudden. With this kind of specific bpf > > flags, we can also avoid conflicting with the apps using > > SO_TIEMSTAMPING feature. Let me give it a shot unless a better > > solution shows up. > > It doesn't look great to have duplicate flags just to indicate that this > particular timestamp was asked by a bpf program, even though it looks Or introduce a new field in struct sock or struct sk_buff so that existing SOF_TIMESTAMPING_* can be reused. > like a straight forward solution. Sounds like we have to re-think the > interface for timestamping requests, but I don't have proper suggestion > right now. Thanks for your help :) Thanks, Jason
On 09/10/2024 14:47, Jason Xing wrote: > On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko > <vadim.fedorenko@linux.dev> wrote: >> >> On 09/10/2024 12:48, Jason Xing wrote: >>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>> >>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko >>>> <vadim.fedorenko@linux.dev> wrote: >>>>> >>>>> On 09/10/2024 02:05, Jason Xing wrote: >>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>>>>> >>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn >>>>>>> <willemdebruijn.kernel@gmail.com> wrote: >>>>>>>> >>>>>>>> Jason Xing wrote: >>>>>>>>> From: Jason Xing <kernelxing@tencent.com> >>>>>>>>> >>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using >>>>>>>>> tracepoint to print information (say, tstamp) so that we can >>>>>>>>> transparently equip applications with this feature and require no >>>>>>>>> modification in user side. >>>>>>>>> >>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better >>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de >>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a >>>>>>>>> better solution to extend. >>>>>>>>> >>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, >>>>>>>>> users only needs to pass certain flags through bpf program to make sure >>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag. >>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags >>>>>>>>> and wait for the moment when recvmsg() is called. >>>>>>>> >>>>>>>> As you mention, overall I am very supportive of having a way to add >>>>>>>> timestamping by adminstrators, without having to rebuild applications. >>>>>>>> BPF hooks seem to be the right place for this. >>>>>>>> >>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting >>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature >>>>>>>> set, and correlation between measurements for the same data in the >>>>>>>> stream. >>>>>>>> >>>>>>>>> After this series, we could step by step implement more advanced >>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. >>>>>>>> >>>>>>>> My main implementation concern is where this API overlaps with the >>>>>>>> existing user API, and how they might conflict. A few questions in the >>>>>>>> patches. >>>>>>> >>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for >>>>>>> related experts' help. >>>>>>> >>>>>>> How to deal with it without interfering with the existing apps in the >>>>>>> right way is the key problem. >>>>>> >>>>>> What I try to implement is let the bpf program have the highest >>>>>> precedence. It's similar to RTO min, see the commit as an example: >>>>>> >>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1 >>>>>> Author: Kevin Yang <yyd@google.com> >>>>>> Date: Mon Jun 3 21:30:54 2024 +0000 >>>>>> >>>>>> tcp: add sysctl_tcp_rto_min_us >>>>>> >>>>>> Adding a sysctl knob to allow user to specify a default >>>>>> rto_min at socket init time, other than using the hard >>>>>> coded 200ms default rto_min. >>>>>> >>>>>> Note that the rto_min route option has the highest precedence >>>>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN >>>>>> socket option, followed by the tcp_rto_min_us sysctl. >>>>>> >>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl. >>>>>> The first priority can override others. It doesn't have a good >>>>>> chance/point to restore the icsk_rto_min field if users want to >>>>>> shutdown the bpf program because it is set in >>>>>> bpf_sol_tcp_setsockopt(). >>>>> >>>>> rto_min example is slightly different. With tcp_rto_min the doesn't >>>>> expect any data to come back to user space while for timestamping the >>>>> app may be confused directly by providing more data, or by not providing >>>>> expected data. I believe some hint about requestor of the data is needed >>>>> here. It will also help to solve the problem of populating sk_err_queue >>>>> mentioned by Martin. >>>> >>>> Sorry, I don't fully get it. In this patch series, this bpf extension >>>> feature will not rely on sk_err_queue any more to report tx timestamps >>>> to userspace. Bpf program can do that printing. >>>> >>>> Do you mean that it could be wrong if one skb carries the tsflags that >>>> are previously set due to the bpf program and then suddenly users >>>> detach the program? It indeed will put a new/cloned skb into the error >>>> queue. Interesting corner case. It seems I have to re-implement a >>>> totally independent tsflags for bpf extension feature. Do you have a >>>> better idea on this? >>> >>> I feel that if I could introduce bpf new flags like >>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch >>> series, then it will not populate skb in sk_err_queue even users >>> remove the bpf program all of sudden. With this kind of specific bpf >>> flags, we can also avoid conflicting with the apps using >>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better >>> solution shows up. >> >> It doesn't look great to have duplicate flags just to indicate that this >> particular timestamp was asked by a bpf program, even though it looks > > Or introduce a new field in struct sock or struct sk_buff so that > existing SOF_TIMESTAMPING_* can be reused. Well, I was thinking about this way. We can potentially add an array of tsflags meaning the index of the array is the requestor. That will be more flexible in terms of adding new requestor (like scheduler or congestion control algo) if needed. But it comes with increased memory usage on hot path which might be a blocker. >> like a straight forward solution. Sounds like we have to re-think the >> interface for timestamping requests, but I don't have proper suggestion >> right now. > > Thanks for your help :) > > Thanks, > Jason
On Wed, Oct 9, 2024 at 9:58 PM Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > > On 09/10/2024 14:47, Jason Xing wrote: > > On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko > > <vadim.fedorenko@linux.dev> wrote: > >> > >> On 09/10/2024 12:48, Jason Xing wrote: > >>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > >>>> > >>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko > >>>> <vadim.fedorenko@linux.dev> wrote: > >>>>> > >>>>> On 09/10/2024 02:05, Jason Xing wrote: > >>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >>>>>>> > >>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn > >>>>>>> <willemdebruijn.kernel@gmail.com> wrote: > >>>>>>>> > >>>>>>>> Jason Xing wrote: > >>>>>>>>> From: Jason Xing <kernelxing@tencent.com> > >>>>>>>>> > >>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using > >>>>>>>>> tracepoint to print information (say, tstamp) so that we can > >>>>>>>>> transparently equip applications with this feature and require no > >>>>>>>>> modification in user side. > >>>>>>>>> > >>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better > >>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de > >>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a > >>>>>>>>> better solution to extend. > >>>>>>>>> > >>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, > >>>>>>>>> users only needs to pass certain flags through bpf program to make sure > >>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag. > >>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags > >>>>>>>>> and wait for the moment when recvmsg() is called. > >>>>>>>> > >>>>>>>> As you mention, overall I am very supportive of having a way to add > >>>>>>>> timestamping by adminstrators, without having to rebuild applications. > >>>>>>>> BPF hooks seem to be the right place for this. > >>>>>>>> > >>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting > >>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature > >>>>>>>> set, and correlation between measurements for the same data in the > >>>>>>>> stream. > >>>>>>>> > >>>>>>>>> After this series, we could step by step implement more advanced > >>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. > >>>>>>>> > >>>>>>>> My main implementation concern is where this API overlaps with the > >>>>>>>> existing user API, and how they might conflict. A few questions in the > >>>>>>>> patches. > >>>>>>> > >>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for > >>>>>>> related experts' help. > >>>>>>> > >>>>>>> How to deal with it without interfering with the existing apps in the > >>>>>>> right way is the key problem. > >>>>>> > >>>>>> What I try to implement is let the bpf program have the highest > >>>>>> precedence. It's similar to RTO min, see the commit as an example: > >>>>>> > >>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1 > >>>>>> Author: Kevin Yang <yyd@google.com> > >>>>>> Date: Mon Jun 3 21:30:54 2024 +0000 > >>>>>> > >>>>>> tcp: add sysctl_tcp_rto_min_us > >>>>>> > >>>>>> Adding a sysctl knob to allow user to specify a default > >>>>>> rto_min at socket init time, other than using the hard > >>>>>> coded 200ms default rto_min. > >>>>>> > >>>>>> Note that the rto_min route option has the highest precedence > >>>>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN > >>>>>> socket option, followed by the tcp_rto_min_us sysctl. > >>>>>> > >>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl. > >>>>>> The first priority can override others. It doesn't have a good > >>>>>> chance/point to restore the icsk_rto_min field if users want to > >>>>>> shutdown the bpf program because it is set in > >>>>>> bpf_sol_tcp_setsockopt(). > >>>>> > >>>>> rto_min example is slightly different. With tcp_rto_min the doesn't > >>>>> expect any data to come back to user space while for timestamping the > >>>>> app may be confused directly by providing more data, or by not providing > >>>>> expected data. I believe some hint about requestor of the data is needed > >>>>> here. It will also help to solve the problem of populating sk_err_queue > >>>>> mentioned by Martin. > >>>> > >>>> Sorry, I don't fully get it. In this patch series, this bpf extension > >>>> feature will not rely on sk_err_queue any more to report tx timestamps > >>>> to userspace. Bpf program can do that printing. > >>>> > >>>> Do you mean that it could be wrong if one skb carries the tsflags that > >>>> are previously set due to the bpf program and then suddenly users > >>>> detach the program? It indeed will put a new/cloned skb into the error > >>>> queue. Interesting corner case. It seems I have to re-implement a > >>>> totally independent tsflags for bpf extension feature. Do you have a > >>>> better idea on this? > >>> > >>> I feel that if I could introduce bpf new flags like > >>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch > >>> series, then it will not populate skb in sk_err_queue even users > >>> remove the bpf program all of sudden. With this kind of specific bpf > >>> flags, we can also avoid conflicting with the apps using > >>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better > >>> solution shows up. > >> > >> It doesn't look great to have duplicate flags just to indicate that this > >> particular timestamp was asked by a bpf program, even though it looks > > > > Or introduce a new field in struct sock or struct sk_buff so that > > existing SOF_TIMESTAMPING_* can be reused. > > Well, I was thinking about this way. We can potentially add an array of > tsflags meaning the index of the array is the requestor. That will be > more flexible in terms of adding new requestor (like scheduler or > congestion control algo) if needed. But it comes with increased memory > usage on hot path which might be a blocker. Is the following code snippet what you expect? But I wonder why not just add a u32 field instead and then use each bit of it defined in include/uapi/linux/net_tstamp.h? diff --git a/include/net/sock.h b/include/net/sock.h index b32f1424ecc5..4677f53da75a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -445,6 +445,7 @@ struct sock { u32 sk_reserved_mem; int sk_forward_alloc; u32 sk_tsflags; + u32 new_tsflags[10]; __cacheline_group_end(sock_write_rxtx); __cacheline_group_begin(sock_write_tx); I could be missing something. Sorry. If possible, could you show me some code snippets? As for the new requestor, IIUC, do you want to add more tx timestamp generating points in the future? Thanks, Jason
On 09/10/2024 15:35, Jason Xing wrote: > On Wed, Oct 9, 2024 at 9:58 PM Vadim Fedorenko > <vadim.fedorenko@linux.dev> wrote: >> >> On 09/10/2024 14:47, Jason Xing wrote: >>> On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko >>> <vadim.fedorenko@linux.dev> wrote: >>>> >>>> On 09/10/2024 12:48, Jason Xing wrote: >>>>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>>>> >>>>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko >>>>>> <vadim.fedorenko@linux.dev> wrote: >>>>>>> >>>>>>> On 09/10/2024 02:05, Jason Xing wrote: >>>>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>>>>>>> >>>>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn >>>>>>>>> <willemdebruijn.kernel@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> Jason Xing wrote: >>>>>>>>>>> From: Jason Xing <kernelxing@tencent.com> >>>>>>>>>>> >>>>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using >>>>>>>>>>> tracepoint to print information (say, tstamp) so that we can >>>>>>>>>>> transparently equip applications with this feature and require no >>>>>>>>>>> modification in user side. >>>>>>>>>>> >>>>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better >>>>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de >>>>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a >>>>>>>>>>> better solution to extend. >>>>>>>>>>> >>>>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path, >>>>>>>>>>> users only needs to pass certain flags through bpf program to make sure >>>>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag. >>>>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags >>>>>>>>>>> and wait for the moment when recvmsg() is called. >>>>>>>>>> >>>>>>>>>> As you mention, overall I am very supportive of having a way to add >>>>>>>>>> timestamping by adminstrators, without having to rebuild applications. >>>>>>>>>> BPF hooks seem to be the right place for this. >>>>>>>>>> >>>>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting >>>>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature >>>>>>>>>> set, and correlation between measurements for the same data in the >>>>>>>>>> stream. >>>>>>>>>> >>>>>>>>>>> After this series, we could step by step implement more advanced >>>>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension. >>>>>>>>>> >>>>>>>>>> My main implementation concern is where this API overlaps with the >>>>>>>>>> existing user API, and how they might conflict. A few questions in the >>>>>>>>>> patches. >>>>>>>>> >>>>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for >>>>>>>>> related experts' help. >>>>>>>>> >>>>>>>>> How to deal with it without interfering with the existing apps in the >>>>>>>>> right way is the key problem. >>>>>>>> >>>>>>>> What I try to implement is let the bpf program have the highest >>>>>>>> precedence. It's similar to RTO min, see the commit as an example: >>>>>>>> >>>>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1 >>>>>>>> Author: Kevin Yang <yyd@google.com> >>>>>>>> Date: Mon Jun 3 21:30:54 2024 +0000 >>>>>>>> >>>>>>>> tcp: add sysctl_tcp_rto_min_us >>>>>>>> >>>>>>>> Adding a sysctl knob to allow user to specify a default >>>>>>>> rto_min at socket init time, other than using the hard >>>>>>>> coded 200ms default rto_min. >>>>>>>> >>>>>>>> Note that the rto_min route option has the highest precedence >>>>>>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN >>>>>>>> socket option, followed by the tcp_rto_min_us sysctl. >>>>>>>> >>>>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl. >>>>>>>> The first priority can override others. It doesn't have a good >>>>>>>> chance/point to restore the icsk_rto_min field if users want to >>>>>>>> shutdown the bpf program because it is set in >>>>>>>> bpf_sol_tcp_setsockopt(). >>>>>>> >>>>>>> rto_min example is slightly different. With tcp_rto_min the doesn't >>>>>>> expect any data to come back to user space while for timestamping the >>>>>>> app may be confused directly by providing more data, or by not providing >>>>>>> expected data. I believe some hint about requestor of the data is needed >>>>>>> here. It will also help to solve the problem of populating sk_err_queue >>>>>>> mentioned by Martin. >>>>>> >>>>>> Sorry, I don't fully get it. In this patch series, this bpf extension >>>>>> feature will not rely on sk_err_queue any more to report tx timestamps >>>>>> to userspace. Bpf program can do that printing. >>>>>> >>>>>> Do you mean that it could be wrong if one skb carries the tsflags that >>>>>> are previously set due to the bpf program and then suddenly users >>>>>> detach the program? It indeed will put a new/cloned skb into the error >>>>>> queue. Interesting corner case. It seems I have to re-implement a >>>>>> totally independent tsflags for bpf extension feature. Do you have a >>>>>> better idea on this? >>>>> >>>>> I feel that if I could introduce bpf new flags like >>>>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch >>>>> series, then it will not populate skb in sk_err_queue even users >>>>> remove the bpf program all of sudden. With this kind of specific bpf >>>>> flags, we can also avoid conflicting with the apps using >>>>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better >>>>> solution shows up. >>>> >>>> It doesn't look great to have duplicate flags just to indicate that this >>>> particular timestamp was asked by a bpf program, even though it looks >>> >>> Or introduce a new field in struct sock or struct sk_buff so that >>> existing SOF_TIMESTAMPING_* can be reused. >> >> Well, I was thinking about this way. We can potentially add an array of >> tsflags meaning the index of the array is the requestor. That will be >> more flexible in terms of adding new requestor (like scheduler or >> congestion control algo) if needed. But it comes with increased memory >> usage on hot path which might be a blocker. > > Is the following code snippet what you expect? But I wonder why not > just add a u32 field instead and then use each bit of it defined in > include/uapi/linux/net_tstamp.h? > > diff --git a/include/net/sock.h b/include/net/sock.h > index b32f1424ecc5..4677f53da75a 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -445,6 +445,7 @@ struct sock { > u32 sk_reserved_mem; > int sk_forward_alloc; > u32 sk_tsflags; > + u32 new_tsflags[10]; > __cacheline_group_end(sock_write_rxtx); > > __cacheline_group_begin(sock_write_tx); > > I could be missing something. Sorry. If possible, could you show me > some code snippets? > > As for the new requestor, IIUC, do you want to add more tx timestamp > generating points in the future? It's more like this: diff --git a/include/net/sock.h b/include/net/sock.h index c58ca8dd561b..93f931dcc4cc 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -234,6 +234,14 @@ struct sock_common { struct bpf_local_storage; struct sk_filter; +enum { + SOCKETOPT_TS_REQUESTOR = 0, + CMSG_TS_REQUESTOR, + BPFPROG_TS_REQUESTOR, + + __MAX_TS_REQUESTOR, +}; + /** * struct sock - network layer representation of sockets * @__sk_common: shared layout with inet_timewait_sock @@ -444,7 +452,7 @@ struct sock { socket_lock_t sk_lock; u32 sk_reserved_mem; int sk_forward_alloc; - u32 sk_tsflags; + u32 sk_tsflags[__MAX_TS_REQUESTOR]; __cacheline_group_end(sock_write_rxtx); __cacheline_group_begin(sock_write_tx); And use existing SOF_TIMESTAMPING_* for each element in the array. Not sure that struct sock is the best place though, as some timestamping requests may be on per-packet basis for protocols other than TCP. Again, I'm just thinking out loud, kinda wild idea.
> It's more like this: > > diff --git a/include/net/sock.h b/include/net/sock.h > index c58ca8dd561b..93f931dcc4cc 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -234,6 +234,14 @@ struct sock_common { > struct bpf_local_storage; > struct sk_filter; > > +enum { > + SOCKETOPT_TS_REQUESTOR = 0, > + CMSG_TS_REQUESTOR, > + BPFPROG_TS_REQUESTOR, > + > + __MAX_TS_REQUESTOR, > +}; > + > /** > * struct sock - network layer representation of sockets > * @__sk_common: shared layout with inet_timewait_sock > @@ -444,7 +452,7 @@ struct sock { > socket_lock_t sk_lock; > u32 sk_reserved_mem; > int sk_forward_alloc; > - u32 sk_tsflags; > + u32 sk_tsflags[__MAX_TS_REQUESTOR]; > __cacheline_group_end(sock_write_rxtx); > > __cacheline_group_begin(sock_write_tx); > > > And use existing SOF_TIMESTAMPING_* for each element in the array. Not > sure that struct sock is the best place though, as some timestamping > requests may be on per-packet basis for protocols other than TCP. > > Again, I'm just thinking out loud, kinda wild idea. Thanks. I see. Requestor or requester? I don't know. For now, __MAX_TS_REQUESTOR can be two, one is used for the old implementation, the other one is used for BPF extension. One irrelevant question is if we need CMSG_TS_REQUESTOR to split the old tsflags into two because the cmsg relies on sk->sk_tsflags which works well. The whole idea is very interesting and inspiring to me! It could be a good way to go. But as you said, the memory can be a blocker. And where exactly we should add in struct sock is another problem because the size of this array could be different if we add more requestors in the future. I think I can write in the next version based on this idea (sk_tsflags[MAX] array has two requestor members at the current stage) and then seek for other experts' opinions. +Willem, I'd like to know if you are for or against this idea? Thanks, Jason
From: Jason Xing <kernelxing@tencent.com> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using tracepoint to print information (say, tstamp) so that we can transparently equip applications with this feature and require no modification in user side. Later, we discussed at netconf and agreed that we can use bpf for better extension, which is mainly suggested by John Fastabend and Willem de Bruijn. Many thanks here! So I post this series to see if we have a better solution to extend. This approach relies on existing SO_TIMESTAMPING feature, for tx path, users only needs to pass certain flags through bpf program to make sure the last skb from each sendmsg() has timestamp related controlled flag. For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags and wait for the moment when recvmsg() is called. After this series, we could step by step implement more advanced functions/flags already in SO_TIMESTAMPING feature for bpf extension. Here is the test output: 1) receive path iperf3-987305 [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0 2) xmit path iperf3-19765 [013] ...11 2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584 iperf3-19765 [013] b..11 2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976 iperf3-19765 [013] ...11 2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739 Here is the full bpf program: #include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <bpf/bpf_endian.h> #include <uapi/linux/net_tstamp.h> int _version SEC("version") = 1; char _license[] SEC("license") = "GPL"; # define SO_TIMESTAMPING 37 __section("sockops") int set_initial_rto(struct bpf_sock_ops *skops) { int op = (int) skops->op; u32 sport = 0, dport = 0; int rcv_flags; switch (op) { case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: rcv_flags = SOF_TIMESTAMPING_RX_SOFTWARE; bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &rcv_flags, sizeof(rcv_flags)); bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG); break; case BPF_SOCK_OPS_TX_TS_OPT_CB: skops->reply = SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE| SOF_TIMESTAMPING_OPT_ID|SOF_TIMESTAMPING_OPT_ID_TCP; bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG); break; case BPF_SOCK_OPS_TS_SCHED_OPT_CB: case BPF_SOCK_OPS_TS_SW_OPT_CB: case BPF_SOCK_OPS_TS_ACK_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2]); break; case BPF_SOCK_OPS_TS_RX_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u, hwtimestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2], skops->args[3]); break; } return 1; } Jason Xing (9): net-timestamp: add bpf infrastructure to allow exposing more information later net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp net-timestamp: ready to turn on the button to generate tx timestamps net-timestamp: add tx OPT_ID_TCP support for bpf case net-timestamp: open gate for bpf_setsockopt net-timestamp: add bpf framework for rx timestamps net-timestamp: add bpf support for rx software/hardware timestamp include/linux/tcp.h | 2 +- include/net/tcp.h | 14 ++++++ include/uapi/linux/bpf.h | 36 ++++++++++++++- net/core/filter.c | 3 ++ net/core/skbuff.c | 51 +++++++++++++++++++++ net/ipv4/tcp.c | 81 ++++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 36 ++++++++++++++- 7 files changed, 217 insertions(+), 6 deletions(-)