mbox series

[net-next,v2,00/12] net-timestamp: bpf extension to equip applications transparently

Message ID 20241012040651.95616-1-kerneljasonxing@gmail.com (mailing list archive)
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Message

Jason Xing Oct. 12, 2024, 4:06 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
tracepoint to print information (say, tstamp) so that we can
transparently equip applications with this feature and require no
modification in user side.

Later, we discussed at netconf and agreed that we can use bpf for better
extension, which is mainly suggested by John Fastabend and Willem de
Bruijn. Many thanks here! So I post this series to see if we have a
better solution to extend. My feeling is BPF is a good place to provide
a way to add timestamping by administrators, without having to rebuild
applications. 

This approach mostly relies on existing SO_TIMESTAMPING feature, users
only needs to pass certain flags through bpf_setsocktop() to a separate
tsflags. For TX timestamps, they will be printed during generation
phase. For RX timestamps, we will wait for the moment when recvmsg() is
called.

After this series, we could step by step implement more advanced
functions/flags already in SO_TIMESTAMPING feature for bpf extension.

In this series, I only support TCP protocol which is widely used in
SO_TIMESTAMPING feature.

---
V2
Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
1. Introduce tsflag requestors so that we are able to extend more in the
future. Besides, it enables TX flags for bpf extension feature separately
without breaking users. It is suggested by Vadim Fedorenko.
2. introduce a static key to control the whole feature. (Willem)
3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
some TX/RX cases, not all the cases.

Note:
The main concern we've discussion in V1 thread is how to deal with the
applications using SO_TIMESTAMPING feature? In this series, I allow both
cases to happen at the same time, which indicates that even one
applications setting SO_TIMESTAMPING can still be traced through BPF
program. Please see patch [04/12].


Here is the test output:
1) receive path
iperf3-987305  [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0
2) xmit path
iperf3-19765   [013] ...11  2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584
iperf3-19765   [013] b..11  2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976
iperf3-19765   [013] ...11  2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739

Here is the full bpf program:
#include <linux/bpf.h>

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <uapi/linux/net_tstamp.h>

int _version SEC("version") = 1;
char _license[] SEC("license") = "GPL";

# define SO_TIMESTAMPING         37

__section("sockops")
int set_initial_rto(struct bpf_sock_ops *skops)
{
	int op = (int) skops->op;
	u32 sport = 0, dport = 0;
	int flags;

	switch (op) {
	//case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
	case BPF_SOCK_OPS_TCP_CONNECT_CB:
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		flags = SOF_TIMESTAMPING_RX_SOFTWARE |
			SOF_TIMESTAMPING_TX_SCHED | SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_TX_SOFTWARE |
			SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP;
		bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
		bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG|BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG);
		break;
	case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	case BPF_SOCK_OPS_TS_SW_OPT_CB:
	case BPF_SOCK_OPS_TS_ACK_OPT_CB:
		dport = bpf_ntohl(skops->remote_port);
		sport = skops->local_port;
		bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n",
			    sport, dport, skops->args[0], skops->args[1], skops->args[2]);
		break;
	case BPF_SOCK_OPS_TS_RX_OPT_CB:
		dport = bpf_ntohl(skops->remote_port);
		sport = skops->local_port;
		bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u, hwtimestamp: %u,%u\n",
			   sport, dport, skops->args[0], skops->args[1], skops->args[2], skops->args[3]);
		break;
	}
	return 1;
}


Jason Xing (12):
  net-timestamp: introduce socket tsflag requestors
  net-timestamp: open gate for bpf_setsockopt
  net-timestamp: reorganize in skb_tstamp_tx_output()
  net-timestamp: add static key to control the whole bpf extension
  net-timestamp: add bpf infrastructure to allow exposing timestamp
    later
  net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit
    timestamp
  net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp
  net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp
  net-timestamp: add tx OPT_ID_TCP support for bpf case
  net-timestamp: make bpf for tx timestamp work
  net-timestamp: add bpf framework for rx timestamps
  net-timestamp: add bpf support for rx software/hardware timestamp

 include/linux/tcp.h            |   2 +-
 include/net/ip.h               |   2 +-
 include/net/sock.h             |  19 ++++--
 include/net/tcp.h              |  16 +++++-
 include/uapi/linux/bpf.h       |  28 ++++++++-
 net/can/j1939/socket.c         |   2 +-
 net/core/filter.c              |  39 +++++++++++++
 net/core/skbuff.c              | 102 ++++++++++++++++++++++++++++++---
 net/core/sock.c                |  68 +++++++++++++++-------
 net/ipv4/ip_output.c           |   2 +-
 net/ipv4/ip_sockglue.c         |   2 +-
 net/ipv4/tcp.c                 |  60 ++++++++++++++++++-
 net/ipv6/ip6_output.c          |   2 +-
 net/ipv6/ping.c                |   2 +-
 net/ipv6/raw.c                 |   2 +-
 net/ipv6/udp.c                 |   2 +-
 net/sctp/socket.c              |   2 +-
 net/socket.c                   |   4 +-
 tools/include/uapi/linux/bpf.h |  28 ++++++++-
 19 files changed, 333 insertions(+), 51 deletions(-)

Comments

Willem de Bruijn Oct. 12, 2024, 5:48 p.m. UTC | #1
Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> tracepoint to print information (say, tstamp) so that we can
> transparently equip applications with this feature and require no
> modification in user side.
> 
> Later, we discussed at netconf and agreed that we can use bpf for better
> extension, which is mainly suggested by John Fastabend and Willem de
> Bruijn. Many thanks here! So I post this series to see if we have a
> better solution to extend. My feeling is BPF is a good place to provide
> a way to add timestamping by administrators, without having to rebuild
> applications. 
> 
> This approach mostly relies on existing SO_TIMESTAMPING feature, users
> only needs to pass certain flags through bpf_setsocktop() to a separate
> tsflags. For TX timestamps, they will be printed during generation
> phase. For RX timestamps, we will wait for the moment when recvmsg() is
> called.
> 
> After this series, we could step by step implement more advanced
> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> 
> In this series, I only support TCP protocol which is widely used in
> SO_TIMESTAMPING feature.
> 
> ---
> V2
> Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> 1. Introduce tsflag requestors so that we are able to extend more in the
> future. Besides, it enables TX flags for bpf extension feature separately
> without breaking users. It is suggested by Vadim Fedorenko.
> 2. introduce a static key to control the whole feature. (Willem)
> 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> some TX/RX cases, not all the cases.
> 
> Note:
> The main concern we've discussion in V1 thread is how to deal with the
> applications using SO_TIMESTAMPING feature? In this series, I allow both
> cases to happen at the same time, which indicates that even one
> applications setting SO_TIMESTAMPING can still be traced through BPF
> program. Please see patch [04/12].

This revision does not address the main concern.

An administrator installed BPF program can affect results of a process
using SO_TIMESTAMPING in ways that break it.

My halfway suggestion was to only enable this if the process has not
enabled timestamping on a socket, and to hard fail the application if
it does enable it while BPF timestamping is active. You pushed back,
entirely reasonably. But if anything we need a stronger method of
isolation, not just ignore the issue.
Jason Xing Oct. 13, 2024, 3:28 a.m. UTC | #2
On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > tracepoint to print information (say, tstamp) so that we can
> > transparently equip applications with this feature and require no
> > modification in user side.
> >
> > Later, we discussed at netconf and agreed that we can use bpf for better
> > extension, which is mainly suggested by John Fastabend and Willem de
> > Bruijn. Many thanks here! So I post this series to see if we have a
> > better solution to extend. My feeling is BPF is a good place to provide
> > a way to add timestamping by administrators, without having to rebuild
> > applications.
> >
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > only needs to pass certain flags through bpf_setsocktop() to a separate
> > tsflags. For TX timestamps, they will be printed during generation
> > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > called.
> >
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >
> > In this series, I only support TCP protocol which is widely used in
> > SO_TIMESTAMPING feature.
> >
> > ---
> > V2
> > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > 1. Introduce tsflag requestors so that we are able to extend more in the
> > future. Besides, it enables TX flags for bpf extension feature separately
> > without breaking users. It is suggested by Vadim Fedorenko.
> > 2. introduce a static key to control the whole feature. (Willem)
> > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > some TX/RX cases, not all the cases.
> >
> > Note:
> > The main concern we've discussion in V1 thread is how to deal with the
> > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > cases to happen at the same time, which indicates that even one
> > applications setting SO_TIMESTAMPING can still be traced through BPF
> > program. Please see patch [04/12].
>
> This revision does not address the main concern.
>
> An administrator installed BPF program can affect results of a process
> using SO_TIMESTAMPING in ways that break it.

Sorry, I didn't get it. How the following code snippet would break users?

void __skb_tstamp_tx(struct sk_buff *orig_skb,
                     const struct sk_buff *ack_skb,
                     struct skb_shared_hwtstamps *hwtstamps,
                     struct sock *sk, int tstype)
{
        if (!sk)
                return;

        if (static_branch_unlikely(&bpf_tstamp_control))
                bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);

        skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk,
tstype);
}

You can see, the application shipped with SO_TIMESTAMPING still prints
timestamps even when the application stays in the attached cgroup
directory.

Thanks,
Jason
Jason Xing Oct. 13, 2024, 3:43 a.m. UTC | #3
On Sun, Oct 13, 2024 at 11:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > tracepoint to print information (say, tstamp) so that we can
> > > transparently equip applications with this feature and require no
> > > modification in user side.
> > >
> > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > extension, which is mainly suggested by John Fastabend and Willem de
> > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > better solution to extend. My feeling is BPF is a good place to provide
> > > a way to add timestamping by administrators, without having to rebuild
> > > applications.
> > >
> > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > only needs to pass certain flags through bpf_setsocktop() to a separate
> > > tsflags. For TX timestamps, they will be printed during generation
> > > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > > called.
> > >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > >
> > > In this series, I only support TCP protocol which is widely used in
> > > SO_TIMESTAMPING feature.
> > >
> > > ---
> > > V2
> > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > > 1. Introduce tsflag requestors so that we are able to extend more in the
> > > future. Besides, it enables TX flags for bpf extension feature separately
> > > without breaking users. It is suggested by Vadim Fedorenko.
> > > 2. introduce a static key to control the whole feature. (Willem)
> > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > > some TX/RX cases, not all the cases.
> > >
> > > Note:
> > > The main concern we've discussion in V1 thread is how to deal with the
> > > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > > cases to happen at the same time, which indicates that even one
> > > applications setting SO_TIMESTAMPING can still be traced through BPF
> > > program. Please see patch [04/12].
> >
> > This revision does not address the main concern.
> >
> > An administrator installed BPF program can affect results of a process
> > using SO_TIMESTAMPING in ways that break it.
>
> Sorry, I didn't get it. How the following code snippet would break users?
>
> void __skb_tstamp_tx(struct sk_buff *orig_skb,
>                      const struct sk_buff *ack_skb,
>                      struct skb_shared_hwtstamps *hwtstamps,
>                      struct sock *sk, int tstype)
> {
>         if (!sk)
>                 return;
>
>         if (static_branch_unlikely(&bpf_tstamp_control))
>                 bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);
>
>         skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk,
> tstype);
> }
>
> You can see, the application shipped with SO_TIMESTAMPING still prints
> timestamps even when the application stays in the attached cgroup
> directory.

I tested this by running "./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 5"
in the bpf attached directory and it can correctly print the
timestamp. So it would not break users.

And surprisingly I found the key is not that right (ERROR: key 1000,
expected 999). I will investigate and fix it.

Thanks,
Jason
Jason Xing Oct. 13, 2024, 6:05 a.m. UTC | #4
> I tested this by running "./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 5"
> in the bpf attached directory and it can correctly print the
> timestamp. So it would not break users.
>
> And surprisingly I found the key is not that right (ERROR: key 1000,
> expected 999). I will investigate and fix it.

Ah, I think I know the reason. In this series, the BPF extension
allows setting before sending SYN packet in the beginning of
tcp_connect(), which is different from the original design that allows
setting after sending the SYN packet. It causes the unexpected key.
They are different. The reason why the failure is triggered is because
I reuse the tskey logic in the BPF extension...

====
Back to the question on how to solve the conflicts, if we finally
reckon that the original feature has the first priority, I can change
the order in the next version.

void __skb_tstamp_tx(struct sk_buff *orig_skb,
                     const struct sk_buff *ack_skb,
                     struct skb_shared_hwtstamps *hwtstamps,
                     struct sock *sk, int tstype)
{
        if (!sk)
                return;

       ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
       if (ret)
               /* Apps does set the SO_TIMESTAMPING flag, return directly */
               return;

       if (static_branch_unlikely(&bpf_tstamp_control))
                bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);
}

In this way, it will allow either of two features to work. Willem, do
you think it is fine with you?

Thanks,
Jason
Willem de Bruijn Oct. 15, 2024, 1:28 a.m. UTC | #5
Jason Xing wrote:
> On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > tracepoint to print information (say, tstamp) so that we can
> > > transparently equip applications with this feature and require no
> > > modification in user side.
> > >
> > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > extension, which is mainly suggested by John Fastabend and Willem de
> > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > better solution to extend. My feeling is BPF is a good place to provide
> > > a way to add timestamping by administrators, without having to rebuild
> > > applications.
> > >
> > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > only needs to pass certain flags through bpf_setsocktop() to a separate
> > > tsflags. For TX timestamps, they will be printed during generation
> > > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > > called.
> > >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > >
> > > In this series, I only support TCP protocol which is widely used in
> > > SO_TIMESTAMPING feature.
> > >
> > > ---
> > > V2
> > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > > 1. Introduce tsflag requestors so that we are able to extend more in the
> > > future. Besides, it enables TX flags for bpf extension feature separately
> > > without breaking users. It is suggested by Vadim Fedorenko.
> > > 2. introduce a static key to control the whole feature. (Willem)
> > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > > some TX/RX cases, not all the cases.
> > >
> > > Note:
> > > The main concern we've discussion in V1 thread is how to deal with the
> > > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > > cases to happen at the same time, which indicates that even one
> > > applications setting SO_TIMESTAMPING can still be traced through BPF
> > > program. Please see patch [04/12].
> >
> > This revision does not address the main concern.
> >
> > An administrator installed BPF program can affect results of a process
> > using SO_TIMESTAMPING in ways that break it.
> 
> Sorry, I didn't get it. How the following code snippet would break users?

The state between user and bpf timestamping needs to be separate to
avoid interference.

Introducing a new sk_tsflags for bpf goes a long way. Though I prefer
a separate sk_tsflags_bpf and not touching existing sk_tsflags over
the array approach of patch 1. Also need to check pahole and maybe
move sk_tsflags_bpf elsewhere in the struct.

Other state is sk_tskey. The current approach can initialize the key
in bpf before the user attempts it for the same socket. Admittedly
unlikely. But hard to reach states creates hard to debug issues.

This field cannot easily be duplicated, because the key is tracked
in skb_shinfo. Where there is not sufficient room for two keys.

The same goes for txflags.

The current approach is to set those flags if either user or bpf
requestss them, then on __skb_tstamp_tx detect if the user did not set
them, and if so skip output to the user. Need to take a closer look,
but seems to work.

So getting closer.
Jason Xing Oct. 15, 2024, 2:52 a.m. UTC | #6
On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > > tracepoint to print information (say, tstamp) so that we can
> > > > transparently equip applications with this feature and require no
> > > > modification in user side.
> > > >
> > > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > > extension, which is mainly suggested by John Fastabend and Willem de
> > > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > > better solution to extend. My feeling is BPF is a good place to provide
> > > > a way to add timestamping by administrators, without having to rebuild
> > > > applications.
> > > >
> > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > > only needs to pass certain flags through bpf_setsocktop() to a separate
> > > > tsflags. For TX timestamps, they will be printed during generation
> > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > > > called.
> > > >
> > > > After this series, we could step by step implement more advanced
> > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > > >
> > > > In this series, I only support TCP protocol which is widely used in
> > > > SO_TIMESTAMPING feature.
> > > >
> > > > ---
> > > > V2
> > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > > > 1. Introduce tsflag requestors so that we are able to extend more in the
> > > > future. Besides, it enables TX flags for bpf extension feature separately
> > > > without breaking users. It is suggested by Vadim Fedorenko.
> > > > 2. introduce a static key to control the whole feature. (Willem)
> > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > > > some TX/RX cases, not all the cases.
> > > >
> > > > Note:
> > > > The main concern we've discussion in V1 thread is how to deal with the
> > > > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > > > cases to happen at the same time, which indicates that even one
> > > > applications setting SO_TIMESTAMPING can still be traced through BPF
> > > > program. Please see patch [04/12].
> > >
> > > This revision does not address the main concern.
> > >
> > > An administrator installed BPF program can affect results of a process
> > > using SO_TIMESTAMPING in ways that break it.
> >
> > Sorry, I didn't get it. How the following code snippet would break users?
>
> The state between user and bpf timestamping needs to be separate to
> avoid interference.

Do you agree that we will use this method as following, only allow
either of them to work?

void __skb_tstamp_tx(struct sk_buff *orig_skb,
                     const struct sk_buff *ack_skb,
                     struct skb_shared_hwtstamps *hwtstamps,
                     struct sock *sk, int tstype)
{
        if (!sk)
                return;

       ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
       if (ret)
               /* Apps does set the SO_TIMESTAMPING flag, return directly */
               return;

       if (static_branch_unlikely(&bpf_tstamp_control))
                bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);
}

which means if the apps using non-bpf method, we will not see the
output even if we load bpf program.

>
> Introducing a new sk_tsflags for bpf goes a long way. Though I prefer
> a separate sk_tsflags_bpf and not touching existing sk_tsflags over
> the array approach of patch 1. Also need to check pahole and maybe
> move sk_tsflags_bpf elsewhere in the struct.

Yes, I will use this instead.

>
> Other state is sk_tskey. The current approach can initialize the key
> in bpf before the user attempts it for the same socket. Admittedly
> unlikely. But hard to reach states creates hard to debug issues.
>
> This field cannot easily be duplicated, because the key is tracked
> in skb_shinfo. Where there is not sufficient room for two keys.
>
> The same goes for txflags.

They are not that easy to handle in a proper way. That's the reason
why I chose to use the same logic, so that there is no side effect.

If we expect to separate them as well, it seems a little bit weird to
introduce another similar flags in struct sk_buff.

>
> The current approach is to set those flags if either user or bpf
> requestss them, then on __skb_tstamp_tx detect if the user did not set
> them, and if so skip output to the user. Need to take a closer look,
> but seems to work.

Let me keep this current approach, it will not affect each other.

>
> So getting closer.

Thanks for the careful review.

Thanks,
Jason
Willem de Bruijn Oct. 15, 2024, 2:59 a.m. UTC | #7
Jason Xing wrote:
> On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > > > tracepoint to print information (say, tstamp) so that we can
> > > > > transparently equip applications with this feature and require no
> > > > > modification in user side.
> > > > >
> > > > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > > > extension, which is mainly suggested by John Fastabend and Willem de
> > > > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > > > better solution to extend. My feeling is BPF is a good place to provide
> > > > > a way to add timestamping by administrators, without having to rebuild
> > > > > applications.
> > > > >
> > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > > > only needs to pass certain flags through bpf_setsocktop() to a separate
> > > > > tsflags. For TX timestamps, they will be printed during generation
> > > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > > > > called.
> > > > >
> > > > > After this series, we could step by step implement more advanced
> > > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > > > >
> > > > > In this series, I only support TCP protocol which is widely used in
> > > > > SO_TIMESTAMPING feature.
> > > > >
> > > > > ---
> > > > > V2
> > > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > > > > 1. Introduce tsflag requestors so that we are able to extend more in the
> > > > > future. Besides, it enables TX flags for bpf extension feature separately
> > > > > without breaking users. It is suggested by Vadim Fedorenko.
> > > > > 2. introduce a static key to control the whole feature. (Willem)
> > > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > > > > some TX/RX cases, not all the cases.
> > > > >
> > > > > Note:
> > > > > The main concern we've discussion in V1 thread is how to deal with the
> > > > > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > > > > cases to happen at the same time, which indicates that even one
> > > > > applications setting SO_TIMESTAMPING can still be traced through BPF
> > > > > program. Please see patch [04/12].
> > > >
> > > > This revision does not address the main concern.
> > > >
> > > > An administrator installed BPF program can affect results of a process
> > > > using SO_TIMESTAMPING in ways that break it.
> > >
> > > Sorry, I didn't get it. How the following code snippet would break users?
> >
> > The state between user and bpf timestamping needs to be separate to
> > avoid interference.
> 
> Do you agree that we will use this method as following, only allow
> either of them to work?
> 
> void __skb_tstamp_tx(struct sk_buff *orig_skb,
>                      const struct sk_buff *ack_skb,
>                      struct skb_shared_hwtstamps *hwtstamps,
>                      struct sock *sk, int tstype)
> {
>         if (!sk)
>                 return;
> 
>        ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
>        if (ret)
>                /* Apps does set the SO_TIMESTAMPING flag, return directly */
>                return;
> 
>        if (static_branch_unlikely(&bpf_tstamp_control))
>                 bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);
> }
> 
> which means if the apps using non-bpf method, we will not see the
> output even if we load bpf program.

Could the bpf setsockopt fail hard in that case?

Your current patch tries to make them work at the same time. It mostly
does work. There are just a few concerning edge cases that may result
in hard to understand bugs.

Making only one method work per socket and fail hard if both try it is
crude, but at least the failure will be clear: the setsockopt fails.

I think that's safer. And in practice, the use cases for BPF
timestamping probably are exactly when application timestamping is
missing?
Jason Xing Oct. 15, 2024, 3:02 a.m. UTC | #8
On Tue, Oct 15, 2024 at 10:59 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Tue, Oct 15, 2024 at 9:28 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Sun, Oct 13, 2024 at 1:48 AM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
> > > > > > tracepoint to print information (say, tstamp) so that we can
> > > > > > transparently equip applications with this feature and require no
> > > > > > modification in user side.
> > > > > >
> > > > > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > > > > extension, which is mainly suggested by John Fastabend and Willem de
> > > > > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > > > > better solution to extend. My feeling is BPF is a good place to provide
> > > > > > a way to add timestamping by administrators, without having to rebuild
> > > > > > applications.
> > > > > >
> > > > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > > > > only needs to pass certain flags through bpf_setsocktop() to a separate
> > > > > > tsflags. For TX timestamps, they will be printed during generation
> > > > > > phase. For RX timestamps, we will wait for the moment when recvmsg() is
> > > > > > called.
> > > > > >
> > > > > > After this series, we could step by step implement more advanced
> > > > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > > > > >
> > > > > > In this series, I only support TCP protocol which is widely used in
> > > > > > SO_TIMESTAMPING feature.
> > > > > >
> > > > > > ---
> > > > > > V2
> > > > > > Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
> > > > > > 1. Introduce tsflag requestors so that we are able to extend more in the
> > > > > > future. Besides, it enables TX flags for bpf extension feature separately
> > > > > > without breaking users. It is suggested by Vadim Fedorenko.
> > > > > > 2. introduce a static key to control the whole feature. (Willem)
> > > > > > 3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
> > > > > > some TX/RX cases, not all the cases.
> > > > > >
> > > > > > Note:
> > > > > > The main concern we've discussion in V1 thread is how to deal with the
> > > > > > applications using SO_TIMESTAMPING feature? In this series, I allow both
> > > > > > cases to happen at the same time, which indicates that even one
> > > > > > applications setting SO_TIMESTAMPING can still be traced through BPF
> > > > > > program. Please see patch [04/12].
> > > > >
> > > > > This revision does not address the main concern.
> > > > >
> > > > > An administrator installed BPF program can affect results of a process
> > > > > using SO_TIMESTAMPING in ways that break it.
> > > >
> > > > Sorry, I didn't get it. How the following code snippet would break users?
> > >
> > > The state between user and bpf timestamping needs to be separate to
> > > avoid interference.
> >
> > Do you agree that we will use this method as following, only allow
> > either of them to work?
> >
> > void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >                      const struct sk_buff *ack_skb,
> >                      struct skb_shared_hwtstamps *hwtstamps,
> >                      struct sock *sk, int tstype)
> > {
> >         if (!sk)
> >                 return;
> >
> >        ret = skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
> >        if (ret)
> >                /* Apps does set the SO_TIMESTAMPING flag, return directly */
> >                return;
> >
> >        if (static_branch_unlikely(&bpf_tstamp_control))
> >                 bpf_skb_tstamp_tx_output(sk, orig_skb, tstype, hwtstamps);
> > }
> >
> > which means if the apps using non-bpf method, we will not see the
> > output even if we load bpf program.
>
> Could the bpf setsockopt fail hard in that case?

We can do this. I think I will add some if test statements to see if
sk_tsflags is initialized before.

>
> Your current patch tries to make them work at the same time. It mostly
> does work. There are just a few concerning edge cases that may result
> in hard to understand bugs.

Agree.

>
> Making only one method work per socket and fail hard if both try it is
> crude, but at least the failure will be clear: the setsockopt fails.
>
> I think that's safer. And in practice, the use cases for BPF
> timestamping probably are exactly when application timestamping is
> missing?

Fair enough. Let me try this way:)

Thanks,
Jason