Message ID | 20241012040651.95616-5-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net-timestamp: bpf extension to equip applications transparently | expand |
Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > Willem suggested that we use a static key to control. The advantage > is that we will not affect the existing applications at all if we > don't load BPF program. > > In this patch, except the static key, I also add one logic that is > used to test if the socket has enabled its tsflags in order to > support bpf logic to allow both cases to happen at the same time. These two features are unrelated, should probably be separate patches. > Or else, the skb carring related timestamp flag doesn't know which > way of printing is desirable. > > One thing important is this patch allows print from both applications > and bpf program at the same time. Now we have three kinds of print: > 1) only BPF program prints > 2) only application program prints > 3) both can print without side effect > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
On Tue, Oct 15, 2024 at 9:36 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > Willem suggested that we use a static key to control. The advantage > > is that we will not affect the existing applications at all if we > > don't load BPF program. > > > > In this patch, except the static key, I also add one logic that is > > used to test if the socket has enabled its tsflags in order to > > support bpf logic to allow both cases to happen at the same time. > > These two features are unrelated, should probably be separate patches. Will do it, thanks.
On 10/11/24 9:06 PM, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > Willem suggested that we use a static key to control. The advantage > is that we will not affect the existing applications at all if we > don't load BPF program. > > In this patch, except the static key, I also add one logic that is > used to test if the socket has enabled its tsflags in order to > support bpf logic to allow both cases to happen at the same time. > Or else, the skb carring related timestamp flag doesn't know which > way of printing is desirable. > > One thing important is this patch allows print from both applications > and bpf program at the same time. Now we have three kinds of print: > 1) only BPF program prints > 2) only application program prints > 3) both can print without side effect > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > include/net/sock.h | 1 + > net/core/filter.c | 3 +++ > net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 42 insertions(+) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 66ecd78f1dfe..b7c51b95c92d 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > void sock_def_readable(struct sock *sk); > > int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > int sock_get_timestamping(struct so_timestamping *timestamping, > sockptr_t optval, unsigned int optlen); > diff --git a/net/core/filter.c b/net/core/filter.c > index 996426095bd9..08135f538c99 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > .arg1_type = ARG_PTR_TO_CTX, > }; > > +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > + > static int bpf_sock_set_timestamping(struct sock *sk, > struct so_timestamping *timestamping) > { > @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > return -EINVAL; > > WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > + static_branch_enable(&bpf_tstamp_control); Not sure when is a good time to do static_branch_disable(). The bpf prog may be detached also. (IF) it ends up staying with the cgroup/sockops interface, it should depend on the existing static key in cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > > return 0; > } > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index f36eb9daa31a..d0f912f1ff7b 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > } > EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > > +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) sk is unused. > +{ > + u32 testflag; > + > + switch (tstype) { > + case SCM_TSTAMP_SCHED: Instead of doing this translation, is it easier to directly store the bpf prog desired ts"type" (i.e. the SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? or there is a specific need to keep the SOF_TIMESTAMPING_* value in sk->sk_tsflags_bpf? > + testflag = SOF_TIMESTAMPING_TX_SCHED; > + break; > + case SCM_TSTAMP_SND: > + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > + break; > + case SCM_TSTAMP_ACK: > + testflag = SOF_TIMESTAMPING_TX_ACK; > + break; > + default: > + return false; > + } > + if (tsflags & testflag) > + return true; > + > + return false; > +} > + > static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > const struct sk_buff *ack_skb, > struct skb_shared_hwtstamps *hwtstamps, > @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > if (!skb_may_tx_timestamp(sk, tsonly)) > return; > > + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual skb. Would it break? Is it the similar case on the skb tx_flags that Willem has mentioned in the patch 0's thread? > + return; > + > if (tsonly) { > #ifdef CONFIG_INET > if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) && > @@ -5593,6 +5619,15 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > __skb_complete_tx_timestamp(skb, sk, tstype, opt_stats); > } > > +static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype) > +{ > + u32 tsflags; > + > + tsflags = READ_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR]); > + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > + return; > +} > + > void __skb_tstamp_tx(struct sk_buff *orig_skb, > const struct sk_buff *ack_skb, > struct skb_shared_hwtstamps *hwtstamps, > @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, > if (!sk) > return; > > + if (static_branch_unlikely(&bpf_tstamp_control)) > + bpf_skb_tstamp_tx_output(sk, tstype); > + > skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); > } > EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/11/24 9:06 PM, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > Willem suggested that we use a static key to control. The advantage > > is that we will not affect the existing applications at all if we > > don't load BPF program. > > > > In this patch, except the static key, I also add one logic that is > > used to test if the socket has enabled its tsflags in order to > > support bpf logic to allow both cases to happen at the same time. > > Or else, the skb carring related timestamp flag doesn't know which > > way of printing is desirable. > > > > One thing important is this patch allows print from both applications > > and bpf program at the same time. Now we have three kinds of print: > > 1) only BPF program prints > > 2) only application program prints > > 3) both can print without side effect > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > include/net/sock.h | 1 + > > net/core/filter.c | 3 +++ > > net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 42 insertions(+) > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 66ecd78f1dfe..b7c51b95c92d 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > > void sock_def_readable(struct sock *sk); > > > > int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > > +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > > void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > > int sock_get_timestamping(struct so_timestamping *timestamping, > > sockptr_t optval, unsigned int optlen); > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 996426095bd9..08135f538c99 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > > .arg1_type = ARG_PTR_TO_CTX, > > }; > > > > +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > > + > > static int bpf_sock_set_timestamping(struct sock *sk, > > struct so_timestamping *timestamping) > > { > > @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > > return -EINVAL; > > > > WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > > + static_branch_enable(&bpf_tstamp_control); > > Not sure when is a good time to do static_branch_disable(). Thanks for the review. To be honest, I considered how to disable the static key. Like you said, I failed to find a good chance that I can accurately disable it. > > The bpf prog may be detached also. (IF) it ends up staying with the > cgroup/sockops interface, it should depend on the existing static key in > cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. Are you suggesting that we need to remove the current static key? In the previous thread, the reason why Willem came up with this idea is, I think, to avoid affect the non-bpf timestamping feature. > > > > > return 0; > > } > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index f36eb9daa31a..d0f912f1ff7b 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > > } > > EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > > > > +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) > > sk is unused. Thanks for the careful check. > > > +{ > > + u32 testflag; > > + > > + switch (tstype) { > > + case SCM_TSTAMP_SCHED: > > Instead of doing this translation, > is it easier to directly store the bpf prog desired ts"type" (i.e. the > SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? > or there is a specific need to keep the SOF_TIMESTAMPING_* value in > sk->sk_tsflags_bpf? We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, SOF_TIMESTAMPING_OPT_ID, that we need to support. > > > + testflag = SOF_TIMESTAMPING_TX_SCHED; > > + break; > > + case SCM_TSTAMP_SND: > > + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > > + break; > > + case SCM_TSTAMP_ACK: > > + testflag = SOF_TIMESTAMPING_TX_ACK; > > + break; > > + default: > > + return false; > > + } > > + if (tsflags & testflag) > > + return true; > > + > > + return false; > > +} > > + > > static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > const struct sk_buff *ack_skb, > > struct skb_shared_hwtstamps *hwtstamps, > > @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > if (!skb_may_tx_timestamp(sk, tsonly)) > > return; > > > > + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > > This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. This test will be used in bpf and non-bpf cases. Because of this, we can support BPF extension. In this function, if skb has tsflags but we don't know which approach the user expects, sk_tstamp_tx_flags() can help us. > > My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual > skb. Would it break? Oh, you're right. I didn't support cmsg mode... > Is it the similar case on the skb tx_flags that Willem has > mentioned in the patch 0's thread? Yes, but am I supposed to add new a bpf tx_flags in the struct sk_buff? Thanks, Jason
On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 10/11/24 9:06 PM, Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Willem suggested that we use a static key to control. The advantage > > > is that we will not affect the existing applications at all if we > > > don't load BPF program. > > > > > > In this patch, except the static key, I also add one logic that is > > > used to test if the socket has enabled its tsflags in order to > > > support bpf logic to allow both cases to happen at the same time. > > > Or else, the skb carring related timestamp flag doesn't know which > > > way of printing is desirable. > > > > > > One thing important is this patch allows print from both applications > > > and bpf program at the same time. Now we have three kinds of print: > > > 1) only BPF program prints > > > 2) only application program prints > > > 3) both can print without side effect > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > include/net/sock.h | 1 + > > > net/core/filter.c | 3 +++ > > > net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 42 insertions(+) > > > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > > index 66ecd78f1dfe..b7c51b95c92d 100644 > > > --- a/include/net/sock.h > > > +++ b/include/net/sock.h > > > @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > > > void sock_def_readable(struct sock *sk); > > > > > > int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > > > +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > > > void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > > > int sock_get_timestamping(struct so_timestamping *timestamping, > > > sockptr_t optval, unsigned int optlen); > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index 996426095bd9..08135f538c99 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > > > .arg1_type = ARG_PTR_TO_CTX, > > > }; > > > > > > +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > > > + > > > static int bpf_sock_set_timestamping(struct sock *sk, > > > struct so_timestamping *timestamping) > > > { > > > @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > > > return -EINVAL; > > > > > > WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > > > + static_branch_enable(&bpf_tstamp_control); > > > > Not sure when is a good time to do static_branch_disable(). > > Thanks for the review. > > To be honest, I considered how to disable the static key. Like you > said, I failed to find a good chance that I can accurately disable it. > > > > > The bpf prog may be detached also. (IF) it ends up staying with the > > cgroup/sockops interface, it should depend on the existing static key in > > cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > > Are you suggesting that we need to remove the current static key? In > the previous thread, the reason why Willem came up with this idea is, > I think, to avoid affect the non-bpf timestamping feature. > > > > > > > > > return 0; > > > } > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index f36eb9daa31a..d0f912f1ff7b 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > > > } > > > EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > > > > > > +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) > > > > sk is unused. > > Thanks for the careful check. > > > > > > +{ > > > + u32 testflag; > > > + > > > + switch (tstype) { > > > + case SCM_TSTAMP_SCHED: > > > > Instead of doing this translation, > > is it easier to directly store the bpf prog desired ts"type" (i.e. the > > SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? > > or there is a specific need to keep the SOF_TIMESTAMPING_* value in > > sk->sk_tsflags_bpf? > > We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, > SOF_TIMESTAMPING_OPT_ID, that we need to support. > > > > > > + testflag = SOF_TIMESTAMPING_TX_SCHED; > > > + break; > > > + case SCM_TSTAMP_SND: > > > + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > > > + break; > > > + case SCM_TSTAMP_ACK: > > > + testflag = SOF_TIMESTAMPING_TX_ACK; > > > + break; > > > + default: > > > + return false; > > > + } > > > + if (tsflags & testflag) > > > + return true; > > > + > > > + return false; > > > +} > > > + > > > static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > > const struct sk_buff *ack_skb, > > > struct skb_shared_hwtstamps *hwtstamps, > > > @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > > if (!skb_may_tx_timestamp(sk, tsonly)) > > > return; > > > > > > + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > > > > This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. > > This test will be used in bpf and non-bpf cases. Because of this, we > can support BPF extension. In this function, if skb has tsflags but we > don't know which approach the user expects, sk_tstamp_tx_flags() can > help us. > > > > > My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual > > skb. Would it break? > > Oh, you're right. I didn't support cmsg mode... I think I only need to test if it's in the bpf mode, or else let the original way print the timestamp, which can solve the issue.
On 10/15/24 6:32 PM, Jason Xing wrote: > On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >> >> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: >>> >>> On 10/11/24 9:06 PM, Jason Xing wrote: >>>> From: Jason Xing <kernelxing@tencent.com> >>>> >>>> Willem suggested that we use a static key to control. The advantage >>>> is that we will not affect the existing applications at all if we >>>> don't load BPF program. >>>> >>>> In this patch, except the static key, I also add one logic that is >>>> used to test if the socket has enabled its tsflags in order to >>>> support bpf logic to allow both cases to happen at the same time. >>>> Or else, the skb carring related timestamp flag doesn't know which >>>> way of printing is desirable. >>>> >>>> One thing important is this patch allows print from both applications >>>> and bpf program at the same time. Now we have three kinds of print: >>>> 1) only BPF program prints >>>> 2) only application program prints >>>> 3) both can print without side effect >>>> >>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> >>>> --- >>>> include/net/sock.h | 1 + >>>> net/core/filter.c | 3 +++ >>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ >>>> 3 files changed, 42 insertions(+) >>>> >>>> diff --git a/include/net/sock.h b/include/net/sock.h >>>> index 66ecd78f1dfe..b7c51b95c92d 100644 >>>> --- a/include/net/sock.h >>>> +++ b/include/net/sock.h >>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) >>>> void sock_def_readable(struct sock *sk); >>>> >>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); >>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); >>>> int sock_get_timestamping(struct so_timestamping *timestamping, >>>> sockptr_t optval, unsigned int optlen); >>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>> index 996426095bd9..08135f538c99 100644 >>>> --- a/net/core/filter.c >>>> +++ b/net/core/filter.c >>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { >>>> .arg1_type = ARG_PTR_TO_CTX, >>>> }; >>>> >>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>> + >>>> static int bpf_sock_set_timestamping(struct sock *sk, >>>> struct so_timestamping *timestamping) >>>> { >>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, >>>> return -EINVAL; >>>> >>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); >>>> + static_branch_enable(&bpf_tstamp_control); >>> >>> Not sure when is a good time to do static_branch_disable(). >> >> Thanks for the review. >> >> To be honest, I considered how to disable the static key. Like you >> said, I failed to find a good chance that I can accurately disable it. >> >>> >>> The bpf prog may be detached also. (IF) it ends up staying with the >>> cgroup/sockops interface, it should depend on the existing static key in >>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. >> >> Are you suggesting that we need to remove the current static key? In >> the previous thread, the reason why Willem came up with this idea is, >> I think, to avoid affect the non-bpf timestamping feature. >> >>> >>>> >>>> return 0; >>>> } >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>> index f36eb9daa31a..d0f912f1ff7b 100644 >>>> --- a/net/core/skbuff.c >>>> +++ b/net/core/skbuff.c >>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, >>>> } >>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); >>>> >>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) >>> >>> sk is unused. >> >> Thanks for the careful check. >> >>> >>>> +{ >>>> + u32 testflag; >>>> + >>>> + switch (tstype) { >>>> + case SCM_TSTAMP_SCHED: >>> >>> Instead of doing this translation, >>> is it easier to directly store the bpf prog desired ts"type" (i.e. the >>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? >>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in >>> sk->sk_tsflags_bpf? >> >> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, >> SOF_TIMESTAMPING_OPT_ID, that we need to support. >> >>> >>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; >>>> + break; >>>> + case SCM_TSTAMP_SND: >>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; >>>> + break; >>>> + case SCM_TSTAMP_ACK: >>>> + testflag = SOF_TIMESTAMPING_TX_ACK; >>>> + break; >>>> + default: >>>> + return false; >>>> + } >>>> + if (tsflags & testflag) >>>> + return true; >>>> + >>>> + return false; >>>> +} >>>> + >>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>> const struct sk_buff *ack_skb, >>>> struct skb_shared_hwtstamps *hwtstamps, >>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>> if (!skb_may_tx_timestamp(sk, tsonly)) >>>> return; >>>> >>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) >>> >>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. >> >> This test will be used in bpf and non-bpf cases. Because of this, we >> can support BPF extension. In this function, if skb has tsflags but we >> don't know which approach the user expects, sk_tstamp_tx_flags() can >> help us. >> >>> >>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual >>> skb. Would it break? >> >> Oh, you're right. I didn't support cmsg mode... > > I think I only need to test if it's in the bpf mode, or else let the > original way print the timestamp, which can solve the issue. From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);": int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) { /* ... */ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); /* ... */ } I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled the timestamping. I may have missed somewhere in the patch set that the skb's tx_flags is changed by sk->sk_tsflags_bpf alone? I think a skb tskey is still desired (?), so eventually we want some spaces in the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending skb->data_meta usage outside of xdp and tc. I think here we want to have it available at the tx side to store the tx_flags and tskey but probably want them at a specific place/offset at the data_meta. For now, is there thing we can explore to share in the skb_shared_info? Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey only at the "tx" side? There is already another union member. The hwtstamps should only be needed when the NIC is done sending?
On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/15/24 6:32 PM, Jason Xing wrote: > > On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >> > >> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >>> > >>> On 10/11/24 9:06 PM, Jason Xing wrote: > >>>> From: Jason Xing <kernelxing@tencent.com> > >>>> > >>>> Willem suggested that we use a static key to control. The advantage > >>>> is that we will not affect the existing applications at all if we > >>>> don't load BPF program. > >>>> > >>>> In this patch, except the static key, I also add one logic that is > >>>> used to test if the socket has enabled its tsflags in order to > >>>> support bpf logic to allow both cases to happen at the same time. > >>>> Or else, the skb carring related timestamp flag doesn't know which > >>>> way of printing is desirable. > >>>> > >>>> One thing important is this patch allows print from both applications > >>>> and bpf program at the same time. Now we have three kinds of print: > >>>> 1) only BPF program prints > >>>> 2) only application program prints > >>>> 3) both can print without side effect > >>>> > >>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> > >>>> --- > >>>> include/net/sock.h | 1 + > >>>> net/core/filter.c | 3 +++ > >>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > >>>> 3 files changed, 42 insertions(+) > >>>> > >>>> diff --git a/include/net/sock.h b/include/net/sock.h > >>>> index 66ecd78f1dfe..b7c51b95c92d 100644 > >>>> --- a/include/net/sock.h > >>>> +++ b/include/net/sock.h > >>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > >>>> void sock_def_readable(struct sock *sk); > >>>> > >>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > >>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > >>>> int sock_get_timestamping(struct so_timestamping *timestamping, > >>>> sockptr_t optval, unsigned int optlen); > >>>> diff --git a/net/core/filter.c b/net/core/filter.c > >>>> index 996426095bd9..08135f538c99 100644 > >>>> --- a/net/core/filter.c > >>>> +++ b/net/core/filter.c > >>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > >>>> .arg1_type = ARG_PTR_TO_CTX, > >>>> }; > >>>> > >>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>> + > >>>> static int bpf_sock_set_timestamping(struct sock *sk, > >>>> struct so_timestamping *timestamping) > >>>> { > >>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > >>>> return -EINVAL; > >>>> > >>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > >>>> + static_branch_enable(&bpf_tstamp_control); > >>> > >>> Not sure when is a good time to do static_branch_disable(). > >> > >> Thanks for the review. > >> > >> To be honest, I considered how to disable the static key. Like you > >> said, I failed to find a good chance that I can accurately disable it. > >> > >>> > >>> The bpf prog may be detached also. (IF) it ends up staying with the > >>> cgroup/sockops interface, it should depend on the existing static key in > >>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > >> > >> Are you suggesting that we need to remove the current static key? In > >> the previous thread, the reason why Willem came up with this idea is, > >> I think, to avoid affect the non-bpf timestamping feature. > >> > >>> > >>>> > >>>> return 0; > >>>> } > >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>> index f36eb9daa31a..d0f912f1ff7b 100644 > >>>> --- a/net/core/skbuff.c > >>>> +++ b/net/core/skbuff.c > >>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > >>>> } > >>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > >>>> > >>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) > >>> > >>> sk is unused. > >> > >> Thanks for the careful check. > >> > >>> > >>>> +{ > >>>> + u32 testflag; > >>>> + > >>>> + switch (tstype) { > >>>> + case SCM_TSTAMP_SCHED: > >>> > >>> Instead of doing this translation, > >>> is it easier to directly store the bpf prog desired ts"type" (i.e. the > >>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? > >>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in > >>> sk->sk_tsflags_bpf? > >> > >> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, > >> SOF_TIMESTAMPING_OPT_ID, that we need to support. > >> > >>> > >>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; > >>>> + break; > >>>> + case SCM_TSTAMP_SND: > >>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > >>>> + break; > >>>> + case SCM_TSTAMP_ACK: > >>>> + testflag = SOF_TIMESTAMPING_TX_ACK; > >>>> + break; > >>>> + default: > >>>> + return false; > >>>> + } > >>>> + if (tsflags & testflag) > >>>> + return true; > >>>> + > >>>> + return false; > >>>> +} > >>>> + > >>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>> const struct sk_buff *ack_skb, > >>>> struct skb_shared_hwtstamps *hwtstamps, > >>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>> if (!skb_may_tx_timestamp(sk, tsonly)) > >>>> return; > >>>> > >>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > >>> > >>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. > >> > >> This test will be used in bpf and non-bpf cases. Because of this, we > >> can support BPF extension. In this function, if skb has tsflags but we > >> don't know which approach the user expects, sk_tstamp_tx_flags() can > >> help us. > >> > >>> > >>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual > >>> skb. Would it break? > >> > >> Oh, you're right. I didn't support cmsg mode... > > > > I think I only need to test if it's in the bpf mode, or else let the > > original way print the timestamp, which can solve the issue. > > From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, > SCM_TSTAMP_SCHED);": > > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > { > /* ... */ > > if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) > __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); > > /* ... */ > } > > I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled > the timestamping. I may have missed somewhere in the patch set that the skb's > tx_flags is changed by sk->sk_tsflags_bpf alone? If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be helpful, which initializes every last skb, please see patch [10/12]. > > I think a skb tskey is still desired (?), so eventually we want some spaces in tskey function is optional I think. It depends whether users want to use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag. > the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending > skb->data_meta usage outside of xdp and tc. I think here we want to have it > available at the tx side to store the tx_flags and tskey but probably want them > at a specific place/offset at the data_meta. If we have the plan to store extra information in data_meta, I can give it a try:) > > For now, is there thing we can explore to share in the skb_shared_info? My initial thought is just to reuse these fields in skb. It can work without interfering one another. > Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey > only at the "tx" side? There is already another union member. tskey is always used in the tx path. hwtstamps can be used in both rx and tx cases (please see tcp_update_recv_tstamps() and skb_tstamp_tx()). > The hwtstamps should only be needed when the NIC is done sending? In this patch, yes, hwtstamps are the records in tx path. Thanks, Jason
On 10/15/24 6:04 PM, Jason Xing wrote: > To be honest, I considered how to disable the static key. Like you > said, I failed to find a good chance that I can accurately disable it. It at least needs to be disabled whenever that bpf prog got detached. > >> The bpf prog may be detached also. (IF) it ends up staying with the >> cgroup/sockops interface, it should depend on the existing static key in >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > Are you suggesting that we need to remove the current static key? In > the previous thread, the reason why Willem came up with this idea is, > I think, to avoid affect the non-bpf timestamping feature. Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am saying to use that existing key. afaict, the newly added bpf_tstamp_control key is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less granular but it has the needed accounting to disable whenever the bpf prog got detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS).
On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/15/24 6:04 PM, Jason Xing wrote: > > To be honest, I considered how to disable the static key. Like you > > said, I failed to find a good chance that I can accurately disable it. > > It at least needs to be disabled whenever that bpf prog got detached. > > > > >> The bpf prog may be detached also. (IF) it ends up staying with the > >> cgroup/sockops interface, it should depend on the existing static key in > >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > > > Are you suggesting that we need to remove the current static key? In > > the previous thread, the reason why Willem came up with this idea is, > > I think, to avoid affect the non-bpf timestamping feature. > > Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am > saying to use that existing key. afaict, the newly added bpf_tstamp_control key > is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less > granular but it has the needed accounting to disable whenever the bpf prog got > detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS). Good suggestion. Good thing is that I don't need to figure out a proper place to disable it any more. I can directly use cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be printed with BPF program loaded. BTW, I found that we don't implement how to disable the ip4_min_ttl static key. Sometimes, I'm confused whether we have to disable it at a certain time. Thanks, Jason
On 10/15/24 11:30 PM, Jason Xing wrote: > On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 10/15/24 6:32 PM, Jason Xing wrote: >>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>> >>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: >>>>> >>>>> On 10/11/24 9:06 PM, Jason Xing wrote: >>>>>> From: Jason Xing <kernelxing@tencent.com> >>>>>> >>>>>> Willem suggested that we use a static key to control. The advantage >>>>>> is that we will not affect the existing applications at all if we >>>>>> don't load BPF program. >>>>>> >>>>>> In this patch, except the static key, I also add one logic that is >>>>>> used to test if the socket has enabled its tsflags in order to >>>>>> support bpf logic to allow both cases to happen at the same time. >>>>>> Or else, the skb carring related timestamp flag doesn't know which >>>>>> way of printing is desirable. >>>>>> >>>>>> One thing important is this patch allows print from both applications >>>>>> and bpf program at the same time. Now we have three kinds of print: >>>>>> 1) only BPF program prints >>>>>> 2) only application program prints >>>>>> 3) both can print without side effect >>>>>> >>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> >>>>>> --- >>>>>> include/net/sock.h | 1 + >>>>>> net/core/filter.c | 3 +++ >>>>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ >>>>>> 3 files changed, 42 insertions(+) >>>>>> >>>>>> diff --git a/include/net/sock.h b/include/net/sock.h >>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644 >>>>>> --- a/include/net/sock.h >>>>>> +++ b/include/net/sock.h >>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) >>>>>> void sock_def_readable(struct sock *sk); >>>>>> >>>>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); >>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); >>>>>> int sock_get_timestamping(struct so_timestamping *timestamping, >>>>>> sockptr_t optval, unsigned int optlen); >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>> index 996426095bd9..08135f538c99 100644 >>>>>> --- a/net/core/filter.c >>>>>> +++ b/net/core/filter.c >>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { >>>>>> .arg1_type = ARG_PTR_TO_CTX, >>>>>> }; >>>>>> >>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>>>> + >>>>>> static int bpf_sock_set_timestamping(struct sock *sk, >>>>>> struct so_timestamping *timestamping) >>>>>> { >>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, >>>>>> return -EINVAL; >>>>>> >>>>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); >>>>>> + static_branch_enable(&bpf_tstamp_control); >>>>> >>>>> Not sure when is a good time to do static_branch_disable(). >>>> >>>> Thanks for the review. >>>> >>>> To be honest, I considered how to disable the static key. Like you >>>> said, I failed to find a good chance that I can accurately disable it. >>>> >>>>> >>>>> The bpf prog may be detached also. (IF) it ends up staying with the >>>>> cgroup/sockops interface, it should depend on the existing static key in >>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. >>>> >>>> Are you suggesting that we need to remove the current static key? In >>>> the previous thread, the reason why Willem came up with this idea is, >>>> I think, to avoid affect the non-bpf timestamping feature. >>>> >>>>> >>>>>> >>>>>> return 0; >>>>>> } >>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>>> index f36eb9daa31a..d0f912f1ff7b 100644 >>>>>> --- a/net/core/skbuff.c >>>>>> +++ b/net/core/skbuff.c >>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, >>>>>> } >>>>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); >>>>>> >>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) >>>>> >>>>> sk is unused. >>>> >>>> Thanks for the careful check. >>>> >>>>> >>>>>> +{ >>>>>> + u32 testflag; >>>>>> + >>>>>> + switch (tstype) { >>>>>> + case SCM_TSTAMP_SCHED: >>>>> >>>>> Instead of doing this translation, >>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the >>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? >>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in >>>>> sk->sk_tsflags_bpf? >>>> >>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, >>>> SOF_TIMESTAMPING_OPT_ID, that we need to support. >>>> >>>>> >>>>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; >>>>>> + break; >>>>>> + case SCM_TSTAMP_SND: >>>>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; >>>>>> + break; >>>>>> + case SCM_TSTAMP_ACK: >>>>>> + testflag = SOF_TIMESTAMPING_TX_ACK; >>>>>> + break; >>>>>> + default: >>>>>> + return false; >>>>>> + } >>>>>> + if (tsflags & testflag) >>>>>> + return true; >>>>>> + >>>>>> + return false; >>>>>> +} >>>>>> + >>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>>>> const struct sk_buff *ack_skb, >>>>>> struct skb_shared_hwtstamps *hwtstamps, >>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>>>> if (!skb_may_tx_timestamp(sk, tsonly)) >>>>>> return; >>>>>> >>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) >>>>> >>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. >>>> >>>> This test will be used in bpf and non-bpf cases. Because of this, we >>>> can support BPF extension. In this function, if skb has tsflags but we >>>> don't know which approach the user expects, sk_tstamp_tx_flags() can >>>> help us. >>>> >>>>> >>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual >>>>> skb. Would it break? >>>> >>>> Oh, you're right. I didn't support cmsg mode... >>> >>> I think I only need to test if it's in the bpf mode, or else let the >>> original way print the timestamp, which can solve the issue. >> >> From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, >> SCM_TSTAMP_SCHED);": >> >> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) >> { >> /* ... */ >> >> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) >> __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); >> >> /* ... */ >> } >> >> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled >> the timestamping. I may have missed somewhere in the patch set that the skb's >> tx_flags is changed by sk->sk_tsflags_bpf alone? > > If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be > helpful, which initializes every last skb, please see patch [10/12]. Ah. ok. It is the thing I missed. Thanks for the pointer. >> >> I think a skb tskey is still desired (?), so eventually we want some spaces in > > tskey function is optional I think. It depends whether users want to > use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag. > >> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending >> skb->data_meta usage outside of xdp and tc. I think here we want to have it >> available at the tx side to store the tx_flags and tskey but probably want them >> at a specific place/offset at the data_meta. > > If we have the plan to store extra information in data_meta, I can > give it a try:) > >> >> For now, is there thing we can explore to share in the skb_shared_info? > > My initial thought is just to reuse these fields in skb. It can work > without interfering one another. After reading closer to patch 10, I am likely still missing something. How can it tell if the tx_flags is set by the bpf or by the user space cmsg? > >> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey >> only at the "tx" side? There is already another union member. > > tskey is always used in the tx path. > > hwtstamps can be used in both rx and tx cases (please see > tcp_update_recv_tstamps() and skb_tstamp_tx()). hmm... we only need some where to store the bpf tx_flags and bpf tskey in the TX-ing skb. You meant the hwtstamps of a Tx-ing skb is not empty? At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet? > >> The hwtstamps should only be needed when the NIC is done sending? > > In this patch, yes, hwtstamps are the records in tx path. > > Thanks, > Jason
On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/15/24 11:30 PM, Jason Xing wrote: > > On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 10/15/24 6:32 PM, Jason Xing wrote: > >>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >>>> > >>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >>>>> > >>>>> On 10/11/24 9:06 PM, Jason Xing wrote: > >>>>>> From: Jason Xing <kernelxing@tencent.com> > >>>>>> > >>>>>> Willem suggested that we use a static key to control. The advantage > >>>>>> is that we will not affect the existing applications at all if we > >>>>>> don't load BPF program. > >>>>>> > >>>>>> In this patch, except the static key, I also add one logic that is > >>>>>> used to test if the socket has enabled its tsflags in order to > >>>>>> support bpf logic to allow both cases to happen at the same time. > >>>>>> Or else, the skb carring related timestamp flag doesn't know which > >>>>>> way of printing is desirable. > >>>>>> > >>>>>> One thing important is this patch allows print from both applications > >>>>>> and bpf program at the same time. Now we have three kinds of print: > >>>>>> 1) only BPF program prints > >>>>>> 2) only application program prints > >>>>>> 3) both can print without side effect > >>>>>> > >>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> > >>>>>> --- > >>>>>> include/net/sock.h | 1 + > >>>>>> net/core/filter.c | 3 +++ > >>>>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > >>>>>> 3 files changed, 42 insertions(+) > >>>>>> > >>>>>> diff --git a/include/net/sock.h b/include/net/sock.h > >>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644 > >>>>>> --- a/include/net/sock.h > >>>>>> +++ b/include/net/sock.h > >>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > >>>>>> void sock_def_readable(struct sock *sk); > >>>>>> > >>>>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > >>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > >>>>>> int sock_get_timestamping(struct so_timestamping *timestamping, > >>>>>> sockptr_t optval, unsigned int optlen); > >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c > >>>>>> index 996426095bd9..08135f538c99 100644 > >>>>>> --- a/net/core/filter.c > >>>>>> +++ b/net/core/filter.c > >>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > >>>>>> .arg1_type = ARG_PTR_TO_CTX, > >>>>>> }; > >>>>>> > >>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>>>> + > >>>>>> static int bpf_sock_set_timestamping(struct sock *sk, > >>>>>> struct so_timestamping *timestamping) > >>>>>> { > >>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > >>>>>> return -EINVAL; > >>>>>> > >>>>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > >>>>>> + static_branch_enable(&bpf_tstamp_control); > >>>>> > >>>>> Not sure when is a good time to do static_branch_disable(). > >>>> > >>>> Thanks for the review. > >>>> > >>>> To be honest, I considered how to disable the static key. Like you > >>>> said, I failed to find a good chance that I can accurately disable it. > >>>> > >>>>> > >>>>> The bpf prog may be detached also. (IF) it ends up staying with the > >>>>> cgroup/sockops interface, it should depend on the existing static key in > >>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > >>>> > >>>> Are you suggesting that we need to remove the current static key? In > >>>> the previous thread, the reason why Willem came up with this idea is, > >>>> I think, to avoid affect the non-bpf timestamping feature. > >>>> > >>>>> > >>>>>> > >>>>>> return 0; > >>>>>> } > >>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>>>> index f36eb9daa31a..d0f912f1ff7b 100644 > >>>>>> --- a/net/core/skbuff.c > >>>>>> +++ b/net/core/skbuff.c > >>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > >>>>>> } > >>>>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > >>>>>> > >>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) > >>>>> > >>>>> sk is unused. > >>>> > >>>> Thanks for the careful check. > >>>> > >>>>> > >>>>>> +{ > >>>>>> + u32 testflag; > >>>>>> + > >>>>>> + switch (tstype) { > >>>>>> + case SCM_TSTAMP_SCHED: > >>>>> > >>>>> Instead of doing this translation, > >>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the > >>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? > >>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in > >>>>> sk->sk_tsflags_bpf? > >>>> > >>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, > >>>> SOF_TIMESTAMPING_OPT_ID, that we need to support. > >>>> > >>>>> > >>>>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; > >>>>>> + break; > >>>>>> + case SCM_TSTAMP_SND: > >>>>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > >>>>>> + break; > >>>>>> + case SCM_TSTAMP_ACK: > >>>>>> + testflag = SOF_TIMESTAMPING_TX_ACK; > >>>>>> + break; > >>>>>> + default: > >>>>>> + return false; > >>>>>> + } > >>>>>> + if (tsflags & testflag) > >>>>>> + return true; > >>>>>> + > >>>>>> + return false; > >>>>>> +} > >>>>>> + > >>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>>>> const struct sk_buff *ack_skb, > >>>>>> struct skb_shared_hwtstamps *hwtstamps, > >>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>>>> if (!skb_may_tx_timestamp(sk, tsonly)) > >>>>>> return; > >>>>>> > >>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > >>>>> > >>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. > >>>> > >>>> This test will be used in bpf and non-bpf cases. Because of this, we > >>>> can support BPF extension. In this function, if skb has tsflags but we > >>>> don't know which approach the user expects, sk_tstamp_tx_flags() can > >>>> help us. > >>>> > >>>>> > >>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual > >>>>> skb. Would it break? > >>>> > >>>> Oh, you're right. I didn't support cmsg mode... > >>> > >>> I think I only need to test if it's in the bpf mode, or else let the > >>> original way print the timestamp, which can solve the issue. > >> > >> From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, > >> SCM_TSTAMP_SCHED);": > >> > >> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > >> { > >> /* ... */ > >> > >> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) > >> __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); > >> > >> /* ... */ > >> } > >> > >> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled > >> the timestamping. I may have missed somewhere in the patch set that the skb's > >> tx_flags is changed by sk->sk_tsflags_bpf alone? > > > > If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be > > helpful, which initializes every last skb, please see patch [10/12]. > > Ah. ok. It is the thing I missed. Thanks for the pointer. > > >> > >> I think a skb tskey is still desired (?), so eventually we want some spaces in > > > > tskey function is optional I think. It depends whether users want to > > use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag. > > > >> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending > >> skb->data_meta usage outside of xdp and tc. I think here we want to have it > >> available at the tx side to store the tx_flags and tskey but probably want them > >> at a specific place/offset at the data_meta. > > > > If we have the plan to store extra information in data_meta, I can > > give it a try:) > > > >> > >> For now, is there thing we can explore to share in the skb_shared_info? > > > > My initial thought is just to reuse these fields in skb. It can work > > without interfering one another. > > After reading closer to patch 10, I am likely still missing something. How can > it tell if the tx_flags is set by the bpf or by the user space cmsg? If the skb carries the timestamp, there are three cases: 1) non-bpf case and users uses setsockopt() 2) cmsg case 3) bpf case #1 and #2 are already handled well before this patch. I only need to test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or else it could be #1 or #2, then we will let the old way print timestamps in __skb_tstamp_tx(). > > > > >> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey > >> only at the "tx" side? There is already another union member. > > > > tskey is always used in the tx path. > > > > hwtstamps can be used in both rx and tx cases (please see > > tcp_update_recv_tstamps() and skb_tstamp_tx()). > > hmm... we only need some where to store the bpf tx_flags and bpf tskey in the > TX-ing skb. And there is one more field we have to take care of: txstamp_ack which indicates whether we print timestamp when the last skb is acked. Please see tcp_tx_timestamp(). > You meant the hwtstamps of a Tx-ing skb is not empty? Sometimes, it's not empty if the hardware supports the timestamp feature and the user wants to see it (by enabling the SOF_TIMESTAMPING_TX_HARDWARE flag). As we can see, there are many callers calling skb_tstamp_tx(). > > At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet? I'm not that sure about the orig_skb. It seems no. I can see some callers reading ptp timestamp from the nic and pass the timestamp to skb_tstamp_tx(). Thanks, Jason
On 10/16/24 12:54 AM, Jason Xing wrote: > On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 10/15/24 11:30 PM, Jason Xing wrote: >>> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >>>> >>>> On 10/15/24 6:32 PM, Jason Xing wrote: >>>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: >>>>>> >>>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: >>>>>>> >>>>>>> On 10/11/24 9:06 PM, Jason Xing wrote: >>>>>>>> From: Jason Xing <kernelxing@tencent.com> >>>>>>>> >>>>>>>> Willem suggested that we use a static key to control. The advantage >>>>>>>> is that we will not affect the existing applications at all if we >>>>>>>> don't load BPF program. >>>>>>>> >>>>>>>> In this patch, except the static key, I also add one logic that is >>>>>>>> used to test if the socket has enabled its tsflags in order to >>>>>>>> support bpf logic to allow both cases to happen at the same time. >>>>>>>> Or else, the skb carring related timestamp flag doesn't know which >>>>>>>> way of printing is desirable. >>>>>>>> >>>>>>>> One thing important is this patch allows print from both applications >>>>>>>> and bpf program at the same time. Now we have three kinds of print: >>>>>>>> 1) only BPF program prints >>>>>>>> 2) only application program prints >>>>>>>> 3) both can print without side effect >>>>>>>> >>>>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> >>>>>>>> --- >>>>>>>> include/net/sock.h | 1 + >>>>>>>> net/core/filter.c | 3 +++ >>>>>>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ >>>>>>>> 3 files changed, 42 insertions(+) >>>>>>>> >>>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h >>>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644 >>>>>>>> --- a/include/net/sock.h >>>>>>>> +++ b/include/net/sock.h >>>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) >>>>>>>> void sock_def_readable(struct sock *sk); >>>>>>>> >>>>>>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); >>>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>>>>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); >>>>>>>> int sock_get_timestamping(struct so_timestamping *timestamping, >>>>>>>> sockptr_t optval, unsigned int optlen); >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>>>> index 996426095bd9..08135f538c99 100644 >>>>>>>> --- a/net/core/filter.c >>>>>>>> +++ b/net/core/filter.c >>>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { >>>>>>>> .arg1_type = ARG_PTR_TO_CTX, >>>>>>>> }; >>>>>>>> >>>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); >>>>>>>> + >>>>>>>> static int bpf_sock_set_timestamping(struct sock *sk, >>>>>>>> struct so_timestamping *timestamping) >>>>>>>> { >>>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, >>>>>>>> return -EINVAL; >>>>>>>> >>>>>>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); >>>>>>>> + static_branch_enable(&bpf_tstamp_control); >>>>>>> >>>>>>> Not sure when is a good time to do static_branch_disable(). >>>>>> >>>>>> Thanks for the review. >>>>>> >>>>>> To be honest, I considered how to disable the static key. Like you >>>>>> said, I failed to find a good chance that I can accurately disable it. >>>>>> >>>>>>> >>>>>>> The bpf prog may be detached also. (IF) it ends up staying with the >>>>>>> cgroup/sockops interface, it should depend on the existing static key in >>>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. >>>>>> >>>>>> Are you suggesting that we need to remove the current static key? In >>>>>> the previous thread, the reason why Willem came up with this idea is, >>>>>> I think, to avoid affect the non-bpf timestamping feature. >>>>>> >>>>>>> >>>>>>>> >>>>>>>> return 0; >>>>>>>> } >>>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644 >>>>>>>> --- a/net/core/skbuff.c >>>>>>>> +++ b/net/core/skbuff.c >>>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, >>>>>>>> } >>>>>>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); >>>>>>>> >>>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) >>>>>>> >>>>>>> sk is unused. >>>>>> >>>>>> Thanks for the careful check. >>>>>> >>>>>>> >>>>>>>> +{ >>>>>>>> + u32 testflag; >>>>>>>> + >>>>>>>> + switch (tstype) { >>>>>>>> + case SCM_TSTAMP_SCHED: >>>>>>> >>>>>>> Instead of doing this translation, >>>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the >>>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? >>>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in >>>>>>> sk->sk_tsflags_bpf? >>>>>> >>>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, >>>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support. >>>>>> >>>>>>> >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; >>>>>>>> + break; >>>>>>>> + case SCM_TSTAMP_SND: >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; >>>>>>>> + break; >>>>>>>> + case SCM_TSTAMP_ACK: >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_ACK; >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + if (tsflags & testflag) >>>>>>>> + return true; >>>>>>>> + >>>>>>>> + return false; >>>>>>>> +} >>>>>>>> + >>>>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>>>>>> const struct sk_buff *ack_skb, >>>>>>>> struct skb_shared_hwtstamps *hwtstamps, >>>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, >>>>>>>> if (!skb_may_tx_timestamp(sk, tsonly)) >>>>>>>> return; >>>>>>>> >>>>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) >>>>>>> >>>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. >>>>>> >>>>>> This test will be used in bpf and non-bpf cases. Because of this, we >>>>>> can support BPF extension. In this function, if skb has tsflags but we >>>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can >>>>>> help us. >>>>>> >>>>>>> >>>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual >>>>>>> skb. Would it break? >>>>>> >>>>>> Oh, you're right. I didn't support cmsg mode... >>>>> >>>>> I think I only need to test if it's in the bpf mode, or else let the >>>>> original way print the timestamp, which can solve the issue. >>>> >>>> From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, >>>> SCM_TSTAMP_SCHED);": >>>> >>>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) >>>> { >>>> /* ... */ >>>> >>>> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) >>>> __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); >>>> >>>> /* ... */ >>>> } >>>> >>>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled >>>> the timestamping. I may have missed somewhere in the patch set that the skb's >>>> tx_flags is changed by sk->sk_tsflags_bpf alone? >>> >>> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be >>> helpful, which initializes every last skb, please see patch [10/12]. >> >> Ah. ok. It is the thing I missed. Thanks for the pointer. >> >>>> >>>> I think a skb tskey is still desired (?), so eventually we want some spaces in >>> >>> tskey function is optional I think. It depends whether users want to >>> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag. >>> >>>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending >>>> skb->data_meta usage outside of xdp and tc. I think here we want to have it >>>> available at the tx side to store the tx_flags and tskey but probably want them >>>> at a specific place/offset at the data_meta. >>> >>> If we have the plan to store extra information in data_meta, I can >>> give it a try:) >>> >>>> >>>> For now, is there thing we can explore to share in the skb_shared_info? >>> >>> My initial thought is just to reuse these fields in skb. It can work >>> without interfering one another. >> >> After reading closer to patch 10, I am likely still missing something. How can >> it tell if the tx_flags is set by the bpf or by the user space cmsg? > > If the skb carries the timestamp, there are three cases: > 1) non-bpf case and users uses setsockopt() > 2) cmsg case > 3) bpf case > > #1 and #2 are already handled well before this patch. I only need to > test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > else it could be #1 or #2, then we will let the old way print > timestamps in __skb_tstamp_tx(). hmm... I am still not sure I fully understand...but I think I may start getting it. Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. It has to be able to set and clear. Does it also mean either the bpf or the user space can enable the timetstamping but not both? I don't think we can assume this also. It will be hard to deploy the bpf prog in production to collect continuous data. The user space may have some timestamping enabled but the bpf may want to do its parallel investigation also. The user space may rollout timestamping in the future and suddenly break the bpf prog. [ getting late here. will continue later. ] > >> >>> >>>> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey >>>> only at the "tx" side? There is already another union member. >>> >>> tskey is always used in the tx path. >>> >>> hwtstamps can be used in both rx and tx cases (please see >>> tcp_update_recv_tstamps() and skb_tstamp_tx()). >> >> hmm... we only need some where to store the bpf tx_flags and bpf tskey in the >> TX-ing skb. > > And there is one more field we have to take care of: txstamp_ack which > indicates whether we print timestamp when the last skb is acked. > Please see tcp_tx_timestamp(). > >> You meant the hwtstamps of a Tx-ing skb is not empty? > > Sometimes, it's not empty if the hardware supports the timestamp > feature and the user wants to see it (by enabling the > SOF_TIMESTAMPING_TX_HARDWARE flag). As we can see, there are many > callers calling skb_tstamp_tx(). > >> >> At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet? > > I'm not that sure about the orig_skb. It seems no. I can see some > callers reading ptp timestamp from the nic and pass the timestamp to > skb_tstamp_tx(). > > Thanks, > Jason
On Wed, Oct 16, 2024 at 4:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/16/24 12:54 AM, Jason Xing wrote: > > On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 10/15/24 11:30 PM, Jason Xing wrote: > >>> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >>>> > >>>> On 10/15/24 6:32 PM, Jason Xing wrote: > >>>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > >>>>>> > >>>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >>>>>>> > >>>>>>> On 10/11/24 9:06 PM, Jason Xing wrote: > >>>>>>>> From: Jason Xing <kernelxing@tencent.com> > >>>>>>>> > >>>>>>>> Willem suggested that we use a static key to control. The advantage > >>>>>>>> is that we will not affect the existing applications at all if we > >>>>>>>> don't load BPF program. > >>>>>>>> > >>>>>>>> In this patch, except the static key, I also add one logic that is > >>>>>>>> used to test if the socket has enabled its tsflags in order to > >>>>>>>> support bpf logic to allow both cases to happen at the same time. > >>>>>>>> Or else, the skb carring related timestamp flag doesn't know which > >>>>>>>> way of printing is desirable. > >>>>>>>> > >>>>>>>> One thing important is this patch allows print from both applications > >>>>>>>> and bpf program at the same time. Now we have three kinds of print: > >>>>>>>> 1) only BPF program prints > >>>>>>>> 2) only application program prints > >>>>>>>> 3) both can print without side effect > >>>>>>>> > >>>>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com> > >>>>>>>> --- > >>>>>>>> include/net/sock.h | 1 + > >>>>>>>> net/core/filter.c | 3 +++ > >>>>>>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++ > >>>>>>>> 3 files changed, 42 insertions(+) > >>>>>>>> > >>>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h > >>>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644 > >>>>>>>> --- a/include/net/sock.h > >>>>>>>> +++ b/include/net/sock.h > >>>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) > >>>>>>>> void sock_def_readable(struct sock *sk); > >>>>>>>> > >>>>>>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); > >>>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>>>>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool); > >>>>>>>> int sock_get_timestamping(struct so_timestamping *timestamping, > >>>>>>>> sockptr_t optval, unsigned int optlen); > >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c > >>>>>>>> index 996426095bd9..08135f538c99 100644 > >>>>>>>> --- a/net/core/filter.c > >>>>>>>> +++ b/net/core/filter.c > >>>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { > >>>>>>>> .arg1_type = ARG_PTR_TO_CTX, > >>>>>>>> }; > >>>>>>>> > >>>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); > >>>>>>>> + > >>>>>>>> static int bpf_sock_set_timestamping(struct sock *sk, > >>>>>>>> struct so_timestamping *timestamping) > >>>>>>>> { > >>>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, > >>>>>>>> return -EINVAL; > >>>>>>>> > >>>>>>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); > >>>>>>>> + static_branch_enable(&bpf_tstamp_control); > >>>>>>> > >>>>>>> Not sure when is a good time to do static_branch_disable(). > >>>>>> > >>>>>> Thanks for the review. > >>>>>> > >>>>>> To be honest, I considered how to disable the static key. Like you > >>>>>> said, I failed to find a good chance that I can accurately disable it. > >>>>>> > >>>>>>> > >>>>>>> The bpf prog may be detached also. (IF) it ends up staying with the > >>>>>>> cgroup/sockops interface, it should depend on the existing static key in > >>>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > >>>>>> > >>>>>> Are you suggesting that we need to remove the current static key? In > >>>>>> the previous thread, the reason why Willem came up with this idea is, > >>>>>> I think, to avoid affect the non-bpf timestamping feature. > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> return 0; > >>>>>>>> } > >>>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644 > >>>>>>>> --- a/net/core/skbuff.c > >>>>>>>> +++ b/net/core/skbuff.c > >>>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, > >>>>>>>> } > >>>>>>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); > >>>>>>>> > >>>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) > >>>>>>> > >>>>>>> sk is unused. > >>>>>> > >>>>>> Thanks for the careful check. > >>>>>> > >>>>>>> > >>>>>>>> +{ > >>>>>>>> + u32 testflag; > >>>>>>>> + > >>>>>>>> + switch (tstype) { > >>>>>>>> + case SCM_TSTAMP_SCHED: > >>>>>>> > >>>>>>> Instead of doing this translation, > >>>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the > >>>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf? > >>>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in > >>>>>>> sk->sk_tsflags_bpf? > >>>>>> > >>>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say, > >>>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support. > >>>>>> > >>>>>>> > >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SCHED; > >>>>>>>> + break; > >>>>>>>> + case SCM_TSTAMP_SND: > >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; > >>>>>>>> + break; > >>>>>>>> + case SCM_TSTAMP_ACK: > >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_ACK; > >>>>>>>> + break; > >>>>>>>> + default: > >>>>>>>> + return false; > >>>>>>>> + } > >>>>>>>> + if (tsflags & testflag) > >>>>>>>> + return true; > >>>>>>>> + > >>>>>>>> + return false; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>>>>>> const struct sk_buff *ack_skb, > >>>>>>>> struct skb_shared_hwtstamps *hwtstamps, > >>>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > >>>>>>>> if (!skb_may_tx_timestamp(sk, tsonly)) > >>>>>>>> return; > >>>>>>>> > >>>>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > >>>>>>> > >>>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly. > >>>>>> > >>>>>> This test will be used in bpf and non-bpf cases. Because of this, we > >>>>>> can support BPF extension. In this function, if skb has tsflags but we > >>>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can > >>>>>> help us. > >>>>>> > >>>>>>> > >>>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual > >>>>>>> skb. Would it break? > >>>>>> > >>>>>> Oh, you're right. I didn't support cmsg mode... > >>>>> > >>>>> I think I only need to test if it's in the bpf mode, or else let the > >>>>> original way print the timestamp, which can solve the issue. > >>>> > >>>> From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, > >>>> SCM_TSTAMP_SCHED);": > >>>> > >>>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) > >>>> { > >>>> /* ... */ > >>>> > >>>> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) > >>>> __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); > >>>> > >>>> /* ... */ > >>>> } > >>>> > >>>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled > >>>> the timestamping. I may have missed somewhere in the patch set that the skb's > >>>> tx_flags is changed by sk->sk_tsflags_bpf alone? > >>> > >>> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be > >>> helpful, which initializes every last skb, please see patch [10/12]. > >> > >> Ah. ok. It is the thing I missed. Thanks for the pointer. > >> > >>>> > >>>> I think a skb tskey is still desired (?), so eventually we want some spaces in > >>> > >>> tskey function is optional I think. It depends whether users want to > >>> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag. > >>> > >>>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending > >>>> skb->data_meta usage outside of xdp and tc. I think here we want to have it > >>>> available at the tx side to store the tx_flags and tskey but probably want them > >>>> at a specific place/offset at the data_meta. > >>> > >>> If we have the plan to store extra information in data_meta, I can > >>> give it a try:) > >>> > >>>> > >>>> For now, is there thing we can explore to share in the skb_shared_info? > >>> > >>> My initial thought is just to reuse these fields in skb. It can work > >>> without interfering one another. > >> > >> After reading closer to patch 10, I am likely still missing something. How can > >> it tell if the tx_flags is set by the bpf or by the user space cmsg? > > > > If the skb carries the timestamp, there are three cases: > > 1) non-bpf case and users uses setsockopt() > > 2) cmsg case > > 3) bpf case > > > > #1 and #2 are already handled well before this patch. I only need to > > test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > > else it could be #1 or #2, then we will let the old way print > > timestamps in __skb_tstamp_tx(). > > hmm... I am still not sure I fully understand...but I think I may start getting it. Sorry, my bad. I gave the wrong answer... It should be: Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should work fine. If it has the flag, we could use skb_tstamp_tx_output() to print based on patch [4/12]; if not, we will use bpf_skb_tstamp_tx_output() to print. If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag (please see Documentation/networking/timestamping.rst). We can see a good example on how to use in tools/testing/selftests/net/txtimestamp.c: do_test() { sock_opt = SOF_TIMESTAMPING_SOFTWARE | ... if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (char *) &sock_opt, sizeof(sock_opt))) } > > Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > It has to be able to set and clear. I cannot find a good time to clear all the sockets which are set through the BPF program. If we detach the BPF program, it will not print of course. Does it really matter if we don't clear the sk_tsflags_bpf? > > Does it also mean either the bpf or the user space can enable the timetstamping > but not both? I don't think we can assume this also. It will be hard to deploy > the bpf prog in production to collect continuous data. The user space may have > some timestamping enabled but the bpf may want to do its parallel investigation > also. The user space may rollout timestamping in the future and suddenly break > the bpf prog. Well, IIUC, it's also the basic idea from the current series which allows both happening at the same time. Let us put it in a simple way, I hope that if the app uses the SO_TIMESTAMPING feature already, then one admin deploys the BPF program, that app should be traced both in bpf and non-bpf ways. But Willem doesn't agree about this approach[1] because of hard to debug. [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ Regarding to this link, I have a few more words to say: the socket could be set through bpf_setsockopt() in different phases not like setsockopt(), so in some cases we cannot make setsockopt hard failed. After rethinking this point more, I still reckon that letting BPF program trace timestamping parallelly without caring whether the socket is set to the SO_TIMESTAMPING feature through setsockopt() method. It means I would like to keep this part in patch [04/12]: @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, if (!sk) return; + if (static_branch_unlikely(&bpf_tstamp_control)) + bpf_skb_tstamp_tx_output(sk, tstype); + skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); } EXPORT_SYMBOL_GPL(__skb_tstamp_tx); > > [ getting late here. will continue later. ] Thanks for your effort :) Thanks, Jason
Jason Xing wrote: > On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 10/15/24 6:04 PM, Jason Xing wrote: > > > To be honest, I considered how to disable the static key. Like you > > > said, I failed to find a good chance that I can accurately disable it. > > > > It at least needs to be disabled whenever that bpf prog got detached. > > > > > > > >> The bpf prog may be detached also. (IF) it ends up staying with the > > >> cgroup/sockops interface, it should depend on the existing static key in > > >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > > > > > Are you suggesting that we need to remove the current static key? In > > > the previous thread, the reason why Willem came up with this idea is, > > > I think, to avoid affect the non-bpf timestamping feature. > > > > Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am > > saying to use that existing key. afaict, the newly added bpf_tstamp_control key > > is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less > > granular but it has the needed accounting to disable whenever the bpf prog got > > detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS). > > Good suggestion. Good thing is that I don't need to figure out a > proper place to disable it any more. I can directly use > cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be > printed with BPF program loaded. > > BTW, I found that we don't implement how to disable the ip4_min_ttl > static key. Sometimes, I'm confused whether we have to disable it at a > certain time. In this case it would be fine to not disable it at all. The crux is that it is disabled on the vast majority of machines not using the feature. If a socket uses the feature, adding the small cost of the branches on the rest of the system is fine. Disabling requires refcounting usage. Sometimes the complexity and cost of that outweights the benefit.
On Wed, Oct 16, 2024 at 9:13 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > > > On 10/15/24 6:04 PM, Jason Xing wrote: > > > > To be honest, I considered how to disable the static key. Like you > > > > said, I failed to find a good chance that I can accurately disable it. > > > > > > It at least needs to be disabled whenever that bpf prog got detached. > > > > > > > > > > >> The bpf prog may be detached also. (IF) it ends up staying with the > > > >> cgroup/sockops interface, it should depend on the existing static key in > > > >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one. > > > > > > > Are you suggesting that we need to remove the current static key? In > > > > the previous thread, the reason why Willem came up with this idea is, > > > > I think, to avoid affect the non-bpf timestamping feature. > > > > > > Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am > > > saying to use that existing key. afaict, the newly added bpf_tstamp_control key > > > is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less > > > granular but it has the needed accounting to disable whenever the bpf prog got > > > detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS). > > > > Good suggestion. Good thing is that I don't need to figure out a > > proper place to disable it any more. I can directly use > > cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be > > printed with BPF program loaded. > > > > BTW, I found that we don't implement how to disable the ip4_min_ttl > > static key. Sometimes, I'm confused whether we have to disable it at a > > certain time. > > In this case it would be fine to not disable it at all. > > The crux is that it is disabled on the vast majority of machines not > using the feature. If a socket uses the feature, adding the small cost > of the branches on the rest of the system is fine. > > Disabling requires refcounting usage. Sometimes the complexity and > cost of that outweights the benefit. Thanks for the explanation. I will take Martin's advice and use the CGROUP_SOCK_OPS static key. So I don't have to take efforts to implement the inc/dec/enable/disable the static key Thanks, Jason
On 10/16/24 3:36 AM, Jason Xing wrote: >>> If the skb carries the timestamp, there are three cases: >>> 1) non-bpf case and users uses setsockopt() >>> 2) cmsg case >>> 3) bpf case These should have tests in the selftests/bpf/ sooner than later. (More below). >>> >>> #1 and #2 are already handled well before this patch. I only need to >>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or >>> else it could be #1 or #2, then we will let the old way print >>> timestamps in __skb_tstamp_tx(). >> >> hmm... I am still not sure I fully understand...but I think I may start getting it. > > Sorry, my bad. I gave the wrong answer... > > It should be: > Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current behavior is to just queue to the sk_error_queue as long as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the sk_tsflags. This will eventually get ignored when user read it from the error queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? I suspect the user space will still read something from the error queue unless there is SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it from even queuing to the error queue? I think it is ok but I am not sure if anyone is depending on the above behavior. > work fine. If it has the flag, we could use skb_tstamp_tx_output() to > print based on patch [4/12]; if not, we will use > bpf_skb_tstamp_tx_output() to print. > > If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > (please see Documentation/networking/timestamping.rst). We can see a > good example on how to use in > tools/testing/selftests/net/txtimestamp.c: > do_test() > { > sock_opt = SOF_TIMESTAMPING_SOFTWARE | > ... > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > (char *) &sock_opt, sizeof(sock_opt))) > } > >> >> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once >> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. >> It has to be able to set and clear. > > I cannot find a good time to clear all the sockets which are set > through the BPF program. If we detach the BPF program, it will not > print of course. Does it really matter if we don't clear the > sk_tsflags_bpf? Yes, it matters. The same reason goes for why the existing bpf prog can clear the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the timestamp. For sockops program that stays forever, not all usages expect to do timestamping for the whole lifetime of the connection. If there is a way for the prog to turn it on, it should have a way for the prog to turn it off. What is the concern of allowing the bpf prog to disable something that it has enabled before? While we are on bpf_sock_ops_cb_flags, the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in the new sk_tsflags_bpf. It is something we need to clean up later when we decide what interface to use for bpf timestamping. > >> >> Does it also mean either the bpf or the user space can enable the timetstamping >> but not both? I don't think we can assume this also. It will be hard to deploy >> the bpf prog in production to collect continuous data. The user space may have >> some timestamping enabled but the bpf may want to do its parallel investigation >> also. The user space may rollout timestamping in the future and suddenly break >> the bpf prog. > > Well, IIUC, it's also the basic idea from the current series which > allows both happening at the same time. Let us put it in a simple way, > I hope that if the app uses the SO_TIMESTAMPING feature already, then > one admin deploys the BPF program, that app should be traced both in > bpf and non-bpf ways. > > But Willem doesn't agree about this approach[1] because of hard to debug. > > [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ > Regarding to this link, I have a few more words to say: the socket > could be set through bpf_setsockopt() in different phases not like > setsockopt(), so in some cases we cannot make setsockopt hard failed. > > After rethinking this point more, I still reckon that letting BPF > program trace timestamping parallelly without caring whether the > socket is set to the SO_TIMESTAMPING feature through setsockopt() I am afraid having both work in parallel is needed. Otherwise, it will be very hard to deploy a bpf prog to run continuously in scale. Being able to collect timestamp without worrying about application changes/updates/downgrades is important. e.g. App changes from no time stamping to time stamping Please help to add selftests to show how the above cases (1), (2), (3), and other tsflags/txflags sharing cases will work. This should not be delayed until the discussion is done. It is needed sooner or later to prove both bpf and non-bpf ways can work at the same time. It will help the reviewer and also help to think about the design with a real use case in bpf prog. The example in patch 0 only prints the reported tstamp, can you share how it will be used to investigate issue? Is it also useful to know when the skb is written to the kernel during sendmsg()? Regarding the bpf_setsockopt() can be called in different phase, bpf_setsockopt() is not limited to sockops program. e.g. it can also be called from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be surprised people will try to trigger some on-and-off timestamping from bpf-tcp-cc to measure some delay. More about bpf_setsockopt() in different phase, understand that UDP is not your priority. However, it needs to have some clarity on how UDP will work and how to enable it. UDP usually has no connect/established phase. Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in another thread. What else? > method. It means I would like to keep this part in patch [04/12]: > @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, > if (!sk) > return; > > + if (static_branch_unlikely(&bpf_tstamp_control)) > + bpf_skb_tstamp_tx_output(sk, tstype); > + > skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, > tstype); > } > EXPORT_SYMBOL_GPL(__skb_tstamp_tx); > >> >> [ getting late here. will continue later. ] > > Thanks for your effort :) > > Thanks, > Jason
On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/16/24 3:36 AM, Jason Xing wrote: > >>> If the skb carries the timestamp, there are three cases: > >>> 1) non-bpf case and users uses setsockopt() > >>> 2) cmsg case > >>> 3) bpf case > > These should have tests in the selftests/bpf/ sooner than later. (More below). > > >>> > >>> #1 and #2 are already handled well before this patch. I only need to > >>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > >>> else it could be #1 or #2, then we will let the old way print > >>> timestamps in __skb_tstamp_tx(). > >> > >> hmm... I am still not sure I fully understand...but I think I may start getting it. > > > > Sorry, my bad. I gave the wrong answer... > > > > It should be: > > Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should > > You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? Yep. > > Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current > behavior is to just queue to the sk_error_queue as long as there is > "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the > sk_tsflags. This will eventually get ignored when user read it from the error > queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can read the skb from the errqueue but are not able to parse the timestamps. Please see tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) > I suspect > the user space will still read something from the error queue unless there is > SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. No, please see above. > > Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it > from even queuing to the error queue? I think it is ok but I am not sure if > anyone is depending on the above behavior. SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING features including cmsg mode. But it will not be used in bpf mode. So the test statement is enough to divided those three cases into two groups. > > > work fine. If it has the flag, we could use skb_tstamp_tx_output() to > > print based on patch [4/12]; if not, we will use > > bpf_skb_tstamp_tx_output() to print. > > > > If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > > always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > > (please see Documentation/networking/timestamping.rst). We can see a > > good example on how to use in > > tools/testing/selftests/net/txtimestamp.c: > > do_test() > > { > > sock_opt = SOF_TIMESTAMPING_SOFTWARE | > > ... > > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > > (char *) &sock_opt, sizeof(sock_opt))) > > } > > > >> > >> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > >> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > >> It has to be able to set and clear. > > > > I cannot find a good time to clear all the sockets which are set > > through the BPF program. If we detach the BPF program, it will not > > print of course. Does it really matter if we don't clear the > > sk_tsflags_bpf? > > Yes, it matters. The same reason goes for why the existing bpf prog can clear > the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the > timestamp. For sockops program that stays forever, not all usages expect to do > timestamping for the whole lifetime of the connection. If there is a way for the > prog to turn it on, it should have a way for the prog to turn it off. I see what you meant here. If we don't clear sk_tsflags_bpf, after we detach the bpf program, it will do nothing in __skb_tstamp_tx() and return earlier. It is almost equal to the effect of turning off. It is why I don't handle clearing the flag. > > What is the concern of allowing the bpf prog to disable something that it has > enabled before? Let me give one instance: If one socket is established and stays idle, how can the bpf program clear the tsflags from that socket? I have no idea. > > While we are on bpf_sock_ops_cb_flags, the > BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in > the new sk_tsflags_bpf. It is something we need to clean up later when we decide > what interface to use for bpf timestamping. I'm not sure if I understand correctly. I mimicked the use of BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of bpf_sock_ops_cb_flags_set() in BPF program? > > > > >> > >> Does it also mean either the bpf or the user space can enable the timetstamping > >> but not both? I don't think we can assume this also. It will be hard to deploy > >> the bpf prog in production to collect continuous data. The user space may have > >> some timestamping enabled but the bpf may want to do its parallel investigation > >> also. The user space may rollout timestamping in the future and suddenly break > >> the bpf prog. > > > > Well, IIUC, it's also the basic idea from the current series which > > allows both happening at the same time. Let us put it in a simple way, > > I hope that if the app uses the SO_TIMESTAMPING feature already, then > > one admin deploys the BPF program, that app should be traced both in > > bpf and non-bpf ways. > > > > But Willem doesn't agree about this approach[1] because of hard to debug. > > > > [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ > > Regarding to this link, I have a few more words to say: the socket > > could be set through bpf_setsockopt() in different phases not like > > setsockopt(), so in some cases we cannot make setsockopt hard failed. > > > > After rethinking this point more, I still reckon that letting BPF > > program trace timestamping parallelly without caring whether the > > socket is set to the SO_TIMESTAMPING feature through setsockopt() > > I am afraid having both work in parallel is needed. Otherwise, it will be very > hard to deploy a bpf prog to run continuously in scale. Being able to collect > timestamp without worrying about application changes/updates/downgrades is > important. e.g. App changes from no time stamping to time stamping Sorry, I didn't make myself clear. Yesterday, I said I agreed with you :) So let me keep the current logic of printing (see the __skb_tstamp_tx() function in patch [04/12]) in the next version. Then I don't need to add some test statements to distinguish which way of printing. > > Please help to add selftests to show how the above cases (1), (2), (3), and > other tsflags/txflags sharing cases will work. This should not be delayed until > the discussion is done. It is needed sooner or later to prove both bpf and > non-bpf ways can work at the same time. It will help the reviewer and also help > to think about the design with a real use case in bpf prog. Got it. But I'm not sure where I should put those test cases? Could you help me point out a good example that I can follow? > > The example in patch 0 only prints the reported tstamp, can you share how it > will be used to investigate issue? No problem. Please see chapter 3 about "goal" in https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. > Is it also useful to know when the skb is > written to the kernel during sendmsg()? You are right. Before this patch, normally applications will record an accurate timestamp before do sendmsg(). After you remind me of this, I feel that we can add the timestamp print in the future for bpf use. > > Regarding the bpf_setsockopt() can be called in different phase, > bpf_setsockopt() is not limited to sockops program. e.g. it can also be called > from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be > surprised people will try to trigger some on-and-off timestamping from > bpf-tcp-cc to measure some delay. > > > More about bpf_setsockopt() in different phase, understand that UDP is not your > priority. However, it needs to have some clarity on how UDP will work and how to > enable it. UDP usually has no connect/established phase. For now, I don't expect an extension for UDP because it will bring too much extra work. Could we discuss this later? I mean, after we finish the basic bpf extension :) > > Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are > planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in > another thread. What else? In this patch series, I support SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE, which you've already noticed from the BPF example in patch [0/12]. They all come from the original design of SO_TIMESTAMPING feature. The question you proposed is what I am willing to implement in the future, like adding one hook in tcp_write_xmit()? It's part of my plans to extend in the future, not be included in this series. Thanks for your review. Thanks, Jason
On 10/16/24 7:28 PM, Jason Xing wrote: > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 10/16/24 3:36 AM, Jason Xing wrote: >>>>> If the skb carries the timestamp, there are three cases: >>>>> 1) non-bpf case and users uses setsockopt() >>>>> 2) cmsg case >>>>> 3) bpf case >> >> These should have tests in the selftests/bpf/ sooner than later. (More below). >> >>>>> >>>>> #1 and #2 are already handled well before this patch. I only need to >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or >>>>> else it could be #1 or #2, then we will let the old way print >>>>> timestamps in __skb_tstamp_tx(). >>>> >>>> hmm... I am still not sure I fully understand...but I think I may start getting it. >>> >>> Sorry, my bad. I gave the wrong answer... >>> >>> It should be: >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should >> >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? > > Yep. > >> >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current >> behavior is to just queue to the sk_error_queue as long as there is >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the >> sk_tsflags. This will eventually get ignored when user read it from the error >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > read the skb from the errqueue but are not able to parse the > timestamps. Please see > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... > ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error still returns the skb to the user. I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka traditional) way is that the generated skb will still be queued in the error queue. The user space can still read it but just won't have the timestamp cmsg. If I understand how the v3 may look like, the skb will not be queued in the error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user space won't get it from the error queue which is a change of behavior. I was saying I am fine but not sure if someone depends on this behavior. I think we start talking pass each other on this. I will wait for the code on this part and the selftest first. > >> I suspect >> the user space will still read something from the error queue unless there is >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. > > No, please see above. > >> >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it >> from even queuing to the error queue? I think it is ok but I am not sure if >> anyone is depending on the above behavior. > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING Got it. This part is now understood. It is one of the reasons for my earlier question on which SOF_* that bpf needs to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but lets leave these nits for a little later. However, it will be very useful to highlight which SOF_* will never be used in bpf in v3. > features including cmsg mode. But it will not be used in bpf mode. So > the test statement is enough to divided those three cases into two > groups. > >> >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to >>> print based on patch [4/12]; if not, we will use >>> bpf_skb_tstamp_tx_output() to print. >>> >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag >>> (please see Documentation/networking/timestamping.rst). We can see a >>> good example on how to use in >>> tools/testing/selftests/net/txtimestamp.c: >>> do_test() >>> { >>> sock_opt = SOF_TIMESTAMPING_SOFTWARE | >>> ... >>> if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, >>> (char *) &sock_opt, sizeof(sock_opt))) >>> } >>> >>>> >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. >>>> It has to be able to set and clear. >>> >>> I cannot find a good time to clear all the sockets which are set >>> through the BPF program. If we detach the BPF program, it will not >>> print of course. Does it really matter if we don't clear the >>> sk_tsflags_bpf? >> >> Yes, it matters. The same reason goes for why the existing bpf prog can clear >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the >> timestamp. For sockops program that stays forever, not all usages expect to do >> timestamping for the whole lifetime of the connection. If there is a way for the >> prog to turn it on, it should have a way for the prog to turn it off. > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we > detach the bpf program, it will do nothing in __skb_tstamp_tx() and > return earlier. It is almost equal to the effect of turning off. It is > why I don't handle clearing the flag. > >> >> What is the concern of allowing the bpf prog to disable something that it has >> enabled before? > > Let me give one instance: > If one socket is established and stays idle, how can the bpf program > clear the tsflags from that socket? I have no idea. bpf_tcp_iter prog can. That said, the idle connection example is too carry away as an excuse that bpf_setsockopt does not need to support turning-off. Sure, idle connection is as good as off. and yes, detach is as good as off also. I am now acting as a broken clock repeating myself that not all use cases run for 5 mins and then detach, so I need to be specific here that bpf_setsockopt not supporting off is a nack. There are many use cases in production that the bpf prog runs forever and wants to turn it on-and-off. Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn it off if needed. I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am asking to support clearing the sk_tsflags_bpf in bpf_setsockopt(). I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits. > >> >> While we are on bpf_sock_ops_cb_flags, the >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide >> what interface to use for bpf timestamping. > > I'm not sure if I understand correctly. I mimicked the use of > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of > bpf_sock_ops_cb_flags_set() in BPF program? In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant. It is as good as testing and setting sk_tsflags_bpf alone. This could be some cleanup for the later stage of the set. > >> >>> >>>> >>>> Does it also mean either the bpf or the user space can enable the timetstamping >>>> but not both? I don't think we can assume this also. It will be hard to deploy >>>> the bpf prog in production to collect continuous data. The user space may have >>>> some timestamping enabled but the bpf may want to do its parallel investigation >>>> also. The user space may rollout timestamping in the future and suddenly break >>>> the bpf prog. >>> >>> Well, IIUC, it's also the basic idea from the current series which >>> allows both happening at the same time. Let us put it in a simple way, >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then >>> one admin deploys the BPF program, that app should be traced both in >>> bpf and non-bpf ways. >>> >>> But Willem doesn't agree about this approach[1] because of hard to debug. >>> >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ >>> Regarding to this link, I have a few more words to say: the socket >>> could be set through bpf_setsockopt() in different phases not like >>> setsockopt(), so in some cases we cannot make setsockopt hard failed. >>> >>> After rethinking this point more, I still reckon that letting BPF >>> program trace timestamping parallelly without caring whether the >>> socket is set to the SO_TIMESTAMPING feature through setsockopt() >> >> I am afraid having both work in parallel is needed. Otherwise, it will be very >> hard to deploy a bpf prog to run continuously in scale. Being able to collect >> timestamp without worrying about application changes/updates/downgrades is >> important. e.g. App changes from no time stamping to time stamping > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you > :) So let me keep the current logic of printing (see the > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then > I don't need to add some test statements to distinguish which way of > printing. > >> >> Please help to add selftests to show how the above cases (1), (2), (3), and >> other tsflags/txflags sharing cases will work. This should not be delayed until >> the discussion is done. It is needed sooner or later to prove both bpf and >> non-bpf ways can work at the same time. It will help the reviewer and also help >> to think about the design with a real use case in bpf prog. > > Got it. But I'm not sure where I should put those test cases? Could > you help me point out a good example that I can follow? Have you looked at the selftests/bpf directory? prog_tests/setget_sockopt.c may be something closer to what you need. There is a recent one in the mailing list also: https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/ The expectation is to be able to run the test like this: ./test_progs -t setget_sockopt > >> >> The example in patch 0 only prints the reported tstamp, can you share how it >> will be used to investigate issue? > > No problem. Please see chapter 3 about "goal" in > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. Thanks. > >> Is it also useful to know when the skb is >> written to the kernel during sendmsg()? > > You are right. Before this patch, normally applications will record an > accurate timestamp before do sendmsg(). > > After you remind me of this, I feel that we can add the timestamp > print in the future for bpf use. Yes, please add the sendmsg timestamp capturing in the selftest. It is useful. > >> >> Regarding the bpf_setsockopt() can be called in different phase, >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be >> surprised people will try to trigger some on-and-off timestamping from >> bpf-tcp-cc to measure some delay. >> >> >> More about bpf_setsockopt() in different phase, understand that UDP is not your >> priority. However, it needs to have some clarity on how UDP will work and how to >> enable it. UDP usually has no connect/established phase. > > For now, I don't expect an extension for UDP because it will bring too > much extra work. Could we discuss this later? I mean, after we finish > the basic bpf extension :) Later is fine but before this set lands. I am not asking a full UDP implementation but need ideas on how that may look like. We need some clarity on how UDP will work and also how much new sockops API extension will be needed to decide if sockops is the correct one going forward. I don't want to end up tcp is in sockops and UDP (and others) is non sockops. That said, the current priority is to get bpf prog and user space work without stepping on each other first. > >> >> Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are >> planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in >> another thread. What else? > > In this patch series, I support > SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE, > which you've already noticed from the BPF example in patch [0/12]. > They all come from the original design of SO_TIMESTAMPING feature. > > The question you proposed is what I am willing to implement in the > future, like adding one hook in tcp_write_xmit()? It's part of my > plans to extend in the future, not be included in this series.
On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/16/24 7:28 PM, Jason Xing wrote: > > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 10/16/24 3:36 AM, Jason Xing wrote: > >>>>> If the skb carries the timestamp, there are three cases: > >>>>> 1) non-bpf case and users uses setsockopt() > >>>>> 2) cmsg case > >>>>> 3) bpf case > >> > >> These should have tests in the selftests/bpf/ sooner than later. (More below). > >> > >>>>> > >>>>> #1 and #2 are already handled well before this patch. I only need to > >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > >>>>> else it could be #1 or #2, then we will let the old way print > >>>>> timestamps in __skb_tstamp_tx(). > >>>> > >>>> hmm... I am still not sure I fully understand...but I think I may start getting it. > >>> > >>> Sorry, my bad. I gave the wrong answer... > >>> > >>> It should be: > >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should > >> > >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? > > > > Yep. > > > >> > >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current > >> behavior is to just queue to the sk_error_queue as long as there is > >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the > >> sk_tsflags. This will eventually get ignored when user read it from the error > >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > read the skb from the errqueue but are not able to parse the > > timestamps. Please see > > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): > > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... > > ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) > > afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error > still returns the skb to the user. > > I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not > set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka > traditional) way is that the generated skb will still be queued in the error > queue. The user space can still read it but just won't have the timestamp cmsg. Apparently, we're on the same page. What you were saying is right, of course :) > > If I understand how the v3 may look like, the skb will not be queued in the > error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user Right, skb will not even be cloned or generated, let alone queue it in the error queue. For bpf extension, preventing skb to be queued in the error queue is a very vital thing. > space won't get it from the error queue which is a change of behavior. I was > saying I am fine but not sure if someone depends on this behavior. For bpf part, it's okay to bypass the queuing skb into the errqueue logic, which has been discussed at netconf before this series with Willem also. For non-bpf part, I will not touch and modify their prior behaviour. > > I think we start talking pass each other on this. I will wait for the code on > this part and the selftest first. I will keep this code snippets in V3 so that three kinds of printing could work parallelly: @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, if (!sk) return; + if (static_branch_unlikely(&bpf_tstamp_control)) + bpf_skb_tstamp_tx_output(sk, tstype); + skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); } So please forget what we've talked about testing the SOF_TIMESTAMPING_SOFTWARE flag thing. > > > > >> I suspect > >> the user space will still read something from the error queue unless there is > >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. > > > > No, please see above. > > > >> > >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it > >> from even queuing to the error queue? I think it is ok but I am not sure if > >> anyone is depending on the above behavior. > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > Got it. This part is now understood. > > It is one of the reasons for my earlier question on which SOF_* that bpf needs > to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but > lets leave these nits for a little later. > > However, it will be very useful to highlight which SOF_* will never be used in > bpf in v3. Got it. Will do that. > > > features including cmsg mode. But it will not be used in bpf mode. So > > the test statement is enough to divided those three cases into two > > groups. > > > > >> > >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to > >>> print based on patch [4/12]; if not, we will use > >>> bpf_skb_tstamp_tx_output() to print. > >>> > >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > >>> (please see Documentation/networking/timestamping.rst). We can see a > >>> good example on how to use in > >>> tools/testing/selftests/net/txtimestamp.c: > >>> do_test() > >>> { > >>> sock_opt = SOF_TIMESTAMPING_SOFTWARE | > >>> ... > >>> if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > >>> (char *) &sock_opt, sizeof(sock_opt))) > >>> } > >>> > >>>> > >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > >>>> It has to be able to set and clear. > >>> > >>> I cannot find a good time to clear all the sockets which are set > >>> through the BPF program. If we detach the BPF program, it will not > >>> print of course. Does it really matter if we don't clear the > >>> sk_tsflags_bpf? > >> > >> Yes, it matters. The same reason goes for why the existing bpf prog can clear > >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the > >> timestamp. For sockops program that stays forever, not all usages expect to do > >> timestamping for the whole lifetime of the connection. If there is a way for the > >> prog to turn it on, it should have a way for the prog to turn it off. > > > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we > > detach the bpf program, it will do nothing in __skb_tstamp_tx() and > > return earlier. It is almost equal to the effect of turning off. It is > > why I don't handle clearing the flag. > > > >> > >> What is the concern of allowing the bpf prog to disable something that it has > >> enabled before? > > > > Let me give one instance: > > If one socket is established and stays idle, how can the bpf program > > clear the tsflags from that socket? I have no idea. > > bpf_tcp_iter prog can. That said, the idle connection example is too carry away > as an excuse that bpf_setsockopt does not need to support turning-off. Sure, > idle connection is as good as off. and yes, detach is as good as off also. Thanks, I see. > > I am now acting as a broken clock repeating myself that not all use cases run > for 5 mins and then detach, so I need to be specific here that bpf_setsockopt > not supporting off is a nack. There are many use cases in production that the > bpf prog runs forever and wants to turn it on-and-off. > > Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc > that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do > periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn > it off if needed. I'm not insisting not to support this. Just curious why. Now I get it. > > I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am > asking to support clearing the sk_tsflags_bpf in bpf_setsockopt(). Yeah, I know that. > > I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits. It's easy for me to support the function clearing the sk_tsflags_bpf in the bpf_setsockopt() function. Will do that :) > > > > >> > >> While we are on bpf_sock_ops_cb_flags, the > >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in > >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide > >> what interface to use for bpf timestamping. > > > > I'm not sure if I understand correctly. I mimicked the use of > > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of > > bpf_sock_ops_cb_flags_set() in BPF program? > > In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant. > It is as good as testing and setting sk_tsflags_bpf alone. I will remove it, then the code will be simplified. > > This could be some cleanup for the later stage of the set. > > > >> > >>> > >>>> > >>>> Does it also mean either the bpf or the user space can enable the timetstamping > >>>> but not both? I don't think we can assume this also. It will be hard to deploy > >>>> the bpf prog in production to collect continuous data. The user space may have > >>>> some timestamping enabled but the bpf may want to do its parallel investigation > >>>> also. The user space may rollout timestamping in the future and suddenly break > >>>> the bpf prog. > >>> > >>> Well, IIUC, it's also the basic idea from the current series which > >>> allows both happening at the same time. Let us put it in a simple way, > >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then > >>> one admin deploys the BPF program, that app should be traced both in > >>> bpf and non-bpf ways. > >>> > >>> But Willem doesn't agree about this approach[1] because of hard to debug. > >>> > >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ > >>> Regarding to this link, I have a few more words to say: the socket > >>> could be set through bpf_setsockopt() in different phases not like > >>> setsockopt(), so in some cases we cannot make setsockopt hard failed. > >>> > >>> After rethinking this point more, I still reckon that letting BPF > >>> program trace timestamping parallelly without caring whether the > >>> socket is set to the SO_TIMESTAMPING feature through setsockopt() > >> > >> I am afraid having both work in parallel is needed. Otherwise, it will be very > >> hard to deploy a bpf prog to run continuously in scale. Being able to collect > >> timestamp without worrying about application changes/updates/downgrades is > >> important. e.g. App changes from no time stamping to time stamping > > > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you > > :) So let me keep the current logic of printing (see the > > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then > > I don't need to add some test statements to distinguish which way of > > printing. > > > >> > >> Please help to add selftests to show how the above cases (1), (2), (3), and > >> other tsflags/txflags sharing cases will work. This should not be delayed until > >> the discussion is done. It is needed sooner or later to prove both bpf and > >> non-bpf ways can work at the same time. It will help the reviewer and also help > >> to think about the design with a real use case in bpf prog. > > > > Got it. But I'm not sure where I should put those test cases? Could > > you help me point out a good example that I can follow? > > Have you looked at the selftests/bpf directory? Sure, the full bpf program was written based on the examples in selftests/bpf. But I remembered that selftests/bpf is already deprecated? > > prog_tests/setget_sockopt.c may be something closer to what you need. > > There is a recent one in the mailing list also: > > https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/ > > The expectation is to be able to run the test like this: ./test_progs -t > setget_sockopt Thanks for the pointer. I will spend some time studying it. > > > > >> > >> The example in patch 0 only prints the reported tstamp, can you share how it > >> will be used to investigate issue? > > > > No problem. Please see chapter 3 about "goal" in > > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. > > Thanks. > > > > >> Is it also useful to know when the skb is > >> written to the kernel during sendmsg()? > > > > You are right. Before this patch, normally applications will record an > > accurate timestamp before do sendmsg(). > > > > After you remind me of this, I feel that we can add the timestamp > > print in the future for bpf use. > > Yes, please add the sendmsg timestamp capturing in the selftest. It is useful. > > > > >> > >> Regarding the bpf_setsockopt() can be called in different phase, > >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called > >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be > >> surprised people will try to trigger some on-and-off timestamping from > >> bpf-tcp-cc to measure some delay. > >> > >> > >> More about bpf_setsockopt() in different phase, understand that UDP is not your > >> priority. However, it needs to have some clarity on how UDP will work and how to > >> enable it. UDP usually has no connect/established phase. > > > > For now, I don't expect an extension for UDP because it will bring too > > much extra work. Could we discuss this later? I mean, after we finish > > the basic bpf extension :) > > Later is fine but before this set lands. I am not asking a full UDP > implementation but need ideas on how that may look like. We need some clarity on > how UDP will work and also how much new sockops API extension will be needed to > decide if sockops is the correct one going forward. I don't want to end up tcp > is in sockops and UDP (and others) is non sockops. I see. After removing the use of "BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG)", the bpf extension will be limited only in TCP, I think. Using sk->sk_tsflags_bpf as an indicator could work for both protocols. Thanks for your great help :) Thanks, Jason
On Fri, Oct 18, 2024 at 10:52 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 10/16/24 7:28 PM, Jason Xing wrote: > > > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > >> > > >> On 10/16/24 3:36 AM, Jason Xing wrote: > > >>>>> If the skb carries the timestamp, there are three cases: > > >>>>> 1) non-bpf case and users uses setsockopt() > > >>>>> 2) cmsg case > > >>>>> 3) bpf case > > >> > > >> These should have tests in the selftests/bpf/ sooner than later. (More below). > > >> > > >>>>> > > >>>>> #1 and #2 are already handled well before this patch. I only need to > > >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > > >>>>> else it could be #1 or #2, then we will let the old way print > > >>>>> timestamps in __skb_tstamp_tx(). > > >>>> > > >>>> hmm... I am still not sure I fully understand...but I think I may start getting it. > > >>> > > >>> Sorry, my bad. I gave the wrong answer... > > >>> > > >>> It should be: > > >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should > > >> > > >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? > > > > > > Yep. > > > > > >> > > >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current > > >> behavior is to just queue to the sk_error_queue as long as there is > > >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the > > >> sk_tsflags. This will eventually get ignored when user read it from the error > > >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? > > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > read the skb from the errqueue but are not able to parse the > > > timestamps. Please see > > > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): > > > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... > > > ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) > > > > afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error > > still returns the skb to the user. > > > > I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not > > set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka > > traditional) way is that the generated skb will still be queued in the error > > queue. The user space can still read it but just won't have the timestamp cmsg. > > Apparently, we're on the same page. What you were saying is right, of course :) > > > > > If I understand how the v3 may look like, the skb will not be queued in the > > error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user > > Right, skb will not even be cloned or generated, let alone queue it in > the error queue. For bpf extension, preventing skb to be queued in the > error queue is a very vital thing. > > > space won't get it from the error queue which is a change of behavior. I was > > saying I am fine but not sure if someone depends on this behavior. > > For bpf part, it's okay to bypass the queuing skb into the errqueue > logic, which has been discussed at netconf before this series with > Willem also. > For non-bpf part, I will not touch and modify their prior behaviour. > > > > > I think we start talking pass each other on this. I will wait for the code on > > this part and the selftest first. > > I will keep this code snippets in V3 so that three kinds of printing > could work parallelly: > @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, > if (!sk) > return; > > + if (static_branch_unlikely(&bpf_tstamp_control)) > + bpf_skb_tstamp_tx_output(sk, tstype); > + > skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); > } > > So please forget what we've talked about testing the > SOF_TIMESTAMPING_SOFTWARE flag thing. > > > > > > > > >> I suspect > > >> the user space will still read something from the error queue unless there is > > >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. > > > > > > No, please see above. > > > > > >> > > >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it > > >> from even queuing to the error queue? I think it is ok but I am not sure if > > >> anyone is depending on the above behavior. > > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > > Got it. This part is now understood. > > > > It is one of the reasons for my earlier question on which SOF_* that bpf needs > > to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but > > lets leave these nits for a little later. > > > > However, it will be very useful to highlight which SOF_* will never be used in > > bpf in v3. > > Got it. Will do that. > > > > > > features including cmsg mode. But it will not be used in bpf mode. So > > > the test statement is enough to divided those three cases into two > > > groups. > > > > > > > >> > > >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to > > >>> print based on patch [4/12]; if not, we will use > > >>> bpf_skb_tstamp_tx_output() to print. > > >>> > > >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > > >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > > >>> (please see Documentation/networking/timestamping.rst). We can see a > > >>> good example on how to use in > > >>> tools/testing/selftests/net/txtimestamp.c: > > >>> do_test() > > >>> { > > >>> sock_opt = SOF_TIMESTAMPING_SOFTWARE | > > >>> ... > > >>> if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > > >>> (char *) &sock_opt, sizeof(sock_opt))) > > >>> } > > >>> > > >>>> > > >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > > >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > > >>>> It has to be able to set and clear. > > >>> > > >>> I cannot find a good time to clear all the sockets which are set > > >>> through the BPF program. If we detach the BPF program, it will not > > >>> print of course. Does it really matter if we don't clear the > > >>> sk_tsflags_bpf? > > >> > > >> Yes, it matters. The same reason goes for why the existing bpf prog can clear > > >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the > > >> timestamp. For sockops program that stays forever, not all usages expect to do > > >> timestamping for the whole lifetime of the connection. If there is a way for the > > >> prog to turn it on, it should have a way for the prog to turn it off. > > > > > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we > > > detach the bpf program, it will do nothing in __skb_tstamp_tx() and > > > return earlier. It is almost equal to the effect of turning off. It is > > > why I don't handle clearing the flag. > > > > > >> > > >> What is the concern of allowing the bpf prog to disable something that it has > > >> enabled before? > > > > > > Let me give one instance: > > > If one socket is established and stays idle, how can the bpf program > > > clear the tsflags from that socket? I have no idea. > > > > bpf_tcp_iter prog can. That said, the idle connection example is too carry away > > as an excuse that bpf_setsockopt does not need to support turning-off. Sure, > > idle connection is as good as off. and yes, detach is as good as off also. > > Thanks, I see. > > > > > I am now acting as a broken clock repeating myself that not all use cases run > > for 5 mins and then detach, so I need to be specific here that bpf_setsockopt > > not supporting off is a nack. There are many use cases in production that the > > bpf prog runs forever and wants to turn it on-and-off. > > > > Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc > > that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do > > periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn > > it off if needed. > > I'm not insisting not to support this. Just curious why. Now I get it. > > > > > I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am > > asking to support clearing the sk_tsflags_bpf in bpf_setsockopt(). > > Yeah, I know that. > > > > > I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits. > > It's easy for me to support the function clearing the sk_tsflags_bpf > in the bpf_setsockopt() function. Will do that :) > > > > > > > > >> > > >> While we are on bpf_sock_ops_cb_flags, the > > >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in > > >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide > > >> what interface to use for bpf timestamping. > > > > > > I'm not sure if I understand correctly. I mimicked the use of > > > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of > > > bpf_sock_ops_cb_flags_set() in BPF program? > > > > In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant. > > It is as good as testing and setting sk_tsflags_bpf alone. > > I will remove it, then the code will be simplified. > > > > > This could be some cleanup for the later stage of the set. > > > > > >> > > >>> > > >>>> > > >>>> Does it also mean either the bpf or the user space can enable the timetstamping > > >>>> but not both? I don't think we can assume this also. It will be hard to deploy > > >>>> the bpf prog in production to collect continuous data. The user space may have > > >>>> some timestamping enabled but the bpf may want to do its parallel investigation > > >>>> also. The user space may rollout timestamping in the future and suddenly break > > >>>> the bpf prog. > > >>> > > >>> Well, IIUC, it's also the basic idea from the current series which > > >>> allows both happening at the same time. Let us put it in a simple way, > > >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then > > >>> one admin deploys the BPF program, that app should be traced both in > > >>> bpf and non-bpf ways. > > >>> > > >>> But Willem doesn't agree about this approach[1] because of hard to debug. > > >>> > > >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/ > > >>> Regarding to this link, I have a few more words to say: the socket > > >>> could be set through bpf_setsockopt() in different phases not like > > >>> setsockopt(), so in some cases we cannot make setsockopt hard failed. > > >>> > > >>> After rethinking this point more, I still reckon that letting BPF > > >>> program trace timestamping parallelly without caring whether the > > >>> socket is set to the SO_TIMESTAMPING feature through setsockopt() > > >> > > >> I am afraid having both work in parallel is needed. Otherwise, it will be very > > >> hard to deploy a bpf prog to run continuously in scale. Being able to collect > > >> timestamp without worrying about application changes/updates/downgrades is > > >> important. e.g. App changes from no time stamping to time stamping > > > > > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you > > > :) So let me keep the current logic of printing (see the > > > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then > > > I don't need to add some test statements to distinguish which way of > > > printing. > > > > > >> > > >> Please help to add selftests to show how the above cases (1), (2), (3), and > > >> other tsflags/txflags sharing cases will work. This should not be delayed until > > >> the discussion is done. It is needed sooner or later to prove both bpf and > > >> non-bpf ways can work at the same time. It will help the reviewer and also help > > >> to think about the design with a real use case in bpf prog. > > > > > > Got it. But I'm not sure where I should put those test cases? Could > > > you help me point out a good example that I can follow? > > > > Have you looked at the selftests/bpf directory? > > Sure, the full bpf program was written based on the examples in > selftests/bpf. But I remembered that selftests/bpf is already > deprecated? > > > > > prog_tests/setget_sockopt.c may be something closer to what you need. > > > > There is a recent one in the mailing list also: > > > > https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/ > > > > The expectation is to be able to run the test like this: ./test_progs -t > > setget_sockopt > > Thanks for the pointer. I will spend some time studying it. > > > > > > > > >> > > >> The example in patch 0 only prints the reported tstamp, can you share how it > > >> will be used to investigate issue? > > > > > > No problem. Please see chapter 3 about "goal" in > > > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. > > > > Thanks. > > > > > > > >> Is it also useful to know when the skb is > > >> written to the kernel during sendmsg()? > > > > > > You are right. Before this patch, normally applications will record an > > > accurate timestamp before do sendmsg(). > > > > > > After you remind me of this, I feel that we can add the timestamp > > > print in the future for bpf use. > > > > Yes, please add the sendmsg timestamp capturing in the selftest. It is useful. > > > > > > > >> > > >> Regarding the bpf_setsockopt() can be called in different phase, > > >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called > > >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be > > >> surprised people will try to trigger some on-and-off timestamping from > > >> bpf-tcp-cc to measure some delay. > > >> > > >> > > >> More about bpf_setsockopt() in different phase, understand that UDP is not your > > >> priority. However, it needs to have some clarity on how UDP will work and how to > > >> enable it. UDP usually has no connect/established phase. > > > > > > For now, I don't expect an extension for UDP because it will bring too > > > much extra work. Could we discuss this later? I mean, after we finish > > > the basic bpf extension :) > > > > Later is fine but before this set lands. I am not asking a full UDP > > implementation but need ideas on how that may look like. We need some clarity on > > how UDP will work and also how much new sockops API extension will be needed to > > decide if sockops is the correct one going forward. I don't want to end up tcp > > is in sockops and UDP (and others) is non sockops. The differences between TCP and UDP are: 1) TCP supports SOF_TIMESTAMPING_TX_ACK, SOF_TIMESTAMPING_OPT_ID_TCP, while UDP does not. 2) the tskey works in different ways. We have a good example to run and test: tools/testing/selftests/net/txtimestamp.c If that arouses your interesting, you could try: 1) for UDP, ./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 2 -u 2) for TCP, ./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 2 I think the V3 series could easily support the UDP protocol. Let me try, then we could discuss more based on V3. Thanks, Jason
Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > Willem suggested that we use a static key to control. The advantage > is that we will not affect the existing applications at all if we > don't load BPF program. > > In this patch, except the static key, I also add one logic that is > used to test if the socket has enabled its tsflags in order to > support bpf logic to allow both cases to happen at the same time. > Or else, the skb carring related timestamp flag doesn't know which > way of printing is desirable. > > One thing important is this patch allows print from both applications > and bpf program at the same time. Now we have three kinds of print: > 1) only BPF program prints > 2) only application program prints > 3) both can print without side effect > > Signed-off-by: Jason Xing <kernelxing@tencent.com> Getting back to this thread. It is long, instead of responding to multiple messages, let me combine them in a single response. * On future extensions: +1 that the UDP case, and datagrams more broadly, must have a clear development path, before we can merge TCP. Similarly, hardware timestamps need not be supported from the start, but must clearly be supportable. * On queueing packets to userspace: > > the current behavior is to just queue to the sk_error_queue as long > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > is regardless of the sk_tsflags. " > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > read the skb from the errqueue but are not able to parse the > timestamps Before queuing a packet to userspace on the error queue, the relevant reporting flag is always tested. sock_recv_timestamp has: /* * generate control messages if * - receive time stamping in software requested * - software time stamp available and wanted * - hardware time stamps available and wanted */ if (sock_flag(sk, SOCK_RCVTSTAMP) || (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || (hwtstamps->hwtstamp && (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); Otherwise applications could get error messages queued, and epoll/poll/select would unexpectedly behave differently. > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > features including cmsg mode. But it will not be used in bpf mode. For simplicity, the two uses of the API are best kept identical. If there is a technical reason why BPF has to diverge from established behavior, this needs to be explicitly called out in the commit message. Also, if you want to extend the API for BPF in the future, good to call this out now and ideally extensions will apply to both, to maintain a uniform API. * On extra measurement points, at sendmsg or tcp_write_xmit: The first is interesting. For application timestamping, this was never needed, as the application can just call clock_gettime before sendmsg. In general, additional measurement points are not only useful if the interval between is not constant. So far, we have seen no need for any additional points. * On skb state: > > For now, is there thing we can explore to share in the skb_shared_info? skb_shinfo space is at a premium. I don't think we can justify two extra fields just for this use case. > My initial thought is just to reuse these fields in skb. It can work > without interfering one another. I'm skeptical that two methods can work at the same time. If they are started at different times, their sk_tskey will be different, for one. There may be workarounds. Maybe BPF can store its state in some BPF specific field, indeed. Or perhaps it can store per-sk shadow state that resolves the conflict. For instance, the offset between sk_tskey and bpf_tskey.
On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > Willem suggested that we use a static key to control. The advantage > > is that we will not affect the existing applications at all if we > > don't load BPF program. > > > > In this patch, except the static key, I also add one logic that is > > used to test if the socket has enabled its tsflags in order to > > support bpf logic to allow both cases to happen at the same time. > > Or else, the skb carring related timestamp flag doesn't know which > > way of printing is desirable. > > > > One thing important is this patch allows print from both applications > > and bpf program at the same time. Now we have three kinds of print: > > 1) only BPF program prints > > 2) only application program prints > > 3) both can print without side effect > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > Getting back to this thread. It is long, instead of responding to > multiple messages, let me combine them in a single response. Thank you so much! > > > * On future extensions: > > +1 that the UDP case, and datagrams more broadly, must have a clear > development path, before we can merge TCP. > > Similarly, hardware timestamps need not be supported from the start, > but must clearly be supportable. Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG) could work well for both protos. Let me give it a try first. > > > * On queueing packets to userspace: > > > > the current behavior is to just queue to the sk_error_queue as long > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > > is regardless of the sk_tsflags. " > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > read the skb from the errqueue but are not able to parse the > > timestamps Above is what I tried to explain how the application timestamping feature works, not what I tried to implement for the BPF extension. > > Before queuing a packet to userspace on the error queue, the relevant > reporting flag is always tested. sock_recv_timestamp has: > > /* > * generate control messages if > * - receive time stamping in software requested > * - software time stamp available and wanted > * - hardware time stamps available and wanted > */ > if (sock_flag(sk, SOCK_RCVTSTAMP) || > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > (hwtstamps->hwtstamp && > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > __sock_recv_timestamp(msg, sk, skb); > > Otherwise applications could get error messages queued, and > epoll/poll/select would unexpectedly behave differently. Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE flag for BPF. > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > features including cmsg mode. But it will not be used in bpf mode. > > For simplicity, the two uses of the API are best kept identical. If > there is a technical reason why BPF has to diverge from established > behavior, this needs to be explicitly called out in the commit > message. > > Also, if you want to extend the API for BPF in the future, good to > call this out now and ideally extensions will apply to both, to > maintain a uniform API. As you said, I also agree on "two uses of the API are best kept identical". > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > The first is interesting. For application timestamping, this was > never needed, as the application can just call clock_gettime before > sendmsg. Yes, we could add it after we finish the current series. I'm going to write it down on my todo list. > > In general, additional measurement points are not only useful if the > interval between is not constant. So far, we have seen no need for > any additional points. Taking a snapshot of tcp_write_xmit() could be useful especially when the skb is not transmitted due to nagle algorithm. > > > * On skb state: > > > > For now, is there thing we can explore to share in the skb_shared_info? > > skb_shinfo space is at a premium. I don't think we can justify two > extra fields just for this use case. > > > My initial thought is just to reuse these fields in skb. It can work > > without interfering one another. > > I'm skeptical that two methods can work at the same time. If they are > started at different times, their sk_tskey will be different, for one. Right, sk_tskey is the only special one that I will take care of. Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be reused. > > There may be workarounds. Maybe BPF can store its state in some BPF > specific field, indeed. Or perhaps it can store per-sk shadow state > that resolves the conflict. For instance, the offset between sk_tskey > and bpf_tskey. Things could get complicated in the future if we want to unified the final tskey value for all the cases. Since 1) the value of shinfo->tskey depends on skb seq and len, 2) the final tskey output is the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in struct sock and related output logic for bpf without caring if it's the same as sk_tskey. That said, the outputs from two methods differ. Do you think it is acceptable? It could be simpler and easier if we keep them identical. Again, thanks for your long conclusion and every review. Thanks, Jason
Jason Xing wrote: > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Willem suggested that we use a static key to control. The advantage > > > is that we will not affect the existing applications at all if we > > > don't load BPF program. > > > > > > In this patch, except the static key, I also add one logic that is > > > used to test if the socket has enabled its tsflags in order to > > > support bpf logic to allow both cases to happen at the same time. > > > Or else, the skb carring related timestamp flag doesn't know which > > > way of printing is desirable. > > > > > > One thing important is this patch allows print from both applications > > > and bpf program at the same time. Now we have three kinds of print: > > > 1) only BPF program prints > > > 2) only application program prints > > > 3) both can print without side effect > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > Getting back to this thread. It is long, instead of responding to > > multiple messages, let me combine them in a single response. > > Thank you so much! > > > > > > > * On future extensions: > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > development path, before we can merge TCP. > > > > Similarly, hardware timestamps need not be supported from the start, > > but must clearly be supportable. > > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG) > could work well for both protos. Let me give it a try first. Great, thanks. > > > > > > * On queueing packets to userspace: > > > > > > the current behavior is to just queue to the sk_error_queue as long > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > > > is regardless of the sk_tsflags. " > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > read the skb from the errqueue but are not able to parse the > > > timestamps > > Above is what I tried to explain how the application timestamping > feature works, not what I tried to implement for the BPF extension. > > > > > Before queuing a packet to userspace on the error queue, the relevant > > reporting flag is always tested. sock_recv_timestamp has: > > > > /* > > * generate control messages if > > * - receive time stamping in software requested > > * - software time stamp available and wanted > > * - hardware time stamps available and wanted > > */ > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > (hwtstamps->hwtstamp && > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > __sock_recv_timestamp(msg, sk, skb); > > > > Otherwise applications could get error messages queued, and > > epoll/poll/select would unexpectedly behave differently. > > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE > flag for BPF. Can you elaborate on this? This sounds like it would go against the intent to have the two versions of the API (application and BPF) be equivalent. > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > features including cmsg mode. But it will not be used in bpf mode. > > > > For simplicity, the two uses of the API are best kept identical. If > > there is a technical reason why BPF has to diverge from established > > behavior, this needs to be explicitly called out in the commit > > message. > > > > Also, if you want to extend the API for BPF in the future, good to > > call this out now and ideally extensions will apply to both, to > > maintain a uniform API. > > As you said, I also agree on "two uses of the API are best kept identical". > > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > The first is interesting. For application timestamping, this was > > never needed, as the application can just call clock_gettime before > > sendmsg. > > Yes, we could add it after we finish the current series. I'm going to > write it down on my todo list. > > > > > In general, additional measurement points are not only useful if the > > interval between is not constant. So far, we have seen no need for > > any additional points. > > Taking a snapshot of tcp_write_xmit() could be useful especially when > the skb is not transmitted due to nagle algorithm. > > > > > > > * On skb state: > > > > > > For now, is there thing we can explore to share in the skb_shared_info? > > > > skb_shinfo space is at a premium. I don't think we can justify two > > extra fields just for this use case. > > > > > My initial thought is just to reuse these fields in skb. It can work > > > without interfering one another. > > > > I'm skeptical that two methods can work at the same time. If they are > > started at different times, their sk_tskey will be different, for one. > > Right, sk_tskey is the only special one that I will take care of. > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be > reused. > > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > specific field, indeed. Or perhaps it can store per-sk shadow state > > that resolves the conflict. For instance, the offset between sk_tskey > > and bpf_tskey. > > Things could get complicated in the future if we want to unified the > final tskey value for all the cases. Since 1) the value of > shinfo->tskey depends on skb seq and len, 2) the final tskey output is > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in > struct sock and related output logic for bpf without caring if it's > the same as sk_tskey. I think we can add fields to struct sock without too much concern. Adding fields to sk_buff or skb_shared_info would be more difficult. > That said, the outputs from two methods differ. Do you think it is > acceptable? It could be simpler and easier if we keep them identical. Since we can only have one skb_shared_info.tskey, if both user and bpf request OPT_ID, starting at different times, then we will have two bases against which to compute the difference. Having two fields in struct sock should suffice.
On Mon, Oct 21, 2024 at 10:49 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Willem suggested that we use a static key to control. The advantage > > > > is that we will not affect the existing applications at all if we > > > > don't load BPF program. > > > > > > > > In this patch, except the static key, I also add one logic that is > > > > used to test if the socket has enabled its tsflags in order to > > > > support bpf logic to allow both cases to happen at the same time. > > > > Or else, the skb carring related timestamp flag doesn't know which > > > > way of printing is desirable. > > > > > > > > One thing important is this patch allows print from both applications > > > > and bpf program at the same time. Now we have three kinds of print: > > > > 1) only BPF program prints > > > > 2) only application program prints > > > > 3) both can print without side effect > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > Getting back to this thread. It is long, instead of responding to > > > multiple messages, let me combine them in a single response. > > > > Thank you so much! > > > > > > > > > > > * On future extensions: > > > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > > development path, before we can merge TCP. > > > > > > Similarly, hardware timestamps need not be supported from the start, > > > but must clearly be supportable. > > > > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing > > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG) > > could work well for both protos. Let me give it a try first. > > Great, thanks. > > > > > > > > > > * On queueing packets to userspace: > > > > > > > > the current behavior is to just queue to the sk_error_queue as long > > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > > > > is regardless of the sk_tsflags. " > > > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > > read the skb from the errqueue but are not able to parse the > > > > timestamps > > > > Above is what I tried to explain how the application timestamping > > feature works, not what I tried to implement for the BPF extension. > > > > > > > > Before queuing a packet to userspace on the error queue, the relevant > > > reporting flag is always tested. sock_recv_timestamp has: > > > > > > /* > > > * generate control messages if > > > * - receive time stamping in software requested > > > * - software time stamp available and wanted > > > * - hardware time stamps available and wanted > > > */ > > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > > (hwtstamps->hwtstamp && > > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > > __sock_recv_timestamp(msg, sk, skb); > > > > > > Otherwise applications could get error messages queued, and > > > epoll/poll/select would unexpectedly behave differently. > > > > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE > > flag for BPF. > > Can you elaborate on this? This sounds like it would go against the > intent to have the two versions of the API (application and BPF) be > equivalent. Oh, I see what you mean here. I have no preference. Well, I can add this report flag into the BPF extension like how application timestamping works. > > > > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > > features including cmsg mode. But it will not be used in bpf mode. > > > > > > For simplicity, the two uses of the API are best kept identical. If > > > there is a technical reason why BPF has to diverge from established > > > behavior, this needs to be explicitly called out in the commit > > > message. > > > > > > Also, if you want to extend the API for BPF in the future, good to > > > call this out now and ideally extensions will apply to both, to > > > maintain a uniform API. > > > > As you said, I also agree on "two uses of the API are best kept identical". > > > > > > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > > > The first is interesting. For application timestamping, this was > > > never needed, as the application can just call clock_gettime before > > > sendmsg. > > > > Yes, we could add it after we finish the current series. I'm going to > > write it down on my todo list. > > > > > > > > In general, additional measurement points are not only useful if the > > > interval between is not constant. So far, we have seen no need for > > > any additional points. > > > > Taking a snapshot of tcp_write_xmit() could be useful especially when > > the skb is not transmitted due to nagle algorithm. > > > > > > > > > > > * On skb state: > > > > > > > > For now, is there thing we can explore to share in the skb_shared_info? > > > > > > skb_shinfo space is at a premium. I don't think we can justify two > > > extra fields just for this use case. > > > > > > > My initial thought is just to reuse these fields in skb. It can work > > > > without interfering one another. > > > > > > I'm skeptical that two methods can work at the same time. If they are > > > started at different times, their sk_tskey will be different, for one. > > > > Right, sk_tskey is the only special one that I will take care of. > > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be > > reused. > > > > > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > > specific field, indeed. Or perhaps it can store per-sk shadow state > > > that resolves the conflict. For instance, the offset between sk_tskey > > > and bpf_tskey. > > > > Things could get complicated in the future if we want to unified the > > final tskey value for all the cases. Since 1) the value of > > shinfo->tskey depends on skb seq and len, 2) the final tskey output is > > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in > > struct sock and related output logic for bpf without caring if it's > > the same as sk_tskey. > > I think we can add fields to struct sock without too much concern. > Adding fields to sk_buff or skb_shared_info would be more difficult. Got it:) > > > That said, the outputs from two methods differ. Do you think it is > > acceptable? It could be simpler and easier if we keep them identical. > > Since we can only have one skb_shared_info.tskey, if both user and bpf > request OPT_ID, starting at different times, then we will have two > bases against which to compute the difference. Having two fields in > struct sock should suffice. Exactly! I will do it. Thanks, Jason
On 10/20/24 2:51 PM, Willem de Bruijn wrote: > Jason Xing wrote: >> From: Jason Xing <kernelxing@tencent.com> >> >> Willem suggested that we use a static key to control. The advantage >> is that we will not affect the existing applications at all if we >> don't load BPF program. >> >> In this patch, except the static key, I also add one logic that is >> used to test if the socket has enabled its tsflags in order to >> support bpf logic to allow both cases to happen at the same time. >> Or else, the skb carring related timestamp flag doesn't know which >> way of printing is desirable. >> >> One thing important is this patch allows print from both applications >> and bpf program at the same time. Now we have three kinds of print: >> 1) only BPF program prints >> 2) only application program prints >> 3) both can print without side effect >> >> Signed-off-by: Jason Xing <kernelxing@tencent.com> > > Getting back to this thread. It is long, instead of responding to > multiple messages, let me combine them in a single response. > > > * On future extensions: > > +1 that the UDP case, and datagrams more broadly, must have a clear > development path, before we can merge TCP. > > Similarly, hardware timestamps need not be supported from the start, > but must clearly be supportable. > > > * On queueing packets to userspace: > >>> the current behavior is to just queue to the sk_error_queue as long >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it >>> is regardless of the sk_tsflags. " > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can >> read the skb from the errqueue but are not able to parse the >> timestamps > > Before queuing a packet to userspace on the error queue, the relevant > reporting flag is always tested. sock_recv_timestamp has: > > /* > * generate control messages if > * - receive time stamping in software requested > * - software time stamp available and wanted > * - hardware time stamps available and wanted > */ > if (sock_flag(sk, SOCK_RCVTSTAMP) || > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > (hwtstamps->hwtstamp && > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > __sock_recv_timestamp(msg, sk, skb); > > Otherwise applications could get error messages queued, and > epoll/poll/select would unexpectedly behave differently. I just tried the following diff to remove setsockopt from txtimestamp.c and run "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue with only cmsg flag. I did a printk in __skb_tstamp_tx to ensure the sk->sk_tsflags is empty also. diff --git i/tools/testing/selftests/net/txtimestamp.c w/tools/testing/selftests/net/txtimestamp.c index dae91eb97d69..5d9d2773b076 100644 --- i/tools/testing/selftests/net/txtimestamp.c +++ w/tools/testing/selftests/net/txtimestamp.c @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len) for (cm = CMSG_FIRSTHDR(msg); cm && cm->cmsg_len; cm = CMSG_NXTHDR(msg, cm)) { + printf("cm->cmsg_level %d cm->cmsg_type %d\n", + cm->cmsg_level, cm->cmsg_type); if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_TIMESTAMPING) { tss = (void *) CMSG_DATA(cm); @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len) if (batch > 1) { fprintf(stderr, "batched %d timestamps\n", batch); } else if (!batch) { - fprintf(stderr, "Failed to report timestamps\n"); + fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len); test_failed = true; } } @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt) if (cfg_loop_nodata) sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY; + (void)sock_opt; +/* if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (char *) &sock_opt, sizeof(sock_opt))) error(1, 0, "setsockopt timestamping"); +*/ for (i = 0; i < cfg_num_pkts; i++) { memset(&msg, 0, sizeof(msg)); > >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING >> features including cmsg mode. But it will not be used in bpf mode. > > For simplicity, the two uses of the API are best kept identical. If > there is a technical reason why BPF has to diverge from established > behavior, this needs to be explicitly called out in the commit > message. SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be passed directly to the bpf if needed without extra cost. The same probably goes for SOF_TIMESTAMPING_OPT_PKTINFO. SOF_TIMESTAMPING_SOFTWARE does not seem to be useful either. I think only a subset of SOF_* will be supported, probably only the TX_* and RX_* ones. > > Also, if you want to extend the API for BPF in the future, good to > call this out now and ideally extensions will apply to both, to > maintain a uniform API. > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > The first is interesting. For application timestamping, this was > never needed, as the application can just call clock_gettime before > sendmsg. > > In general, additional measurement points are not only useful if the > interval between is not constant. So far, we have seen no need for > any additional points. > > > * On skb state: > >>> For now, is there thing we can explore to share in the skb_shared_info? > > skb_shinfo space is at a premium. I don't think we can justify two > extra fields just for this use case. > >> My initial thought is just to reuse these fields in skb. It can work >> without interfering one another. > > I'm skeptical that two methods can work at the same time. If they are > started at different times, their sk_tskey will be different, for one. For the skb's tx_flags, Jason seems to be able to figure out by only using the new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in tx_flags. I am also not very positive on the skb's tskey for now. Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I think that approach was stalled because the tx_flags could be changed by the netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone in the tcp stack will still break things if the hwtstamps is reused for other purpose? > > There may be workarounds. Maybe BPF can store its state in some BPF > specific field, indeed. Or perhaps it can store per-sk shadow state > that resolves the conflict. For instance, the offset between sk_tskey > and bpf_tskey. I have also been proposing to explore other way for the key since bpf has direct access to the skb (also the sk, bpf prog can store data in the sk). The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack comes back, it can also learn the ack seq no. Does it help? It will be harder to use because it probably needs to store this info in the bpf map (or in the bpf sk storage). However, if it needs to learn the timestamp at the tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored somewhere also. Either in a bpf map or in a bpf sk storage. SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use SOF_TIMESTAMPING_OPT_ID.
On Tue, Oct 22, 2024 at 8:53 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 10/20/24 2:51 PM, Willem de Bruijn wrote: > > Jason Xing wrote: > >> From: Jason Xing <kernelxing@tencent.com> > >> > >> Willem suggested that we use a static key to control. The advantage > >> is that we will not affect the existing applications at all if we > >> don't load BPF program. > >> > >> In this patch, except the static key, I also add one logic that is > >> used to test if the socket has enabled its tsflags in order to > >> support bpf logic to allow both cases to happen at the same time. > >> Or else, the skb carring related timestamp flag doesn't know which > >> way of printing is desirable. > >> > >> One thing important is this patch allows print from both applications > >> and bpf program at the same time. Now we have three kinds of print: > >> 1) only BPF program prints > >> 2) only application program prints > >> 3) both can print without side effect > >> > >> Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > Getting back to this thread. It is long, instead of responding to > > multiple messages, let me combine them in a single response. > > > > > > * On future extensions: > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > development path, before we can merge TCP. > > > > Similarly, hardware timestamps need not be supported from the start, > > but must clearly be supportable. > > > > > > * On queueing packets to userspace: > > > >>> the current behavior is to just queue to the sk_error_queue as long > >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > >>> is regardless of the sk_tsflags. " > > > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > >> read the skb from the errqueue but are not able to parse the > >> timestamps > > > > Before queuing a packet to userspace on the error queue, the relevant > > reporting flag is always tested. sock_recv_timestamp has: > > > > /* > > * generate control messages if > > * - receive time stamping in software requested > > * - software time stamp available and wanted > > * - hardware time stamps available and wanted > > */ > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > (hwtstamps->hwtstamp && > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > __sock_recv_timestamp(msg, sk, skb); > > > > Otherwise applications could get error messages queued, and > > epoll/poll/select would unexpectedly behave differently. > > I just tried the following diff to remove setsockopt from txtimestamp.c and run > "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue > with only cmsg flag. I did a printk in __skb_tstamp_tx to ensure the > sk->sk_tsflags is empty also. > > diff --git i/tools/testing/selftests/net/txtimestamp.c > w/tools/testing/selftests/net/txtimestamp.c > index dae91eb97d69..5d9d2773b076 100644 > --- i/tools/testing/selftests/net/txtimestamp.c > +++ w/tools/testing/selftests/net/txtimestamp.c > @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > for (cm = CMSG_FIRSTHDR(msg); > cm && cm->cmsg_len; > cm = CMSG_NXTHDR(msg, cm)) { > + printf("cm->cmsg_level %d cm->cmsg_type %d\n", > + cm->cmsg_level, cm->cmsg_type); > if (cm->cmsg_level == SOL_SOCKET && > cm->cmsg_type == SCM_TIMESTAMPING) { > tss = (void *) CMSG_DATA(cm); > @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > if (batch > 1) { > fprintf(stderr, "batched %d timestamps\n", batch); > } else if (!batch) { > - fprintf(stderr, "Failed to report timestamps\n"); > + fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len); > test_failed = true; > } > } > @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt) > if (cfg_loop_nodata) > sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY; > > + (void)sock_opt; > +/* > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > (char *) &sock_opt, sizeof(sock_opt))) > error(1, 0, "setsockopt timestamping"); > +*/ > > for (i = 0; i < cfg_num_pkts; i++) { > memset(&msg, 0, sizeof(msg)); > > > >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > >> features including cmsg mode. But it will not be used in bpf mode. > > > > For simplicity, the two uses of the API are best kept identical. If > > there is a technical reason why BPF has to diverge from established > > behavior, this needs to be explicitly called out in the commit > > message. > > SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be > passed directly to the bpf if needed without extra cost. The same probably goes > for SOF_TIMESTAMPING_OPT_PKTINFO. Right, they will not be supported. > SOF_TIMESTAMPING_SOFTWARE does not seem to be > useful either. I think only a subset of SOF_* will be supported, probably only I had a discussion with Willem on this point yesterday. If I understand what Willem was thinking correctly, he doesn't expect users' behaviors to change too much. As I said previously, I have no strong preference. Whether keeping this report flag or not doesn't affect the core logic for BPF extension. > the TX_* and RX_* ones. > > > > > Also, if you want to extend the API for BPF in the future, good to > > call this out now and ideally extensions will apply to both, to > > maintain a uniform API. > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > The first is interesting. For application timestamping, this was > > never needed, as the application can just call clock_gettime before > > sendmsg. > > > > In general, additional measurement points are not only useful if the > > interval between is not constant. So far, we have seen no need for > > any additional points. > > > > > > * On skb state: > > > >>> For now, is there thing we can explore to share in the skb_shared_info? > > > > skb_shinfo space is at a premium. I don't think we can justify two > > extra fields just for this use case. > > > >> My initial thought is just to reuse these fields in skb. It can work > >> without interfering one another. > > > > I'm skeptical that two methods can work at the same time. If they are > > started at different times, their sk_tskey will be different, for one. > > For the skb's tx_flags, Jason seems to be able to figure out by only using the > new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in > tx_flags. Let me try, then we'll see if it works. > > I am also not very positive on the skb's tskey for now. For TCP, the final output of tskey that is reflected to users is the result of this calculation "shinfo->tskey - $KEY". $KEY is the base which could be either sk->sk_tskey or sk->sk_tskey_bpf. They are initialized at different points. You can see the calculation in __skb_complete_tx_timestamp(): serr->ee.ee_data = skb_shinfo(skb)->tskey; serr->ee.ee_data -= atomic_read(&sk->sk_tskey); With that said, we will keep two different $KEY to let each feature (bpf SO_TIMESTAMPING or application SO_TIMESTAMPING) work respectively, which also means, we probably will see two different tskeys when two methods work parallely. It's fine because as long as we can make sure the final tskeys are consistent in each feature. tskey is used to identify which sendmsg() the skb should belong to. It also works for UDP proto. > > Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping > the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I > think that approach was stalled because the tx_flags could be changed by the > netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the > skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed > until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone > in the tcp stack will still break things if the hwtstamps is reused for other > purpose? > > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > specific field, indeed. Or perhaps it can store per-sk shadow state > > that resolves the conflict. For instance, the offset between sk_tskey > > and bpf_tskey. > > I have also been proposing to explore other way for the key since bpf has direct > access to the skb (also the sk, bpf prog can store data in the sk). > > The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack > comes back, it can also learn the ack seq no. Does it help? It will be harder to > use because it probably needs to store this info in the bpf map (or in the bpf > sk storage). However, if it needs to learn the timestamp at the > tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored > somewhere also. Either in a bpf map or in a bpf sk storage. Thanks for the idea. But please see the above comment, we could keep the logic as simple as it is :) > > SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. > it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use > SOF_TIMESTAMPING_OPT_ID. Interesting. Thanks, Jason
Martin KaFai Lau wrote: > On 10/20/24 2:51 PM, Willem de Bruijn wrote: > > Jason Xing wrote: > >> From: Jason Xing <kernelxing@tencent.com> > >> > >> Willem suggested that we use a static key to control. The advantage > >> is that we will not affect the existing applications at all if we > >> don't load BPF program. > >> > >> In this patch, except the static key, I also add one logic that is > >> used to test if the socket has enabled its tsflags in order to > >> support bpf logic to allow both cases to happen at the same time. > >> Or else, the skb carring related timestamp flag doesn't know which > >> way of printing is desirable. > >> > >> One thing important is this patch allows print from both applications > >> and bpf program at the same time. Now we have three kinds of print: > >> 1) only BPF program prints > >> 2) only application program prints > >> 3) both can print without side effect > >> > >> Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > Getting back to this thread. It is long, instead of responding to > > multiple messages, let me combine them in a single response. > > > > > > * On future extensions: > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > development path, before we can merge TCP. > > > > Similarly, hardware timestamps need not be supported from the start, > > but must clearly be supportable. > > > > > > * On queueing packets to userspace: > > > >>> the current behavior is to just queue to the sk_error_queue as long > >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > >>> is regardless of the sk_tsflags. " > > > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > >> read the skb from the errqueue but are not able to parse the > >> timestamps > > > > Before queuing a packet to userspace on the error queue, the relevant > > reporting flag is always tested. sock_recv_timestamp has: > > > > /* > > * generate control messages if > > * - receive time stamping in software requested > > * - software time stamp available and wanted > > * - hardware time stamps available and wanted > > */ > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > (hwtstamps->hwtstamp && > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > __sock_recv_timestamp(msg, sk, skb); > > > > Otherwise applications could get error messages queued, and > > epoll/poll/select would unexpectedly behave differently. > > I just tried the following diff to remove setsockopt from txtimestamp.c and run > "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue > with only cmsg flag. That it surprising and against the API intent as I understand it. Let me reproduce and take a closer look. > I did a printk in __skb_tstamp_tx to ensure the > sk->sk_tsflags is empty also. > > diff --git i/tools/testing/selftests/net/txtimestamp.c > w/tools/testing/selftests/net/txtimestamp.c > index dae91eb97d69..5d9d2773b076 100644 > --- i/tools/testing/selftests/net/txtimestamp.c > +++ w/tools/testing/selftests/net/txtimestamp.c > @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > for (cm = CMSG_FIRSTHDR(msg); > cm && cm->cmsg_len; > cm = CMSG_NXTHDR(msg, cm)) { > + printf("cm->cmsg_level %d cm->cmsg_type %d\n", > + cm->cmsg_level, cm->cmsg_type); > if (cm->cmsg_level == SOL_SOCKET && > cm->cmsg_type == SCM_TIMESTAMPING) { > tss = (void *) CMSG_DATA(cm); > @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > if (batch > 1) { > fprintf(stderr, "batched %d timestamps\n", batch); > } else if (!batch) { > - fprintf(stderr, "Failed to report timestamps\n"); > + fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len); > test_failed = true; > } > } > @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt) > if (cfg_loop_nodata) > sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY; > > + (void)sock_opt; > +/* > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > (char *) &sock_opt, sizeof(sock_opt))) > error(1, 0, "setsockopt timestamping"); > +*/ > > for (i = 0; i < cfg_num_pkts; i++) { > memset(&msg, 0, sizeof(msg)); > > > >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > >> features including cmsg mode. But it will not be used in bpf mode. > > > > For simplicity, the two uses of the API are best kept identical. If > > there is a technical reason why BPF has to diverge from established > > behavior, this needs to be explicitly called out in the commit > > message. > > SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be > passed directly to the bpf if needed without extra cost. The same probably goes > for SOF_TIMESTAMPING_OPT_PKTINFO. SOF_TIMESTAMPING_SOFTWARE does not seem to be > useful either. I think only a subset of SOF_* will be supported, probably only > the TX_* and RX_* ones. > > > > > Also, if you want to extend the API for BPF in the future, good to > > call this out now and ideally extensions will apply to both, to > > maintain a uniform API. > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > The first is interesting. For application timestamping, this was > > never needed, as the application can just call clock_gettime before > > sendmsg. > > > > In general, additional measurement points are not only useful if the > > interval between is not constant. So far, we have seen no need for > > any additional points. > > > > > > * On skb state: > > > >>> For now, is there thing we can explore to share in the skb_shared_info? > > > > skb_shinfo space is at a premium. I don't think we can justify two > > extra fields just for this use case. > > > >> My initial thought is just to reuse these fields in skb. It can work > >> without interfering one another. > > > > I'm skeptical that two methods can work at the same time. If they are > > started at different times, their sk_tskey will be different, for one. > > For the skb's tx_flags, Jason seems to be able to figure out by only using the > new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in > tx_flags. > > I am also not very positive on the skb's tskey for now. > > Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping > the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I > think that approach was stalled because the tx_flags could be changed by the > netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the > skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed > until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone > in the tcp stack will still break things if the hwtstamps is reused for other > purpose? True. I think on Tx hwtstamps is only used on the path from the driver tx completion handler to when it calls skb_tstamp_tx. It does not even really have to be an skb field. The first driver cscope happens to point me to indeed just allocates it on the stack: tsnep_tx_poll. > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > specific field, indeed. Or perhaps it can store per-sk shadow state > > that resolves the conflict. For instance, the offset between sk_tskey > > and bpf_tskey. > > I have also been proposing to explore other way for the key since bpf has direct > access to the skb (also the sk, bpf prog can store data in the sk). > > The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack > comes back, it can also learn the ack seq no. Does it help? It will be harder to > use because it probably needs to store this info in the bpf map (or in the bpf > sk storage). However, if it needs to learn the timestamp at the > tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored > somewhere also. Either in a bpf map or in a bpf sk storage. > > SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. > it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use > SOF_TIMESTAMPING_OPT_ID.
Willem de Bruijn wrote: > Martin KaFai Lau wrote: > > On 10/20/24 2:51 PM, Willem de Bruijn wrote: > > > Jason Xing wrote: > > >> From: Jason Xing <kernelxing@tencent.com> > > >> > > >> Willem suggested that we use a static key to control. The advantage > > >> is that we will not affect the existing applications at all if we > > >> don't load BPF program. > > >> > > >> In this patch, except the static key, I also add one logic that is > > >> used to test if the socket has enabled its tsflags in order to > > >> support bpf logic to allow both cases to happen at the same time. > > >> Or else, the skb carring related timestamp flag doesn't know which > > >> way of printing is desirable. > > >> > > >> One thing important is this patch allows print from both applications > > >> and bpf program at the same time. Now we have three kinds of print: > > >> 1) only BPF program prints > > >> 2) only application program prints > > >> 3) both can print without side effect > > >> > > >> Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > Getting back to this thread. It is long, instead of responding to > > > multiple messages, let me combine them in a single response. > > > > > > > > > * On future extensions: > > > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > > development path, before we can merge TCP. > > > > > > Similarly, hardware timestamps need not be supported from the start, > > > but must clearly be supportable. > > > > > > > > > * On queueing packets to userspace: > > > > > >>> the current behavior is to just queue to the sk_error_queue as long > > >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > >>> is regardless of the sk_tsflags. " > > > > > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > >> read the skb from the errqueue but are not able to parse the > > >> timestamps > > > > > > Before queuing a packet to userspace on the error queue, the relevant > > > reporting flag is always tested. sock_recv_timestamp has: > > > > > > /* > > > * generate control messages if > > > * - receive time stamping in software requested > > > * - software time stamp available and wanted > > > * - hardware time stamps available and wanted > > > */ > > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > > (hwtstamps->hwtstamp && > > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > > __sock_recv_timestamp(msg, sk, skb); > > > > > > Otherwise applications could get error messages queued, and > > > epoll/poll/select would unexpectedly behave differently. > > > > I just tried the following diff to remove setsockopt from txtimestamp.c and run > > "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue > > with only cmsg flag. > > That it surprising and against the API intent as I understand it. > Let me reproduce and take a closer look. Interesting. I guess my interpretation was wrong. The reporting flags prevent reporting of the timestamp, but not queuing of the skb on the error queue. Even if the only purpose is to report a timestamp. It goes back until well before all the API extensions. At least v3.6. It still does suppress the timestamp itself if the relevant reporting flag, SOF_TIMESTAMPING_SOFTWARE or SOF_TIMESTAMPING_RAW_HARDWARE, is not set. So BPF should really still match that, I guess.
diff --git a/include/net/sock.h b/include/net/sock.h index 66ecd78f1dfe..b7c51b95c92d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) void sock_def_readable(struct sock *sk); int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control); void sock_set_timestamp(struct sock *sk, int optname, bool valbool); int sock_get_timestamping(struct so_timestamping *timestamping, sockptr_t optval, unsigned int optlen); diff --git a/net/core/filter.c b/net/core/filter.c index 996426095bd9..08135f538c99 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control); + static int bpf_sock_set_timestamping(struct sock *sk, struct so_timestamping *timestamping) { @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk, return -EINVAL; WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags); + static_branch_enable(&bpf_tstamp_control); return 0; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f36eb9daa31a..d0f912f1ff7b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, } EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype) +{ + u32 testflag; + + switch (tstype) { + case SCM_TSTAMP_SCHED: + testflag = SOF_TIMESTAMPING_TX_SCHED; + break; + case SCM_TSTAMP_SND: + testflag = SOF_TIMESTAMPING_TX_SOFTWARE; + break; + case SCM_TSTAMP_ACK: + testflag = SOF_TIMESTAMPING_TX_ACK; + break; + default: + return false; + } + if (tsflags & testflag) + return true; + + return false; +} + static void skb_tstamp_tx_output(struct sk_buff *orig_skb, const struct sk_buff *ack_skb, struct skb_shared_hwtstamps *hwtstamps, @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, if (!skb_may_tx_timestamp(sk, tsonly)) return; + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) + return; + if (tsonly) { #ifdef CONFIG_INET if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) && @@ -5593,6 +5619,15 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, __skb_complete_tx_timestamp(skb, sk, tstype, opt_stats); } +static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype) +{ + u32 tsflags; + + tsflags = READ_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR]); + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) + return; +} + void __skb_tstamp_tx(struct sk_buff *orig_skb, const struct sk_buff *ack_skb, struct skb_shared_hwtstamps *hwtstamps, @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, if (!sk) return; + if (static_branch_unlikely(&bpf_tstamp_control)) + bpf_skb_tstamp_tx_output(sk, tstype); + skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); } EXPORT_SYMBOL_GPL(__skb_tstamp_tx);