diff mbox series

[net-next,v2,04/12] net-timestamp: add static key to control the whole bpf extension

Message ID 20241012040651.95616-5-kerneljasonxing@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/build_tools success Errors and warnings before: 109 (+0) this patch: 109 (+0)
netdev/cc_maintainers warning 1 maintainers not CCed: horms@kernel.org
netdev/build_clang success Errors and warnings before: 36 this patch: 36
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2609 this patch: 2609
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 84 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 64 this patch: 64
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Xing Oct. 12, 2024, 4:06 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

Willem suggested that we use a static key to control. The advantage
is that we will not affect the existing applications at all if we
don't load BPF program.

In this patch, except the static key, I also add one logic that is
used to test if the socket has enabled its tsflags in order to
support bpf logic to allow both cases to happen at the same time.
Or else, the skb carring related timestamp flag doesn't know which
way of printing is desirable.

One thing important is this patch allows print from both applications
and bpf program at the same time. Now we have three kinds of print:
1) only BPF program prints
2) only application program prints
3) both can print without side effect

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h |  1 +
 net/core/filter.c  |  3 +++
 net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

Comments

Willem de Bruijn Oct. 15, 2024, 1:36 a.m. UTC | #1
Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Willem suggested that we use a static key to control. The advantage
> is that we will not affect the existing applications at all if we
> don't load BPF program.
> 
> In this patch, except the static key, I also add one logic that is
> used to test if the socket has enabled its tsflags in order to
> support bpf logic to allow both cases to happen at the same time.

These two features are unrelated, should probably be separate patches.

> Or else, the skb carring related timestamp flag doesn't know which
> way of printing is desirable.
> 
> One thing important is this patch allows print from both applications
> and bpf program at the same time. Now we have three kinds of print:
> 1) only BPF program prints
> 2) only application program prints
> 3) both can print without side effect
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
Jason Xing Oct. 15, 2024, 2:25 a.m. UTC | #2
On Tue, Oct 15, 2024 at 9:36 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Willem suggested that we use a static key to control. The advantage
> > is that we will not affect the existing applications at all if we
> > don't load BPF program.
> >
> > In this patch, except the static key, I also add one logic that is
> > used to test if the socket has enabled its tsflags in order to
> > support bpf logic to allow both cases to happen at the same time.
>
> These two features are unrelated, should probably be separate patches.

Will do it, thanks.
Martin KaFai Lau Oct. 16, 2024, 12:09 a.m. UTC | #3
On 10/11/24 9:06 PM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Willem suggested that we use a static key to control. The advantage
> is that we will not affect the existing applications at all if we
> don't load BPF program.
> 
> In this patch, except the static key, I also add one logic that is
> used to test if the socket has enabled its tsflags in order to
> support bpf logic to allow both cases to happen at the same time.
> Or else, the skb carring related timestamp flag doesn't know which
> way of printing is desirable.
> 
> One thing important is this patch allows print from both applications
> and bpf program at the same time. Now we have three kinds of print:
> 1) only BPF program prints
> 2) only application program prints
> 3) both can print without side effect
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   include/net/sock.h |  1 +
>   net/core/filter.c  |  3 +++
>   net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 42 insertions(+)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66ecd78f1dfe..b7c51b95c92d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
>   void sock_def_readable(struct sock *sk);
>   
>   int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
>   void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
>   int sock_get_timestamping(struct so_timestamping *timestamping,
>   			  sockptr_t optval, unsigned int optlen);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 996426095bd9..08135f538c99 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>   	.arg1_type      = ARG_PTR_TO_CTX,
>   };
>   
> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> +
>   static int bpf_sock_set_timestamping(struct sock *sk,
>   				     struct so_timestamping *timestamping)
>   {
> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
>   		return -EINVAL;
>   
>   	WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> +	static_branch_enable(&bpf_tstamp_control);

Not sure when is a good time to do static_branch_disable().

The bpf prog may be detached also. (IF) it ends up staying with the 
cgroup/sockops interface, it should depend on the existing static key in 
cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.

>   
>   	return 0;
>   }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f36eb9daa31a..d0f912f1ff7b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>   }
>   EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
>   
> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)

sk is unused.

> +{
> +	u32 testflag;
> +
> +	switch (tstype) {
> +	case SCM_TSTAMP_SCHED:

Instead of doing this translation,
is it easier to directly store the bpf prog desired ts"type" (i.e. the 
SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
or there is a specific need to keep the SOF_TIMESTAMPING_* value in
sk->sk_tsflags_bpf?

> +		testflag = SOF_TIMESTAMPING_TX_SCHED;
> +		break;
> +	case SCM_TSTAMP_SND:
> +		testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> +		break;
> +	case SCM_TSTAMP_ACK:
> +		testflag = SOF_TIMESTAMPING_TX_ACK;
> +		break;
> +	default:
> +		return false;
> +	}
> +	if (tsflags & testflag)
> +		return true;
> +
> +	return false;
> +}
> +
>   static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>   				 const struct sk_buff *ack_skb,
>   				 struct skb_shared_hwtstamps *hwtstamps,
> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>   	if (!skb_may_tx_timestamp(sk, tsonly))
>   		return;
>   
> +	if (!sk_tstamp_tx_flags(sk, tsflags, tstype))

This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.

My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual 
skb. Would it break? Is it the similar case on the skb tx_flags that Willem has 
mentioned in the patch 0's thread?

> +		return;
> +
>   	if (tsonly) {
>   #ifdef CONFIG_INET
>   		if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
> @@ -5593,6 +5619,15 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>   	__skb_complete_tx_timestamp(skb, sk, tstype, opt_stats);
>   }
>   
> +static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
> +{
> +	u32 tsflags;
> +
> +	tsflags = READ_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR]);
> +	if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> +		return;
> +}
> +
>   void __skb_tstamp_tx(struct sk_buff *orig_skb,
>   		     const struct sk_buff *ack_skb,
>   		     struct skb_shared_hwtstamps *hwtstamps,
> @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>   	if (!sk)
>   		return;
>   
> +	if (static_branch_unlikely(&bpf_tstamp_control))
> +		bpf_skb_tstamp_tx_output(sk, tstype);
> +
>   	skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
>   }
>   EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
Jason Xing Oct. 16, 2024, 1:04 a.m. UTC | #4
On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/11/24 9:06 PM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Willem suggested that we use a static key to control. The advantage
> > is that we will not affect the existing applications at all if we
> > don't load BPF program.
> >
> > In this patch, except the static key, I also add one logic that is
> > used to test if the socket has enabled its tsflags in order to
> > support bpf logic to allow both cases to happen at the same time.
> > Or else, the skb carring related timestamp flag doesn't know which
> > way of printing is desirable.
> >
> > One thing important is this patch allows print from both applications
> > and bpf program at the same time. Now we have three kinds of print:
> > 1) only BPF program prints
> > 2) only application program prints
> > 3) both can print without side effect
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   include/net/sock.h |  1 +
> >   net/core/filter.c  |  3 +++
> >   net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 42 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 66ecd78f1dfe..b7c51b95c92d 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> >   void sock_def_readable(struct sock *sk);
> >
> >   int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> > +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >   void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> >   int sock_get_timestamping(struct so_timestamping *timestamping,
> >                         sockptr_t optval, unsigned int optlen);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 996426095bd9..08135f538c99 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >       .arg1_type      = ARG_PTR_TO_CTX,
> >   };
> >
> > +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> > +
> >   static int bpf_sock_set_timestamping(struct sock *sk,
> >                                    struct so_timestamping *timestamping)
> >   {
> > @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> >               return -EINVAL;
> >
> >       WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> > +     static_branch_enable(&bpf_tstamp_control);
>
> Not sure when is a good time to do static_branch_disable().

Thanks for the review.

To be honest, I considered how to disable the static key. Like you
said, I failed to find a good chance that I can accurately disable it.

>
> The bpf prog may be detached also. (IF) it ends up staying with the
> cgroup/sockops interface, it should depend on the existing static key in
> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.

Are you suggesting that we need to remove the current static key? In
the previous thread, the reason why Willem came up with this idea is,
I think, to avoid affect the non-bpf timestamping feature.

>
> >
> >       return 0;
> >   }
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index f36eb9daa31a..d0f912f1ff7b 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >   }
> >   EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >
> > +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
>
> sk is unused.

Thanks for the careful check.

>
> > +{
> > +     u32 testflag;
> > +
> > +     switch (tstype) {
> > +     case SCM_TSTAMP_SCHED:
>
> Instead of doing this translation,
> is it easier to directly store the bpf prog desired ts"type" (i.e. the
> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> sk->sk_tsflags_bpf?

We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
SOF_TIMESTAMPING_OPT_ID, that we need to support.

>
> > +             testflag = SOF_TIMESTAMPING_TX_SCHED;
> > +             break;
> > +     case SCM_TSTAMP_SND:
> > +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> > +             break;
> > +     case SCM_TSTAMP_ACK:
> > +             testflag = SOF_TIMESTAMPING_TX_ACK;
> > +             break;
> > +     default:
> > +             return false;
> > +     }
> > +     if (tsflags & testflag)
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> >   static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >                                const struct sk_buff *ack_skb,
> >                                struct skb_shared_hwtstamps *hwtstamps,
> > @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >       if (!skb_may_tx_timestamp(sk, tsonly))
> >               return;
> >
> > +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
>
> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.

This test will be used in bpf and non-bpf cases. Because of this, we
can support BPF extension. In this function, if skb has tsflags but we
don't know which approach the user expects, sk_tstamp_tx_flags() can
help us.

>
> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> skb. Would it break?

Oh, you're right. I didn't support cmsg mode...

> Is it the similar case on the skb tx_flags that Willem has
> mentioned in the patch 0's thread?

Yes, but am I supposed to add new a bpf tx_flags in the struct sk_buff?

Thanks,
Jason
Jason Xing Oct. 16, 2024, 1:32 a.m. UTC | #5
On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 10/11/24 9:06 PM, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Willem suggested that we use a static key to control. The advantage
> > > is that we will not affect the existing applications at all if we
> > > don't load BPF program.
> > >
> > > In this patch, except the static key, I also add one logic that is
> > > used to test if the socket has enabled its tsflags in order to
> > > support bpf logic to allow both cases to happen at the same time.
> > > Or else, the skb carring related timestamp flag doesn't know which
> > > way of printing is desirable.
> > >
> > > One thing important is this patch allows print from both applications
> > > and bpf program at the same time. Now we have three kinds of print:
> > > 1) only BPF program prints
> > > 2) only application program prints
> > > 3) both can print without side effect
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >   include/net/sock.h |  1 +
> > >   net/core/filter.c  |  3 +++
> > >   net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 42 insertions(+)
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 66ecd78f1dfe..b7c51b95c92d 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> > >   void sock_def_readable(struct sock *sk);
> > >
> > >   int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> > > +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> > >   void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> > >   int sock_get_timestamping(struct so_timestamping *timestamping,
> > >                         sockptr_t optval, unsigned int optlen);
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 996426095bd9..08135f538c99 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> > >       .arg1_type      = ARG_PTR_TO_CTX,
> > >   };
> > >
> > > +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> > > +
> > >   static int bpf_sock_set_timestamping(struct sock *sk,
> > >                                    struct so_timestamping *timestamping)
> > >   {
> > > @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> > >               return -EINVAL;
> > >
> > >       WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> > > +     static_branch_enable(&bpf_tstamp_control);
> >
> > Not sure when is a good time to do static_branch_disable().
>
> Thanks for the review.
>
> To be honest, I considered how to disable the static key. Like you
> said, I failed to find a good chance that I can accurately disable it.
>
> >
> > The bpf prog may be detached also. (IF) it ends up staying with the
> > cgroup/sockops interface, it should depend on the existing static key in
> > cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
>
> Are you suggesting that we need to remove the current static key? In
> the previous thread, the reason why Willem came up with this idea is,
> I think, to avoid affect the non-bpf timestamping feature.
>
> >
> > >
> > >       return 0;
> > >   }
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index f36eb9daa31a..d0f912f1ff7b 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> > >   }
> > >   EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> > >
> > > +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
> >
> > sk is unused.
>
> Thanks for the careful check.
>
> >
> > > +{
> > > +     u32 testflag;
> > > +
> > > +     switch (tstype) {
> > > +     case SCM_TSTAMP_SCHED:
> >
> > Instead of doing this translation,
> > is it easier to directly store the bpf prog desired ts"type" (i.e. the
> > SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> > or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> > sk->sk_tsflags_bpf?
>
> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
> SOF_TIMESTAMPING_OPT_ID, that we need to support.
>
> >
> > > +             testflag = SOF_TIMESTAMPING_TX_SCHED;
> > > +             break;
> > > +     case SCM_TSTAMP_SND:
> > > +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> > > +             break;
> > > +     case SCM_TSTAMP_ACK:
> > > +             testflag = SOF_TIMESTAMPING_TX_ACK;
> > > +             break;
> > > +     default:
> > > +             return false;
> > > +     }
> > > +     if (tsflags & testflag)
> > > +             return true;
> > > +
> > > +     return false;
> > > +}
> > > +
> > >   static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > >                                const struct sk_buff *ack_skb,
> > >                                struct skb_shared_hwtstamps *hwtstamps,
> > > @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > >       if (!skb_may_tx_timestamp(sk, tsonly))
> > >               return;
> > >
> > > +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> >
> > This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
>
> This test will be used in bpf and non-bpf cases. Because of this, we
> can support BPF extension. In this function, if skb has tsflags but we
> don't know which approach the user expects, sk_tstamp_tx_flags() can
> help us.
>
> >
> > My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> > skb. Would it break?
>
> Oh, you're right. I didn't support cmsg mode...

I think I only need to test if it's in the bpf mode, or else let the
original way print the timestamp, which can solve the issue.
Martin KaFai Lau Oct. 16, 2024, 6:13 a.m. UTC | #6
On 10/15/24 6:32 PM, Jason Xing wrote:
> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 10/11/24 9:06 PM, Jason Xing wrote:
>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>
>>>> Willem suggested that we use a static key to control. The advantage
>>>> is that we will not affect the existing applications at all if we
>>>> don't load BPF program.
>>>>
>>>> In this patch, except the static key, I also add one logic that is
>>>> used to test if the socket has enabled its tsflags in order to
>>>> support bpf logic to allow both cases to happen at the same time.
>>>> Or else, the skb carring related timestamp flag doesn't know which
>>>> way of printing is desirable.
>>>>
>>>> One thing important is this patch allows print from both applications
>>>> and bpf program at the same time. Now we have three kinds of print:
>>>> 1) only BPF program prints
>>>> 2) only application program prints
>>>> 3) both can print without side effect
>>>>
>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>>> ---
>>>>    include/net/sock.h |  1 +
>>>>    net/core/filter.c  |  3 +++
>>>>    net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
>>>>    3 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
>>>> --- a/include/net/sock.h
>>>> +++ b/include/net/sock.h
>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
>>>>    void sock_def_readable(struct sock *sk);
>>>>
>>>>    int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>>    void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
>>>>    int sock_get_timestamping(struct so_timestamping *timestamping,
>>>>                          sockptr_t optval, unsigned int optlen);
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 996426095bd9..08135f538c99 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>>>>        .arg1_type      = ARG_PTR_TO_CTX,
>>>>    };
>>>>
>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>> +
>>>>    static int bpf_sock_set_timestamping(struct sock *sk,
>>>>                                     struct so_timestamping *timestamping)
>>>>    {
>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
>>>>                return -EINVAL;
>>>>
>>>>        WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
>>>> +     static_branch_enable(&bpf_tstamp_control);
>>>
>>> Not sure when is a good time to do static_branch_disable().
>>
>> Thanks for the review.
>>
>> To be honest, I considered how to disable the static key. Like you
>> said, I failed to find a good chance that I can accurately disable it.
>>
>>>
>>> The bpf prog may be detached also. (IF) it ends up staying with the
>>> cgroup/sockops interface, it should depend on the existing static key in
>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
>>
>> Are you suggesting that we need to remove the current static key? In
>> the previous thread, the reason why Willem came up with this idea is,
>> I think, to avoid affect the non-bpf timestamping feature.
>>
>>>
>>>>
>>>>        return 0;
>>>>    }
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index f36eb9daa31a..d0f912f1ff7b 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
>>>>
>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
>>>
>>> sk is unused.
>>
>> Thanks for the careful check.
>>
>>>
>>>> +{
>>>> +     u32 testflag;
>>>> +
>>>> +     switch (tstype) {
>>>> +     case SCM_TSTAMP_SCHED:
>>>
>>> Instead of doing this translation,
>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
>>> sk->sk_tsflags_bpf?
>>
>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
>>
>>>
>>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
>>>> +             break;
>>>> +     case SCM_TSTAMP_SND:
>>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
>>>> +             break;
>>>> +     case SCM_TSTAMP_ACK:
>>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
>>>> +             break;
>>>> +     default:
>>>> +             return false;
>>>> +     }
>>>> +     if (tsflags & testflag)
>>>> +             return true;
>>>> +
>>>> +     return false;
>>>> +}
>>>> +
>>>>    static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>                                 const struct sk_buff *ack_skb,
>>>>                                 struct skb_shared_hwtstamps *hwtstamps,
>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>        if (!skb_may_tx_timestamp(sk, tsonly))
>>>>                return;
>>>>
>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
>>>
>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
>>
>> This test will be used in bpf and non-bpf cases. Because of this, we
>> can support BPF extension. In this function, if skb has tsflags but we
>> don't know which approach the user expects, sk_tstamp_tx_flags() can
>> help us.
>>
>>>
>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
>>> skb. Would it break?
>>
>> Oh, you're right. I didn't support cmsg mode...
> 
> I think I only need to test if it's in the bpf mode, or else let the
> original way print the timestamp, which can solve the issue.

 From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk, 
SCM_TSTAMP_SCHED);":

int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
{
	/* ... */

	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);

	/* ... */
}

I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled 
the timestamping. I may have missed somewhere in the patch set that the skb's 
tx_flags is changed by sk->sk_tsflags_bpf alone?

I think a skb tskey is still desired (?), so eventually we want some spaces in 
the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending 
skb->data_meta usage outside of xdp and tc. I think here we want to have it 
available at the tx side to store the tx_flags and tskey but probably want them 
at a specific place/offset at the data_meta.

For now, is there thing we can explore to share in the skb_shared_info? Can the 
"struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey 
only at the "tx" side? There is already another union member. The hwtstamps 
should only be needed when the NIC is done sending?
Jason Xing Oct. 16, 2024, 6:30 a.m. UTC | #7
On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/15/24 6:32 PM, Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>
> >> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>
> >>> On 10/11/24 9:06 PM, Jason Xing wrote:
> >>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>
> >>>> Willem suggested that we use a static key to control. The advantage
> >>>> is that we will not affect the existing applications at all if we
> >>>> don't load BPF program.
> >>>>
> >>>> In this patch, except the static key, I also add one logic that is
> >>>> used to test if the socket has enabled its tsflags in order to
> >>>> support bpf logic to allow both cases to happen at the same time.
> >>>> Or else, the skb carring related timestamp flag doesn't know which
> >>>> way of printing is desirable.
> >>>>
> >>>> One thing important is this patch allows print from both applications
> >>>> and bpf program at the same time. Now we have three kinds of print:
> >>>> 1) only BPF program prints
> >>>> 2) only application program prints
> >>>> 3) both can print without side effect
> >>>>
> >>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>>> ---
> >>>>    include/net/sock.h |  1 +
> >>>>    net/core/filter.c  |  3 +++
> >>>>    net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
> >>>>    3 files changed, 42 insertions(+)
> >>>>
> >>>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>>> index 66ecd78f1dfe..b7c51b95c92d 100644
> >>>> --- a/include/net/sock.h
> >>>> +++ b/include/net/sock.h
> >>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> >>>>    void sock_def_readable(struct sock *sk);
> >>>>
> >>>>    int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> >>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>    void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> >>>>    int sock_get_timestamping(struct so_timestamping *timestamping,
> >>>>                          sockptr_t optval, unsigned int optlen);
> >>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>> index 996426095bd9..08135f538c99 100644
> >>>> --- a/net/core/filter.c
> >>>> +++ b/net/core/filter.c
> >>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >>>>        .arg1_type      = ARG_PTR_TO_CTX,
> >>>>    };
> >>>>
> >>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>> +
> >>>>    static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>                                     struct so_timestamping *timestamping)
> >>>>    {
> >>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>                return -EINVAL;
> >>>>
> >>>>        WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> >>>> +     static_branch_enable(&bpf_tstamp_control);
> >>>
> >>> Not sure when is a good time to do static_branch_disable().
> >>
> >> Thanks for the review.
> >>
> >> To be honest, I considered how to disable the static key. Like you
> >> said, I failed to find a good chance that I can accurately disable it.
> >>
> >>>
> >>> The bpf prog may be detached also. (IF) it ends up staying with the
> >>> cgroup/sockops interface, it should depend on the existing static key in
> >>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> >>
> >> Are you suggesting that we need to remove the current static key? In
> >> the previous thread, the reason why Willem came up with this idea is,
> >> I think, to avoid affect the non-bpf timestamping feature.
> >>
> >>>
> >>>>
> >>>>        return 0;
> >>>>    }
> >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>>> index f36eb9daa31a..d0f912f1ff7b 100644
> >>>> --- a/net/core/skbuff.c
> >>>> +++ b/net/core/skbuff.c
> >>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >>>>
> >>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
> >>>
> >>> sk is unused.
> >>
> >> Thanks for the careful check.
> >>
> >>>
> >>>> +{
> >>>> +     u32 testflag;
> >>>> +
> >>>> +     switch (tstype) {
> >>>> +     case SCM_TSTAMP_SCHED:
> >>>
> >>> Instead of doing this translation,
> >>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
> >>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> >>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> >>> sk->sk_tsflags_bpf?
> >>
> >> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
> >> SOF_TIMESTAMPING_OPT_ID, that we need to support.
> >>
> >>>
> >>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
> >>>> +             break;
> >>>> +     case SCM_TSTAMP_SND:
> >>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> >>>> +             break;
> >>>> +     case SCM_TSTAMP_ACK:
> >>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
> >>>> +             break;
> >>>> +     default:
> >>>> +             return false;
> >>>> +     }
> >>>> +     if (tsflags & testflag)
> >>>> +             return true;
> >>>> +
> >>>> +     return false;
> >>>> +}
> >>>> +
> >>>>    static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>                                 const struct sk_buff *ack_skb,
> >>>>                                 struct skb_shared_hwtstamps *hwtstamps,
> >>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>        if (!skb_may_tx_timestamp(sk, tsonly))
> >>>>                return;
> >>>>
> >>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> >>>
> >>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
> >>
> >> This test will be used in bpf and non-bpf cases. Because of this, we
> >> can support BPF extension. In this function, if skb has tsflags but we
> >> don't know which approach the user expects, sk_tstamp_tx_flags() can
> >> help us.
> >>
> >>>
> >>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> >>> skb. Would it break?
> >>
> >> Oh, you're right. I didn't support cmsg mode...
> >
> > I think I only need to test if it's in the bpf mode, or else let the
> > original way print the timestamp, which can solve the issue.
>
>  From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
> SCM_TSTAMP_SCHED);":
>
> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> {
>         /* ... */
>
>         if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>                 __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
>
>         /* ... */
> }
>
> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
> the timestamping. I may have missed somewhere in the patch set that the skb's
> tx_flags is changed by sk->sk_tsflags_bpf alone?

If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
helpful, which initializes every last skb, please see patch [10/12].

>
> I think a skb tskey is still desired (?), so eventually we want some spaces in

tskey function is optional I think. It depends whether users want to
use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.

> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
> skb->data_meta usage outside of xdp and tc. I think here we want to have it
> available at the tx side to store the tx_flags and tskey but probably want them
> at a specific place/offset at the data_meta.

If we have the plan to store extra information in data_meta, I can
give it a try:)

>
> For now, is there thing we can explore to share in the skb_shared_info?

My initial thought is just to reuse these fields in skb. It can work
without interfering one another.

> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey
> only at the "tx" side? There is already another union member.

tskey is always used in the tx path.

hwtstamps can be used in both rx and tx cases (please see
tcp_update_recv_tstamps() and skb_tstamp_tx()).

> The hwtstamps should only be needed when the NIC is done sending?

In this patch, yes, hwtstamps are the records in tx path.

Thanks,
Jason
Martin KaFai Lau Oct. 16, 2024, 6:31 a.m. UTC | #8
On 10/15/24 6:04 PM, Jason Xing wrote:
> To be honest, I considered how to disable the static key. Like you
> said, I failed to find a good chance that I can accurately disable it.

It at least needs to be disabled whenever that bpf prog got detached.

> 
>> The bpf prog may be detached also. (IF) it ends up staying with the
>> cgroup/sockops interface, it should depend on the existing static key in
>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.

> Are you suggesting that we need to remove the current static key? In
> the previous thread, the reason why Willem came up with this idea is,
> I think, to avoid affect the non-bpf timestamping feature.

Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am 
saying to use that existing key. afaict, the newly added bpf_tstamp_control key 
is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less 
granular but it has the needed accounting to disable whenever the bpf prog got 
detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS).
Jason Xing Oct. 16, 2024, 6:45 a.m. UTC | #9
On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/15/24 6:04 PM, Jason Xing wrote:
> > To be honest, I considered how to disable the static key. Like you
> > said, I failed to find a good chance that I can accurately disable it.
>
> It at least needs to be disabled whenever that bpf prog got detached.
>
> >
> >> The bpf prog may be detached also. (IF) it ends up staying with the
> >> cgroup/sockops interface, it should depend on the existing static key in
> >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
>
> > Are you suggesting that we need to remove the current static key? In
> > the previous thread, the reason why Willem came up with this idea is,
> > I think, to avoid affect the non-bpf timestamping feature.
>
> Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am
> saying to use that existing key. afaict, the newly added bpf_tstamp_control key
> is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less
> granular but it has the needed accounting to disable whenever the bpf prog got
> detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS).

Good suggestion. Good thing is that I don't need to figure out a
proper place to disable it any more. I can directly use
cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be
printed with BPF program loaded.

BTW, I found that we don't implement how to disable the ip4_min_ttl
static key. Sometimes, I'm confused whether we have to disable it at a
certain time.

Thanks,
Jason
Martin KaFai Lau Oct. 16, 2024, 7:01 a.m. UTC | #10
On 10/15/24 11:30 PM, Jason Xing wrote:
> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/15/24 6:32 PM, Jason Xing wrote:
>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>
>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>>
>>>>> On 10/11/24 9:06 PM, Jason Xing wrote:
>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>
>>>>>> Willem suggested that we use a static key to control. The advantage
>>>>>> is that we will not affect the existing applications at all if we
>>>>>> don't load BPF program.
>>>>>>
>>>>>> In this patch, except the static key, I also add one logic that is
>>>>>> used to test if the socket has enabled its tsflags in order to
>>>>>> support bpf logic to allow both cases to happen at the same time.
>>>>>> Or else, the skb carring related timestamp flag doesn't know which
>>>>>> way of printing is desirable.
>>>>>>
>>>>>> One thing important is this patch allows print from both applications
>>>>>> and bpf program at the same time. Now we have three kinds of print:
>>>>>> 1) only BPF program prints
>>>>>> 2) only application program prints
>>>>>> 3) both can print without side effect
>>>>>>
>>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>>>>> ---
>>>>>>     include/net/sock.h |  1 +
>>>>>>     net/core/filter.c  |  3 +++
>>>>>>     net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
>>>>>>     3 files changed, 42 insertions(+)
>>>>>>
>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
>>>>>> --- a/include/net/sock.h
>>>>>> +++ b/include/net/sock.h
>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
>>>>>>     void sock_def_readable(struct sock *sk);
>>>>>>
>>>>>>     int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>>>>     void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
>>>>>>     int sock_get_timestamping(struct so_timestamping *timestamping,
>>>>>>                           sockptr_t optval, unsigned int optlen);
>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>> index 996426095bd9..08135f538c99 100644
>>>>>> --- a/net/core/filter.c
>>>>>> +++ b/net/core/filter.c
>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>>>>>>         .arg1_type      = ARG_PTR_TO_CTX,
>>>>>>     };
>>>>>>
>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>>>> +
>>>>>>     static int bpf_sock_set_timestamping(struct sock *sk,
>>>>>>                                      struct so_timestamping *timestamping)
>>>>>>     {
>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
>>>>>>                 return -EINVAL;
>>>>>>
>>>>>>         WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
>>>>>> +     static_branch_enable(&bpf_tstamp_control);
>>>>>
>>>>> Not sure when is a good time to do static_branch_disable().
>>>>
>>>> Thanks for the review.
>>>>
>>>> To be honest, I considered how to disable the static key. Like you
>>>> said, I failed to find a good chance that I can accurately disable it.
>>>>
>>>>>
>>>>> The bpf prog may be detached also. (IF) it ends up staying with the
>>>>> cgroup/sockops interface, it should depend on the existing static key in
>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
>>>>
>>>> Are you suggesting that we need to remove the current static key? In
>>>> the previous thread, the reason why Willem came up with this idea is,
>>>> I think, to avoid affect the non-bpf timestamping feature.
>>>>
>>>>>
>>>>>>
>>>>>>         return 0;
>>>>>>     }
>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644
>>>>>> --- a/net/core/skbuff.c
>>>>>> +++ b/net/core/skbuff.c
>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>>>>>     }
>>>>>>     EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
>>>>>>
>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
>>>>>
>>>>> sk is unused.
>>>>
>>>> Thanks for the careful check.
>>>>
>>>>>
>>>>>> +{
>>>>>> +     u32 testflag;
>>>>>> +
>>>>>> +     switch (tstype) {
>>>>>> +     case SCM_TSTAMP_SCHED:
>>>>>
>>>>> Instead of doing this translation,
>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
>>>>> sk->sk_tsflags_bpf?
>>>>
>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
>>>>
>>>>>
>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
>>>>>> +             break;
>>>>>> +     case SCM_TSTAMP_SND:
>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
>>>>>> +             break;
>>>>>> +     case SCM_TSTAMP_ACK:
>>>>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
>>>>>> +             break;
>>>>>> +     default:
>>>>>> +             return false;
>>>>>> +     }
>>>>>> +     if (tsflags & testflag)
>>>>>> +             return true;
>>>>>> +
>>>>>> +     return false;
>>>>>> +}
>>>>>> +
>>>>>>     static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>>>                                  const struct sk_buff *ack_skb,
>>>>>>                                  struct skb_shared_hwtstamps *hwtstamps,
>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>>>         if (!skb_may_tx_timestamp(sk, tsonly))
>>>>>>                 return;
>>>>>>
>>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
>>>>>
>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
>>>>
>>>> This test will be used in bpf and non-bpf cases. Because of this, we
>>>> can support BPF extension. In this function, if skb has tsflags but we
>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can
>>>> help us.
>>>>
>>>>>
>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
>>>>> skb. Would it break?
>>>>
>>>> Oh, you're right. I didn't support cmsg mode...
>>>
>>> I think I only need to test if it's in the bpf mode, or else let the
>>> original way print the timestamp, which can solve the issue.
>>
>>   From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
>> SCM_TSTAMP_SCHED);":
>>
>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>> {
>>          /* ... */
>>
>>          if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>>                  __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
>>
>>          /* ... */
>> }
>>
>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
>> the timestamping. I may have missed somewhere in the patch set that the skb's
>> tx_flags is changed by sk->sk_tsflags_bpf alone?
> 
> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
> helpful, which initializes every last skb, please see patch [10/12].

Ah. ok. It is the thing I missed. Thanks for the pointer.

>>
>> I think a skb tskey is still desired (?), so eventually we want some spaces in
> 
> tskey function is optional I think. It depends whether users want to
> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.
> 
>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
>> skb->data_meta usage outside of xdp and tc. I think here we want to have it
>> available at the tx side to store the tx_flags and tskey but probably want them
>> at a specific place/offset at the data_meta.
> 
> If we have the plan to store extra information in data_meta, I can
> give it a try:)
> 
>>
>> For now, is there thing we can explore to share in the skb_shared_info?
> 
> My initial thought is just to reuse these fields in skb. It can work
> without interfering one another.

After reading closer to patch 10, I am likely still missing something. How can 
it tell if the tx_flags is set by the bpf or by the user space cmsg?

> 
>> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey
>> only at the "tx" side? There is already another union member.
> 
> tskey is always used in the tx path.
> 
> hwtstamps can be used in both rx and tx cases (please see
> tcp_update_recv_tstamps() and skb_tstamp_tx()).

hmm... we only need some where to store the bpf tx_flags and bpf tskey in the 
TX-ing skb. You meant the hwtstamps of a Tx-ing skb is not empty?

At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet?

> 
>> The hwtstamps should only be needed when the NIC is done sending?
> 
> In this patch, yes, hwtstamps are the records in tx path.
> 
> Thanks,
> Jason
Jason Xing Oct. 16, 2024, 7:54 a.m. UTC | #11
On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/15/24 11:30 PM, Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 10/15/24 6:32 PM, Jason Xing wrote:
> >>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>>
> >>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>>>
> >>>>> On 10/11/24 9:06 PM, Jason Xing wrote:
> >>>>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>>>
> >>>>>> Willem suggested that we use a static key to control. The advantage
> >>>>>> is that we will not affect the existing applications at all if we
> >>>>>> don't load BPF program.
> >>>>>>
> >>>>>> In this patch, except the static key, I also add one logic that is
> >>>>>> used to test if the socket has enabled its tsflags in order to
> >>>>>> support bpf logic to allow both cases to happen at the same time.
> >>>>>> Or else, the skb carring related timestamp flag doesn't know which
> >>>>>> way of printing is desirable.
> >>>>>>
> >>>>>> One thing important is this patch allows print from both applications
> >>>>>> and bpf program at the same time. Now we have three kinds of print:
> >>>>>> 1) only BPF program prints
> >>>>>> 2) only application program prints
> >>>>>> 3) both can print without side effect
> >>>>>>
> >>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>>>>> ---
> >>>>>>     include/net/sock.h |  1 +
> >>>>>>     net/core/filter.c  |  3 +++
> >>>>>>     net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
> >>>>>>     3 files changed, 42 insertions(+)
> >>>>>>
> >>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
> >>>>>> --- a/include/net/sock.h
> >>>>>> +++ b/include/net/sock.h
> >>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> >>>>>>     void sock_def_readable(struct sock *sk);
> >>>>>>
> >>>>>>     int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> >>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>>     void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> >>>>>>     int sock_get_timestamping(struct so_timestamping *timestamping,
> >>>>>>                           sockptr_t optval, unsigned int optlen);
> >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>> index 996426095bd9..08135f538c99 100644
> >>>>>> --- a/net/core/filter.c
> >>>>>> +++ b/net/core/filter.c
> >>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >>>>>>         .arg1_type      = ARG_PTR_TO_CTX,
> >>>>>>     };
> >>>>>>
> >>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>> +
> >>>>>>     static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>                                      struct so_timestamping *timestamping)
> >>>>>>     {
> >>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>                 return -EINVAL;
> >>>>>>
> >>>>>>         WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> >>>>>> +     static_branch_enable(&bpf_tstamp_control);
> >>>>>
> >>>>> Not sure when is a good time to do static_branch_disable().
> >>>>
> >>>> Thanks for the review.
> >>>>
> >>>> To be honest, I considered how to disable the static key. Like you
> >>>> said, I failed to find a good chance that I can accurately disable it.
> >>>>
> >>>>>
> >>>>> The bpf prog may be detached also. (IF) it ends up staying with the
> >>>>> cgroup/sockops interface, it should depend on the existing static key in
> >>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> >>>>
> >>>> Are you suggesting that we need to remove the current static key? In
> >>>> the previous thread, the reason why Willem came up with this idea is,
> >>>> I think, to avoid affect the non-bpf timestamping feature.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>         return 0;
> >>>>>>     }
> >>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>>>>> index f36eb9daa31a..d0f912f1ff7b 100644
> >>>>>> --- a/net/core/skbuff.c
> >>>>>> +++ b/net/core/skbuff.c
> >>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>>>>>     }
> >>>>>>     EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >>>>>>
> >>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
> >>>>>
> >>>>> sk is unused.
> >>>>
> >>>> Thanks for the careful check.
> >>>>
> >>>>>
> >>>>>> +{
> >>>>>> +     u32 testflag;
> >>>>>> +
> >>>>>> +     switch (tstype) {
> >>>>>> +     case SCM_TSTAMP_SCHED:
> >>>>>
> >>>>> Instead of doing this translation,
> >>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
> >>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> >>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> >>>>> sk->sk_tsflags_bpf?
> >>>>
> >>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
> >>>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
> >>>>
> >>>>>
> >>>>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
> >>>>>> +             break;
> >>>>>> +     case SCM_TSTAMP_SND:
> >>>>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> >>>>>> +             break;
> >>>>>> +     case SCM_TSTAMP_ACK:
> >>>>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
> >>>>>> +             break;
> >>>>>> +     default:
> >>>>>> +             return false;
> >>>>>> +     }
> >>>>>> +     if (tsflags & testflag)
> >>>>>> +             return true;
> >>>>>> +
> >>>>>> +     return false;
> >>>>>> +}
> >>>>>> +
> >>>>>>     static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>                                  const struct sk_buff *ack_skb,
> >>>>>>                                  struct skb_shared_hwtstamps *hwtstamps,
> >>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>         if (!skb_may_tx_timestamp(sk, tsonly))
> >>>>>>                 return;
> >>>>>>
> >>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> >>>>>
> >>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
> >>>>
> >>>> This test will be used in bpf and non-bpf cases. Because of this, we
> >>>> can support BPF extension. In this function, if skb has tsflags but we
> >>>> don't know which approach the user expects, sk_tstamp_tx_flags() can
> >>>> help us.
> >>>>
> >>>>>
> >>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> >>>>> skb. Would it break?
> >>>>
> >>>> Oh, you're right. I didn't support cmsg mode...
> >>>
> >>> I think I only need to test if it's in the bpf mode, or else let the
> >>> original way print the timestamp, which can solve the issue.
> >>
> >>   From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
> >> SCM_TSTAMP_SCHED);":
> >>
> >> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >> {
> >>          /* ... */
> >>
> >>          if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
> >>                  __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
> >>
> >>          /* ... */
> >> }
> >>
> >> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
> >> the timestamping. I may have missed somewhere in the patch set that the skb's
> >> tx_flags is changed by sk->sk_tsflags_bpf alone?
> >
> > If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
> > helpful, which initializes every last skb, please see patch [10/12].
>
> Ah. ok. It is the thing I missed. Thanks for the pointer.
>
> >>
> >> I think a skb tskey is still desired (?), so eventually we want some spaces in
> >
> > tskey function is optional I think. It depends whether users want to
> > use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.
> >
> >> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
> >> skb->data_meta usage outside of xdp and tc. I think here we want to have it
> >> available at the tx side to store the tx_flags and tskey but probably want them
> >> at a specific place/offset at the data_meta.
> >
> > If we have the plan to store extra information in data_meta, I can
> > give it a try:)
> >
> >>
> >> For now, is there thing we can explore to share in the skb_shared_info?
> >
> > My initial thought is just to reuse these fields in skb. It can work
> > without interfering one another.
>
> After reading closer to patch 10, I am likely still missing something. How can
> it tell if the tx_flags is set by the bpf or by the user space cmsg?

If the skb carries the timestamp, there are three cases:
1) non-bpf case and users uses setsockopt()
2) cmsg case
3) bpf case

#1 and #2 are already handled well before this patch. I only need to
test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
else it could be #1 or #2, then we will let the old way print
timestamps in __skb_tstamp_tx().

>
> >
> >> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey
> >> only at the "tx" side? There is already another union member.
> >
> > tskey is always used in the tx path.
> >
> > hwtstamps can be used in both rx and tx cases (please see
> > tcp_update_recv_tstamps() and skb_tstamp_tx()).
>
> hmm... we only need some where to store the bpf tx_flags and bpf tskey in the
> TX-ing skb.

And there is one more field we have to take care of: txstamp_ack which
indicates whether we print timestamp when the last skb is acked.
Please see tcp_tx_timestamp().

> You meant the hwtstamps of a Tx-ing skb is not empty?

Sometimes, it's not empty if the hardware supports the timestamp
feature and the user wants to see it (by enabling the
SOF_TIMESTAMPING_TX_HARDWARE flag). As we can see, there are many
callers calling skb_tstamp_tx().

>
> At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet?

I'm not that sure about the orig_skb. It seems no. I can see some
callers reading ptp timestamp from the nic and pass the timestamp to
skb_tstamp_tx().

Thanks,
Jason
Martin KaFai Lau Oct. 16, 2024, 8:31 a.m. UTC | #12
On 10/16/24 12:54 AM, Jason Xing wrote:
> On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/15/24 11:30 PM, Jason Xing wrote:
>>> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 10/15/24 6:32 PM, Jason Xing wrote:
>>>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>>>>
>>>>>>> On 10/11/24 9:06 PM, Jason Xing wrote:
>>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>>
>>>>>>>> Willem suggested that we use a static key to control. The advantage
>>>>>>>> is that we will not affect the existing applications at all if we
>>>>>>>> don't load BPF program.
>>>>>>>>
>>>>>>>> In this patch, except the static key, I also add one logic that is
>>>>>>>> used to test if the socket has enabled its tsflags in order to
>>>>>>>> support bpf logic to allow both cases to happen at the same time.
>>>>>>>> Or else, the skb carring related timestamp flag doesn't know which
>>>>>>>> way of printing is desirable.
>>>>>>>>
>>>>>>>> One thing important is this patch allows print from both applications
>>>>>>>> and bpf program at the same time. Now we have three kinds of print:
>>>>>>>> 1) only BPF program prints
>>>>>>>> 2) only application program prints
>>>>>>>> 3) both can print without side effect
>>>>>>>>
>>>>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>>>>>>> ---
>>>>>>>>      include/net/sock.h |  1 +
>>>>>>>>      net/core/filter.c  |  3 +++
>>>>>>>>      net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
>>>>>>>>      3 files changed, 42 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
>>>>>>>> --- a/include/net/sock.h
>>>>>>>> +++ b/include/net/sock.h
>>>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
>>>>>>>>      void sock_def_readable(struct sock *sk);
>>>>>>>>
>>>>>>>>      int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
>>>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>>>>>>      void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
>>>>>>>>      int sock_get_timestamping(struct so_timestamping *timestamping,
>>>>>>>>                            sockptr_t optval, unsigned int optlen);
>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>>>> index 996426095bd9..08135f538c99 100644
>>>>>>>> --- a/net/core/filter.c
>>>>>>>> +++ b/net/core/filter.c
>>>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>>>>>>>>          .arg1_type      = ARG_PTR_TO_CTX,
>>>>>>>>      };
>>>>>>>>
>>>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
>>>>>>>> +
>>>>>>>>      static int bpf_sock_set_timestamping(struct sock *sk,
>>>>>>>>                                       struct so_timestamping *timestamping)
>>>>>>>>      {
>>>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
>>>>>>>>                  return -EINVAL;
>>>>>>>>
>>>>>>>>          WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
>>>>>>>> +     static_branch_enable(&bpf_tstamp_control);
>>>>>>>
>>>>>>> Not sure when is a good time to do static_branch_disable().
>>>>>>
>>>>>> Thanks for the review.
>>>>>>
>>>>>> To be honest, I considered how to disable the static key. Like you
>>>>>> said, I failed to find a good chance that I can accurately disable it.
>>>>>>
>>>>>>>
>>>>>>> The bpf prog may be detached also. (IF) it ends up staying with the
>>>>>>> cgroup/sockops interface, it should depend on the existing static key in
>>>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
>>>>>>
>>>>>> Are you suggesting that we need to remove the current static key? In
>>>>>> the previous thread, the reason why Willem came up with this idea is,
>>>>>> I think, to avoid affect the non-bpf timestamping feature.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>          return 0;
>>>>>>>>      }
>>>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644
>>>>>>>> --- a/net/core/skbuff.c
>>>>>>>> +++ b/net/core/skbuff.c
>>>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>>>>>>>>      }
>>>>>>>>      EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
>>>>>>>>
>>>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
>>>>>>>
>>>>>>> sk is unused.
>>>>>>
>>>>>> Thanks for the careful check.
>>>>>>
>>>>>>>
>>>>>>>> +{
>>>>>>>> +     u32 testflag;
>>>>>>>> +
>>>>>>>> +     switch (tstype) {
>>>>>>>> +     case SCM_TSTAMP_SCHED:
>>>>>>>
>>>>>>> Instead of doing this translation,
>>>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
>>>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
>>>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
>>>>>>> sk->sk_tsflags_bpf?
>>>>>>
>>>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
>>>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
>>>>>>
>>>>>>>
>>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
>>>>>>>> +             break;
>>>>>>>> +     case SCM_TSTAMP_SND:
>>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
>>>>>>>> +             break;
>>>>>>>> +     case SCM_TSTAMP_ACK:
>>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
>>>>>>>> +             break;
>>>>>>>> +     default:
>>>>>>>> +             return false;
>>>>>>>> +     }
>>>>>>>> +     if (tsflags & testflag)
>>>>>>>> +             return true;
>>>>>>>> +
>>>>>>>> +     return false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>>>>>                                   const struct sk_buff *ack_skb,
>>>>>>>>                                   struct skb_shared_hwtstamps *hwtstamps,
>>>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
>>>>>>>>          if (!skb_may_tx_timestamp(sk, tsonly))
>>>>>>>>                  return;
>>>>>>>>
>>>>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
>>>>>>>
>>>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
>>>>>>
>>>>>> This test will be used in bpf and non-bpf cases. Because of this, we
>>>>>> can support BPF extension. In this function, if skb has tsflags but we
>>>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can
>>>>>> help us.
>>>>>>
>>>>>>>
>>>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
>>>>>>> skb. Would it break?
>>>>>>
>>>>>> Oh, you're right. I didn't support cmsg mode...
>>>>>
>>>>> I think I only need to test if it's in the bpf mode, or else let the
>>>>> original way print the timestamp, which can solve the issue.
>>>>
>>>>    From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
>>>> SCM_TSTAMP_SCHED);":
>>>>
>>>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>> {
>>>>           /* ... */
>>>>
>>>>           if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>>>>                   __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
>>>>
>>>>           /* ... */
>>>> }
>>>>
>>>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
>>>> the timestamping. I may have missed somewhere in the patch set that the skb's
>>>> tx_flags is changed by sk->sk_tsflags_bpf alone?
>>>
>>> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
>>> helpful, which initializes every last skb, please see patch [10/12].
>>
>> Ah. ok. It is the thing I missed. Thanks for the pointer.
>>
>>>>
>>>> I think a skb tskey is still desired (?), so eventually we want some spaces in
>>>
>>> tskey function is optional I think. It depends whether users want to
>>> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.
>>>
>>>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
>>>> skb->data_meta usage outside of xdp and tc. I think here we want to have it
>>>> available at the tx side to store the tx_flags and tskey but probably want them
>>>> at a specific place/offset at the data_meta.
>>>
>>> If we have the plan to store extra information in data_meta, I can
>>> give it a try:)
>>>
>>>>
>>>> For now, is there thing we can explore to share in the skb_shared_info?
>>>
>>> My initial thought is just to reuse these fields in skb. It can work
>>> without interfering one another.
>>
>> After reading closer to patch 10, I am likely still missing something. How can
>> it tell if the tx_flags is set by the bpf or by the user space cmsg?
> 
> If the skb carries the timestamp, there are three cases:
> 1) non-bpf case and users uses setsockopt()
> 2) cmsg case
> 3) bpf case
> 
> #1 and #2 are already handled well before this patch. I only need to
> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> else it could be #1 or #2, then we will let the old way print
> timestamps in __skb_tstamp_tx().

hmm... I am still not sure I fully understand...but I think I may start getting it.

Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once 
it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. 
It has to be able to set and clear.

Does it also mean either the bpf or the user space can enable the timetstamping 
but not both? I don't think we can assume this also. It will be hard to deploy 
the bpf prog in production to collect continuous data. The user space may have 
some timestamping enabled but the bpf may want to do its parallel investigation 
also. The user space may rollout timestamping in the future and suddenly break 
the bpf prog.

[ getting late here. will continue later. ]

> 
>>
>>>
>>>> Can the "struct skb_shared_hwtstamps hwtstamps;" be used for the bpf tx_flags and tskey
>>>> only at the "tx" side? There is already another union member.
>>>
>>> tskey is always used in the tx path.
>>>
>>> hwtstamps can be used in both rx and tx cases (please see
>>> tcp_update_recv_tstamps() and skb_tstamp_tx()).
>>
>> hmm... we only need some where to store the bpf tx_flags and bpf tskey in the
>> TX-ing skb.
> 
> And there is one more field we have to take care of: txstamp_ack which
> indicates whether we print timestamp when the last skb is acked.
> Please see tcp_tx_timestamp().
> 
>> You meant the hwtstamps of a Tx-ing skb is not empty?
> 
> Sometimes, it's not empty if the hardware supports the timestamp
> feature and the user wants to see it (by enabling the
> SOF_TIMESTAMPING_TX_HARDWARE flag). As we can see, there are many
> callers calling skb_tstamp_tx().
> 
>>
>> At skb_tstamp_tx (TX side only?), the orig_skb's hwtstamps has not been written yet?
> 
> I'm not that sure about the orig_skb. It seems no. I can see some
> callers reading ptp timestamp from the nic and pass the timestamp to
> skb_tstamp_tx().
> 
> Thanks,
> Jason
Jason Xing Oct. 16, 2024, 10:36 a.m. UTC | #13
On Wed, Oct 16, 2024 at 4:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/16/24 12:54 AM, Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 10/15/24 11:30 PM, Jason Xing wrote:
> >>> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>>
> >>>> On 10/15/24 6:32 PM, Jason Xing wrote:
> >>>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>>>>
> >>>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>>>>>
> >>>>>>> On 10/11/24 9:06 PM, Jason Xing wrote:
> >>>>>>>> From: Jason Xing <kernelxing@tencent.com>
> >>>>>>>>
> >>>>>>>> Willem suggested that we use a static key to control. The advantage
> >>>>>>>> is that we will not affect the existing applications at all if we
> >>>>>>>> don't load BPF program.
> >>>>>>>>
> >>>>>>>> In this patch, except the static key, I also add one logic that is
> >>>>>>>> used to test if the socket has enabled its tsflags in order to
> >>>>>>>> support bpf logic to allow both cases to happen at the same time.
> >>>>>>>> Or else, the skb carring related timestamp flag doesn't know which
> >>>>>>>> way of printing is desirable.
> >>>>>>>>
> >>>>>>>> One thing important is this patch allows print from both applications
> >>>>>>>> and bpf program at the same time. Now we have three kinds of print:
> >>>>>>>> 1) only BPF program prints
> >>>>>>>> 2) only application program prints
> >>>>>>>> 3) both can print without side effect
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>>>>>>> ---
> >>>>>>>>      include/net/sock.h |  1 +
> >>>>>>>>      net/core/filter.c  |  3 +++
> >>>>>>>>      net/core/skbuff.c  | 38 ++++++++++++++++++++++++++++++++++++++
> >>>>>>>>      3 files changed, 42 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
> >>>>>>>> --- a/include/net/sock.h
> >>>>>>>> +++ b/include/net/sock.h
> >>>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> >>>>>>>>      void sock_def_readable(struct sock *sk);
> >>>>>>>>
> >>>>>>>>      int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> >>>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>>>>      void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> >>>>>>>>      int sock_get_timestamping(struct so_timestamping *timestamping,
> >>>>>>>>                            sockptr_t optval, unsigned int optlen);
> >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>>>> index 996426095bd9..08135f538c99 100644
> >>>>>>>> --- a/net/core/filter.c
> >>>>>>>> +++ b/net/core/filter.c
> >>>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >>>>>>>>          .arg1_type      = ARG_PTR_TO_CTX,
> >>>>>>>>      };
> >>>>>>>>
> >>>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>>>> +
> >>>>>>>>      static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>>>                                       struct so_timestamping *timestamping)
> >>>>>>>>      {
> >>>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>>>                  return -EINVAL;
> >>>>>>>>
> >>>>>>>>          WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> >>>>>>>> +     static_branch_enable(&bpf_tstamp_control);
> >>>>>>>
> >>>>>>> Not sure when is a good time to do static_branch_disable().
> >>>>>>
> >>>>>> Thanks for the review.
> >>>>>>
> >>>>>> To be honest, I considered how to disable the static key. Like you
> >>>>>> said, I failed to find a good chance that I can accurately disable it.
> >>>>>>
> >>>>>>>
> >>>>>>> The bpf prog may be detached also. (IF) it ends up staying with the
> >>>>>>> cgroup/sockops interface, it should depend on the existing static key in
> >>>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> >>>>>>
> >>>>>> Are you suggesting that we need to remove the current static key? In
> >>>>>> the previous thread, the reason why Willem came up with this idea is,
> >>>>>> I think, to avoid affect the non-bpf timestamping feature.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>          return 0;
> >>>>>>>>      }
> >>>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644
> >>>>>>>> --- a/net/core/skbuff.c
> >>>>>>>> +++ b/net/core/skbuff.c
> >>>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>>>>>>>      }
> >>>>>>>>      EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >>>>>>>>
> >>>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
> >>>>>>>
> >>>>>>> sk is unused.
> >>>>>>
> >>>>>> Thanks for the careful check.
> >>>>>>
> >>>>>>>
> >>>>>>>> +{
> >>>>>>>> +     u32 testflag;
> >>>>>>>> +
> >>>>>>>> +     switch (tstype) {
> >>>>>>>> +     case SCM_TSTAMP_SCHED:
> >>>>>>>
> >>>>>>> Instead of doing this translation,
> >>>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
> >>>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> >>>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> >>>>>>> sk->sk_tsflags_bpf?
> >>>>>>
> >>>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
> >>>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
> >>>>>>
> >>>>>>>
> >>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SCHED;
> >>>>>>>> +             break;
> >>>>>>>> +     case SCM_TSTAMP_SND:
> >>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> >>>>>>>> +             break;
> >>>>>>>> +     case SCM_TSTAMP_ACK:
> >>>>>>>> +             testflag = SOF_TIMESTAMPING_TX_ACK;
> >>>>>>>> +             break;
> >>>>>>>> +     default:
> >>>>>>>> +             return false;
> >>>>>>>> +     }
> >>>>>>>> +     if (tsflags & testflag)
> >>>>>>>> +             return true;
> >>>>>>>> +
> >>>>>>>> +     return false;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>>>                                   const struct sk_buff *ack_skb,
> >>>>>>>>                                   struct skb_shared_hwtstamps *hwtstamps,
> >>>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>>>          if (!skb_may_tx_timestamp(sk, tsonly))
> >>>>>>>>                  return;
> >>>>>>>>
> >>>>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> >>>>>>>
> >>>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
> >>>>>>
> >>>>>> This test will be used in bpf and non-bpf cases. Because of this, we
> >>>>>> can support BPF extension. In this function, if skb has tsflags but we
> >>>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can
> >>>>>> help us.
> >>>>>>
> >>>>>>>
> >>>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> >>>>>>> skb. Would it break?
> >>>>>>
> >>>>>> Oh, you're right. I didn't support cmsg mode...
> >>>>>
> >>>>> I think I only need to test if it's in the bpf mode, or else let the
> >>>>> original way print the timestamp, which can solve the issue.
> >>>>
> >>>>    From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
> >>>> SCM_TSTAMP_SCHED);":
> >>>>
> >>>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >>>> {
> >>>>           /* ... */
> >>>>
> >>>>           if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
> >>>>                   __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
> >>>>
> >>>>           /* ... */
> >>>> }
> >>>>
> >>>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
> >>>> the timestamping. I may have missed somewhere in the patch set that the skb's
> >>>> tx_flags is changed by sk->sk_tsflags_bpf alone?
> >>>
> >>> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
> >>> helpful, which initializes every last skb, please see patch [10/12].
> >>
> >> Ah. ok. It is the thing I missed. Thanks for the pointer.
> >>
> >>>>
> >>>> I think a skb tskey is still desired (?), so eventually we want some spaces in
> >>>
> >>> tskey function is optional I think. It depends whether users want to
> >>> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.
> >>>
> >>>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
> >>>> skb->data_meta usage outside of xdp and tc. I think here we want to have it
> >>>> available at the tx side to store the tx_flags and tskey but probably want them
> >>>> at a specific place/offset at the data_meta.
> >>>
> >>> If we have the plan to store extra information in data_meta, I can
> >>> give it a try:)
> >>>
> >>>>
> >>>> For now, is there thing we can explore to share in the skb_shared_info?
> >>>
> >>> My initial thought is just to reuse these fields in skb. It can work
> >>> without interfering one another.
> >>
> >> After reading closer to patch 10, I am likely still missing something. How can
> >> it tell if the tx_flags is set by the bpf or by the user space cmsg?
> >
> > If the skb carries the timestamp, there are three cases:
> > 1) non-bpf case and users uses setsockopt()
> > 2) cmsg case
> > 3) bpf case
> >
> > #1 and #2 are already handled well before this patch. I only need to
> > test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> > else it could be #1 or #2, then we will let the old way print
> > timestamps in __skb_tstamp_tx().
>
> hmm... I am still not sure I fully understand...but I think I may start getting it.

Sorry, my bad. I gave the wrong answer...

It should be:
Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
work fine. If it has the flag, we could use skb_tstamp_tx_output() to
print based on patch [4/12]; if not, we will use
bpf_skb_tstamp_tx_output() to print.

If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
(please see Documentation/networking/timestamping.rst). We can see a
good example on how to use in
tools/testing/selftests/net/txtimestamp.c:
do_test()
{
        sock_opt = SOF_TIMESTAMPING_SOFTWARE |
        ...
        if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                              (char *) &sock_opt, sizeof(sock_opt)))
}

>
> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> It has to be able to set and clear.

I cannot find a good time to clear all the sockets which are set
through the BPF program. If we detach the BPF program, it will not
print of course. Does it really matter if we don't clear the
sk_tsflags_bpf?

>
> Does it also mean either the bpf or the user space can enable the timetstamping
> but not both? I don't think we can assume this also. It will be hard to deploy
> the bpf prog in production to collect continuous data. The user space may have
> some timestamping enabled but the bpf may want to do its parallel investigation
> also. The user space may rollout timestamping in the future and suddenly break
> the bpf prog.

Well, IIUC, it's also the basic idea from the current series which
allows both happening at the same time. Let us put it in a simple way,
I hope that if the app uses the SO_TIMESTAMPING feature already, then
one admin deploys the BPF program, that app should be traced both in
bpf and non-bpf ways.

But Willem doesn't agree about this approach[1] because of hard to debug.

[1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
Regarding to this link, I have a few more words to say: the socket
could be set through bpf_setsockopt() in different phases not like
setsockopt(), so in some cases we cannot make setsockopt hard failed.

After rethinking this point more, I still reckon that letting BPF
program trace timestamping parallelly without caring whether the
socket is set to the SO_TIMESTAMPING feature through setsockopt()
method. It means I would like to keep this part in patch [04/12]:
@@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
        if (!sk)
                return;

+       if (static_branch_unlikely(&bpf_tstamp_control))
+               bpf_skb_tstamp_tx_output(sk, tstype);
+
        skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk,
tstype);
 }
 EXPORT_SYMBOL_GPL(__skb_tstamp_tx);

>
> [ getting late here. will continue later. ]

Thanks for your effort :)

Thanks,
Jason
Willem de Bruijn Oct. 16, 2024, 1:13 p.m. UTC | #14
Jason Xing wrote:
> On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 10/15/24 6:04 PM, Jason Xing wrote:
> > > To be honest, I considered how to disable the static key. Like you
> > > said, I failed to find a good chance that I can accurately disable it.
> >
> > It at least needs to be disabled whenever that bpf prog got detached.
> >
> > >
> > >> The bpf prog may be detached also. (IF) it ends up staying with the
> > >> cgroup/sockops interface, it should depend on the existing static key in
> > >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> >
> > > Are you suggesting that we need to remove the current static key? In
> > > the previous thread, the reason why Willem came up with this idea is,
> > > I think, to avoid affect the non-bpf timestamping feature.
> >
> > Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am
> > saying to use that existing key. afaict, the newly added bpf_tstamp_control key
> > is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less
> > granular but it has the needed accounting to disable whenever the bpf prog got
> > detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS).
> 
> Good suggestion. Good thing is that I don't need to figure out a
> proper place to disable it any more. I can directly use
> cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be
> printed with BPF program loaded.
> 
> BTW, I found that we don't implement how to disable the ip4_min_ttl
> static key. Sometimes, I'm confused whether we have to disable it at a
> certain time.

In this case it would be fine to not disable it at all.

The crux is that it is disabled on the vast majority of machines not
using the feature. If a socket uses the feature, adding the small cost
of the branches on the rest of the system is fine.

Disabling requires refcounting usage. Sometimes the complexity and
cost of that outweights the benefit.
Jason Xing Oct. 16, 2024, 1:22 p.m. UTC | #15
On Wed, Oct 16, 2024 at 9:13 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 2:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 10/15/24 6:04 PM, Jason Xing wrote:
> > > > To be honest, I considered how to disable the static key. Like you
> > > > said, I failed to find a good chance that I can accurately disable it.
> > >
> > > It at least needs to be disabled whenever that bpf prog got detached.
> > >
> > > >
> > > >> The bpf prog may be detached also. (IF) it ends up staying with the
> > > >> cgroup/sockops interface, it should depend on the existing static key in
> > > >> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> > >
> > > > Are you suggesting that we need to remove the current static key? In
> > > > the previous thread, the reason why Willem came up with this idea is,
> > > > I think, to avoid affect the non-bpf timestamping feature.
> > >
> > > Take a look at cgroup_bpf_enabled(CGROUP_SOCK_OPS). There is a static key. I am
> > > saying to use that existing key. afaict, the newly added bpf_tstamp_control key
> > > is mainly an optimization. Yes, cgroup_bpf_enabled(CGROUP_SOCK_OPS) is less
> > > granular but it has the needed accounting to disable whenever the bpf prog got
> > > detached, so better just reuse the cgroup_bpf_enabled(CGROUP_SOCK_OPS).
> >
> > Good suggestion. Good thing is that I don't need to figure out a
> > proper place to disable it any more. I can directly use
> > cgroup_bpf_enabled(CGROUP_SOCK_OPS) to test if the timestamp should be
> > printed with BPF program loaded.
> >
> > BTW, I found that we don't implement how to disable the ip4_min_ttl
> > static key. Sometimes, I'm confused whether we have to disable it at a
> > certain time.
>
> In this case it would be fine to not disable it at all.
>
> The crux is that it is disabled on the vast majority of machines not
> using the feature. If a socket uses the feature, adding the small cost
> of the branches on the rest of the system is fine.
>
> Disabling requires refcounting usage. Sometimes the complexity and
> cost of that outweights the benefit.

Thanks for the explanation. I will take Martin's advice and use the
CGROUP_SOCK_OPS static key. So I don't have to take efforts to
implement the inc/dec/enable/disable the static key

Thanks,
Jason
Martin KaFai Lau Oct. 17, 2024, 12:48 a.m. UTC | #16
On 10/16/24 3:36 AM, Jason Xing wrote:
>>> If the skb carries the timestamp, there are three cases:
>>> 1) non-bpf case and users uses setsockopt()
>>> 2) cmsg case
>>> 3) bpf case

These should have tests in the selftests/bpf/ sooner than later. (More below).

>>>
>>> #1 and #2 are already handled well before this patch. I only need to
>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
>>> else it could be #1 or #2, then we will let the old way print
>>> timestamps in __skb_tstamp_tx().
>>
>> hmm... I am still not sure I fully understand...but I think I may start getting it.
> 
> Sorry, my bad. I gave the wrong answer...
> 
> It should be:
> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should

You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?

Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current 
behavior is to just queue to the sk_error_queue as long as there is 
"SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the 
sk_tsflags. This will eventually get ignored when user read it from the error 
queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? I suspect 
the user space will still read something from the error queue unless there is 
SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.

Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it 
from even queuing to the error queue? I think it is ok but I am not sure if 
anyone is depending on the above behavior.

> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> print based on patch [4/12]; if not, we will use
> bpf_skb_tstamp_tx_output() to print.
> 
> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> (please see Documentation/networking/timestamping.rst). We can see a
> good example on how to use in
> tools/testing/selftests/net/txtimestamp.c:
> do_test()
> {
>          sock_opt = SOF_TIMESTAMPING_SOFTWARE |
>          ...
>          if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
>                                (char *) &sock_opt, sizeof(sock_opt)))
> }
> 
>>
>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
>> It has to be able to set and clear.
> 
> I cannot find a good time to clear all the sockets which are set
> through the BPF program. If we detach the BPF program, it will not
> print of course. Does it really matter if we don't clear the
> sk_tsflags_bpf?

Yes, it matters. The same reason goes for why the existing bpf prog can clear 
the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the 
timestamp. For sockops program that stays forever, not all usages expect to do 
timestamping for the whole lifetime of the connection. If there is a way for the 
prog to turn it on, it should have a way for the prog to turn it off.

What is the concern of allowing the bpf prog to disable something that it has 
enabled before?

While we are on bpf_sock_ops_cb_flags, the 
BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in 
the new sk_tsflags_bpf. It is something we need to clean up later when we decide 
what interface to use for bpf timestamping.

> 
>>
>> Does it also mean either the bpf or the user space can enable the timetstamping
>> but not both? I don't think we can assume this also. It will be hard to deploy
>> the bpf prog in production to collect continuous data. The user space may have
>> some timestamping enabled but the bpf may want to do its parallel investigation
>> also. The user space may rollout timestamping in the future and suddenly break
>> the bpf prog.
> 
> Well, IIUC, it's also the basic idea from the current series which
> allows both happening at the same time. Let us put it in a simple way,
> I hope that if the app uses the SO_TIMESTAMPING feature already, then
> one admin deploys the BPF program, that app should be traced both in
> bpf and non-bpf ways.
> 
> But Willem doesn't agree about this approach[1] because of hard to debug.
> 
> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
> Regarding to this link, I have a few more words to say: the socket
> could be set through bpf_setsockopt() in different phases not like
> setsockopt(), so in some cases we cannot make setsockopt hard failed.
> 
> After rethinking this point more, I still reckon that letting BPF
> program trace timestamping parallelly without caring whether the
> socket is set to the SO_TIMESTAMPING feature through setsockopt()

I am afraid having both work in parallel is needed. Otherwise, it will be very 
hard to deploy a bpf prog to run continuously in scale. Being able to collect 
timestamp without worrying about application changes/updates/downgrades is 
important. e.g. App changes from no time stamping to time stamping

Please help to add selftests to show how the above cases (1), (2), (3), and 
other tsflags/txflags sharing cases will work. This should not be delayed until 
the discussion is done. It is needed sooner or later to prove both bpf and 
non-bpf ways can work at the same time. It will help the reviewer and also help 
to think about the design with a real use case in bpf prog.

The example in patch 0 only prints the reported tstamp, can you share how it 
will be used to investigate issue? Is it also useful to know when the skb is 
written to the kernel during sendmsg()?

Regarding the bpf_setsockopt() can be called in different phase, 
bpf_setsockopt() is not limited to sockops program. e.g. it can also be called 
from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be 
surprised people will try to trigger some on-and-off timestamping from 
bpf-tcp-cc to measure some delay.


More about bpf_setsockopt() in different phase, understand that UDP is not your 
priority. However, it needs to have some clarity on how UDP will work and how to 
enable it. UDP usually has no connect/established phase.

Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are 
planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in 
another thread. What else?

> method. It means I would like to keep this part in patch [04/12]:
> @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>          if (!sk)
>                  return;
> 
> +       if (static_branch_unlikely(&bpf_tstamp_control))
> +               bpf_skb_tstamp_tx_output(sk, tstype);
> +
>          skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk,
> tstype);
>   }
>   EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
> 
>>
>> [ getting late here. will continue later. ]
> 
> Thanks for your effort :)
> 
> Thanks,
> Jason
Jason Xing Oct. 17, 2024, 2:28 a.m. UTC | #17
On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/16/24 3:36 AM, Jason Xing wrote:
> >>> If the skb carries the timestamp, there are three cases:
> >>> 1) non-bpf case and users uses setsockopt()
> >>> 2) cmsg case
> >>> 3) bpf case
>
> These should have tests in the selftests/bpf/ sooner than later. (More below).
>
> >>>
> >>> #1 and #2 are already handled well before this patch. I only need to
> >>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> >>> else it could be #1 or #2, then we will let the old way print
> >>> timestamps in __skb_tstamp_tx().
> >>
> >> hmm... I am still not sure I fully understand...but I think I may start getting it.
> >
> > Sorry, my bad. I gave the wrong answer...
> >
> > It should be:
> > Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
>
> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?

Yep.

>
> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> behavior is to just queue to the sk_error_queue as long as there is
> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> sk_tsflags. This will eventually get ignored when user read it from the error
> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?

Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
read the skb from the errqueue but are not able to parse the
timestamps. Please see
tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
       ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))

> I suspect
> the user space will still read something from the error queue unless there is
> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.

No, please see above.

>
> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> from even queuing to the error queue? I think it is ok but I am not sure if
> anyone is depending on the above behavior.

SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
features including cmsg mode. But it will not be used in bpf mode. So
the test statement is enough to divided those three cases into two
groups.

>
> > work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> > print based on patch [4/12]; if not, we will use
> > bpf_skb_tstamp_tx_output() to print.
> >
> > If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> > always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> > (please see Documentation/networking/timestamping.rst). We can see a
> > good example on how to use in
> > tools/testing/selftests/net/txtimestamp.c:
> > do_test()
> > {
> >          sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> >          ...
> >          if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> >                                (char *) &sock_opt, sizeof(sock_opt)))
> > }
> >
> >>
> >> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> >> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> >> It has to be able to set and clear.
> >
> > I cannot find a good time to clear all the sockets which are set
> > through the BPF program. If we detach the BPF program, it will not
> > print of course. Does it really matter if we don't clear the
> > sk_tsflags_bpf?
>
> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> timestamp. For sockops program that stays forever, not all usages expect to do
> timestamping for the whole lifetime of the connection. If there is a way for the
> prog to turn it on, it should have a way for the prog to turn it off.

I see what you meant here. If we don't clear sk_tsflags_bpf, after we
detach the bpf program, it will do nothing in __skb_tstamp_tx() and
return earlier. It is almost equal to the effect of turning off. It is
why I don't handle clearing the flag.

>
> What is the concern of allowing the bpf prog to disable something that it has
> enabled before?

Let me give one instance:
If one socket is established and stays idle, how can the bpf program
clear the tsflags from that socket? I have no idea.

>
> While we are on bpf_sock_ops_cb_flags, the
> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> what interface to use for bpf timestamping.

I'm not sure if I understand correctly. I mimicked the use of
BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
bpf_sock_ops_cb_flags_set() in BPF program?

>
> >
> >>
> >> Does it also mean either the bpf or the user space can enable the timetstamping
> >> but not both? I don't think we can assume this also. It will be hard to deploy
> >> the bpf prog in production to collect continuous data. The user space may have
> >> some timestamping enabled but the bpf may want to do its parallel investigation
> >> also. The user space may rollout timestamping in the future and suddenly break
> >> the bpf prog.
> >
> > Well, IIUC, it's also the basic idea from the current series which
> > allows both happening at the same time. Let us put it in a simple way,
> > I hope that if the app uses the SO_TIMESTAMPING feature already, then
> > one admin deploys the BPF program, that app should be traced both in
> > bpf and non-bpf ways.
> >
> > But Willem doesn't agree about this approach[1] because of hard to debug.
> >
> > [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
> > Regarding to this link, I have a few more words to say: the socket
> > could be set through bpf_setsockopt() in different phases not like
> > setsockopt(), so in some cases we cannot make setsockopt hard failed.
> >
> > After rethinking this point more, I still reckon that letting BPF
> > program trace timestamping parallelly without caring whether the
> > socket is set to the SO_TIMESTAMPING feature through setsockopt()
>
> I am afraid having both work in parallel is needed. Otherwise, it will be very
> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> timestamp without worrying about application changes/updates/downgrades is
> important. e.g. App changes from no time stamping to time stamping

Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
:) So let me keep the current logic of printing (see the
__skb_tstamp_tx() function in patch [04/12]) in the next version. Then
I don't need to add some test statements to distinguish which way of
printing.

>
> Please help to add selftests to show how the above cases (1), (2), (3), and
> other tsflags/txflags sharing cases will work. This should not be delayed until
> the discussion is done. It is needed sooner or later to prove both bpf and
> non-bpf ways can work at the same time. It will help the reviewer and also help
> to think about the design with a real use case in bpf prog.

Got it. But I'm not sure where I should put those test cases? Could
you help me point out a good example that I can follow?

>
> The example in patch 0 only prints the reported tstamp, can you share how it
> will be used to investigate issue?

No problem. Please see chapter 3 about "goal" in
https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.

> Is it also useful to know when the skb is
> written to the kernel during sendmsg()?

You are right. Before this patch, normally applications will record an
accurate timestamp before do sendmsg().

After you remind me of this, I feel that we can add the timestamp
print in the future for bpf use.

>
> Regarding the bpf_setsockopt() can be called in different phase,
> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> surprised people will try to trigger some on-and-off timestamping from
> bpf-tcp-cc to measure some delay.
>
>
> More about bpf_setsockopt() in different phase, understand that UDP is not your
> priority. However, it needs to have some clarity on how UDP will work and how to
> enable it. UDP usually has no connect/established phase.

For now, I don't expect an extension for UDP because it will bring too
much extra work. Could we discuss this later? I mean, after we finish
the basic bpf extension :)

>
> Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are
> planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in
> another thread. What else?

In this patch series, I support
SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE,
which you've already noticed from the BPF example in patch [0/12].
They all come from the original design of SO_TIMESTAMPING feature.

The question you proposed is what I am willing to implement in the
future, like adding one hook in tcp_write_xmit()? It's part of my
plans to extend in the future, not be included in this series.

Thanks for your review.

Thanks,
Jason
Martin KaFai Lau Oct. 17, 2024, 8:43 p.m. UTC | #18
On 10/16/24 7:28 PM, Jason Xing wrote:
> On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/16/24 3:36 AM, Jason Xing wrote:
>>>>> If the skb carries the timestamp, there are three cases:
>>>>> 1) non-bpf case and users uses setsockopt()
>>>>> 2) cmsg case
>>>>> 3) bpf case
>>
>> These should have tests in the selftests/bpf/ sooner than later. (More below).
>>
>>>>>
>>>>> #1 and #2 are already handled well before this patch. I only need to
>>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
>>>>> else it could be #1 or #2, then we will let the old way print
>>>>> timestamps in __skb_tstamp_tx().
>>>>
>>>> hmm... I am still not sure I fully understand...but I think I may start getting it.
>>>
>>> Sorry, my bad. I gave the wrong answer...
>>>
>>> It should be:
>>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
>>
>> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?
> 
> Yep.
> 
>>
>> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
>> behavior is to just queue to the sk_error_queue as long as there is
>> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
>> sk_tsflags. This will eventually get ignored when user read it from the error
>> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?
> 
> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> read the skb from the errqueue but are not able to parse the
> timestamps. Please see
> tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
> if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
>         ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))

afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error 
still returns the skb to the user.

I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not 
set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka 
traditional) way is that the generated skb will still be queued in the error 
queue. The user space can still read it but just won't have the timestamp cmsg.

If I understand how the v3 may look like, the skb will not be queued in the 
error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user 
space won't get it from the error queue which is a change of behavior. I was 
saying I am fine but not sure if someone depends on this behavior.

I think we start talking pass each other on this. I will wait for the code on 
this part and the selftest first.

> 
>> I suspect
>> the user space will still read something from the error queue unless there is
>> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.
> 
> No, please see above.
> 
>>
>> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
>> from even queuing to the error queue? I think it is ok but I am not sure if
>> anyone is depending on the above behavior.
> 
> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING

Got it. This part is now understood.

It is one of the reasons for my earlier question on which SOF_* that bpf needs 
to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but 
lets leave these nits for a little later.

However, it will be very useful to highlight which SOF_* will never be used in 
bpf in v3.

> features including cmsg mode. But it will not be used in bpf mode. So
> the test statement is enough to divided those three cases into two
> groups.

> 
>>
>>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
>>> print based on patch [4/12]; if not, we will use
>>> bpf_skb_tstamp_tx_output() to print.
>>>
>>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
>>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
>>> (please see Documentation/networking/timestamping.rst). We can see a
>>> good example on how to use in
>>> tools/testing/selftests/net/txtimestamp.c:
>>> do_test()
>>> {
>>>           sock_opt = SOF_TIMESTAMPING_SOFTWARE |
>>>           ...
>>>           if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
>>>                                 (char *) &sock_opt, sizeof(sock_opt)))
>>> }
>>>
>>>>
>>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
>>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
>>>> It has to be able to set and clear.
>>>
>>> I cannot find a good time to clear all the sockets which are set
>>> through the BPF program. If we detach the BPF program, it will not
>>> print of course. Does it really matter if we don't clear the
>>> sk_tsflags_bpf?
>>
>> Yes, it matters. The same reason goes for why the existing bpf prog can clear
>> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
>> timestamp. For sockops program that stays forever, not all usages expect to do
>> timestamping for the whole lifetime of the connection. If there is a way for the
>> prog to turn it on, it should have a way for the prog to turn it off.
> 
> I see what you meant here. If we don't clear sk_tsflags_bpf, after we
> detach the bpf program, it will do nothing in __skb_tstamp_tx() and
> return earlier. It is almost equal to the effect of turning off. It is
> why I don't handle clearing the flag.
> 
>>
>> What is the concern of allowing the bpf prog to disable something that it has
>> enabled before?
> 
> Let me give one instance:
> If one socket is established and stays idle, how can the bpf program
> clear the tsflags from that socket? I have no idea.

bpf_tcp_iter prog can. That said, the idle connection example is too carry away 
as an excuse that bpf_setsockopt does not need to support turning-off. Sure, 
idle connection is as good as off. and yes, detach is as good as off also.

I am now acting as a broken clock repeating myself that not all use cases run 
for 5 mins and then detach, so I need to be specific here that bpf_setsockopt 
not supporting off is a nack. There are many use cases in production that the 
bpf prog runs forever and wants to turn it on-and-off.

Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc 
that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do 
periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn 
it off if needed.

I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am 
asking to support clearing the sk_tsflags_bpf in bpf_setsockopt().

I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits.

> 
>>
>> While we are on bpf_sock_ops_cb_flags, the
>> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
>> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
>> what interface to use for bpf timestamping.
> 
> I'm not sure if I understand correctly. I mimicked the use of
> BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
> bpf_sock_ops_cb_flags_set() in BPF program?

In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant. 
It is as good as testing and setting sk_tsflags_bpf alone.

This could be some cleanup for the later stage of the set.

 > >>
>>>
>>>>
>>>> Does it also mean either the bpf or the user space can enable the timetstamping
>>>> but not both? I don't think we can assume this also. It will be hard to deploy
>>>> the bpf prog in production to collect continuous data. The user space may have
>>>> some timestamping enabled but the bpf may want to do its parallel investigation
>>>> also. The user space may rollout timestamping in the future and suddenly break
>>>> the bpf prog.
>>>
>>> Well, IIUC, it's also the basic idea from the current series which
>>> allows both happening at the same time. Let us put it in a simple way,
>>> I hope that if the app uses the SO_TIMESTAMPING feature already, then
>>> one admin deploys the BPF program, that app should be traced both in
>>> bpf and non-bpf ways.
>>>
>>> But Willem doesn't agree about this approach[1] because of hard to debug.
>>>
>>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
>>> Regarding to this link, I have a few more words to say: the socket
>>> could be set through bpf_setsockopt() in different phases not like
>>> setsockopt(), so in some cases we cannot make setsockopt hard failed.
>>>
>>> After rethinking this point more, I still reckon that letting BPF
>>> program trace timestamping parallelly without caring whether the
>>> socket is set to the SO_TIMESTAMPING feature through setsockopt()
>>
>> I am afraid having both work in parallel is needed. Otherwise, it will be very
>> hard to deploy a bpf prog to run continuously in scale. Being able to collect
>> timestamp without worrying about application changes/updates/downgrades is
>> important. e.g. App changes from no time stamping to time stamping
> 
> Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
> :) So let me keep the current logic of printing (see the
> __skb_tstamp_tx() function in patch [04/12]) in the next version. Then
> I don't need to add some test statements to distinguish which way of
> printing.
> 
>>
>> Please help to add selftests to show how the above cases (1), (2), (3), and
>> other tsflags/txflags sharing cases will work. This should not be delayed until
>> the discussion is done. It is needed sooner or later to prove both bpf and
>> non-bpf ways can work at the same time. It will help the reviewer and also help
>> to think about the design with a real use case in bpf prog.
> 
> Got it. But I'm not sure where I should put those test cases? Could
> you help me point out a good example that I can follow?

Have you looked at the selftests/bpf directory?

prog_tests/setget_sockopt.c may be something closer to what you need.

There is a recent one in the mailing list also:

https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/

The expectation is to be able to run the test like this: ./test_progs -t 
setget_sockopt

> 
>>
>> The example in patch 0 only prints the reported tstamp, can you share how it
>> will be used to investigate issue?
> 
> No problem. Please see chapter 3 about "goal" in
> https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.

Thanks.

> 
>> Is it also useful to know when the skb is
>> written to the kernel during sendmsg()?
> 
> You are right. Before this patch, normally applications will record an
> accurate timestamp before do sendmsg().
> 
> After you remind me of this, I feel that we can add the timestamp
> print in the future for bpf use.

Yes, please add the sendmsg timestamp capturing in the selftest. It is useful.

> 
>>
>> Regarding the bpf_setsockopt() can be called in different phase,
>> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
>> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
>> surprised people will try to trigger some on-and-off timestamping from
>> bpf-tcp-cc to measure some delay.
>>
>>
>> More about bpf_setsockopt() in different phase, understand that UDP is not your
>> priority. However, it needs to have some clarity on how UDP will work and how to
>> enable it. UDP usually has no connect/established phase.
> 
> For now, I don't expect an extension for UDP because it will bring too
> much extra work. Could we discuss this later? I mean, after we finish
> the basic bpf extension :)

Later is fine but before this set lands. I am not asking a full UDP 
implementation but need ideas on how that may look like. We need some clarity on 
how UDP will work and also how much new sockops API extension will be needed to 
decide if sockops is the correct one going forward. I don't want to end up tcp 
is in sockops and UDP (and others) is non sockops.

That said, the current priority is to get bpf prog and user space work without 
stepping on each other first.

> 
>>
>> Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are
>> planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in
>> another thread. What else?
> 
> In this patch series, I support
> SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE,
> which you've already noticed from the BPF example in patch [0/12].
> They all come from the original design of SO_TIMESTAMPING feature.
> 
> The question you proposed is what I am willing to implement in the
> future, like adding one hook in tcp_write_xmit()? It's part of my
> plans to extend in the future, not be included in this series.
Jason Xing Oct. 18, 2024, 2:52 a.m. UTC | #19
On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/16/24 7:28 PM, Jason Xing wrote:
> > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 10/16/24 3:36 AM, Jason Xing wrote:
> >>>>> If the skb carries the timestamp, there are three cases:
> >>>>> 1) non-bpf case and users uses setsockopt()
> >>>>> 2) cmsg case
> >>>>> 3) bpf case
> >>
> >> These should have tests in the selftests/bpf/ sooner than later. (More below).
> >>
> >>>>>
> >>>>> #1 and #2 are already handled well before this patch. I only need to
> >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> >>>>> else it could be #1 or #2, then we will let the old way print
> >>>>> timestamps in __skb_tstamp_tx().
> >>>>
> >>>> hmm... I am still not sure I fully understand...but I think I may start getting it.
> >>>
> >>> Sorry, my bad. I gave the wrong answer...
> >>>
> >>> It should be:
> >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
> >>
> >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?
> >
> > Yep.
> >
> >>
> >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> >> behavior is to just queue to the sk_error_queue as long as there is
> >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> >> sk_tsflags. This will eventually get ignored when user read it from the error
> >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?
> >
> > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > read the skb from the errqueue but are not able to parse the
> > timestamps. Please see
> > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
> > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
> >         ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))
>
> afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error
> still returns the skb to the user.
>
> I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not
> set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka
> traditional) way is that the generated skb will still be queued in the error
> queue. The user space can still read it but just won't have the timestamp cmsg.

Apparently, we're on the same page. What you were saying is right, of course :)

>
> If I understand how the v3 may look like, the skb will not be queued in the
> error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user

Right, skb will not even be cloned or generated, let alone queue it in
the error queue. For bpf extension, preventing skb to be queued in the
error queue is a very vital thing.

> space won't get it from the error queue which is a change of behavior. I was
> saying I am fine but not sure if someone depends on this behavior.

For bpf part, it's okay to bypass the queuing skb into the errqueue
logic, which has been discussed at netconf before this series with
Willem also.
For non-bpf part, I will not touch and modify their prior behaviour.

>
> I think we start talking pass each other on this. I will wait for the code on
> this part and the selftest first.

I will keep this code snippets in V3 so that three kinds of printing
could work parallelly:
@@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
        if (!sk)
                return;

+       if (static_branch_unlikely(&bpf_tstamp_control))
+               bpf_skb_tstamp_tx_output(sk, tstype);
+
        skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
 }

So please forget what we've talked about testing the
SOF_TIMESTAMPING_SOFTWARE flag thing.

>
> >
> >> I suspect
> >> the user space will still read something from the error queue unless there is
> >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.
> >
> > No, please see above.
> >
> >>
> >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> >> from even queuing to the error queue? I think it is ok but I am not sure if
> >> anyone is depending on the above behavior.
> >
> > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
>
> Got it. This part is now understood.
>
> It is one of the reasons for my earlier question on which SOF_* that bpf needs
> to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but
> lets leave these nits for a little later.
>
> However, it will be very useful to highlight which SOF_* will never be used in
> bpf in v3.

Got it. Will do that.

>
> > features including cmsg mode. But it will not be used in bpf mode. So
> > the test statement is enough to divided those three cases into two
> > groups.
>
> >
> >>
> >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> >>> print based on patch [4/12]; if not, we will use
> >>> bpf_skb_tstamp_tx_output() to print.
> >>>
> >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> >>> (please see Documentation/networking/timestamping.rst). We can see a
> >>> good example on how to use in
> >>> tools/testing/selftests/net/txtimestamp.c:
> >>> do_test()
> >>> {
> >>>           sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> >>>           ...
> >>>           if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> >>>                                 (char *) &sock_opt, sizeof(sock_opt)))
> >>> }
> >>>
> >>>>
> >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> >>>> It has to be able to set and clear.
> >>>
> >>> I cannot find a good time to clear all the sockets which are set
> >>> through the BPF program. If we detach the BPF program, it will not
> >>> print of course. Does it really matter if we don't clear the
> >>> sk_tsflags_bpf?
> >>
> >> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> >> timestamp. For sockops program that stays forever, not all usages expect to do
> >> timestamping for the whole lifetime of the connection. If there is a way for the
> >> prog to turn it on, it should have a way for the prog to turn it off.
> >
> > I see what you meant here. If we don't clear sk_tsflags_bpf, after we
> > detach the bpf program, it will do nothing in __skb_tstamp_tx() and
> > return earlier. It is almost equal to the effect of turning off. It is
> > why I don't handle clearing the flag.
> >
> >>
> >> What is the concern of allowing the bpf prog to disable something that it has
> >> enabled before?
> >
> > Let me give one instance:
> > If one socket is established and stays idle, how can the bpf program
> > clear the tsflags from that socket? I have no idea.
>
> bpf_tcp_iter prog can. That said, the idle connection example is too carry away
> as an excuse that bpf_setsockopt does not need to support turning-off. Sure,
> idle connection is as good as off. and yes, detach is as good as off also.

Thanks, I see.

>
> I am now acting as a broken clock repeating myself that not all use cases run
> for 5 mins and then detach, so I need to be specific here that bpf_setsockopt
> not supporting off is a nack. There are many use cases in production that the
> bpf prog runs forever and wants to turn it on-and-off.
>
> Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc
> that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do
> periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn
> it off if needed.

I'm not insisting not to support this. Just curious why. Now I get it.

>
> I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am
> asking to support clearing the sk_tsflags_bpf in bpf_setsockopt().

Yeah, I know that.

>
> I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits.

It's easy for me to support the function clearing the sk_tsflags_bpf
in the bpf_setsockopt() function. Will do that :)

>
> >
> >>
> >> While we are on bpf_sock_ops_cb_flags, the
> >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> >> what interface to use for bpf timestamping.
> >
> > I'm not sure if I understand correctly. I mimicked the use of
> > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
> > bpf_sock_ops_cb_flags_set() in BPF program?
>
> In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant.
> It is as good as testing and setting sk_tsflags_bpf alone.

I will remove it, then the code will be simplified.

>
> This could be some cleanup for the later stage of the set.
>
>  > >>
> >>>
> >>>>
> >>>> Does it also mean either the bpf or the user space can enable the timetstamping
> >>>> but not both? I don't think we can assume this also. It will be hard to deploy
> >>>> the bpf prog in production to collect continuous data. The user space may have
> >>>> some timestamping enabled but the bpf may want to do its parallel investigation
> >>>> also. The user space may rollout timestamping in the future and suddenly break
> >>>> the bpf prog.
> >>>
> >>> Well, IIUC, it's also the basic idea from the current series which
> >>> allows both happening at the same time. Let us put it in a simple way,
> >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then
> >>> one admin deploys the BPF program, that app should be traced both in
> >>> bpf and non-bpf ways.
> >>>
> >>> But Willem doesn't agree about this approach[1] because of hard to debug.
> >>>
> >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
> >>> Regarding to this link, I have a few more words to say: the socket
> >>> could be set through bpf_setsockopt() in different phases not like
> >>> setsockopt(), so in some cases we cannot make setsockopt hard failed.
> >>>
> >>> After rethinking this point more, I still reckon that letting BPF
> >>> program trace timestamping parallelly without caring whether the
> >>> socket is set to the SO_TIMESTAMPING feature through setsockopt()
> >>
> >> I am afraid having both work in parallel is needed. Otherwise, it will be very
> >> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> >> timestamp without worrying about application changes/updates/downgrades is
> >> important. e.g. App changes from no time stamping to time stamping
> >
> > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
> > :) So let me keep the current logic of printing (see the
> > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then
> > I don't need to add some test statements to distinguish which way of
> > printing.
> >
> >>
> >> Please help to add selftests to show how the above cases (1), (2), (3), and
> >> other tsflags/txflags sharing cases will work. This should not be delayed until
> >> the discussion is done. It is needed sooner or later to prove both bpf and
> >> non-bpf ways can work at the same time. It will help the reviewer and also help
> >> to think about the design with a real use case in bpf prog.
> >
> > Got it. But I'm not sure where I should put those test cases? Could
> > you help me point out a good example that I can follow?
>
> Have you looked at the selftests/bpf directory?

Sure, the full bpf program was written based on the examples in
selftests/bpf. But I remembered that selftests/bpf is already
deprecated?

>
> prog_tests/setget_sockopt.c may be something closer to what you need.
>
> There is a recent one in the mailing list also:
>
> https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/
>
> The expectation is to be able to run the test like this: ./test_progs -t
> setget_sockopt

Thanks for the pointer. I will spend some time studying it.

>
> >
> >>
> >> The example in patch 0 only prints the reported tstamp, can you share how it
> >> will be used to investigate issue?
> >
> > No problem. Please see chapter 3 about "goal" in
> > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.
>
> Thanks.
>
> >
> >> Is it also useful to know when the skb is
> >> written to the kernel during sendmsg()?
> >
> > You are right. Before this patch, normally applications will record an
> > accurate timestamp before do sendmsg().
> >
> > After you remind me of this, I feel that we can add the timestamp
> > print in the future for bpf use.
>
> Yes, please add the sendmsg timestamp capturing in the selftest. It is useful.
>
> >
> >>
> >> Regarding the bpf_setsockopt() can be called in different phase,
> >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> >> surprised people will try to trigger some on-and-off timestamping from
> >> bpf-tcp-cc to measure some delay.
> >>
> >>
> >> More about bpf_setsockopt() in different phase, understand that UDP is not your
> >> priority. However, it needs to have some clarity on how UDP will work and how to
> >> enable it. UDP usually has no connect/established phase.
> >
> > For now, I don't expect an extension for UDP because it will bring too
> > much extra work. Could we discuss this later? I mean, after we finish
> > the basic bpf extension :)
>
> Later is fine but before this set lands. I am not asking a full UDP
> implementation but need ideas on how that may look like. We need some clarity on
> how UDP will work and also how much new sockops API extension will be needed to
> decide if sockops is the correct one going forward. I don't want to end up tcp
> is in sockops and UDP (and others) is non sockops.

I see.

After removing the use of "BPF_SOCK_OPS_TEST_FLAG(tp,
BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG)", the bpf extension will be
limited only in TCP, I think.

Using sk->sk_tsflags_bpf as an indicator could work for both protocols.

Thanks for your great help :)

Thanks,
Jason
Jason Xing Oct. 18, 2024, 3:05 a.m. UTC | #20
On Fri, Oct 18, 2024 at 10:52 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 10/16/24 7:28 PM, Jason Xing wrote:
> > > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >>
> > >> On 10/16/24 3:36 AM, Jason Xing wrote:
> > >>>>> If the skb carries the timestamp, there are three cases:
> > >>>>> 1) non-bpf case and users uses setsockopt()
> > >>>>> 2) cmsg case
> > >>>>> 3) bpf case
> > >>
> > >> These should have tests in the selftests/bpf/ sooner than later. (More below).
> > >>
> > >>>>>
> > >>>>> #1 and #2 are already handled well before this patch. I only need to
> > >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> > >>>>> else it could be #1 or #2, then we will let the old way print
> > >>>>> timestamps in __skb_tstamp_tx().
> > >>>>
> > >>>> hmm... I am still not sure I fully understand...but I think I may start getting it.
> > >>>
> > >>> Sorry, my bad. I gave the wrong answer...
> > >>>
> > >>> It should be:
> > >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
> > >>
> > >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?
> > >
> > > Yep.
> > >
> > >>
> > >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> > >> behavior is to just queue to the sk_error_queue as long as there is
> > >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> > >> sk_tsflags. This will eventually get ignored when user read it from the error
> > >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?
> > >
> > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > read the skb from the errqueue but are not able to parse the
> > > timestamps. Please see
> > > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
> > > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
> > >         ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))
> >
> > afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error
> > still returns the skb to the user.
> >
> > I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not
> > set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka
> > traditional) way is that the generated skb will still be queued in the error
> > queue. The user space can still read it but just won't have the timestamp cmsg.
>
> Apparently, we're on the same page. What you were saying is right, of course :)
>
> >
> > If I understand how the v3 may look like, the skb will not be queued in the
> > error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user
>
> Right, skb will not even be cloned or generated, let alone queue it in
> the error queue. For bpf extension, preventing skb to be queued in the
> error queue is a very vital thing.
>
> > space won't get it from the error queue which is a change of behavior. I was
> > saying I am fine but not sure if someone depends on this behavior.
>
> For bpf part, it's okay to bypass the queuing skb into the errqueue
> logic, which has been discussed at netconf before this series with
> Willem also.
> For non-bpf part, I will not touch and modify their prior behaviour.
>
> >
> > I think we start talking pass each other on this. I will wait for the code on
> > this part and the selftest first.
>
> I will keep this code snippets in V3 so that three kinds of printing
> could work parallelly:
> @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>         if (!sk)
>                 return;
>
> +       if (static_branch_unlikely(&bpf_tstamp_control))
> +               bpf_skb_tstamp_tx_output(sk, tstype);
> +
>         skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
>  }
>
> So please forget what we've talked about testing the
> SOF_TIMESTAMPING_SOFTWARE flag thing.
>
> >
> > >
> > >> I suspect
> > >> the user space will still read something from the error queue unless there is
> > >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.
> > >
> > > No, please see above.
> > >
> > >>
> > >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> > >> from even queuing to the error queue? I think it is ok but I am not sure if
> > >> anyone is depending on the above behavior.
> > >
> > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> >
> > Got it. This part is now understood.
> >
> > It is one of the reasons for my earlier question on which SOF_* that bpf needs
> > to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but
> > lets leave these nits for a little later.
> >
> > However, it will be very useful to highlight which SOF_* will never be used in
> > bpf in v3.
>
> Got it. Will do that.
>
> >
> > > features including cmsg mode. But it will not be used in bpf mode. So
> > > the test statement is enough to divided those three cases into two
> > > groups.
> >
> > >
> > >>
> > >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> > >>> print based on patch [4/12]; if not, we will use
> > >>> bpf_skb_tstamp_tx_output() to print.
> > >>>
> > >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> > >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> > >>> (please see Documentation/networking/timestamping.rst). We can see a
> > >>> good example on how to use in
> > >>> tools/testing/selftests/net/txtimestamp.c:
> > >>> do_test()
> > >>> {
> > >>>           sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> > >>>           ...
> > >>>           if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> > >>>                                 (char *) &sock_opt, sizeof(sock_opt)))
> > >>> }
> > >>>
> > >>>>
> > >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> > >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> > >>>> It has to be able to set and clear.
> > >>>
> > >>> I cannot find a good time to clear all the sockets which are set
> > >>> through the BPF program. If we detach the BPF program, it will not
> > >>> print of course. Does it really matter if we don't clear the
> > >>> sk_tsflags_bpf?
> > >>
> > >> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> > >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> > >> timestamp. For sockops program that stays forever, not all usages expect to do
> > >> timestamping for the whole lifetime of the connection. If there is a way for the
> > >> prog to turn it on, it should have a way for the prog to turn it off.
> > >
> > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we
> > > detach the bpf program, it will do nothing in __skb_tstamp_tx() and
> > > return earlier. It is almost equal to the effect of turning off. It is
> > > why I don't handle clearing the flag.
> > >
> > >>
> > >> What is the concern of allowing the bpf prog to disable something that it has
> > >> enabled before?
> > >
> > > Let me give one instance:
> > > If one socket is established and stays idle, how can the bpf program
> > > clear the tsflags from that socket? I have no idea.
> >
> > bpf_tcp_iter prog can. That said, the idle connection example is too carry away
> > as an excuse that bpf_setsockopt does not need to support turning-off. Sure,
> > idle connection is as good as off. and yes, detach is as good as off also.
>
> Thanks, I see.
>
> >
> > I am now acting as a broken clock repeating myself that not all use cases run
> > for 5 mins and then detach, so I need to be specific here that bpf_setsockopt
> > not supporting off is a nack. There are many use cases in production that the
> > bpf prog runs forever and wants to turn it on-and-off.
> >
> > Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc
> > that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do
> > periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn
> > it off if needed.
>
> I'm not insisting not to support this. Just curious why. Now I get it.
>
> >
> > I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am
> > asking to support clearing the sk_tsflags_bpf in bpf_setsockopt().
>
> Yeah, I know that.
>
> >
> > I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits.
>
> It's easy for me to support the function clearing the sk_tsflags_bpf
> in the bpf_setsockopt() function. Will do that :)
>
> >
> > >
> > >>
> > >> While we are on bpf_sock_ops_cb_flags, the
> > >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> > >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> > >> what interface to use for bpf timestamping.
> > >
> > > I'm not sure if I understand correctly. I mimicked the use of
> > > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
> > > bpf_sock_ops_cb_flags_set() in BPF program?
> >
> > In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant.
> > It is as good as testing and setting sk_tsflags_bpf alone.
>
> I will remove it, then the code will be simplified.
>
> >
> > This could be some cleanup for the later stage of the set.
> >
> >  > >>
> > >>>
> > >>>>
> > >>>> Does it also mean either the bpf or the user space can enable the timetstamping
> > >>>> but not both? I don't think we can assume this also. It will be hard to deploy
> > >>>> the bpf prog in production to collect continuous data. The user space may have
> > >>>> some timestamping enabled but the bpf may want to do its parallel investigation
> > >>>> also. The user space may rollout timestamping in the future and suddenly break
> > >>>> the bpf prog.
> > >>>
> > >>> Well, IIUC, it's also the basic idea from the current series which
> > >>> allows both happening at the same time. Let us put it in a simple way,
> > >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then
> > >>> one admin deploys the BPF program, that app should be traced both in
> > >>> bpf and non-bpf ways.
> > >>>
> > >>> But Willem doesn't agree about this approach[1] because of hard to debug.
> > >>>
> > >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
> > >>> Regarding to this link, I have a few more words to say: the socket
> > >>> could be set through bpf_setsockopt() in different phases not like
> > >>> setsockopt(), so in some cases we cannot make setsockopt hard failed.
> > >>>
> > >>> After rethinking this point more, I still reckon that letting BPF
> > >>> program trace timestamping parallelly without caring whether the
> > >>> socket is set to the SO_TIMESTAMPING feature through setsockopt()
> > >>
> > >> I am afraid having both work in parallel is needed. Otherwise, it will be very
> > >> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> > >> timestamp without worrying about application changes/updates/downgrades is
> > >> important. e.g. App changes from no time stamping to time stamping
> > >
> > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
> > > :) So let me keep the current logic of printing (see the
> > > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then
> > > I don't need to add some test statements to distinguish which way of
> > > printing.
> > >
> > >>
> > >> Please help to add selftests to show how the above cases (1), (2), (3), and
> > >> other tsflags/txflags sharing cases will work. This should not be delayed until
> > >> the discussion is done. It is needed sooner or later to prove both bpf and
> > >> non-bpf ways can work at the same time. It will help the reviewer and also help
> > >> to think about the design with a real use case in bpf prog.
> > >
> > > Got it. But I'm not sure where I should put those test cases? Could
> > > you help me point out a good example that I can follow?
> >
> > Have you looked at the selftests/bpf directory?
>
> Sure, the full bpf program was written based on the examples in
> selftests/bpf. But I remembered that selftests/bpf is already
> deprecated?
>
> >
> > prog_tests/setget_sockopt.c may be something closer to what you need.
> >
> > There is a recent one in the mailing list also:
> >
> > https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com/
> >
> > The expectation is to be able to run the test like this: ./test_progs -t
> > setget_sockopt
>
> Thanks for the pointer. I will spend some time studying it.
>
> >
> > >
> > >>
> > >> The example in patch 0 only prints the reported tstamp, can you share how it
> > >> will be used to investigate issue?
> > >
> > > No problem. Please see chapter 3 about "goal" in
> > > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.
> >
> > Thanks.
> >
> > >
> > >> Is it also useful to know when the skb is
> > >> written to the kernel during sendmsg()?
> > >
> > > You are right. Before this patch, normally applications will record an
> > > accurate timestamp before do sendmsg().
> > >
> > > After you remind me of this, I feel that we can add the timestamp
> > > print in the future for bpf use.
> >
> > Yes, please add the sendmsg timestamp capturing in the selftest. It is useful.
> >
> > >
> > >>
> > >> Regarding the bpf_setsockopt() can be called in different phase,
> > >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> > >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> > >> surprised people will try to trigger some on-and-off timestamping from
> > >> bpf-tcp-cc to measure some delay.
> > >>
> > >>
> > >> More about bpf_setsockopt() in different phase, understand that UDP is not your
> > >> priority. However, it needs to have some clarity on how UDP will work and how to
> > >> enable it. UDP usually has no connect/established phase.
> > >
> > > For now, I don't expect an extension for UDP because it will bring too
> > > much extra work. Could we discuss this later? I mean, after we finish
> > > the basic bpf extension :)
> >
> > Later is fine but before this set lands. I am not asking a full UDP
> > implementation but need ideas on how that may look like. We need some clarity on
> > how UDP will work and also how much new sockops API extension will be needed to
> > decide if sockops is the correct one going forward. I don't want to end up tcp
> > is in sockops and UDP (and others) is non sockops.

The differences between TCP and UDP are:
1) TCP supports SOF_TIMESTAMPING_TX_ACK, SOF_TIMESTAMPING_OPT_ID_TCP,
while UDP does not.
2) the tskey works in different ways.
We have a good example to run and test:
tools/testing/selftests/net/txtimestamp.c

If that arouses your interesting, you could try:
1) for UDP, ./txtimestamp -4 -L 127.0.0.1 -l 1000  -c 2 -u
2) for TCP, ./txtimestamp -4 -L 127.0.0.1 -l 1000  -c 2

I think the V3 series could easily support the UDP protocol. Let me
try, then we could discuss more based on V3.

Thanks,
Jason
Willem de Bruijn Oct. 20, 2024, 9:51 p.m. UTC | #21
Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Willem suggested that we use a static key to control. The advantage
> is that we will not affect the existing applications at all if we
> don't load BPF program.
> 
> In this patch, except the static key, I also add one logic that is
> used to test if the socket has enabled its tsflags in order to
> support bpf logic to allow both cases to happen at the same time.
> Or else, the skb carring related timestamp flag doesn't know which
> way of printing is desirable.
> 
> One thing important is this patch allows print from both applications
> and bpf program at the same time. Now we have three kinds of print:
> 1) only BPF program prints
> 2) only application program prints
> 3) both can print without side effect
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>

Getting back to this thread. It is long, instead of responding to
multiple messages, let me combine them in a single response.


* On future extensions:

+1 that the UDP case, and datagrams more broadly, must have a clear
development path, before we can merge TCP.

Similarly, hardware timestamps need not be supported from the start,
but must clearly be supportable.


* On queueing packets to userspace:

> > the current behavior is to just queue to the sk_error_queue as long
> > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > is regardless of the sk_tsflags. "

> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> read the skb from the errqueue but are not able to parse the
> timestamps

Before queuing a packet to userspace on the error queue, the relevant
reporting flag is always tested. sock_recv_timestamp has:

        /*
         * generate control messages if
         * - receive time stamping in software requested
         * - software time stamp available and wanted
         * - hardware time stamps available and wanted
         */
        if (sock_flag(sk, SOCK_RCVTSTAMP) ||
            (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
            (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
            (hwtstamps->hwtstamp &&
             (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
                __sock_recv_timestamp(msg, sk, skb);

Otherwise applications could get error messages queued, and
epoll/poll/select would unexpectedly behave differently.

> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> features including cmsg mode. But it will not be used in bpf mode. 

For simplicity, the two uses of the API are best kept identical. If
there is a technical reason why BPF has to diverge from established
behavior, this needs to be explicitly called out in the commit
message.

Also, if you want to extend the API for BPF in the future, good to
call this out now and ideally extensions will apply to both, to
maintain a uniform API.


* On extra measurement points, at sendmsg or tcp_write_xmit:

The first is interesting. For application timestamping, this was
never needed, as the application can just call clock_gettime before
sendmsg.

In general, additional measurement points are not only useful if the
interval between is not constant. So far, we have seen no need for
any additional points.


* On skb state:

> > For now, is there thing we can explore to share in the skb_shared_info?

skb_shinfo space is at a premium. I don't think we can justify two
extra fields just for this use case.

> My initial thought is just to reuse these fields in skb. It can work
> without interfering one another.

I'm skeptical that two methods can work at the same time. If they are
started at different times, their sk_tskey will be different, for one.

There may be workarounds. Maybe BPF can store its state in some BPF
specific field, indeed. Or perhaps it can store per-sk shadow state
that resolves the conflict. For instance, the offset between sk_tskey
and bpf_tskey.
Jason Xing Oct. 21, 2024, 3:21 a.m. UTC | #22
On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Willem suggested that we use a static key to control. The advantage
> > is that we will not affect the existing applications at all if we
> > don't load BPF program.
> >
> > In this patch, except the static key, I also add one logic that is
> > used to test if the socket has enabled its tsflags in order to
> > support bpf logic to allow both cases to happen at the same time.
> > Or else, the skb carring related timestamp flag doesn't know which
> > way of printing is desirable.
> >
> > One thing important is this patch allows print from both applications
> > and bpf program at the same time. Now we have three kinds of print:
> > 1) only BPF program prints
> > 2) only application program prints
> > 3) both can print without side effect
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
>
> Getting back to this thread. It is long, instead of responding to
> multiple messages, let me combine them in a single response.

Thank you so much!

>
>
> * On future extensions:
>
> +1 that the UDP case, and datagrams more broadly, must have a clear
> development path, before we can merge TCP.
>
> Similarly, hardware timestamps need not be supported from the start,
> but must clearly be supportable.

Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing
the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)
could work well for both protos. Let me give it a try first.

>
>
> * On queueing packets to userspace:
>
> > > the current behavior is to just queue to the sk_error_queue as long
> > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > > is regardless of the sk_tsflags. "
>
> > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > read the skb from the errqueue but are not able to parse the
> > timestamps

Above is what I tried to explain how the application timestamping
feature works, not what I tried to implement for the BPF extension.

>
> Before queuing a packet to userspace on the error queue, the relevant
> reporting flag is always tested. sock_recv_timestamp has:
>
>         /*
>          * generate control messages if
>          * - receive time stamping in software requested
>          * - software time stamp available and wanted
>          * - hardware time stamps available and wanted
>          */
>         if (sock_flag(sk, SOCK_RCVTSTAMP) ||
>             (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
>             (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
>             (hwtstamps->hwtstamp &&
>              (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
>                 __sock_recv_timestamp(msg, sk, skb);
>
> Otherwise applications could get error messages queued, and
> epoll/poll/select would unexpectedly behave differently.

Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE
flag for BPF.

>
> > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> > features including cmsg mode. But it will not be used in bpf mode.
>
> For simplicity, the two uses of the API are best kept identical. If
> there is a technical reason why BPF has to diverge from established
> behavior, this needs to be explicitly called out in the commit
> message.
>
> Also, if you want to extend the API for BPF in the future, good to
> call this out now and ideally extensions will apply to both, to
> maintain a uniform API.

As you said, I also agree on "two uses of the API are best kept identical".

>
>
> * On extra measurement points, at sendmsg or tcp_write_xmit:
>
> The first is interesting. For application timestamping, this was
> never needed, as the application can just call clock_gettime before
> sendmsg.

Yes, we could add it after we finish the current series. I'm going to
write it down on my todo list.

>
> In general, additional measurement points are not only useful if the
> interval between is not constant. So far, we have seen no need for
> any additional points.

Taking a snapshot of tcp_write_xmit() could be useful especially when
the skb is not transmitted due to nagle algorithm.

>
>
> * On skb state:
>
> > > For now, is there thing we can explore to share in the skb_shared_info?
>
> skb_shinfo space is at a premium. I don't think we can justify two
> extra fields just for this use case.
>
> > My initial thought is just to reuse these fields in skb. It can work
> > without interfering one another.
>
> I'm skeptical that two methods can work at the same time. If they are
> started at different times, their sk_tskey will be different, for one.

Right, sk_tskey is the only special one that I will take care of.
Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be
reused.

>
> There may be workarounds. Maybe BPF can store its state in some BPF
> specific field, indeed. Or perhaps it can store per-sk shadow state
> that resolves the conflict. For instance, the offset between sk_tskey
> and bpf_tskey.

Things could get complicated in the future if we want to unified the
final tskey value for all the cases. Since 1) the value of
shinfo->tskey depends on skb seq and len, 2) the final tskey output is
the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in
struct sock and related output logic for bpf without caring if it's
the same as sk_tskey.

That said, the outputs from two methods differ. Do you think it is
acceptable? It could be simpler and easier if we keep them identical.

Again, thanks for your long conclusion and every review.

Thanks,
Jason
Willem de Bruijn Oct. 21, 2024, 2:49 p.m. UTC | #23
Jason Xing wrote:
> On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Willem suggested that we use a static key to control. The advantage
> > > is that we will not affect the existing applications at all if we
> > > don't load BPF program.
> > >
> > > In this patch, except the static key, I also add one logic that is
> > > used to test if the socket has enabled its tsflags in order to
> > > support bpf logic to allow both cases to happen at the same time.
> > > Or else, the skb carring related timestamp flag doesn't know which
> > > way of printing is desirable.
> > >
> > > One thing important is this patch allows print from both applications
> > > and bpf program at the same time. Now we have three kinds of print:
> > > 1) only BPF program prints
> > > 2) only application program prints
> > > 3) both can print without side effect
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >
> > Getting back to this thread. It is long, instead of responding to
> > multiple messages, let me combine them in a single response.
> 
> Thank you so much!
> 
> >
> >
> > * On future extensions:
> >
> > +1 that the UDP case, and datagrams more broadly, must have a clear
> > development path, before we can merge TCP.
> >
> > Similarly, hardware timestamps need not be supported from the start,
> > but must clearly be supportable.
> 
> Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing
> the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)
> could work well for both protos. Let me give it a try first.

Great, thanks.

> >
> >
> > * On queueing packets to userspace:
> >
> > > > the current behavior is to just queue to the sk_error_queue as long
> > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > > > is regardless of the sk_tsflags. "
> >
> > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > read the skb from the errqueue but are not able to parse the
> > > timestamps
> 
> Above is what I tried to explain how the application timestamping
> feature works, not what I tried to implement for the BPF extension.
> 
> >
> > Before queuing a packet to userspace on the error queue, the relevant
> > reporting flag is always tested. sock_recv_timestamp has:
> >
> >         /*
> >          * generate control messages if
> >          * - receive time stamping in software requested
> >          * - software time stamp available and wanted
> >          * - hardware time stamps available and wanted
> >          */
> >         if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> >             (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> >             (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> >             (hwtstamps->hwtstamp &&
> >              (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> >                 __sock_recv_timestamp(msg, sk, skb);
> >
> > Otherwise applications could get error messages queued, and
> > epoll/poll/select would unexpectedly behave differently.
> 
> Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE
> flag for BPF.

Can you elaborate on this? This sounds like it would go against the
intent to have the two versions of the API (application and BPF) be
equivalent.
 
> >
> > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> > > features including cmsg mode. But it will not be used in bpf mode.
> >
> > For simplicity, the two uses of the API are best kept identical. If
> > there is a technical reason why BPF has to diverge from established
> > behavior, this needs to be explicitly called out in the commit
> > message.
> >
> > Also, if you want to extend the API for BPF in the future, good to
> > call this out now and ideally extensions will apply to both, to
> > maintain a uniform API.
> 
> As you said, I also agree on "two uses of the API are best kept identical".
> 
> >
> >
> > * On extra measurement points, at sendmsg or tcp_write_xmit:
> >
> > The first is interesting. For application timestamping, this was
> > never needed, as the application can just call clock_gettime before
> > sendmsg.
> 
> Yes, we could add it after we finish the current series. I'm going to
> write it down on my todo list.
> 
> >
> > In general, additional measurement points are not only useful if the
> > interval between is not constant. So far, we have seen no need for
> > any additional points.
> 
> Taking a snapshot of tcp_write_xmit() could be useful especially when
> the skb is not transmitted due to nagle algorithm.
> 
> >
> >
> > * On skb state:
> >
> > > > For now, is there thing we can explore to share in the skb_shared_info?
> >
> > skb_shinfo space is at a premium. I don't think we can justify two
> > extra fields just for this use case.
> >
> > > My initial thought is just to reuse these fields in skb. It can work
> > > without interfering one another.
> >
> > I'm skeptical that two methods can work at the same time. If they are
> > started at different times, their sk_tskey will be different, for one.
> 
> Right, sk_tskey is the only special one that I will take care of.
> Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be
> reused.
> 
> >
> > There may be workarounds. Maybe BPF can store its state in some BPF
> > specific field, indeed. Or perhaps it can store per-sk shadow state
> > that resolves the conflict. For instance, the offset between sk_tskey
> > and bpf_tskey.
> 
> Things could get complicated in the future if we want to unified the
> final tskey value for all the cases. Since 1) the value of
> shinfo->tskey depends on skb seq and len, 2) the final tskey output is
> the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in
> struct sock and related output logic for bpf without caring if it's
> the same as sk_tskey.

I think we can add fields to struct sock without too much concern.
Adding fields to sk_buff or skb_shared_info would be more difficult.

> That said, the outputs from two methods differ. Do you think it is
> acceptable? It could be simpler and easier if we keep them identical.

Since we can only have one skb_shared_info.tskey, if both user and bpf
request OPT_ID, starting at different times, then we will have two
bases against which to compute the difference. Having two fields in
struct sock should suffice.
Jason Xing Oct. 21, 2024, 3:05 p.m. UTC | #24
On Mon, Oct 21, 2024 at 10:49 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Willem suggested that we use a static key to control. The advantage
> > > > is that we will not affect the existing applications at all if we
> > > > don't load BPF program.
> > > >
> > > > In this patch, except the static key, I also add one logic that is
> > > > used to test if the socket has enabled its tsflags in order to
> > > > support bpf logic to allow both cases to happen at the same time.
> > > > Or else, the skb carring related timestamp flag doesn't know which
> > > > way of printing is desirable.
> > > >
> > > > One thing important is this patch allows print from both applications
> > > > and bpf program at the same time. Now we have three kinds of print:
> > > > 1) only BPF program prints
> > > > 2) only application program prints
> > > > 3) both can print without side effect
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > >
> > > Getting back to this thread. It is long, instead of responding to
> > > multiple messages, let me combine them in a single response.
> >
> > Thank you so much!
> >
> > >
> > >
> > > * On future extensions:
> > >
> > > +1 that the UDP case, and datagrams more broadly, must have a clear
> > > development path, before we can merge TCP.
> > >
> > > Similarly, hardware timestamps need not be supported from the start,
> > > but must clearly be supportable.
> >
> > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing
> > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)
> > could work well for both protos. Let me give it a try first.
>
> Great, thanks.
>
> > >
> > >
> > > * On queueing packets to userspace:
> > >
> > > > > the current behavior is to just queue to the sk_error_queue as long
> > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > > > > is regardless of the sk_tsflags. "
> > >
> > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > > read the skb from the errqueue but are not able to parse the
> > > > timestamps
> >
> > Above is what I tried to explain how the application timestamping
> > feature works, not what I tried to implement for the BPF extension.
> >
> > >
> > > Before queuing a packet to userspace on the error queue, the relevant
> > > reporting flag is always tested. sock_recv_timestamp has:
> > >
> > >         /*
> > >          * generate control messages if
> > >          * - receive time stamping in software requested
> > >          * - software time stamp available and wanted
> > >          * - hardware time stamps available and wanted
> > >          */
> > >         if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> > >             (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> > >             (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> > >             (hwtstamps->hwtstamp &&
> > >              (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> > >                 __sock_recv_timestamp(msg, sk, skb);
> > >
> > > Otherwise applications could get error messages queued, and
> > > epoll/poll/select would unexpectedly behave differently.
> >
> > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE
> > flag for BPF.
>
> Can you elaborate on this? This sounds like it would go against the
> intent to have the two versions of the API (application and BPF) be
> equivalent.

Oh, I see what you mean here. I have no preference. Well, I can add
this report flag into the BPF extension like how application
timestamping works.

>
> > >
> > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> > > > features including cmsg mode. But it will not be used in bpf mode.
> > >
> > > For simplicity, the two uses of the API are best kept identical. If
> > > there is a technical reason why BPF has to diverge from established
> > > behavior, this needs to be explicitly called out in the commit
> > > message.
> > >
> > > Also, if you want to extend the API for BPF in the future, good to
> > > call this out now and ideally extensions will apply to both, to
> > > maintain a uniform API.
> >
> > As you said, I also agree on "two uses of the API are best kept identical".
> >
> > >
> > >
> > > * On extra measurement points, at sendmsg or tcp_write_xmit:
> > >
> > > The first is interesting. For application timestamping, this was
> > > never needed, as the application can just call clock_gettime before
> > > sendmsg.
> >
> > Yes, we could add it after we finish the current series. I'm going to
> > write it down on my todo list.
> >
> > >
> > > In general, additional measurement points are not only useful if the
> > > interval between is not constant. So far, we have seen no need for
> > > any additional points.
> >
> > Taking a snapshot of tcp_write_xmit() could be useful especially when
> > the skb is not transmitted due to nagle algorithm.
> >
> > >
> > >
> > > * On skb state:
> > >
> > > > > For now, is there thing we can explore to share in the skb_shared_info?
> > >
> > > skb_shinfo space is at a premium. I don't think we can justify two
> > > extra fields just for this use case.
> > >
> > > > My initial thought is just to reuse these fields in skb. It can work
> > > > without interfering one another.
> > >
> > > I'm skeptical that two methods can work at the same time. If they are
> > > started at different times, their sk_tskey will be different, for one.
> >
> > Right, sk_tskey is the only special one that I will take care of.
> > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be
> > reused.
> >
> > >
> > > There may be workarounds. Maybe BPF can store its state in some BPF
> > > specific field, indeed. Or perhaps it can store per-sk shadow state
> > > that resolves the conflict. For instance, the offset between sk_tskey
> > > and bpf_tskey.
> >
> > Things could get complicated in the future if we want to unified the
> > final tskey value for all the cases. Since 1) the value of
> > shinfo->tskey depends on skb seq and len, 2) the final tskey output is
> > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in
> > struct sock and related output logic for bpf without caring if it's
> > the same as sk_tskey.
>
> I think we can add fields to struct sock without too much concern.
> Adding fields to sk_buff or skb_shared_info would be more difficult.

Got it:)

>
> > That said, the outputs from two methods differ. Do you think it is
> > acceptable? It could be simpler and easier if we keep them identical.
>
> Since we can only have one skb_shared_info.tskey, if both user and bpf
> request OPT_ID, starting at different times, then we will have two
> bases against which to compute the difference. Having two fields in
> struct sock should suffice.

Exactly! I will do it.

Thanks,
Jason
Martin KaFai Lau Oct. 22, 2024, 12:53 a.m. UTC | #25
On 10/20/24 2:51 PM, Willem de Bruijn wrote:
> Jason Xing wrote:
>> From: Jason Xing <kernelxing@tencent.com>
>>
>> Willem suggested that we use a static key to control. The advantage
>> is that we will not affect the existing applications at all if we
>> don't load BPF program.
>>
>> In this patch, except the static key, I also add one logic that is
>> used to test if the socket has enabled its tsflags in order to
>> support bpf logic to allow both cases to happen at the same time.
>> Or else, the skb carring related timestamp flag doesn't know which
>> way of printing is desirable.
>>
>> One thing important is this patch allows print from both applications
>> and bpf program at the same time. Now we have three kinds of print:
>> 1) only BPF program prints
>> 2) only application program prints
>> 3) both can print without side effect
>>
>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> 
> Getting back to this thread. It is long, instead of responding to
> multiple messages, let me combine them in a single response.
> 
> 
> * On future extensions:
> 
> +1 that the UDP case, and datagrams more broadly, must have a clear
> development path, before we can merge TCP.
> 
> Similarly, hardware timestamps need not be supported from the start,
> but must clearly be supportable.
> 
> 
> * On queueing packets to userspace:
> 
>>> the current behavior is to just queue to the sk_error_queue as long
>>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
>>> is regardless of the sk_tsflags. "
> 
>> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
>> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
>> read the skb from the errqueue but are not able to parse the
>> timestamps
> 
> Before queuing a packet to userspace on the error queue, the relevant
> reporting flag is always tested. sock_recv_timestamp has:
> 
>          /*
>           * generate control messages if
>           * - receive time stamping in software requested
>           * - software time stamp available and wanted
>           * - hardware time stamps available and wanted
>           */
>          if (sock_flag(sk, SOCK_RCVTSTAMP) ||
>              (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
>              (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
>              (hwtstamps->hwtstamp &&
>               (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
>                  __sock_recv_timestamp(msg, sk, skb);
> 
> Otherwise applications could get error messages queued, and
> epoll/poll/select would unexpectedly behave differently.

I just tried the following diff to remove setsockopt from txtimestamp.c and run 
"./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue 
with only cmsg flag. I did a printk in __skb_tstamp_tx to ensure the
sk->sk_tsflags is empty also.

diff --git i/tools/testing/selftests/net/txtimestamp.c 
w/tools/testing/selftests/net/txtimestamp.c
index dae91eb97d69..5d9d2773b076 100644
--- i/tools/testing/selftests/net/txtimestamp.c
+++ w/tools/testing/selftests/net/txtimestamp.c
@@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int 
payload_len)
  	for (cm = CMSG_FIRSTHDR(msg);
  	     cm && cm->cmsg_len;
  	     cm = CMSG_NXTHDR(msg, cm)) {
+		printf("cm->cmsg_level %d cm->cmsg_type %d\n",
+		       cm->cmsg_level, cm->cmsg_type);
  		if (cm->cmsg_level == SOL_SOCKET &&
  		    cm->cmsg_type == SCM_TIMESTAMPING) {
  			tss = (void *) CMSG_DATA(cm);
@@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int 
payload_len)
  	if (batch > 1) {
  		fprintf(stderr, "batched %d timestamps\n", batch);
  	} else if (!batch) {
-		fprintf(stderr, "Failed to report timestamps\n");
+		fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len);
  		test_failed = true;
  	}
  }
@@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt)
  	if (cfg_loop_nodata)
  		sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY;

+	(void)sock_opt;
+/*
  	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
  		       (char *) &sock_opt, sizeof(sock_opt)))
  		error(1, 0, "setsockopt timestamping");
+*/

  	for (i = 0; i < cfg_num_pkts; i++) {
  		memset(&msg, 0, sizeof(msg));
> 
>> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
>> features including cmsg mode. But it will not be used in bpf mode.
> 
> For simplicity, the two uses of the API are best kept identical. If
> there is a technical reason why BPF has to diverge from established
> behavior, this needs to be explicitly called out in the commit
> message.

SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be 
passed directly to the bpf if needed without extra cost. The same probably goes 
for SOF_TIMESTAMPING_OPT_PKTINFO. SOF_TIMESTAMPING_SOFTWARE does not seem to be 
useful either. I think only a subset of SOF_* will be supported, probably only 
the TX_* and RX_* ones.

> 
> Also, if you want to extend the API for BPF in the future, good to
> call this out now and ideally extensions will apply to both, to
> maintain a uniform API.
> 
> 
> * On extra measurement points, at sendmsg or tcp_write_xmit:
> 
> The first is interesting. For application timestamping, this was
> never needed, as the application can just call clock_gettime before
> sendmsg.
> 
> In general, additional measurement points are not only useful if the
> interval between is not constant. So far, we have seen no need for
> any additional points.
> 
> 
> * On skb state:
> 
>>> For now, is there thing we can explore to share in the skb_shared_info?
> 
> skb_shinfo space is at a premium. I don't think we can justify two
> extra fields just for this use case.
> 
>> My initial thought is just to reuse these fields in skb. It can work
>> without interfering one another.
> 
> I'm skeptical that two methods can work at the same time. If they are
> started at different times, their sk_tskey will be different, for one.

For the skb's tx_flags, Jason seems to be able to figure out by only using the 
new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in 
tx_flags.

I am also not very positive on the skb's tskey for now.

Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping 
the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I 
think that approach was stalled because the tx_flags could be changed by the 
netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the 
skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed 
until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone 
in the tcp stack will still break things if the hwtstamps is reused for other 
purpose?

> 
> There may be workarounds. Maybe BPF can store its state in some BPF
> specific field, indeed. Or perhaps it can store per-sk shadow state
> that resolves the conflict. For instance, the offset between sk_tskey
> and bpf_tskey.

I have also been proposing to explore other way for the key since bpf has direct 
access to the skb (also the sk, bpf prog can store data in the sk).

The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack 
comes back, it can also learn the ack seq no. Does it help? It will be harder to 
use because it probably needs to store this info in the bpf map (or in the bpf 
sk storage). However, if it needs to learn the timestamp at the 
tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored 
somewhere also. Either in a bpf map or in a bpf sk storage.

SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. 
it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use 
SOF_TIMESTAMPING_OPT_ID.
Jason Xing Oct. 22, 2024, 2:30 a.m. UTC | #26
On Tue, Oct 22, 2024 at 8:53 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/20/24 2:51 PM, Willem de Bruijn wrote:
> > Jason Xing wrote:
> >> From: Jason Xing <kernelxing@tencent.com>
> >>
> >> Willem suggested that we use a static key to control. The advantage
> >> is that we will not affect the existing applications at all if we
> >> don't load BPF program.
> >>
> >> In this patch, except the static key, I also add one logic that is
> >> used to test if the socket has enabled its tsflags in order to
> >> support bpf logic to allow both cases to happen at the same time.
> >> Or else, the skb carring related timestamp flag doesn't know which
> >> way of printing is desirable.
> >>
> >> One thing important is this patch allows print from both applications
> >> and bpf program at the same time. Now we have three kinds of print:
> >> 1) only BPF program prints
> >> 2) only application program prints
> >> 3) both can print without side effect
> >>
> >> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >
> > Getting back to this thread. It is long, instead of responding to
> > multiple messages, let me combine them in a single response.
> >
> >
> > * On future extensions:
> >
> > +1 that the UDP case, and datagrams more broadly, must have a clear
> > development path, before we can merge TCP.
> >
> > Similarly, hardware timestamps need not be supported from the start,
> > but must clearly be supportable.
> >
> >
> > * On queueing packets to userspace:
> >
> >>> the current behavior is to just queue to the sk_error_queue as long
> >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> >>> is regardless of the sk_tsflags. "
> >
> >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> >> read the skb from the errqueue but are not able to parse the
> >> timestamps
> >
> > Before queuing a packet to userspace on the error queue, the relevant
> > reporting flag is always tested. sock_recv_timestamp has:
> >
> >          /*
> >           * generate control messages if
> >           * - receive time stamping in software requested
> >           * - software time stamp available and wanted
> >           * - hardware time stamps available and wanted
> >           */
> >          if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> >              (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> >              (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> >              (hwtstamps->hwtstamp &&
> >               (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> >                  __sock_recv_timestamp(msg, sk, skb);
> >
> > Otherwise applications could get error messages queued, and
> > epoll/poll/select would unexpectedly behave differently.
>
> I just tried the following diff to remove setsockopt from txtimestamp.c and run
> "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue
> with only cmsg flag. I did a printk in __skb_tstamp_tx to ensure the
> sk->sk_tsflags is empty also.
>
> diff --git i/tools/testing/selftests/net/txtimestamp.c
> w/tools/testing/selftests/net/txtimestamp.c
> index dae91eb97d69..5d9d2773b076 100644
> --- i/tools/testing/selftests/net/txtimestamp.c
> +++ w/tools/testing/selftests/net/txtimestamp.c
> @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int
> payload_len)
>         for (cm = CMSG_FIRSTHDR(msg);
>              cm && cm->cmsg_len;
>              cm = CMSG_NXTHDR(msg, cm)) {
> +               printf("cm->cmsg_level %d cm->cmsg_type %d\n",
> +                      cm->cmsg_level, cm->cmsg_type);
>                 if (cm->cmsg_level == SOL_SOCKET &&
>                     cm->cmsg_type == SCM_TIMESTAMPING) {
>                         tss = (void *) CMSG_DATA(cm);
> @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int
> payload_len)
>         if (batch > 1) {
>                 fprintf(stderr, "batched %d timestamps\n", batch);
>         } else if (!batch) {
> -               fprintf(stderr, "Failed to report timestamps\n");
> +               fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len);
>                 test_failed = true;
>         }
>   }
> @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt)
>         if (cfg_loop_nodata)
>                 sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY;
>
> +       (void)sock_opt;
> +/*
>         if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
>                        (char *) &sock_opt, sizeof(sock_opt)))
>                 error(1, 0, "setsockopt timestamping");
> +*/
>
>         for (i = 0; i < cfg_num_pkts; i++) {
>                 memset(&msg, 0, sizeof(msg));
> >
> >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> >> features including cmsg mode. But it will not be used in bpf mode.
> >
> > For simplicity, the two uses of the API are best kept identical. If
> > there is a technical reason why BPF has to diverge from established
> > behavior, this needs to be explicitly called out in the commit
> > message.
>
> SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be
> passed directly to the bpf if needed without extra cost. The same probably goes
> for SOF_TIMESTAMPING_OPT_PKTINFO.

Right, they will not be supported.

> SOF_TIMESTAMPING_SOFTWARE does not seem to be
> useful either. I think only a subset of SOF_* will be supported, probably only

I had a discussion with Willem on this point yesterday. If I
understand what Willem was thinking correctly, he doesn't expect
users' behaviors to change too much.

As I said previously, I have no strong preference. Whether keeping
this report flag or not doesn't affect the core logic for BPF
extension.

> the TX_* and RX_* ones.
>
> >
> > Also, if you want to extend the API for BPF in the future, good to
> > call this out now and ideally extensions will apply to both, to
> > maintain a uniform API.
> >
> >
> > * On extra measurement points, at sendmsg or tcp_write_xmit:
> >
> > The first is interesting. For application timestamping, this was
> > never needed, as the application can just call clock_gettime before
> > sendmsg.
> >
> > In general, additional measurement points are not only useful if the
> > interval between is not constant. So far, we have seen no need for
> > any additional points.
> >
> >
> > * On skb state:
> >
> >>> For now, is there thing we can explore to share in the skb_shared_info?
> >
> > skb_shinfo space is at a premium. I don't think we can justify two
> > extra fields just for this use case.
> >
> >> My initial thought is just to reuse these fields in skb. It can work
> >> without interfering one another.
> >
> > I'm skeptical that two methods can work at the same time. If they are
> > started at different times, their sk_tskey will be different, for one.
>
> For the skb's tx_flags, Jason seems to be able to figure out by only using the
> new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in
> tx_flags.

Let me try, then we'll see if it works.

>
> I am also not very positive on the skb's tskey for now.

For TCP, the final output of tskey that is reflected to users is the
result of this calculation "shinfo->tskey - $KEY". $KEY is the base
which could be either sk->sk_tskey or sk->sk_tskey_bpf. They are
initialized at different points.

You can see the calculation in __skb_complete_tx_timestamp():
serr->ee.ee_data = skb_shinfo(skb)->tskey;
serr->ee.ee_data -= atomic_read(&sk->sk_tskey);

With that said, we will keep two different $KEY to let each feature
(bpf SO_TIMESTAMPING or application SO_TIMESTAMPING) work
respectively, which also means, we probably will see two different
tskeys when two methods work parallely. It's fine because as long as
we can make sure the final tskeys are consistent in each feature.
tskey is used to identify which sendmsg() the skb should belong to.

It also works for UDP proto.

>
> Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping
> the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I
> think that approach was stalled because the tx_flags could be changed by the
> netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the
> skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed
> until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone
> in the tcp stack will still break things if the hwtstamps is reused for other
> purpose?
>
> >
> > There may be workarounds. Maybe BPF can store its state in some BPF
> > specific field, indeed. Or perhaps it can store per-sk shadow state
> > that resolves the conflict. For instance, the offset between sk_tskey
> > and bpf_tskey.
>
> I have also been proposing to explore other way for the key since bpf has direct
> access to the skb (also the sk, bpf prog can store data in the sk).
>
> The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack
> comes back, it can also learn the ack seq no. Does it help? It will be harder to
> use because it probably needs to store this info in the bpf map (or in the bpf
> sk storage). However, if it needs to learn the timestamp at the
> tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored
> somewhere also. Either in a bpf map or in a bpf sk storage.

Thanks for the idea. But please see the above comment, we could keep
the logic as simple as it is :)

>
> SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g.
> it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use
> SOF_TIMESTAMPING_OPT_ID.

Interesting.

Thanks,
Jason
Willem de Bruijn Oct. 23, 2024, 12:17 a.m. UTC | #27
Martin KaFai Lau wrote:
> On 10/20/24 2:51 PM, Willem de Bruijn wrote:
> > Jason Xing wrote:
> >> From: Jason Xing <kernelxing@tencent.com>
> >>
> >> Willem suggested that we use a static key to control. The advantage
> >> is that we will not affect the existing applications at all if we
> >> don't load BPF program.
> >>
> >> In this patch, except the static key, I also add one logic that is
> >> used to test if the socket has enabled its tsflags in order to
> >> support bpf logic to allow both cases to happen at the same time.
> >> Or else, the skb carring related timestamp flag doesn't know which
> >> way of printing is desirable.
> >>
> >> One thing important is this patch allows print from both applications
> >> and bpf program at the same time. Now we have three kinds of print:
> >> 1) only BPF program prints
> >> 2) only application program prints
> >> 3) both can print without side effect
> >>
> >> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > 
> > Getting back to this thread. It is long, instead of responding to
> > multiple messages, let me combine them in a single response.
> > 
> > 
> > * On future extensions:
> > 
> > +1 that the UDP case, and datagrams more broadly, must have a clear
> > development path, before we can merge TCP.
> > 
> > Similarly, hardware timestamps need not be supported from the start,
> > but must clearly be supportable.
> > 
> > 
> > * On queueing packets to userspace:
> > 
> >>> the current behavior is to just queue to the sk_error_queue as long
> >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> >>> is regardless of the sk_tsflags. "
> > 
> >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> >> read the skb from the errqueue but are not able to parse the
> >> timestamps
> > 
> > Before queuing a packet to userspace on the error queue, the relevant
> > reporting flag is always tested. sock_recv_timestamp has:
> > 
> >          /*
> >           * generate control messages if
> >           * - receive time stamping in software requested
> >           * - software time stamp available and wanted
> >           * - hardware time stamps available and wanted
> >           */
> >          if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> >              (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> >              (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> >              (hwtstamps->hwtstamp &&
> >               (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> >                  __sock_recv_timestamp(msg, sk, skb);
> > 
> > Otherwise applications could get error messages queued, and
> > epoll/poll/select would unexpectedly behave differently.
> 
> I just tried the following diff to remove setsockopt from txtimestamp.c and run 
> "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue 
> with only cmsg flag.

That it surprising and against the API intent as I understand it.
Let me reproduce and take a closer look.

> I did a printk in __skb_tstamp_tx to ensure the
> sk->sk_tsflags is empty also.
> 
> diff --git i/tools/testing/selftests/net/txtimestamp.c 
> w/tools/testing/selftests/net/txtimestamp.c
> index dae91eb97d69..5d9d2773b076 100644
> --- i/tools/testing/selftests/net/txtimestamp.c
> +++ w/tools/testing/selftests/net/txtimestamp.c
> @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int 
> payload_len)
>   	for (cm = CMSG_FIRSTHDR(msg);
>   	     cm && cm->cmsg_len;
>   	     cm = CMSG_NXTHDR(msg, cm)) {
> +		printf("cm->cmsg_level %d cm->cmsg_type %d\n",
> +		       cm->cmsg_level, cm->cmsg_type);
>   		if (cm->cmsg_level == SOL_SOCKET &&
>   		    cm->cmsg_type == SCM_TIMESTAMPING) {
>   			tss = (void *) CMSG_DATA(cm);
> @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int 
> payload_len)
>   	if (batch > 1) {
>   		fprintf(stderr, "batched %d timestamps\n", batch);
>   	} else if (!batch) {
> -		fprintf(stderr, "Failed to report timestamps\n");
> +		fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len);
>   		test_failed = true;
>   	}
>   }
> @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt)
>   	if (cfg_loop_nodata)
>   		sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY;
> 
> +	(void)sock_opt;
> +/*
>   	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
>   		       (char *) &sock_opt, sizeof(sock_opt)))
>   		error(1, 0, "setsockopt timestamping");
> +*/
> 
>   	for (i = 0; i < cfg_num_pkts; i++) {
>   		memset(&msg, 0, sizeof(msg));
> > 
> >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> >> features including cmsg mode. But it will not be used in bpf mode.
> > 
> > For simplicity, the two uses of the API are best kept identical. If
> > there is a technical reason why BPF has to diverge from established
> > behavior, this needs to be explicitly called out in the commit
> > message.
> 
> SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be 
> passed directly to the bpf if needed without extra cost. The same probably goes 
> for SOF_TIMESTAMPING_OPT_PKTINFO. SOF_TIMESTAMPING_SOFTWARE does not seem to be 
> useful either. I think only a subset of SOF_* will be supported, probably only 
> the TX_* and RX_* ones.
> 
> > 
> > Also, if you want to extend the API for BPF in the future, good to
> > call this out now and ideally extensions will apply to both, to
> > maintain a uniform API.
> > 
> > 
> > * On extra measurement points, at sendmsg or tcp_write_xmit:
> > 
> > The first is interesting. For application timestamping, this was
> > never needed, as the application can just call clock_gettime before
> > sendmsg.
> > 
> > In general, additional measurement points are not only useful if the
> > interval between is not constant. So far, we have seen no need for
> > any additional points.
> > 
> > 
> > * On skb state:
> > 
> >>> For now, is there thing we can explore to share in the skb_shared_info?
> > 
> > skb_shinfo space is at a premium. I don't think we can justify two
> > extra fields just for this use case.
> > 
> >> My initial thought is just to reuse these fields in skb. It can work
> >> without interfering one another.
> > 
> > I'm skeptical that two methods can work at the same time. If they are
> > started at different times, their sk_tskey will be different, for one.
> 
> For the skb's tx_flags, Jason seems to be able to figure out by only using the 
> new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in 
> tx_flags.
> 
> I am also not very positive on the skb's tskey for now.
> 
> Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping 
> the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I 
> think that approach was stalled because the tx_flags could be changed by the 
> netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the 
> skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed 
> until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone 
> in the tcp stack will still break things if the hwtstamps is reused for other 
> purpose?

True. I think on Tx hwtstamps is only used on the path from the driver
tx completion handler to when it calls skb_tstamp_tx.

It does not even really have to be an skb field. The first driver
cscope happens to point me to indeed just allocates it on the stack:
tsnep_tx_poll.

> > 
> > There may be workarounds. Maybe BPF can store its state in some BPF
> > specific field, indeed. Or perhaps it can store per-sk shadow state
> > that resolves the conflict. For instance, the offset between sk_tskey
> > and bpf_tskey.
> 
> I have also been proposing to explore other way for the key since bpf has direct 
> access to the skb (also the sk, bpf prog can store data in the sk).
> 
> The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack 
> comes back, it can also learn the ack seq no. Does it help? It will be harder to 
> use because it probably needs to store this info in the bpf map (or in the bpf 
> sk storage). However, if it needs to learn the timestamp at the 
> tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored 
> somewhere also. Either in a bpf map or in a bpf sk storage.
> 
> SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. 
> it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use 
> SOF_TIMESTAMPING_OPT_ID.
Willem de Bruijn Oct. 23, 2024, 2:31 a.m. UTC | #28
Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
> > On 10/20/24 2:51 PM, Willem de Bruijn wrote:
> > > Jason Xing wrote:
> > >> From: Jason Xing <kernelxing@tencent.com>
> > >>
> > >> Willem suggested that we use a static key to control. The advantage
> > >> is that we will not affect the existing applications at all if we
> > >> don't load BPF program.
> > >>
> > >> In this patch, except the static key, I also add one logic that is
> > >> used to test if the socket has enabled its tsflags in order to
> > >> support bpf logic to allow both cases to happen at the same time.
> > >> Or else, the skb carring related timestamp flag doesn't know which
> > >> way of printing is desirable.
> > >>
> > >> One thing important is this patch allows print from both applications
> > >> and bpf program at the same time. Now we have three kinds of print:
> > >> 1) only BPF program prints
> > >> 2) only application program prints
> > >> 3) both can print without side effect
> > >>
> > >> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > 
> > > Getting back to this thread. It is long, instead of responding to
> > > multiple messages, let me combine them in a single response.
> > > 
> > > 
> > > * On future extensions:
> > > 
> > > +1 that the UDP case, and datagrams more broadly, must have a clear
> > > development path, before we can merge TCP.
> > > 
> > > Similarly, hardware timestamps need not be supported from the start,
> > > but must clearly be supportable.
> > > 
> > > 
> > > * On queueing packets to userspace:
> > > 
> > >>> the current behavior is to just queue to the sk_error_queue as long
> > >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > >>> is regardless of the sk_tsflags. "
> > > 
> > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > >> read the skb from the errqueue but are not able to parse the
> > >> timestamps
> > > 
> > > Before queuing a packet to userspace on the error queue, the relevant
> > > reporting flag is always tested. sock_recv_timestamp has:
> > > 
> > >          /*
> > >           * generate control messages if
> > >           * - receive time stamping in software requested
> > >           * - software time stamp available and wanted
> > >           * - hardware time stamps available and wanted
> > >           */
> > >          if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> > >              (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> > >              (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> > >              (hwtstamps->hwtstamp &&
> > >               (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> > >                  __sock_recv_timestamp(msg, sk, skb);
> > > 
> > > Otherwise applications could get error messages queued, and
> > > epoll/poll/select would unexpectedly behave differently.
> > 
> > I just tried the following diff to remove setsockopt from txtimestamp.c and run 
> > "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue 
> > with only cmsg flag.
> 
> That it surprising and against the API intent as I understand it.
> Let me reproduce and take a closer look.

Interesting. I guess my interpretation was wrong.

The reporting flags prevent reporting of the timestamp, but not
queuing of the skb on the error queue. Even if the only purpose is to
report a timestamp.

It goes back until well before all the API extensions. At least v3.6.

It still does suppress the timestamp itself if the relevant reporting
flag, SOF_TIMESTAMPING_SOFTWARE or SOF_TIMESTAMPING_RAW_HARDWARE, is
not set. So BPF should really still match that, I guess.
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 66ecd78f1dfe..b7c51b95c92d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2889,6 +2889,7 @@  static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
 void sock_def_readable(struct sock *sk);
 
 int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
+DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
 void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
 int sock_get_timestamping(struct so_timestamping *timestamping,
 			  sockptr_t optval, unsigned int optlen);
diff --git a/net/core/filter.c b/net/core/filter.c
index 996426095bd9..08135f538c99 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5204,6 +5204,8 @@  static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
+
 static int bpf_sock_set_timestamping(struct sock *sk,
 				     struct so_timestamping *timestamping)
 {
@@ -5217,6 +5219,7 @@  static int bpf_sock_set_timestamping(struct sock *sk,
 		return -EINVAL;
 
 	WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
+	static_branch_enable(&bpf_tstamp_control);
 
 	return 0;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f36eb9daa31a..d0f912f1ff7b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5540,6 +5540,29 @@  void skb_complete_tx_timestamp(struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
 
+static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
+{
+	u32 testflag;
+
+	switch (tstype) {
+	case SCM_TSTAMP_SCHED:
+		testflag = SOF_TIMESTAMPING_TX_SCHED;
+		break;
+	case SCM_TSTAMP_SND:
+		testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
+		break;
+	case SCM_TSTAMP_ACK:
+		testflag = SOF_TIMESTAMPING_TX_ACK;
+		break;
+	default:
+		return false;
+	}
+	if (tsflags & testflag)
+		return true;
+
+	return false;
+}
+
 static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
 				 const struct sk_buff *ack_skb,
 				 struct skb_shared_hwtstamps *hwtstamps,
@@ -5558,6 +5581,9 @@  static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
 	if (!skb_may_tx_timestamp(sk, tsonly))
 		return;
 
+	if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
+		return;
+
 	if (tsonly) {
 #ifdef CONFIG_INET
 		if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
@@ -5593,6 +5619,15 @@  static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
 	__skb_complete_tx_timestamp(skb, sk, tstype, opt_stats);
 }
 
+static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
+{
+	u32 tsflags;
+
+	tsflags = READ_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR]);
+	if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
+		return;
+}
+
 void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     const struct sk_buff *ack_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
@@ -5601,6 +5636,9 @@  void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!sk)
 		return;
 
+	if (static_branch_unlikely(&bpf_tstamp_control))
+		bpf_skb_tstamp_tx_output(sk, tstype);
+
 	skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
 }
 EXPORT_SYMBOL_GPL(__skb_tstamp_tx);