diff mbox series

[net-next,v4,10/11] net-timestamp: export the tskey for TCP bpf extension

Message ID 20241207173803.90744-11-kerneljasonxing@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: horms@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 54 this patch: 54
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-12-08--03-00 (tests: 682)

Commit Message

Jason Xing Dec. 7, 2024, 5:38 p.m. UTC
From: Jason Xing <kernelxing@tencent.com>

For now, there are three phases where we are not able to fetch
the right seqno from the skops->skb_data, because:
1) in __dev_queue_xmit(), the skb->data doesn't point to the start
offset in tcp header.
2) in tcp_ack_tstamp(), the skb doesn't have the tcp header.

In the long run, we may add other trace points for bpf extension.
And the shinfo->tskey is always the same value for both bpf and
non-bpf cases. With that said, let's directly use shinfo->tskey
for TCP protocol.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/core/skbuff.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Martin KaFai Lau Dec. 13, 2024, 12:28 a.m. UTC | #1
On 12/7/24 9:38 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> For now, there are three phases where we are not able to fetch
> the right seqno from the skops->skb_data, because:
> 1) in __dev_queue_xmit(), the skb->data doesn't point to the start
> offset in tcp header.
> 2) in tcp_ack_tstamp(), the skb doesn't have the tcp header.
> 
> In the long run, we may add other trace points for bpf extension.
> And the shinfo->tskey is always the same value for both bpf and
> non-bpf cases. With that said, let's directly use shinfo->tskey
> for TCP protocol.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   net/core/skbuff.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7c59ef501c74..2e13643f791c 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5544,7 +5544,7 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
>   				int tstype)
>   {
>   	struct timespec64 tstamp;
> -	u32 args[2] = {0, 0};
> +	u32 args[3] = {0, 0, 0};
>   	int op;
>   
>   	if (!sk)
> @@ -5569,7 +5569,10 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
>   		return;
>   	}
>   
> -	bpf_skops_tx_timestamping(sk, skb, op, 2, args);
> +	if (sk_is_tcp(sk))
> +		args[2] = skb_shinfo(skb)->tskey;

Instead of only passing one info "skb_shinfo(skb)->tskey" of a skb, pass the 
whole skb ptr to the bpf prog. Take a look at bpf_skops_init_skb. Lets start 
with end_offset = 0 for now so that the bpf prog won't use it to read the 
skb->data. It can be revisited later.

	bpf_skops_init_skb(&sock_ops, skb, 0);

The bpf prog can use bpf_cast_to_kern_ctx() and bpf_core_cast() to get to the 
skb_shinfo(skb). Take a look at the md_skb example in type_cast.c.

Then it needs to add a bpf_sock->op check to the existing 
bpf_sock_ops_{load,store}_hdr_opt() helpers to ensure these helpers can only be 
used by the BPF_SOCK_OPS_PARSE_HDR_OPT_CB, BPF_SOCK_OPS_HDR_OPT_LEN_CB, and 
BPF_SOCK_OPS_WRITE_HDR_OPT_CB callback.

btw, how is the ack_skb used for the SCM_TSTAMP_ACK by the user space now?
Jason Xing Dec. 13, 2024, 3:44 p.m. UTC | #2
On Fri, Dec 13, 2024 at 8:28 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/7/24 9:38 AM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > For now, there are three phases where we are not able to fetch
> > the right seqno from the skops->skb_data, because:
> > 1) in __dev_queue_xmit(), the skb->data doesn't point to the start
> > offset in tcp header.
> > 2) in tcp_ack_tstamp(), the skb doesn't have the tcp header.
> >
> > In the long run, we may add other trace points for bpf extension.
> > And the shinfo->tskey is always the same value for both bpf and
> > non-bpf cases. With that said, let's directly use shinfo->tskey
> > for TCP protocol.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   net/core/skbuff.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 7c59ef501c74..2e13643f791c 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5544,7 +5544,7 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
> >                               int tstype)
> >   {
> >       struct timespec64 tstamp;
> > -     u32 args[2] = {0, 0};
> > +     u32 args[3] = {0, 0, 0};
> >       int op;
> >
> >       if (!sk)
> > @@ -5569,7 +5569,10 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
> >               return;
> >       }
> >
> > -     bpf_skops_tx_timestamping(sk, skb, op, 2, args);
> > +     if (sk_is_tcp(sk))
> > +             args[2] = skb_shinfo(skb)->tskey;
>
> Instead of only passing one info "skb_shinfo(skb)->tskey" of a skb, pass the
> whole skb ptr to the bpf prog. Take a look at bpf_skops_init_skb. Lets start
> with end_offset = 0 for now so that the bpf prog won't use it to read the
> skb->data. It can be revisited later.
>
>         bpf_skops_init_skb(&sock_ops, skb, 0);
>
> The bpf prog can use bpf_cast_to_kern_ctx() and bpf_core_cast() to get to the
> skb_shinfo(skb). Take a look at the md_skb example in type_cast.c.

Sorry, I didn't give it much thought on getting to the shinfo. That's
why I quickly gave up using bpf_skops_init_skb() after I noticed the
seq of skb is always zero :(

I will test it tomorrow. Thanks.

>
> Then it needs to add a bpf_sock->op check to the existing
> bpf_sock_ops_{load,store}_hdr_opt() helpers to ensure these helpers can only be
> used by the BPF_SOCK_OPS_PARSE_HDR_OPT_CB, BPF_SOCK_OPS_HDR_OPT_LEN_CB, and
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB callback.

Forgive me. I cannot see how the bpf_sock_ops_load_hdr_opt helper has
something to do with the current thread? Could you enlighten me?

>
> btw, how is the ack_skb used for the SCM_TSTAMP_ACK by the user space now?

To be honest, I hardly use the ack_skb[1] under this circumstance... I
think if someone offers a suggestion to use it, then we can support
it?

[1]
commit e7ed11ee945438b737e2ae2370e35591e16ec371
Author: Yousuk Seung <ysseung@google.com>
Date:   Wed Jan 20 12:41:55 2021 -0800

    tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS

    This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
    the time-to-live or hop limit of the latest incoming packet with
    SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
    the sequence when incoming packets are aggregated. Exporting the
    time-to-live or hop limit value of incoming packets helps to estimate
    the hop count of the path of the flow that may change over time.
Martin KaFai Lau Dec. 13, 2024, 11:55 p.m. UTC | #3
On 12/13/24 7:44 AM, Jason Xing wrote:
>>> @@ -5569,7 +5569,10 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
>>>                return;
>>>        }
>>>
>>> -     bpf_skops_tx_timestamping(sk, skb, op, 2, args);
>>> +     if (sk_is_tcp(sk))
>>> +             args[2] = skb_shinfo(skb)->tskey;
>> Instead of only passing one info "skb_shinfo(skb)->tskey" of a skb, pass the
>> whole skb ptr to the bpf prog. Take a look at bpf_skops_init_skb. Lets start
>> with end_offset = 0 for now so that the bpf prog won't use it to read the
>> skb->data. It can be revisited later.
>>
>>          bpf_skops_init_skb(&sock_ops, skb, 0);
>>
>> The bpf prog can use bpf_cast_to_kern_ctx() and bpf_core_cast() to get to the
>> skb_shinfo(skb). Take a look at the md_skb example in type_cast.c.
> Sorry, I didn't give it much thought on getting to the shinfo. That's
> why I quickly gave up using bpf_skops_init_skb() after I noticed the
> seq of skb is always zero 
Jason Xing Dec. 14, 2024, 12:09 a.m. UTC | #4
On Sat, Dec 14, 2024 at 7:55 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/13/24 7:44 AM, Jason Xing wrote:
> >>> @@ -5569,7 +5569,10 @@ static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
> >>>                return;
> >>>        }
> >>>
> >>> -     bpf_skops_tx_timestamping(sk, skb, op, 2, args);
> >>> +     if (sk_is_tcp(sk))
> >>> +             args[2] = skb_shinfo(skb)->tskey;
> >> Instead of only passing one info "skb_shinfo(skb)->tskey" of a skb, pass the
> >> whole skb ptr to the bpf prog. Take a look at bpf_skops_init_skb. Lets start
> >> with end_offset = 0 for now so that the bpf prog won't use it to read the
> >> skb->data. It can be revisited later.
> >>
> >>          bpf_skops_init_skb(&sock_ops, skb, 0);
> >>
> >> The bpf prog can use bpf_cast_to_kern_ctx() and bpf_core_cast() to get to the
> >> skb_shinfo(skb). Take a look at the md_skb example in type_cast.c.
> > Sorry, I didn't give it much thought on getting to the shinfo. That's
> > why I quickly gave up using bpf_skops_init_skb() after I noticed the
> > seq of skb is always zero 
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c59ef501c74..2e13643f791c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5544,7 +5544,7 @@  static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
 				int tstype)
 {
 	struct timespec64 tstamp;
-	u32 args[2] = {0, 0};
+	u32 args[3] = {0, 0, 0};
 	int op;
 
 	if (!sk)
@@ -5569,7 +5569,10 @@  static void __skb_tstamp_tx_bpf(struct sock *sk, struct sk_buff *skb,
 		return;
 	}
 
-	bpf_skops_tx_timestamping(sk, skb, op, 2, args);
+	if (sk_is_tcp(sk))
+		args[2] = skb_shinfo(skb)->tskey;
+
+	bpf_skops_tx_timestamping(sk, skb, op, 3, args);
 }
 
 static void skb_tstamp_tx_output(struct sk_buff *orig_skb,