[net-next,v4,02/11] net-timestamp: prepare for bpf prog use

Message ID	20241207173803.90744-3-kerneljasonxing@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13AC342A9B; Sat, 7 Dec 2024 17:38:25 +0000 (UTC) From: Jason Xing <kerneljasonxing@gmail.com> To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org, willemdebruijn.kernel@gmail.com, willemb@google.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com, jolsa@kernel.org Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Jason Xing <kernelxing@tencent.com> Subject: [PATCH net-next v4 02/11] net-timestamp: prepare for bpf prog use Date: Sun, 8 Dec 2024 01:37:54 +0800 Message-Id: <20241207173803.90744-3-kerneljasonxing@gmail.com> In-Reply-To: <20241207173803.90744-1-kerneljasonxing@gmail.com> References: <20241207173803.90744-1-kerneljasonxing@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	net-timestamp: bpf extension to equip applications transparently \| expand [net-next,v4,00/11] net-timestamp: bpf extension to equip applications transparently [net-next,v4,01/11] net-timestamp: add support for bpf_setsockopt() [net-next,v4,02/11] net-timestamp: prepare for bpf prog use [net-next,v4,03/11] net-timestamp: reorganize in skb_tstamp_tx_output() [net-next,v4,04/11] net-timestamp: support SCM_TSTAMP_SCHED for bpf extension [net-next,v4,05/11] net-timestamp: support SCM_TSTAMP_SND for bpf extension [net-next,v4,06/11] net-timestamp: support SCM_TSTAMP_ACK for bpf extension [net-next,v4,07/11] net-timestamp: support hwtstamp print for bpf extension [net-next,v4,08/11] net-timestamp: make TCP tx timestamp bpf extension work [net-next,v4,09/11] net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases [net-next,v4,10/11] net-timestamp: export the tskey for TCP bpf extension [net-next,v4,11/11] bpf: add simple bpf tests in the tx path for so_timstamping feature

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for net-next, async
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 15 this patch: 15
netdev/build_tools	success	Errors and warnings before: 0 (+0) this patch: 0 (+0)
netdev/cc_maintainers	warning	1 maintainers not CCed: horms@kernel.org
netdev/build_clang	success	Errors and warnings before: 28 this patch: 28
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 2597 this patch: 2597
netdev/checkpatch	warning	WARNING: line length of 90 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 13 this patch: 13
netdev/source_inline	success	Was 0 now: 0
netdev/contest	fail	net-next-2024-12-08--03-00 (tests: 682)

Jason Xing Dec. 7, 2024, 5:37 p.m. UTC

From: Jason Xing <kernelxing@tencent.com>

Later, I would introduce three points to report some information
to user space based on this.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/sock.h |  7 +++++++
 net/core/sock.c    | 15 +++++++++++++++
 2 files changed, 22 insertions(+)

Martin KaFai Lau Dec. 11, 2024, 2:02 a.m. UTC | #1

On 12/7/24 9:37 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Later, I would introduce three points to report some information
> to user space based on this.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>   include/net/sock.h |  7 +++++++
>   net/core/sock.c    | 15 +++++++++++++++
>   2 files changed, 22 insertions(+)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 0dd464ba9e46..f88a00108a2f 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
>   			  struct so_timestamping timestamping);
>   
>   void sock_enable_timestamps(struct sock *sk);
> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> +#else
> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> +{
> +}
> +#endif
>   void sock_no_linger(struct sock *sk);
>   void sock_set_keepalive(struct sock *sk);
>   void sock_set_priority(struct sock *sk, u32 priority);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 74729d20cd00..79cb5c74c76c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
>   	return 0;
>   }
>   
> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> +{
> +	struct bpf_sock_ops_kern sock_ops;
> +
> +	sock_owned_by_me(sk);

I don't think this can be assumed in the time stamping callback.

To remove this assumption for sockops, I believe it needs to stop the bpf prog 
from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and 
bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the 
"u8 op" in "struct bpf_sock_ops_kern *".

I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The 
same should go for reading from sk. Also, sockops prog assumes a fullsock sk is 
a tcp_sock which also won't work for the udp case. A quick thought is to do 
something similar to is_fullsock. May be repurpose the is_fullsock somehow or a 
new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids 
writing/reading the sk when is_fullsock is false.

This is a signal that the existing sockops interface has already seen better 
days. I hope not too many fixes like these are needed to get tcp/udp 
timestamping to work.

> +
> +	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> +	sock_ops.op = op;
> +	sock_ops.is_fullsock = 1;

I don't think we can assume it is always is_fullsock either.

> +	sock_ops.sk = sk;
> +	__cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);

Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is 
needed.

[ I will continue the rest of the set later. ]

> +}
> +#endif
> +
>   void sock_set_keepalive(struct sock *sk)
>   {
>   	lock_sock(sk);

Jason Xing Dec. 11, 2024, 9:17 a.m. UTC | #2

On Wed, Dec 11, 2024 at 10:02 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/7/24 9:37 AM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Later, I would introduce three points to report some information
> > to user space based on this.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >   include/net/sock.h |  7 +++++++
> >   net/core/sock.c    | 15 +++++++++++++++
> >   2 files changed, 22 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 0dd464ba9e46..f88a00108a2f 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >                         struct so_timestamping timestamping);
> >
> >   void sock_enable_timestamps(struct sock *sk);
> > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> > +#else
> > +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > +{
> > +}
> > +#endif
> >   void sock_no_linger(struct sock *sk);
> >   void sock_set_keepalive(struct sock *sk);
> >   void sock_set_priority(struct sock *sk, u32 priority);
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 74729d20cd00..79cb5c74c76c 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >       return 0;
> >   }
> >
> > +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> > +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> > +{
> > +     struct bpf_sock_ops_kern sock_ops;
> > +
> > +     sock_owned_by_me(sk);
>
> I don't think this can be assumed in the time stamping callback.

I'll remove this.

>
> To remove this assumption for sockops, I believe it needs to stop the bpf prog
> from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and
> bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the
> "u8 op" in "struct bpf_sock_ops_kern *".

Sorry, I don't follow. Could you rephrase your thoughts? Thanks.

>
> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
> a tcp_sock which also won't work for the udp case. A quick thought is to do
> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
> writing/reading the sk when is_fullsock is false.

Do you mean that if we introduce a new field, then bpf prog can
read/write the socket?

Reading the socket could be very helpful in the long run.

>
> This is a signal that the existing sockops interface has already seen better
> days. I hope not too many fixes like these are needed to get tcp/udp
> timestamping to work.
>
> > +
> > +     memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> > +     sock_ops.op = op;
> > +     sock_ops.is_fullsock = 1;
>
> I don't think we can assume it is always is_fullsock either.

Right, but for now, TCP seems to need this. I can remove this also.

>
> > +     sock_ops.sk = sk;
> > +     __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
>
> Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is
> needed.

If we use this helper, we will change when the udp bpf extension needs
to be supported.

>
> [ I will continue the rest of the set later. ]

Thanks a lot :)

>
> > +}
> > +#endif
> > +
> >   void sock_set_keepalive(struct sock *sk)
> >   {
> >       lock_sock(sk);
>

Martin KaFai Lau Dec. 13, 2024, 1:41 a.m. UTC | #3

On 12/11/24 1:17 AM, Jason Xing wrote:
> On Wed, Dec 11, 2024 at 10:02 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 12/7/24 9:37 AM, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> Later, I would introduce three points to report some information
>>> to user space based on this.
>>>
>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> ---
>>>    include/net/sock.h |  7 +++++++
>>>    net/core/sock.c    | 15 +++++++++++++++
>>>    2 files changed, 22 insertions(+)
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index 0dd464ba9e46..f88a00108a2f 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
>>>                          struct so_timestamping timestamping);
>>>
>>>    void sock_enable_timestamps(struct sock *sk);
>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
>>> +#else
>>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
>>> +{
>>> +}
>>> +#endif
>>>    void sock_no_linger(struct sock *sk);
>>>    void sock_set_keepalive(struct sock *sk);
>>>    void sock_set_priority(struct sock *sk, u32 priority);
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 74729d20cd00..79cb5c74c76c 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
>>>        return 0;
>>>    }
>>>
>>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
>>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
>>> +{
>>> +     struct bpf_sock_ops_kern sock_ops;
>>> +
>>> +     sock_owned_by_me(sk);
>>
>> I don't think this can be assumed in the time stamping callback.
> 
> I'll remove this.
> 
>>
>> To remove this assumption for sockops, I believe it needs to stop the bpf prog
>> from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and
>> bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the
>> "u8 op" in "struct bpf_sock_ops_kern *".
> 
> Sorry, I don't follow. Could you rephrase your thoughts? Thanks.

Take a look at bpf_sock_ops_setsockopt in filter.c. To change a sk, it needs to 
hold the sk_lock. If you drill down bpf_sock_ops_setsockopt, 
sock_owned_by_me(sk) is checked somewhere.

The sk_lock held assumption is true so far for the existing sockops callbacks.
The new timestamping sockops callback does not necessary have the sk_lock held, 
so it will break the bpf_sock_ops_setsockopt() assumption on the sk_lock.

> 
>>
>> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
>> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
>> a tcp_sock which also won't work for the udp case. A quick thought is to do
>> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
>> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
>> writing/reading the sk when is_fullsock is false.
> 
> Do you mean that if we introduce a new field, then bpf prog can
> read/write the socket?

The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the 
sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by 
bpf_probe_read...etc.

When adding udp timestamp callback later, it needs to stop reading the tcp_sock 
through skops from the udp callback for sure. Do take a look at 
SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp 
callback won't break here before moving forward.

> 
> Reading the socket could be very helpful in the long run.
> 
>>
>> This is a signal that the existing sockops interface has already seen better
>> days. I hope not too many fixes like these are needed to get tcp/udp
>> timestamping to work.
>>
>>> +
>>> +     memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
>>> +     sock_ops.op = op;
>>> +     sock_ops.is_fullsock = 1;
>>
>> I don't think we can assume it is always is_fullsock either.
> 
> Right, but for now, TCP seems to need this. I can remove this also.

I take this back. After reading the existing __skb_tstamp_tx, I think sk is 
always fullsock here.

> 
>>
>>> +     sock_ops.sk = sk;
>>> +     __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
>>
>> Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is
>> needed.
> 
> If we use this helper, we will change when the udp bpf extension needs
> to be supported.
> 
>>
>> [ I will continue the rest of the set later. ]
> 
> Thanks a lot :)
> 
>>
>>> +}
>>> +#endif
>>> +
>>>    void sock_set_keepalive(struct sock *sk)
>>>    {
>>>        lock_sock(sk);
>>

Jason Xing Dec. 13, 2024, 2:42 p.m. UTC | #4

On Fri, Dec 13, 2024 at 9:41 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/11/24 1:17 AM, Jason Xing wrote:
> > On Wed, Dec 11, 2024 at 10:02 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 12/7/24 9:37 AM, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@tencent.com>
> >>>
> >>> Later, I would introduce three points to report some information
> >>> to user space based on this.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >>> ---
> >>>    include/net/sock.h |  7 +++++++
> >>>    net/core/sock.c    | 15 +++++++++++++++
> >>>    2 files changed, 22 insertions(+)
> >>>
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index 0dd464ba9e46..f88a00108a2f 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>                          struct so_timestamping timestamping);
> >>>
> >>>    void sock_enable_timestamps(struct sock *sk);
> >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> >>> +#else
> >>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>> +{
> >>> +}
> >>> +#endif
> >>>    void sock_no_linger(struct sock *sk);
> >>>    void sock_set_keepalive(struct sock *sk);
> >>>    void sock_set_priority(struct sock *sk, u32 priority);
> >>> diff --git a/net/core/sock.c b/net/core/sock.c
> >>> index 74729d20cd00..79cb5c74c76c 100644
> >>> --- a/net/core/sock.c
> >>> +++ b/net/core/sock.c
> >>> @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>        return 0;
> >>>    }
> >>>
> >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>> +{
> >>> +     struct bpf_sock_ops_kern sock_ops;
> >>> +
> >>> +     sock_owned_by_me(sk);
> >>
> >> I don't think this can be assumed in the time stamping callback.
> >
> > I'll remove this.
> >
> >>
> >> To remove this assumption for sockops, I believe it needs to stop the bpf prog
> >> from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and
> >> bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the
> >> "u8 op" in "struct bpf_sock_ops_kern *".
> >
> > Sorry, I don't follow. Could you rephrase your thoughts? Thanks.
>
> Take a look at bpf_sock_ops_setsockopt in filter.c. To change a sk, it needs to
> hold the sk_lock. If you drill down bpf_sock_ops_setsockopt,
> sock_owned_by_me(sk) is checked somewhere.

Thanks, now I totally follow.

>
> The sk_lock held assumption is true so far for the existing sockops callbacks.
> The new timestamping sockops callback does not necessary have the sk_lock held,

Well, for TCP only, there are three new callbacks that I think own the
sk_lock already, say, the tcp_sendmsg() will call the lock_sock(). For
other types, like you said, maybe not.

> so it will break the bpf_sock_ops_setsockopt() assumption on the sk_lock.

Agreed, at least for the writer accessing the sk is not allowed actually.

>
> >
> >>
> >> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
> >> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
> >> a tcp_sock which also won't work for the udp case. A quick thought is to do
> >> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
> >> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
> >> writing/reading the sk when is_fullsock is false.
> >
> > Do you mean that if we introduce a new field, then bpf prog can
> > read/write the socket?
>
> The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the
> sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by
> bpf_probe_read...etc.
>
> When adding udp timestamp callback later, it needs to stop reading the tcp_sock
> through skops from the udp callback for sure. Do take a look at
> SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp
> callback won't break here before moving forward.

Agreed. Removing the "sock_ops.sk = sk;" is simple, but I still want
the bpf prog to be able to read some fields from the socket under
those new callbacks.

Let me figure out a feasible solution this weekend if everything goes well.

>
> >
> > Reading the socket could be very helpful in the long run.
> >
> >>
> >> This is a signal that the existing sockops interface has already seen better
> >> days. I hope not too many fixes like these are needed to get tcp/udp
> >> timestamping to work.
> >>
> >>> +
> >>> +     memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> >>> +     sock_ops.op = op;
> >>> +     sock_ops.is_fullsock = 1;
> >>
> >> I don't think we can assume it is always is_fullsock either.
> >
> > Right, but for now, TCP seems to need this. I can remove this also.
>
> I take this back. After reading the existing __skb_tstamp_tx, I think sk is
> always fullsock here.

Got it.

>
> >
> >>
> >>> +     sock_ops.sk = sk;
> >>> +     __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> >>
> >> Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is
> >> needed.
> >
> > If we use this helper, we will change when the udp bpf extension needs
> > to be supported.
> >
> >>
> >> [ I will continue the rest of the set later. ]
> >
> > Thanks a lot :)
> >
> >>
> >>> +}
> >>> +#endif
> >>> +
> >>>    void sock_set_keepalive(struct sock *sk)
> >>>    {
> >>>        lock_sock(sk);
> >>
>

Martin KaFai Lau Dec. 13, 2024, 10:26 p.m. UTC | #5

On 12/13/24 6:42 AM, Jason Xing wrote:
>>>> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
>>>> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
>>>> a tcp_sock which also won't work for the udp case. A quick thought is to do
>>>> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
>>>> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
>>>> writing/reading the sk when is_fullsock is false.

May be this message buried in the earlier reply or some piece was not clear, so 
worth to highlight here.

Take a look at how is_fullsock is used in SOCK_OPS_{GET,SET}_FIELD. I think a 
similar idea can be borrowed here.

>>>
>>> Do you mean that if we introduce a new field, then bpf prog can
>>> read/write the socket?
>>
>> The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the
>> sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by
>> bpf_probe_read...etc.
>>
>> When adding udp timestamp callback later, it needs to stop reading the tcp_sock
>> through skops from the udp callback for sure. Do take a look at
>> SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp
>> callback won't break here before moving forward.
> 
> Agreed. Removing the "sock_ops.sk = sk;" is simple, but I still want
> the bpf prog to be able to read some fields from the socket under
> those new callbacks.

No need to remove "sock_ops.sk = sk;". Try to borrow the is_fullsock idea.

Overall, the new timestamp callback breaks assumptions like, sk_lock is held and 
is_fullsock must be a tcp_sock. This needs to be audited. In particular, please 
check sock_ops_func_proto() for all accessible bpf helpers. Also check the 
sock_ops_is_valid_access() and sock_ops_convert_ctx_access() for directly 
accessible fields without the helpers. In particular, the BPF_WRITE (able) 
fields and the tcp_sock fields.

Jason Xing Dec. 13, 2024, 11:56 p.m. UTC | #6

On Sat, Dec 14, 2024 at 6:26 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/13/24 6:42 AM, Jason Xing wrote:
> >>>> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
> >>>> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
> >>>> a tcp_sock which also won't work for the udp case. A quick thought is to do
> >>>> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
> >>>> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
> >>>> writing/reading the sk when is_fullsock is false.
>
> May be this message buried in the earlier reply or some piece was not clear, so
> worth to highlight here.
>
> Take a look at how is_fullsock is used in SOCK_OPS_{GET,SET}_FIELD. I think a
> similar idea can be borrowed here.
>
> >>>
> >>> Do you mean that if we introduce a new field, then bpf prog can
> >>> read/write the socket?
> >>
> >> The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the
> >> sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by
> >> bpf_probe_read...etc.
> >>
> >> When adding udp timestamp callback later, it needs to stop reading the tcp_sock
> >> through skops from the udp callback for sure. Do take a look at
> >> SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp
> >> callback won't break here before moving forward.
> >
> > Agreed. Removing the "sock_ops.sk = sk;" is simple, but I still want
> > the bpf prog to be able to read some fields from the socket under
> > those new callbacks.
>
> No need to remove "sock_ops.sk = sk;". Try to borrow the is_fullsock idea.
>
> Overall, the new timestamp callback breaks assumptions like, sk_lock is held and
> is_fullsock must be a tcp_sock. This needs to be audited. In particular, please
> check sock_ops_func_proto() for all accessible bpf helpers. Also check the
> sock_ops_is_valid_access() and sock_ops_convert_ctx_access() for directly
> accessible fields without the helpers. In particular, the BPF_WRITE (able)
> fields and the tcp_sock fields.

Thanks for the valuable information. I will dig into them.

[net-next,v4,02/11] net-timestamp: prepare for bpf prog use

Checks

Commit Message

Comments

Patch