diff mbox series

[mptcp-next,v5,2/5] Squash to "bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock"

Message ID c43f5e1fe8da4db01749ede9d18af1a3a2134fd4.1740997925.git.tanggeliang@kylinos.cn (mailing list archive)
State Rejected, archived
Delegated to: Matthieu Baerts
Headers show
Series Squash to "Add mptcp_subflow bpf_iter support" | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf-normal__only_bpftest_all_ success Success! ✅
matttbe/KVM_Validation__btf-debug__only_bpftest_all_ success Success! ✅

Commit Message

Geliang Tang March 3, 2025, 10:33 a.m. UTC
From: Geliang Tang <tanggeliang@kylinos.cn>

Set msk->bpf_iter_task in bpf_mptcp_sock_from_sock() to allow
mptcp_subflow bpt_iter can be used in cgroup/getsockopt,
otherwise, the selftest in this set fails.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/bpf.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Mat Martineau March 4, 2025, 1:37 a.m. UTC | #1
On Mon, 3 Mar 2025, Geliang Tang wrote:

> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Set msk->bpf_iter_task in bpf_mptcp_sock_from_sock() to allow
> mptcp_subflow bpt_iter can be used in cgroup/getsockopt,
> otherwise, the selftest in this set fails.
>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> net/mptcp/bpf.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index be222fa5f308..aff92f458e7f 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -197,14 +197,22 @@ static struct bpf_struct_ops bpf_mptcp_sched_ops = {
>
> struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk)
> {
> +	struct mptcp_sock *msk;
> +
> 	if (unlikely(!sk || !sk_fullsock(sk)))
> 		return NULL;
>
> -	if (sk->sk_protocol == IPPROTO_MPTCP)
> -		return mptcp_sk(sk);
> +	if (sk->sk_protocol == IPPROTO_MPTCP) {
> +		msk = mptcp_sk(sk);
> +		mptcp_set_bpf_iter_task(msk);

Hi Geliang -

The important part of the bpf_iter_task approach is to only set the 
task_struct pointer when the msk lock is already held, and to clear it 
before the msk lock is released. When used this way, the check ensures 
that the iterator code is both *running with the socket locked* and *in 
the same context where the lock is held*.

The code above will set msk->bpf_iter_task even when the lock is not held, 
and then it is never cleared.

To set/clear as expected for get/setsockopt I think the task pointer would 
have to be set in places where the locks are acquired and released, like 
these:

https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1955
https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1846


- Mat

> +		return msk;
> +	}
>
> -	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
> -		return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> +	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) {
> +		msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> +		mptcp_set_bpf_iter_task(msk);
> +		return msk;
> +	}
>
> 	return NULL;
> }
> -- 
> 2.43.0
>
>
>
Geliang Tang March 7, 2025, 3:51 a.m. UTC | #2
Hi Mat,

Thanks for the review.

On Mon, 2025-03-03 at 17:37 -0800, Mat Martineau wrote:
> On Mon, 3 Mar 2025, Geliang Tang wrote:
> 
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Set msk->bpf_iter_task in bpf_mptcp_sock_from_sock() to allow
> > mptcp_subflow bpt_iter can be used in cgroup/getsockopt,
> > otherwise, the selftest in this set fails.
> > 
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/bpf.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index be222fa5f308..aff92f458e7f 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -197,14 +197,22 @@ static struct bpf_struct_ops
> > bpf_mptcp_sched_ops = {
> > 
> > struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk)
> > {
> > +	struct mptcp_sock *msk;
> > +
> > 	if (unlikely(!sk || !sk_fullsock(sk)))
> > 		return NULL;
> > 
> > -	if (sk->sk_protocol == IPPROTO_MPTCP)
> > -		return mptcp_sk(sk);
> > +	if (sk->sk_protocol == IPPROTO_MPTCP) {
> > +		msk = mptcp_sk(sk);
> > +		mptcp_set_bpf_iter_task(msk);
> 
> Hi Geliang -
> 
> The important part of the bpf_iter_task approach is to only set the 
> task_struct pointer when the msk lock is already held, and to clear
> it 
> before the msk lock is released. When used this way, the check
> ensures 
> that the iterator code is both *running with the socket locked* and
> *in 
> the same context where the lock is held*.

Got it, thanks for your explanation.

> 
> The code above will set msk->bpf_iter_task even when the lock is not
> held, 
> and then it is never cleared.
> 
> To set/clear as expected for get/setsockopt I think the task pointer
> would 
> have to be set in places where the locks are acquired and released,
> like 
> these:
> 
> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1955
> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1846
> 

The get/setsockopt scenario is a bit complicated. I think we can use
the bpf_iter_task check only for bpf schedulers, keep the
get/setsockopt scenario unchanged, and continue to use the
msk_owned_by_me(msk) check. What do you think?

In addition, I think it's better to add bpf_iter_task to struct
mptcp_sched_data instead in struct mptcp_sock, since it's more related
to mptcp scheduler. For specific usage, please see [1]. What do you
think of this?

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1739788598.git.tanggeliang@kylinos.cn/

> 
> - Mat
> 
> > +		return msk;
> > +	}
> > 
> > -	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
> > -		return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> > +	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) {
> > +		msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> > +		mptcp_set_bpf_iter_task(msk);
> > +		return msk;
> > +	}
> > 
> > 	return NULL;
> > }
> > -- 
> > 2.43.0
> > 
> > 
> >
Geliang Tang March 7, 2025, 3:54 a.m. UTC | #3
Hi Mat,

Thanks for the review.

On Mon, 2025-03-03 at 17:37 -0800, Mat Martineau wrote:
> On Mon, 3 Mar 2025, Geliang Tang wrote:
> 
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Set msk->bpf_iter_task in bpf_mptcp_sock_from_sock() to allow
> > mptcp_subflow bpt_iter can be used in cgroup/getsockopt,
> > otherwise, the selftest in this set fails.
> > 
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/bpf.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index be222fa5f308..aff92f458e7f 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -197,14 +197,22 @@ static struct bpf_struct_ops
> > bpf_mptcp_sched_ops = {
> > 
> > struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk)
> > {
> > +	struct mptcp_sock *msk;
> > +
> > 	if (unlikely(!sk || !sk_fullsock(sk)))
> > 		return NULL;
> > 
> > -	if (sk->sk_protocol == IPPROTO_MPTCP)
> > -		return mptcp_sk(sk);
> > +	if (sk->sk_protocol == IPPROTO_MPTCP) {
> > +		msk = mptcp_sk(sk);
> > +		mptcp_set_bpf_iter_task(msk);
> 
> Hi Geliang -
> 
> The important part of the bpf_iter_task approach is to only set the 
> task_struct pointer when the msk lock is already held, and to clear
> it 
> before the msk lock is released. When used this way, the check
> ensures 
> that the iterator code is both *running with the socket locked* and
> *in 
> the same context where the lock is held*.

Got it, thanks for your explanation.

> 
> The code above will set msk->bpf_iter_task even when the lock is not
> held, 
> and then it is never cleared.
> 
> To set/clear as expected for get/setsockopt I think the task pointer
> would 
> have to be set in places where the locks are acquired and released,
> like 
> these:
> 
> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1955
> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1846
> 

The get/setsockopt scenario is a bit complicated. I think we can use
the bpf_iter_task check only for bpf schedulers, keep the
get/setsockopt scenario unchanged, and continue to use the
msk_owned_by_me(msk) check. What do you think?

In addition, I think it's better to add bpf_iter_task to struct
mptcp_sched_data instead in struct mptcp_sock, since it's more related
to mptcp scheduler. For specific usage, please see [1]. What do you
think of this?

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1739788598.git.tanggeliang@kylinos.cn/

> 
> - Mat
> 
> > +		return msk;
> > +	}
> > 
> > -	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
> > -		return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> > +	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) {
> > +		msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> > +		mptcp_set_bpf_iter_task(msk);
> > +		return msk;
> > +	}
> > 
> > 	return NULL;
> > }
> > -- 
> > 2.43.0
> > 
> > 
> >
Matthieu Baerts March 7, 2025, 8:53 a.m. UTC | #4
Hi Geliang,

On 07/03/2025 04:54, Geliang Tang wrote:
> On Mon, 2025-03-03 at 17:37 -0800, Mat Martineau wrote:

(...)

>> The code above will set msk->bpf_iter_task even when the lock is not
>> held, 
>> and then it is never cleared.
>>
>> To set/clear as expected for get/setsockopt I think the task pointer
>> would 
>> have to be set in places where the locks are acquired and released,
>> like 
>> these:
>>
>> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1955
>> https://elixir.bootlin.com/linux/v6.14-rc5/source/kernel/bpf/cgroup.c#L1846
>>
> 
> The get/setsockopt scenario is a bit complicated. I think we can use
> the bpf_iter_task check only for bpf schedulers, keep the
> get/setsockopt scenario unchanged, and continue to use the
> msk_owned_by_me(msk) check. What do you think?

I think in any case, it doesn't hurt to keep msk_owned_by_me(msk) for
lockdep.

If we think it is difficult to have the bpf_iter for both CG
[gs]etsockopt and struct_ops, then we can also drop the patches for the
[gs]etsockopt.

Or, from a kfunc, is it easy to check if we are called from a struct_ops?

> In addition, I think it's better to add bpf_iter_task to struct
> mptcp_sched_data instead in struct mptcp_sock, since it's more related
> to mptcp scheduler. For specific usage, please see [1]. What do you
> think of this?

Please see my reply to your comment on this thread. In short, we can
always add new parameters later, when we need them, no?

https://lore.kernel.org/706a3051-8328-4321-8eea-71d4120ea996@kernel.org

> 
> Thanks,
> -Geliang
> 
> [1]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1739788598.git.tanggeliang@kylinos.cn/
> 
>>
>> - Mat
>>
>>> +		return msk;
>>> +	}
>>>
>>> -	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
>>> -		return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
>>> +	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) {
>>> +		msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
>>> +		mptcp_set_bpf_iter_task(msk);
>>> +		return msk;
>>> +	}
>>>
>>> 	return NULL;
>>> }
>>> -- 
>>> 2.43.0
>>>
>>>
>>>
> 
> 

Cheers,
Matt
diff mbox series

Patch

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index be222fa5f308..aff92f458e7f 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -197,14 +197,22 @@  static struct bpf_struct_ops bpf_mptcp_sched_ops = {
 
 struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk)
 {
+	struct mptcp_sock *msk;
+
 	if (unlikely(!sk || !sk_fullsock(sk)))
 		return NULL;
 
-	if (sk->sk_protocol == IPPROTO_MPTCP)
-		return mptcp_sk(sk);
+	if (sk->sk_protocol == IPPROTO_MPTCP) {
+		msk = mptcp_sk(sk);
+		mptcp_set_bpf_iter_task(msk);
+		return msk;
+	}
 
-	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
-		return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
+	if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) {
+		msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
+		mptcp_set_bpf_iter_task(msk);
+		return msk;
+	}
 
 	return NULL;
 }