[v4,bpf-next,2/4] bpf: Add bpf_sock_destroy kfunc

Message ID	20230323200633.3175753-3-aditi.ghag@isovalent.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@vger.kernel.org> From: Aditi Ghag <aditi.ghag@isovalent.com> To: bpf@vger.kernel.org Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com Subject: [PATCH v4 bpf-next 2/4] bpf: Add bpf_sock_destroy kfunc Date: Thu, 23 Mar 2023 20:06:31 +0000 Message-Id: <20230323200633.3175753-3-aditi.ghag@isovalent.com> In-Reply-To: <20230323200633.3175753-1-aditi.ghag@isovalent.com> References: <20230323200633.3175753-1-aditi.ghag@isovalent.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	bpf-nex: Add socket destroy capability \| expand [v4,bpf-next,0/5] bpf-nex: Add socket destroy capability [v4,bpf-next,1/4] bpf: Implement batching in UDP iterator [v4,bpf-next,2/4] bpf: Add bpf_sock_destroy kfunc [v4,bpf-next,3/4] bpf,tcp: Avoid taking fast sock lock in iterator [v4,bpf-next,4/4] selftests/bpf: Add tests for bpf_sock_destroy

Context	Check	Description
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32	success	Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-33	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-34	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35	success	Logs for test_verifier on x86_64 with llvm-16

Aditi Ghag March 23, 2023, 8:06 p.m. UTC

The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium to force
client sockets to reconnect when their remote load-balancing backends are
deleted. The other use case is on-the-fly policy enforcement where existing
socket connections prevented by policies need to be forcefully terminated.
The helper allows terminating sockets that may or may not be actively
sending traffic.

The helper is currently exposed to certain BPF iterators where users can
filter, and terminate selected sockets.  Additionally, the helper can only
be called from these BPF contexts that ensure socket locking in order to
allow synchronous execution of destroy helpers that also acquire socket
locks. The previous commit that batches UDP sockets during iteration
facilitated a synchronous invocation of the destroy helper from BPF context
by skipping taking socket locks in the destroy handler. TCP iterators
already supported batching.

The helper takes `sock_common` type argument, even though it expects, and
casts them to a `sock` pointer. This enables the verifier to allow the
sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
`sock` structs. As a comparison, BPF helpers enable this behavior with the
`ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
option available with the verifier logic that handles kfuncs where BTF
types are inferred. Furthermore, as `sock_common` only has a subset of
certain fields of `sock`, casting pointer to the latter type might not
always be safe. Hence, the BPF kfunc converts the argument to a full sock
before casting.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c    | 10 ++++++---
 net/ipv4/udp.c    |  6 ++++--
 3 files changed, 65 insertions(+), 5 deletions(-)

Martin KaFai Lau March 23, 2023, 11:58 p.m. UTC | #1

On 3/23/23 1:06 PM, Aditi Ghag wrote:
> The socket destroy kfunc is used to forcefully terminate sockets from
> certain BPF contexts. We plan to use the capability in Cilium to force
> client sockets to reconnect when their remote load-balancing backends are
> deleted. The other use case is on-the-fly policy enforcement where existing
> socket connections prevented by policies need to be forcefully terminated.
> The helper allows terminating sockets that may or may not be actively
> sending traffic.
> 
> The helper is currently exposed to certain BPF iterators where users can
> filter, and terminate selected sockets.  Additionally, the helper can only
> be called from these BPF contexts that ensure socket locking in order to
> allow synchronous execution of destroy helpers that also acquire socket
> locks. The previous commit that batches UDP sockets during iteration
> facilitated a synchronous invocation of the destroy helper from BPF context
> by skipping taking socket locks in the destroy handler. TCP iterators
> already supported batching.
> 
> The helper takes `sock_common` type argument, even though it expects, and
> casts them to a `sock` pointer. This enables the verifier to allow the
> sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
> `sock` structs. As a comparison, BPF helpers enable this behavior with the
> `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
> option available with the verifier logic that handles kfuncs where BTF
> types are inferred. Furthermore, as `sock_common` only has a subset of
> certain fields of `sock`, casting pointer to the latter type might not
> always be safe. Hence, the BPF kfunc converts the argument to a full sock
> before casting.
> 
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++

This patch has merge conflict: https://github.com/kernel-patches/bpf/pull/4811, 
so please rebase in the next respin.

I took a quick skim but haven't finished. Please hold off the respin and review 
can be continued on this revision.

Stanislav Fomichev March 24, 2023, 9:37 p.m. UTC | #2

On 03/23, Aditi Ghag wrote:
> The socket destroy kfunc is used to forcefully terminate sockets from
> certain BPF contexts. We plan to use the capability in Cilium to force
> client sockets to reconnect when their remote load-balancing backends are
> deleted. The other use case is on-the-fly policy enforcement where  
> existing
> socket connections prevented by policies need to be forcefully terminated.
> The helper allows terminating sockets that may or may not be actively
> sending traffic.

> The helper is currently exposed to certain BPF iterators where users can
> filter, and terminate selected sockets.  Additionally, the helper can only
> be called from these BPF contexts that ensure socket locking in order to
> allow synchronous execution of destroy helpers that also acquire socket
> locks. The previous commit that batches UDP sockets during iteration
> facilitated a synchronous invocation of the destroy helper from BPF  
> context
> by skipping taking socket locks in the destroy handler. TCP iterators
> already supported batching.

> The helper takes `sock_common` type argument, even though it expects, and
> casts them to a `sock` pointer. This enables the verifier to allow the
> sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
> `sock` structs. As a comparison, BPF helpers enable this behavior with the
> `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
> option available with the verifier logic that handles kfuncs where BTF
> types are inferred. Furthermore, as `sock_common` only has a subset of
> certain fields of `sock`, casting pointer to the latter type might not
> always be safe. Hence, the BPF kfunc converts the argument to a full sock
> before casting.

> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
>   net/ipv4/tcp.c    | 10 ++++++---
>   net/ipv4/udp.c    |  6 ++++--
>   3 files changed, 65 insertions(+), 5 deletions(-)

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1d6f165923bf..ba3e0dac119c 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)

>   	return func;
>   }
> +
> +/* Disables missing prototype warnings */
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in vmlinux BTF");
> +
> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error  
> code.
> + *
> + * The helper expects a non-NULL pointer to a socket. It invokes the
> + * protocol specific socket destroy handlers.
> + *
> + * The helper can only be called from BPF contexts that have acquired  
> the socket
> + * locks.
> + *
> + * Parameters:
> + * @sock: Pointer to socket to be destroyed
> + *
> + * Return:
> + * On error, may return EPROTONOSUPPORT, EINVAL.
> + * EPROTONOSUPPORT if protocol specific destroy handler is not  
> implemented.
> + * 0 otherwise
> + */
> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
> +{
> +	struct sock *sk = (struct sock *)sock;
> +
> +	if (!sk)
> +		return -EINVAL;
> +
> +	/* The locking semantics that allow for synchronous execution of the
> +	 * destroy handlers are only supported for TCP and UDP.
> +	 */
> +	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
> +		return -EOPNOTSUPP;

Copy-pasting from v3, let's discuss here.

Maybe make it more opt-in? (vs current "opt ipproto_raw out")

if (sk->sk_prot->diag_destroy != udp_abort &&
     sk->sk_prot->diag_destroy != tcp_abort)
             return -EOPNOTSUPP;

Is it more robust? Or does it look uglier? )
But maybe fine as is, I'm just thinking out loud..

> +
> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
> +}
> +
> +__diag_pop()
> +
> +BTF_SET8_START(sock_destroy_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_sock_destroy)
> +BTF_SET8_END(sock_destroy_kfunc_set)
> +
> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &sock_destroy_kfunc_set,
> +};
> +
> +static int init_subsystem(void)
> +{
> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,  
> &bpf_sock_destroy_kfunc_set);
> +}
> +late_initcall(init_subsystem);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 33f559f491c8..5df6231016e3 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err)
>   		return 0;
>   	}

> -	/* Don't race with userspace socket closes such as tcp_close. */
> -	lock_sock(sk);
> +	/* BPF context ensures sock locking. */
> +	if (!has_current_bpf_ctx())
> +		/* Don't race with userspace socket closes such as tcp_close. */
> +		lock_sock(sk);

>   	if (sk->sk_state == TCP_LISTEN) {
>   		tcp_set_state(sk, TCP_CLOSE);
> @@ -4701,9 +4703,11 @@ int tcp_abort(struct sock *sk, int err)
>   	}

>   	bh_unlock_sock(sk);
> +
>   	local_bh_enable();
>   	tcp_write_queue_purge(sk);
> -	release_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		release_sock(sk);
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(tcp_abort);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 58c620243e47..408836102e20 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);

>   int udp_abort(struct sock *sk, int err)
>   {
> -	lock_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		lock_sock(sk);

>   	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
>   	 * with close()
> @@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
>   	__udp_disconnect(sk, 0);

>   out:
> -	release_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		release_sock(sk);

>   	return 0;
>   }
> --
> 2.34.1

Aditi Ghag March 30, 2023, 2:42 p.m. UTC | #3

> On Mar 24, 2023, at 2:37 PM, Stanislav Fomichev <sdf@google.com> wrote:
> 
> On 03/23, Aditi Ghag wrote:
>> The socket destroy kfunc is used to forcefully terminate sockets from
>> certain BPF contexts. We plan to use the capability in Cilium to force
>> client sockets to reconnect when their remote load-balancing backends are
>> deleted. The other use case is on-the-fly policy enforcement where existing
>> socket connections prevented by policies need to be forcefully terminated.
>> The helper allows terminating sockets that may or may not be actively
>> sending traffic.
> 
>> The helper is currently exposed to certain BPF iterators where users can
>> filter, and terminate selected sockets.  Additionally, the helper can only
>> be called from these BPF contexts that ensure socket locking in order to
>> allow synchronous execution of destroy helpers that also acquire socket
>> locks. The previous commit that batches UDP sockets during iteration
>> facilitated a synchronous invocation of the destroy helper from BPF context
>> by skipping taking socket locks in the destroy handler. TCP iterators
>> already supported batching.
> 
>> The helper takes `sock_common` type argument, even though it expects, and
>> casts them to a `sock` pointer. This enables the verifier to allow the
>> sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
>> `sock` structs. As a comparison, BPF helpers enable this behavior with the
>> `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
>> option available with the verifier logic that handles kfuncs where BTF
>> types are inferred. Furthermore, as `sock_common` only has a subset of
>> certain fields of `sock`, casting pointer to the latter type might not
>> always be safe. Hence, the BPF kfunc converts the argument to a full sock
>> before casting.
> 
>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>> ---
>>  net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
>>  net/ipv4/tcp.c    | 10 ++++++---
>>  net/ipv4/udp.c    |  6 ++++--
>>  3 files changed, 65 insertions(+), 5 deletions(-)
> 
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1d6f165923bf..ba3e0dac119c 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
> 
>>  	return func;
>>  }
>> +
>> +/* Disables missing prototype warnings */
>> +__diag_push();
>> +__diag_ignore_all("-Wmissing-prototypes",
>> +		  "Global functions as their definitions will be in vmlinux BTF");
>> +
>> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
>> + *
>> + * The helper expects a non-NULL pointer to a socket. It invokes the
>> + * protocol specific socket destroy handlers.
>> + *
>> + * The helper can only be called from BPF contexts that have acquired the socket
>> + * locks.
>> + *
>> + * Parameters:
>> + * @sock: Pointer to socket to be destroyed
>> + *
>> + * Return:
>> + * On error, may return EPROTONOSUPPORT, EINVAL.
>> + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
>> + * 0 otherwise
>> + */
>> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
>> +{
>> +	struct sock *sk = (struct sock *)sock;
>> +
>> +	if (!sk)
>> +		return -EINVAL;
>> +
>> +	/* The locking semantics that allow for synchronous execution of the
>> +	 * destroy handlers are only supported for TCP and UDP.
>> +	 */
>> +	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
>> +		return -EOPNOTSUPP;
> 
> Copy-pasting from v3, let's discuss here.
> 
> Maybe make it more opt-in? (vs current "opt ipproto_raw out")
> 
> if (sk->sk_prot->diag_destroy != udp_abort &&
>    sk->sk_prot->diag_destroy != tcp_abort)
>            return -EOPNOTSUPP;
> 
> Is it more robust? Or does it look uglier? )
> But maybe fine as is, I'm just thinking out loud..

Do we expect the handler to be extended for more types? Probably not... So I'll leave it as is.

> 
>> +
>> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
>> +}
>> +
>> +__diag_pop()
>> +
>> +BTF_SET8_START(sock_destroy_kfunc_set)
>> +BTF_ID_FLAGS(func, bpf_sock_destroy)
>> +BTF_SET8_END(sock_destroy_kfunc_set)
>> +
>> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
>> +	.owner = THIS_MODULE,
>> +	.set   = &sock_destroy_kfunc_set,
>> +};
>> +
>> +static int init_subsystem(void)
>> +{
>> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
>> +}
>> +late_initcall(init_subsystem);
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 33f559f491c8..5df6231016e3 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err)
>>  		return 0;
>>  	}
> 
>> -	/* Don't race with userspace socket closes such as tcp_close. */
>> -	lock_sock(sk);
>> +	/* BPF context ensures sock locking. */
>> +	if (!has_current_bpf_ctx())
>> +		/* Don't race with userspace socket closes such as tcp_close. */
>> +		lock_sock(sk);
> 
>>  	if (sk->sk_state == TCP_LISTEN) {
>>  		tcp_set_state(sk, TCP_CLOSE);
>> @@ -4701,9 +4703,11 @@ int tcp_abort(struct sock *sk, int err)
>>  	}
> 
>>  	bh_unlock_sock(sk);
>> +
>>  	local_bh_enable();
>>  	tcp_write_queue_purge(sk);
>> -	release_sock(sk);
>> +	if (!has_current_bpf_ctx())
>> +		release_sock(sk);
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(tcp_abort);
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 58c620243e47..408836102e20 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);
> 
>>  int udp_abort(struct sock *sk, int err)
>>  {
>> -	lock_sock(sk);
>> +	if (!has_current_bpf_ctx())
>> +		lock_sock(sk);
> 
>>  	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
>>  	 * with close()
>> @@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
>>  	__udp_disconnect(sk, 0);
> 
>>  out:
>> -	release_sock(sk);
>> +	if (!has_current_bpf_ctx())
>> +		release_sock(sk);
> 
>>  	return 0;
>>  }
>> --
>> 2.34.1

Stanislav Fomichev March 30, 2023, 4:32 p.m. UTC | #4

On Thu, Mar 30, 2023 at 7:42 AM Aditi Ghag <aditi.ghag@isovalent.com> wrote:
>
>
>
> > On Mar 24, 2023, at 2:37 PM, Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 03/23, Aditi Ghag wrote:
> >> The socket destroy kfunc is used to forcefully terminate sockets from
> >> certain BPF contexts. We plan to use the capability in Cilium to force
> >> client sockets to reconnect when their remote load-balancing backends are
> >> deleted. The other use case is on-the-fly policy enforcement where existing
> >> socket connections prevented by policies need to be forcefully terminated.
> >> The helper allows terminating sockets that may or may not be actively
> >> sending traffic.
> >
> >> The helper is currently exposed to certain BPF iterators where users can
> >> filter, and terminate selected sockets.  Additionally, the helper can only
> >> be called from these BPF contexts that ensure socket locking in order to
> >> allow synchronous execution of destroy helpers that also acquire socket
> >> locks. The previous commit that batches UDP sockets during iteration
> >> facilitated a synchronous invocation of the destroy helper from BPF context
> >> by skipping taking socket locks in the destroy handler. TCP iterators
> >> already supported batching.
> >
> >> The helper takes `sock_common` type argument, even though it expects, and
> >> casts them to a `sock` pointer. This enables the verifier to allow the
> >> sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
> >> `sock` structs. As a comparison, BPF helpers enable this behavior with the
> >> `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such
> >> option available with the verifier logic that handles kfuncs where BTF
> >> types are inferred. Furthermore, as `sock_common` only has a subset of
> >> certain fields of `sock`, casting pointer to the latter type might not
> >> always be safe. Hence, the BPF kfunc converts the argument to a full sock
> >> before casting.
> >
> >> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> >> ---
> >>  net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
> >>  net/ipv4/tcp.c    | 10 ++++++---
> >>  net/ipv4/udp.c    |  6 ++++--
> >>  3 files changed, 65 insertions(+), 5 deletions(-)
> >
> >> diff --git a/net/core/filter.c b/net/core/filter.c
> >> index 1d6f165923bf..ba3e0dac119c 100644
> >> --- a/net/core/filter.c
> >> +++ b/net/core/filter.c
> >> @@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
> >
> >>      return func;
> >>  }
> >> +
> >> +/* Disables missing prototype warnings */
> >> +__diag_push();
> >> +__diag_ignore_all("-Wmissing-prototypes",
> >> +              "Global functions as their definitions will be in vmlinux BTF");
> >> +
> >> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
> >> + *
> >> + * The helper expects a non-NULL pointer to a socket. It invokes the
> >> + * protocol specific socket destroy handlers.
> >> + *
> >> + * The helper can only be called from BPF contexts that have acquired the socket
> >> + * locks.
> >> + *
> >> + * Parameters:
> >> + * @sock: Pointer to socket to be destroyed
> >> + *
> >> + * Return:
> >> + * On error, may return EPROTONOSUPPORT, EINVAL.
> >> + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
> >> + * 0 otherwise
> >> + */
> >> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
> >> +{
> >> +    struct sock *sk = (struct sock *)sock;
> >> +
> >> +    if (!sk)
> >> +            return -EINVAL;
> >> +
> >> +    /* The locking semantics that allow for synchronous execution of the
> >> +     * destroy handlers are only supported for TCP and UDP.
> >> +     */
> >> +    if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
> >> +            return -EOPNOTSUPP;
> >
> > Copy-pasting from v3, let's discuss here.
> >
> > Maybe make it more opt-in? (vs current "opt ipproto_raw out")
> >
> > if (sk->sk_prot->diag_destroy != udp_abort &&
> >    sk->sk_prot->diag_destroy != tcp_abort)
> >            return -EOPNOTSUPP;
> >
> > Is it more robust? Or does it look uglier? )
> > But maybe fine as is, I'm just thinking out loud..
>
> Do we expect the handler to be extended for more types? Probably not... So I'll leave it as is.

My worry is about somebody adding .diag_destroy to some new/old
protocol in the future, say sctp_prot, without being aware
of this bpf_sock_destroy helper and its locking requirements.

So having an opt-in here (as in sk_protocol == IPPROTO_TCP ||
sk_protocol == IPPROTO_UDP) feels more future-proof than your current
opt-out (sk_proto != IPPROTO_RAW).
WDYT?

> >
> >> +
> >> +    return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
> >> +}
> >> +
> >> +__diag_pop()
> >> +
> >> +BTF_SET8_START(sock_destroy_kfunc_set)
> >> +BTF_ID_FLAGS(func, bpf_sock_destroy)
> >> +BTF_SET8_END(sock_destroy_kfunc_set)
> >> +
> >> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> >> +    .owner = THIS_MODULE,
> >> +    .set   = &sock_destroy_kfunc_set,
> >> +};
> >> +
> >> +static int init_subsystem(void)
> >> +{
> >> +    return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
> >> +}
> >> +late_initcall(init_subsystem);
> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >> index 33f559f491c8..5df6231016e3 100644
> >> --- a/net/ipv4/tcp.c
> >> +++ b/net/ipv4/tcp.c
> >> @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err)
> >>              return 0;
> >>      }
> >
> >> -    /* Don't race with userspace socket closes such as tcp_close. */
> >> -    lock_sock(sk);
> >> +    /* BPF context ensures sock locking. */
> >> +    if (!has_current_bpf_ctx())
> >> +            /* Don't race with userspace socket closes such as tcp_close. */
> >> +            lock_sock(sk);
> >
> >>      if (sk->sk_state == TCP_LISTEN) {
> >>              tcp_set_state(sk, TCP_CLOSE);
> >> @@ -4701,9 +4703,11 @@ int tcp_abort(struct sock *sk, int err)
> >>      }
> >
> >>      bh_unlock_sock(sk);
> >> +
> >>      local_bh_enable();
> >>      tcp_write_queue_purge(sk);
> >> -    release_sock(sk);
> >> +    if (!has_current_bpf_ctx())
> >> +            release_sock(sk);
> >>      return 0;
> >>  }
> >>  EXPORT_SYMBOL_GPL(tcp_abort);
> >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> >> index 58c620243e47..408836102e20 100644
> >> --- a/net/ipv4/udp.c
> >> +++ b/net/ipv4/udp.c
> >> @@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);
> >
> >>  int udp_abort(struct sock *sk, int err)
> >>  {
> >> -    lock_sock(sk);
> >> +    if (!has_current_bpf_ctx())
> >> +            lock_sock(sk);
> >
> >>      /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
> >>       * with close()
> >> @@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
> >>      __udp_disconnect(sk, 0);
> >
> >>  out:
> >> -    release_sock(sk);
> >> +    if (!has_current_bpf_ctx())
> >> +            release_sock(sk);
> >
> >>      return 0;
> >>  }
> >> --
> >> 2.34.1
>

Martin KaFai Lau March 30, 2023, 5:30 p.m. UTC | #5

On 3/30/23 9:32 AM, Stanislav Fomichev wrote:
>>> Maybe make it more opt-in? (vs current "opt ipproto_raw out")
>>>
>>> if (sk->sk_prot->diag_destroy != udp_abort &&
>>>     sk->sk_prot->diag_destroy != tcp_abort)
>>>             return -EOPNOTSUPP;
>>>
>>> Is it more robust? Or does it look uglier? )
>>> But maybe fine as is, I'm just thinking out loud..
>>
>> Do we expect the handler to be extended for more types? Probably not... So I'll leave it as is.
> 
> My worry is about somebody adding .diag_destroy to some new/old
> protocol in the future, say sctp_prot, without being aware
> of this bpf_sock_destroy helper and its locking requirements.

Other helpers in filter.c is also opt-in. I think it is better to do the same 
here. IPPROTO_TCP and IPPROTO_UDP should have very good use case coverage to 
begin with. It can also help to ensure new selftests are written to cover the 
protocol supporting bpf_sock_destroy in the future.

I like the comment in bpf_sock_destroy() in this patch. It will be even better 
if it can spell out more clearly that future supporting protocol needs to assume 
the lock_sock has already been done on the bpf side.

> 
> So having an opt-in here (as in sk_protocol == IPPROTO_TCP ||
> sk_protocol == IPPROTO_UDP) feels more future-proof than your current
> opt-out (sk_proto != IPPROTO_RAW).
> WDYT?
>>>> +            release_sock(sk);
>>>
>>>>       return 0;
>>>>   }
>>>> --
>>>> 2.34.1
>>

Aditi Ghag April 3, 2023, 3:58 p.m. UTC | #6

> On Mar 30, 2023, at 10:30 AM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 3/30/23 9:32 AM, Stanislav Fomichev wrote:
>>>> Maybe make it more opt-in? (vs current "opt ipproto_raw out")
>>>> 
>>>> if (sk->sk_prot->diag_destroy != udp_abort &&
>>>>    sk->sk_prot->diag_destroy != tcp_abort)
>>>>            return -EOPNOTSUPP;
>>>> 
>>>> Is it more robust? Or does it look uglier? )
>>>> But maybe fine as is, I'm just thinking out loud..
>>> 
>>> Do we expect the handler to be extended for more types? Probably not... So I'll leave it as is.
>> My worry is about somebody adding .diag_destroy to some new/old
>> protocol in the future, say sctp_prot, without being aware
>> of this bpf_sock_destroy helper and its locking requirements.

Ah, sctp!! 

> 
> Other helpers in filter.c is also opt-in. I think it is better to do the same here. IPPROTO_TCP and IPPROTO_UDP should have very good use case coverage to begin with. It can also help to ensure new selftests are written to cover the protocol supporting bpf_sock_destroy in the future.
> 
> I like the comment in bpf_sock_destroy() in this patch. It will be even better if it can spell out more clearly that future supporting protocol needs to assume the lock_sock has already been done on the bpf side.

Ack... I'll make it opt-in. 

> 
>> So having an opt-in here (as in sk_protocol == IPPROTO_TCP ||
>> sk_protocol == IPPROTO_UDP) feels more future-proof than your current
>> opt-out (sk_proto != IPPROTO_RAW).
>> WDYT?
>>>>> +            release_sock(sk);
>>>> 
>>>>>      return 0;
>>>>>  }
>>>>> --
>>>>> 2.34.1
>>> 
>

[v4,bpf-next,2/4] bpf: Add bpf_sock_destroy kfunc

Checks

Commit Message

Comments

Patch