diff mbox series

[RFC,1/5] net: Add distinct sk_psock field

Message ID 165030056960.5073.6664402939918720250.stgit@oracle-102.nfsv4.dev (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series Implement a TLS handshake upcall | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3015 this patch: 3015
netdev/cc_maintainers fail 13 maintainers not CCed: songliubraving@fb.com jakub@cloudflare.com andrii@kernel.org davem@davemloft.net kuba@kernel.org pabeni@redhat.com daniel@iogearbox.net kafai@fb.com yhs@fb.com bpf@vger.kernel.org john.fastabend@gmail.com kpsingh@kernel.org ast@kernel.org
netdev/build_clang success Errors and warnings before: 626 this patch: 626
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3143 this patch: 3143
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 48 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply, async

Commit Message

Chuck Lever April 18, 2022, 4:49 p.m. UTC
The sk_psock facility populates the sk_user_data field with the
address of an extra bit of metadata. User space sockets never
populate the sk_user_data field, so this has worked out fine.

However, kernel consumers such as the RPC client and server do
populate the sk_user_data field. The sk_psock() function cannot tell
that the content of sk_user_data does not point to psock metadata,
so it will happily return a pointer to something else, cast to a
struct sk_psock.

Thus kernel consumers and psock currently cannot co-exist.

We could educate sk_psock() to return NULL if sk_user_data does
not point to a struct sk_psock. However, a more general solution
that enables full co-existence psock and other uses of sk_user_data
might be more interesting.

Move the struct sk_psock address to its own pointer field so that
the contents of the sk_user_data field is preserved.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/skmsg.h |    2 +-
 include/net/sock.h    |    4 +++-
 net/core/skmsg.c      |    6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

Comments

Hannes Reinecke April 21, 2022, 7:35 a.m. UTC | #1
On 4/18/22 18:49, Chuck Lever wrote:
> The sk_psock facility populates the sk_user_data field with the
> address of an extra bit of metadata. User space sockets never
> populate the sk_user_data field, so this has worked out fine.
> 
> However, kernel consumers such as the RPC client and server do
> populate the sk_user_data field. The sk_psock() function cannot tell
> that the content of sk_user_data does not point to psock metadata,
> so it will happily return a pointer to something else, cast to a
> struct sk_psock.
> 
> Thus kernel consumers and psock currently cannot co-exist.
> 
> We could educate sk_psock() to return NULL if sk_user_data does
> not point to a struct sk_psock. However, a more general solution
> that enables full co-existence psock and other uses of sk_user_data
> might be more interesting.
> 
> Move the struct sk_psock address to its own pointer field so that
> the contents of the sk_user_data field is preserved.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   include/linux/skmsg.h |    2 +-
>   include/net/sock.h    |    4 +++-
>   net/core/skmsg.c      |    6 +++---
>   3 files changed, 7 insertions(+), 5 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Hawkins Jiawei July 13, 2022, 4:46 a.m. UTC | #2
>On 4/18/22 18:49, Chuck Lever wrote:
>> The sk_psock facility populates the sk_user_data field with the
>> address of an extra bit of metadata. User space sockets never
>> populate the sk_user_data field, so this has worked out fine.
>> 
>> However, kernel consumers such as the RPC client and server do
>> populate the sk_user_data field. The sk_psock() function cannot tell
>> that the content of sk_user_data does not point to psock metadata,
>> so it will happily return a pointer to something else, cast to a
>> struct sk_psock.
>> 
>> Thus kernel consumers and psock currently cannot co-exist.
>> 
>> We could educate sk_psock() to return NULL if sk_user_data does
>> not point to a struct sk_psock. However, a more general solution
>> that enables full co-existence psock and other uses of sk_user_data
>> might be more interesting.
>> 
>> Move the struct sk_psock address to its own pointer field so that
>> the contents of the sk_user_data field is preserved.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   include/linux/skmsg.h |    2 +-
>>   include/net/sock.h    |    4 +++-
>>   net/core/skmsg.c      |    6 +++---
>>   3 files changed, 7 insertions(+), 5 deletions(-)
>> 
>Reviewed-by: Hannes Reinecke <hare@suse.de>
>
>Cheers,
>
>Hannes

In Patchwork website, this patch fails the checks on
netdev/cc_maintainers.

So maybe you need CC folks pointed out by
scripts/get_maintainer.pl script, which is suggested
by Jakub Kicinski <kuba@kernel.org>.

What's more, Syskaller reports
refcount bug in sk_psock_get (2).

In this bug, the problem is that smc and psock, 
both use sk_user_data field to save their 
private data. So they will treat field in their own way.

> in smc_switch_to_fallback(), and set smc->clcsock->sk_user_data
> to origin smc in smc_fback_replace_callbacks().
> 
> Later, sk_psock_get() will treat the smc->clcsock->sk_user_data
> as sk_psock type, which triggers the refcnt warning.

I have tested this patch and the reproducer did not trigger any issue.
For more details, you can check the email
[PATCH] smc: fix refcount bug in sk_psock_get (2)
diff mbox series

Patch

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..5ef3a07c5b6c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -277,7 +277,7 @@  static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start)
 
 static inline struct sk_psock *sk_psock(const struct sock *sk)
 {
-	return rcu_dereference_sk_user_data(sk);
+	return rcu_dereference(sk->sk_psock);
 }
 
 static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index c4b91fc19b9c..d2a513169527 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -327,7 +327,8 @@  struct sk_filter;
   *	@sk_tskey: counter to disambiguate concurrent tstamp requests
   *	@sk_zckey: counter to order MSG_ZEROCOPY notifications
   *	@sk_socket: Identd and reporting IO signals
-  *	@sk_user_data: RPC layer private data
+  *	@sk_user_data: Upper layer private data
+  *	@sk_psock: socket policy data (bpf)
   *	@sk_frag: cached page frag
   *	@sk_peek_off: current peek_offset value
   *	@sk_send_head: front of stuff to transmit
@@ -519,6 +520,7 @@  struct sock {
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
+	struct sk_psock	__rcu	*sk_psock;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
 #endif
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cc381165ea08..2b3d01d92790 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -695,7 +695,7 @@  struct sk_psock *sk_psock_init(struct sock *sk, int node)
 
 	write_lock_bh(&sk->sk_callback_lock);
 
-	if (sk->sk_user_data) {
+	if (sk->sk_psock) {
 		psock = ERR_PTR(-EBUSY);
 		goto out;
 	}
@@ -726,7 +726,7 @@  struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
 	refcount_set(&psock->refcnt, 1);
 
-	rcu_assign_sk_user_data_nocopy(sk, psock);
+	rcu_assign_pointer(sk->sk_psock, psock);
 	sock_hold(sk);
 
 out:
@@ -825,7 +825,7 @@  void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 {
 	write_lock_bh(&sk->sk_callback_lock);
 	sk_psock_restore_proto(sk, psock);
-	rcu_assign_sk_user_data(sk, NULL);
+	rcu_assign_pointer(sk->sk_psock, NULL);
 	if (psock->progs.stream_parser)
 		sk_psock_stop_strp(sk, psock);
 	else if (psock->progs.stream_verdict || psock->progs.skb_verdict)