[v1] net: Add distinct sk_psock field

Message ID: 165772238175.1757.4978340330606055982.stgit@oracle-102.nfsv4.dev
State: Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                 Check     Description
netdev/tree_selection   success   Guessing tree name failed - patch did not apply, async

Commit Message

Chuck Lever III July 13, 2022, 2:26 p.m. UTC
The sk_psock facility populates the sk_user_data field with the
address of an extra bit of metadata. User space sockets never
populate the sk_user_data field, so this has worked out fine.

However, kernel socket consumers such as the RPC client and server
do populate the sk_user_data field. The sk_psock() function cannot
tell that the content of sk_user_data does not point to psock
metadata, so it will happily return a pointer to something else,
cast to a struct sk_psock.

Thus kernel socket consumers and psock currently cannot co-exist.

We could educate sk_psock() to return NULL if sk_user_data does
not point to a struct sk_psock. However, a more general solution
that enables full co-existence of psock and other uses of
sk_user_data might be more interesting.

Move the struct sk_psock address to its own pointer field so that
the contents of the sk_user_data field are preserved.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/skmsg.h |    2 +-
 include/net/sock.h    |    4 +++-
 net/core/skmsg.c      |    6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)
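
The aliasing problem described in the commit message can be modelled in user space. The sketch below is only an illustration: the `*_model`, `sock_before` and `sock_after` types are hypothetical stand-ins rather than the kernel definitions, and RCU and locking are left out entirely. It shows why a lookup that trusts `sk_user_data` hands back another consumer's pointer, while a dedicated field does not.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sk_psock_model { int refcnt; };
struct rpc_xprt_model { char name[8]; };

/* Before the patch: one untyped slot shared by every consumer. */
struct sock_before {
	void *sk_user_data;
};

/* After the patch: psock has its own typed slot next to sk_user_data. */
struct sock_after {
	void *sk_user_data;
	struct sk_psock_model *sk_psock;
};

/* Models the old sk_psock(): trusts whatever sk_user_data holds. */
static struct sk_psock_model *psock_before(const struct sock_before *sk)
{
	return sk->sk_user_data;
}

/* Models the patched sk_psock(): reads only the dedicated field. */
static struct sk_psock_model *psock_after(const struct sock_after *sk)
{
	return sk->sk_psock;
}
```

With an RPC-style object stored in `sk_user_data`, `psock_before()` returns that object's address typed as a psock, whereas `psock_after()` returns NULL and leaves `sk_user_data` untouched.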

Comments

Jakub Kicinski July 14, 2022, 1:51 a.m. UTC | #1
On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote:
> The sk_psock facility populates the sk_user_data field with the
> address of an extra bit of metadata. User space sockets never
> populate the sk_user_data field, so this has worked out fine.
> 
> However, kernel socket consumers such as the RPC client and server
> do populate the sk_user_data field. The sk_psock() function cannot
> tell that the content of sk_user_data does not point to psock
> metadata, so it will happily return a pointer to something else,
> cast to a struct sk_psock.
> 
> Thus kernel socket consumers and psock currently cannot co-exist.
> 
> We could educate sk_psock() to return NULL if sk_user_data does
> not point to a struct sk_psock. However, a more general solution
> that enables full co-existence of psock and other uses of
> sk_user_data might be more interesting.
> 
> Move the struct sk_psock address to its own pointer field so that
> the contents of the sk_user_data field are preserved.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Thanks for posting separately. We already have the (somewhat
nondescript) SK_USER_DATA_BPF, can we use another bit for psock?
Or add a u8 user_data type and have TCP ULP reject if the type is
anything but psock. I'm not sure why psock is special to deserve 
its own pointer.
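
For illustration, the "another bit" idea could look roughly like the user-space sketch below. It assumes flags are kept in the low bits of an aligned pointer, which is how the existing flag appears to work. The `MODEL_*` names, the value chosen for the psock bit, and both helpers are hypothetical and are not taken from the kernel sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Low pointer bits used as flags; the PSOCK bit is a hypothetical addition. */
#define MODEL_USER_DATA_NOCOPY  ((uintptr_t)1)
#define MODEL_USER_DATA_BPF     ((uintptr_t)2)
#define MODEL_USER_DATA_PSOCK   ((uintptr_t)4)
#define MODEL_USER_DATA_PTRMASK (~(MODEL_USER_DATA_NOCOPY | \
				   MODEL_USER_DATA_BPF |    \
				   MODEL_USER_DATA_PSOCK))

struct sock_model {
	uintptr_t sk_user_data;	/* pointer and flag bits packed together */
};

/* Store a pointer (assumed to be at least 8-byte aligned) with its flags. */
static void user_data_set(struct sock_model *sk, void *ptr, uintptr_t flags)
{
	sk->sk_user_data = (uintptr_t)ptr | flags;
}

/* Return the pointer only if its owner tagged it as psock metadata. */
static void *user_data_get_psock(const struct sock_model *sk)
{
	if (!(sk->sk_user_data & MODEL_USER_DATA_PSOCK))
		return NULL;
	return (void *)(sk->sk_user_data & MODEL_USER_DATA_PTRMASK);
}
```

In this model a lookup on a socket whose `sk_user_data` was set by another consumer returns NULL instead of a miscast pointer, but the slot still holds only one pointer at a time.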

Khalid Masum July 14, 2022, 4:55 a.m. UTC | #2
On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote:
> The sk_psock facility populates the sk_user_data field with the
> address of an extra bit of metadata. User space sockets never
> populate the sk_user_data field, so this has worked out fine.
> 
> However, kernel socket consumers such as the RPC client and server
> do populate the sk_user_data field. The sk_psock() function cannot
> tell that the content of sk_user_data does not point to psock
> metadata, so it will happily return a pointer to something else,
> cast to a struct sk_psock.
> 
> Thus kernel socket consumers and psock currently cannot co-exist.
> 
> We could educate sk_psock() to return NULL if sk_user_data does
> not point to a struct sk_psock. However, a more general solution
> that enables full co-existence of psock and other uses of
> sk_user_data might be more interesting.
> 
> Move the struct sk_psock address to its own pointer field so that
> the contents of the sk_user_data field are preserved.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

The patch seems to fix the following syzbot bug, as the reproducer
no longer triggers the warning:
[syzbot] KASAN: slab-out-of-bounds Read in sk_psock_get
Reported-by: syzbot+1fa91bcd05206ff8cbb5@syzkaller.appspotmail.com

Tested-by: Khalid Masum <khalid.masum.92@gmail.com>

Chuck Lever III July 14, 2022, 1:50 p.m. UTC | #3
> On Jul 13, 2022, at 9:51 PM, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote:
>> The sk_psock facility populates the sk_user_data field with the
>> address of an extra bit of metadata. User space sockets never
>> populate the sk_user_data field, so this has worked out fine.
>> 
>> However, kernel socket consumers such as the RPC client and server
>> do populate the sk_user_data field. The sk_psock() function cannot
>> tell that the content of sk_user_data does not point to psock
>> metadata, so it will happily return a pointer to something else,
>> cast to a struct sk_psock.
>> 
>> Thus kernel socket consumers and psock currently cannot co-exist.
>> 
>> We could educate sk_psock() to return NULL if sk_user_data does
>> not point to a struct sk_psock. However, a more general solution
>> that enables full co-existence of psock and other uses of
>> sk_user_data might be more interesting.
>> 
>> Move the struct sk_psock address to its own pointer field so that
>> the contents of the sk_user_data field are preserved.
>> 
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> Thanks for posting separately. We already have the (somewhat
> nondescript) SK_USER_DATA_BPF, can we use another bit for psock?
> Or add a u8 user_data type and have TCP ULP reject if the type is
> anything but psock. I'm not sure why psock is special to deserve 
> its own pointer.

Hi Jakub, for an informed answer, you will need to ask the folks
who maintain psock. My guess is that kernel consumers might need
to populate both BPF/psock and sk_user_data concurrently for
separate purposes. If concurrent usage is never necessary, then
you can probably get away with a small enumerator that describes
the content of sk_user_data. But after some code auditing it didn't
look to me like that would be adequate.
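
To make the trade-off concrete, here is a minimal user-space sketch of the enumerator approach. The enum, the `sk_user_data_type` field and both helpers are hypothetical names invented for this illustration. It assumes a single slot plus a small type tag, and shows that a second consumer is turned away for as long as the first one owns the slot.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical tag describing what sk_user_data currently points to. */
enum user_data_type {
	USER_DATA_NONE,
	USER_DATA_RPC,
	USER_DATA_PSOCK,
};

struct sock_model {
	void *sk_user_data;
	unsigned char sk_user_data_type;	/* one of enum user_data_type */
};

/* Claim the single slot; fails if any consumer already owns it. */
static int user_data_claim(struct sock_model *sk, enum user_data_type type,
			   void *ptr)
{
	if (sk->sk_user_data_type != USER_DATA_NONE)
		return -EBUSY;
	sk->sk_user_data = ptr;
	sk->sk_user_data_type = type;
	return 0;
}

/* Return the pointer only to the consumer whose tag matches. */
static void *user_data_lookup(const struct sock_model *sk,
			      enum user_data_type type)
{
	return sk->sk_user_data_type == type ? sk->sk_user_data : NULL;
}
```

The tag removes the miscast, but an RPC consumer and psock still cannot be attached to the same socket at once, which is the case a separate pointer field would cover.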


--
Chuck Lever

Patch

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..5ef3a07c5b6c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -277,7 +277,7 @@  static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start)
 
 static inline struct sk_psock *sk_psock(const struct sock *sk)
 {
-	return rcu_dereference_sk_user_data(sk);
+	return rcu_dereference(sk->sk_psock);
 }
 
 static inline void sk_psock_set_state(struct sk_psock *psock,
diff --git a/include/net/sock.h b/include/net/sock.h
index c4b91fc19b9c..d2a513169527 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -327,7 +327,8 @@  struct sk_filter;
   *	@sk_tskey: counter to disambiguate concurrent tstamp requests
   *	@sk_zckey: counter to order MSG_ZEROCOPY notifications
   *	@sk_socket: Identd and reporting IO signals
-  *	@sk_user_data: RPC layer private data
+  *	@sk_user_data: Upper layer private data
+  *	@sk_psock: socket policy data (bpf)
   *	@sk_frag: cached page frag
   *	@sk_peek_off: current peek_offset value
   *	@sk_send_head: front of stuff to transmit
@@ -519,6 +520,7 @@  struct sock {
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
+	struct sk_psock	__rcu	*sk_psock;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
 #endif
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cc381165ea08..2b3d01d92790 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -695,7 +695,7 @@  struct sk_psock *sk_psock_init(struct sock *sk, int node)
 
 	write_lock_bh(&sk->sk_callback_lock);
 
-	if (sk->sk_user_data) {
+	if (sk->sk_psock) {
 		psock = ERR_PTR(-EBUSY);
 		goto out;
 	}
@@ -726,7 +726,7 @@  struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
 	refcount_set(&psock->refcnt, 1);
 
-	rcu_assign_sk_user_data_nocopy(sk, psock);
+	rcu_assign_pointer(sk->sk_psock, psock);
 	sock_hold(sk);
 
 out:
@@ -825,7 +825,7 @@  void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
 {
 	write_lock_bh(&sk->sk_callback_lock);
 	sk_psock_restore_proto(sk, psock);
-	rcu_assign_sk_user_data(sk, NULL);
+	rcu_assign_pointer(sk->sk_psock, NULL);
 	if (psock->progs.stream_parser)
 		sk_psock_stop_strp(sk, psock);
 	else if (psock->progs.stream_verdict || psock->progs.skb_verdict)
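
The attach, lookup and detach pattern that the hunks above switch over to `sk->sk_psock` can be approximated in user space with C11 atomics. This is only an analogy under stated assumptions: a release store stands in for `rcu_assign_pointer()`, an acquire load stands in for `rcu_dereference()`, writers are assumed to be serialized by the caller (the kernel code holds `sk_callback_lock`), and no grace period or reference counting is modelled. All names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct psock_model { int state; };

struct sock_model {
	void *sk_user_data;			/* left alone by psock */
	_Atomic(struct psock_model *) sk_psock;	/* dedicated psock slot */
};

static void sock_model_init(struct sock_model *sk, void *user_data)
{
	sk->sk_user_data = user_data;
	atomic_init(&sk->sk_psock, (struct psock_model *)NULL);
}

/* Attach: refuse with -EBUSY if a psock is already published. */
static int psock_attach(struct sock_model *sk, struct psock_model *psock)
{
	if (atomic_load_explicit(&sk->sk_psock, memory_order_relaxed))
		return -EBUSY;
	psock->state = 1;	/* finish initialising before publishing */
	atomic_store_explicit(&sk->sk_psock, psock, memory_order_release);
	return 0;
}

/* Lookup: readers see either NULL or a fully initialised psock. */
static struct psock_model *psock_lookup(struct sock_model *sk)
{
	return atomic_load_explicit(&sk->sk_psock, memory_order_acquire);
}

/* Detach: unpublish the psock; sk_user_data is not touched. */
static void psock_detach(struct sock_model *sk)
{
	atomic_store_explicit(&sk->sk_psock, (struct psock_model *)NULL,
			      memory_order_release);
}
```

In this model the busy check depends only on the psock slot, so a socket whose `sk_user_data` is already in use can still have a psock attached and later dropped without disturbing the other consumer's pointer.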