Message ID | 165772238175.1757.4978340330606055982.stgit@oracle-102.nfsv4.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v1] net: Add distinct sk_psock field | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply, async |
On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote: > The sk_psock facility populates the sk_user_data field with the > address of an extra bit of metadata. User space sockets never > populate the sk_user_data field, so this has worked out fine. > > However, kernel socket consumers such as the RPC client and server > do populate the sk_user_data field. The sk_psock() function cannot > tell that the content of sk_user_data does not point to psock > metadata, so it will happily return a pointer to something else, > cast to a struct sk_psock. > > Thus kernel socket consumers and psock currently cannot co-exist. > > We could educate sk_psock() to return NULL if sk_user_data does > not point to a struct sk_psock. However, a more general solution > that enables full co-existence psock and other uses of sk_user_data > might be more interesting. > > Move the struct sk_psock address to its own pointer field so that > the contents of the sk_user_data field is preserved. > > Reviewed-by: Hannes Reinecke <hare@suse.de> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Thanks for posting separately. We already have the (somewhat nondescript) SK_USER_DATA_BPF, can we use another bit for psock? Or add a u8 user_data type and have TCP ULP reject if the type is anything but psock. I'm not sure why psock is special to deserve its own pointer.
On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote: > The sk_psock facility populates the sk_user_data field with the > address of an extra bit of metadata. User space sockets never > populate the sk_user_data field, so this has worked out fine. > > However, kernel socket consumers such as the RPC client and server > do populate the sk_user_data field. The sk_psock() function cannot > tell that the content of sk_user_data does not point to psock > metadata, so it will happily return a pointer to something else, > cast to a struct sk_psock. > > Thus kernel socket consumers and psock currently cannot co-exist. > > We could educate sk_psock() to return NULL if sk_user_data does > not point to a struct sk_psock. However, a more general solution > that enables full co-existence psock and other uses of sk_user_data > might be more interesting. > > Move the struct sk_psock address to its own pointer field so that > the contents of the sk_user_data field is preserved. > > Reviewed-by: Hannes Reinecke <hare@suse.de> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> The patch seems to fix the syzbot bug: [syzbot] KASAN: slab-out-of-bounds Read in sk_psock_get Reported-by: syzbot+1fa91bcd05206ff8cbb5@syzkaller.appspotmail.com As the reproducer no longer triggers the warning. Tested-by: Khalid Masum <khalid.masum.92@gmail.com>
> On Jul 13, 2022, at 9:51 PM, Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 13 Jul 2022 10:26:21 -0400 Chuck Lever wrote: >> The sk_psock facility populates the sk_user_data field with the >> address of an extra bit of metadata. User space sockets never >> populate the sk_user_data field, so this has worked out fine. >> >> However, kernel socket consumers such as the RPC client and server >> do populate the sk_user_data field. The sk_psock() function cannot >> tell that the content of sk_user_data does not point to psock >> metadata, so it will happily return a pointer to something else, >> cast to a struct sk_psock. >> >> Thus kernel socket consumers and psock currently cannot co-exist. >> >> We could educate sk_psock() to return NULL if sk_user_data does >> not point to a struct sk_psock. However, a more general solution >> that enables full co-existence psock and other uses of sk_user_data >> might be more interesting. >> >> Move the struct sk_psock address to its own pointer field so that >> the contents of the sk_user_data field is preserved. >> >> Reviewed-by: Hannes Reinecke <hare@suse.de> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > Thanks for posting separately. We already have the (somewhat > nondescript) SK_USER_DATA_BPF, can we use another bit for psock? > Or add a u8 user_data type and have TCP ULP reject if the type is > anything but psock. I'm not sure why psock is special to deserve > its own pointer. Hi Jakub, for an informed answer, you will need to ask the folks who maintain psock. My guess is that kernel consumers might need to populate both BPF/psock and sk_user_data concurrently for separate purposes. If concurrent usage is never necessary, then you can probably get away with a small enumerator that describes the content of sk_user_data. But after some code auditing it didn't look to me like that would be adequate. -- Chuck Lever
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c5a2d6f50f25..5ef3a07c5b6c 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start) static inline struct sk_psock *sk_psock(const struct sock *sk) { - return rcu_dereference_sk_user_data(sk); + return rcu_dereference(sk->sk_psock); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index c4b91fc19b9c..d2a513169527 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -327,7 +327,8 @@ struct sk_filter; * @sk_tskey: counter to disambiguate concurrent tstamp requests * @sk_zckey: counter to order MSG_ZEROCOPY notifications * @sk_socket: Identd and reporting IO signals - * @sk_user_data: RPC layer private data + * @sk_user_data: Upper layer private data + * @sk_psock: socket policy data (bpf) * @sk_frag: cached page frag * @sk_peek_off: current peek_offset value * @sk_send_head: front of stuff to transmit @@ -519,6 +520,7 @@ struct sock { struct socket *sk_socket; void *sk_user_data; + struct sk_psock __rcu *sk_psock; #ifdef CONFIG_SECURITY void *sk_security; #endif diff --git a/net/core/skmsg.c b/net/core/skmsg.c index cc381165ea08..2b3d01d92790 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -695,7 +695,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (sk->sk_user_data) { + if (sk->sk_psock) { psock = ERR_PTR(-EBUSY); goto out; } @@ -726,7 +726,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); refcount_set(&psock->refcnt, 1); - rcu_assign_sk_user_data_nocopy(sk, psock); + rcu_assign_pointer(sk->sk_psock, psock); sock_hold(sk); out: @@ -825,7 +825,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { write_lock_bh(&sk->sk_callback_lock); sk_psock_restore_proto(sk, psock); - rcu_assign_sk_user_data(sk, NULL); + rcu_assign_pointer(sk->sk_psock, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); else if (psock->progs.stream_verdict || psock->progs.skb_verdict)