Message ID | 20220803124121.173303-1-yin31149@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v4] net: fix refcount bug in sk_psock_get (2) | expand |
On Wed, 3 Aug 2022 20:41:22 +0800 Hawkins Jiawei wrote: > -/* Pointer stored in sk_user_data might not be suitable for copying > - * when cloning the socket. For instance, it can point to a reference > - * counted object. sk_user_data bottom bit is set if pointer must not > - * be copied. > +/* flag bits in sk_user_data > + * > + * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might > + * not be suitable for copying when cloning the socket. > + * For instance, it can point to a reference counted object. > + * sk_user_data bottom bit is set if pointer must not be copied. > + * > + * SK_USER_DATA_BPF - Managed by BPF I'd use this opportunity to add more info here, BPF is too general. Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT? > + * SK_USER_DATA_PSOCK - Mark whether pointer stored in sk_user_data points > + * to psock type. This bit should be set when sk_user_data is > + * assigned to a psock object. > +/** > + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data > + * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise > + * return NULL > + * > + * @sk: socket > + */ > +static inline > +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk) nit: the return type more commonly goes on the same line as "static inline" > +{ > + uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk))); > + > + if (__tmp & SK_USER_DATA_PSOCK) > + return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK); > + > + return NULL; > +} As a follow up we can probably generalize this into __rcu_dereference_sk_user_data_cond(sk, bit) and make the psock just call that: static inline struct sk_psock * rcu_dereference_sk_user_data_psock(const struct sock *sk) { return __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_PSOCK); } then reuseport can also benefit, maybe: diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c index e2618fb5870e..ad5c447a690c 100644 --- a/kernel/bpf/reuseport_array.c +++ b/kernel/bpf/reuseport_array.c @@ -21,14 +21,11 @@ static struct reuseport_array *reuseport_array(struct bpf_map *map) /* The caller must hold the reuseport_lock */ void bpf_sk_reuseport_detach(struct sock *sk) { - uintptr_t sk_user_data; + struct sock __rcu **socks; write_lock_bh(&sk->sk_callback_lock); - sk_user_data = (uintptr_t)sk->sk_user_data; - if (sk_user_data & SK_USER_DATA_BPF) { - struct sock __rcu **socks; - - socks = (void *)(sk_user_data & SK_USER_DATA_PTRMASK); + socks = __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_BPF); + if (socks) { WRITE_ONCE(sk->sk_user_data, NULL); /* * Do not move this NULL assignment outside of But that must be a separate patch, not part of this fix.
On Wed, Aug 03, 2022 at 08:14:13AM -0700, Jakub Kicinski wrote: > On Wed, 3 Aug 2022 20:41:22 +0800 Hawkins Jiawei wrote: > > -/* Pointer stored in sk_user_data might not be suitable for copying > > - * when cloning the socket. For instance, it can point to a reference > > - * counted object. sk_user_data bottom bit is set if pointer must not > > - * be copied. > > +/* flag bits in sk_user_data > > + * > > + * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might > > + * not be suitable for copying when cloning the socket. > > + * For instance, it can point to a reference counted object. > > + * sk_user_data bottom bit is set if pointer must not be copied. > > + * > > + * SK_USER_DATA_BPF - Managed by BPF > > I'd use this opportunity to add more info here, BPF is too general. > Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT? SGTM. Thanks.
On Wed, 3 Aug 2022 at 23:37, Martin KaFai Lau <kafai@fb.com> wrote: > > > + * SK_USER_DATA_BPF - Managed by BPF > > > > I'd use this opportunity to add more info here, BPF is too general. > > Maybe "Pointer is used by a BPF reuseport array"? Martin, WDYT? > SGTM. Thanks. OK. It seems that this flag is introduced from c9a368f1c0fb ("bpf: net: Avoid incorrect bpf_sk_reuseport_detach call"). I will search for more detailed description in this commit. > > > > > +/** > > > + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data > > > + * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise > > > + * return NULL > > > + * > > > + * @sk: socket > > > + */ > > > +static inline > > > +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk) > > > > nit: the return type more commonly goes on the same line as "static > > inline" Ok. I will correct it. > > > > > +{ > > > + uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk))); > > > + > > > + if (__tmp & SK_USER_DATA_PSOCK) > > > + return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK); > > > + > > > + return NULL; > > > +} > > > > As a follow up we can probably generalize this into > > __rcu_dereference_sk_user_data_cond(sk, bit) > > > > and make the psock just call that: > > > > static inline struct sk_psock * > > rcu_dereference_sk_user_data_psock(const struct sock *sk) > > { > > return __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_PSOCK); > > } Yes. I will refactor it in this way. > > > > diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c > > index e2618fb5870e..ad5c447a690c 100644 > > --- a/kernel/bpf/reuseport_array.c > > +++ b/kernel/bpf/reuseport_array.c > > @@ -21,14 +21,11 @@ static struct reuseport_array *reuseport_array(struct bpf_map *map) > > /* The caller must hold the reuseport_lock */ > > void bpf_sk_reuseport_detach(struct sock *sk) > > { > > - uintptr_t sk_user_data; > > + struct sock __rcu **socks; > > > > write_lock_bh(&sk->sk_callback_lock); > > - sk_user_data = (uintptr_t)sk->sk_user_data; > > - if (sk_user_data & SK_USER_DATA_BPF) { > > - struct sock __rcu **socks; > > - > > - socks = (void *)(sk_user_data & SK_USER_DATA_PTRMASK); > > + socks = __rcu_dereference_sk_user_data_cond(sk, SK_USER_DATA_BPF); > > + if (socks) { > > WRITE_ONCE(sk->sk_user_data, NULL); > > /* > > * Do not move this NULL assignment outside of > > > > > > But that must be a separate patch, not part of this fix. I wonder if it is proper to gather these together in a patchset, for they are all about flags in sk_user_data, maybe: [PATCH v5 0/2] net: enhancement to flags in sk_user_data field - introduce the patchset [PATCH v5 1/2] net: clean up code for flags in sk_user_data field - refactor the things in include/linux/skmsg.h and include/net/sock.h - refactor the flags's usage by other code, such as net/core/skmsg.c and kernel/bpf/reuseport_array.c [PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2) - add SK_USER_DATA_PSOCK flag in sk_user_data field
On Thu, 4 Aug 2022 11:05:15 +0800 Hawkins Jiawei wrote: > I wonder if it is proper to gather these together in a patchset, for > they are all about flags in sk_user_data, maybe: > > [PATCH v5 0/2] net: enhancement to flags in sk_user_data field > - introduce the patchset > > [PATCH v5 1/2] net: clean up code for flags in sk_user_data field > - refactor the things in include/linux/skmsg.h and > include/net/sock.h > - refactor the flags's usage by other code, such as > net/core/skmsg.c and kernel/bpf/reuseport_array.c > > [PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2) > - add SK_USER_DATA_PSOCK flag in sk_user_data field Usually the fix comes first, because it needs to be backported to stable, and we don't want to have to pull extra commits into stable and risk regressions in code which was not directly involved in the bug.
On Thu, 4 Aug 2022 at 23:29, Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 4 Aug 2022 11:05:15 +0800 Hawkins Jiawei wrote: > > I wonder if it is proper to gather these together in a patchset, for > > they are all about flags in sk_user_data, maybe: > > > > [PATCH v5 0/2] net: enhancement to flags in sk_user_data field > > - introduce the patchset > > > > [PATCH v5 1/2] net: clean up code for flags in sk_user_data field > > - refactor the things in include/linux/skmsg.h and > > include/net/sock.h > > - refactor the flags's usage by other code, such as > > net/core/skmsg.c and kernel/bpf/reuseport_array.c > > > > [PATCH v5 2/2] net: fix refcount bug in sk_psock_get (2) > > - add SK_USER_DATA_PSOCK flag in sk_user_data field > > Usually the fix comes first, because it needs to be backported to > stable, and we don't want to have to pull extra commits into stable > and risk regressions in code which was not directly involved in the bug. Ok, I got it. Thanks for the explanation.
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c5a2d6f50f25..81bfa1a33623 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -277,7 +277,7 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start) static inline struct sk_psock *sk_psock(const struct sock *sk) { - return rcu_dereference_sk_user_data(sk); + return rcu_dereference_sk_user_data_psock(sk); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 9fa54762e077..d010910d5879 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -545,14 +545,24 @@ enum sk_pacing { SK_PACING_FQ = 2, }; -/* Pointer stored in sk_user_data might not be suitable for copying - * when cloning the socket. For instance, it can point to a reference - * counted object. sk_user_data bottom bit is set if pointer must not - * be copied. +/* flag bits in sk_user_data + * + * SK_USER_DATA_NOCOPY - Pointer stored in sk_user_data might + * not be suitable for copying when cloning the socket. + * For instance, it can point to a reference counted object. + * sk_user_data bottom bit is set if pointer must not be copied. + * + * SK_USER_DATA_BPF - Managed by BPF + * + * SK_USER_DATA_PSOCK - Mark whether pointer stored in sk_user_data points + * to psock type. This bit should be set when sk_user_data is + * assigned to a psock object. */ #define SK_USER_DATA_NOCOPY 1UL -#define SK_USER_DATA_BPF 2UL /* Managed by BPF */ -#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF) +#define SK_USER_DATA_BPF 2UL +#define SK_USER_DATA_PSOCK 4UL +#define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\ + SK_USER_DATA_PSOCK) /** * sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied @@ -570,19 +580,35 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk) void *__tmp = rcu_dereference(__sk_user_data((sk))); \ (void *)((uintptr_t)__tmp & SK_USER_DATA_PTRMASK); \ }) -#define rcu_assign_sk_user_data(sk, ptr) \ +#define rcu_assign_sk_user_data_with_flags(sk, ptr, flags) \ ({ \ - uintptr_t __tmp = (uintptr_t)(ptr); \ - WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \ - rcu_assign_pointer(__sk_user_data((sk)), __tmp); \ -}) -#define rcu_assign_sk_user_data_nocopy(sk, ptr) \ -({ \ - uintptr_t __tmp = (uintptr_t)(ptr); \ - WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK); \ + uintptr_t __tmp1 = (uintptr_t)(ptr), \ + __tmp2 = (uintptr_t)(flags); \ + WARN_ON_ONCE(__tmp1 & ~SK_USER_DATA_PTRMASK); \ + WARN_ON_ONCE(__tmp2 & SK_USER_DATA_PTRMASK); \ rcu_assign_pointer(__sk_user_data((sk)), \ - __tmp | SK_USER_DATA_NOCOPY); \ + __tmp1 | __tmp2); \ }) +#define rcu_assign_sk_user_data(sk, ptr) \ + rcu_assign_sk_user_data_with_flags(sk, ptr, 0) + +/** + * rcu_dereference_sk_user_data_psock - return psock if sk_user_data + * points to the psock type(SK_USER_DATA_PSOCK flag is set), otherwise + * return NULL + * + * @sk: socket + */ +static inline +struct sk_psock *rcu_dereference_sk_user_data_psock(const struct sock *sk) +{ + uintptr_t __tmp = (uintptr_t)rcu_dereference(__sk_user_data((sk))); + + if (__tmp & SK_USER_DATA_PSOCK) + return (struct sk_psock *)(__tmp & SK_USER_DATA_PTRMASK); + + return NULL; +} static inline struct net *sock_net(const struct sock *sk) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index b0fcd0200e84..d174897dbb4b 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -735,7 +735,8 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); refcount_set(&psock->refcnt, 1); - rcu_assign_sk_user_data_nocopy(sk, psock); + rcu_assign_sk_user_data_with_flags(sk, psock, SK_USER_DATA_NOCOPY | + SK_USER_DATA_PSOCK); sock_hold(sk); out: