Message ID | 20220727060909.2371812-1-kafai@fb.com (mailing list archive)
---|---
State | Changes Requested
Delegated to | BPF
Series | bpf: net: Remove duplicated codes from bpf_setsockopt()
From: Martin KaFai Lau > Sent: 27 July 2022 07:09 > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > the sock_setsockopt(). The number of supported options are > increasing ever and so as the duplicated codes. > > One issue in reusing sock_setsockopt() is that the bpf prog > has already acquired the sk lock. sockptr_t is useful to handle this. > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > has already been ensured by the bpf prog. That is a really horrid place to hide an 'is locked' bit. You'd be better off splitting sock_setsockopt() to add a function that is called with sk_lock held and the value read. That would also save the churn of all the callers. David
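A minimal sketch of the split being suggested here, assuming a hypothetical sk_setsockopt_locked() helper and an int-only option value (options that carry more than an int, such as SO_BINDTODEVICE, would still need their own handling):

static int sk_setsockopt_locked(struct sock *sk, int level, int optname,
				int val)
{
	int ret = 0;

	/* the existing switch (optname) body of sock_setsockopt(),
	 * running with sk already locked by the caller
	 */
	return ret;
}

int sock_setsockopt(struct sock *sk, int level, int optname,
		    sockptr_t optval, unsigned int optlen)
{
	int val, ret;

	if (optlen < sizeof(int))
		return -EINVAL;
	if (copy_from_sockptr(&val, optval, sizeof(val)))
		return -EFAULT;

	lock_sock(sk);
	ret = sk_setsockopt_locked(sk, level, optname, val);
	release_sock(sk);
	return ret;
}

A caller that already holds the sk lock, such as the bpf helper, would then call sk_setsockopt_locked() directly and the conditional locking disappears.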
On 07/26, Martin KaFai Lau wrote: > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > the sock_setsockopt(). The number of supported options are > increasing ever and so as the duplicated codes. > One issue in reusing sock_setsockopt() is that the bpf prog > has already acquired the sk lock. sockptr_t is useful to handle this. > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > has already been ensured by the bpf prog. Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some point, we can have code paths in bpf where the socket has been already locked by the stack?
On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@google.com wrote: > On 07/26, Martin KaFai Lau wrote: > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > the sock_setsockopt(). The number of supported options are > > increasing ever and so as the duplicated codes. > > > One issue in reusing sock_setsockopt() is that the bpf prog > > has already acquired the sk lock. sockptr_t is useful to handle this. > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > has already been ensured by the bpf prog. > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > point, is_locked was my initial attempt. The bpf_setsockopt() also skips the ns_capable() check, like in patch 3. I ended up using one is_bpf bit here to do both. > we can have code paths in bpf where the socket has been already locked by > the stack? hmm... You meant the opposite, like the bpf hook does not have the lock pre-acquired before the bpf prog gets run and sock_setsockopt() should do lock_sock() as usual? I was thinking a likely situation is a bpf 'sleepable' hook does not have the lock pre-acquired. In that case, the bpf_setsockopt() could always acquire the lock first but it may turn out to be too pessimistic for the future bpf_[G]etsockopt() refactoring. or we could do this 'bit' break up (into one is_locked bit for locked and one is_bpf to skip-capable-check). I was waiting until a real need comes up instead of having both bits always true now. I don't mind adding is_locked now since the bpf_lsm_cgroup may come to sleepable soon. I can do this in the next spin.
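For reference, the 'bit' break-up described here would amount to roughly the following sockptr_t layout; the exact field placement is an assumption for illustration:

typedef struct {
	union {
		void		*kernel;
		void __user	*user;
	};
	bool	is_kernel : 1,	/* existing: kernel vs. user memory copy */
		is_locked : 1,	/* sk lock already held by the caller */
		is_bpf    : 1;	/* skip the ns_capable()-style checks */
} sockptr_t;

bpf_setsockopt() would set both new bits today, while a future sleepable hook that does not have the lock pre-acquired could set only is_bpf.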
On Wed, Jul 27, 2022 at 08:36:18AM +0000, David Laight wrote: > From: Martin KaFai Lau > > Sent: 27 July 2022 07:09 > > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > the sock_setsockopt(). The number of supported options are > > increasing ever and so as the duplicated codes. > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > has already acquired the sk lock. sockptr_t is useful to handle this. > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > has already been ensured by the bpf prog. > > That is a really horrid place to hide an 'is locked' bit. > > You'd be better off splitting sock_setsockopt() to add a function > that is called with sk_lock held and the value read. > That would also save the churn of all the callers. There is no churn to the callers after this patch, so quite the opposite.
On Wed, Jul 27, 2022 at 11:37 AM Martin KaFai Lau <kafai@fb.com> wrote: > > On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@google.com wrote: > > On 07/26, Martin KaFai Lau wrote: > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > > the sock_setsockopt(). The number of supported options are > > > increasing ever and so as the duplicated codes. > > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > > has already acquired the sk lock. sockptr_t is useful to handle this. > > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > > has already been ensured by the bpf prog. > > > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > > point, > is_locked was my initial attempt. The bpf_setsockopt() also skips > the ns_capable() check, like in patch 3. I ended up using > one is_bpf bit here to do both. Yeah, sorry, I haven't read the whole series before I sent my first reply. Let's discuss it here. This reminds me of ns_capable in __inet_bind where we also had to add special handling. In general, not specific to the series, I wonder if we want some new in_bpf() context indication and bypass ns_capable() from those contexts? Then we can do things like: if (sk->sk_bound_dev_if && !in_bpf() && !ns_capable(net->user_ns, CAP_NET_RAW)) return ...; Or would it make things more confusing? > > we can have code paths in bpf where the socket has been already locked by > > the stack? > hmm... You meant the opposite, like the bpf hook does not have the > lock pre-acquired before the bpf prog gets run and sock_setsockopt() > should do lock_sock() as usual? > > I was thinking a likely situation is a bpf 'sleepable' hook does not > have the lock pre-acquired. In that case, the bpf_setsockopt() could > always acquire the lock first but it may turn out to be too > pessmissitic for the future bpf_[G]etsockopt() refactoring. > > or we could do this 'bit' break up (into one is_locked bit > for locked and one is_bpf to skip-capable-check). I was waiting until a real > need comes up instead of having both bits always true now. I don't mind to > add is_locked now since the bpf_lsm_cgroup may come to sleepable soon. > I can do this in the next spin.
On Wed, Jul 27, 2022 at 01:39:08PM -0700, Stanislav Fomichev wrote: > On Wed, Jul 27, 2022 at 11:37 AM Martin KaFai Lau <kafai@fb.com> wrote: > > > > On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@google.com wrote: > > > On 07/26, Martin KaFai Lau wrote: > > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > > > the sock_setsockopt(). The number of supported options are > > > > increasing ever and so as the duplicated codes. > > > > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > > > has already acquired the sk lock. sockptr_t is useful to handle this. > > > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > > > has already been ensured by the bpf prog. > > > > > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > > > point, > > is_locked was my initial attempt. The bpf_setsockopt() also skips > > the ns_capable() check, like in patch 3. I ended up using > > one is_bpf bit here to do both. > > Yeah, sorry, I haven't read the whole series before I sent my first > reply. Let's discuss it here. > > This reminds me of ns_capable in __inet_bind where we also had to add > special handling. > > In general, not specific to the series, I wonder if we want some new > in_bpf() context indication and bypass ns_capable() from those > contexts? > Then we can do things like: > > if (sk->sk_bound_dev_if && !in_bpf() && !ns_capable(net->user_ns, > CAP_NET_RAW)) > return ...; Don't see a way to implement in_bpf() after some thought. Do you have an idea? > > Or would it make things more confusing? > > > > > > we can have code paths in bpf where the socket has been already locked by > > > the stack? > > hmm... You meant the opposite, like the bpf hook does not have the > > lock pre-acquired before the bpf prog gets run and sock_setsockopt() > > should do lock_sock() as usual? > > > > I was thinking a likely situation is a bpf 'sleepable' hook does not > > have the lock pre-acquired. In that case, the bpf_setsockopt() could > > always acquire the lock first but it may turn out to be too > > pessmissitic for the future bpf_[G]etsockopt() refactoring. > > > > or we could do this 'bit' break up (into one is_locked bit > > for locked and one is_bpf to skip-capable-check). I was waiting until a real > > need comes up instead of having both bits always true now. I don't mind to > > add is_locked now since the bpf_lsm_cgroup may come to sleepable soon. > > I can do this in the next spin.
On Wed, Jul 27, 2022 at 2:21 PM Martin KaFai Lau <kafai@fb.com> wrote: > > On Wed, Jul 27, 2022 at 01:39:08PM -0700, Stanislav Fomichev wrote: > > On Wed, Jul 27, 2022 at 11:37 AM Martin KaFai Lau <kafai@fb.com> wrote: > > > > > > On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@google.com wrote: > > > > On 07/26, Martin KaFai Lau wrote: > > > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > > > > the sock_setsockopt(). The number of supported options are > > > > > increasing ever and so as the duplicated codes. > > > > > > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > > > > has already acquired the sk lock. sockptr_t is useful to handle this. > > > > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > > > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > > > > has already been ensured by the bpf prog. > > > > > > > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > > > > point, > > > is_locked was my initial attempt. The bpf_setsockopt() also skips > > > the ns_capable() check, like in patch 3. I ended up using > > > one is_bpf bit here to do both. > > > > Yeah, sorry, I haven't read the whole series before I sent my first > > reply. Let's discuss it here. > > > > This reminds me of ns_capable in __inet_bind where we also had to add > > special handling. > > > > In general, not specific to the series, I wonder if we want some new > > in_bpf() context indication and bypass ns_capable() from those > > contexts? > > Then we can do things like: > > > > if (sk->sk_bound_dev_if && !in_bpf() && !ns_capable(net->user_ns, > > CAP_NET_RAW)) > > return ...; > Don't see a way to implement in_bpf() after some thoughts. > Do you have idea ? I wonder if we can cheat a bit with the following:

bool setsockopt_capable(struct user_namespace *ns, int cap)
{
	if (!in_task()) {
		/* Running in irq/softirq -> setsockopt invoked by bpf program.
		 * [not sure, is it safe to assume no regular path leads
		 * to setsockopt from sirq?]
		 */
		return true;
	}

	/* Running in process context, task has bpf_ctx set -> invoked
	 * by bpf program.
	 */
	if (current->bpf_ctx != NULL)
		return true;

	return ns_capable(ns, cap);
}

And then do /ns_capable/setsockopt_capable/ in net/core/sock.c

But that might be more fragile than passing the flag, idk. > > Or would it make things more confusing? > > > > > > > > > > we can have code paths in bpf where the socket has been already locked by > > > > > the stack? > > > > hmm... You meant the opposite, like the bpf hook does not have the > > > > lock pre-acquired before the bpf prog gets run and sock_setsockopt() > > > > should do lock_sock() as usual? > > > > > > > > I was thinking a likely situation is a bpf 'sleepable' hook does not > > > > have the lock pre-acquired. In that case, the bpf_setsockopt() could > > > > always acquire the lock first but it may turn out to be too > > > > pessmissitic for the future bpf_[G]etsockopt() refactoring. > > > > > > > > or we could do this 'bit' break up (into one is_locked bit > > > > for locked and one is_bpf to skip-capable-check). I was waiting until a real > > > > need comes up instead of having both bits always true now. I don't mind to > > > > add is_locked now since the bpf_lsm_cgroup may come to sleepable soon. > > > > I can do this in the next spin.
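As a usage sketch of the /ns_capable/setsockopt_capable/ substitution, one call site in sock_setsockopt() would end up looking roughly like this (SO_MARK picked arbitrarily; the surrounding logic is approximate rather than the exact code):

	case SO_MARK:
		if (!setsockopt_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
		    !setsockopt_capable(sock_net(sk)->user_ns, CAP_NET_RAW)) {
			ret = -EPERM;
			break;
		}
		/* bpf callers pass the capability check above unconditionally */
		__sock_set_mark(sk, val);
		break;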
On Wed, Jul 27, 2022 at 02:38:51PM -0700, Stanislav Fomichev wrote: > On Wed, Jul 27, 2022 at 2:21 PM Martin KaFai Lau <kafai@fb.com> wrote: > > > > On Wed, Jul 27, 2022 at 01:39:08PM -0700, Stanislav Fomichev wrote: > > > On Wed, Jul 27, 2022 at 11:37 AM Martin KaFai Lau <kafai@fb.com> wrote: > > > > > > > > On Wed, Jul 27, 2022 at 09:47:25AM -0700, sdf@google.com wrote: > > > > > On 07/26, Martin KaFai Lau wrote: > > > > > > Most of the codes in bpf_setsockopt(SOL_SOCKET) are duplicated from > > > > > > the sock_setsockopt(). The number of supported options are > > > > > > increasing ever and so as the duplicated codes. > > > > > > > > > > > One issue in reusing sock_setsockopt() is that the bpf prog > > > > > > has already acquired the sk lock. sockptr_t is useful to handle this. > > > > > > sockptr_t already has a bit 'is_kernel' to handle the kernel-or-user > > > > > > memory copy. This patch adds a 'is_bpf' bit to tell if sk locking > > > > > > has already been ensured by the bpf prog. > > > > > > > > > > Why not explicitly call it is_locked/is_unlocked? I'm assuming, at some > > > > > point, > > > > is_locked was my initial attempt. The bpf_setsockopt() also skips > > > > the ns_capable() check, like in patch 3. I ended up using > > > > one is_bpf bit here to do both. > > > > > > Yeah, sorry, I haven't read the whole series before I sent my first > > > reply. Let's discuss it here. > > > > > > This reminds me of ns_capable in __inet_bind where we also had to add > > > special handling. > > > > > > In general, not specific to the series, I wonder if we want some new > > > in_bpf() context indication and bypass ns_capable() from those > > > contexts? > > > Then we can do things like: > > > > > > if (sk->sk_bound_dev_if && !in_bpf() && !ns_capable(net->user_ns, > > > CAP_NET_RAW)) > > > return ...; > > Don't see a way to implement in_bpf() after some thoughts. > > Do you have idea ? > > I wonder if we can cheat a bit with the following: > > bool setsockopt_capable(struct user_namespace *ns, int cap) > { > if (!in_task()) { > /* Running in irq/softirq -> setsockopt invoked by bpf program. > * [not sure, is it safe to assume no regular path leads > to setsockopt from sirq?] > */ > return true; > } > > /* Running in process context, task has bpf_ctx set -> invoked > by bpf program. */ > if (current->bpf_ctx != NULL) > return true; > > return ns_capable(ns, cap); > } > > And then do /ns_capable/setsockopt_capable/ in net/core/sock.c > > But that might be more fragile than passing the flag, idk. I think it should work. From a quick look, all bpf_setsockopt usage has bpf_ctx. The one from bpf_tcp_ca (struct_ops) and bpf_iter is trampoline which also has bpf_ctx. Not sure about the future use cases. To be honest, I am not sure if I have missed cases and also have similar questions your have in the above sample code. This may deserve a separate patch set for discussion. Using a bit in sockptr is mostly free now. WDYT ? > > > > Or would it make things more confusing? > > > > > > > > > > > > > > we can have code paths in bpf where the socket has been already locked by > > > > > the stack? > > > > hmm... You meant the opposite, like the bpf hook does not have the > > > > lock pre-acquired before the bpf prog gets run and sock_setsockopt() > > > > should do lock_sock() as usual? > > > > > > > > I was thinking a likely situation is a bpf 'sleepable' hook does not > > > > have the lock pre-acquired. 
> > > > In that case, the bpf_setsockopt() could > > > > always acquire the lock first but it may turn out to be too > > > > pessmissitic for the future bpf_[G]etsockopt() refactoring. > > > > > > > > or we could do this 'bit' break up (into one is_locked bit > > > > for locked and one is_bpf to skip-capable-check). I was waiting until a real > > > > need comes up instead of having both bits always true now. I don't mind to > > > > add is_locked now since the bpf_lsm_cgroup may come to sleepable soon. > > > > I can do this in the next spin.
On Wed, 27 Jul 2022 17:45:46 -0700 Martin KaFai Lau wrote: > > bool setsockopt_capable(struct user_namespace *ns, int cap) > > { > > if (!in_task()) { > > /* Running in irq/softirq -> setsockopt invoked by bpf program. > > * [not sure, is it safe to assume no regular path leads > > to setsockopt from sirq?] > > */ > > return true; > > } > > > > /* Running in process context, task has bpf_ctx set -> invoked > > by bpf program. */ > > if (current->bpf_ctx != NULL) > > return true; > > > > return ns_capable(ns, cap); > > } > > > > And then do /ns_capable/setsockopt_capable/ in net/core/sock.c > > > > But that might be more fragile than passing the flag, idk. > I think it should work. From a quick look, all bpf_setsockopt usage has > bpf_ctx. The one from bpf_tcp_ca (struct_ops) and bpf_iter is trampoline > which also has bpf_ctx. Not sure about the future use cases. > > To be honest, I am not sure if I have missed cases and also have similar questions > your have in the above sample code. This may deserve a separate patch > set for discussion. Using a bit in sockptr is mostly free now. > WDYT ? Sorry to chime in but I vote against @in_bpf. I had to search the git history recently to figure out what SK_USER_DATA_BPF means. It's not going to be obvious to a networking person what semantics to attribute to "in bpf".
On Wed, Jul 27, 2022 at 06:49:03PM -0700, Jakub Kicinski wrote: > On Wed, 27 Jul 2022 17:45:46 -0700 Martin KaFai Lau wrote: > > > bool setsockopt_capable(struct user_namespace *ns, int cap) > > > { > > > if (!in_task()) { > > > /* Running in irq/softirq -> setsockopt invoked by bpf program. > > > * [not sure, is it safe to assume no regular path leads > > > to setsockopt from sirq?] > > > */ > > > return true; > > > } > > > > > > /* Running in process context, task has bpf_ctx set -> invoked > > > by bpf program. */ > > > if (current->bpf_ctx != NULL) > > > return true; > > > > > > return ns_capable(ns, cap); > > > } > > > > > > And then do /ns_capable/setsockopt_capable/ in net/core/sock.c > > > > > > But that might be more fragile than passing the flag, idk. > > I think it should work. From a quick look, all bpf_setsockopt usage has > > bpf_ctx. The one from bpf_tcp_ca (struct_ops) and bpf_iter is trampoline > > which also has bpf_ctx. Not sure about the future use cases. > > > > To be honest, I am not sure if I have missed cases and also have similar questions > > your have in the above sample code. This may deserve a separate patch > > set for discussion. Using a bit in sockptr is mostly free now. > > WDYT ? > > Sorry to chime in but I vote against @in_bpf. I had to search the git > history recently to figure out what SK_USER_DATA_BPF means. It's not > going to be obvious to a networking person what semantics to attribute > to "in bpf". If I understand the concern correctly, it may not be straightforward to grasp the reasoning behind the tests in in_bpf() [ the in_task() and the current->bpf_ctx test ]? Yes, it is a valid point. The optval.is_bpf bit can be directly traced back to the bpf_setsockopt helper and should be easier to reason about.
On Thu, 28 Jul 2022 09:31:04 -0700 Martin KaFai Lau wrote: > If I understand the concern correctly, it may not be straight forward to > grip the reason behind the testings at in_bpf() [ the in_task() and > the current->bpf_ctx test ] ? Yes, it is a valid point. > > The optval.is_bpf bit can be directly traced back to the bpf_setsockopt > helper and should be easier to reason about. I think we're saying the opposite thing. in_bpf() the context checking function is fine. There is a clear parallel to in_task() and combined with the capability check it should be pretty obvious what the code is intending to achieve. sockptr_t::in_bpf which randomly implies that the lock is already held will be hard to understand for anyone not intimately familiar with the BPF code. Naming that bit is_locked seems much clearer. Which is what I believe Stan was proposing.
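With that rename, the conditional-locking helpers from this patch would key off the clearer bit name; a sketch in which only the bit name differs from the posted diff further below:

static inline void lock_sock_sockopt(struct sock *sk, sockptr_t optval)
{
	if (!optval.is_locked)	/* bpf_setsockopt() already holds the lock */
		lock_sock(sk);
}

static inline void release_sock_sockopt(struct sock *sk, sockptr_t optval)
{
	if (!optval.is_locked)
		release_sock(sk);
}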
On Thu, Jul 28, 2022 at 09:56:29AM -0700, Jakub Kicinski wrote: > On Thu, 28 Jul 2022 09:31:04 -0700 Martin KaFai Lau wrote: > > If I understand the concern correctly, it may not be straight forward to > > grip the reason behind the testings at in_bpf() [ the in_task() and > > the current->bpf_ctx test ] ? Yes, it is a valid point. > > > > The optval.is_bpf bit can be directly traced back to the bpf_setsockopt > > helper and should be easier to reason about. > > I think we're saying the opposite thing. in_bpf() the context checking > function is fine. There is a clear parallel to in_task() and combined > with the capability check it should be pretty obvious what the code > is intending to achieve. > > sockptr_t::in_bpf which randomly implies that the lock is already held > will be hard to understand for anyone not intimately familiar with the > BPF code. Naming that bit is_locked seems much clearer. > > Which is what I believe Stan was proposing. Yeah, I think I read the 'vote against @in_bpf' in the other way. :) Sure. I will do s/is_bpf/is_locked/ and do the in_bpf() context checking before ns_capable().
On Thu, 28 Jul 2022 10:20:04 -0700 Martin KaFai Lau wrote: > > Which is what I believe Stan was proposing. > Yeah, I think I read the 'vote against @in_bpf' in the other way. :) My bad, I didn't read the proposal in sufficient detail to realize the helper is called the same thing as the bit :D
From: Jakub Kicinski > Sent: 28 July 2022 17:56 > > On Thu, 28 Jul 2022 09:31:04 -0700 Martin KaFai Lau wrote: > > If I understand the concern correctly, it may not be straight forward to > > grip the reason behind the testings at in_bpf() [ the in_task() and > > the current->bpf_ctx test ] ? Yes, it is a valid point. > > > > The optval.is_bpf bit can be directly traced back to the bpf_setsockopt > > helper and should be easier to reason about. > > I think we're saying the opposite thing. in_bpf() the context checking > function is fine. There is a clear parallel to in_task() and combined > with the capability check it should be pretty obvious what the code > is intending to achieve. > > sockptr_t::in_bpf which randomly implies that the lock is already held > will be hard to understand for anyone not intimately familiar with the > BPF code. Naming that bit is_locked seems much clearer. > > Which is what I believe Stan was proposing. Or make sk_setsockopt() be called after the integer value has been read and with the lock held. That saves any (horrid) conditional locking. Also sockptr_t should probably have been a structure with separate user and kernel address fields. Putting the length in there would (probably) save code. There then might be scope for pre-copying short user buffers into a kernel buffer while still allowing the requests that ignore the length copy directly from a user buffer. David
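A rough sketch of the alternative sockptr_t layout being suggested here, with separate user and kernel pointers and the length carried alongside; the field names and the pre-copy idea are illustrative assumptions rather than anything from the series:

typedef struct {
	void		*kernel;	/* non-NULL when the value is in kernel memory */
	void __user	*user;		/* non-NULL when it is still in user memory */
	size_t		len;		/* option length carried with the pointer */
} sockptr_t;

/* Short values could then be pre-copied into a small kernel buffer up
 * front, while requests that ignore the length keep copying directly
 * from the user pointer.
 */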
On Fri, Jul 29, 2022 at 10:04:29AM +0000, David Laight wrote: > From: Jakub Kicinski > > Sent: 28 July 2022 17:56 > > > > On Thu, 28 Jul 2022 09:31:04 -0700 Martin KaFai Lau wrote: > > > If I understand the concern correctly, it may not be straight forward to > > > grip the reason behind the testings at in_bpf() [ the in_task() and > > > the current->bpf_ctx test ] ? Yes, it is a valid point. > > > > > > The optval.is_bpf bit can be directly traced back to the bpf_setsockopt > > > helper and should be easier to reason about. > > > > I think we're saying the opposite thing. in_bpf() the context checking > > function is fine. There is a clear parallel to in_task() and combined > > with the capability check it should be pretty obvious what the code > > is intending to achieve. > > > > sockptr_t::in_bpf which randomly implies that the lock is already held > > will be hard to understand for anyone not intimately familiar with the > > BPF code. Naming that bit is_locked seems much clearer. > > > > Which is what I believe Stan was proposing. > > Or make sk_setsockopt() be called after the integer value > has been read and with the lock held. > > That saves any (horrid) conditional locking. > > Also sockptr_t should probably have been a structure with separate > user and kernel address fields. > Putting the length in there would (probably) save code. > > There then might be scope for pre-copying short user buffers > into a kernel buffer while still allowing the requests that > ignore the length copy directly from a user buffer. Some optnames take their own lock, e.g. some in do_tcp_setsockopt. Those would need to be broken down into their own locked and unlocked functions. Not only setsockopt: this also applies to the future [g]etsockopt() refactoring, where most optnames are not under one lock_sock() and each optname could take or release the lock on its own. imo, this is unnecessary code churn for long switch statements like setsockopt, without a clear benefit. While the patch is not the first conditional locking in the kernel, I would like to hear what others think about doing this in a helper like lock_sock_sockopt() for set/getsockopt(). With the in_bpf() helper suggested by Stan, the is_locked state can be passed as one additional argument instead. Then there is no need to change sockptr_t, and sockptr_t keeps containing only the optval itself.
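The last alternative mentioned above, passing the lock state as an explicit argument instead of inside sockptr_t, would look roughly like the sketch below; the extra parameter and the caller name are assumptions for illustration:

/* sk lock state passed explicitly; sockptr_t stays a plain pointer wrapper */
int sock_setsockopt(struct sock *sk, int level, int optname,
		    sockptr_t optval, unsigned int optlen, bool is_locked);

/* bpf_setsockopt() runs with sk already locked, so it would pass true;
 * the regular setsockopt() syscall path would pass false.
 */
static int bpf_sol_socket_setsockopt(struct sock *sk, int optname,
				     char *optval, int optlen)
{
	return sock_setsockopt(sk, SOL_SOCKET, optname,
			       KERNEL_SOCKPTR(optval), optlen, true);
}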
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index d45902fb4cad..787d0be08fb7 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -16,7 +16,8 @@ typedef struct {
 		void		*kernel;
 		void __user	*user;
 	};
-	bool		is_kernel : 1;
+	bool		is_kernel : 1,
+			is_bpf : 1;
 } sockptr_t;
 
 static inline bool sockptr_is_kernel(sockptr_t sockptr)
@@ -24,6 +25,11 @@ static inline bool sockptr_is_kernel(sockptr_t sockptr)
 	return sockptr.is_kernel;
 }
 
+static inline sockptr_t KERNEL_SOCKPTR_BPF(void *p)
+{
+	return (sockptr_t) { .kernel = p, .is_kernel = true, .is_bpf = true };
+}
+
 static inline sockptr_t KERNEL_SOCKPTR(void *p)
 {
 	return (sockptr_t) { .kernel = p, .is_kernel = true };
diff --git a/include/net/sock.h b/include/net/sock.h
index 9e2539dcc293..2f913bdf0035 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1721,6 +1721,18 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
 	}
 }
 
+static inline void lock_sock_sockopt(struct sock *sk, sockptr_t optval)
+{
+	if (!optval.is_bpf)
+		lock_sock(sk);
+}
+
+static inline void release_sock_sockopt(struct sock *sk, sockptr_t optval)
+{
+	if (!optval.is_bpf)
+		release_sock(sk);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff --git a/net/core/sock.c b/net/core/sock.c
index 18bb4f269cf1..61d927a5f6cb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -703,7 +703,9 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
 		goto out;
 	}
 
-	return sock_bindtoindex(sk, index, true);
+	lock_sock_sockopt(sk, optval);
+	ret = sock_bindtoindex_locked(sk, index);
+	release_sock_sockopt(sk, optval);
 out:
 #endif
 
@@ -1067,7 +1069,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname,
 
 	valbool = val ? 1 : 0;
 
-	lock_sock(sk);
+	lock_sock_sockopt(sk, optval);
 
 	switch (optname) {
 	case SO_DEBUG:
@@ -1496,7 +1498,7 @@ int sock_setsockopt(struct sock *sk, int level, int optname,
 		ret = -ENOPROTOOPT;
 		break;
 	}
-	release_sock(sk);
+	release_sock_sockopt(sk, optval);
 	return ret;
 }
 EXPORT_SYMBOL(sock_setsockopt);
Most of the code in bpf_setsockopt(SOL_SOCKET) is duplicated from
sock_setsockopt(). The number of supported options keeps increasing,
and so does the duplicated code.

One issue in reusing sock_setsockopt() is that the bpf prog has already
acquired the sk lock. sockptr_t is useful to handle this. sockptr_t
already has a bit 'is_kernel' to handle the kernel-or-user memory copy.
This patch adds an 'is_bpf' bit to tell if sk locking has already been
ensured by the bpf prog.

{lock,release}_sock_sockopt() helpers are added for sock_setsockopt()
to use. These helpers handle the new 'is_bpf' bit.

Note on the change in sock_setbindtodevice(): lock_sock_sockopt() is
done in sock_setbindtodevice() instead of doing the lock_sock in
sock_bindtoindex(..., lock_sk = true).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/sockptr.h |  8 +++++++-
 include/net/sock.h      | 12 ++++++++++++
 net/core/sock.c         |  8 +++++---
 3 files changed, 24 insertions(+), 4 deletions(-)
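For context, the caller side that motivates the new bit would look roughly like the sketch below: the bpf helper wraps its kernel buffer with KERNEL_SOCKPTR_BPF() so that sock_setsockopt() skips lock_sock()/release_sock() and, per the rest of the series, the capability checks. The function name is an assumption for illustration, not a hunk from this patch.

static int bpf_sol_socket_setsockopt(struct sock *sk, int optname,
				     char *optval, int optlen)
{
	/* sk is already locked by the running bpf prog, so mark the
	 * value as both kernel memory and bpf-originated.
	 */
	return sock_setsockopt(sk, SOL_SOCKET, optname,
			       KERNEL_SOCKPTR_BPF(optval), optlen);
}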