
[v2,bpf-next,02/15] bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf

Message ID 20220803204614.3077284-1-kafai@fb.com (mailing list archive)
State Superseded
Delegated to: BPF
Series: bpf: net: Remove duplicated code from bpf_setsockopt()

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3004 this patch: 3004
netdev/cc_maintainers warning 7 maintainers not CCed: john.fastabend@gmail.com song@kernel.org martin.lau@linux.dev kpsingh@kernel.org jolsa@kernel.org haoluo@google.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 627 this patch: 627
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3116 this patch: 3116
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 79 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for Kernel LATEST on z15 with gcc

Commit Message

Martin KaFai Lau Aug. 3, 2022, 8:46 p.m. UTC
Most of the code in bpf_setsockopt(SOL_SOCKET) is duplicated from
sk_setsockopt().  The number of supported optnames keeps growing,
and so does the duplicated code.

One issue in reusing sk_setsockopt() is that the bpf prog
has already acquired the sk lock.  This patch adds an in_bpf()
helper to tell whether sk_setsockopt() is being called from a bpf prog.
A bpf prog calling bpf_setsockopt() is either running in_task()
or in_serving_softirq().  Both cases have current->bpf_ctx
initialized, so in_bpf() only needs to test !!current->bpf_ctx.

This patch also adds sockopt_{lock,release}_sock() helpers
for sk_setsockopt() to use.  These helpers test in_bpf()
before acquiring/releasing the lock.  They are EXPORT_SYMBOL'd
so the ipv6 module can use them in a later patch.

Note on the change in sock_setbindtodevice(): sockopt_lock_sock()
is now done in sock_setbindtodevice() instead of taking the lock
in sock_bindtoindex(..., lock_sk = true); for reference, that helper
is sketched after the diffstat below.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/bpf.h |  8 ++++++++
 include/net/sock.h  |  3 +++
 net/core/sock.c     | 26 +++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)
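
For reference, this is roughly how the pre-existing sock_bindtoindex()
helper looks in net/core/sock.c at this point (paraphrased from memory;
details may differ):

	int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk)
	{
		int ret;

		if (lock_sk)
			lock_sock(sk);
		ret = sock_bindtoindex_locked(sk, ifindex);
		if (lock_sk)
			release_sock(sk);

		return ret;
	}

With sockopt_lock_sock() now deciding whether to lock, sock_setbindtodevice()
calls sock_bindtoindex_locked() directly rather than asking
sock_bindtoindex() to take the lock unconditionally.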

Comments

Stanislav Fomichev Aug. 3, 2022, 10:59 p.m. UTC | #1
On 08/03, Martin KaFai Lau wrote:
> [ ... commit message and diffstat snipped ... ]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 20c26aed7896..b905b1b34fe4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1966,6 +1966,10 @@ static inline bool unprivileged_ebpf_enabled(void)
>   	return !sysctl_unprivileged_bpf_disabled;
>   }

> +static inline bool in_bpf(void)
> +{
> +	return !!current->bpf_ctx;
> +}

Good point on not needing to care about softirq!
That actually turned out even nicer :-)

QQ: do we need to add a comment here about potential false-negatives?
I see you're adding ctx to the iter, but there is still a bunch of places
that don't use it.
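
For context, the false negatives come from in_bpf() reading only
current->bpf_ctx, which is non-NULL only while a prog runner has
installed a run ctx.  Runners that do set one up follow roughly this
pattern (simplified; the exact run-ctx struct varies by prog type):

	struct bpf_trace_run_ctx run_ctx;
	struct bpf_run_ctx *old_run_ctx;

	/* current->bpf_ctx becomes non-NULL: in_bpf() is true in the prog */
	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
	ret = bpf_prog_run(prog, ctx);
	/* restore the previous ctx (usually NULL) on the way out */
	bpf_reset_run_ctx(old_run_ctx);

Prog types whose runners skip this setup would make in_bpf() return
false even while a prog is running.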
Martin KaFai Lau Aug. 3, 2022, 11:19 p.m. UTC | #2
On Wed, Aug 03, 2022 at 03:59:26PM -0700, sdf@google.com wrote:
> On 08/03, Martin KaFai Lau wrote:
> > [ ... commit message and patch snipped ... ]
> 
> Good point on not needing to care about softirq!
> That actually turned out even nicer :-)
> 
> QQ: do we need to add a comment here about potential false-negatives?
> I see you're adding ctx to the iter, but there is still a bunch of places
> that don't use it.
Makes sense.  I will add a comment on the requirement that the bpf prog
type needs to set up the bpf_run_ctx.
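
As a sketch (illustrative wording only, not the actual follow-up), the
commented helper could look like:

	/* in_bpf() is only accurate for prog types whose runners set up
	 * current->bpf_ctx (bpf_run_ctx) before invoking the prog; other
	 * prog types see in_bpf() == false even while a prog is running.
	 */
	static inline bool in_bpf(void)
	{
		return !!current->bpf_ctx;
	}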
Stanislav Fomichev Aug. 3, 2022, 11:24 p.m. UTC | #3
On Wed, Aug 3, 2022 at 4:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Aug 03, 2022 at 03:59:26PM -0700, sdf@google.com wrote:
> > On 08/03, Martin KaFai Lau wrote:
> > > [ ... commit message and patch snipped ... ]
> >
> > Good point on not needing to care about softirq!
> > That actually turned out even nicer :-)
> >
> > QQ: do we need to add a comment here about potential false-negatives?
> > I see you're adding ctx to the iter, but there is still a bunch of places
> > that don't use it.
> Makes sense.  I will add a comment on the requirement that the bpf prog
> type needs to set up the bpf_run_ctx.

Thanks! While at it, is it worth adding a short sentence to
sockopt_lock_sock on why it's safe to skip locking in the bpf case as
well?
Feels like the current state where bpf always runs with the locked
socket might change in the future.
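
One illustrative shape for such a comment (not the actual patch wording):

	void sockopt_lock_sock(struct sock *sk)
	{
		/* A bpf prog calling bpf_setsockopt() currently always
		 * runs with the sk lock already held, so taking it here
		 * again would deadlock.
		 */
		if (in_bpf())
			return;

		lock_sock(sk);
	}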
Martin KaFai Lau Aug. 3, 2022, 11:35 p.m. UTC | #4
On Wed, Aug 03, 2022 at 04:24:49PM -0700, Stanislav Fomichev wrote:
> On Wed, Aug 3, 2022 at 4:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Aug 03, 2022 at 03:59:26PM -0700, sdf@google.com wrote:
> > > On 08/03, Martin KaFai Lau wrote:
> > [ ... earlier context snipped ... ]
> 
> Thanks! While at it, is it worth adding a short sentence to
> sockopt_lock_sock on why it's safe to skip locking in the bpf case as
> well?
Yep, will do.

> Feels like the current state where bpf always runs with the locked
> socket might change in the future.
That will likely come from sleepable bpf progs.
A sleepable prog could either acquire the lock in __bpf_setsockopt()
before calling sk_setsockopt(), or flag the bpf_run_ctx to say the lock
is not held.  The former should be more straightforward.
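
A hypothetical sketch of the first option (the helper name and shape
here are made up, not from the series):

	static int __bpf_setsockopt_sleepable(struct sock *sk, int level,
					      int optname, char *optval,
					      int optlen)
	{
		int ret;

		/* a sleepable prog does not hold the sk lock */
		lock_sock(sk);
		ret = sk_setsockopt(sk, level, optname,
				    KERNEL_SOCKPTR(optval), optlen);
		release_sock(sk);

		return ret;
	}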
Andrii Nakryiko Aug. 4, 2022, 7:03 p.m. UTC | #5
On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> [ ... commit message and diffstat snipped ... ]
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 20c26aed7896..b905b1b34fe4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1966,6 +1966,10 @@ static inline bool unprivileged_ebpf_enabled(void)
>         return !sysctl_unprivileged_bpf_disabled;
>  }
>
> +static inline bool in_bpf(void)

I think this function deserves a big comment explaining that it's not
100% accurate, as not every BPF program type sets bpf_ctx. As it is
named, in_bpf() promises a lot more generality than it actually
provides.

Should it be named something more specific, maybe has_current_bpf_ctx()?

Also, separately, should we make an effort to set bpf_ctx for all
program types (instead of or in addition to the above)?

> +{
> +       return !!current->bpf_ctx;
> +}
> [ ... rest of the patch snipped ... ]
Martin KaFai Lau Aug. 4, 2022, 7:29 p.m. UTC | #6
On Thu, Aug 04, 2022 at 12:03:04PM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > [ ... commit message and patch context snipped ... ]
> >
> > +static inline bool in_bpf(void)
> 
> I think this function deserves a big comment explaining that it's not
> 100% accurate, as not every BPF program type sets bpf_ctx. As it is
> named, in_bpf() promises a lot more generality than it actually
> provides.
> 
> Should it be named something more specific, maybe has_current_bpf_ctx()?
Stan also made a similar point about adding a comment.
The rename makes sense until all bpf progs have bpf_ctx.  in_bpf() was
just the name used in the v1 discussion for the setsockopt
context.

> Also, separately, should we make an effort to set bpf_ctx for all
> program types (instead of or in addition to the above)?
I would prefer to keep this as a separate effort.  This set is
getting pretty long and the bpf_getsockopt() part is still not posted.

If you prefer that it be done first, I can do that as well.
Andrii Nakryiko Aug. 4, 2022, 8:51 p.m. UTC | #7
On Thu, Aug 4, 2022 at 12:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Aug 04, 2022 at 12:03:04PM -0700, Andrii Nakryiko wrote:
> > On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > [ ... earlier context snipped ... ]
> > I would prefer to keep this as a separate effort.  This set is
> > getting pretty long and the bpf_getsockopt() part is still not posted.

Yeah, sure, I don't think you should be blocked on that.

>
> > If you prefer that it be done first, I can do that as well.

I wanted to bring this up for discussion. I find bpf_ctx a very useful
construct; if we had it available universally, we could use it
(reliably) for this in_bpf() check. We could also have a sleepable vs
non-sleepable flag stored in such a context and thus avoid all the
special handling we have for providing different gfp flags, etc.

But it's not just up to me to decide whether we want to add it for all
program types (e.g., I wouldn't be surprised if I got pushback adding
this to XDP). Most program types I normally use already have bpf_ctx
(and bpf_cookie built on top), but I was wondering how others feel
about making this (bpf_ctx in general, bpf_cookie in particular)
universally available.

So please proceed with your changes, I just used your patch as an
anchor for this discussion :)
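
Very roughly, the sleepable-flag idea above could look like this;
entirely hypothetical, bpf_run_ctx carries no such flag today:

	struct bpf_run_ctx {
		bool sleepable;	/* hypothetical: set by the prog runner */
	};

	/* gfp flags derived from the ctx instead of per-callsite special cases */
	static inline gfp_t bpf_run_gfp_flags(void)
	{
		return (current->bpf_ctx && current->bpf_ctx->sleepable) ?
			GFP_KERNEL : GFP_ATOMIC;
	}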
Stanislav Fomichev Aug. 4, 2022, 9:43 p.m. UTC | #8
On Thu, Aug 4, 2022 at 1:51 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [ ... earlier context snipped ... ]
>
> I wanted to bring this up for discussion. I find bpf_ctx a very useful
> construct; if we had it available universally, we could use it
> (reliably) for this in_bpf() check. We could also have a sleepable vs
> non-sleepable flag stored in such a context and thus avoid all the
> special handling we have for providing different gfp flags, etc.

+1

> But it's not just up to me to decide whether we want to add it for all
> program types (e.g., I wouldn't be surprised if I got pushback adding
> this to XDP). Most program types I normally use already have bpf_ctx
> (and bpf_cookie built on top), but I was wondering how others feel
> about making this (bpf_ctx in general, bpf_cookie in particular)
> universally available.

If we can get universal bpf_ctx, do we still need bpf_prog_active?
Regarding xdp: assigning a bunch of pointers hopefully shouldn't be
that big of a deal?
Martin KaFai Lau Aug. 5, 2022, 12:29 a.m. UTC | #9
On Thu, Aug 04, 2022 at 01:51:12PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 4, 2022 at 12:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Aug 04, 2022 at 12:03:04PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > [ ... earlier context snipped ... ]
>
> I wanted to bring this up for discussion. I find bpf_ctx a very useful
> construct; if we had it available universally, we could use it
> (reliably) for this in_bpf() check. We could also have a sleepable vs
> non-sleepable flag stored in such a context and thus avoid all the
> special handling we have for providing different gfp flags, etc.
> 
> But it's not just up to me to decide whether we want to add it for all
> program types (e.g., I wouldn't be surprised if I got pushback adding
> this to XDP). Most program types I normally use already have bpf_ctx
> (and bpf_cookie built on top), but I was wondering how others feel
> about making this (bpf_ctx in general, bpf_cookie in particular)
> universally available.
It may be easier to reason about adding bpf_ctx together with a use
case, like the networking progs: cgroup-bpf (for storage and retval)
and struct_ops (it came automatically from the trampoline but finally
becomes useful in setsockopt here).

I don't think the other network prog types have it now.  As for
sleepable or not, I am not sure those other network programs will ever
be sleepable.  tc-bpf cannot call setsockopt either.

> 
> So please proceed with your changes, I just used your patch as an
> anchor for this discussion :)
+1

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 20c26aed7896..b905b1b34fe4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1966,6 +1966,10 @@  static inline bool unprivileged_ebpf_enabled(void)
 	return !sysctl_unprivileged_bpf_disabled;
 }
 
+static inline bool in_bpf(void)
+{
+	return !!current->bpf_ctx;
+}
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -2175,6 +2179,10 @@  static inline bool unprivileged_ebpf_enabled(void)
 	return false;
 }
 
+static inline bool in_bpf(void)
+{
+	return false;
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
diff --git a/include/net/sock.h b/include/net/sock.h
index a7273b289188..b2ff230860c6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1721,6 +1721,9 @@  static inline void unlock_sock_fast(struct sock *sk, bool slow)
 	}
 }
 
+void sockopt_lock_sock(struct sock *sk);
+void sockopt_release_sock(struct sock *sk);
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff --git a/net/core/sock.c b/net/core/sock.c
index 20269c37ab3b..82759540ae2c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -703,7 +703,9 @@  static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
 			goto out;
 	}
 
-	return sock_bindtoindex(sk, index, true);
+	sockopt_lock_sock(sk);
+	ret = sock_bindtoindex_locked(sk, index);
+	sockopt_release_sock(sk);
 out:
 #endif
 
@@ -1036,6 +1038,24 @@  static int sock_reserve_memory(struct sock *sk, int bytes)
 	return 0;
 }
 
+void sockopt_lock_sock(struct sock *sk)
+{
+	if (in_bpf())
+		return;
+
+	lock_sock(sk);
+}
+EXPORT_SYMBOL(sockopt_lock_sock);
+
+void sockopt_release_sock(struct sock *sk)
+{
+	if (in_bpf())
+		return;
+
+	release_sock(sk);
+}
+EXPORT_SYMBOL(sockopt_release_sock);
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -1067,7 +1087,7 @@  static int sk_setsockopt(struct sock *sk, int level, int optname,
 
 	valbool = val ? 1 : 0;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	switch (optname) {
 	case SO_DEBUG:
@@ -1496,7 +1516,7 @@  static int sk_setsockopt(struct sock *sk, int level, int optname,
 		ret = -ENOPROTOOPT;
 		break;
 	}
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }