diff mbox series

[v3,bpf-next,3/4] bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value

Message ID 833b122afaeaba4485942c563ef16a64fa997fe6.1641316155.git.zhuyifei@google.com (mailing list archive)
State New, archived
Delegated to: BPF
Headers show
Series bpf: allow cgroup progs to export custom retval to userspace | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1797 this patch: 1797
netdev/cc_maintainers warning 10 maintainers not CCed: joe@cilium.io linux-kselftest@vger.kernel.org kpsingh@kernel.org yauheni.kaliuta@redhat.com john.fastabend@gmail.com kafai@fb.com songliubraving@fb.com shuah@kernel.org yhs@fb.com netdev@vger.kernel.org
netdev/build_clang success Errors and warnings before: 212 this patch: 212
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1811 this patch: 1811
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 12 this patch: 12
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next success VM_Test

Commit Message

YiFei Zhu Jan. 4, 2022, 5:15 p.m. UTC
The helpers continue to use int for retval because all the hooks
are int-returning rather than long-returning. The return value of
bpf_set_retval is int for future-proofing, in case in the future
there may be errors trying to set the retval.

After the previous patch, if a program rejects a syscall by
returning 0, an -EPERM will be generated no matter if the retval
is already set to -err. This patch change it being forced only if
retval is not -err. This is because we want to support, for
example, invoking bpf_set_retval(-EINVAL) and return 0, and have
the syscall return value be -EINVAL not -EPERM.

This change is reflected in the sockopt_sk test which has been
updated to assert the errno is EINVAL instead of the EPERM.
The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM
is wanted. I also removed the explicit mentions of EPERM in the
comments in the prog.

For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
that, if the return value is NET_XMIT_DROP, the packet is silently
dropped. We preserve this behavior for backward compatibility
reasons, so even if an errno is set, the errno does not return to
caller. However, setting a non-err to retval cannot propagate so
this is not allowed and we return a -EFAULT in that case.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h                           | 10 +++--
 include/uapi/linux/bpf.h                      | 18 +++++++++
 kernel/bpf/cgroup.c                           | 38 ++++++++++++++++++-
 tools/include/uapi/linux/bpf.h                | 18 +++++++++
 .../selftests/bpf/prog_tests/sockopt_sk.c     |  2 +-
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 32 ++++++++--------
 6 files changed, 96 insertions(+), 22 deletions(-)

Comments

Alexei Starovoitov Jan. 19, 2022, 7:50 p.m. UTC | #1
On Tue, Jan 4, 2022 at 9:15 AM YiFei Zhu <zhuyifei@google.com> wrote:
>
> The helpers continue to use int for retval because all the hooks
> are int-returning rather than long-returning. The return value of
> bpf_set_retval is int for future-proofing, in case in the future
> there may be errors trying to set the retval.
>
> After the previous patch, if a program rejects a syscall by
> returning 0, an -EPERM will be generated no matter if the retval
> is already set to -err. This patch change it being forced only if
> retval is not -err. This is because we want to support, for
> example, invoking bpf_set_retval(-EINVAL) and return 0, and have
> the syscall return value be -EINVAL not -EPERM.
>
> This change is reflected in the sockopt_sk test which has been
> updated to assert the errno is EINVAL instead of the EPERM.
> The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM
> is wanted. I also removed the explicit mentions of EPERM in the
> comments in the prog.
>
> For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
> that, if the return value is NET_XMIT_DROP, the packet is silently
> dropped. We preserve this behavior for backward compatibility
> reasons, so even if an errno is set, the errno does not return to
> caller. However, setting a non-err to retval cannot propagate so
> this is not allowed and we return a -EFAULT in that case.
>
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> Reviewed-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf.h                           | 10 +++--
>  include/uapi/linux/bpf.h                      | 18 +++++++++
>  kernel/bpf/cgroup.c                           | 38 ++++++++++++++++++-
>  tools/include/uapi/linux/bpf.h                | 18 +++++++++
>  .../selftests/bpf/prog_tests/sockopt_sk.c     |  2 +-
>  .../testing/selftests/bpf/progs/sockopt_sk.c  | 32 ++++++++--------
>  6 files changed, 96 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 88f6891e2b53..300df48fa0e0 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1300,7 +1300,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
>         while ((prog = READ_ONCE(item->prog))) {
>                 run_ctx.prog_item = item;
>                 func_ret = run_prog(prog, ctx);
> -               if (!(func_ret & 1))
> +               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
>                         run_ctx.retval = -EPERM;
>                 *(ret_flags) |= (func_ret >> 1);
>                 item++;
> @@ -1330,7 +1330,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
>         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>         while ((prog = READ_ONCE(item->prog))) {
>                 run_ctx.prog_item = item;
> -               if (!run_prog(prog, ctx))
> +               if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
>                         run_ctx.retval = -EPERM;
>                 item++;
>         }
> @@ -1390,7 +1390,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>   *   0: NET_XMIT_SUCCESS  skb should be transmitted
>   *   1: NET_XMIT_DROP     skb should be dropped and cn
>   *   2: NET_XMIT_CN       skb should be transmitted and cn
> - *   3: -EPERM            skb should be dropped
> + *   3: -err              skb should be dropped
>   */
>  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)                \
>         ({                                              \
> @@ -1399,10 +1399,12 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>                 u32 _ret;                               \
>                 _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
>                 _cn = _flags & BPF_RET_SET_CN;          \
> +               if (_ret && !IS_ERR_VALUE((long)_ret))  \
> +                       _ret = -EFAULT;                 \
>                 if (!_ret)                              \
>                         _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);  \
>                 else                                    \
> -                       _ret = (_cn ? NET_XMIT_DROP : -EPERM);          \
> +                       _ret = (_cn ? NET_XMIT_DROP : _ret);            \

Sorry for the long delay in reviewing.
Overall it looks very good.
Few questions:

Why change this behavior for BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY ?
It's for an inet_egress hook only. In other words ip_output.
What kind of different error codes do you want to pass to
the stack from there?

> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> index 4b937e5dbaca..164aa5020bf1 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> @@ -177,7 +177,7 @@ static int getsetsockopt(void)
>         optlen = sizeof(buf.zc);
>         errno = 0;
>         err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
> -       if (errno != EPERM) {
> +       if (errno != EINVAL) {

Could you explain which part of this patch caused this change
in user visible behavior?
I understand the desire to do bpf_set_retval(-EINVAL) and return 0,
but progs/sockopt_sk.c is not doing it.
Where does EINVAL come from?
YiFei Zhu Jan. 19, 2022, 8:22 p.m. UTC | #2
On Wed, Jan 19, 2022 at 11:50 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 4, 2022 at 9:15 AM YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > The helpers continue to use int for retval because all the hooks
> > are int-returning rather than long-returning. The return value of
> > bpf_set_retval is int for future-proofing, in case in the future
> > there may be errors trying to set the retval.
> >
> > After the previous patch, if a program rejects a syscall by
> > returning 0, an -EPERM will be generated no matter if the retval
> > is already set to -err. This patch change it being forced only if
> > retval is not -err. This is because we want to support, for
> > example, invoking bpf_set_retval(-EINVAL) and return 0, and have
> > the syscall return value be -EINVAL not -EPERM.
> >
> > This change is reflected in the sockopt_sk test which has been
> > updated to assert the errno is EINVAL instead of the EPERM.
> > The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM
> > is wanted. I also removed the explicit mentions of EPERM in the
> > comments in the prog.
> >
> > For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
> > that, if the return value is NET_XMIT_DROP, the packet is silently
> > dropped. We preserve this behavior for backward compatibility
> > reasons, so even if an errno is set, the errno does not return to
> > caller. However, setting a non-err to retval cannot propagate so
> > this is not allowed and we return a -EFAULT in that case.
> >
> > Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> > Reviewed-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/bpf.h                           | 10 +++--
> >  include/uapi/linux/bpf.h                      | 18 +++++++++
> >  kernel/bpf/cgroup.c                           | 38 ++++++++++++++++++-
> >  tools/include/uapi/linux/bpf.h                | 18 +++++++++
> >  .../selftests/bpf/prog_tests/sockopt_sk.c     |  2 +-
> >  .../testing/selftests/bpf/progs/sockopt_sk.c  | 32 ++++++++--------
> >  6 files changed, 96 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 88f6891e2b53..300df48fa0e0 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1300,7 +1300,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
> >         while ((prog = READ_ONCE(item->prog))) {
> >                 run_ctx.prog_item = item;
> >                 func_ret = run_prog(prog, ctx);
> > -               if (!(func_ret & 1))
> > +               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
> >                         run_ctx.retval = -EPERM;
> >                 *(ret_flags) |= (func_ret >> 1);
> >                 item++;
> > @@ -1330,7 +1330,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
> >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >         while ((prog = READ_ONCE(item->prog))) {
> >                 run_ctx.prog_item = item;
> > -               if (!run_prog(prog, ctx))
> > +               if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
> >                         run_ctx.retval = -EPERM;
> >                 item++;
> >         }
> > @@ -1390,7 +1390,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> >   *   0: NET_XMIT_SUCCESS  skb should be transmitted
> >   *   1: NET_XMIT_DROP     skb should be dropped and cn
> >   *   2: NET_XMIT_CN       skb should be transmitted and cn
> > - *   3: -EPERM            skb should be dropped
> > + *   3: -err              skb should be dropped
> >   */
> >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)                \
> >         ({                                              \
> > @@ -1399,10 +1399,12 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> >                 u32 _ret;                               \
> >                 _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> >                 _cn = _flags & BPF_RET_SET_CN;          \
> > +               if (_ret && !IS_ERR_VALUE((long)_ret))  \
> > +                       _ret = -EFAULT;                 \
> >                 if (!_ret)                              \
> >                         _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);  \
> >                 else                                    \
> > -                       _ret = (_cn ? NET_XMIT_DROP : -EPERM);          \
> > +                       _ret = (_cn ? NET_XMIT_DROP : _ret);            \
>
> Sorry for the long delay in reviewing.
> Overall it looks very good.
> Few questions:
>
> Why change this behavior for BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY ?
> It's for an inet_egress hook only. In other words ip_output.
> What kind of different error codes do you want to pass to
> the stack from there?

I don't really have a use case in mind for a different error code for
an egress hook (my use cases are for sockopt hooks) at the moment, but
it sounds to me that it would a surprising behavior if bpf_set_retval
is provided yet it would still be -EPERM.

> > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > index 4b937e5dbaca..164aa5020bf1 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > @@ -177,7 +177,7 @@ static int getsetsockopt(void)
> >         optlen = sizeof(buf.zc);
> >         errno = 0;
> >         err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
> > -       if (errno != EPERM) {
> > +       if (errno != EINVAL) {
>
> Could you explain which part of this patch caused this change
> in user visible behavior?
> I understand the desire to do bpf_set_retval(-EINVAL) and return 0,
> but progs/sockopt_sk.c is not doing it.
> Where does EINVAL come from?

This comes from the kernel. In for an anvalid address to the
getsockopt handler in tcp_zerocopy_receive [1]. The original behavior
prior to this series is that the eBPF getsockopt hook generating an
-EPERM overrides that of the kernel's -EINVAL, but now when the eBPF
hook returns 0, it sees that an -EINVAL is already set by the kernel
and does not modify the error code.

[1] https://elixir.bootlin.com/linux/v5.16.1/source/net/ipv4/tcp.c#L2060

YiFei Zhu
Alexei Starovoitov Jan. 19, 2022, 9:28 p.m. UTC | #3
On Wed, Jan 19, 2022 at 12:22 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> On Wed, Jan 19, 2022 at 11:50 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Jan 4, 2022 at 9:15 AM YiFei Zhu <zhuyifei@google.com> wrote:
> > >
> > > The helpers continue to use int for retval because all the hooks
> > > are int-returning rather than long-returning. The return value of
> > > bpf_set_retval is int for future-proofing, in case in the future
> > > there may be errors trying to set the retval.
> > >
> > > After the previous patch, if a program rejects a syscall by
> > > returning 0, an -EPERM will be generated no matter if the retval
> > > is already set to -err. This patch change it being forced only if
> > > retval is not -err. This is because we want to support, for
> > > example, invoking bpf_set_retval(-EINVAL) and return 0, and have
> > > the syscall return value be -EINVAL not -EPERM.
> > >
> > > This change is reflected in the sockopt_sk test which has been
> > > updated to assert the errno is EINVAL instead of the EPERM.
> > > The eBPF prog has to explicitly bpf_set_retval(-EPERM) if EPERM
> > > is wanted. I also removed the explicit mentions of EPERM in the
> > > comments in the prog.
> > >
> > > For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
> > > that, if the return value is NET_XMIT_DROP, the packet is silently
> > > dropped. We preserve this behavior for backward compatibility
> > > reasons, so even if an errno is set, the errno does not return to
> > > caller. However, setting a non-err to retval cannot propagate so
> > > this is not allowed and we return a -EFAULT in that case.
> > >
> > > Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> > > Reviewed-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/linux/bpf.h                           | 10 +++--
> > >  include/uapi/linux/bpf.h                      | 18 +++++++++
> > >  kernel/bpf/cgroup.c                           | 38 ++++++++++++++++++-
> > >  tools/include/uapi/linux/bpf.h                | 18 +++++++++
> > >  .../selftests/bpf/prog_tests/sockopt_sk.c     |  2 +-
> > >  .../testing/selftests/bpf/progs/sockopt_sk.c  | 32 ++++++++--------
> > >  6 files changed, 96 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 88f6891e2b53..300df48fa0e0 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1300,7 +1300,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
> > >         while ((prog = READ_ONCE(item->prog))) {
> > >                 run_ctx.prog_item = item;
> > >                 func_ret = run_prog(prog, ctx);
> > > -               if (!(func_ret & 1))
> > > +               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
> > >                         run_ctx.retval = -EPERM;
> > >                 *(ret_flags) |= (func_ret >> 1);
> > >                 item++;
> > > @@ -1330,7 +1330,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
> > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > >         while ((prog = READ_ONCE(item->prog))) {
> > >                 run_ctx.prog_item = item;
> > > -               if (!run_prog(prog, ctx))
> > > +               if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
> > >                         run_ctx.retval = -EPERM;
> > >                 item++;
> > >         }
> > > @@ -1390,7 +1390,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> > >   *   0: NET_XMIT_SUCCESS  skb should be transmitted
> > >   *   1: NET_XMIT_DROP     skb should be dropped and cn
> > >   *   2: NET_XMIT_CN       skb should be transmitted and cn
> > > - *   3: -EPERM            skb should be dropped
> > > + *   3: -err              skb should be dropped
> > >   */
> > >  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)                \
> > >         ({                                              \
> > > @@ -1399,10 +1399,12 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
> > >                 u32 _ret;                               \
> > >                 _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
> > >                 _cn = _flags & BPF_RET_SET_CN;          \
> > > +               if (_ret && !IS_ERR_VALUE((long)_ret))  \
> > > +                       _ret = -EFAULT;                 \
> > >                 if (!_ret)                              \
> > >                         _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);  \
> > >                 else                                    \
> > > -                       _ret = (_cn ? NET_XMIT_DROP : -EPERM);          \
> > > +                       _ret = (_cn ? NET_XMIT_DROP : _ret);            \
> >
> > Sorry for the long delay in reviewing.
> > Overall it looks very good.
> > Few questions:
> >
> > Why change this behavior for BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY ?
> > It's for an inet_egress hook only. In other words ip_output.
> > What kind of different error codes do you want to pass to
> > the stack from there?
>
> I don't really have a use case in mind for a different error code for
> an egress hook (my use cases are for sockopt hooks) at the moment, but
> it sounds to me that it would a surprising behavior if bpf_set_retval
> is provided yet it would still be -EPERM.

Good point.

> > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > index 4b937e5dbaca..164aa5020bf1 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > > @@ -177,7 +177,7 @@ static int getsetsockopt(void)
> > >         optlen = sizeof(buf.zc);
> > >         errno = 0;
> > >         err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
> > > -       if (errno != EPERM) {
> > > +       if (errno != EINVAL) {
> >
> > Could you explain which part of this patch caused this change
> > in user visible behavior?
> > I understand the desire to do bpf_set_retval(-EINVAL) and return 0,
> > but progs/sockopt_sk.c is not doing it.
> > Where does EINVAL come from?
>
> This comes from the kernel. In for an anvalid address to the
> getsockopt handler in tcp_zerocopy_receive [1]. The original behavior
> prior to this series is that the eBPF getsockopt hook generating an
> -EPERM overrides that of the kernel's -EINVAL, but now when the eBPF
> hook returns 0, it sees that an -EINVAL is already set by the kernel
> and does not modify the error code.

Got it.
I've added the following to the last patch to clarify it:
-       buf.zc.address = 12345; /* rejected by BPF */
+       buf.zc.address = 12345; /* Not page aligned. Rejected by
tcp_zerocopy_receive() */

and applied the set to bpf-next. Thanks!
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 88f6891e2b53..300df48fa0e0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1300,7 +1300,7 @@  BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
 	while ((prog = READ_ONCE(item->prog))) {
 		run_ctx.prog_item = item;
 		func_ret = run_prog(prog, ctx);
-		if (!(func_ret & 1))
+		if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
 			run_ctx.retval = -EPERM;
 		*(ret_flags) |= (func_ret >> 1);
 		item++;
@@ -1330,7 +1330,7 @@  BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	while ((prog = READ_ONCE(item->prog))) {
 		run_ctx.prog_item = item;
-		if (!run_prog(prog, ctx))
+		if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
 			run_ctx.retval = -EPERM;
 		item++;
 	}
@@ -1390,7 +1390,7 @@  BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
  *   0: NET_XMIT_SUCCESS  skb should be transmitted
  *   1: NET_XMIT_DROP     skb should be dropped and cn
  *   2: NET_XMIT_CN       skb should be transmitted and cn
- *   3: -EPERM            skb should be dropped
+ *   3: -err              skb should be dropped
  */
 #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
 	({						\
@@ -1399,10 +1399,12 @@  BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 		u32 _ret;				\
 		_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \
 		_cn = _flags & BPF_RET_SET_CN;		\
+		if (_ret && !IS_ERR_VALUE((long)_ret))	\
+			_ret = -EFAULT;			\
 		if (!_ret)				\
 			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
 		else					\
-			_ret = (_cn ? NET_XMIT_DROP : -EPERM);		\
+			_ret = (_cn ? NET_XMIT_DROP : _ret);		\
 		_ret;					\
 	})
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..140702c56938 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5018,6 +5018,22 @@  union bpf_attr {
  *
  *	Return
  *		The number of arguments of the traced function.
+ *
+ * int bpf_get_retval(void)
+ *	Description
+ *		Get the syscall's return value that will be returned to userspace.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		The syscall's return value.
+ *
+ * int bpf_set_retval(int retval)
+ *	Description
+ *		Set the syscall's return value that will be returned to userspace.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5206,6 +5222,8 @@  union bpf_attr {
 	FN(get_func_arg),		\
 	FN(get_func_ret),		\
 	FN(get_func_arg_cnt),		\
+	FN(get_retval),			\
+	FN(set_retval),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b6fad0bbf5a7..279ebbed75a5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1044,7 +1044,7 @@  int cgroup_bpf_prog_query(const union bpf_attr *attr,
  *   NET_XMIT_DROP       (1)	- drop packet and notify TCP to call cwr
  *   NET_XMIT_CN         (2)	- continue with packet output and notify TCP
  *				  to call cwr
- *   -EPERM			- drop packet
+ *   -err			- drop packet
  *
  * For ingress packets, this function will return -EPERM if any
  * attached program was found and if it returned != 1 during execution.
@@ -1080,6 +1080,8 @@  int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	} else {
 		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[atype], skb,
 					    __bpf_prog_run_save_cb, 0);
+		if (ret && !IS_ERR_VALUE((long)ret))
+			ret = -EFAULT;
 	}
 	bpf_restore_data_end(skb, saved_data_end);
 	__skb_pull(skb, offset);
@@ -1205,6 +1207,36 @@  int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 	return ret;
 }
 
+BPF_CALL_0(bpf_get_retval)
+{
+	struct bpf_cg_run_ctx *ctx =
+		container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
+
+	return ctx->retval;
+}
+
+static const struct bpf_func_proto bpf_get_retval_proto = {
+	.func		= bpf_get_retval,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+
+BPF_CALL_1(bpf_set_retval, int, retval)
+{
+	struct bpf_cg_run_ctx *ctx =
+		container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
+
+	ctx->retval = retval;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_set_retval_proto = {
+	.func		= bpf_set_retval,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1217,6 +1249,10 @@  cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
+	case BPF_FUNC_get_retval:
+		return &bpf_get_retval_proto;
+	case BPF_FUNC_set_retval:
+		return &bpf_set_retval_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0383d371b9a..140702c56938 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5018,6 +5018,22 @@  union bpf_attr {
  *
  *	Return
  *		The number of arguments of the traced function.
+ *
+ * int bpf_get_retval(void)
+ *	Description
+ *		Get the syscall's return value that will be returned to userspace.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		The syscall's return value.
+ *
+ * int bpf_set_retval(int retval)
+ *	Description
+ *		Set the syscall's return value that will be returned to userspace.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5206,6 +5222,8 @@  union bpf_attr {
 	FN(get_func_arg),		\
 	FN(get_func_ret),		\
 	FN(get_func_arg_cnt),		\
+	FN(get_retval),			\
+	FN(set_retval),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
index 4b937e5dbaca..164aa5020bf1 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
@@ -177,7 +177,7 @@  static int getsetsockopt(void)
 	optlen = sizeof(buf.zc);
 	errno = 0;
 	err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
-	if (errno != EPERM) {
+	if (errno != EINVAL) {
 		log_err("Unexpected getsockopt(TCP_ZEROCOPY_RECEIVE) err=%d errno=%d",
 			err, errno);
 		goto err;
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index 79c8139b63b8..d0298dccedcd 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -73,17 +73,17 @@  int _getsockopt(struct bpf_sockopt *ctx)
 		 */
 
 		if (optval + sizeof(struct tcp_zerocopy_receive) > optval_end)
-			return 0; /* EPERM, bounds check */
+			return 0; /* bounds check */
 
 		if (((struct tcp_zerocopy_receive *)optval)->address != 0)
-			return 0; /* EPERM, unexpected data */
+			return 0; /* unexpected data */
 
 		return 1;
 	}
 
 	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
 		if (optval + 1 > optval_end)
-			return 0; /* EPERM, bounds check */
+			return 0; /* bounds check */
 
 		ctx->retval = 0; /* Reset system call return value to zero */
 
@@ -96,24 +96,24 @@  int _getsockopt(struct bpf_sockopt *ctx)
 		 * bytes of data.
 		 */
 		if (optval_end - optval != page_size)
-			return 0; /* EPERM, unexpected data size */
+			return 0; /* unexpected data size */
 
 		return 1;
 	}
 
 	if (ctx->level != SOL_CUSTOM)
-		return 0; /* EPERM, deny everything except custom level */
+		return 0; /* deny everything except custom level */
 
 	if (optval + 1 > optval_end)
-		return 0; /* EPERM, bounds check */
+		return 0; /* bounds check */
 
 	storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
 				     BPF_SK_STORAGE_GET_F_CREATE);
 	if (!storage)
-		return 0; /* EPERM, couldn't get sk storage */
+		return 0; /* couldn't get sk storage */
 
 	if (!ctx->retval)
-		return 0; /* EPERM, kernel should not have handled
+		return 0; /* kernel should not have handled
 			   * SOL_CUSTOM, something is wrong!
 			   */
 	ctx->retval = 0; /* Reset system call return value to zero */
@@ -152,7 +152,7 @@  int _setsockopt(struct bpf_sockopt *ctx)
 		/* Overwrite SO_SNDBUF value */
 
 		if (optval + sizeof(__u32) > optval_end)
-			return 0; /* EPERM, bounds check */
+			return 0; /* bounds check */
 
 		*(__u32 *)optval = 0x55AA;
 		ctx->optlen = 4;
@@ -164,7 +164,7 @@  int _setsockopt(struct bpf_sockopt *ctx)
 		/* Always use cubic */
 
 		if (optval + 5 > optval_end)
-			return 0; /* EPERM, bounds check */
+			return 0; /* bounds check */
 
 		memcpy(optval, "cubic", 5);
 		ctx->optlen = 5;
@@ -175,10 +175,10 @@  int _setsockopt(struct bpf_sockopt *ctx)
 	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
 		/* Original optlen is larger than PAGE_SIZE. */
 		if (ctx->optlen != page_size * 2)
-			return 0; /* EPERM, unexpected data size */
+			return 0; /* unexpected data size */
 
 		if (optval + 1 > optval_end)
-			return 0; /* EPERM, bounds check */
+			return 0; /* bounds check */
 
 		/* Make sure we can trim the buffer. */
 		optval[0] = 0;
@@ -189,21 +189,21 @@  int _setsockopt(struct bpf_sockopt *ctx)
 		 * bytes of data.
 		 */
 		if (optval_end - optval != page_size)
-			return 0; /* EPERM, unexpected data size */
+			return 0; /* unexpected data size */
 
 		return 1;
 	}
 
 	if (ctx->level != SOL_CUSTOM)
-		return 0; /* EPERM, deny everything except custom level */
+		return 0; /* deny everything except custom level */
 
 	if (optval + 1 > optval_end)
-		return 0; /* EPERM, bounds check */
+		return 0; /* bounds check */
 
 	storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
 				     BPF_SK_STORAGE_GET_F_CREATE);
 	if (!storage)
-		return 0; /* EPERM, couldn't get sk storage */
+		return 0; /* couldn't get sk storage */
 
 	storage->val = optval[0];
 	ctx->optlen = -1; /* BPF has consumed this option, don't call kernel