[bpf-next,2/3] bpf: Add cgroup helper bpf_export_errno to get/set exported errno value

Message ID a2e569ee61e677ee474b7538adcebb0e1462df69.1633535940.git.zhuyifei@google.com (mailing list archive)
State New, archived
Delegated to: BPF
Series bpf: allow cgroup progs to export custom errnos to userspace

Checks

Context                         Check    Description
netdev/cover_letter             success  Series has a cover letter
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for bpf-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  12 maintainers not CCed: brouer@redhat.com joe@cilium.io revest@chromium.org netdev@vger.kernel.org jackmanb@google.com kafai@fb.com yhs@fb.com andrii@kernel.org liuhangbin@gmail.com john.fastabend@gmail.com songliubraving@fb.com kpsingh@kernel.org
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 11790 this patch: 11790
netdev/kdoc                     success  Errors and warnings before: 13 this patch: 13
netdev/verify_fixes             success  No Fixes tag
netdev/checkpatch               warning  WARNING: line length of 84 exceeds 80 columns
netdev/build_allmodconfig_warn  success  Errors and warnings before: 11421 this patch: 11421
netdev/header_inline            success  No static functions without inline keyword in header files
bpf/vmtest-bpf-next-PR          fail     PR summary
bpf/vmtest-bpf-next             success  VM_Test

Commit Message

YiFei Zhu Oct. 6, 2021, 4:02 p.m. UTC
From: YiFei Zhu <zhuyifei@google.com>

When passed a positive errno, the helper sets the errno and returns 0.
When passed 0, it returns the previously set errno. When passed an
out-of-bounds number, it returns -EINVAL. This is unambiguous:
negative return values are errors in invoking the helper itself,
and positive return values are errnos being exported. Once set, an
errno cannot be unset, but it can be overridden.
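
To make the get/set convention concrete, here is a minimal userspace model
of the semantics described above. It is a sketch, not the kernel
implementation: model_run_ctx and model_export_errno() are hypothetical
names, and only the MAX_ERRNO bound (4095) is taken from the kernel.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095 /* same bound as the kernel's include/linux/err.h */

/* Hypothetical stand-in for the errno_val field of struct bpf_cg_run_ctx. */
struct model_run_ctx {
	int errno_val;
};

/* Userspace model of the proposed helper, not the kernel implementation. */
static int model_export_errno(struct model_run_ctx *ctx, int errno_val)
{
	if (errno_val < 0 || errno_val > MAX_ERRNO)
		return -EINVAL;         /* error in invoking the helper itself */
	if (errno_val == 0)
		return ctx->errno_val;  /* get: 0 means no errno was ever set */
	ctx->errno_val = errno_val;     /* set or override, never clear */
	return 0;
}

/* Walks through get, set, override and the out-of-bounds case. */
static void demo(void)
{
	struct model_run_ctx ctx = { .errno_val = 0 };

	assert(model_export_errno(&ctx, 0) == 0);        /* nothing set yet */
	assert(model_export_errno(&ctx, EACCES) == 0);   /* set */
	assert(model_export_errno(&ctx, 0) == EACCES);   /* get */
	assert(model_export_errno(&ctx, ENOMEM) == 0);   /* override */
	assert(model_export_errno(&ctx, 0) == ENOMEM);
	assert(model_export_errno(&ctx, -1) == -EINVAL); /* out of bounds */
}
```

Because a get never returns a negative value and a set never returns a
positive one, a caller can tell the three outcomes apart from the sign of
the return value alone.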

The errno value is stored inside bpf_cg_run_ctx for ease of access
across different prog types with different context struct layouts. The
helper implementation can simply perform a container_of from
current->bpf_ctx to retrieve bpf_cg_run_ctx.
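
The container_of step can be sketched in userspace as follows. The structs
are cut-down models of the kernel ones (the real bpf_run_ctx is empty and
bpf_cg_run_ctx has more fields), and current_bpf_ctx is a hypothetical
stand-in for current->bpf_ctx.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Placeholder member: ISO C does not allow an empty struct. */
struct bpf_run_ctx {
	char unused;
};

/* Cut-down model: only the embedded run_ctx and the new field are kept. */
struct bpf_cg_run_ctx {
	struct bpf_run_ctx run_ctx;
	int errno_val;
};

/* Hypothetical stand-in for current->bpf_ctx. */
static struct bpf_run_ctx *current_bpf_ctx;

/* Recover the enclosing cgroup run context from the generic pointer. */
static struct bpf_cg_run_ctx *get_cg_run_ctx(void)
{
	return container_of(current_bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
}

static void demo(void)
{
	struct bpf_cg_run_ctx cg = { .errno_val = 13 };

	current_bpf_ctx = &cg.run_ctx;
	assert(get_cg_run_ctx() == &cg);
	assert(get_cg_run_ctx()->errno_val == 13);
}
```

The helper never needs to know the layout of the program-specific context
struct, which is the point made in the paragraph above.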

For backward compatibility, if a program rejects without calling
the helper, and the errno has not been set by any prior progs, the
BPF_PROG_RUN_ARRAY_CG family macros automatically set the errno to
EPERM. If a prog sets an errno but returns 1 (allow), the outcome
is considered implementation-defined. This patch treats it the same
way as if 0 (reject) were returned.
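
The fallback rule can be modelled in userspace like this. The loop mirrors
the shape of the BPF_PROG_RUN_ARRAY_CG change in this patch, but the types
and the four sample "programs" are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_run_ctx {
	int errno_val;
};

/* A model program: returns 1 to allow, 0 to reject; may set errno_val. */
typedef int (*model_prog_fn)(struct model_run_ctx *ctx);

static int model_run_array_cg(const model_prog_fn *progs, size_t n)
{
	struct model_run_ctx run_ctx = { .errno_val = 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		/* A reject with no errno set so far falls back to EPERM. */
		if (!progs[i](&run_ctx) && !run_ctx.errno_val)
			run_ctx.errno_val = EPERM;
	}
	/* Any errno set along the way rejects, even if every prog allowed. */
	return -run_ctx.errno_val;
}

static int prog_allow(struct model_run_ctx *ctx)
{
	(void)ctx;
	return 1;
}

static int prog_legacy_reject(struct model_run_ctx *ctx)
{
	(void)ctx;
	return 0;
}

static int prog_reject_eacces(struct model_run_ctx *ctx)
{
	ctx->errno_val = EACCES;
	return 0;
}

static int prog_set_but_allow(struct model_run_ctx *ctx)
{
	ctx->errno_val = ENOMEM;
	return 1;
}

static void demo(void)
{
	const model_prog_fn legacy[] = { prog_allow, prog_legacy_reject };
	const model_prog_fn custom[] = { prog_reject_eacces, prog_legacy_reject };
	const model_prog_fn odd[] = { prog_set_but_allow };

	assert(model_run_array_cg(legacy, 2) == -EPERM);
	assert(model_run_array_cg(custom, 2) == -EACCES);
	assert(model_run_array_cg(odd, 1) == -ENOMEM);
}
```

Note that in this model a legacy reject that runs after a custom errno does
not replace it with EPERM, while a custom errno that runs after a legacy
reject does override the EPERM.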

For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
that, if the return value is NET_XMIT_DROP, the packet is silently
dropped. We preserve this behavior for backward compatibility
reasons, so even if an errno is set, the errno is not returned to
the caller.
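
As a rough illustration of the egress mapping, the final step of the macro
can be written as a plain function. model_egress_result() is a hypothetical
name; run_ret stands for the 0 or -errno result of the run-array loop and
cn for the congestion-notification flag. The NET_XMIT_* values match the
kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Userspace model of the last step of the egress macro in this patch. */
static int model_egress_result(int run_ret, int cn)
{
	if (!run_ret)
		return cn ? NET_XMIT_CN : NET_XMIT_SUCCESS;
	/* With cn set the drop stays silent and the errno is discarded. */
	return cn ? NET_XMIT_DROP : run_ret;
}

static void demo(void)
{
	assert(model_egress_result(0, 0) == NET_XMIT_SUCCESS);
	assert(model_egress_result(0, 1) == NET_XMIT_CN);
	assert(model_egress_result(-EACCES, 1) == NET_XMIT_DROP);
	assert(model_egress_result(-EACCES, 0) == -EACCES);
}
```

Only the last case lets a custom errno reach the caller; before this patch
that branch always produced -EPERM.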

getsockopt hooks are different in that bpf progs run after the
kernel processes the getsockopt syscall instead of before.
There is also a retval in the context struct, which bpf progs
can unset, and progs can force an -EPERM by returning 0.
We preserve the same semantics. Even though there is a retval,
that value can only be unset, while progs can set (and not unset)
an additional errno by using the helper, and that will override
whatever is in retval.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h            | 23 +++++++++++------------
 include/uapi/linux/bpf.h       | 14 ++++++++++++++
 kernel/bpf/cgroup.c            | 24 ++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
 4 files changed, 63 insertions(+), 12 deletions(-)

Comments

Song Liu Oct. 7, 2021, 12:41 a.m. UTC | #1
On Wed, Oct 6, 2021 at 9:04 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
>
> [...]

This is pretty complicated, but the logic looks all correct. Thus,

Acked-by: Song Liu <songliubraving@fb.com>

One question: if the program wants to retrieve the existing errno_val and
set a different one, it needs to call the helper twice, right? I guess it
is possible to do that in one call with a "swap" logic. Would this work?

Thanks,
Song

Song Liu Oct. 7, 2021, 5:59 a.m. UTC | #2
On Wed, Oct 6, 2021 at 5:41 PM Song Liu <song@kernel.org> wrote:
>
> On Wed, Oct 6, 2021 at 9:04 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> >
> > [...]
>
> This is pretty complicated, but the logic looks all correct. Thus,
>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> One question, if the program want to retrieve existing errno_val, and
> set a different one, it needs to call the helper twice, right? I guess it
> is possible to do that in one call with a "swap" logic. Would this work?

Actually, how about we split this into two helpers: bpf_set_errno() and
bpf_get_errno()? This should avoid some confusion in the long term.

Thanks,
Song

Stanislav Fomichev Oct. 7, 2021, 3:11 p.m. UTC | #3
On 10/06, Song Liu wrote:
> On Wed, Oct 6, 2021 at 5:41 PM Song Liu <song@kernel.org> wrote:
> >
> > On Wed, Oct 6, 2021 at 9:04 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> > >
> > > [...]
> >
> > This is pretty complicated, but the logic looks all correct. Thus,
> >
> > Acked-by: Song Liu <songliubraving@fb.com>
> >
> > > One question, if the program want to retrieve existing errno_val, and
> > > set a different one, it needs to call the helper twice, right? I guess it
> > > is possible to do that in one call with a "swap" logic. Would this work?

> Actually, how about we split this into two helpers:bpf_set_errno() and
> bpf_get_errno(). This should avoid some confusion in long term.

We've agreed on the single helper during bpf office hours (about 2 weeks
ago), but we can do two, I don't think it matters that much.

YiFei Zhu Oct. 7, 2021, 4:23 p.m. UTC | #4
Yeah, it felt like we only needed one helper for the parameters and
return values to be unambiguous. But if two helpers better avoid confusion
for users, we can do that.

YiFei Zhu

On Thu, Oct 7, 2021 at 8:11 AM <sdf@google.com> wrote:
>
> On 10/06, Song Liu wrote:
> > On Wed, Oct 6, 2021 at 5:41 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Wed, Oct 6, 2021 at 9:04 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> > > >
> > > > [...]
> > >
> > > This is pretty complicated, but the logic looks all correct. Thus,
> > >
> > > Acked-by: Song Liu <songliubraving@fb.com>
> > >
> > > One question, if the program want to retrieve existing errno_val, and
> > > set a different one, it needs to call the helper twice, right? I guess it
> > > is possible to do that in one call with a "swap" logic. Would this work?
>
> > Actually, how about we split this into two helpers:bpf_set_errno() and
> > bpf_get_errno(). This should avoid some confusion in long term.
>
> We've agreed on the single helper during bpf office hours (about 2 weeks
> ago), but we can do two, I don't think it matters that much.

Song Liu Oct. 7, 2021, 4:34 p.m. UTC | #5
On Thu, Oct 7, 2021 at 9:23 AM YiFei Zhu <zhuyifei@google.com> wrote:
>
> Yeah it felt like we only needed one helper for the parameters and
> return values to be unambiguous. But if two better avoid confusion for
> users, we can do that.
>
> YiFei Zhu
>
[...]
> > > >
> > > > One question, if the program want to retrieve existing errno_val, and
> > > > set a different one, it needs to call the helper twice, right? I guess it
> > > > is possible to do that in one call with a "swap" logic. Would this work?
> >
> > > Actually, how about we split this into two helpers:bpf_set_errno() and
> > > bpf_get_errno(). This should avoid some confusion in long term.
> >
> > We've agreed on the single helper during bpf office hours (about 2 weeks
> > ago), but we can do two, I don't think it matters that much.

I see. If we agreed on this syntax, I won't object.

Thanks,
Song

YiFei Zhu Oct. 8, 2021, 8:49 p.m. UTC | #6
On Thu, Oct 7, 2021 at 9:34 AM Song Liu <song@kernel.org> wrote:
>
> On Thu, Oct 7, 2021 at 9:23 AM YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > Yeah it felt like we only needed one helper for the parameters and
> > return values to be unambiguous. But if two better avoid confusion for
> > users, we can do that.
> >
> > YiFei Zhu
> >
> [...]
> > > > >
> > > > > One question, if the program want to retrieve existing errno_val, and
> > > > > set a different one, it needs to call the helper twice, right? I guess it
> > > > > is possible to do that in one call with a "swap" logic. Would this work?
> > >
> > > > Actually, how about we split this into two helpers:bpf_set_errno() and
> > > > bpf_get_errno(). This should avoid some confusion in long term.
> > >
> > > We've agreed on the single helper during bpf office hours (about 2 weeks
> > > ago), but we can do two, I don't think it matters that much.
>
> I see. If we agreed on this syntax, I won't object.
>
> Thanks,
> Song

Shall I do the swap then? I don't think it has been discussed, and I
don't see any downsides from doing so, but I don't really see a
scenario in which someone would want to get and set at the same time
either.

YiFei Zhu

Stanislav Fomichev Oct. 8, 2021, 9 p.m. UTC | #7
On Fri, Oct 8, 2021 at 1:49 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> On Thu, Oct 7, 2021 at 9:34 AM Song Liu <song@kernel.org> wrote:
> >
> > On Thu, Oct 7, 2021 at 9:23 AM YiFei Zhu <zhuyifei@google.com> wrote:
> > >
> > > Yeah it felt like we only needed one helper for the parameters and
> > > return values to be unambiguous. But if two better avoid confusion for
> > > users, we can do that.
> > >
> > > YiFei Zhu
> > >
> > [...]
> > > > > >
> > > > > > One question, if the program want to retrieve existing errno_val, and
> > > > > > set a different one, it needs to call the helper twice, right? I guess it
> > > > > > is possible to do that in one call with a "swap" logic. Would this work?
> > > >
> > > > > Actually, how about we split this into two helpers:bpf_set_errno() and
> > > > > bpf_get_errno(). This should avoid some confusion in long term.
> > > >
> > > > We've agreed on the single helper during bpf office hours (about 2 weeks
> > > > ago), but we can do two, I don't think it matters that much.
> >
> > I see. If we agreed on this syntax, I won't object.
> >
> > Thanks,
> > Song
>
> Shall I do the swap then? I don't think it has been discussed, and I
> don't see any downsides from doing so, but I don't really see a
> scenario in which someone would want to get and set at the same time
> either.

What kind of swap do you have in mind? IMO it's such a corner case
operation that doing 2 calls is fine. I'm assuming the majority of
use-cases are: (1) export a custom errno regardless of what was
previously done in the chain (2) see if there was already an errno set
in the chain and bail out early. I don't see any real need for some
efficient swapping and rewriting, but I might be missing something.
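
The two use cases above, plus the two-call "swap", might look roughly like
this against a userspace model of the single helper. All names here
(model_run_ctx, model_export_errno(), the prog_* functions, swap_errno())
are hypothetical; real programs would call the BPF helper instead.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

struct model_run_ctx {
	int errno_val;
};

/* Userspace model of the single get/set helper discussed in this thread. */
static int model_export_errno(struct model_run_ctx *ctx, int errno_val)
{
	if (errno_val < 0 || errno_val > MAX_ERRNO)
		return -EINVAL;
	if (errno_val == 0)
		return ctx->errno_val;
	ctx->errno_val = errno_val;
	return 0;
}

/* Use case (1): export a custom errno regardless of what ran earlier. */
static int prog_always_export(struct model_run_ctx *ctx)
{
	model_export_errno(ctx, EACCES);
	return 0; /* reject */
}

/* Use case (2): bail out early if an earlier prog already set an errno. */
static int prog_respect_existing(struct model_run_ctx *ctx)
{
	if (model_export_errno(ctx, 0) > 0)
		return 0; /* keep the earlier errno */
	model_export_errno(ctx, EBUSY);
	return 0;
}

/* The "swap" done with two calls: read the old value, then override it. */
static int swap_errno(struct model_run_ctx *ctx, int new_errno)
{
	int old = model_export_errno(ctx, 0);
	int err = model_export_errno(ctx, new_errno);

	return err < 0 ? err : old;
}

static void demo(void)
{
	struct model_run_ctx ctx = { .errno_val = 0 };

	prog_respect_existing(&ctx);
	assert(ctx.errno_val == EBUSY);   /* nothing was set, so EBUSY wins */
	prog_always_export(&ctx);
	assert(ctx.errno_val == EACCES);  /* unconditional override */
	prog_respect_existing(&ctx);
	assert(ctx.errno_val == EACCES);  /* earlier errno is kept */
	assert(swap_errno(&ctx, ENOMEM) == EACCES);
	assert(ctx.errno_val == ENOMEM);
}
```
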

Andrii Nakryiko Oct. 20, 2021, 11:28 p.m. UTC | #8
On Wed, Oct 6, 2021 at 9:04 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
>
> [...]
> ---
>  include/linux/bpf.h            | 23 +++++++++++------------
>  include/uapi/linux/bpf.h       | 14 ++++++++++++++
>  kernel/bpf/cgroup.c            | 24 ++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>  4 files changed, 63 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 938885562d68..5e3f3d2f5871 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1155,6 +1155,7 @@ struct bpf_run_ctx {};
>  struct bpf_cg_run_ctx {
>         struct bpf_run_ctx run_ctx;
>         const struct bpf_prog_array_item *prog_item;
> +       int errno_val;
>  };
>
>  struct bpf_trace_run_ctx {
> @@ -1196,8 +1197,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
>         const struct bpf_prog *prog;
>         const struct bpf_prog_array *array;
>         struct bpf_run_ctx *old_run_ctx;
> -       struct bpf_cg_run_ctx run_ctx;
> -       int ret = 0;
> +       struct bpf_cg_run_ctx run_ctx = {};

you are zero-initializing this struct unnecessarily here. It's a
microoptimization, but it would be a bit cheaper to just
run_ctx.errno_val = 0; before the loop.

>         u32 func_ret;
>
>         migrate_disable();
> @@ -1208,15 +1208,15 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
>         while ((prog = READ_ONCE(item->prog))) {
>                 run_ctx.prog_item = item;
>                 func_ret = run_prog(prog, ctx);
> -               if (!(func_ret & 1))
> -                       ret = -EPERM;
> +               if (!(func_ret & 1) && !run_ctx.errno_val)
> +                       run_ctx.errno_val = EPERM;
>                 *(ret_flags) |= (func_ret >> 1);
>                 item++;
>         }
>         bpf_reset_run_ctx(old_run_ctx);
>         rcu_read_unlock();
>         migrate_enable();
> -       return ret;
> +       return -run_ctx.errno_val;
>  }
>
>  static __always_inline int
> @@ -1227,8 +1227,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
>         const struct bpf_prog *prog;
>         const struct bpf_prog_array *array;
>         struct bpf_run_ctx *old_run_ctx;
> -       struct bpf_cg_run_ctx run_ctx;
> -       int ret = 0;
> +       struct bpf_cg_run_ctx run_ctx = {};
>
>         migrate_disable();
>         rcu_read_lock();
> @@ -1237,14 +1236,14 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
>         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>         while ((prog = READ_ONCE(item->prog))) {
>                 run_ctx.prog_item = item;
> -               if (!run_prog(prog, ctx))
> -                       ret = -EPERM;
> +               if (!run_prog(prog, ctx) && !run_ctx.errno_val)
> +                       run_ctx.errno_val = EPERM;
>                 item++;
>         }
>         bpf_reset_run_ctx(old_run_ctx);
>         rcu_read_unlock();
>         migrate_enable();
> -       return ret;
> +       return -run_ctx.errno_val;
>  }
>
>  static __always_inline u32
> @@ -1297,7 +1296,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>   *   0: NET_XMIT_SUCCESS  skb should be transmitted
>   *   1: NET_XMIT_DROP     skb should be dropped and cn
>   *   2: NET_XMIT_CN       skb should be transmitted and cn
> - *   3: -EPERM            skb should be dropped
> + *   3: -errno            skb should be dropped
>   */
>  #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)                \
>         ({                                              \
> @@ -1309,7 +1308,7 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
>                 if (!_ret)                              \
>                         _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);  \
>                 else                                    \
> -                       _ret = (_cn ? NET_XMIT_DROP : -EPERM);          \
> +                       _ret = (_cn ? NET_XMIT_DROP : _ret);            \
>                 _ret;                                   \
>         })
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6fc59d61937a..d8126f8c0541 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4909,6 +4909,19 @@ union bpf_attr {
>   *     Return
>   *             The number of bytes written to the buffer, or a negative error
>   *             in case of failure.
> + *
> + * int bpf_export_errno(int errno_val)

it's subjective, but "bpf_export_errno" name is quite confusing. What
are we "exporting" and where?

I actually like Song's proposal for two helpers,
bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
honestly don't remember the requirement to have one combined helper
from the BPF office hour discussion, but if there was a good reason
for that, please remind us.

> + *     Description
> + *             If *errno_val* is positive, set the syscall's return error code;

This inversion of error code is also confusing. If we are to return
-EXXX, bpf_set_err(EXXX) is quite confusing.

> + *             if *errno_val* is zero, retrieve the previously set code.

Also, are there use cases where zero is the valid "error" (or lack of
it, rather)? I.e., wouldn't there be cases where you want to clear a
previous error? We might have discussed this, sorry if I forgot.

But either way, if bpf_set_err() accepted <= 0 and used that as error
value as-is (> 0 should be rejected, probably) that would make for
straightforward logic. Then for getting the current error we can have
a well-paired bpf_get_err()?


BTW, "errno" is very strongly associated with user-space errno. Do we
want to have this naming association? (This is the reason I used "err"
terminology above.)
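
For comparison with the single helper, the two-helper shape suggested above
could be modelled as follows. This is only a sketch of the proposal:
model_set_err() and model_get_err() are hypothetical names, and the lower
bound of -MAX_ERRNO is an assumption, since the thread only says that
values > 0 should probably be rejected.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

struct model_run_ctx {
	int err; /* stored as-is: 0 or a negative errno */
};

/* Setter: accepts 0 or a negative errno and stores it unchanged. */
static int model_set_err(struct model_run_ctx *ctx, int err)
{
	if (err > 0 || err < -MAX_ERRNO) /* lower bound is an assumption */
		return -EINVAL;
	ctx->err = err;                  /* 0 clears a previous error */
	return 0;
}

/* Getter: returns whatever was stored, with no sign inversion. */
static int model_get_err(const struct model_run_ctx *ctx)
{
	return ctx->err;
}

static void demo(void)
{
	struct model_run_ctx ctx = { .err = 0 };

	assert(model_set_err(&ctx, -EACCES) == 0);
	assert(model_get_err(&ctx) == -EACCES);
	assert(model_set_err(&ctx, EACCES) == -EINVAL); /* positive rejected */
	assert(model_get_err(&ctx) == -EACCES);
	assert(model_set_err(&ctx, 0) == 0);            /* clear */
	assert(model_get_err(&ctx) == 0);
}
```

With separate calls the getter's return value no longer has to share a
range with the setter's status code, which is what allows zero and negative
values to be stored directly.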

> + *
> + *             This helper is currently supported by cgroup programs only.
> + *     Return
> + *             Zero if set is successful, or the previously set error code on
> + *             retrieval. Previously set code may be zero if it was never set.
> + *             On error, a negative value.
> + *
> + *             **-EINVAL** if *errno_val* not between zero and MAX_ERRNO inclusive.

[...]

YiFei Zhu Oct. 26, 2021, 12:06 a.m. UTC | #9
On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> it's subjective, but "bpf_export_errno" name is quite confusing. What
> are we "exporting" and where?
>
> I actually like Song's proposal for two helpers,
> bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> honestly don't remember the requirement to have one combined helper
> from the BPF office hour discussion, but if there was a good reason
> for that, please remind us.
>
> > + *     Description
> > + *             If *errno_val* is positive, set the syscall's return error code;
>
> This inversion of error code is also confusing. If we are to return
> -EXXX, bpf_set_err(EXXX) is quite confusing.
>
> > + *             if *errno_val* is zero, retrieve the previously set code.
>
> Also, are there use cases where zero is the valid "error" (or lack of
> it, rather). I.e., wouldn't there be cases where you want to clear a
> previous error? We might have discussed this, sorry if I forgot.

Hmm, originally I assumed that filters may set policies, and that
letting a later filter clear an error would violate those policies.
However, one could argue that debugging would be a use case for an
error-clearing filter.

Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
errors. I'm having trouble thinking of the best way to have it
interact with the getsockopt "retval" in its context:
* Let's say the kernel initially sets an error code in the retval. I
think it would be a surprising behavior if only "retval" but not
bpf_get_err() shows the error. Therefore we'd need to initialize "err"
with the "retval" if retval is an error.
* If we initialize "err" with the "retval", then for a prog to clear
the error they'd need to clear it twice, once with bpf_set_err(0)
and once with ctx->retval = 0. This will immediately break backward
compatibility. Therefore, we'd need to mirror the setting of
ctx->retval = 0 to bpf_set_err(0).
* In that case, what to do if a user uses ctx->retval as a way to pass
data between filters? I mean, whether ctx->retval is set to 0 or the
original is only checked after all filters are run. It could be any
value while the filters are running.
* A second issue, if we have first a legacy filter that returns 0 to
set EPERM, and then there's another filter that does a ctx->retval =
0. The original behavior would be that the syscall fails with EPERM,
but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
would be cleared.
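
The second issue in the list above can be illustrated with a deliberately
simplified userspace model. Both functions are hypothetical: run_original()
tracks a legacy reject separately from retval, as described for the
existing behavior, while run_mirrored() models the design where
ctx->retval = 0 is mirrored to bpf_set_err(0). Neither is kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* What a model filter does: reject the legacy way, or write retval = 0. */
enum action {
	REJECT_LEGACY,
	CLEAR_RETVAL,
};

/* Existing behavior: a reject is remembered independently of retval. */
static int run_original(const enum action *acts, size_t n, int kernel_retval)
{
	int retval = kernel_retval;
	int rejected = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (acts[i] == REJECT_LEGACY)
			rejected = 1;
		else
			retval = 0;
	}
	return rejected ? -EPERM : retval;
}

/* Hypothetical mirrored design: retval = 0 also clears the stored error. */
static int run_mirrored(const enum action *acts, size_t n, int kernel_retval)
{
	int err = kernel_retval; /* "err" initialized from retval */
	size_t i;

	for (i = 0; i < n; i++) {
		if (acts[i] == REJECT_LEGACY) {
			if (!err)
				err = -EPERM;
		} else {
			err = 0; /* mirrored clear wipes the earlier EPERM */
		}
	}
	return err;
}

static void demo(void)
{
	const enum action chain[] = { REJECT_LEGACY, CLEAR_RETVAL };

	assert(run_original(chain, 2, 0) == -EPERM); /* syscall still fails */
	assert(run_mirrored(chain, 2, 0) == 0);      /* EPERM was cleared */
}
```

The two models disagree only when the clearing filter runs after the legacy
reject; with the order reversed both return -EPERM.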

One of the reasons I liked "export" is that it's slightly clearer that
this value is strictly from the BPF's side and has nothing to do with
what the kernel sets (as in the getsockopt case). But yeah I agree
it's not an ideal name.

> But either way, if bpf_set_err() accepted <= 0 and used that as error
> value as-is (> 0 should be rejected, probably) that would make for
> straightforward logic. Then for getting the current error we can have
> a well-paired bpf_get_err()?
>
>
> BTW, "errno" is very strongly associated with user-space errno, do we
> want to have this naming association (this is the reason I used "err"
> terminology above).

Ack.

YiFei Zhu

Stanislav Fomichev Oct. 26, 2021, 3:44 p.m. UTC | #10
On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > are we "exporting" and where?
> >
> > I actually like Song's proposal for two helpers,
> > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > honestly don't remember the requirement to have one combined helper
> > from the BPF office hour discussion, but if there was a good reason
> > for that, please remind us.
> >
> > > + *     Description
> > > + *             If *errno_val* is positive, set the syscall's return error code;
> >
> > This inversion of error code is also confusing. If we are to return
> > -EXXX, bpf_set_err(EXXX) is quite confusing.
> >
> > > + *             if *errno_val* is zero, retrieve the previously set code.
> >
> > Also, are there use cases where zero is the valid "error" (or lack of
> > it, rather). I.e., wouldn't there be cases where you want to clear a
> > previous error? We might have discussed this, sorry if I forgot.
>
> Hmm, originally I thought it's best to assume the underlying
> assumption is that filters may set policies and it would violate it if
> policies become ignored; however one could argue that debugging would
> be a use case for an error-clearing filter.
>
> Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> errors. I'm having trouble thinking of the best way to have it
> interact with the getsockopt "retval" in its context:
> * Let's say the kernel initially sets an error code in the retval. I
> think it would be a surprising behavior if only "retval" but not
> bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> with the "retval" if retval is an error.
> * If we initialize "err" with the "retval", then for a prog to clear
> the error they'd need to clear it twice, once with bpf_set_err(0) with
> and another with ctx->retval = 0. This will immediately break backward
> compatibility. Therefore, we'd need to mirror the setting of
> ctx->retval = 0 to bpf_set_err(0)
> * In that case, what to do if a user uses ctx->retval as a way to pass
> data between filters? I mean, whether ctx->retval is set to 0 or the
> original is only checked after all filters are run. It could be any
> value while the filters are running.
> * A second issue, if we have first a legacy filter that returns 0 to
> set EPERM, and then there's another filter that does a ctx->retval =
> 0. The original behavior would be that the syscall fails with EPERM,
> but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> would be cleared.
>
> One of the reasons I liked "export" is that it's slightly clearer that
> this value is strictly from the BPF's side and has nothing to do with
> what the kernel sets (as in the getsockopt case). But yeah I agree
> it's not an ideal name.

For getsockopt, maybe the best way to go is to point ctx->retval to
run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
legacy way of doing things. For new programs, bpf_set_err would work
universally, regardless of attach type. Any cons here?

> > But either way, if bpf_set_err() accepted <= 0 and used that as error
> > value as-is (> 0 should be rejected, probably) that would make for
> > straightforward logic. Then for getting the current error we can have
> > a well-paired bpf_get_err()?
> >
> >
> > BTW, "errno" is very strongly associated with user-space errno, do we
> > want to have this naming association (this is the reason I used "err"
> > terminology above).
>
> Ack.
>
> YiFei Zhu
YiFei Zhu Oct. 26, 2021, 8:50 p.m. UTC | #11
On Tue, Oct 26, 2021 at 8:44 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > > are we "exporting" and where?
> > >
> > > I actually like Song's proposal for two helpers,
> > > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > > honestly don't remember the requirement to have one combined helper
> > > from the BPF office hour discussion, but if there was a good reason
> > > for that, please remind us.
> > >
> > > > + *     Description
> > > > + *             If *errno_val* is positive, set the syscall's return error code;
> > >
> > > This inversion of error code is also confusing. If we are to return
> > > -EXXX, bpf_set_err(EXXX) is quite confusing.
> > >
> > > > + *             if *errno_val* is zero, retrieve the previously set code.
> > >
> > > Also, are there use cases where zero is the valid "error" (or lack of
> > > it, rather). I.e., wouldn't there be cases where you want to clear a
> > > previous error? We might have discussed this, sorry if I forgot.
> >
> > Hmm, originally I thought it's best to assume the underlying
> > assumption is that filters may set policies and it would violate it if
> > policies become ignored; however one could argue that debugging would
> > be a use case for an error-clearing filter.
> >
> > Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> > errors. I'm having trouble thinking of the best way to have it
> > interact with the getsockopt "retval" in its context:
> > * Let's say the kernel initially sets an error code in the retval. I
> > think it would be a surprising behavior if only "retval" but not
> > bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> > with the "retval" if retval is an error.
> > * If we initialize "err" with the "retval", then for a prog to clear
> > the error they'd need to clear it twice, once with bpf_set_err(0) with
> > and another with ctx->retval = 0. This will immediately break backward
> > compatibility. Therefore, we'd need to mirror the setting of
> > ctx->retval = 0 to bpf_set_err(0)
> > * In that case, what to do if a user uses ctx->retval as a way to pass
> > data between filters? I mean, whether ctx->retval is set to 0 or the
> > original is only checked after all filters are run. It could be any
> > value while the filters are running.
> > * A second issue, if we have first a legacy filter that returns 0 to
> > set EPERM, and then there's another filter that does a ctx->retval =
> > 0. The original behavior would be that the syscall fails with EPERM,
> > but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> > would be cleared.
> >
> > One of the reasons I liked "export" is that it's slightly clearer that
> > this value is strictly from the BPF's side and has nothing to do with
> > what the kernel sets (as in the getsockopt case). But yeah I agree
> > it's not an ideal name.
>
> For getsockopt, maybe the best way to go is to point ctx->retval to
> run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
> ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
> legacy way of doing things. For new programs, bpf_set_err would work
> universally, regardless of attach type. Any cons here?

Is it a concern that AFAICT getsockopt retval may be a positive number
whereas the err here must be non-negative?

Also, the fourth point still stands: if any getsockopt prog returns 0,
the original behavior is to return -EPERM, whereas under the new
behavior, clearing retval would also clear the -EPERM.

YiFei Zhu

> > > But either way, if bpf_set_err() accepted <= 0 and used that as error
> > > value as-is (> 0 should be rejected, probably) that would make for
> > > straightforward logic. Then for getting the current error we can have
> > > a well-paired bpf_get_err()?
> > >
> > >
> > > BTW, "errno" is very strongly associated with user-space errno, do we
> > > want to have this naming association (this is the reason I used "err"
> > > terminology above).
> >
> > Ack.
> >
> > YiFei Zhu
Stanislav Fomichev Oct. 26, 2021, 9:26 p.m. UTC | #12
On Tue, Oct 26, 2021 at 1:50 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> On Tue, Oct 26, 2021 at 8:44 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@google.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > > > are we "exporting" and where?
> > > >
> > > > I actually like Song's proposal for two helpers,
> > > > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > > > honestly don't remember the requirement to have one combined helper
> > > > from the BPF office hour discussion, but if there was a good reason
> > > > for that, please remind us.
> > > >
> > > > > + *     Description
> > > > > + *             If *errno_val* is positive, set the syscall's return error code;
> > > >
> > > > This inversion of error code is also confusing. If we are to return
> > > > -EXXX, bpf_set_err(EXXX) is quite confusing.
> > > >
> > > > > + *             if *errno_val* is zero, retrieve the previously set code.
> > > >
> > > > Also, are there use cases where zero is the valid "error" (or lack of
> > > > it, rather). I.e., wouldn't there be cases where you want to clear a
> > > > previous error? We might have discussed this, sorry if I forgot.
> > >
> > > Hmm, originally I thought it's best to assume the underlying
> > > assumption is that filters may set policies and it would violate it if
> > > policies become ignored; however one could argue that debugging would
> > > be a use case for an error-clearing filter.
> > >
> > > Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> > > errors. I'm having trouble thinking of the best way to have it
> > > interact with the getsockopt "retval" in its context:
> > > * Let's say the kernel initially sets an error code in the retval. I
> > > think it would be a surprising behavior if only "retval" but not
> > > bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> > > with the "retval" if retval is an error.
> > > * If we initialize "err" with the "retval", then for a prog to clear
> > > the error they'd need to clear it twice, once with bpf_set_err(0) with
> > > and another with ctx->retval = 0. This will immediately break backward
> > > compatibility. Therefore, we'd need to mirror the setting of
> > > ctx->retval = 0 to bpf_set_err(0)
> > > * In that case, what to do if a user uses ctx->retval as a way to pass
> > > data between filters? I mean, whether ctx->retval is set to 0 or the
> > > original is only checked after all filters are run. It could be any
> > > value while the filters are running.
> > > * A second issue, if we have first a legacy filter that returns 0 to
> > > set EPERM, and then there's another filter that does a ctx->retval =
> > > 0. The original behavior would be that the syscall fails with EPERM,
> > > but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> > > would be cleared.
> > >
> > > One of the reasons I liked "export" is that it's slightly clearer that
> > > this value is strictly from the BPF's side and has nothing to do with
> > > what the kernel sets (as in the getsockopt case). But yeah I agree
> > > it's not an ideal name.
> >
> > For getsockopt, maybe the best way to go is to point ctx->retval to
> > run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
> > ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
> > legacy way of doing things. For new programs, bpf_set_err would work
> > universally, regardless of attach type. Any cons here?
>
> Is it a concern that AFAICT getsockopt retval may be a positive number
> whereas the err here must be non-negative?

getsockopt retval is either -errno or 0. It's not really enforced at
load/attach time, but there is a runtime check which returns -EFAULT
if the prog sets it to something else.
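
To make the rule above concrete, here is a minimal userspace model of
that runtime check. It is only a sketch: the function name and
signature are hypothetical and this is not the kernel code, just the
"keep the kernel's value or reset it to 0" rule as described in this
thread.

```c
#include <errno.h>

/*
 * Userspace model of the getsockopt retval check (hypothetical helper,
 * not a kernel function).
 *
 * kernel_retval: value the in-kernel getsockopt produced (0 or -errno)
 * prog_retval:   value found in ctx->retval after the BPF progs ran
 */
static int check_getsockopt_retval(int kernel_retval, int prog_retval)
{
	/* A prog may leave retval untouched or reset it to 0. */
	if (prog_retval != 0 && prog_retval != kernel_retval)
		return -EFAULT;	/* anything else is treated as bogus */

	return prog_retval;
}
```

So check_getsockopt_retval(-ENOENT, 0) yields 0, while a prog that
writes any other new value ends up with -EFAULT.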

> Also the fourth point still stands. If any getsockopt returns 0,
> original behavior is return -EPERM whereas new behavior, clearing
> retval will clear -EPERM.

True, but do you think these cases exist out there? I guess somebody
can do it inadvertently, but the example you've mentioned doesn't
really make sense, right?
This is why we are adding a way to propagate the status, so the
programs in the chain can understand whether they should do anything
at all (previous prog returned EPERM). Returning EPERM from the child
and then doing ctx->retval=0 in the parent should already not work as
expected.
YiFei Zhu Nov. 1, 2021, 10:23 a.m. UTC | #13
On Tue, Oct 26, 2021 at 2:26 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Oct 26, 2021 at 1:50 PM YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > On Tue, Oct 26, 2021 at 8:44 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On Mon, Oct 25, 2021 at 5:06 PM YiFei Zhu <zhuyifei@google.com> wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 4:28 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > it's subjective, but "bpf_export_errno" name is quite confusing. What
> > > > > are we "exporting" and where?
> > > > >
> > > > > I actually like Song's proposal for two helpers,
> > > > > bpf_set_err()/bpf_get_err(). It makes the semantics less confusing. I
> > > > > honestly don't remember the requirement to have one combined helper
> > > > > from the BPF office hour discussion, but if there was a good reason
> > > > > for that, please remind us.
> > > > >
> > > > > > + *     Description
> > > > > > + *             If *errno_val* is positive, set the syscall's return error code;
> > > > >
> > > > > This inversion of error code is also confusing. If we are to return
> > > > > -EXXX, bpf_set_err(EXXX) is quite confusing.
> > > > >
> > > > > > + *             if *errno_val* is zero, retrieve the previously set code.
> > > > >
> > > > > Also, are there use cases where zero is the valid "error" (or lack of
> > > > > it, rather). I.e., wouldn't there be cases where you want to clear a
> > > > > previous error? We might have discussed this, sorry if I forgot.
> > > >
> > > > Hmm, originally I thought it's best to assume the underlying
> > > > assumption is that filters may set policies and it would violate it if
> > > > policies become ignored; however one could argue that debugging would
> > > > be a use case for an error-clearing filter.
> > > >
> > > > Let's say we do bpf_set_err()/bpf_get_err(), with the ability to clear
> > > > errors. I'm having trouble thinking of the best way to have it
> > > > interact with the getsockopt "retval" in its context:
> > > > * Let's say the kernel initially sets an error code in the retval. I
> > > > think it would be a surprising behavior if only "retval" but not
> > > > bpf_get_err() shows the error. Therefore we'd need to initialize "err"
> > > > with the "retval" if retval is an error.
> > > > * If we initialize "err" with the "retval", then for a prog to clear
> > > > the error they'd need to clear it twice, once with bpf_set_err(0) with
> > > > and another with ctx->retval = 0. This will immediately break backward
> > > > compatibility. Therefore, we'd need to mirror the setting of
> > > > ctx->retval = 0 to bpf_set_err(0)
> > > > * In that case, what to do if a user uses ctx->retval as a way to pass
> > > > data between filters? I mean, whether ctx->retval is set to 0 or the
> > > > original is only checked after all filters are run. It could be any
> > > > value while the filters are running.
> > > > * A second issue, if we have first a legacy filter that returns 0 to
> > > > set EPERM, and then there's another filter that does a ctx->retval =
> > > > 0. The original behavior would be that the syscall fails with EPERM,
> > > > but if we mirror ctx->retval = 0 to bpf_set_err(0), then that EPERM
> > > > would be cleared.
> > > >
> > > > One of the reasons I liked "export" is that it's slightly clearer that
> > > > this value is strictly from the BPF's side and has nothing to do with
> > > > what the kernel sets (as in the getsockopt case). But yeah I agree
> > > > it's not an ideal name.
> > >
> > > For getsockopt, maybe the best way to go is to point ctx->retval to
> > > run_ctx.errno_val? (i.e., bpf_set_err would be equivalent to doing
> > > ctx->retval = x;). We can leave ctx->retval as a backwards-compatible
> > > legacy way of doing things. For new programs, bpf_set_err would work
> > > universally, regardless of attach type. Any cons here?
> >
> > Is it a concern that AFAICT getsockopt retval may be a positive number
> > whereas the err here must be non-negative?
>
> getsockopt retval is either -errno or 0. It's not really enforced at
> load/attach time, but there is a runtime check which returns -EFAULT
> if the prog sets it to something else.
>
> > Also the fourth point still stands. If any getsockopt returns 0,
> > original behavior is return -EPERM whereas new behavior, clearing
> > retval will clear -EPERM.
>
> True, but do you think these cases exist out there? I guess somebody
> can do it inadvertently, but the example you've mentioned doesn't
> really make sense, right?
> This is why we are adding a way to propagate the status, so the
> programs in the chain can understand whether they should do anything
> at all (previous prog returned EPERM). Returning EPERM from the child
> and then doing ctx->retval=0 in the parent should already not work as
> expected.

How about this? Have a bpf_{get,set}_retval that mirrors (in both
directions) the ctx->retval without any processing. Considering that
in-kernel implementations of getsockopt sometimes return positive
values (usually optlen), we could allow eBPF-implemented getsockopt to
do so too, by relaxing the current 'only change to zero or keep the
same' restriction and allowing the filter to set arbitrary return
values to user space. For a filter that runs before the in-kernel
implementation, such as setsockopt or cgroup_skb, we verify, after
running all the hooks, that the value is 0 or a negative -errno;
otherwise we return -EFAULT.
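
A rough sketch of that post-run verification, as a userspace model.
The function name is hypothetical, and MAX_ERRNO is defined locally to
the same bound (4095) the kernel uses; none of this is existing kernel
code.

```c
#include <errno.h>

#define MAX_ERRNO 4095	/* same bound the kernel uses for errno values */

/*
 * Hypothetical post-run check for hooks that run before the in-kernel
 * implementation (e.g. setsockopt): once all progs have run, the
 * shared retval must be 0 or a valid -errno.
 */
static int validate_pre_kernel_retval(int retval)
{
	if (retval == 0)
		return 0;		/* no error: continue into the kernel */
	if (retval < 0 && retval >= -MAX_ERRNO)
		return retval;		/* valid -errno: pass it through */
	return -EFAULT;			/* positive or out of range: bogus */
}
```

getsockopt would skip this check under the proposal, since there a
positive value such as optlen is a legitimate result.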

For legacy -EPERM programs that do it by returning 0, a filter that
does bpf_set_retval(0) or ctx->retval = 0 will clear the -EPERM; this
differs from the current behavior of getsockopt programs. I'm not
really sure of any use cases where users would rely on the current
behavior -- one would do ctx->retval = 0 to tell userspace that
something is done, yet another filter denies that 'something'? Doesn't
make sense to me, but correct me if I'm wrong or if we think this UAPI
must be kept exactly the same.
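
The difference can be modelled in userspace roughly as follows. All
names here are hypothetical, and the "proposed" variant assumes that a
denying prog only turns the shared retval into -EPERM when no error is
already stored there; that detail is an assumption for illustration,
not something settled in this thread. The existing bogus-value check
is left out to keep the sketch short.

```c
#include <errno.h>
#include <stddef.h>

/* A filter may rewrite the shared retval; it returns 1 (allow) or 0 (deny). */
typedef int (*filter_fn)(int *retval);

/* Legacy filter: denies by returning 0 without touching retval. */
static int legacy_deny(int *retval) { (void)retval; return 0; }

/* Later filter: reports success by clearing retval. */
static int clear_retval(int *retval) { *retval = 0; return 1; }

/* Current getsockopt behavior: a denial is tracked separately and sticks. */
static int run_chain_current(const filter_fn *progs, size_t n, int retval)
{
	int denied = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (!progs[i](&retval))
			denied = 1;

	return denied ? -EPERM : retval;
}

/* Proposed behavior: a denial lands in the shared retval, so a later
 * filter that writes retval = 0 overwrites it.
 */
static int run_chain_proposed(const filter_fn *progs, size_t n, int retval)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!progs[i](&retval) && retval >= 0)
			retval = -EPERM;

	return retval;
}
```

With the chain { legacy_deny, clear_retval } the current model returns
-EPERM and the proposed model returns 0; with the order reversed, both
return -EPERM.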

Another potential UAPI breakage is that, originally, getsockopt hooks
could inject an -EFAULT instead of -EPERM by setting bogus values in
ctx->retval. Now they would have to do it by setting ctx->retval =
-EFAULT; any other value, even a bogus one, would be passed to
userspace. That said, I'm not sure why anyone would want to return an
-EFAULT instead of -EPERM; some unusual fault injection maybe? And in
that case, if they literally want -EFAULT, the statement that makes
sense would already be ctx->retval = -EFAULT, which is usually a bogus
value.

Considering that here we would have bpf_{get,set}_retval with no
in-kernel processing at all to mirror a value in ctx... I think it
would make a lot of sense to just use a context variable instead of
helpers (i.e. provide ctx->retval to all cgroup program types)?

YiFei Zhu
Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 938885562d68..5e3f3d2f5871 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1155,6 +1155,7 @@  struct bpf_run_ctx {};
 struct bpf_cg_run_ctx {
 	struct bpf_run_ctx run_ctx;
 	const struct bpf_prog_array_item *prog_item;
+	int errno_val;
 };
 
 struct bpf_trace_run_ctx {
@@ -1196,8 +1197,7 @@  BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
-	struct bpf_cg_run_ctx run_ctx;
-	int ret = 0;
+	struct bpf_cg_run_ctx run_ctx = {};
 	u32 func_ret;
 
 	migrate_disable();
@@ -1208,15 +1208,15 @@  BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
 	while ((prog = READ_ONCE(item->prog))) {
 		run_ctx.prog_item = item;
 		func_ret = run_prog(prog, ctx);
-		if (!(func_ret & 1))
-			ret = -EPERM;
+		if (!(func_ret & 1) && !run_ctx.errno_val)
+			run_ctx.errno_val = EPERM;
 		*(ret_flags) |= (func_ret >> 1);
 		item++;
 	}
 	bpf_reset_run_ctx(old_run_ctx);
 	rcu_read_unlock();
 	migrate_enable();
-	return ret;
+	return -run_ctx.errno_val;
 }
 
 static __always_inline int
@@ -1227,8 +1227,7 @@  BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
-	struct bpf_cg_run_ctx run_ctx;
-	int ret = 0;
+	struct bpf_cg_run_ctx run_ctx = {};
 
 	migrate_disable();
 	rcu_read_lock();
@@ -1237,14 +1236,14 @@  BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	while ((prog = READ_ONCE(item->prog))) {
 		run_ctx.prog_item = item;
-		if (!run_prog(prog, ctx))
-			ret = -EPERM;
+		if (!run_prog(prog, ctx) && !run_ctx.errno_val)
+			run_ctx.errno_val = EPERM;
 		item++;
 	}
 	bpf_reset_run_ctx(old_run_ctx);
 	rcu_read_unlock();
 	migrate_enable();
-	return ret;
+	return -run_ctx.errno_val;
 }
 
 static __always_inline u32
@@ -1297,7 +1296,7 @@  BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
  *   0: NET_XMIT_SUCCESS  skb should be transmitted
  *   1: NET_XMIT_DROP     skb should be dropped and cn
  *   2: NET_XMIT_CN       skb should be transmitted and cn
- *   3: -EPERM            skb should be dropped
+ *   3: -errno            skb should be dropped
  */
 #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
 	({						\
@@ -1309,7 +1308,7 @@  BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 		if (!_ret)				\
 			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
 		else					\
-			_ret = (_cn ? NET_XMIT_DROP : -EPERM);		\
+			_ret = (_cn ? NET_XMIT_DROP : _ret);		\
 		_ret;					\
 	})
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6fc59d61937a..d8126f8c0541 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4909,6 +4909,19 @@  union bpf_attr {
  *	Return
  *		The number of bytes written to the buffer, or a negative error
  *		in case of failure.
+ *
+ * int bpf_export_errno(int errno_val)
+ *	Description
+ *		If *errno_val* is positive, set the syscall's return error code;
+ *		if *errno_val* is zero, retrieve the previously set code.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		Zero if set is successful, or the previously set error code on
+ *		retrieval. Previously set code may be zero if it was never set.
+ *		On error, a negative value.
+ *
+ *		**-EINVAL** if *errno_val* not between zero and MAX_ERRNO inclusive.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5089,6 +5102,7 @@  union bpf_attr {
 	FN(task_pt_regs),		\
 	FN(get_branch_snapshot),	\
 	FN(trace_vprintk),		\
+	FN(export_errno),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5efe2588575e..5b5051eb43e6 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1169,6 +1169,28 @@  int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 	return ret;
 }
 
+BPF_CALL_1(bpf_export_errno, int, errno_val)
+{
+	struct bpf_cg_run_ctx *ctx =
+		container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
+
+	if (errno_val < 0 || errno_val > MAX_ERRNO)
+		return -EINVAL;
+
+	if (!errno_val)
+		return ctx->errno_val;
+
+	ctx->errno_val = errno_val;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_export_errno_proto = {
+	.func		= bpf_export_errno,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1181,6 +1203,8 @@  cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
+	case BPF_FUNC_export_errno:
+		return &bpf_export_errno_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6fc59d61937a..d8126f8c0541 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4909,6 +4909,19 @@  union bpf_attr {
  *	Return
  *		The number of bytes written to the buffer, or a negative error
  *		in case of failure.
+ *
+ * int bpf_export_errno(int errno_val)
+ *	Description
+ *		If *errno_val* is positive, set the syscall's return error code;
+ *		if *errno_val* is zero, retrieve the previously set code.
+ *
+ *		This helper is currently supported by cgroup programs only.
+ *	Return
+ *		Zero if set is successful, or the previously set error code on
+ *		retrieval. Previously set code may be zero if it was never set.
+ *		On error, a negative value.
+ *
+ *		**-EINVAL** if *errno_val* not between zero and MAX_ERRNO inclusive.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5089,6 +5102,7 @@  union bpf_attr {
 	FN(task_pt_regs),		\
 	FN(get_branch_snapshot),	\
 	FN(trace_vprintk),		\
+	FN(export_errno),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
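
For readers who want to poke at the semantics of this patch outside
the kernel, the helper and the run-array change can be modelled in
plain userspace C. This is only a sketch: the struct and function
names are hypothetical stand-ins, the run context is passed explicitly
instead of being looked up via current->bpf_ctx, and MAX_ERRNO is
defined locally to the kernel's value of 4095.

```c
#include <errno.h>

#define MAX_ERRNO 4095

/* Stand-in for struct bpf_cg_run_ctx; only the field the patch adds. */
struct cg_run_ctx_model {
	int errno_val;
};

/* Mirrors the body of bpf_export_errno() in the patch. */
static int export_errno_model(struct cg_run_ctx_model *ctx, int errno_val)
{
	if (errno_val < 0 || errno_val > MAX_ERRNO)
		return -EINVAL;		/* failure of the helper call itself */

	if (!errno_val)
		return ctx->errno_val;	/* get: 0 if nothing was set */

	ctx->errno_val = errno_val;	/* set: stored as a positive errno */
	return 0;
}

/* Mirrors the per-prog step in BPF_PROG_RUN_ARRAY_CG: a prog that
 * returns 0 without having exported an errno falls back to EPERM.
 */
static void account_verdict(struct cg_run_ctx_model *ctx, int verdict)
{
	if (!verdict && !ctx->errno_val)
		ctx->errno_val = EPERM;
}

/* What the run-array function hands back to its caller. */
static int run_array_result(const struct cg_run_ctx_model *ctx)
{
	return -ctx->errno_val;
}
```

A prog that calls export_errno_model(&ctx, EACCES) and then returns 0
makes the run result -EACCES, whereas a prog that only returns 0 still
produces -EPERM as before.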