diff mbox series

[bpf-next,v1,7/9] bpf: Lift permission check in __sys_bpf when called from kernel.

Message ID 20220225234339.2386398-8-haoluo@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Extend cgroup interface with bpf | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 15 this patch: 15
netdev/cc_maintainers warning 2 maintainers not CCed: netdev@vger.kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next fail VM_Test

Commit Message

Hao Luo Feb. 25, 2022, 11:43 p.m. UTC
After we introduced sleepable tracing programs, we now have an
interesting problem. There are now three execution paths that can
reach bpf_sys_bpf:

 1. called from bpf syscall.
 2. called from kernel context (e.g. kernel modules).
 3. called from bpf programs.

Ideally, capability check in bpf_sys_bpf is necessary for the first two
scenarios. But it may not be necessary for the third case.

The use case of sleepable tracepoints is to allow root user to deploy
bpf progs which run when a certain kernel tracepoints are triggered.
An example use case is to monitor cgroup creation and perform bpf
operations whenever a cgroup is created. These operations include
pinning an iter to export the cgroup's state. Using sleepable tracing
is preferred because it eliminates the need of a userspace daemon to
monitor cgroup changes.

However, in this use case, the current task who triggers the tracepoint
may be unprivileged and the permission check in __sys_bpf will thus
prevent it from making bpf syscalls. Therefore the tracing progs
deployed by root can not be used by non-root users.

A solution to this problem is to lift the permission check if the caller
of bpf_sys_bpf comes from either kernel context or bpf programs.

An alternative of lifting this permission check would be introducing an
'unpriv' version of bpf_sys_bpf, which doesn't check the current task's
capability. If the owner of the tracing prog wants it to be exclusively
used by root users, they can use the 'priv' version of bpf_sys_bpf; if
the owner wants it to be usable for non-root users, they can use the
'unpriv' version.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Alexei Starovoitov March 2, 2022, 8:01 p.m. UTC | #1
On Fri, Feb 25, 2022 at 03:43:37PM -0800, Hao Luo wrote:
> After we introduced sleepable tracing programs, we now have an
> interesting problem. There are now three execution paths that can
> reach bpf_sys_bpf:
> 
>  1. called from bpf syscall.
>  2. called from kernel context (e.g. kernel modules).
>  3. called from bpf programs.
> 
> Ideally, capability check in bpf_sys_bpf is necessary for the first two
> scenarios. But it may not be necessary for the third case.

Well, it's unnecessary for the first two as well.
When called from the kernel lskel it's a pointless check.
The kernel module can do anything regardless.
When called from bpf syscall program it's not quite correct either.
When CAP_BPF was introduced we've designed it to enforce permissions
at prog load time. The prog_run doesn't check permissions.
So syscall progs don't need this secondary permission check.
Please add "case BPF_PROG_TYPE_SYSCALL:" to is_perfmon_prog_type()
and combine it with this patch.

That would be the best. The alternative below is less appealing.

> An alternative of lifting this permission check would be introducing an
> 'unpriv' version of bpf_sys_bpf, which doesn't check the current task's
> capability. If the owner of the tracing prog wants it to be exclusively
> used by root users, they can use the 'priv' version of bpf_sys_bpf; if
> the owner wants it to be usable for non-root users, they can use the
> 'unpriv' version.

...

> -	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> +	if (sysctl_unprivileged_bpf_disabled && !bpf_capable() && !uattr.is_kernel)

This is great idea. If I could think of this I would went with it when prog_syscall
was introduced.
Hao Luo March 3, 2022, 7:14 p.m. UTC | #2
On Wed, Mar 2, 2022 at 12:02 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 03:43:37PM -0800, Hao Luo wrote:
> > After we introduced sleepable tracing programs, we now have an
> > interesting problem. There are now three execution paths that can
> > reach bpf_sys_bpf:
> >
> >  1. called from bpf syscall.
> >  2. called from kernel context (e.g. kernel modules).
> >  3. called from bpf programs.
> >
> > Ideally, capability check in bpf_sys_bpf is necessary for the first two
> > scenarios. But it may not be necessary for the third case.
>
> Well, it's unnecessary for the first two as well.
> When called from the kernel lskel it's a pointless check.
> The kernel module can do anything regardless.
> When called from bpf syscall program it's not quite correct either.
> When CAP_BPF was introduced we've designed it to enforce permissions
> at prog load time. The prog_run doesn't check permissions.
> So syscall progs don't need this secondary permission check.
> Please add "case BPF_PROG_TYPE_SYSCALL:" to is_perfmon_prog_type()
> and combine it with this patch.
>

Sure, will do.

> That would be the best. The alternative below is less appealing.
>
> > An alternative of lifting this permission check would be introducing an
> > 'unpriv' version of bpf_sys_bpf, which doesn't check the current task's
> > capability. If the owner of the tracing prog wants it to be exclusively
> > used by root users, they can use the 'priv' version of bpf_sys_bpf; if
> > the owner wants it to be usable for non-root users, they can use the
> > 'unpriv' version.
>
> ...
>
> > -     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > +     if (sysctl_unprivileged_bpf_disabled && !bpf_capable() && !uattr.is_kernel)
>
> This is great idea. If I could think of this I would went with it when prog_syscall
> was introduced.

Thanks!
diff mbox series

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0a12f52fe8a9..3bf88002ee56 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4613,7 +4613,7 @@  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	union bpf_attr attr;
 	int err;
 
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable() && !uattr.is_kernel)
 		return -EPERM;
 
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);