Message ID | 53db70ed544928d227df7e3f3a1f8c53e3665c65.1620499942.git.yifeifz2@illinois.edu (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Series | eBPF seccomp filters |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 9 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org mingo@redhat.com andrii@kernel.org kafai@fb.com rostedt@goodmis.org john.fastabend@gmail.com songliubraving@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 10045 this patch: 10041 |
netdev/kdoc | success | Errors and warnings before: 8 this patch: 8 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 78 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 10459 this patch: 10455 |
netdev/header_inline | success | Link |
On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
>
> +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> +	   const void __user *, unsafe_ptr)
> +{
> +	int ret = -EPERM;
> +
> +	if (get_dumpable(current->mm))
> +		ret = copy_from_user_nofault(dst, unsafe_ptr, size);

Could you explain a bit more how dumpable flag makes it safe for unpriv?
The unpriv prog is attached to the children tasks only, right?
and dumpable gets cleared if euid changes?
On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
> >
> > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> > +	   const void __user *, unsafe_ptr)
> > +{
> > +	int ret = -EPERM;
> > +
> > +	if (get_dumpable(current->mm))
> > +		ret = copy_from_user_nofault(dst, unsafe_ptr, size);
>
> Could you explain a bit more how dumpable flag makes it safe for unpriv?
> The unpriv prog is attached to the children tasks only, right?
> and dumpable gets cleared if euid changes?

This is the "reduction to ptrace". The model here is that the eBPF
seccomp filter is doing the equivalent of ptracing the user process
using the privileges of the task at the time of loading the seccomp
filter.

ptrace access control is governed by ptrace.c:__ptrace_may_access. The
requirements are:
* always allow thread group introspection -- assume false so we are
  more restrictive than ptrace.
* tracer has CAP_SYS_PTRACE in the target user namespace or the
  tracer's real/fs uid/gid equal the target's real, effective and
  saved uid/gid -- discussed below.
* tracer has CAP_SYS_PTRACE in the target user namespace or target is
  SUID_DUMP_USER (I realized I should probably change the condition to
  == SUID_DUMP_USER).
* passes LSM checks (e.g. yama ptrace_scope) -- we expose a hook to LSM
  but it's more of a "disable all advanced seccomp-eBPF features".
  What would a better interface to LSM look like?

The dumpable check handles the "target is SUID_DUMP_USER" condition,
in the circumstance that the loader does not have CAP_SYS_PTRACE in its
namespace at the time of load. Why would the loader's CAP_SYS_PTRACE in
its own namespace imply CAP_SYS_PTRACE in the target's namespace? This
is based on my understanding of how capabilities and user namespaces
interact:

For the sake of simplicity, let's first assume that the loader is the
same task as the task that attaches the filter (via prctl or the
seccomp syscall).
* Case 1: target and loader are in the same user namespace. Trivial
  case, the two operations are the same.
* Case 2: target is in the loader's parent namespace. Can't happen
  under the assumption. Seccomp affects itself and children only, and
  it is only possible to join a descendant user ns.
* Case 3: target is in the loader's descendant namespace. Loader would
  have full CAP_SYS_PTRACE on target. We are more restrictive than
  ptrace.
* Case 4: target and loader are on unrelated namespace branches. Can't
  happen under the assumption. Same as case 2.

Let's break this assumption and see what happens if the loader and
attacher are in different contexts:
* Case 1: attacher is less capable (as a general term of "what it can
  do") than loader. Then all of the above applies, since the model
  concerns and checks the capabilities of the loader.
* Case 2: attacher is more capable than loader. The attacher would
  need an fd to the prog to attach it:
  * subcase 1: attacher inherited the fd after an exec and became more
    capable. uh... why is it trusting fds from a less capable context?
  * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via
    BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and
    attaching them?
  * subcase 3: attacher received the fd via a domain socket from a
    process which may be in a different user namespace. My first
    thought was: why is it trusting random fds from a less capable
    context? Except I just realized an adversary could:
    * Clone into new userns,
    * Load filter in child, which has CAP_SYS_PTRACE in new userns
    * Send filter to the parent which doesn't have CAP_SYS_PTRACE in
      its userns
    * It's broken :(

We'll think more about this case. One way is to check against the init
namespace, which means unpriv container runtimes won't have the
non-dumpable override. That shouldn't affect most of the use cases,
though. Alternatively we can store which userns it was loaded from
and reject attaching from a different userns.

Regarding u/gids, for an attacher to attach a seccomp filter, whether
cBPF or eBPF, if it doesn't have CAP_SYS_ADMIN in its current ns, it
will have to set no_new_privs on itself before it can attach. (Unlike
the previous discussion, this check is done at attach time rather than
load.) With no_new_privs the target's privs are a subset of the
attacher's, so the attacher should have a way to match the target's
resuid, so this condition is not a concern.

YiFei Zhu
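As a rough illustration of the requirements listed in the message above, the following userspace sketch models the decision as a pure function. It is not the kernel's `__ptrace_may_access`; the struct names, field names, and the `may_read_target_memory` helper are all hypothetical, and the thread-group shortcut is assumed false, as in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are not kernel types. */
enum dumpable_mode { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

struct tracer_model {
	bool has_cap_sys_ptrace; /* capability held in the target's user namespace */
	bool ids_match_target;   /* tracer uid/gid equal target's real, effective, saved ids */
};

struct target_model {
	enum dumpable_mode dumpable;
	bool lsm_allows;         /* outcome of LSM checks such as yama ptrace_scope */
};

/* Model of the ptrace-style rules; the same-thread-group shortcut is assumed false. */
static bool may_read_target_memory(const struct tracer_model *tracer,
				   const struct target_model *target)
{
	assert(tracer && target);

	/* uid/gid rule: ids must match unless the capability overrides it. */
	if (!tracer->ids_match_target && !tracer->has_cap_sys_ptrace)
		return false;

	/* dumpable rule: only a DUMP_USER target is readable without the capability. */
	if (target->dumpable != DUMP_USER && !tracer->has_cap_sys_ptrace)
		return false;

	/* LSM rule: the security module gets the final say. */
	return target->lsm_allows;
}
```

Under this model an unprivileged loader can read a dumpable target but not one that has cleared its dumpable flag, which is the case the `_dumpable` helper variants are meant to cover.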
On Tue, May 11, 2021 at 02:14:01AM -0500, YiFei Zhu wrote:
> On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
> > >
> > > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> > > +	   const void __user *, unsafe_ptr)
> > > +{
> > > +	int ret = -EPERM;
> > > +
> > > +	if (get_dumpable(current->mm))
> > > +		ret = copy_from_user_nofault(dst, unsafe_ptr, size);
> >
> > Could you explain a bit more how dumpable flag makes it safe for unpriv?
> > The unpriv prog is attached to the children tasks only, right?
> > and dumpable gets cleared if euid changes?
>
> This is the "reduction to ptrace". The model here is that the eBPF
> seccomp filter is doing the equivalent of ptracing the user process
> using the privileges of the task at the time of loading the seccomp
> filter.
>
> ptrace access control is governed by ptrace.c:__ptrace_may_access. The
> requirements are:
> * always allow thread group introspection -- assume false so we are
>   more restrictive than ptrace.
> * tracer has CAP_SYS_PTRACE in the target user namespace or the
>   tracer's real/fs uid/gid equal the target's real, effective and
>   saved uid/gid -- discussed below.
> * tracer has CAP_SYS_PTRACE in the target user namespace or target is
>   SUID_DUMP_USER (I realized I should probably change the condition to
>   == SUID_DUMP_USER).
> * passes LSM checks (e.g. yama ptrace_scope) -- we expose a hook to LSM
>   but it's more of a "disable all advanced seccomp-eBPF features".
>   What would a better interface to LSM look like?
>
> The dumpable check handles the "target is SUID_DUMP_USER" condition,
> in the circumstance that the loader does not have CAP_SYS_PTRACE in its
> namespace at the time of load. Why would the loader's CAP_SYS_PTRACE in
> its own namespace imply CAP_SYS_PTRACE in the target's namespace? This
> is based on my understanding of how capabilities and user namespaces
> interact:
>
> For the sake of simplicity, let's first assume that the loader is the
> same task as the task that attaches the filter (via prctl or the
> seccomp syscall).
> * Case 1: target and loader are in the same user namespace. Trivial
>   case, the two operations are the same.
> * Case 2: target is in the loader's parent namespace. Can't happen
>   under the assumption. Seccomp affects itself and children only, and
>   it is only possible to join a descendant user ns.
> * Case 3: target is in the loader's descendant namespace. Loader would
>   have full CAP_SYS_PTRACE on target. We are more restrictive than
>   ptrace.
> * Case 4: target and loader are on unrelated namespace branches. Can't
>   happen under the assumption. Same as case 2.
>
> Let's break this assumption and see what happens if the loader and
> attacher are in different contexts:
> * Case 1: attacher is less capable (as a general term of "what it can
>   do") than loader. Then all of the above applies, since the model
>   concerns and checks the capabilities of the loader.
> * Case 2: attacher is more capable than loader. The attacher would
>   need an fd to the prog to attach it:
>   * subcase 1: attacher inherited the fd after an exec and became more
>     capable. uh... why is it trusting fds from a less capable context?
>   * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via
>     BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and
>     attaching them?
>   * subcase 3: attacher received the fd via a domain socket from a
>     process which may be in a different user namespace. My first
>     thought was: why is it trusting random fds from a less capable
>     context? Except I just realized an adversary could:
>     * Clone into new userns,
>     * Load filter in child, which has CAP_SYS_PTRACE in new userns
>     * Send filter to the parent which doesn't have CAP_SYS_PTRACE in
>       its userns
>     * It's broken :(
>
> We'll think more about this case. One way is to check against the init
> namespace, which means unpriv container runtimes won't have the
> non-dumpable override. That shouldn't affect most of the use cases,
> though. Alternatively we can store which userns it was loaded from
> and reject attaching from a different userns.

Typically the verifier does all the checks at load time to avoid
run-time overhead during program execution. Then at attach time we
check that the attach parameters provided at load time exactly match
those at attach time. ifindex, attach_btf_id, etc. fall into this category.
By doing something similar, it should be possible to avoid
doing get_dumpable() at run-time.
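One of the options raised in this exchange is to record the user namespace at load time and reject an attach from a different one, in the spirit of the load-time/attach-time parameter matching described above. A minimal userspace sketch of that idea follows; the `userns_model` and `prog_model` types and both functions are hypothetical illustrations, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; identity is the pointer value. */
struct userns_model {
	int id;
};

/* Hypothetical stand-in for a loaded program that remembers its load context. */
struct prog_model {
	const struct userns_model *loaded_in;
};

/* Load time: remember which user namespace the loader was in. */
static void prog_load(struct prog_model *prog, const struct userns_model *current_ns)
{
	assert(prog && current_ns);
	prog->loaded_in = current_ns;
}

/* Attach time: refuse if the attacher's namespace differs from the loader's. */
static int prog_attach(const struct prog_model *prog, const struct userns_model *current_ns)
{
	assert(prog && current_ns);
	if (prog->loaded_in != current_ns)
		return -EPERM;
	return 0;
}
```

In this model, the scenario where a child loads the filter in a fresh user namespace and passes the fd to its parent fails at attach time, because the parent's namespace does not match the recorded one.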
On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> Typically the verifier does all the checks at load time to avoid
> run-time overhead during program execution. Then at attach time we
> check that the attach parameters provided at load time exactly match
> those at attach time. ifindex, attach_btf_id, etc. fall into this category.
> By doing something similar, it should be possible to avoid
> doing get_dumpable() at run-time.

Do you mean to move the check of dumpable to load time instead of
runtime? I do not think that makes sense. A process may arbitrarily
set its dumpable attribute during execution via prctl. A process could
set itself to non-dumpable before interacting with sensitive
information that should not be dumpable (e.g. ssh-agent does this
[1]). Therefore, being dumpable at one point in time does not indicate
anything about whether it stays dumpable at a later point in time.
Besides, seccomp filters are inherited across clone and exec, attaching
to many tasks with no option to detach. What should the load-time
check of task dumpability be against? The current task may only be the
tip of an iceberg.

[1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398

YiFei Zhu
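To make the point about the dumpable attribute changing at run time concrete, here is a small Linux userspace sketch using `prctl(PR_SET_DUMPABLE)` and `prctl(PR_GET_DUMPABLE)`. The wrapper name `set_dumpable_and_query` is made up for this example, and the sketch assumes an ordinary unprivileged process on Linux.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Set the calling process's dumpable flag to `value` (0 or 1) and return
 * the flag as the kernel reports it afterwards, or -1 if setting failed.
 */
static int set_dumpable_and_query(int value)
{
	/* PR_SET_DUMPABLE only accepts 0 (not dumpable) or 1 (dumpable). */
	assert(value == 0 || value == 1);

	if (prctl(PR_SET_DUMPABLE, value, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

A process can call `set_dumpable_and_query(0)` right before handling secrets and flip the flag back later, so a check performed once at filter load time would say nothing about the state seen by a later syscall.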
> On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> Typically the verifier does all the checks at load time to avoid
>> run-time overhead during program execution. Then at attach time we
>> check that the attach parameters provided at load time exactly match
>> those at attach time. ifindex, attach_btf_id, etc. fall into this category.
>> By doing something similar, it should be possible to avoid
>> doing get_dumpable() at run-time.
>
> Do you mean to move the check of dumpable to load time instead of
> runtime? I do not think that makes sense. A process may arbitrarily
> set its dumpable attribute during execution via prctl. A process could
> set itself to non-dumpable before interacting with sensitive
> information that should not be dumpable (e.g. ssh-agent does this
> [1]). Therefore, being dumpable at one point in time does not indicate
> anything about whether it stays dumpable at a later point in time.
> Besides, seccomp filters are inherited across clone and exec, attaching
> to many tasks with no option to detach. What should the load-time
> check of task dumpability be against? The current task may only be the
> tip of an iceberg.
>
> [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
>

First things first: why are you checking dumpable at all? Once you figure out why and whether it’s needed, you may learn something about what task to check.

I don’t think checking dumpable makes any sense.
On Thu, May 13, 2021 at 9:53 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> > On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> >
> > On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> Typically the verifier does all the checks at load time to avoid
> >> run-time overhead during program execution. Then at attach time we
> >> check that the attach parameters provided at load time exactly match
> >> those at attach time. ifindex, attach_btf_id, etc. fall into this category.
> >> By doing something similar, it should be possible to avoid
> >> doing get_dumpable() at run-time.
> >
> > Do you mean to move the check of dumpable to load time instead of
> > runtime? I do not think that makes sense. A process may arbitrarily
> > set its dumpable attribute during execution via prctl. A process could
> > set itself to non-dumpable before interacting with sensitive
> > information that should not be dumpable (e.g. ssh-agent does this
> > [1]). Therefore, being dumpable at one point in time does not indicate
> > anything about whether it stays dumpable at a later point in time.
> > Besides, seccomp filters are inherited across clone and exec, attaching
> > to many tasks with no option to detach. What should the load-time
> > check of task dumpability be against? The current task may only be the
> > tip of an iceberg.
> >
> > [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
> >
>
> First things first: why are you checking dumpable at all? Once you figure out why and whether it’s needed, you may learn something about what task to check.
>
> I don’t think checking dumpable makes any sense.

ptrace. We don't want to extend one's ability to read another
process's memory if they could not read it via ptrace
(process_vm_readv or ptrace(PTRACE_PEEK{TEXT,DATA})). I wrote down the
constraints for ptrace to access a target's memory earlier [1], but
tl;dr: to be at least as restrictive as ptrace, a tracer without
CAP_SYS_PTRACE cannot trace a non-dumpable process. What's the target
process (i.e. the process whose memory is being read) in the context
of a seccomp filter? The current task. Does that answer your
questions?

[1] https://lore.kernel.org/bpf/CABqSeAT8iz-VhWjWqABqGbF7ydkoT7LmzJ5Do8K1ANQvQK=FJQ@mail.gmail.com/

YiFei Zhu
On Thu, May 13, 2021 at 10:13 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 9:53 AM Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > > On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> > >
> > > On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > >> Typically the verifier does all the checks at load time to avoid
> > >> run-time overhead during program execution. Then at attach time we
> > >> check that the attach parameters provided at load time exactly match
> > >> those at attach time. ifindex, attach_btf_id, etc. fall into this category.
> > >> By doing something similar, it should be possible to avoid
> > >> doing get_dumpable() at run-time.
> > >
> > > Do you mean to move the check of dumpable to load time instead of
> > > runtime? I do not think that makes sense. A process may arbitrarily
> > > set its dumpable attribute during execution via prctl. A process could
> > > set itself to non-dumpable before interacting with sensitive
> > > information that should not be dumpable (e.g. ssh-agent does this
> > > [1]). Therefore, being dumpable at one point in time does not indicate
> > > anything about whether it stays dumpable at a later point in time.
> > > Besides, seccomp filters are inherited across clone and exec, attaching
> > > to many tasks with no option to detach. What should the load-time
> > > check of task dumpability be against? The current task may only be the
> > > tip of an iceberg.
> > >
> > > [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
> > >
> >
> > First things first: why are you checking dumpable at all? Once you figure out why and whether it’s needed, you may learn something about what task to check.
> >
> > I don’t think checking dumpable makes any sense.
>
> ptrace. We don't want to extend one's ability to read another
> process's memory if they could not read it via ptrace
> (process_vm_readv or ptrace(PTRACE_PEEK{TEXT,DATA})). I wrote down the
> constraints for ptrace to access a target's memory earlier [1], but
> tl;dr: to be at least as restrictive as ptrace, a tracer without
> CAP_SYS_PTRACE cannot trace a non-dumpable process. What's the target
> process (i.e. the process whose memory is being read) in the context
> of a seccomp filter? The current task. Does that answer your
> questions?
>
> [1] https://lore.kernel.org/bpf/CABqSeAT8iz-VhWjWqABqGbF7ydkoT7LmzJ5Do8K1ANQvQK=FJQ@mail.gmail.com/

The whole seccomp model is based on the assumption that the filter
installer completely controls the filtered task. Reading memory is
not qualitatively different.

To be clear, this is not to be interpreted as an ack to allowing
seccomp to read process memory. I'm saying that, if seccomp gains the
ability to read process memory, I don't think a dumpable or ptrace
check is needed.
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 86f3e8784e43..2019c0893250 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1965,6 +1965,10 @@ extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 extern const struct bpf_func_proto bpf_task_storage_get_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_dumpable_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_str_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_dumpable_str_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b9ed9951a05b..330e9c365cdc 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -2449,6 +2449,14 @@ seccomp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_pid_tgid:
 		return &bpf_get_current_pid_tgid_proto;
+	case BPF_FUNC_probe_read_user:
+		return ns_capable(current_user_ns(), CAP_SYS_PTRACE) ?
+			&bpf_probe_read_user_proto :
+			&bpf_probe_read_user_dumpable_proto;
+	case BPF_FUNC_probe_read_user_str:
+		return ns_capable(current_user_ns(), CAP_SYS_PTRACE) ?
+			&bpf_probe_read_user_str_proto :
+			&bpf_probe_read_user_dumpable_str_proto;
 	default:
 		break;
 	}
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d2d7cf6cfe83..a1d6d64bde08 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -175,6 +175,27 @@ const struct bpf_func_proto bpf_probe_read_user_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
+	   const void __user *, unsafe_ptr)
+{
+	int ret = -EPERM;
+
+	if (get_dumpable(current->mm))
+		ret = copy_from_user_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_probe_read_user_dumpable_proto = {
+	.func		= bpf_probe_read_user_dumpable,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static __always_inline int
 bpf_probe_read_user_str_common(void *dst, u32 size,
 			       const void __user *unsafe_ptr)
@@ -212,6 +233,27 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_probe_read_user_dumpable_str, void *, dst, u32, size,
+	   const void __user *, unsafe_ptr)
+{
+	int ret = -EPERM;
+
+	if (get_dumpable(current->mm))
+		ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_probe_read_user_dumpable_str_proto = {
+	.func		= bpf_probe_read_user_dumpable_str,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static __always_inline int
 bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
 {
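The contract of the new `_dumpable` helpers in the patch can be summarized as: fail with `-EPERM` when the task is not dumpable, and zero the destination buffer on any failure so the program never sees uninitialized memory. The following userspace model is only a sketch of that contract; `read_user_dumpable_model` is a hypothetical name, `memcpy` stands in for `copy_from_user_nofault`, and the `dumpable` argument is treated as a plain truthy flag in the same way the posted patch treats `get_dumpable()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the helper's control flow: copy only when the
 * (simulated) task is dumpable, and zero-fill `dst` on failure.
 */
static int read_user_dumpable_model(void *dst, size_t size,
				    const void *src, int dumpable)
{
	int ret = -EPERM;

	assert(dst != NULL && src != NULL);

	if (dumpable) {
		/* Stand-in for copy_from_user_nofault(); always succeeds here. */
		memcpy(dst, src, size);
		ret = 0;
	}
	if (ret < 0)
		memset(dst, 0, size);
	return ret;
}
```

Zero-filling matters because the destination is declared `ARG_PTR_TO_UNINIT_MEM` in the helper prototype, so after the call the program is allowed to read the buffer whether or not the copy succeeded.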