diff mbox series

[RFC,bpf-next,seccomp,10/12] seccomp-ebpf: Add ability to read user memory

Message ID 53db70ed544928d227df7e3f3a1f8c53e3665c65.1620499942.git.yifeifz2@illinois.edu (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series eBPF seccomp filters | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 9 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org mingo@redhat.com andrii@kernel.org kafai@fb.com rostedt@goodmis.org john.fastabend@gmail.com songliubraving@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 10045 this patch: 10041
netdev/kdoc success Errors and warnings before: 8 this patch: 8
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 78 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 10459 this patch: 10455
netdev/header_inline success Link

Commit Message

YiFei Zhu May 10, 2021, 5:22 p.m. UTC
From: YiFei Zhu <yifeifz2@illinois.edu>

This uses helpers bpf_probe_read_user{,str}. To repect unprivileged
users may also load filters, when the loader of the filter does not
have CAP_SYS_PTRACE, attempting to read user memory when current mm
is non-dumpable results in -EPERM.

Right now this is not sleepable, -EFAULT may happen for valid memory
addresses. Future work might be adding support to bpf_copy_from_user
via sleepable filters.

Use of memory data to implement policy is discouraged until there is
a solution for time-of-check to time-of-use.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 include/linux/bpf.h      |  4 ++++
 kernel/seccomp.c         |  8 ++++++++
 kernel/trace/bpf_trace.c | 42 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

Comments

Alexei Starovoitov May 11, 2021, 2:04 a.m. UTC | #1
On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
>  
> +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> +	   const void __user *, unsafe_ptr)
> +{
> +	int ret = -EPERM;
> +
> +	if (get_dumpable(current->mm))
> +		ret = copy_from_user_nofault(dst, unsafe_ptr, size);

Could you explain a bit more how dumpable flag makes it safe for unpriv?
The unpriv prog is attached to the children tasks only, right?
and dumpable gets cleared if euid changes?
YiFei Zhu May 11, 2021, 7:14 a.m. UTC | #2
On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
> >
> > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> > +        const void __user *, unsafe_ptr)
> > +{
> > +     int ret = -EPERM;
> > +
> > +     if (get_dumpable(current->mm))
> > +             ret = copy_from_user_nofault(dst, unsafe_ptr, size);
>
> Could you explain a bit more how dumpable flag makes it safe for unpriv?
> The unpriv prog is attached to the children tasks only, right?
> and dumpable gets cleared if euid changes?

This is the "reduction to ptrace". The model here is that the eBPF
seccomp filter is doing the equivalent of ptracing the user process
using the privileges of the task at the time of loading the seccomp
filter.

ptrace access control is governed by ptrace.c:__ptrace_may_access. The
requirements are:
* always allow thread group introspection -- assume false so we are
more restrictive than ptrace.
* tracer has CAP_PTRACE in the target user namespace or tracer
r/fsu/gidid equal target resu/gid -- discuss below
* tracer has CAP_PTRACE in the target user namespace or target is
SUID_DUMP_USER (I realized I should probably change the condition to
== SUID_DUMP_USER).
* passes LSM checks (eg yama ptrace_scope) -- we expose a hook to LSM
but it's more of a "disable all advanced seccomp-eBPF features". How
would a better interface to LSM look like?

The dumpable check handles the "target is SUID_DUMP_USER" condition,
in the circumstance that the loader does not have CAP_PTRACE in its
namespace at the time of load. Why would this imply its CAP_PTRACE
capability in target namespace? This is based on my understanding on
how capabilities and user namespaces interact:
For the sake of simplicity, let's first assume that loader is the same
task as the task that attaches the filter (via prctl or seccomp
syscall).
* Case 1: target and loader are the same user namespace. Trivial case,
the two operations are the same.
* Case 2: target is loader's parent namespace. Can't happen under
assumption. Seccomp affects itself and children only, and it is only
possible to join a descendant user ns.
* Case 3: target is loader's descendant namespace. Loader would have
full CAP_PTRACE on target. We are more restrictive than ptrace.
* Case 4: target and loader are on unrelated namespace branches. Can't
happen under assumption. Same as case 2.

Let's break this assumption and see what happens if the loader and
attacher are in different contexts:
* Case 1: attacher is less capable (as a general term of "what it can
do") than loader then all of the above applies, since the model
concerns and checks the capabilities of the loader.
* Case 2: attacher is more capable than loader. The attacher would
need an fd to the prog to attach it:
  * subcase 1: attacher inherited the fd after an exec and became more
capable. uh... why is it trusting fds from a less capable context?
  * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via
BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and
attaching it?
  * subcase 3: attacher received the fd via a domain socket from a
process which may be in a different user namespace. On my first
thought, I thought, why is it trusting random fds from a less capable
context? Except I just thought of an adversary could:
    * Clone into new userns,
    * Load filter in child, which has CAP_PTRACE in new userns
    * Send filter to the parent which doesn't have CAP_PTRACE in its userns
    * It's broken :(
We'll think more about this case. One way is to check against init
namespace, which means unpriv container runtimes won't have the
non-dumpable override. Though, it shouldn't be affecting most of the
use cases. Alternatively we can store which userns it was loaded from
and reject attaching from a different userns.

Regarding u/gids, for an attacher to attach a seccomp filter, whether
cBPF or eBPF, if it doesn't have CAP_SYS_ADMIN in its current ns, it
will have to set no_new_privs on itself before it can attach. (Unlike
the previous discussion, this check is done at attach time rather than
load.) With no_new_privs the target's privs is a subset of the
attacher's, so the attacher should have a way to match the target's
resuid, so this condition is not a concern.

YiFei Zhu
Alexei Starovoitov May 12, 2021, 10:36 p.m. UTC | #3
On Tue, May 11, 2021 at 02:14:01AM -0500, YiFei Zhu wrote:
> On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote:
> > >
> > > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
> > > +        const void __user *, unsafe_ptr)
> > > +{
> > > +     int ret = -EPERM;
> > > +
> > > +     if (get_dumpable(current->mm))
> > > +             ret = copy_from_user_nofault(dst, unsafe_ptr, size);
> >
> > Could you explain a bit more how dumpable flag makes it safe for unpriv?
> > The unpriv prog is attached to the children tasks only, right?
> > and dumpable gets cleared if euid changes?
> 
> This is the "reduction to ptrace". The model here is that the eBPF
> seccomp filter is doing the equivalent of ptracing the user process
> using the privileges of the task at the time of loading the seccomp
> filter.
> 
> ptrace access control is governed by ptrace.c:__ptrace_may_access. The
> requirements are:
> * always allow thread group introspection -- assume false so we are
> more restrictive than ptrace.
> * tracer has CAP_PTRACE in the target user namespace or tracer
> r/fsu/gidid equal target resu/gid -- discuss below
> * tracer has CAP_PTRACE in the target user namespace or target is
> SUID_DUMP_USER (I realized I should probably change the condition to
> == SUID_DUMP_USER).
> * passes LSM checks (eg yama ptrace_scope) -- we expose a hook to LSM
> but it's more of a "disable all advanced seccomp-eBPF features". How
> would a better interface to LSM look like?
> 
> The dumpable check handles the "target is SUID_DUMP_USER" condition,
> in the circumstance that the loader does not have CAP_PTRACE in its
> namespace at the time of load. Why would this imply its CAP_PTRACE
> capability in target namespace? This is based on my understanding on
> how capabilities and user namespaces interact:
> For the sake of simplicity, let's first assume that loader is the same
> task as the task that attaches the filter (via prctl or seccomp
> syscall).
> * Case 1: target and loader are the same user namespace. Trivial case,
> the two operations are the same.
> * Case 2: target is loader's parent namespace. Can't happen under
> assumption. Seccomp affects itself and children only, and it is only
> possible to join a descendant user ns.
> * Case 3: target is loader's descendant namespace. Loader would have
> full CAP_PTRACE on target. We are more restrictive than ptrace.
> * Case 4: target and loader are on unrelated namespace branches. Can't
> happen under assumption. Same as case 2.
> 
> Let's break this assumption and see what happens if the loader and
> attacher are in different contexts:
> * Case 1: attacher is less capable (as a general term of "what it can
> do") than loader then all of the above applies, since the model
> concerns and checks the capabilities of the loader.
> * Case 2: attacher is more capable than loader. The attacher would
> need an fd to the prog to attach it:
>   * subcase 1: attacher inherited the fd after an exec and became more
> capable. uh... why is it trusting fds from a less capable context?
>   * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via
> BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and
> attaching it?
>   * subcase 3: attacher received the fd via a domain socket from a
> process which may be in a different user namespace. On my first
> thought, I thought, why is it trusting random fds from a less capable
> context? Except I just thought of an adversary could:
>     * Clone into new userns,
>     * Load filter in child, which has CAP_PTRACE in new userns
>     * Send filter to the parent which doesn't have CAP_PTRACE in its userns
>     * It's broken :(
> We'll think more about this case. One way is to check against init
> namespace, which means unpriv container runtimes won't have the
> non-dumpable override. Though, it shouldn't be affecting most of the
> use cases. Alternatively we can store which userns it was loaded from
> and reject attaching from a different userns.

Typically the verifier does all the checks at load time to avoid
run-time overhead during program execution. Then at attach time we
check that attach parameters provided at load time match exactly
to those at attach time. ifindex, attach_btf_id, etc fall into this category.
Doing something similar it should be possible to avoid
doing get_dumpable() at run-time.
YiFei Zhu May 13, 2021, 5:26 a.m. UTC | #4
On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> Typically the verifier does all the checks at load time to avoid
> run-time overhead during program execution. Then at attach time we
> check that attach parameters provided at load time match exactly
> to those at attach time. ifindex, attach_btf_id, etc fall into this category.
> Doing something similar it should be possible to avoid
> doing get_dumpable() at run-time.

Do you mean to move the check of dumpable to load time instead of
runtime? I do not think that makes sense. A process may arbitrarily
set its dumpable attribute during execution via prctl. A process could
do set itself to non-dumpable, before interacting with sensitive
information that would better not be possible to be dumped (eg.
ssh-agent does this [1]). Therefore, being dumpable at one point in
time does not indicate anything about whether it stays dumpable at a
later point in time. Besides, seccomp filters are inherited across
clone and exec, attaching to many tasks with no option to detach. What
should the load-time check of task dump-ability be against? The
current task may only be the tip of an iceburg.

[1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398

YiFei Zhu
Andy Lutomirski May 13, 2021, 2:53 p.m. UTC | #5
> On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> 
> On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> Typically the verifier does all the checks at load time to avoid
>> run-time overhead during program execution. Then at attach time we
>> check that attach parameters provided at load time match exactly
>> to those at attach time. ifindex, attach_btf_id, etc fall into this category.
>> Doing something similar it should be possible to avoid
>> doing get_dumpable() at run-time.
> 
> Do you mean to move the check of dumpable to load time instead of
> runtime? I do not think that makes sense. A process may arbitrarily
> set its dumpable attribute during execution via prctl. A process could
> do set itself to non-dumpable, before interacting with sensitive
> information that would better not be possible to be dumped (eg.
> ssh-agent does this [1]). Therefore, being dumpable at one point in
> time does not indicate anything about whether it stays dumpable at a
> later point in time. Besides, seccomp filters are inherited across
> clone and exec, attaching to many tasks with no option to detach. What
> should the load-time check of task dump-ability be against? The
> current task may only be the tip of an iceburg.
> 
> [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
> 
> 

First things first: why are you checking dumpable at all?  Once you figure out why and whether it’s needed, you may learn something about what task to check.

I don’t think checking dumpable makes any sense.
YiFei Zhu May 13, 2021, 5:12 p.m. UTC | #6
On Thu, May 13, 2021 at 9:53 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> >
> > On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> Typically the verifier does all the checks at load time to avoid
> >> run-time overhead during program execution. Then at attach time we
> >> check that attach parameters provided at load time match exactly
> >> to those at attach time. ifindex, attach_btf_id, etc fall into this category.
> >> Doing something similar it should be possible to avoid
> >> doing get_dumpable() at run-time.
> >
> > Do you mean to move the check of dumpable to load time instead of
> > runtime? I do not think that makes sense. A process may arbitrarily
> > set its dumpable attribute during execution via prctl. A process could
> > do set itself to non-dumpable, before interacting with sensitive
> > information that would better not be possible to be dumped (eg.
> > ssh-agent does this [1]). Therefore, being dumpable at one point in
> > time does not indicate anything about whether it stays dumpable at a
> > later point in time. Besides, seccomp filters are inherited across
> > clone and exec, attaching to many tasks with no option to detach. What
> > should the load-time check of task dump-ability be against? The
> > current task may only be the tip of an iceburg.
> >
> > [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
> >
> >
>
> First things first: why are you checking dumpable at all?  Once you figure out why and whether it’s needed, you may learn something about what task to check.
>
> I don’t think checking dumpable makes any sense.

ptrace. We don't want to extend one's ability to read another
process's memory if they could not read it via ptrace
(process_vm_readv or ptrace(PTRACE_PEEK{TEXT,DATA})). The constraints
for ptrace to access a target's memory I've written down earlier [1],
but tl;dr: to be at least as restrictive as ptrace, a tracer without
CAP_PTRACE cannot trace a non-dumpable process. What's the target
process (i.e. the process whose memory is being read) in the context
of a seccomp filter? The current task. Does that answer your
questions?

[1] https://lore.kernel.org/bpf/CABqSeAT8iz-VhWjWqABqGbF7ydkoT7LmzJ5Do8K1ANQvQK=FJQ@mail.gmail.com/

YiFei Zhu
Andy Lutomirski May 13, 2021, 5:15 p.m. UTC | #7
On Thu, May 13, 2021 at 10:13 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 9:53 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > > On May 12, 2021, at 10:26 PM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> > >
> > > On Wed, May 12, 2021 at 5:36 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > >> Typically the verifier does all the checks at load time to avoid
> > >> run-time overhead during program execution. Then at attach time we
> > >> check that attach parameters provided at load time match exactly
> > >> to those at attach time. ifindex, attach_btf_id, etc fall into this category.
> > >> Doing something similar it should be possible to avoid
> > >> doing get_dumpable() at run-time.
> > >
> > > Do you mean to move the check of dumpable to load time instead of
> > > runtime? I do not think that makes sense. A process may arbitrarily
> > > set its dumpable attribute during execution via prctl. A process could
> > > do set itself to non-dumpable, before interacting with sensitive
> > > information that would better not be possible to be dumped (eg.
> > > ssh-agent does this [1]). Therefore, being dumpable at one point in
> > > time does not indicate anything about whether it stays dumpable at a
> > > later point in time. Besides, seccomp filters are inherited across
> > > clone and exec, attaching to many tasks with no option to detach. What
> > > should the load-time check of task dump-ability be against? The
> > > current task may only be the tip of an iceburg.
> > >
> > > [1] https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-agent.c#L1398
> > >
> > >
> >
> > First things first: why are you checking dumpable at all?  Once you figure out why and whether it’s needed, you may learn something about what task to check.
> >
> > I don’t think checking dumpable makes any sense.
>
> ptrace. We don't want to extend one's ability to read another
> process's memory if they could not read it via ptrace
> (process_vm_readv or ptrace(PTRACE_PEEK{TEXT,DATA})). The constraints
> for ptrace to access a target's memory I've written down earlier [1],
> but tl;dr: to be at least as restrictive as ptrace, a tracer without
> CAP_PTRACE cannot trace a non-dumpable process. What's the target
> process (i.e. the process whose memory is being read) in the context
> of a seccomp filter? The current task. Does that answer your
> questions?
>
> [1] https://lore.kernel.org/bpf/CABqSeAT8iz-VhWjWqABqGbF7ydkoT7LmzJ5Do8K1ANQvQK=FJQ@mail.gmail.com/

The whole seccomp model is based on the assumption that the filter
installer completely controls the filtered task.  Reading memory is
not qualitatively different.

To be clear, this is not to be interpreted as an ack to allowing
seccomp to read process memory.  I'm saying that, if seccomp gains the
ability to read process memory, I don't think a dumpable or ptrace
check is needed.
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 86f3e8784e43..2019c0893250 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1965,6 +1965,10 @@  extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 extern const struct bpf_func_proto bpf_task_storage_get_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_dumpable_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_str_proto;
+extern const struct bpf_func_proto bpf_probe_read_user_dumpable_str_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b9ed9951a05b..330e9c365cdc 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -2449,6 +2449,14 @@  seccomp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_pid_tgid:
 		return &bpf_get_current_pid_tgid_proto;
+	case BPF_FUNC_probe_read_user:
+		return ns_capable(current_user_ns(), CAP_SYS_PTRACE) ?
+			&bpf_probe_read_user_proto :
+			&bpf_probe_read_user_dumpable_proto;
+	case BPF_FUNC_probe_read_user_str:
+		return ns_capable(current_user_ns(), CAP_SYS_PTRACE) ?
+			&bpf_probe_read_user_str_proto :
+			&bpf_probe_read_user_dumpable_str_proto;
 	default:
 		break;
 	}
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d2d7cf6cfe83..a1d6d64bde08 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -175,6 +175,27 @@  const struct bpf_func_proto bpf_probe_read_user_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size,
+	   const void __user *, unsafe_ptr)
+{
+	int ret = -EPERM;
+
+	if (get_dumpable(current->mm))
+		ret = copy_from_user_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_probe_read_user_dumpable_proto = {
+	.func		= bpf_probe_read_user_dumpable,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static __always_inline int
 bpf_probe_read_user_str_common(void *dst, u32 size,
 			       const void __user *unsafe_ptr)
@@ -212,6 +233,27 @@  const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_probe_read_user_dumpable_str, void *, dst, u32, size,
+	   const void __user *, unsafe_ptr)
+{
+	int ret = -EPERM;
+
+	if (get_dumpable(current->mm))
+		ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_probe_read_user_dumpable_str_proto = {
+	.func		= bpf_probe_read_user_dumpable_str,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static __always_inline int
 bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
 {