
[RFC,bpf-next,seccomp,12/12] seccomp-ebpf: support task storage from BPF-LSM, defaulting to group leader

Message ID db41ad3924d01374d08984d20ad6678f91b82cde.1620499942.git.yifeifz2@illinois.edu (mailing list archive)
State RFC
Delegated to: BPF
Series: eBPF seccomp filters


Commit Message

YiFei Zhu May 10, 2021, 5:22 p.m. UTC
From: YiFei Zhu <yifeifz2@illinois.edu>

This enables seccomp-eBPF filters to have per-process state even when
the filter is loaded by an unprivileged process. Without CAP_BPF &&
CAP_PERFMON, no access to a pointer to BTF ID is possible, so the only
task pointer the verifier will accept is NULL, and the helper
implementation falls back to the group leader, providing per-process
storage.

Filters loaded by privileged processes may still access the storage
of arbitrary tasks via a valid task_struct ptr to BTF ID.
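
A sketch of the unprivileged case described above; the "seccomp"
section name, map layout, and identifiers are illustrative
(libbpf-style BPF C), not part of this patch:

#include <linux/bpf.h>
#include <linux/seccomp.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, __u64);
} syscall_count SEC(".maps");

SEC("seccomp")
int filter(struct seccomp_data *ctx)
{
	__u64 *cnt;

	/* NULL task: the helper falls back to current->group_leader,
	 * so this is per-process storage with no BTF pointer needed. */
	cnt = bpf_task_storage_get(&syscall_count, NULL, NULL,
				   BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);

	return SECCOMP_RET_ALLOW;
}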

Since task storage requires RCU to be locked, we lock and unlock
RCU around every seccomp-eBPF filter execution.
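
A minimal sketch of what that looks like in the dispatch path (the
function name is illustrative; the kernel/seccomp.c hunk in the diff
below only wires up the helper protos):

static u32 seccomp_run_ebpf(const struct bpf_prog *prog,
			    const struct seccomp_data *sd)
{
	u32 ret;

	/* Task storage lookups assume the RCU read lock is held, so
	 * hold it across the program run. */
	rcu_read_lock();
	ret = BPF_PROG_RUN(prog, sd);
	rcu_read_unlock();

	return ret;
}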

I'm not sure if this is the best way to do this. One, this introduces
a dependency on BPF-LSM. Two, per-thread storage is not accessible
to unprivileged filter loaders; it has to be per-process.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 include/linux/bpf.h           |  2 ++
 kernel/bpf/bpf_task_storage.c | 64 ++++++++++++++++++++++++++++++-----
 kernel/seccomp.c              |  4 +++
 3 files changed, 61 insertions(+), 9 deletions(-)

Comments

Alexei Starovoitov May 11, 2021, 1:58 a.m. UTC | #1
On Mon, May 10, 2021 at 12:22:49PM -0500, YiFei Zhu wrote:
> +
> +BPF_CALL_4(bpf_task_storage_get_default_leader, struct bpf_map *, map,
> +	   struct task_struct *, task, void *, value, u64, flags)
> +{
> +	if (!task)
> +		task = current->group_leader;

Did you actually need it to be group_leader, or is current enough?
If so, loading BTF is not necessary.
You could have exposed bpf_get_current_task_btf() and passed its
return value into bpf_task_storage_get().

On the other hand, loading BTF can be relaxed to unpriv,
but doing a current->group_leader deref will make it priv-only anyway.
YiFei Zhu May 11, 2021, 5:44 a.m. UTC | #2
On Mon, May 10, 2021 at 8:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 10, 2021 at 12:22:49PM -0500, YiFei Zhu wrote:
> > +
> > +BPF_CALL_4(bpf_task_storage_get_default_leader, struct bpf_map *, map,
> > +        struct task_struct *, task, void *, value, u64, flags)
> > +{
> > +     if (!task)
> > +             task = current->group_leader;
>
> Did you actually need it to be group_leader, or is current enough?
> If so, loading BTF is not necessary.

I think if task_storage were to be used to apply a policy onto a set
of tasks, there are probably more use cases for performing state
transitions across an entire process than across a single thread.
Though, since seccomp only applies to the process tree, a lot of use
cases for per-process storage would be covered by a global datasec
too.
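
For illustration, that global-datasec shape looks something like this
(names and the "seccomp" section are illustrative; the global lands in
the object's .bss and is shared by every task running this filter):

#include <linux/bpf.h>
#include <linux/seccomp.h>
#include <bpf/bpf_helpers.h>

/* One copy per loaded filter object, shared across the whole process
 * tree that inherits the filter; no per-process granularity. */
__u64 tree_wide_count;

SEC("seccomp")
int count_calls(struct seccomp_data *ctx)
{
	__sync_fetch_and_add(&tree_wide_count, 1);
	return SECCOMP_RET_ALLOW;
}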

> You could have exposed bpf_get_current_task_btf() and passed its
> return value into bpf_task_storage_get().
>
> On the other hand, loading BTF can be relaxed to unpriv,
> but doing a current->group_leader deref will make it priv-only anyway.

Yeah, that deref is what I was concerned about. It seems that if I
expose BTF structs to a prog type, it gains the ability to deref them,
and I definitely don't want unpriv reading task_structs. Though yeah,
we could potentially change the verifier to prohibit PTR_TO_BTF_ID
deref and any pointer arithmetic on it...

How about we expose bpf_get_current_task_btf to unpriv, prohibit
unpriv deref and pointer arithmetic, and have NULL be
current->group_leader? This way, unpriv has access to both per-thread
and per-process storage.

YiFei Zhu
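
Under that proposal, an unprivileged filter could pick either scope; a
minimal sketch with illustrative names (the per-thread half is not
implemented by the patch below):

#include <linux/bpf.h>
#include <linux/seccomp.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, __u64);
} state SEC(".maps");

SEC("seccomp")
int filter(struct seccomp_data *ctx)
{
	__u64 *thread_val, *proc_val;

	/* Per-thread: a task_struct BTF pointer usable only as a key,
	 * not dereferenceable by unpriv under the proposal. */
	thread_val = bpf_task_storage_get(&state, bpf_get_current_task_btf(),
					  NULL, BPF_LOCAL_STORAGE_GET_F_CREATE);

	/* Per-process: NULL falls back to current->group_leader. */
	proc_val = bpf_task_storage_get(&state, NULL, NULL,
					BPF_LOCAL_STORAGE_GET_F_CREATE);

	if (thread_val && proc_val)
		__sync_fetch_and_add(proc_val, 1);

	return SECCOMP_RET_ALLOW;
}
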
Alexei Starovoitov May 12, 2021, 9:56 p.m. UTC | #3
On Tue, May 11, 2021 at 12:44:46AM -0500, YiFei Zhu wrote:
> On Mon, May 10, 2021 at 8:58 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, May 10, 2021 at 12:22:49PM -0500, YiFei Zhu wrote:
> > > +
> > > +BPF_CALL_4(bpf_task_storage_get_default_leader, struct bpf_map *, map,
> > > +        struct task_struct *, task, void *, value, u64, flags)
> > > +{
> > > +     if (!task)
> > > +             task = current->group_leader;
> >
> > Did you actually need it to be group_leader, or is current enough?
> > If so, loading BTF is not necessary.
> 
> I think if task_storage were to be used to apply a policy onto a set
> of tasks, there are probably more use cases for performing state
> transitions across an entire process than across a single thread.
> Though, since seccomp only applies to the process tree, a lot of use
> cases for per-process storage would be covered by a global datasec
> too.
> 
> > You could have exposed bpf_get_current_task_btf() and passed its
> > return value into bpf_task_storage_get().
> >
> > On the other hand, loading BTF can be relaxed to unpriv,
> > but doing a current->group_leader deref will make it priv-only anyway.
> 
> Yeah, that deref is what I was concerned about. It seems that if I
> expose BTF structs to a prog type, it gains the ability to deref them,
> and I definitely don't want unpriv reading task_structs. Though yeah,
> we could potentially change the verifier to prohibit PTR_TO_BTF_ID
> deref and any pointer arithmetic on it...
> 
> How about we expose bpf_get_current_task_btf to unpriv, prohibit
> unpriv deref and pointer arithmetic, and have NULL be
> current->group_leader? This way, unpriv has access to both per-thread
> and per-process storage.

NULL to mean current->group_leader is too specific and not extensible.
I'd rather add another bpf_get_current_task_btf-like helper that
returns parent or group_leader depending on flags.
For now I wouldn't even do that. As you said, a hashmap with key=pid
will work fine; task_local_storage is a performance optimization.
I'd focus on landing the key bits before optimizing.
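
For reference, a minimal sketch of that hashmap-keyed-by-pid approach
(tgid as the key gives the per-process behavior discussed above; names
are illustrative, and exposing bpf_get_current_pid_tgid() to this prog
type is an assumption):

#include <linux/bpf.h>
#include <linux/seccomp.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u32);	/* tgid */
	__type(value, __u64);
} per_proc SEC(".maps");

SEC("seccomp")
int filter(struct seccomp_data *ctx)
{
	__u32 tgid = bpf_get_current_pid_tgid() >> 32;
	__u64 one = 1, *val;

	val = bpf_map_lookup_elem(&per_proc, &tgid);
	if (val)
		__sync_fetch_and_add(val, 1);
	else
		/* Unlike task storage, entries for dead processes must
		 * be reaped by user space. */
		bpf_map_update_elem(&per_proc, &tgid, &one, BPF_NOEXIST);

	return SECCOMP_RET_ALLOW;
}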

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index efa6444b88d3..7c9755802275 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1964,7 +1964,9 @@  extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
 extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 extern const struct bpf_func_proto bpf_task_storage_get_proto;
+extern const struct bpf_func_proto bpf_task_storage_get_default_leader_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
+extern const struct bpf_func_proto bpf_task_storage_delete_default_leader_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_probe_read_user_proto;
 extern const struct bpf_func_proto bpf_probe_read_user_dumpable_proto;
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 3ce75758d394..5ddf3a92d359 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -224,19 +224,19 @@  static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
-BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
-	   task, void *, value, u64, flags)
+static void *_bpf_task_storage_get(struct bpf_map *map, struct task_struct *task,
+				   void *value, u64 flags)
 {
 	struct bpf_local_storage_data *sdata;
 
 	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
-		return (unsigned long)NULL;
+		return NULL;
 
 	if (!task)
-		return (unsigned long)NULL;
+		return NULL;
 
 	if (!bpf_task_storage_trylock())
-		return (unsigned long)NULL;
+		return NULL;
 
 	sdata = task_storage_lookup(task, map, true);
 	if (sdata)
@@ -251,12 +251,24 @@  BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 
 unlock:
 	bpf_task_storage_unlock();
-	return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL :
-		(unsigned long)sdata->data;
+	return IS_ERR_OR_NULL(sdata) ? NULL : sdata->data;
 }
 
-BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
-	   task)
+BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+	   task, void *, value, u64, flags)
+{
+	return (unsigned long)_bpf_task_storage_get(map, task, value, flags);
+}
+
+BPF_CALL_4(bpf_task_storage_get_default_leader, struct bpf_map *, map,
+	   struct task_struct *, task, void *, value, u64, flags)
+{
+	if (!task)
+		task = current->group_leader;
+	return (unsigned long)_bpf_task_storage_get(map, task, value, flags);
+}
+
+static int _bpf_task_storage_delete(struct bpf_map *map, struct task_struct *task)
 {
 	int ret;
 
@@ -275,6 +287,20 @@  BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
 	return ret;
 }
 
+BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
+	   task)
+{
+	return _bpf_task_storage_delete(map, task);
+}
+
+BPF_CALL_2(bpf_task_storage_delete_default_leader, struct bpf_map *, map,
+	   struct task_struct *, task)
+{
+	if (!task)
+		task = current->group_leader;
+	return _bpf_task_storage_delete(map, task);
+}
+
 static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
 	return -ENOTSUPP;
@@ -330,6 +356,17 @@  const struct bpf_func_proto bpf_task_storage_get_proto = {
 	.arg4_type = ARG_ANYTHING,
 };
 
+const struct bpf_func_proto bpf_task_storage_get_default_leader_proto = {
+	.func = bpf_task_storage_get_default_leader,
+	.gpl_only = false,
+	.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type = ARG_CONST_MAP_PTR,
+	.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id = &bpf_task_storage_btf_ids[0],
+	.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type = ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_task_storage_delete_proto = {
 	.func = bpf_task_storage_delete,
 	.gpl_only = false,
@@ -338,3 +375,12 @@  const struct bpf_func_proto bpf_task_storage_delete_proto = {
 	.arg2_type = ARG_PTR_TO_BTF_ID,
 	.arg2_btf_id = &bpf_task_storage_btf_ids[0],
 };
+
+const struct bpf_func_proto bpf_task_storage_delete_default_leader_proto = {
+	.func = bpf_task_storage_delete_default_leader,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_CONST_MAP_PTR,
+	.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id = &bpf_task_storage_btf_ids[0],
+};
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 330e9c365cdc..5b41b2aee39c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -2457,6 +2457,10 @@  seccomp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return ns_capable(current_user_ns(), CAP_SYS_PTRACE) ?
 			&bpf_probe_read_user_str_proto :
 			&bpf_probe_read_user_dumpable_str_proto;
+	case BPF_FUNC_task_storage_get:
+		return &bpf_task_storage_get_default_leader_proto;
+	case BPF_FUNC_task_storage_delete:
+		return &bpf_task_storage_delete_default_leader_proto;
 	default:
 		break;
 	}