Message ID | 1472121165-29071-10-git-send-email-mic@digikod.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote: > Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) > to compare the current process cgroup with a cgroup handle, The handle > can match the current cgroup if it is the same or a child. This allows > to make conditional rules according to the current cgroup. > > A cgroup handle is a map entry created from a file descriptor referring > a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the > map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the > inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. Can you elaborate on why this is useful? I.e. why not just supply different policies to different subtrees. Also, how does this interact with the current cgroup v1 vs v2 mess? As far as I can tell, no one can even really agree on what "what cgroup am I in" means right now. > > An unprivileged process can create and manipulate cgroups thanks to > cgroup delegation. What is cgroup delegation? --Andy
On 25/08/2016 13:09, Andy Lutomirski wrote: > On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote: >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) >> to compare the current process cgroup with a cgroup handle, The handle >> can match the current cgroup if it is the same or a child. This allows >> to make conditional rules according to the current cgroup. >> >> A cgroup handle is a map entry created from a file descriptor referring >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. > > Can you elaborate on why this is useful? I.e. why not just supply > different policies to different subtrees. The main use case I see is to load the security policies at the start of a user session for all processes but not enforce them right away. The user can then keep a shell for Landlock administration tasks and lock the other processes with a dedicated cgroup on the fly. This allows the user to make unremovable Landlock security policies but only activate them when needed for specific processes. > > Also, how does this interact with the current cgroup v1 vs v2 mess? > As far as I can tell, no one can even really agree on what "what > cgroup am I in" means right now. I tested with cgroup-v2 but indeed, it seems a bit different with cgroup-v1 :) Does anyone know how to handle both cases? > >> >> An unprivileged process can create and manipulate cgroups thanks to >> cgroup delegation. > > What is cgroup delegation? This is simply the action of changing the owner of cgroup sysfs files to allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt) Mickaël
On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote: > Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) > to compare the current process cgroup with a cgroup handle, The handle > can match the current cgroup if it is the same or a child. This allows > to make conditional rules according to the current cgroup. > > A cgroup handle is a map entry created from a file descriptor referring > a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the > map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the > inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. > > An unprivileged process can create and manipulate cgroups thanks to > cgroup delegation. > > Signed-off-by: Mickaël Salaün <mic@digikod.net> ... > +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map, > + u64 r3_map_op, u64 r4, u64 r5) > +{ > + u8 option = (u8) r1_option; > + struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; > + enum bpf_map_array_op map_op = r3_map_op; > + struct bpf_array *array = container_of(map, struct bpf_array, map); > + struct cgroup *cg1, *cg2; > + struct map_landlock_handle *handle; > + int i; > + > + /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */ > + if (unlikely(!map)) { > + WARN_ON(1); > + return -EFAULT; > + } > + if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK)) > + return -EINVAL; > + > + /* for now, only handle OP_OR */ > + switch (map_op) { > + case BPF_MAP_ARRAY_OP_OR: > + break; > + case BPF_MAP_ARRAY_OP_UNSPEC: > + case BPF_MAP_ARRAY_OP_AND: > + case BPF_MAP_ARRAY_OP_XOR: > + default: > + return -EINVAL; > + } > + > + synchronize_rcu(); > + > + for (i = 0; i < array->n_entries; i++) { > + handle = (struct map_landlock_handle *) > + (array->value + array->elem_size * i); > + > + /* protected by the proto types, should not happen */ > + if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) { > + WARN_ON(1); > + return -EFAULT; > + } > + if (unlikely(!handle->css)) { > + WARN_ON(1); > + return -EFAULT; > + } > + > + if (option & LANDLOCK_FLAG_OPT_REVERSE) { > + cg1 = handle->css->cgroup; > + cg2 = task_css_set(current)->dfl_cgrp; > + } else { > + cg1 = task_css_set(current)->dfl_cgrp; > + cg2 = handle->css->cgroup; > + } > + > + if (cgroup_is_descendant(cg1, cg2)) > + return 0; > + } > + return 1; > +} - please take a loook at exisiting bpf_current_task_under_cgroup and reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array is nothing but duplication of the code. - I don't think such 'for' loop can scale. The solution needs to work with thousands of containers and thousands of cgroups. In the patch 06/10 the proposal is to use 'current' as holder of the programs: + for (prog = current->seccomp.landlock_prog; + prog; prog = prog->prev) { + if (prog->filter == landlock_ret->filter) { + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); + break; + } + } imo that's the root of scalability issue. I think to be able to scale the bpf programs have to be attached to cgroups instead of tasks. That would be very different api. seccomp doesn't need to be touched. But that is the only way I see to be able to scale. May be another way of thinking about it is 'lsm cgroup controller' that Sargun is proposing. The lsm hooks will provide stable execution points and the programs will be called like: prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id]; BPF_PROG_RUN(prog, ctx); The delegation functionality and 'prog_effective' logic that Daniel Mack is proposing will be fully reused here. External container management software will be able to apply bpf programs to control tasks under cgroup and such bpf_landlock_cmp_cgroup_beneath() helper won't be necessary. The user will be able to register different programs for different lsm hooks. If I understand the patch 6/10 correctly, there is one (or a list) prog for all lsm hooks per task which is not flexible enough. Anoop Naravaram's use case is to control the ports the applications under cgroup can bind and listen on. Such use case can be solved by such 'lsm cgroup controller' by attaching bpf program to security_socket_bind lsm hook and filtering sockaddr. Furthermore Sargun's use case is to allow further sockaddr rewrites from the bpf program which can be done as natural extension of such mechanism. If I understood Daniel's Anoop's Sargun's and yours use cases correctly the common piece of kernel infrastructure that can solve them all can start from Daniel's current set of patches that establish a mechanism of attaching bpf program to a cgroup. Then adding lsm hooks to it and later allowing argument rewrite (since they're already in the kernel and no ToCToU problems exist) As far as safety and type checking that bpf programs has to do, I like the approach of patch 06/10: +LANDLOCK_HOOK2(file_open, FILE_OPEN, + PTR_TO_STRUCT_FILE, struct file *, file, + PTR_TO_STRUCT_CRED, const struct cred *, cred +) teaching verifier to recognize struct file, cred, sockaddr will let bpf program access them naturally without any overhead. Though: @@ -102,6 +102,9 @@ enum bpf_prog_type { BPF_PROG_TYPE_SCHED_CLS, BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_TRACEPOINT, + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, }; is a bit of overkill. I think it would be cleaner to have single BPF_PROG_TYPE_LSM and at program load time pass lsm_hook_id as well, so that verifier can do safety checks based on type info provided in LANDLOCK_HOOKs
Hello, On Thu, Aug 25, 2016 at 04:44:13PM +0200, Mickaël Salaün wrote: > I tested with cgroup-v2 but indeed, it seems a bit different with > cgroup-v1 :) > Does anyone know how to handle both cases? If you wanna do cgroup membership test, just do cgroup v2 membership test. No need to introduce a new controller and possibly struct sock association field for that. That's what all new cgroup aware network operations are using anyway and doesn't conflicts with whether other controllers are v1 or v2. For examples of using cgroup v2 membership test, please take a look at cgroup_mt_v1(). Thanks.
On Thu, Aug 25, 2016 at 7:44 AM, Mickaël Salaün <mic@digikod.net> wrote: > > On 25/08/2016 13:09, Andy Lutomirski wrote: >> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün <mic@digikod.net> wrote: >>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) >>> to compare the current process cgroup with a cgroup handle, The handle >>> can match the current cgroup if it is the same or a child. This allows >>> to make conditional rules according to the current cgroup. >>> >>> A cgroup handle is a map entry created from a file descriptor referring >>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the >>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the >>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. >> >> Can you elaborate on why this is useful? I.e. why not just supply >> different policies to different subtrees. > > The main use case I see is to load the security policies at the start of > a user session for all processes but not enforce them right away. The > user can then keep a shell for Landlock administration tasks and lock > the other processes with a dedicated cgroup on the fly. This allows the > user to make unremovable Landlock security policies but only activate > them when needed for specific processes. This seems like a bit of a dubious use case to me. The landlock mechanism should be flexible enough to do this kind of thing even without cgroups, and "spawn a process, wait a while, and then confine it by fiddling with cgroups" seems a lot dicier than just loading the right policy in the first place, especially since eBPF policies can be stateful. > >> >> Also, how does this interact with the current cgroup v1 vs v2 mess? >> As far as I can tell, no one can even really agree on what "what >> cgroup am I in" means right now. > > I tested with cgroup-v2 but indeed, it seems a bit different with > cgroup-v1 :) > Does anyone know how to handle both cases? > >> >>> >>> An unprivileged process can create and manipulate cgroups thanks to >>> cgroup delegation. >> >> What is cgroup delegation? > > This is simply the action of changing the owner of cgroup sysfs files to > allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt) As far as I can tell, Tejun and systemd both actively discourage doing this. Maybe I misunderstand. But in any event, the admin giving you a cgroup hierarchy you can use for this means that the admin has to cooperate with your policy, and it further requires (with cgroup v2 or similar, which is most likely the future) that your lockdown policy be compatible with your resource control policy. I would suggest dropping this lockdown feature until a use case emerges that really can't be addressed adequately without it. --Andy
On 26/08/2016 04:14, Alexei Starovoitov wrote: > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote: >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) >> to compare the current process cgroup with a cgroup handle, The handle >> can match the current cgroup if it is the same or a child. This allows >> to make conditional rules according to the current cgroup. >> >> A cgroup handle is a map entry created from a file descriptor referring >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. >> >> An unprivileged process can create and manipulate cgroups thanks to >> cgroup delegation. >> >> Signed-off-by: Mickaël Salaün <mic@digikod.net> > ... >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map, >> + u64 r3_map_op, u64 r4, u64 r5) >> +{ >> + u8 option = (u8) r1_option; >> + struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; >> + enum bpf_map_array_op map_op = r3_map_op; >> + struct bpf_array *array = container_of(map, struct bpf_array, map); >> + struct cgroup *cg1, *cg2; >> + struct map_landlock_handle *handle; >> + int i; >> + >> + /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */ >> + if (unlikely(!map)) { >> + WARN_ON(1); >> + return -EFAULT; >> + } >> + if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK)) >> + return -EINVAL; >> + >> + /* for now, only handle OP_OR */ >> + switch (map_op) { >> + case BPF_MAP_ARRAY_OP_OR: >> + break; >> + case BPF_MAP_ARRAY_OP_UNSPEC: >> + case BPF_MAP_ARRAY_OP_AND: >> + case BPF_MAP_ARRAY_OP_XOR: >> + default: >> + return -EINVAL; >> + } >> + >> + synchronize_rcu(); >> + >> + for (i = 0; i < array->n_entries; i++) { >> + handle = (struct map_landlock_handle *) >> + (array->value + array->elem_size * i); >> + >> + /* protected by the proto types, should not happen */ >> + if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) { >> + WARN_ON(1); >> + return -EFAULT; >> + } >> + if (unlikely(!handle->css)) { >> + WARN_ON(1); >> + return -EFAULT; >> + } >> + >> + if (option & LANDLOCK_FLAG_OPT_REVERSE) { >> + cg1 = handle->css->cgroup; >> + cg2 = task_css_set(current)->dfl_cgrp; >> + } else { >> + cg1 = task_css_set(current)->dfl_cgrp; >> + cg2 = handle->css->cgroup; >> + } >> + >> + if (cgroup_is_descendant(cg1, cg2)) >> + return 0; >> + } >> + return 1; >> +} > > - please take a loook at exisiting bpf_current_task_under_cgroup and > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array > is nothing but duplication of the code. Oh, I didn't know about this patchset and the new helper. Indeed, it looks a lot like mine except there is no static verification of the map type as I did with the arraymap of handles, and no batch mode either. I think the return value of bpf_current_task_under_cgroup is error-prone if an eBPF program do an "if(ret)" test on the value (because of the negative ERRNO return value). Inverting the 0 and 1 return values should fix this (0 == succeed, 1 == failed, <0 == error). To sum up, there is four related patchsets: * "Landlock LSM: Unprivileged sandboxing" (this series) * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon) * "Networking cgroup controller" (Anoop Naravaram) * "Add eBPF hooks for cgroups" (Daniel Mack) The three other series (Sargun's, Anoop's and Daniel's) are mainly focused on network access-control via cgroup for *containers*. As far as I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's goal is to empower all processes (privileged or not) to create their own sandbox. This also means, like explained in "[RFC v2 00/10] Landlock LSM: Unprivileged sandboxing", there is more constraints. For example, it is not acceptable to let a process probe the kernel memory as it wish. More details are in the Landlock cover-letter. Another important point is that supporting cgroup for Landlock is optional. It does not rely on cgroup to be usable but is only a feature available when (unprivileged) users can manage there own cgroup, which is an important constraint. Put another way, Landlock should not rely on cgroup to create sandboxes. Indeed, a process creating a sandbox do not necessarily have access to the cgroup mount point (directly or not). > > - I don't think such 'for' loop can scale. The solution needs to work > with thousands of containers and thousands of cgroups. > In the patch 06/10 the proposal is to use 'current' as holder of > the programs: > + for (prog = current->seccomp.landlock_prog; > + prog; prog = prog->prev) { > + if (prog->filter == landlock_ret->filter) { > + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > + break; > + } > + } > imo that's the root of scalability issue. > I think to be able to scale the bpf programs have to be attached to > cgroups instead of tasks. > That would be very different api. seccomp doesn't need to be touched. > But that is the only way I see to be able to scale. Landlock is inspired from seccomp which also use a BPF program per thread. For seccomp, each BPF programs are executed for each syscall. For Landlock, some BPF programs are executed for some LSM hooks. I don't see why it is a scale issue for Landlock comparing to seccomp. I also don't see why storing the BPF program list pointer in the cgroup struct instead of the task struct change a lot here. The BPF programs execution will be the same anyway (for each LSM hook). Kees should probably have a better opinion on this. > May be another way of thinking about it is 'lsm cgroup controller' > that Sargun is proposing. > The lsm hooks will provide stable execution points and the programs > will be called like: > prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id]; > BPF_PROG_RUN(prog, ctx); > The delegation functionality and 'prog_effective' logic that > Daniel Mack is proposing will be fully reused here. > External container management software will be able to apply bpf > programs to control tasks under cgroup and such > bpf_landlock_cmp_cgroup_beneath() helper won't be necessary. > The user will be able to register different programs for different lsm hooks. > If I understand the patch 6/10 correctly, there is one (or a list) prog for > all lsm hooks per task which is not flexible enough. For each LSM hook triggered by a thread, all of its Landlock eBPF programs (dedicated for this kind of hook) will be evaluated (if needed). This is the same behavior as seccomp (list of BPF programs attached to a process hierarchy) except the BPF programs are not evaluated for syscall but for LSM hooks. There is no way to make it more fine-grained :) > Anoop Naravaram's use case is to control the ports the applications > under cgroup can bind and listen on. > Such use case can be solved by such 'lsm cgroup controller' by > attaching bpf program to security_socket_bind lsm hook and > filtering sockaddr. > Furthermore Sargun's use case is to allow further sockaddr rewrites > from the bpf program which can be done as natural extension > of such mechanism. > > If I understood Daniel's Anoop's Sargun's and yours use cases > correctly the common piece of kernel infrastructure that can solve > them all can start from Daniel's current set of patches that > establish a mechanism of attaching bpf program to a cgroup. > Then adding lsm hooks to it and later allowing argument rewrite > (since they're already in the kernel and no ToCToU problems exist) To sum up, the pieces we have in common are the eBPF use and (optionally for Landlock) there execution depending of the current cgroup. Moreover, the three other series (Sargun's, Anoop's and Daniel's) do not deal with unprivileged process which is the main purpose of Landlock. I'm not sure that allowing sockaddr rewrites is a good idea here... Like other LSMs, Landlock is dedicated to access-control. For the network-related series, I think it make more sense to simply create a netfilter rule matching a cgroup and then add more features to netfilter (restrict port ranges and so on) thanks to eBPF programs. Containers are (usually) in a dedicated network namespace, which open the possibility to not only rely on cgroups (e.g. match UID, netmask...). It would also be more flexible to be able to load a BPF program in netfilter and update its maps on the fly to make dynamic rules, like ipset does, but in a more generic way. > > As far as safety and type checking that bpf programs has to do, > I like the approach of patch 06/10: > +LANDLOCK_HOOK2(file_open, FILE_OPEN, > + PTR_TO_STRUCT_FILE, struct file *, file, > + PTR_TO_STRUCT_CRED, const struct cred *, cred > +) > teaching verifier to recognize struct file, cred, sockaddr > will let bpf program access them naturally without any overhead. > Though: > @@ -102,6 +102,9 @@ enum bpf_prog_type { > BPF_PROG_TYPE_SCHED_CLS, > BPF_PROG_TYPE_SCHED_ACT, > BPF_PROG_TYPE_TRACEPOINT, > + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, > + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, > + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, > }; > is a bit of overkill. > I think it would be cleaner to have single > BPF_PROG_TYPE_LSM and at program load time pass > lsm_hook_id as well, so that verifier can do safety checks > based on type info provided in LANDLOCK_HOOKs I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF verifier check programs according to their types. If we need to check specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a dedicated program types. I don't see any other way to do it with the current verifier code. Moreover it's the purpose of program types, right? Regards, Mickaël
Hello, On Fri, Aug 26, 2016 at 07:20:35AM -0700, Andy Lutomirski wrote: > > This is simply the action of changing the owner of cgroup sysfs files to > > allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt) > > As far as I can tell, Tejun and systemd both actively discourage doing > this. Maybe I misunderstand. But in any event, the admin giving you Please refer to "2-5. Delegation" of Documentation/cgroup-v2.txt. Delegation on v1 is broken on both core and specific controller behaviors and thus discouraged. On v2, delegation should work just fine. I haven't looked in detail but in general I'm not too excited about layering security mechanism on top of cgroup. Maybe it makes some sense when security domain coincides with resource domains but at any rate please keep me in the loop. Thanks.
On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > trimming cc list again. When it's too big vger will consider it as spam. > On 26/08/2016 04:14, Alexei Starovoitov wrote: > > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote: > >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) > >> to compare the current process cgroup with a cgroup handle, The handle > >> can match the current cgroup if it is the same or a child. This allows > >> to make conditional rules according to the current cgroup. > >> > >> A cgroup handle is a map entry created from a file descriptor referring > >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the > >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the > >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. > >> > >> An unprivileged process can create and manipulate cgroups thanks to > >> cgroup delegation. > >> > >> Signed-off-by: Mickaël Salaün <mic@digikod.net> > > ... > >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map, > >> + u64 r3_map_op, u64 r4, u64 r5) > >> +{ > >> + u8 option = (u8) r1_option; > >> + struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; > >> + enum bpf_map_array_op map_op = r3_map_op; > >> + struct bpf_array *array = container_of(map, struct bpf_array, map); > >> + struct cgroup *cg1, *cg2; > >> + struct map_landlock_handle *handle; > >> + int i; > >> + > >> + /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */ > >> + if (unlikely(!map)) { > >> + WARN_ON(1); > >> + return -EFAULT; > >> + } > >> + if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK)) > >> + return -EINVAL; > >> + > >> + /* for now, only handle OP_OR */ > >> + switch (map_op) { > >> + case BPF_MAP_ARRAY_OP_OR: > >> + break; > >> + case BPF_MAP_ARRAY_OP_UNSPEC: > >> + case BPF_MAP_ARRAY_OP_AND: > >> + case BPF_MAP_ARRAY_OP_XOR: > >> + default: > >> + return -EINVAL; > >> + } > >> + > >> + synchronize_rcu(); > >> + > >> + for (i = 0; i < array->n_entries; i++) { > >> + handle = (struct map_landlock_handle *) > >> + (array->value + array->elem_size * i); > >> + > >> + /* protected by the proto types, should not happen */ > >> + if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) { > >> + WARN_ON(1); > >> + return -EFAULT; > >> + } > >> + if (unlikely(!handle->css)) { > >> + WARN_ON(1); > >> + return -EFAULT; > >> + } > >> + > >> + if (option & LANDLOCK_FLAG_OPT_REVERSE) { > >> + cg1 = handle->css->cgroup; > >> + cg2 = task_css_set(current)->dfl_cgrp; > >> + } else { > >> + cg1 = task_css_set(current)->dfl_cgrp; > >> + cg2 = handle->css->cgroup; > >> + } > >> + > >> + if (cgroup_is_descendant(cg1, cg2)) > >> + return 0; > >> + } > >> + return 1; > >> +} > > > > - please take a loook at exisiting bpf_current_task_under_cgroup and > > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array > > is nothing but duplication of the code. > > Oh, I didn't know about this patchset and the new helper. Indeed, it > looks a lot like mine except there is no static verification of the map > type as I did with the arraymap of handles, and no batch mode either. I > think the return value of bpf_current_task_under_cgroup is error-prone > if an eBPF program do an "if(ret)" test on the value (because of the > negative ERRNO return value). Inverting the 0 and 1 return values should > fix this (0 == succeed, 1 == failed, <0 == error). nothing to fix. It's good as-is. Use if (ret > 0) instead. > > To sum up, there is four related patchsets: > * "Landlock LSM: Unprivileged sandboxing" (this series) > * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon) > * "Networking cgroup controller" (Anoop Naravaram) > * "Add eBPF hooks for cgroups" (Daniel Mack) > > The three other series (Sargun's, Anoop's and Daniel's) are mainly > focused on network access-control via cgroup for *containers*. As far as > I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's > goal is to empower all processes (privileged or not) to create their own > sandbox. This also means, like explained in "[RFC v2 00/10] Landlock > LSM: Unprivileged sandboxing", there is more constraints. For example, > it is not acceptable to let a process probe the kernel memory as it > wish. More details are in the Landlock cover-letter. > > > Another important point is that supporting cgroup for Landlock is > optional. It does not rely on cgroup to be usable but is only a feature > available when (unprivileged) users can manage there own cgroup, which > is an important constraint. Put another way, Landlock should not rely on > cgroup to create sandboxes. Indeed, a process creating a sandbox do not > necessarily have access to the cgroup mount point (directly or not). cgroup is the common way to group multiple tasks. Without cgroup only parent<->child relationship will be possible, which will limit usability of such lsm to a master task that controls its children. Such api restriction would have been ok, if we could extend it in the future, but unfortunately task-centric won't allow it without creating a parallel lsm that is cgroup based. Therefore I think we have to go with cgroup-centric api and your application has to use cgroups from the start though only parent-child would have been enough. Also I don't think the kernel can afford two bpf based lsm. One task based and another cgroup based, so we have to find common ground that suits both use cases. Having unprivliged access is a subset. There is no strong reason why cgroup+lsm+bpf should be limited to root only always. When we can guarantee no pointer leaks, we can allow unpriv. > > > > - I don't think such 'for' loop can scale. The solution needs to work > > with thousands of containers and thousands of cgroups. > > In the patch 06/10 the proposal is to use 'current' as holder of > > the programs: > > + for (prog = current->seccomp.landlock_prog; > > + prog; prog = prog->prev) { > > + if (prog->filter == landlock_ret->filter) { > > + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > > + break; > > + } > > + } > > imo that's the root of scalability issue. > > I think to be able to scale the bpf programs have to be attached to > > cgroups instead of tasks. > > That would be very different api. seccomp doesn't need to be touched. > > But that is the only way I see to be able to scale. > > Landlock is inspired from seccomp which also use a BPF program per > thread. For seccomp, each BPF programs are executed for each syscall. > For Landlock, some BPF programs are executed for some LSM hooks. I don't > see why it is a scale issue for Landlock comparing to seccomp. I also > don't see why storing the BPF program list pointer in the cgroup struct > instead of the task struct change a lot here. The BPF programs execution > will be the same anyway (for each LSM hook). Kees should probably have a > better opinion on this. seccomp has its own issues and copying them doesn't make this lsm any better. Like seccomp bpf programs are all gigantic switch statement that looks for interesting syscall numbers. All syscalls of a task are paying non-trivial seccomp penalty due to such design. If bpf was attached per syscall it would have been much cheaper. Of course doing it this way for seccomp is not easy, but for lsm such facility is already there. Blank call of a single bpf prog for all lsm hooks is unnecessary overhead that can and should be avoided. > > May be another way of thinking about it is 'lsm cgroup controller' > > that Sargun is proposing. > > The lsm hooks will provide stable execution points and the programs > > will be called like: > > prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id]; > > BPF_PROG_RUN(prog, ctx); > > The delegation functionality and 'prog_effective' logic that > > Daniel Mack is proposing will be fully reused here. > > External container management software will be able to apply bpf > > programs to control tasks under cgroup and such > > bpf_landlock_cmp_cgroup_beneath() helper won't be necessary. > > The user will be able to register different programs for different lsm hooks. > > If I understand the patch 6/10 correctly, there is one (or a list) prog for > > all lsm hooks per task which is not flexible enough. > > For each LSM hook triggered by a thread, all of its Landlock eBPF > programs (dedicated for this kind of hook) will be evaluated (if > needed). This is the same behavior as seccomp (list of BPF programs > attached to a process hierarchy) except the BPF programs are not > evaluated for syscall but for LSM hooks. There is no way to make it more > fine-grained :) There is a way to attach different bpf program per cgroup and per lsm hook. Such approach drastically reduces overhead of sandboxed application. > > Anoop Naravaram's use case is to control the ports the applications > > under cgroup can bind and listen on. > > Such use case can be solved by such 'lsm cgroup controller' by > > attaching bpf program to security_socket_bind lsm hook and > > filtering sockaddr. > > Furthermore Sargun's use case is to allow further sockaddr rewrites > > from the bpf program which can be done as natural extension > > of such mechanism. > > > > If I understood Daniel's Anoop's Sargun's and yours use cases > > correctly the common piece of kernel infrastructure that can solve > > them all can start from Daniel's current set of patches that > > establish a mechanism of attaching bpf program to a cgroup. > > Then adding lsm hooks to it and later allowing argument rewrite > > (since they're already in the kernel and no ToCToU problems exist) > > To sum up, the pieces we have in common are the eBPF use and (optionally > for Landlock) there execution depending of the current cgroup. > > Moreover, the three other series (Sargun's, Anoop's and Daniel's) do not > deal with unprivileged process which is the main purpose of Landlock. > I'm not sure that allowing sockaddr rewrites is a good idea here... Like > other LSMs, Landlock is dedicated to access-control. we have to find common ground and common infra to solve all these use cases. Pointing out the differences isn't going to make this snowflake any more special. > For the network-related series, I think it make more sense to simply > create a netfilter rule matching a cgroup and then add more features to > netfilter (restrict port ranges and so on) thanks to eBPF programs. > Containers are (usually) in a dedicated network namespace, which open > the possibility to not only rely on cgroups (e.g. match UID, > netmask...). It would also be more flexible to be able to load a BPF > program in netfilter and update its maps on the fly to make dynamic > rules, like ipset does, but in a more generic way. > > > > > > As far as safety and type checking that bpf programs has to do, > > I like the approach of patch 06/10: > > +LANDLOCK_HOOK2(file_open, FILE_OPEN, > > + PTR_TO_STRUCT_FILE, struct file *, file, > > + PTR_TO_STRUCT_CRED, const struct cred *, cred > > +) > > teaching verifier to recognize struct file, cred, sockaddr > > will let bpf program access them naturally without any overhead. > > Though: > > @@ -102,6 +102,9 @@ enum bpf_prog_type { > > BPF_PROG_TYPE_SCHED_CLS, > > BPF_PROG_TYPE_SCHED_ACT, > > BPF_PROG_TYPE_TRACEPOINT, > > + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, > > + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, > > + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, > > }; > > is a bit of overkill. > > I think it would be cleaner to have single > > BPF_PROG_TYPE_LSM and at program load time pass > > lsm_hook_id as well, so that verifier can do safety checks > > based on type info provided in LANDLOCK_HOOKs > > I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF > verifier check programs according to their types. If we need to check > specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a > dedicated program types. I don't see any other way to do it with the > current verifier code. Moreover it's the purpose of program types, right? Adding new bpf program type for every lsm hook is not acceptable. Either do one new program type + pass lsm_hook_id as suggested or please come up with an alternative approach.
On Aug 27, 2016 1:05 AM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote: > > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > > > > trimming cc list again. When it's too big vger will consider it as spam. > > > On 26/08/2016 04:14, Alexei Starovoitov wrote: > > > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote: > > >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) > > >> to compare the current process cgroup with a cgroup handle, The handle > > >> can match the current cgroup if it is the same or a child. This allows > > >> to make conditional rules according to the current cgroup. > > >> > > >> A cgroup handle is a map entry created from a file descriptor referring > > >> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the > > >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the > > >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. > > >> > > >> An unprivileged process can create and manipulate cgroups thanks to > > >> cgroup delegation. > > >> > > >> Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > ... > > >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map, > > >> + u64 r3_map_op, u64 r4, u64 r5) > > >> +{ > > >> + u8 option = (u8) r1_option; > > >> + struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; > > >> + enum bpf_map_array_op map_op = r3_map_op; > > >> + struct bpf_array *array = container_of(map, struct bpf_array, map); > > >> + struct cgroup *cg1, *cg2; > > >> + struct map_landlock_handle *handle; > > >> + int i; > > >> + > > >> + /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */ > > >> + if (unlikely(!map)) { > > >> + WARN_ON(1); > > >> + return -EFAULT; > > >> + } > > >> + if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK)) > > >> + return -EINVAL; > > >> + > > >> + /* for now, only handle OP_OR */ > > >> + switch (map_op) { > > >> + case BPF_MAP_ARRAY_OP_OR: > > >> + break; > > >> + case BPF_MAP_ARRAY_OP_UNSPEC: > > >> + case BPF_MAP_ARRAY_OP_AND: > > >> + case BPF_MAP_ARRAY_OP_XOR: > > >> + default: > > >> + return -EINVAL; > > >> + } > > >> + > > >> + synchronize_rcu(); > > >> + > > >> + for (i = 0; i < array->n_entries; i++) { > > >> + handle = (struct map_landlock_handle *) > > >> + (array->value + array->elem_size * i); > > >> + > > >> + /* protected by the proto types, should not happen */ > > >> + if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) { > > >> + WARN_ON(1); > > >> + return -EFAULT; > > >> + } > > >> + if (unlikely(!handle->css)) { > > >> + WARN_ON(1); > > >> + return -EFAULT; > > >> + } > > >> + > > >> + if (option & LANDLOCK_FLAG_OPT_REVERSE) { > > >> + cg1 = handle->css->cgroup; > > >> + cg2 = task_css_set(current)->dfl_cgrp; > > >> + } else { > > >> + cg1 = task_css_set(current)->dfl_cgrp; > > >> + cg2 = handle->css->cgroup; > > >> + } > > >> + > > >> + if (cgroup_is_descendant(cg1, cg2)) > > >> + return 0; > > >> + } > > >> + return 1; > > >> +} > > > > > > - please take a loook at exisiting bpf_current_task_under_cgroup and > > > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array > > > is nothing but duplication of the code. > > > > Oh, I didn't know about this patchset and the new helper. Indeed, it > > looks a lot like mine except there is no static verification of the map > > type as I did with the arraymap of handles, and no batch mode either. I > > think the return value of bpf_current_task_under_cgroup is error-prone > > if an eBPF program do an "if(ret)" test on the value (because of the > > negative ERRNO return value). Inverting the 0 and 1 return values should > > fix this (0 == succeed, 1 == failed, <0 == error). > > nothing to fix. It's good as-is. Use if (ret > 0) instead. > > > > > To sum up, there is four related patchsets: > > * "Landlock LSM: Unprivileged sandboxing" (this series) > > * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon) > > * "Networking cgroup controller" (Anoop Naravaram) > > * "Add eBPF hooks for cgroups" (Daniel Mack) > > > > The three other series (Sargun's, Anoop's and Daniel's) are mainly > > focused on network access-control via cgroup for *containers*. As far as > > I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's > > goal is to empower all processes (privileged or not) to create their own > > sandbox. This also means, like explained in "[RFC v2 00/10] Landlock > > LSM: Unprivileged sandboxing", there is more constraints. For example, > > it is not acceptable to let a process probe the kernel memory as it > > wish. More details are in the Landlock cover-letter. > > > > > > Another important point is that supporting cgroup for Landlock is > > optional. It does not rely on cgroup to be usable but is only a feature > > available when (unprivileged) users can manage there own cgroup, which > > is an important constraint. Put another way, Landlock should not rely on > > cgroup to create sandboxes. Indeed, a process creating a sandbox do not > > necessarily have access to the cgroup mount point (directly or not). > > cgroup is the common way to group multiple tasks. > Without cgroup only parent<->child relationship will be possible, > which will limit usability of such lsm to a master task that controls > its children. Such api restriction would have been ok, if we could > extend it in the future, but unfortunately task-centric won't allow it > without creating a parallel lsm that is cgroup based. > Therefore I think we have to go with cgroup-centric api and your > application has to use cgroups from the start though only parent-child > would have been enough. > Also I don't think the kernel can afford two bpf based lsm. One task > based and another cgroup based, so we have to find common ground > that suits both use cases. > Having unprivliged access is a subset. There is no strong reason why > cgroup+lsm+bpf should be limited to root only always. > When we can guarantee no pointer leaks, we can allow unpriv. I don't really understand what you mean. In the context of landlock, which is a *sandbox*, can one of you explain a use case that materially benefits from this type of cgroup usage? I haven't thought of one.
On 27/08/2016 01:05, Alexei Starovoitov wrote: > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >> >>> >>> - I don't think such 'for' loop can scale. The solution needs to work >>> with thousands of containers and thousands of cgroups. >>> In the patch 06/10 the proposal is to use 'current' as holder of >>> the programs: >>> + for (prog = current->seccomp.landlock_prog; >>> + prog; prog = prog->prev) { >>> + if (prog->filter == landlock_ret->filter) { >>> + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); >>> + break; >>> + } >>> + } >>> imo that's the root of scalability issue. >>> I think to be able to scale the bpf programs have to be attached to >>> cgroups instead of tasks. >>> That would be very different api. seccomp doesn't need to be touched. >>> But that is the only way I see to be able to scale. >> >> Landlock is inspired from seccomp which also use a BPF program per >> thread. For seccomp, each BPF programs are executed for each syscall. >> For Landlock, some BPF programs are executed for some LSM hooks. I don't >> see why it is a scale issue for Landlock comparing to seccomp. I also >> don't see why storing the BPF program list pointer in the cgroup struct >> instead of the task struct change a lot here. The BPF programs execution >> will be the same anyway (for each LSM hook). Kees should probably have a >> better opinion on this. > > seccomp has its own issues and copying them doesn't make this lsm any better. > Like seccomp bpf programs are all gigantic switch statement that looks > for interesting syscall numbers. All syscalls of a task are paying > non-trivial seccomp penalty due to such design. If bpf was attached per > syscall it would have been much cheaper. Of course doing it this way > for seccomp is not easy, but for lsm such facility is already there. > Blank call of a single bpf prog for all lsm hooks is unnecessary > overhead that can and should be avoided. It's probably a misunderstanding. Contrary to seccomp which run all the thread's BPF programs for any syscall, Landlock only run eBPF programs for the triggered LSM hooks, if their type match. Indeed, thanks to the multiple eBPF program types and contrary to seccomp, Landlock only run an eBPF program when needed. Landlock will have almost no performance overhead if the syscalls do not trigger the watched LSM hooks for the current process. > >>> May be another way of thinking about it is 'lsm cgroup controller' >>> that Sargun is proposing. >>> The lsm hooks will provide stable execution points and the programs >>> will be called like: >>> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id]; >>> BPF_PROG_RUN(prog, ctx); >>> The delegation functionality and 'prog_effective' logic that >>> Daniel Mack is proposing will be fully reused here. >>> External container management software will be able to apply bpf >>> programs to control tasks under cgroup and such >>> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary. >>> The user will be able to register different programs for different lsm hooks. >>> If I understand the patch 6/10 correctly, there is one (or a list) prog for >>> all lsm hooks per task which is not flexible enough. >> >> For each LSM hook triggered by a thread, all of its Landlock eBPF >> programs (dedicated for this kind of hook) will be evaluated (if >> needed). This is the same behavior as seccomp (list of BPF programs >> attached to a process hierarchy) except the BPF programs are not >> evaluated for syscall but for LSM hooks. There is no way to make it more >> fine-grained :) > > There is a way to attach different bpf program per cgroup > and per lsm hook. Such approach drastically reduces overhead > of sandboxed application. As said above, Landlock will not run an eBPF programs when not strictly needed. Attaching to a cgroup will have the same performance impact as attaching to a process hierarchy.
On 27/08/2016 01:05, Alexei Starovoitov wrote: > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >> To sum up, there is four related patchsets: >> * "Landlock LSM: Unprivileged sandboxing" (this series) >> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon) >> * "Networking cgroup controller" (Anoop Naravaram) >> * "Add eBPF hooks for cgroups" (Daniel Mack) >>> Anoop Naravaram's use case is to control the ports the applications >>> under cgroup can bind and listen on. >>> Such use case can be solved by such 'lsm cgroup controller' by >>> attaching bpf program to security_socket_bind lsm hook and >>> filtering sockaddr. >>> Furthermore Sargun's use case is to allow further sockaddr rewrites >>> from the bpf program which can be done as natural extension >>> of such mechanism. >>> >>> If I understood Daniel's Anoop's Sargun's and yours use cases >>> correctly the common piece of kernel infrastructure that can solve >>> them all can start from Daniel's current set of patches that >>> establish a mechanism of attaching bpf program to a cgroup. >>> Then adding lsm hooks to it and later allowing argument rewrite >>> (since they're already in the kernel and no ToCToU problems exist) >> For the network-related series, I think it make more sense to simply >> create a netfilter rule matching a cgroup and then add more features to >> netfilter (restrict port ranges and so on) thanks to eBPF programs. >> Containers are (usually) in a dedicated network namespace, which open >> the possibility to not only rely on cgroups (e.g. match UID, >> netmask...). It would also be more flexible to be able to load a BPF >> program in netfilter and update its maps on the fly to make dynamic >> rules, like ipset does, but in a more generic way. What do the netdev folks think about this design?
On 27/08/2016 01:05, Alexei Starovoitov wrote: > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > >>> As far as safety and type checking that bpf programs has to do, >>> I like the approach of patch 06/10: >>> +LANDLOCK_HOOK2(file_open, FILE_OPEN, >>> + PTR_TO_STRUCT_FILE, struct file *, file, >>> + PTR_TO_STRUCT_CRED, const struct cred *, cred >>> +) >>> teaching verifier to recognize struct file, cred, sockaddr >>> will let bpf program access them naturally without any overhead. >>> Though: >>> @@ -102,6 +102,9 @@ enum bpf_prog_type { >>> BPF_PROG_TYPE_SCHED_CLS, >>> BPF_PROG_TYPE_SCHED_ACT, >>> BPF_PROG_TYPE_TRACEPOINT, >>> + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, >>> + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, >>> + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, >>> }; >>> is a bit of overkill. >>> I think it would be cleaner to have single >>> BPF_PROG_TYPE_LSM and at program load time pass >>> lsm_hook_id as well, so that verifier can do safety checks >>> based on type info provided in LANDLOCK_HOOKs >> >> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF >> verifier check programs according to their types. If we need to check >> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a >> dedicated program types. I don't see any other way to do it with the >> current verifier code. Moreover it's the purpose of program types, right? > > Adding new bpf program type for every lsm hook is not acceptable. > Either do one new program type + pass lsm_hook_id as suggested > or please come up with an alternative approach. OK, so we have to modify the verifier to not only rely on the program type but on another value to check the context accesses. Do you have a hint from where this value could come from? Do we need to add a new bpf command to associate a program to a subtype?
On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: > > On 27/08/2016 01:05, Alexei Starovoitov wrote: > > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > >> > >>> > >>> - I don't think such 'for' loop can scale. The solution needs to work > >>> with thousands of containers and thousands of cgroups. > >>> In the patch 06/10 the proposal is to use 'current' as holder of > >>> the programs: > >>> + for (prog = current->seccomp.landlock_prog; > >>> + prog; prog = prog->prev) { > >>> + if (prog->filter == landlock_ret->filter) { > >>> + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > >>> + break; > >>> + } > >>> + } > >>> imo that's the root of scalability issue. > >>> I think to be able to scale the bpf programs have to be attached to > >>> cgroups instead of tasks. > >>> That would be very different api. seccomp doesn't need to be touched. > >>> But that is the only way I see to be able to scale. > >> > >> Landlock is inspired from seccomp which also use a BPF program per > >> thread. For seccomp, each BPF programs are executed for each syscall. > >> For Landlock, some BPF programs are executed for some LSM hooks. I don't > >> see why it is a scale issue for Landlock comparing to seccomp. I also > >> don't see why storing the BPF program list pointer in the cgroup struct > >> instead of the task struct change a lot here. The BPF programs execution > >> will be the same anyway (for each LSM hook). Kees should probably have a > >> better opinion on this. > > > > seccomp has its own issues and copying them doesn't make this lsm any better. > > Like seccomp bpf programs are all gigantic switch statement that looks > > for interesting syscall numbers. All syscalls of a task are paying > > non-trivial seccomp penalty due to such design. If bpf was attached per > > syscall it would have been much cheaper. Of course doing it this way > > for seccomp is not easy, but for lsm such facility is already there. > > Blank call of a single bpf prog for all lsm hooks is unnecessary > > overhead that can and should be avoided. > > It's probably a misunderstanding. Contrary to seccomp which run all the > thread's BPF programs for any syscall, Landlock only run eBPF programs > for the triggered LSM hooks, if their type match. Indeed, thanks to the > multiple eBPF program types and contrary to seccomp, Landlock only run > an eBPF program when needed. Landlock will have almost no performance > overhead if the syscalls do not trigger the watched LSM hooks for the > current process. that's not what I see in the patch 06/10: all lsm_hooks in 'static struct security_hook_list landlock_hooks' (which eventually means all lsm hooks) will call static inline int landlock_hook_##NAME which will call landlock_run_prog() which does: + for (landlock_ret = current->seccomp.landlock_ret; + landlock_ret; landlock_ret = landlock_ret->prev) { + if (landlock_ret->triggered) { + ctx.cookie = landlock_ret->cookie; + for (prog = current->seccomp.landlock_prog; + prog; prog = prog->prev) { + if (prog->filter == landlock_ret->filter) { + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); + break; + } + } that is unacceptable overhead and not a scalable design. It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale as soon as more lsm hooks are added. > As said above, Landlock will not run an eBPF programs when not strictly > needed. Attaching to a cgroup will have the same performance impact as > attaching to a process hierarchy. Having a prog per cgroup per lsm_hook is the only scalable way I could come up with. If you see another way, please propose. current->seccomp.landlock_prog is not the answer.
On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote: > > cgroup is the common way to group multiple tasks. > > Without cgroup only parent<->child relationship will be possible, > > which will limit usability of such lsm to a master task that controls > > its children. Such api restriction would have been ok, if we could > > extend it in the future, but unfortunately task-centric won't allow it > > without creating a parallel lsm that is cgroup based. > > Therefore I think we have to go with cgroup-centric api and your > > application has to use cgroups from the start though only parent-child > > would have been enough. > > Also I don't think the kernel can afford two bpf based lsm. One task > > based and another cgroup based, so we have to find common ground > > that suits both use cases. > > Having unprivliged access is a subset. There is no strong reason why > > cgroup+lsm+bpf should be limited to root only always. > > When we can guarantee no pointer leaks, we can allow unpriv. > > I don't really understand what you mean. In the context of landlock, > which is a *sandbox*, can one of you explain a use case that > materially benefits from this type of cgroup usage? I haven't thought > of one. In case of seccomp-like sandbox where parent controls child processes cgroup is not needed. It's needed when container management software needs to control a set of applications. If we can have one bpf-based lsm that works via cgroup and without, I'd be fine with it. Right now I haven't seen a plausible proposal to do that. Therefore cgroup based api is a common api that works for sandbox as well, though requiring parent to create a cgroup just to control a single child is cumbersome.
On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote: > > On 27/08/2016 01:05, Alexei Starovoitov wrote: > > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > > > >>> As far as safety and type checking that bpf programs has to do, > >>> I like the approach of patch 06/10: > >>> +LANDLOCK_HOOK2(file_open, FILE_OPEN, > >>> + PTR_TO_STRUCT_FILE, struct file *, file, > >>> + PTR_TO_STRUCT_CRED, const struct cred *, cred > >>> +) > >>> teaching verifier to recognize struct file, cred, sockaddr > >>> will let bpf program access them naturally without any overhead. > >>> Though: > >>> @@ -102,6 +102,9 @@ enum bpf_prog_type { > >>> BPF_PROG_TYPE_SCHED_CLS, > >>> BPF_PROG_TYPE_SCHED_ACT, > >>> BPF_PROG_TYPE_TRACEPOINT, > >>> + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, > >>> + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, > >>> + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, > >>> }; > >>> is a bit of overkill. > >>> I think it would be cleaner to have single > >>> BPF_PROG_TYPE_LSM and at program load time pass > >>> lsm_hook_id as well, so that verifier can do safety checks > >>> based on type info provided in LANDLOCK_HOOKs > >> > >> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF > >> verifier check programs according to their types. If we need to check > >> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a > >> dedicated program types. I don't see any other way to do it with the > >> current verifier code. Moreover it's the purpose of program types, right? > > > > Adding new bpf program type for every lsm hook is not acceptable. > > Either do one new program type + pass lsm_hook_id as suggested > > or please come up with an alternative approach. > > OK, so we have to modify the verifier to not only rely on the program > type but on another value to check the context accesses. Do you have a > hint from where this value could come from? Do we need to add a new bpf > command to associate a program to a subtype? It's another field prog_subtype (or prog_hook_id) in union bpf_attr. Both prog_type and prog_hook_id are used during verification. prog_type distinguishes the main aspects whereas prog_hook_id selects which lsm_hook's argument definition to apply. At the time of attaching to a hook, the prog_hook_id passed at the load time should match lsm's hook_id.
On Sat, Aug 27, 2016 at 04:19:05PM +0200, Mickaël Salaün wrote: > > On 27/08/2016 01:05, Alexei Starovoitov wrote: > > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > >> To sum up, there is four related patchsets: > >> * "Landlock LSM: Unprivileged sandboxing" (this series) > >> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon) > >> * "Networking cgroup controller" (Anoop Naravaram) > >> * "Add eBPF hooks for cgroups" (Daniel Mack) > > >>> Anoop Naravaram's use case is to control the ports the applications > >>> under cgroup can bind and listen on. > >>> Such use case can be solved by such 'lsm cgroup controller' by > >>> attaching bpf program to security_socket_bind lsm hook and > >>> filtering sockaddr. > >>> Furthermore Sargun's use case is to allow further sockaddr rewrites > >>> from the bpf program which can be done as natural extension > >>> of such mechanism. > >>> > >>> If I understood Daniel's Anoop's Sargun's and yours use cases > >>> correctly the common piece of kernel infrastructure that can solve > >>> them all can start from Daniel's current set of patches that > >>> establish a mechanism of attaching bpf program to a cgroup. > >>> Then adding lsm hooks to it and later allowing argument rewrite > >>> (since they're already in the kernel and no ToCToU problems exist) > > >> For the network-related series, I think it make more sense to simply > >> create a netfilter rule matching a cgroup and then add more features to > >> netfilter (restrict port ranges and so on) thanks to eBPF programs. > >> Containers are (usually) in a dedicated network namespace, which open > >> the possibility to not only rely on cgroups (e.g. match UID, > >> netmask...). It would also be more flexible to be able to load a BPF > >> program in netfilter and update its maps on the fly to make dynamic > >> rules, like ipset does, but in a more generic way. > > What do the netdev folks think about this design? such design doesn't scale when used for container management and that's what we need to solve. netns has its overhead and management issues. There are proposals to solve that but that is orthogonal to containers in general. A lot of them don't use netns.
On 27/08/2016 20:06, Alexei Starovoitov wrote: > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >> >> On 27/08/2016 01:05, Alexei Starovoitov wrote: >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >>>> >>>>> >>>>> - I don't think such 'for' loop can scale. The solution needs to work >>>>> with thousands of containers and thousands of cgroups. >>>>> In the patch 06/10 the proposal is to use 'current' as holder of >>>>> the programs: >>>>> + for (prog = current->seccomp.landlock_prog; >>>>> + prog; prog = prog->prev) { >>>>> + if (prog->filter == landlock_ret->filter) { >>>>> + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); >>>>> + break; >>>>> + } >>>>> + } >>>>> imo that's the root of scalability issue. >>>>> I think to be able to scale the bpf programs have to be attached to >>>>> cgroups instead of tasks. >>>>> That would be very different api. seccomp doesn't need to be touched. >>>>> But that is the only way I see to be able to scale. >>>> >>>> Landlock is inspired from seccomp which also use a BPF program per >>>> thread. For seccomp, each BPF programs are executed for each syscall. >>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't >>>> see why it is a scale issue for Landlock comparing to seccomp. I also >>>> don't see why storing the BPF program list pointer in the cgroup struct >>>> instead of the task struct change a lot here. The BPF programs execution >>>> will be the same anyway (for each LSM hook). Kees should probably have a >>>> better opinion on this. >>> >>> seccomp has its own issues and copying them doesn't make this lsm any better. >>> Like seccomp bpf programs are all gigantic switch statement that looks >>> for interesting syscall numbers. All syscalls of a task are paying >>> non-trivial seccomp penalty due to such design. If bpf was attached per >>> syscall it would have been much cheaper. Of course doing it this way >>> for seccomp is not easy, but for lsm such facility is already there. >>> Blank call of a single bpf prog for all lsm hooks is unnecessary >>> overhead that can and should be avoided. >> >> It's probably a misunderstanding. Contrary to seccomp which run all the >> thread's BPF programs for any syscall, Landlock only run eBPF programs >> for the triggered LSM hooks, if their type match. Indeed, thanks to the >> multiple eBPF program types and contrary to seccomp, Landlock only run >> an eBPF program when needed. Landlock will have almost no performance >> overhead if the syscalls do not trigger the watched LSM hooks for the >> current process. > > that's not what I see in the patch 06/10: > all lsm_hooks in 'static struct security_hook_list landlock_hooks' > (which eventually means all lsm hooks) will call > static inline int landlock_hook_##NAME > which will call landlock_run_prog() > which does: > + for (landlock_ret = current->seccomp.landlock_ret; > + landlock_ret; landlock_ret = landlock_ret->prev) { > + if (landlock_ret->triggered) { > + ctx.cookie = landlock_ret->cookie; > + for (prog = current->seccomp.landlock_prog; > + prog; prog = prog->prev) { > + if (prog->filter == landlock_ret->filter) { > + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > + break; > + } > + } > > that is unacceptable overhead and not a scalable design. > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale > as soon as more lsm hooks are added. Good catch! I forgot to check the program (sub)type in the loop to only run the needed programs for the current hook. I will fix this. > >> As said above, Landlock will not run an eBPF programs when not strictly >> needed. Attaching to a cgroup will have the same performance impact as >> attaching to a process hierarchy. > > Having a prog per cgroup per lsm_hook is the only scalable way I > could come up with. If you see another way, please propose. > current->seccomp.landlock_prog is not the answer. Hum, I don't see the difference from a performance point of view between a cgroup-based or a process hierarchy-based system. Maybe a better option should be to use an array of pointers with N entries, one for each supported hook, instead of a unique pointer list? Anyway, being able to attach an LSM hook program to a cgroup thanks to the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility to use a process hierarchy). The downside will be to handle an LSM hook program which is not triggered by a seccomp-filter, but this should be needed anyway to handle interruptions.
On 27/08/2016 20:19, Alexei Starovoitov wrote: > On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote: >> >> On 27/08/2016 01:05, Alexei Starovoitov wrote: >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >>> >>>>> As far as safety and type checking that bpf programs has to do, >>>>> I like the approach of patch 06/10: >>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN, >>>>> + PTR_TO_STRUCT_FILE, struct file *, file, >>>>> + PTR_TO_STRUCT_CRED, const struct cred *, cred >>>>> +) >>>>> teaching verifier to recognize struct file, cred, sockaddr >>>>> will let bpf program access them naturally without any overhead. >>>>> Though: >>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type { >>>>> BPF_PROG_TYPE_SCHED_CLS, >>>>> BPF_PROG_TYPE_SCHED_ACT, >>>>> BPF_PROG_TYPE_TRACEPOINT, >>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, >>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, >>>>> + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, >>>>> }; >>>>> is a bit of overkill. >>>>> I think it would be cleaner to have single >>>>> BPF_PROG_TYPE_LSM and at program load time pass >>>>> lsm_hook_id as well, so that verifier can do safety checks >>>>> based on type info provided in LANDLOCK_HOOKs >>>> >>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF >>>> verifier check programs according to their types. If we need to check >>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a >>>> dedicated program types. I don't see any other way to do it with the >>>> current verifier code. Moreover it's the purpose of program types, right? >>> >>> Adding new bpf program type for every lsm hook is not acceptable. >>> Either do one new program type + pass lsm_hook_id as suggested >>> or please come up with an alternative approach. >> >> OK, so we have to modify the verifier to not only rely on the program >> type but on another value to check the context accesses. Do you have a >> hint from where this value could come from? Do we need to add a new bpf >> command to associate a program to a subtype? > > It's another field prog_subtype (or prog_hook_id) in union bpf_attr. > Both prog_type and prog_hook_id are used during verification. > prog_type distinguishes the main aspects whereas prog_hook_id selects > which lsm_hook's argument definition to apply. > At the time of attaching to a hook, the prog_hook_id passed at the > load time should match lsm's hook_id. OK, so this new prog_subtype field should be use/set by a new bpf_cmd, right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?
On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: > > On 27/08/2016 20:06, Alexei Starovoitov wrote: > > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: > >> > >> On 27/08/2016 01:05, Alexei Starovoitov wrote: > >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > >>>> > >>>>> > >>>>> - I don't think such 'for' loop can scale. The solution needs to work > >>>>> with thousands of containers and thousands of cgroups. > >>>>> In the patch 06/10 the proposal is to use 'current' as holder of > >>>>> the programs: > >>>>> + for (prog = current->seccomp.landlock_prog; > >>>>> + prog; prog = prog->prev) { > >>>>> + if (prog->filter == landlock_ret->filter) { > >>>>> + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > >>>>> + break; > >>>>> + } > >>>>> + } > >>>>> imo that's the root of scalability issue. > >>>>> I think to be able to scale the bpf programs have to be attached to > >>>>> cgroups instead of tasks. > >>>>> That would be very different api. seccomp doesn't need to be touched. > >>>>> But that is the only way I see to be able to scale. > >>>> > >>>> Landlock is inspired from seccomp which also use a BPF program per > >>>> thread. For seccomp, each BPF programs are executed for each syscall. > >>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't > >>>> see why it is a scale issue for Landlock comparing to seccomp. I also > >>>> don't see why storing the BPF program list pointer in the cgroup struct > >>>> instead of the task struct change a lot here. The BPF programs execution > >>>> will be the same anyway (for each LSM hook). Kees should probably have a > >>>> better opinion on this. > >>> > >>> seccomp has its own issues and copying them doesn't make this lsm any better. > >>> Like seccomp bpf programs are all gigantic switch statement that looks > >>> for interesting syscall numbers. All syscalls of a task are paying > >>> non-trivial seccomp penalty due to such design. If bpf was attached per > >>> syscall it would have been much cheaper. Of course doing it this way > >>> for seccomp is not easy, but for lsm such facility is already there. > >>> Blank call of a single bpf prog for all lsm hooks is unnecessary > >>> overhead that can and should be avoided. > >> > >> It's probably a misunderstanding. Contrary to seccomp which run all the > >> thread's BPF programs for any syscall, Landlock only run eBPF programs > >> for the triggered LSM hooks, if their type match. Indeed, thanks to the > >> multiple eBPF program types and contrary to seccomp, Landlock only run > >> an eBPF program when needed. Landlock will have almost no performance > >> overhead if the syscalls do not trigger the watched LSM hooks for the > >> current process. > > > > that's not what I see in the patch 06/10: > > all lsm_hooks in 'static struct security_hook_list landlock_hooks' > > (which eventually means all lsm hooks) will call > > static inline int landlock_hook_##NAME > > which will call landlock_run_prog() > > which does: > > + for (landlock_ret = current->seccomp.landlock_ret; > > + landlock_ret; landlock_ret = landlock_ret->prev) { > > + if (landlock_ret->triggered) { > > + ctx.cookie = landlock_ret->cookie; > > + for (prog = current->seccomp.landlock_prog; > > + prog; prog = prog->prev) { > > + if (prog->filter == landlock_ret->filter) { > > + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > > + break; > > + } > > + } > > > > that is unacceptable overhead and not a scalable design. > > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale > > as soon as more lsm hooks are added. > > Good catch! I forgot to check the program (sub)type in the loop to only > run the needed programs for the current hook. I will fix this. > > > > > >> As said above, Landlock will not run an eBPF programs when not strictly > >> needed. Attaching to a cgroup will have the same performance impact as > >> attaching to a process hierarchy. > > > > Having a prog per cgroup per lsm_hook is the only scalable way I > > could come up with. If you see another way, please propose. > > current->seccomp.landlock_prog is not the answer. > > Hum, I don't see the difference from a performance point of view between > a cgroup-based or a process hierarchy-based system. > > Maybe a better option should be to use an array of pointers with N > entries, one for each supported hook, instead of a unique pointer list? yes, clearly array dereference is faster than link list walk. Now the question is where to keep this prog_array[num_lsm_hooks] ? Since we cannot keep it inside task_struct, we have to allocate it. Every time the task is creted then. What to do on the fork? That will require changes all over. Then the obvious optimization would be to share this allocated array of prog pointers across multiple tasks... and little by little this new facility will look like cgroup. Hence the suggestion to put this array into cgroup from the start. > Anyway, being able to attach an LSM hook program to a cgroup thanks to > the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility > to use a process hierarchy). The downside will be to handle an LSM hook > program which is not triggered by a seccomp-filter, but this should be > needed anyway to handle interruptions. what do you mean 'not triggered by seccomp' ? You're not suggesting that this lsm has to enable seccomp to be functional? imo that's non starter due to overhead.
On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote: > > On 27/08/2016 20:19, Alexei Starovoitov wrote: > > On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote: > >> > >> On 27/08/2016 01:05, Alexei Starovoitov wrote: > >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: > >>> > >>>>> As far as safety and type checking that bpf programs has to do, > >>>>> I like the approach of patch 06/10: > >>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN, > >>>>> + PTR_TO_STRUCT_FILE, struct file *, file, > >>>>> + PTR_TO_STRUCT_CRED, const struct cred *, cred > >>>>> +) > >>>>> teaching verifier to recognize struct file, cred, sockaddr > >>>>> will let bpf program access them naturally without any overhead. > >>>>> Though: > >>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type { > >>>>> BPF_PROG_TYPE_SCHED_CLS, > >>>>> BPF_PROG_TYPE_SCHED_ACT, > >>>>> BPF_PROG_TYPE_TRACEPOINT, > >>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, > >>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, > >>>>> + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, > >>>>> }; > >>>>> is a bit of overkill. > >>>>> I think it would be cleaner to have single > >>>>> BPF_PROG_TYPE_LSM and at program load time pass > >>>>> lsm_hook_id as well, so that verifier can do safety checks > >>>>> based on type info provided in LANDLOCK_HOOKs > >>>> > >>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF > >>>> verifier check programs according to their types. If we need to check > >>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a > >>>> dedicated program types. I don't see any other way to do it with the > >>>> current verifier code. Moreover it's the purpose of program types, right? > >>> > >>> Adding new bpf program type for every lsm hook is not acceptable. > >>> Either do one new program type + pass lsm_hook_id as suggested > >>> or please come up with an alternative approach. > >> > >> OK, so we have to modify the verifier to not only rely on the program > >> type but on another value to check the context accesses. Do you have a > >> hint from where this value could come from? Do we need to add a new bpf > >> command to associate a program to a subtype? > > > > It's another field prog_subtype (or prog_hook_id) in union bpf_attr. > > Both prog_type and prog_hook_id are used during verification. > > prog_type distinguishes the main aspects whereas prog_hook_id selects > > which lsm_hook's argument definition to apply. > > At the time of attaching to a hook, the prog_hook_id passed at the > > load time should match lsm's hook_id. > > OK, so this new prog_subtype field should be use/set by a new bpf_cmd, > right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA? No new command. It will be an optional field to existing BPF_PROG_LOAD. In other words instead of replicating everything that different bpf_prog_type-s need, we can pass this hook_id field to fine tune the purpose (and applicability) of the program. And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks. For example existing BPF_PROG_TYPE_TRACEPOINT checks the safety for all tracepoints while they all have different arguments. For tracepoints it's easier, since the only difference between them is the range of ctx access. Here we need strong type safety of arguments therefore need extra hook_id at load time.
On 27/08/2016 22:43, Alexei Starovoitov wrote: > On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>> As said above, Landlock will not run an eBPF programs when not strictly >>>> needed. Attaching to a cgroup will have the same performance impact as >>>> attaching to a process hierarchy. >>> >>> Having a prog per cgroup per lsm_hook is the only scalable way I >>> could come up with. If you see another way, please propose. >>> current->seccomp.landlock_prog is not the answer. >> >> Hum, I don't see the difference from a performance point of view between >> a cgroup-based or a process hierarchy-based system. >> >> Maybe a better option should be to use an array of pointers with N >> entries, one for each supported hook, instead of a unique pointer list? > > yes, clearly array dereference is faster than link list walk. > Now the question is where to keep this prog_array[num_lsm_hooks] ? > Since we cannot keep it inside task_struct, we have to allocate it. > Every time the task is creted then. What to do on the fork? That > will require changes all over. Then the obvious optimization would be > to share this allocated array of prog pointers across multiple tasks... > and little by little this new facility will look like cgroup. > Hence the suggestion to put this array into cgroup from the start. I see your point :) > >> Anyway, being able to attach an LSM hook program to a cgroup thanks to >> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >> to use a process hierarchy). The downside will be to handle an LSM hook >> program which is not triggered by a seccomp-filter, but this should be >> needed anyway to handle interruptions. > > what do you mean 'not triggered by seccomp' ? > You're not suggesting that this lsm has to enable seccomp to be functional? > imo that's non starter due to overhead. Yes, for now, it is triggered by a new seccomp filter return value RET_LANDLOCK, which can take a 16-bit value called cookie. This must not be needed but could be useful to bind a seccomp filter security policy with a Landlock one. Waiting for Kees's point of view…
On 27/08/2016 22:56, Alexei Starovoitov wrote: > On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote: >> >> On 27/08/2016 20:19, Alexei Starovoitov wrote: >>> On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote: >>>> >>>> On 27/08/2016 01:05, Alexei Starovoitov wrote: >>>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >>>>> >>>>>>> As far as safety and type checking that bpf programs has to do, >>>>>>> I like the approach of patch 06/10: >>>>>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN, >>>>>>> + PTR_TO_STRUCT_FILE, struct file *, file, >>>>>>> + PTR_TO_STRUCT_CRED, const struct cred *, cred >>>>>>> +) >>>>>>> teaching verifier to recognize struct file, cred, sockaddr >>>>>>> will let bpf program access them naturally without any overhead. >>>>>>> Though: >>>>>>> @@ -102,6 +102,9 @@ enum bpf_prog_type { >>>>>>> BPF_PROG_TYPE_SCHED_CLS, >>>>>>> BPF_PROG_TYPE_SCHED_ACT, >>>>>>> BPF_PROG_TYPE_TRACEPOINT, >>>>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, >>>>>>> + BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION, >>>>>>> + BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, >>>>>>> }; >>>>>>> is a bit of overkill. >>>>>>> I think it would be cleaner to have single >>>>>>> BPF_PROG_TYPE_LSM and at program load time pass >>>>>>> lsm_hook_id as well, so that verifier can do safety checks >>>>>>> based on type info provided in LANDLOCK_HOOKs >>>>>> >>>>>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF >>>>>> verifier check programs according to their types. If we need to check >>>>>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a >>>>>> dedicated program types. I don't see any other way to do it with the >>>>>> current verifier code. Moreover it's the purpose of program types, right? >>>>> >>>>> Adding new bpf program type for every lsm hook is not acceptable. >>>>> Either do one new program type + pass lsm_hook_id as suggested >>>>> or please come up with an alternative approach. >>>> >>>> OK, so we have to modify the verifier to not only rely on the program >>>> type but on another value to check the context accesses. Do you have a >>>> hint from where this value could come from? Do we need to add a new bpf >>>> command to associate a program to a subtype? >>> >>> It's another field prog_subtype (or prog_hook_id) in union bpf_attr. >>> Both prog_type and prog_hook_id are used during verification. >>> prog_type distinguishes the main aspects whereas prog_hook_id selects >>> which lsm_hook's argument definition to apply. >>> At the time of attaching to a hook, the prog_hook_id passed at the >>> load time should match lsm's hook_id. >> >> OK, so this new prog_subtype field should be use/set by a new bpf_cmd, >> right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA? > > No new command. It will be an optional field to existing BPF_PROG_LOAD. > In other words instead of replicating everything that different > bpf_prog_type-s need, we can pass this hook_id field to fine tune > the purpose (and applicability) of the program. > And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks. > For example existing BPF_PROG_TYPE_TRACEPOINT checks the safety > for all tracepoints while they all have different arguments. > For tracepoints it's easier, since the only difference between them > is the range of ctx access. Here we need strong type safety > of arguments therefore need extra hook_id at load time. OK, I will do it.
On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: > > > On 27/08/2016 22:43, Alexei Starovoitov wrote: > > On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: > >> On 27/08/2016 20:06, Alexei Starovoitov wrote: > >>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: > >>>> As said above, Landlock will not run an eBPF programs when not strictly > >>>> needed. Attaching to a cgroup will have the same performance impact as > >>>> attaching to a process hierarchy. > >>> > >>> Having a prog per cgroup per lsm_hook is the only scalable way I > >>> could come up with. If you see another way, please propose. > >>> current->seccomp.landlock_prog is not the answer. > >> > >> Hum, I don't see the difference from a performance point of view between > >> a cgroup-based or a process hierarchy-based system. > >> > >> Maybe a better option should be to use an array of pointers with N > >> entries, one for each supported hook, instead of a unique pointer list? > > > > yes, clearly array dereference is faster than link list walk. > > Now the question is where to keep this prog_array[num_lsm_hooks] ? > > Since we cannot keep it inside task_struct, we have to allocate it. > > Every time the task is creted then. What to do on the fork? That > > will require changes all over. Then the obvious optimization would be > > to share this allocated array of prog pointers across multiple tasks... > > and little by little this new facility will look like cgroup. > > Hence the suggestion to put this array into cgroup from the start. > > I see your point :) > > > > >> Anyway, being able to attach an LSM hook program to a cgroup thanks to > >> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility > >> to use a process hierarchy). The downside will be to handle an LSM hook > >> program which is not triggered by a seccomp-filter, but this should be > >> needed anyway to handle interruptions. > > > > what do you mean 'not triggered by seccomp' ? > > You're not suggesting that this lsm has to enable seccomp to be functional? > > imo that's non starter due to overhead. > > Yes, for now, it is triggered by a new seccomp filter return value > RET_LANDLOCK, which can take a 16-bit value called cookie. This must not > be needed but could be useful to bind a seccomp filter security policy > with a Landlock one. Waiting for Kees's point of view… > I'm not Kees, but I'd be okay with that. I still think that doing this by process hierarchy a la seccomp will be easier to use and to understand (which is quite important for this kind of work) than doing it by cgroup. A feature I've wanted to add for a while is to have an fd that represents a seccomp layer, the idea being that you would set up your seccomp layer (with syscall filter, landlock hooks, etc) and then you would have a syscall to install that layer. Then an unprivileged sandbox manager could set up its layer and still be able to inject new processes into it later on, no cgroups needed. --Andy
On Aug 27, 2016 8:12 PM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote: > > On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote: > > > cgroup is the common way to group multiple tasks. > > > Without cgroup only parent<->child relationship will be possible, > > > which will limit usability of such lsm to a master task that controls > > > its children. Such api restriction would have been ok, if we could > > > extend it in the future, but unfortunately task-centric won't allow it > > > without creating a parallel lsm that is cgroup based. > > > Therefore I think we have to go with cgroup-centric api and your > > > application has to use cgroups from the start though only parent-child > > > would have been enough. > > > Also I don't think the kernel can afford two bpf based lsm. One task > > > based and another cgroup based, so we have to find common ground > > > that suits both use cases. > > > Having unprivliged access is a subset. There is no strong reason why > > > cgroup+lsm+bpf should be limited to root only always. > > > When we can guarantee no pointer leaks, we can allow unpriv. > > > > I don't really understand what you mean. In the context of landlock, > > which is a *sandbox*, can one of you explain a use case that > > materially benefits from this type of cgroup usage? I haven't thought > > of one. > > In case of seccomp-like sandbox where parent controls child processes > cgroup is not needed. It's needed when container management software > needs to control a set of applications. If we can have one bpf-based lsm > that works via cgroup and without, I'd be fine with it. Right now > I haven't seen a plausible proposal to do that. Therefore cgroup based > api is a common api that works for sandbox as well, though requiring > parent to create a cgroup just to control a single child is cumbersome. > I don't believe that a common API can work to accomplish your goal. For privileged container management, the manager is trusted. For unprivileged sandboxing, the manager is emphatically not trusted, which means you need special rules like NO_NEW_PRIVS, and, unless you want to start restricting setuid and such in some cgroups, you really do need a different interface for joining the sandbox than whatever the container manager is using. What could make sense is to have one BPF-based LSM that supports both a seccomp-like unprivileged interface and a cgroup-based privileged interface. Most of the code for it is the BPF part anyway -- all that the cgroup or seccomp part needs to do is to figure out which BPF program(s) to call. Also, for container management software, you don't really need everything tied to cgroup -- you just need a way to cleanly add new processes to the same security context.
On 28/08/2016 10:13, Andy Lutomirski wrote: > On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: >> >> >> On 27/08/2016 22:43, Alexei Starovoitov wrote: >>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>>>> As said above, Landlock will not run an eBPF programs when not strictly >>>>>> needed. Attaching to a cgroup will have the same performance impact as >>>>>> attaching to a process hierarchy. >>>>> >>>>> Having a prog per cgroup per lsm_hook is the only scalable way I >>>>> could come up with. If you see another way, please propose. >>>>> current->seccomp.landlock_prog is not the answer. >>>> >>>> Hum, I don't see the difference from a performance point of view between >>>> a cgroup-based or a process hierarchy-based system. >>>> >>>> Maybe a better option should be to use an array of pointers with N >>>> entries, one for each supported hook, instead of a unique pointer list? >>> >>> yes, clearly array dereference is faster than link list walk. >>> Now the question is where to keep this prog_array[num_lsm_hooks] ? >>> Since we cannot keep it inside task_struct, we have to allocate it. >>> Every time the task is creted then. What to do on the fork? That >>> will require changes all over. Then the obvious optimization would be >>> to share this allocated array of prog pointers across multiple tasks... >>> and little by little this new facility will look like cgroup. >>> Hence the suggestion to put this array into cgroup from the start. >> >> I see your point :) >> >>> >>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to >>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >>>> to use a process hierarchy). The downside will be to handle an LSM hook >>>> program which is not triggered by a seccomp-filter, but this should be >>>> needed anyway to handle interruptions. >>> >>> what do you mean 'not triggered by seccomp' ? >>> You're not suggesting that this lsm has to enable seccomp to be functional? >>> imo that's non starter due to overhead. >> >> Yes, for now, it is triggered by a new seccomp filter return value >> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not >> be needed but could be useful to bind a seccomp filter security policy >> with a Landlock one. Waiting for Kees's point of view… >> > > I'm not Kees, but I'd be okay with that. I still think that doing > this by process hierarchy a la seccomp will be easier to use and to > understand (which is quite important for this kind of work) than doing > it by cgroup. > > A feature I've wanted to add for a while is to have an fd that > represents a seccomp layer, the idea being that you would set up your > seccomp layer (with syscall filter, landlock hooks, etc) and then you > would have a syscall to install that layer. Then an unprivileged > sandbox manager could set up its layer and still be able to inject new > processes into it later on, no cgroups needed. A nice thing I didn't highlight about Landlock is that a process can prepare a layer of rules (arraymap of handles + Landlock programs) and pass the file descriptors of the Landlock programs to another process. This process could then apply this programs to get sandboxed. However, for now, because a Landlock program is only triggered by a seccomp filter (which do not follow the Landlock programs as a FD), they will be useless. The FD referring to an arraymap of handles can also be used to update a map and change the behavior of a Landlock program. A master process can then add or remove restrictions to another process hierarchy on the fly. However, I think it would make more sense to use cgroups if we want to move an existing (unwilling) unsandoxed process into a sandboxed environment. Of course, some more no_new_privs checks would be needed.
On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: > > > On 28/08/2016 10:13, Andy Lutomirski wrote: >> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: >>> >>> >>> On 27/08/2016 22:43, Alexei Starovoitov wrote: >>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>>>>> As said above, Landlock will not run an eBPF programs when not strictly >>>>>>> needed. Attaching to a cgroup will have the same performance impact as >>>>>>> attaching to a process hierarchy. >>>>>> >>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I >>>>>> could come up with. If you see another way, please propose. >>>>>> current->seccomp.landlock_prog is not the answer. >>>>> >>>>> Hum, I don't see the difference from a performance point of view between >>>>> a cgroup-based or a process hierarchy-based system. >>>>> >>>>> Maybe a better option should be to use an array of pointers with N >>>>> entries, one for each supported hook, instead of a unique pointer list? >>>> >>>> yes, clearly array dereference is faster than link list walk. >>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? >>>> Since we cannot keep it inside task_struct, we have to allocate it. >>>> Every time the task is creted then. What to do on the fork? That >>>> will require changes all over. Then the obvious optimization would be >>>> to share this allocated array of prog pointers across multiple tasks... >>>> and little by little this new facility will look like cgroup. >>>> Hence the suggestion to put this array into cgroup from the start. >>> >>> I see your point :) >>> >>>> >>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to >>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >>>>> to use a process hierarchy). The downside will be to handle an LSM hook >>>>> program which is not triggered by a seccomp-filter, but this should be >>>>> needed anyway to handle interruptions. >>>> >>>> what do you mean 'not triggered by seccomp' ? >>>> You're not suggesting that this lsm has to enable seccomp to be functional? >>>> imo that's non starter due to overhead. >>> >>> Yes, for now, it is triggered by a new seccomp filter return value >>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not >>> be needed but could be useful to bind a seccomp filter security policy >>> with a Landlock one. Waiting for Kees's point of view… >>> >> >> I'm not Kees, but I'd be okay with that. I still think that doing >> this by process hierarchy a la seccomp will be easier to use and to >> understand (which is quite important for this kind of work) than doing >> it by cgroup. >> >> A feature I've wanted to add for a while is to have an fd that >> represents a seccomp layer, the idea being that you would set up your >> seccomp layer (with syscall filter, landlock hooks, etc) and then you >> would have a syscall to install that layer. Then an unprivileged >> sandbox manager could set up its layer and still be able to inject new >> processes into it later on, no cgroups needed. > > A nice thing I didn't highlight about Landlock is that a process can > prepare a layer of rules (arraymap of handles + Landlock programs) and > pass the file descriptors of the Landlock programs to another process. > This process could then apply this programs to get sandboxed. However, > for now, because a Landlock program is only triggered by a seccomp > filter (which do not follow the Landlock programs as a FD), they will be > useless. > > The FD referring to an arraymap of handles can also be used to update a > map and change the behavior of a Landlock program. A master process can > then add or remove restrictions to another process hierarchy on the fly. Maybe this could be extended a little bit. The fd could hold the seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give the ability to install it and FMODE_WRITE could give the ability to modify it.
On 30/08/2016 20:55, Andy Lutomirski wrote: > On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: >> >> >> On 28/08/2016 10:13, Andy Lutomirski wrote: >>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: >>>> >>>> >>>> On 27/08/2016 22:43, Alexei Starovoitov wrote: >>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly >>>>>>>> needed. Attaching to a cgroup will have the same performance impact as >>>>>>>> attaching to a process hierarchy. >>>>>>> >>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I >>>>>>> could come up with. If you see another way, please propose. >>>>>>> current->seccomp.landlock_prog is not the answer. >>>>>> >>>>>> Hum, I don't see the difference from a performance point of view between >>>>>> a cgroup-based or a process hierarchy-based system. >>>>>> >>>>>> Maybe a better option should be to use an array of pointers with N >>>>>> entries, one for each supported hook, instead of a unique pointer list? >>>>> >>>>> yes, clearly array dereference is faster than link list walk. >>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? >>>>> Since we cannot keep it inside task_struct, we have to allocate it. >>>>> Every time the task is creted then. What to do on the fork? That >>>>> will require changes all over. Then the obvious optimization would be >>>>> to share this allocated array of prog pointers across multiple tasks... >>>>> and little by little this new facility will look like cgroup. >>>>> Hence the suggestion to put this array into cgroup from the start. >>>> >>>> I see your point :) >>>> >>>>> >>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to >>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >>>>>> to use a process hierarchy). The downside will be to handle an LSM hook >>>>>> program which is not triggered by a seccomp-filter, but this should be >>>>>> needed anyway to handle interruptions. >>>>> >>>>> what do you mean 'not triggered by seccomp' ? >>>>> You're not suggesting that this lsm has to enable seccomp to be functional? >>>>> imo that's non starter due to overhead. >>>> >>>> Yes, for now, it is triggered by a new seccomp filter return value >>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not >>>> be needed but could be useful to bind a seccomp filter security policy >>>> with a Landlock one. Waiting for Kees's point of view… >>>> >>> >>> I'm not Kees, but I'd be okay with that. I still think that doing >>> this by process hierarchy a la seccomp will be easier to use and to >>> understand (which is quite important for this kind of work) than doing >>> it by cgroup. >>> >>> A feature I've wanted to add for a while is to have an fd that >>> represents a seccomp layer, the idea being that you would set up your >>> seccomp layer (with syscall filter, landlock hooks, etc) and then you >>> would have a syscall to install that layer. Then an unprivileged >>> sandbox manager could set up its layer and still be able to inject new >>> processes into it later on, no cgroups needed. >> >> A nice thing I didn't highlight about Landlock is that a process can >> prepare a layer of rules (arraymap of handles + Landlock programs) and >> pass the file descriptors of the Landlock programs to another process. >> This process could then apply this programs to get sandboxed. However, >> for now, because a Landlock program is only triggered by a seccomp >> filter (which do not follow the Landlock programs as a FD), they will be >> useless. >> >> The FD referring to an arraymap of handles can also be used to update a >> map and change the behavior of a Landlock program. A master process can >> then add or remove restrictions to another process hierarchy on the fly. > > Maybe this could be extended a little bit. The fd could hold the > seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give > the ability to install it and FMODE_WRITE could give the ability to > modify it. > This is interesting! It should be possible to append the seccomp stack of a source process to the seccomp stack of the target process when a Landlock program is passed and then activated through seccomp(2). For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage permission of the eBPF program FD in a specific way?
On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote: > > On 30/08/2016 20:55, Andy Lutomirski wrote: >> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: >>> >>> >>> On 28/08/2016 10:13, Andy Lutomirski wrote: >>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: >>>>> >>>>> >>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote: >>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly >>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as >>>>>>>>> attaching to a process hierarchy. >>>>>>>> >>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I >>>>>>>> could come up with. If you see another way, please propose. >>>>>>>> current->seccomp.landlock_prog is not the answer. >>>>>>> >>>>>>> Hum, I don't see the difference from a performance point of view between >>>>>>> a cgroup-based or a process hierarchy-based system. >>>>>>> >>>>>>> Maybe a better option should be to use an array of pointers with N >>>>>>> entries, one for each supported hook, instead of a unique pointer list? >>>>>> >>>>>> yes, clearly array dereference is faster than link list walk. >>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? >>>>>> Since we cannot keep it inside task_struct, we have to allocate it. >>>>>> Every time the task is creted then. What to do on the fork? That >>>>>> will require changes all over. Then the obvious optimization would be >>>>>> to share this allocated array of prog pointers across multiple tasks... >>>>>> and little by little this new facility will look like cgroup. >>>>>> Hence the suggestion to put this array into cgroup from the start. >>>>> >>>>> I see your point :) >>>>> >>>>>> >>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to >>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook >>>>>>> program which is not triggered by a seccomp-filter, but this should be >>>>>>> needed anyway to handle interruptions. >>>>>> >>>>>> what do you mean 'not triggered by seccomp' ? >>>>>> You're not suggesting that this lsm has to enable seccomp to be functional? >>>>>> imo that's non starter due to overhead. >>>>> >>>>> Yes, for now, it is triggered by a new seccomp filter return value >>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not >>>>> be needed but could be useful to bind a seccomp filter security policy >>>>> with a Landlock one. Waiting for Kees's point of view… >>>>> >>>> >>>> I'm not Kees, but I'd be okay with that. I still think that doing >>>> this by process hierarchy a la seccomp will be easier to use and to >>>> understand (which is quite important for this kind of work) than doing >>>> it by cgroup. >>>> >>>> A feature I've wanted to add for a while is to have an fd that >>>> represents a seccomp layer, the idea being that you would set up your >>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you >>>> would have a syscall to install that layer. Then an unprivileged >>>> sandbox manager could set up its layer and still be able to inject new >>>> processes into it later on, no cgroups needed. >>> >>> A nice thing I didn't highlight about Landlock is that a process can >>> prepare a layer of rules (arraymap of handles + Landlock programs) and >>> pass the file descriptors of the Landlock programs to another process. >>> This process could then apply this programs to get sandboxed. However, >>> for now, because a Landlock program is only triggered by a seccomp >>> filter (which do not follow the Landlock programs as a FD), they will be >>> useless. >>> >>> The FD referring to an arraymap of handles can also be used to update a >>> map and change the behavior of a Landlock program. A master process can >>> then add or remove restrictions to another process hierarchy on the fly. >> >> Maybe this could be extended a little bit. The fd could hold the >> seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give >> the ability to install it and FMODE_WRITE could give the ability to >> modify it. >> > > This is interesting! It should be possible to append the seccomp stack > of a source process to the seccomp stack of the target process when a > Landlock program is passed and then activated through seccomp(2). > > For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage > permission of the eBPF program FD in a specific way? > This wouldn't be an eBPF program FD -- it would be an FD encapsulating an entire configuration including seccomp BPF program, whatever landlock stuff is associated, and eventual seccomp monitor configuration (once I write that code), etc. You wouldn't say "attach this process's seccomp stack to me" -- you'd say "attach this seccomp layer to me". A decision that we'd have to make would be whether the FD links to the parent layer or whether it can be attached without regard to what the parent layer is. --Andy
On 30/08/2016 22:23, Andy Lutomirski wrote: > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote: >> >> On 30/08/2016 20:55, Andy Lutomirski wrote: >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: >>>> >>>> >>>> On 28/08/2016 10:13, Andy Lutomirski wrote: >>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: >>>>>> >>>>>> >>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote: >>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: >>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: >>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >>>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly >>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as >>>>>>>>>> attaching to a process hierarchy. >>>>>>>>> >>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I >>>>>>>>> could come up with. If you see another way, please propose. >>>>>>>>> current->seccomp.landlock_prog is not the answer. >>>>>>>> >>>>>>>> Hum, I don't see the difference from a performance point of view between >>>>>>>> a cgroup-based or a process hierarchy-based system. >>>>>>>> >>>>>>>> Maybe a better option should be to use an array of pointers with N >>>>>>>> entries, one for each supported hook, instead of a unique pointer list? >>>>>>> >>>>>>> yes, clearly array dereference is faster than link list walk. >>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? >>>>>>> Since we cannot keep it inside task_struct, we have to allocate it. >>>>>>> Every time the task is creted then. What to do on the fork? That >>>>>>> will require changes all over. Then the obvious optimization would be >>>>>>> to share this allocated array of prog pointers across multiple tasks... >>>>>>> and little by little this new facility will look like cgroup. >>>>>>> Hence the suggestion to put this array into cgroup from the start. >>>>>> >>>>>> I see your point :) >>>>>> >>>>>>> >>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to >>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility >>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook >>>>>>>> program which is not triggered by a seccomp-filter, but this should be >>>>>>>> needed anyway to handle interruptions. >>>>>>> >>>>>>> what do you mean 'not triggered by seccomp' ? >>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional? >>>>>>> imo that's non starter due to overhead. >>>>>> >>>>>> Yes, for now, it is triggered by a new seccomp filter return value >>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not >>>>>> be needed but could be useful to bind a seccomp filter security policy >>>>>> with a Landlock one. Waiting for Kees's point of view… >>>>>> >>>>> >>>>> I'm not Kees, but I'd be okay with that. I still think that doing >>>>> this by process hierarchy a la seccomp will be easier to use and to >>>>> understand (which is quite important for this kind of work) than doing >>>>> it by cgroup. >>>>> >>>>> A feature I've wanted to add for a while is to have an fd that >>>>> represents a seccomp layer, the idea being that you would set up your >>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you >>>>> would have a syscall to install that layer. Then an unprivileged >>>>> sandbox manager could set up its layer and still be able to inject new >>>>> processes into it later on, no cgroups needed. >>>> >>>> A nice thing I didn't highlight about Landlock is that a process can >>>> prepare a layer of rules (arraymap of handles + Landlock programs) and >>>> pass the file descriptors of the Landlock programs to another process. >>>> This process could then apply this programs to get sandboxed. However, >>>> for now, because a Landlock program is only triggered by a seccomp >>>> filter (which do not follow the Landlock programs as a FD), they will be >>>> useless. >>>> >>>> The FD referring to an arraymap of handles can also be used to update a >>>> map and change the behavior of a Landlock program. A master process can >>>> then add or remove restrictions to another process hierarchy on the fly. >>> >>> Maybe this could be extended a little bit. The fd could hold the >>> seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give >>> the ability to install it and FMODE_WRITE could give the ability to >>> modify it. >>> >> >> This is interesting! It should be possible to append the seccomp stack >> of a source process to the seccomp stack of the target process when a >> Landlock program is passed and then activated through seccomp(2). >> >> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage >> permission of the eBPF program FD in a specific way? >> > > This wouldn't be an eBPF program FD -- it would be an FD encapsulating > an entire configuration including seccomp BPF program, whatever > landlock stuff is associated, and eventual seccomp monitor > configuration (once I write that code), etc. > > You wouldn't say "attach this process's seccomp stack to me" -- you'd > say "attach this seccomp layer to me". > > A decision that we'd have to make would be whether the FD links to the > parent layer or whether it can be attached without regard to what the > parent layer is. OK, I like that, but I think it could be done on a second time. :)
On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote: > > > On 30/08/2016 22:23, Andy Lutomirski wrote: > > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote: > >> > >> On 30/08/2016 20:55, Andy Lutomirski wrote: > >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: > >>>> > >>>> > >>>> On 28/08/2016 10:13, Andy Lutomirski wrote: > >>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: > >>>>>> > >>>>>> > >>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote: > >>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: > >>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: > >>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: > >>>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly > >>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as > >>>>>>>>>> attaching to a process hierarchy. > >>>>>>>>> > >>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I > >>>>>>>>> could come up with. If you see another way, please propose. > >>>>>>>>> current->seccomp.landlock_prog is not the answer. > >>>>>>>> > >>>>>>>> Hum, I don't see the difference from a performance point of view between > >>>>>>>> a cgroup-based or a process hierarchy-based system. > >>>>>>>> > >>>>>>>> Maybe a better option should be to use an array of pointers with N > >>>>>>>> entries, one for each supported hook, instead of a unique pointer list? > >>>>>>> > >>>>>>> yes, clearly array dereference is faster than link list walk. > >>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? > >>>>>>> Since we cannot keep it inside task_struct, we have to allocate it. > >>>>>>> Every time the task is creted then. What to do on the fork? That > >>>>>>> will require changes all over. Then the obvious optimization would be > >>>>>>> to share this allocated array of prog pointers across multiple tasks... > >>>>>>> and little by little this new facility will look like cgroup. > >>>>>>> Hence the suggestion to put this array into cgroup from the start. > >>>>>> > >>>>>> I see your point :) > >>>>>> > >>>>>>> > >>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to > >>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility > >>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook > >>>>>>>> program which is not triggered by a seccomp-filter, but this should be > >>>>>>>> needed anyway to handle interruptions. > >>>>>>> > >>>>>>> what do you mean 'not triggered by seccomp' ? > >>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional? > >>>>>>> imo that's non starter due to overhead. > >>>>>> > >>>>>> Yes, for now, it is triggered by a new seccomp filter return value > >>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not > >>>>>> be needed but could be useful to bind a seccomp filter security policy > >>>>>> with a Landlock one. Waiting for Kees's point of view… > >>>>>> > >>>>> > >>>>> I'm not Kees, but I'd be okay with that. I still think that doing > >>>>> this by process hierarchy a la seccomp will be easier to use and to > >>>>> understand (which is quite important for this kind of work) than doing > >>>>> it by cgroup. > >>>>> > >>>>> A feature I've wanted to add for a while is to have an fd that > >>>>> represents a seccomp layer, the idea being that you would set up your > >>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you > >>>>> would have a syscall to install that layer. Then an unprivileged > >>>>> sandbox manager could set up its layer and still be able to inject new > >>>>> processes into it later on, no cgroups needed. > >>>> > >>>> A nice thing I didn't highlight about Landlock is that a process can > >>>> prepare a layer of rules (arraymap of handles + Landlock programs) and > >>>> pass the file descriptors of the Landlock programs to another process. > >>>> This process could then apply this programs to get sandboxed. However, > >>>> for now, because a Landlock program is only triggered by a seccomp > >>>> filter (which do not follow the Landlock programs as a FD), they will be > >>>> useless. > >>>> > >>>> The FD referring to an arraymap of handles can also be used to update a > >>>> map and change the behavior of a Landlock program. A master process can > >>>> then add or remove restrictions to another process hierarchy on the fly. > >>> > >>> Maybe this could be extended a little bit. The fd could hold the > >>> seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give > >>> the ability to install it and FMODE_WRITE could give the ability to > >>> modify it. > >>> > >> > >> This is interesting! It should be possible to append the seccomp stack > >> of a source process to the seccomp stack of the target process when a > >> Landlock program is passed and then activated through seccomp(2). > >> > >> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage > >> permission of the eBPF program FD in a specific way? > >> > > > > This wouldn't be an eBPF program FD -- it would be an FD encapsulating > > an entire configuration including seccomp BPF program, whatever > > landlock stuff is associated, and eventual seccomp monitor > > configuration (once I write that code), etc. > > > > You wouldn't say "attach this process's seccomp stack to me" -- you'd > > say "attach this seccomp layer to me". > > > > A decision that we'd have to make would be whether the FD links to the > > parent layer or whether it can be attached without regard to what the > > parent layer is. > > OK, I like that, but I think it could be done on a second time. :) I don't. Single FD that is a collection of objects seems an odd abstraction to me. I also don't see what it actually solves. I think lsm and seccomp should be orthogonal and not tied into each other.
On Aug 30, 2016 1:56 PM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote: > > On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote: > > > > > > On 30/08/2016 22:23, Andy Lutomirski wrote: > > > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@digikod.net> wrote: > > >> > > >> On 30/08/2016 20:55, Andy Lutomirski wrote: > > >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@digikod.net> wrote: > > >>>> > > >>>> > > >>>> On 28/08/2016 10:13, Andy Lutomirski wrote: > > >>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@digikod.net> wrote: > > >>>>>> > > >>>>>> > > >>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote: > > >>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote: > > >>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote: > > >>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: > > >>>>>>>>>> As said above, Landlock will not run an eBPF programs when not strictly > > >>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as > > >>>>>>>>>> attaching to a process hierarchy. > > >>>>>>>>> > > >>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I > > >>>>>>>>> could come up with. If you see another way, please propose. > > >>>>>>>>> current->seccomp.landlock_prog is not the answer. > > >>>>>>>> > > >>>>>>>> Hum, I don't see the difference from a performance point of view between > > >>>>>>>> a cgroup-based or a process hierarchy-based system. > > >>>>>>>> > > >>>>>>>> Maybe a better option should be to use an array of pointers with N > > >>>>>>>> entries, one for each supported hook, instead of a unique pointer list? > > >>>>>>> > > >>>>>>> yes, clearly array dereference is faster than link list walk. > > >>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ? > > >>>>>>> Since we cannot keep it inside task_struct, we have to allocate it. > > >>>>>>> Every time the task is creted then. What to do on the fork? That > > >>>>>>> will require changes all over. Then the obvious optimization would be > > >>>>>>> to share this allocated array of prog pointers across multiple tasks... > > >>>>>>> and little by little this new facility will look like cgroup. > > >>>>>>> Hence the suggestion to put this array into cgroup from the start. > > >>>>>> > > >>>>>> I see your point :) > > >>>>>> > > >>>>>>> > > >>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to > > >>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility > > >>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook > > >>>>>>>> program which is not triggered by a seccomp-filter, but this should be > > >>>>>>>> needed anyway to handle interruptions. > > >>>>>>> > > >>>>>>> what do you mean 'not triggered by seccomp' ? > > >>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional? > > >>>>>>> imo that's non starter due to overhead. > > >>>>>> > > >>>>>> Yes, for now, it is triggered by a new seccomp filter return value > > >>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not > > >>>>>> be needed but could be useful to bind a seccomp filter security policy > > >>>>>> with a Landlock one. Waiting for Kees's point of view… > > >>>>>> > > >>>>> > > >>>>> I'm not Kees, but I'd be okay with that. I still think that doing > > >>>>> this by process hierarchy a la seccomp will be easier to use and to > > >>>>> understand (which is quite important for this kind of work) than doing > > >>>>> it by cgroup. > > >>>>> > > >>>>> A feature I've wanted to add for a while is to have an fd that > > >>>>> represents a seccomp layer, the idea being that you would set up your > > >>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you > > >>>>> would have a syscall to install that layer. Then an unprivileged > > >>>>> sandbox manager could set up its layer and still be able to inject new > > >>>>> processes into it later on, no cgroups needed. > > >>>> > > >>>> A nice thing I didn't highlight about Landlock is that a process can > > >>>> prepare a layer of rules (arraymap of handles + Landlock programs) and > > >>>> pass the file descriptors of the Landlock programs to another process. > > >>>> This process could then apply this programs to get sandboxed. However, > > >>>> for now, because a Landlock program is only triggered by a seccomp > > >>>> filter (which do not follow the Landlock programs as a FD), they will be > > >>>> useless. > > >>>> > > >>>> The FD referring to an arraymap of handles can also be used to update a > > >>>> map and change the behavior of a Landlock program. A master process can > > >>>> then add or remove restrictions to another process hierarchy on the fly. > > >>> > > >>> Maybe this could be extended a little bit. The fd could hold the > > >>> seccomp filter *and* the LSM hook filters. FMODE_EXECUTE could give > > >>> the ability to install it and FMODE_WRITE could give the ability to > > >>> modify it. > > >>> > > >> > > >> This is interesting! It should be possible to append the seccomp stack > > >> of a source process to the seccomp stack of the target process when a > > >> Landlock program is passed and then activated through seccomp(2). > > >> > > >> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage > > >> permission of the eBPF program FD in a specific way? > > >> > > > > > > This wouldn't be an eBPF program FD -- it would be an FD encapsulating > > > an entire configuration including seccomp BPF program, whatever > > > landlock stuff is associated, and eventual seccomp monitor > > > configuration (once I write that code), etc. > > > > > > You wouldn't say "attach this process's seccomp stack to me" -- you'd > > > say "attach this seccomp layer to me". > > > > > > A decision that we'd have to make would be whether the FD links to the > > > parent layer or whether it can be attached without regard to what the > > > parent layer is. > > > > OK, I like that, but I think it could be done on a second time. :) > > I don't. Single FD that is a collection of objects seems an odd abstraction > to me. I also don't see what it actually solves. > I think lsm and seccomp should be orthogonal and not tied into each other. > It's not a random collection of objects. It's a fully configured sandboxing layer. One might argue that landlock shouldn't be tied to seccomp (in theory, attached progs could be given access to syscall_get_xyz()), but I think that the seccomp attachment mechanism is the right way to install unprivileged filters. It handles the no_new_privs stuff, it allows TSYNC, it's totally independent of systemwide policy, etc. Trying to use cgroups or similar for this is going to be much nastier. Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of putting cgroupfs in their containers, so requiring cgroups or similar would be a mess for that type of application. --Andy
On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote: > > One might argue that landlock shouldn't be tied to seccomp (in theory, > attached progs could be given access to syscall_get_xyz()), but I proposed lsm is way more powerful than syscall_get_xyz. no need to dumb it down. > think that the seccomp attachment mechanism is the right way to > install unprivileged filters. It handles the no_new_privs stuff, it > allows TSYNC, it's totally independent of systemwide policy, etc. > > Trying to use cgroups or similar for this is going to be much nastier. > Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of > putting cgroupfs in their containers, so requiring cgroups or similar > would be a mess for that type of application. I don't see why it is a 'mess'. cgroups are already used by majority of the systems, so I don't see why requiring a cgroup is such a big deal. But let's say we don't do them. How implementation is going to look like for task based hierarchy? Note that we need an array of bpf_prog pointers. One for each lsm hook. Where this array is going to be stored? We cannot put in task_struct, since it's too large. Cannot put it into 'struct seccomp' directly either, unless it will become a pointer. Is that the proposal? So now we will be wasting extra 1kbyte of memory per task. Not great. We'd want to optimize it by sharing this such struct seccomp with prog array across threads of the same task? Or dynimically allocating it when landlock is in use? May sound nice, but how to account for that kernel memory? I guess also solvable by charging memlock. With cgroup based approach we don't need to worry about all that.
On Tue, Aug 30, 2016 at 6:36 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote: >> >> One might argue that landlock shouldn't be tied to seccomp (in theory, >> attached progs could be given access to syscall_get_xyz()), but I > > proposed lsm is way more powerful than syscall_get_xyz. > no need to dumb it down. I think you're misunderstanding me. Mickaël's code allows one to make the LSM hook filters depend on the syscall using SECCOMP_RET_LANDLOCK. I'm suggesting that a similar effect could be achieved by allowing the eBPF LSM hook to call syscall_get_xyz() if it wants to. > >> think that the seccomp attachment mechanism is the right way to >> install unprivileged filters. It handles the no_new_privs stuff, it >> allows TSYNC, it's totally independent of systemwide policy, etc. >> >> Trying to use cgroups or similar for this is going to be much nastier. >> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of >> putting cgroupfs in their containers, so requiring cgroups or similar >> would be a mess for that type of application. > > I don't see why it is a 'mess'. cgroups are already used by majority > of the systems, so I don't see why requiring a cgroup is such a big deal. Requiring cgroup to be configured in isn't a big deal. Requiring > But let's say we don't do them. How implementation is going to look like > for task based hierarchy? Note that we need an array of bpf_prog pointers. > One for each lsm hook. Where this array is going to be stored? > We cannot put in task_struct, since it's too large. Cannot put it > into 'struct seccomp' directly either, unless it will become a pointer. > Is that the proposal? It would go in struct seccomp_filter or in something pointed to from there. > So now we will be wasting extra 1kbyte of memory per task. Not great. > We'd want to optimize it by sharing this such struct seccomp with prog array > across threads of the same task? Or dynimically allocating it when > landlock is in use? May sound nice, but how to account for that kernel > memory? I guess also solvable by charging memlock. > With cgroup based approach we don't need to worry about all that. > The considerations are essentially identical either way. With cgroups, if you want to share the memory between multiple separate sandboxes (Firejail instances, Sandstorm grains, Chromium instances, xdg-apps, etc), you'd need to get them to all coordinate to share a cgroup. With a seccomp-like interface, you'd need to get them to coordinate to share an installed layer (using my FD idea or similar). There would *not* be any duplication of this memory just because a sandboxed process called fork(). --Andy
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 79014aedbea4..9e6786e7a40a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -14,6 +14,9 @@ #ifdef CONFIG_SECURITY_LANDLOCK #include <linux/fs.h> /* struct file */ +#ifdef CONFIG_CGROUPS +#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */ +#endif /* CONFIG_CGROUPS */ #endif /* CONFIG_SECURITY_LANDLOCK */ struct bpf_map; @@ -85,6 +88,7 @@ enum bpf_arg_type { ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */ ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */ ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS, /* pointer to Landlock FS handle */ + ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP, /* pointer to Landlock cgroup handle */ }; /* type of values returned from helper functions */ @@ -148,6 +152,7 @@ enum bpf_reg_type { PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED, CONST_PTR_TO_LANDLOCK_HANDLE_FS, + CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP, }; struct bpf_prog; @@ -212,6 +217,9 @@ struct map_landlock_handle { u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */ union { struct file *file; +#ifdef CONFIG_CGROUPS + struct cgroup_subsys_state *css; +#endif /* CONFIG_CGROUPS */ }; }; #endif /* CONFIG_SECURITY_LANDLOCK */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 88af79dd668c..7f60b9fdb35c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -90,12 +90,14 @@ enum bpf_map_type { enum bpf_map_array_type { BPF_MAP_ARRAY_TYPE_UNSPEC, BPF_MAP_ARRAY_TYPE_LANDLOCK_FS, + BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP, }; enum bpf_map_handle_type { BPF_MAP_HANDLE_TYPE_UNSPEC, BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD, BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB, + BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD, }; enum bpf_map_array_op { @@ -364,6 +366,19 @@ enum bpf_func_id { */ BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file, + /** + * bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) + * Check if the current process is a leaf of cgroup handles + * + * @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE) + * @map: handles to compare against + * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR) + * + * Return: 0 if the current cgroup is the sam or beneath the handle, + * 1 otherwise, or a negative value if an error occurred. + */ + BPF_FUNC_landlock_cmp_cgroup_beneath, + __BPF_FUNC_MAX_ID, }; diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 6804dafd8355..050b3d8d88c8 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -19,6 +19,12 @@ #include <linux/file.h> /* fput() */ #include <linux/fs.h> /* struct file */ +#ifdef CONFIG_SECURITY_LANDLOCK +#ifdef CONFIG_CGROUPS +#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */ +#endif /* CONFIG_CGROUPS */ +#endif /* CONFIG_SECURITY_LANDLOCK */ + static void bpf_array_free_percpu(struct bpf_array *array) { int i; @@ -514,6 +520,12 @@ static void landlock_put_handle(struct map_landlock_handle *handle) else WARN_ON(1); break; + case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD: + if (likely(handle->css)) + css_put(handle->css); + else + WARN_ON(1); + break; default: WARN_ON(1); } @@ -541,6 +553,10 @@ static enum bpf_map_array_type landlock_get_array_type( case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD: case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB: return BPF_MAP_ARRAY_TYPE_LANDLOCK_FS; + case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD: +#ifdef CONFIG_CGROUPS + return BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP; +#endif /* CONFIG_CGROUPS */ case BPF_MAP_HANDLE_TYPE_UNSPEC: default: return -EINVAL; @@ -557,6 +573,9 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst, struct landlock_handle *khandle) { struct path kpath; +#ifdef CONFIG_CGROUPS + struct cgroup_subsys_state *css; +#endif /* CONFIG_CGROUPS */ struct file *handle_file; if (unlikely(!khandle)) @@ -569,6 +588,17 @@ static inline long landlock_store_handle(struct map_landlock_handle *dst, FGET_OR_RET(handle_file, khandle->fd); dst->file = handle_file; break; +#ifdef CONFIG_CGROUPS + case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD: + FGET_OR_RET(handle_file, khandle->fd); + css = css_tryget_online_from_dir(file_dentry(handle_file), NULL); + fput(handle_file); + /* NULL css check done by css_tryget_online_from_dir() */ + if (IS_ERR(css)) + return PTR_ERR(css); + dst->css = css; + break; +#endif /* CONFIG_CGROUPS */ default: WARN_ON(1); path_put(&kpath); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b182c88d5c13..b4e5c3bbc520 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -247,6 +247,7 @@ static const char * const reg_type_str[] = { [PTR_TO_STRUCT_FILE] = "struct_file", [PTR_TO_STRUCT_CRED] = "struct_cred", [CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs", + [CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP] = "landlock_handle_cgroup", }; static void print_verifier_state(struct verifier_state *state) @@ -560,6 +561,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type) case PTR_TO_STRUCT_FILE: case PTR_TO_STRUCT_CRED: case CONST_PTR_TO_LANDLOCK_HANDLE_FS: + case CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP: return true; default: return false; @@ -955,6 +957,8 @@ static int check_func_arg(struct verifier_env *env, u32 regno, expected_type = PTR_TO_STRUCT_CRED; } else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS) { expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS; + } else if (arg_type == ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP) { + expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP; } else if (arg_type == ARG_PTR_TO_STACK || arg_type == ARG_PTR_TO_RAW_STACK) { expected_type = PTR_TO_STACK; @@ -1733,6 +1737,8 @@ static inline enum bpf_reg_type bpf_reg_type_from_map(struct bpf_map *map) switch (map->map_array_type) { case BPF_MAP_ARRAY_TYPE_LANDLOCK_FS: return CONST_PTR_TO_LANDLOCK_HANDLE_FS; + case BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP: + return CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP; case BPF_MAP_ARRAY_TYPE_UNSPEC: default: return CONST_PTR_TO_MAP; diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig index dc8328d216d7..414eb047e50e 100644 --- a/security/landlock/Kconfig +++ b/security/landlock/Kconfig @@ -10,6 +10,9 @@ config SECURITY_LANDLOCK of stacked eBPF programs for some LSM hooks. Each program can do some access comparison to check if an access request is legitimate. + It is recommended to enable cgroups to be able to match a policy + according to a group of processes. + Further information about eBPF can be found in Documentation/networking/filter.txt diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 27f359a8cfaa..cdaaa152b849 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o -landlock-y := lsm.o checker_fs.o +landlock-y := lsm.o checker_fs.o checker_cgroup.o diff --git a/security/landlock/checker_cgroup.c b/security/landlock/checker_cgroup.c new file mode 100644 index 000000000000..97f29ac64188 --- /dev/null +++ b/security/landlock/checker_cgroup.c @@ -0,0 +1,96 @@ +/* + * Landlock LSM - cgroup Checkers + * + * Copyright (C) 2016 Mickaël Salaün <mic@digikod.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2, as + * published by the Free Software Foundation. + */ + +#ifdef CONFIG_CGROUPS + +#include <asm/current.h> +#include <linux/bpf.h> /* enum bpf_map_array_op */ +#include <linux/cgroup-defs.h> /* struct cgroup_subsys_state */ +#include <linux/cgroup.h> /* cgroup_is_descendant(), task_css_set() */ +#include <linux/errno.h> + +#include "checker_cgroup.h" + + +/* + * bpf_landlock_cmp_cgroup_beneath + * + * Cf. include/uapi/linux/bpf.h + */ +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map, + u64 r3_map_op, u64 r4, u64 r5) +{ + u8 option = (u8) r1_option; + struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; + enum bpf_map_array_op map_op = r3_map_op; + struct bpf_array *array = container_of(map, struct bpf_array, map); + struct cgroup *cg1, *cg2; + struct map_landlock_handle *handle; + int i; + + /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */ + if (unlikely(!map)) { + WARN_ON(1); + return -EFAULT; + } + if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK)) + return -EINVAL; + + /* for now, only handle OP_OR */ + switch (map_op) { + case BPF_MAP_ARRAY_OP_OR: + break; + case BPF_MAP_ARRAY_OP_UNSPEC: + case BPF_MAP_ARRAY_OP_AND: + case BPF_MAP_ARRAY_OP_XOR: + default: + return -EINVAL; + } + + synchronize_rcu(); + + for (i = 0; i < array->n_entries; i++) { + handle = (struct map_landlock_handle *) + (array->value + array->elem_size * i); + + /* protected by the proto types, should not happen */ + if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) { + WARN_ON(1); + return -EFAULT; + } + if (unlikely(!handle->css)) { + WARN_ON(1); + return -EFAULT; + } + + if (option & LANDLOCK_FLAG_OPT_REVERSE) { + cg1 = handle->css->cgroup; + cg2 = task_css_set(current)->dfl_cgrp; + } else { + cg1 = task_css_set(current)->dfl_cgrp; + cg2 = handle->css->cgroup; + } + + if (cgroup_is_descendant(cg1, cg2)) + return 0; + } + return 1; +} + +const struct bpf_func_proto bpf_landlock_cmp_cgroup_beneath_proto = { + .func = bpf_landlock_cmp_cgroup_beneath, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP, + .arg3_type = ARG_ANYTHING, +}; + +#endif /* CONFIG_CGROUPS */ diff --git a/security/landlock/checker_cgroup.h b/security/landlock/checker_cgroup.h new file mode 100644 index 000000000000..497cad7c2bb8 --- /dev/null +++ b/security/landlock/checker_cgroup.h @@ -0,0 +1,18 @@ +/* + * Landlock LSM - cgroup Checkers + * + * Copyright (C) 2016 Mickaël Salaün <mic@digikod.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2, as + * published by the Free Software Foundation. + */ + +#ifdef CONFIG_CGROUPS +#ifndef _SECURITY_LANDLOCK_CHECKER_CGROUP_H +#define _SECURITY_LANDLOCK_CHECKER_CGROUP_H + +extern const struct bpf_func_proto bpf_landlock_cmp_cgroup_beneath_proto; + +#endif /* _SECURITY_LANDLOCK_CHECKER_CGROUP_H */ +#endif /* CONFIG_CGROUPS */ diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c index 8645743243b6..cc4759f4e6c5 100644 --- a/security/landlock/lsm.c +++ b/security/landlock/lsm.c @@ -18,6 +18,10 @@ #include "checker_fs.h" +#ifdef CONFIG_CGROUPS +#include "checker_cgroup.h" +#endif /* CONFIG_CGROUPS */ + #define LANDLOCK_HOOK_INIT(NAME) LSM_HOOK_INIT(NAME, landlock_hook_##NAME) #define LANDLOCK_HOOKx(X, NAME, CNAME, ...) \ @@ -124,6 +128,10 @@ static const struct bpf_func_proto *bpf_landlock_func_proto( return &bpf_landlock_cmp_fs_prop_with_struct_file_proto; case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file: return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto; + case BPF_FUNC_landlock_cmp_cgroup_beneath: +#ifdef CONFIG_CGROUPS + return &bpf_landlock_cmp_cgroup_beneath_proto; +#endif /* CONFIG_CGROUPS */ default: return NULL; }
Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) to compare the current process cgroup with a cgroup handle, The handle can match the current cgroup if it is the same or a child. This allows to make conditional rules according to the current cgroup. A cgroup handle is a map entry created from a file descriptor referring a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP. An unprivileged process can create and manipulate cgroups thanks to cgroup delegation. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Kees Cook <keescook@chromium.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: James Morris <james.l.morris@oracle.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> --- include/linux/bpf.h | 8 ++++ include/uapi/linux/bpf.h | 15 ++++++ kernel/bpf/arraymap.c | 30 ++++++++++++ kernel/bpf/verifier.c | 6 +++ security/landlock/Kconfig | 3 ++ security/landlock/Makefile | 2 +- security/landlock/checker_cgroup.c | 96 ++++++++++++++++++++++++++++++++++++++ security/landlock/checker_cgroup.h | 18 +++++++ security/landlock/lsm.c | 8 ++++ 9 files changed, 185 insertions(+), 1 deletion(-) create mode 100644 security/landlock/checker_cgroup.c create mode 100644 security/landlock/checker_cgroup.h