[bpf-next,0/8] New BPF map and BTF security LSM hooks

Message ID	20230412043300.360803-1-andrii@kernel.org (mailing list archive)
Headers
Return-Path: <linux-security-module-owner@vger.kernel.org>
From: Andrii Nakryiko <andrii@kernel.org>
To: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>, <kpsingh@kernel.org>, <keescook@chromium.org>, <paul@paul-moore.com>
CC: <linux-security-module@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks
Date: Tue, 11 Apr 2023 21:32:52 -0700
Message-ID: <20230412043300.360803-1-andrii@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8BIT
Content-Type: text/plain
Precedence: bulk

Andrii Nakryiko April 12, 2023, 4:32 a.m. UTC

Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
are meant to allow highly granular LSM-based control over the usage of the BPF
subsystem. Specifically, they control the creation of BPF maps and BTF data
objects, which are fundamental building blocks of any modern BPF application.

These new hooks are able to override the default kernel-side CAP_BPF-based (and
sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
implement LSM policies that enforce additional restrictions on a per-BPF-map
basis (beyond checking the coarse CAP_BPF/CAP_NET_ADMIN capabilities), but
also, importantly, that *bypass kernel-side enforcement* of
CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use cases. The
decision about trust for a particular process is delegated to a custom LSM
policy implementation. Such a setup makes it possible to implement safe and
highly granular trust-based unprivileged BPF map creation, which is a first
step and a prerequisite towards implementing a full-fledged trusted
unprivileged BPF application workflow. A similar approach seems to be
implemented by some other existing LSM hooks, e.g., vm_enough_memory().
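
The override semantics described above can be sketched in plain userspace C.
This is not code from the series: the enum, the struct, and the function name
are hypothetical, and the capability check is reduced to a boolean. Later in
this thread the hook is described as bypassing kernel checks when it returns
a positive value; treating a negative value as a denial and zero as "defer to
the capability check" is an assumption based on the usual LSM convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical verdicts a policy could return; names are illustrative only. */
enum hook_verdict {
	HOOK_DENY  = -EPERM,	/* negative: reject regardless of capabilities */
	HOOK_DEFER = 0,		/* zero: fall back to the capability check */
	HOOK_ALLOW = 1,		/* positive: skip the capability check */
};

/* Simulated caller state, standing in for a real CAP_BPF check. */
struct caller {
	bool has_cap_bpf;
};

/* Returns 0 if map creation may proceed, or a negative errno otherwise. */
int map_create_permitted(const struct caller *c, int verdict)
{
	if (verdict < 0)
		return verdict;		/* policy denied the request */
	if (verdict > 0)
		return 0;		/* policy vouched for the caller */
	return c->has_cap_bpf ? 0 : -EPERM;	/* default capability check */
}
```

With this shape, an unprivileged caller succeeds only when the policy returns
a positive verdict, and a privileged caller can still be rejected by a
negative one.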

Such LSM hook semantics make it possible to have a safer-by-default policy of
not giving applications any of the CAP_BPF/CAP_PERFMON/CAP_NET_ADMIN
capabilities normally required to use the BPF subsystem in the kernel.
Instead, all BPF processes could be left completely unprivileged, and only
allowlisted exceptions for trusted and verified production use cases would be
granted permission to work with the bpf() syscall, as if those applications
had root-like capabilities.

This patch set implements and demonstrates the overall approach, starting with
BPF map and BTF object creation, the first two steps in the lifetime of a
typical BPF application. The next step would be to make similar changes to the
BPF_PROG_LOAD command to allow BPF program loading and verification. This will
be implemented in a follow-up patch set and will follow the same approach as
this patch set.

Patches #1-#3 are refactorings that allow adding the new LSM hook in one
centralized place. Patch #4 adds and implements the LSM hook for the
BPF_MAP_CREATE command. Patch #5 adds tests that validate that the LSM hook
works as expected: we implement a trivial BPF LSM policy allowing unprivileged
BPF map creation for test_prog's process only. Patch #6 drops an unnecessary
CAP_BPF restriction for the BPF_MAP_FREEZE command, which seems to have slipped
through the cracks during the refactoring that removed extra capability
restrictions for commands that accept FDs of BPF objects. Patch #7 adds the
bpf_btf_load_security LSM hook to control BTF object load, and patch #8 adds
extra tests for that hook.
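
The test policy from patch #5 is described above only in words. A rough
userspace sketch of that kind of decision, assuming the policy remembers one
trusted PID, could look like the following. It is not the BPF LSM program
from the series; policy_set_trusted_pid() and policy_map_create() are
hypothetical names, and the return values (1 to allow, 0 to defer to the
normal checks) are assumptions.

```c
#include <sys/types.h>

/* Hypothetical policy state: the single PID that is trusted (0 = none). */
static pid_t trusted_pid;

/* Record which process the policy should trust. */
void policy_set_trusted_pid(pid_t pid)
{
	trusted_pid = pid;
}

/*
 * Decide on a map creation request from caller_pid.
 * Returns 1 to allow the trusted process, 0 to leave the decision
 * to the regular capability checks for everyone else.
 */
int policy_map_create(pid_t caller_pid)
{
	if (trusted_pid != 0 && caller_pid == trusted_pid)
		return 1;
	return 0;
}
```

A test would set the trusted PID to its own process and then check that map
creation is allowed for it and falls back to the regular checks for any other
process.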

Andrii Nakryiko (8):
  bpf: move unprivileged checks into map_create() and bpf_prog_load()
  bpf: inline map creation logic in map_create() function
  bpf: centralize permissions checks for all BPF map types
  bpf, lsm: implement bpf_map_create_security LSM hook
  selftests/bpf: validate new bpf_map_create_security LSM hook
  bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  bpf, lsm: implement bpf_btf_load_security LSM hook
  selftests/bpf: enhance lsm_map_create test with BTF LSM control

 include/linux/lsm_hook_defs.h                 |   2 +
 include/linux/lsm_hooks.h                     |  25 +++
 include/linux/security.h                      |  12 +
 kernel/bpf/bloom_filter.c                     |   3 -
 kernel/bpf/bpf_local_storage.c                |   3 -
 kernel/bpf/bpf_lsm.c                          |   2 +
 kernel/bpf/bpf_struct_ops.c                   |   3 -
 kernel/bpf/cpumap.c                           |   4 -
 kernel/bpf/devmap.c                           |   3 -
 kernel/bpf/hashtab.c                          |   6 -
 kernel/bpf/lpm_trie.c                         |   3 -
 kernel/bpf/queue_stack_maps.c                 |   4 -
 kernel/bpf/reuseport_array.c                  |   3 -
 kernel/bpf/stackmap.c                         |   3 -
 kernel/bpf/syscall.c                          | 177 ++++++++++-----
 net/core/sock_map.c                           |   4 -
 net/xdp/xskmap.c                              |   4 -
 security/security.c                           |   8 +
 .../selftests/bpf/prog_tests/lsm_map_create.c | 208 ++++++++++++++++++
 .../bpf/prog_tests/unpriv_bpf_disabled.c      |   6 +-
 tools/testing/selftests/bpf/progs/just_maps.c |  56 +++++
 .../selftests/bpf/progs/lsm_map_create.c      |  47 ++++
 tools/testing/selftests/bpf/test_progs.h      |   6 +
 23 files changed, 494 insertions(+), 98 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lsm_map_create.c
 create mode 100644 tools/testing/selftests/bpf/progs/just_maps.c
 create mode 100644 tools/testing/selftests/bpf/progs/lsm_map_create.c

Paul Moore April 12, 2023, 4:49 p.m. UTC | #1

On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> are meant to allow highly-granular LSM-based control over the usage of BPF
> subsystem. Specifically, to control the creation of BPF maps and BTF data
> objects, which are fundamental building blocks of any modern BPF application.
>
> These new hooks are able to override default kernel-side CAP_BPF-based (and
> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> implement LSM policies that could granularly enforce more restrictions on
> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> capabilities), but also, importantly, allow to *bypass kernel-side
> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> cases.

One of the hallmarks of the LSM has always been that it is
non-authoritative: it cannot unilaterally grant access, it can only
restrict what would have been otherwise permitted on a traditional
Linux system.  Put another way, a LSM should not undermine the Linux
discretionary access controls, e.g. capabilities.

If there is a problem with the eBPF capability-based access controls,
that problem needs to be addressed in how the core eBPF code
implements its capability checks, not by modifying the LSM mechanism
to bypass these checks.

Kees Cook April 12, 2023, 5:47 p.m. UTC | #2

On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > are meant to allow highly-granular LSM-based control over the usage of BPF
> > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > objects, which are fundamental building blocks of any modern BPF application.
> >
> > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > implement LSM policies that could granularly enforce more restrictions on
> > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > capabilities), but also, importantly, allow to *bypass kernel-side
> > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > cases.
> 
> One of the hallmarks of the LSM has always been that it is
> non-authoritative: it cannot unilaterally grant access, it can only
> restrict what would have been otherwise permitted on a traditional
> Linux system.  Put another way, a LSM should not undermine the Linux
> discretionary access controls, e.g. capabilities.
> 
> If there is a problem with the eBPF capability-based access controls,
> that problem needs to be addressed in how the core eBPF code
> implements its capability checks, not by modifying the LSM mechanism
> to bypass these checks.

I think semantics matter here. I wouldn't view this as _bypassing_
capability enforcement: it's just more fine-grained access control.

For example, in many places we have things like:

	if (!some_check(...) && !capable(...))
		return -EPERM;

I would expect this is a similar logic. An operation can succeed if the
access control requirement is met. The mismatch we have throughout the
kernel is that capability checks aren't strictly done by LSM hooks. And
this series conceptually, I think, doesn't violate that -- it's changing
the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
yet here).
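
As a runnable illustration of the pattern quoted above, here is a minimal
sketch in which both some_check() and capable() are replaced by plain
booleans. do_operation() is a hypothetical name and the code is not taken
from any kernel source; it only shows that the operation is refused when
neither condition holds.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 when the operation may proceed, -EPERM otherwise.
 * Both arguments are simulated results of the two checks.
 */
int do_operation(bool some_check_passed, bool has_capability)
{
	/* Refuse only if the alternative check AND the capability both fail. */
	if (!some_check_passed && !has_capability)
		return -EPERM;
	return 0;
}
```

Either condition on its own is enough for success, which is why the pattern
reads as an alternative to the capability rather than as a restriction added
on top of it.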

The reason CAP_BPF was created was because there was nothing else that
would be fine-grained enough at the time.

Paul Moore April 12, 2023, 6:06 p.m. UTC | #3

On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > objects, which are fundamental building blocks of any modern BPF application.
> > >
> > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > implement LSM policies that could granularly enforce more restrictions on
> > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > cases.
> >
> > One of the hallmarks of the LSM has always been that it is
> > non-authoritative: it cannot unilaterally grant access, it can only
> > restrict what would have been otherwise permitted on a traditional
> > Linux system.  Put another way, a LSM should not undermine the Linux
> > discretionary access controls, e.g. capabilities.
> >
> > If there is a problem with the eBPF capability-based access controls,
> > that problem needs to be addressed in how the core eBPF code
> > implements its capability checks, not by modifying the LSM mechanism
> > to bypass these checks.
>
> I think semantics matter here. I wouldn't view this as _bypassing_
> capability enforcement: it's just more fine-grained access control.
>
> For example, in many places we have things like:
>
>         if (!some_check(...) && !capable(...))
>                 return -EPERM;
>
> I would expect this is a similar logic. An operation can succeed if the
> access control requirement is met. The mismatch we have throughout the
> kernel is that capability checks aren't strictly done by LSM hooks. And
> this series conceptually, I think, doesn't violate that -- it's changing
> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> yet here).

Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
when it returns a positive value "bypasses kernel checks".  The patch
isn't based on either Linus' tree or the LSM tree, I'm guessing it is
based on a eBPF tree, so I can't say with 100% certainty that it is
bypassing a capability check, but the description claims that to be
the case.

Regardless of how you want to spin this, I'm not supportive of a LSM
hook which allows a LSM to bypass a capability check.  A LSM hook can
be used to provide additional access control restrictions beyond a
capability check, but a LSM hook should never be allowed to overrule
an access denial due to a capability check.
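
For contrast with the override model, the restrictive-only ordering described
in the paragraph above can be sketched as follows. This is a simplified
userspace illustration under stated assumptions: restrictive_check() is a
hypothetical name, the capability check is a boolean, and the LSM verdict is
a plain integer where a negative value means denial.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 if access is granted, or a negative errno otherwise.
 * The LSM verdict is consulted only after the capability check passes.
 */
int restrictive_check(bool has_capability, int lsm_verdict)
{
	if (!has_capability)
		return -EPERM;		/* capability denial is final */
	if (lsm_verdict < 0)
		return lsm_verdict;	/* LSM adds a further restriction */
	return 0;			/* non-negative verdict grants nothing extra */
}
```

In this ordering a positive verdict has no special meaning: a caller without
the capability is refused whatever the LSM says.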

> The reason CAP_BPF was created was because there was nothing else that
> would be fine-grained enough at the time.

The LSM layer predates CAP_BPF, and one could make a very solid
argument that one of the reasons LSMs exist is to provide
supplementary controls due to capability-based access controls being a
poor fit for many modern use cases.

--
paul-moore.com

Kees Cook April 12, 2023, 6:28 p.m. UTC | #4

On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > objects, which are fundamental building blocks of any modern BPF application.
> > > >
> > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > implement LSM policies that could granularly enforce more restrictions on
> > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > cases.
> > >
> > > One of the hallmarks of the LSM has always been that it is
> > > non-authoritative: it cannot unilaterally grant access, it can only
> > > restrict what would have been otherwise permitted on a traditional
> > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > discretionary access controls, e.g. capabilities.
> > >
> > > If there is a problem with the eBPF capability-based access controls,
> > > that problem needs to be addressed in how the core eBPF code
> > > implements its capability checks, not by modifying the LSM mechanism
> > > to bypass these checks.
> >
> > I think semantics matter here. I wouldn't view this as _bypassing_
> > capability enforcement: it's just more fine-grained access control.
> >
> > For example, in many places we have things like:
> >
> >         if (!some_check(...) && !capable(...))
> >                 return -EPERM;
> >
> > I would expect this is a similar logic. An operation can succeed if the
> > access control requirement is met. The mismatch we have throughout the
> > kernel is that capability checks aren't strictly done by LSM hooks. And
> > this series conceptually, I think, doesn't violate that -- it's changing
> > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > yet here).
> 
> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> when it returns a positive value "bypasses kernel checks".  The patch
> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> based on a eBPF tree, so I can't say with 100% certainty that it is
> bypassing a capability check, but the description claims that to be
> the case.
> 
> Regardless of how you want to spin this, I'm not supportive of a LSM
> hook which allows a LSM to bypass a capability check.  A LSM hook can
> be used to provide additional access control restrictions beyond a
> capability check, but a LSM hook should never be allowed to overrule
> an access denial due to a capability check.
> 
> > The reason CAP_BPF was created was because there was nothing else that
> > would be fine-grained enough at the time.
> 
> The LSM layer predates CAP_BPF, and one could make a very solid
> argument that one of the reasons LSMs exist is to provide
> supplementary controls due to capability-based access controls being a
> poor fit for many modern use cases.

I generally agree with what you say, but we DO have this code pattern:

         if (!some_check(...) && !capable(...))
                 return -EPERM;

It looks to me like this series can be refactored to do the same. I
wouldn't consider that to be a "bypass", but I would agree the current
series looks too much like "bypass", and makes reasoning about the
effect of the LSM hooks too "special". :)

Casey Schaufler April 12, 2023, 6:38 p.m. UTC | #5

On 4/12/2023 11:06 AM, Paul Moore wrote:
> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>
>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>> implement LSM policies that could granularly enforce more restrictions on
>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>> cases.
>>> One of the hallmarks of the LSM has always been that it is
>>> non-authoritative: it cannot unilaterally grant access, it can only
>>> restrict what would have been otherwise permitted on a traditional
>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>> discretionary access controls, e.g. capabilities.
>>>
>>> If there is a problem with the eBPF capability-based access controls,
>>> that problem needs to be addressed in how the core eBPF code
>>> implements its capability checks, not by modifying the LSM mechanism
>>> to bypass these checks.

Agreed. A lot of thought went into this. The LSM mechanism would be
vastly different if the hooks were authoritative instead of restrictive.

>> I think semantics matter here. I wouldn't view this as _bypassing_
>> capability enforcement: it's just more fine-grained access control.
>>
>> For example, in many places we have things like:
>>
>>         if (!some_check(...) && !capable(...))
>>                 return -EPERM;
>>
>> I would expect this is a similar logic. An operation can succeed if the
>> access control requirement is met. The mismatch we have throughout the
>> kernel is that capability checks aren't strictly done by LSM hooks. And
>> this series conceptually, I think, doesn't violate that -- it's changing
>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>> yet here).
> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> when it returns a positive value "bypasses kernel checks".  The patch
> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> based on a eBPF tree, so I can't say with 100% certainty that it is
> bypassing a capability check, but the description claims that to be
> the case.
>
> Regardless of how you want to spin this, I'm not supportive of a LSM
> hook which allows a LSM to bypass a capability check.  A LSM hook can
> be used to provide additional access control restrictions beyond a
> capability check, but a LSM hook should never be allowed to overrule
> an access denial due to a capability check.
>
>> The reason CAP_BPF was created was because there was nothing else that
>> would be fine-grained enough at the time.

There's nothing stopping you from having a fine grained mechanism that
further restricts a process with CAP_BPF. SELinux implements many checks
that can, policy willing, restrict a process with a capability from doing
what the capability permits.

> The LSM layer predates CAP_BPF, and one could make a very solid
> argument that one of the reasons LSMs exist is to provide
> supplementary controls due to capability-based access controls being a
> poor fit for many modern use cases.
>
> --
> paul-moore.com

Paul Moore April 12, 2023, 7:06 p.m. UTC | #6

On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > >
> > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > cases.
> > > >
> > > > One of the hallmarks of the LSM has always been that it is
> > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > restrict what would have been otherwise permitted on a traditional
> > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > discretionary access controls, e.g. capabilities.
> > > >
> > > > If there is a problem with the eBPF capability-based access controls,
> > > > that problem needs to be addressed in how the core eBPF code
> > > > implements its capability checks, not by modifying the LSM mechanism
> > > > to bypass these checks.
> > >
> > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > capability enforcement: it's just more fine-grained access control.
> > >
> > > For example, in many places we have things like:
> > >
> > >         if (!some_check(...) && !capable(...))
> > >                 return -EPERM;
> > >
> > > I would expect this is a similar logic. An operation can succeed if the
> > > access control requirement is met. The mismatch we have throughout the
> > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > this series conceptually, I think, doesn't violate that -- it's changing
> > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > yet here).
> >
> > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > when it returns a positive value "bypasses kernel checks".  The patch
> > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > based on a eBPF tree, so I can't say with 100% certainty that it is
> > bypassing a capability check, but the description claims that to be
> > the case.
> >
> > Regardless of how you want to spin this, I'm not supportive of a LSM
> > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > be used to provide additional access control restrictions beyond a
> > capability check, but a LSM hook should never be allowed to overrule
> > an access denial due to a capability check.
> >
> > > The reason CAP_BPF was created was because there was nothing else that
> > > would be fine-grained enough at the time.
> >
> > The LSM layer predates CAP_BPF, and one could make a very solid
> > argument that one of the reasons LSMs exist is to provide
> > supplementary controls due to capability-based access controls being a
> > poor fit for many modern use cases.
>
> I generally agree with what you say, but we DO have this code pattern:
>
>          if (!some_check(...) && !capable(...))
>                  return -EPERM;

I think we need to make this more concrete; we don't have a pattern in
the upstream kernel where 'some_check(...)' is a LSM hook, right?
Simply because there is another kernel access control mechanism which
allows a capability check to be skipped doesn't mean I want to allow a
LSM hook to be used to skip a capability check.

> It looks to me like this series can be refactored to do the same. I
> wouldn't consider that to be a "bypass", but I would agree the current
> series looks too much like "bypass", and makes reasoning about the
> effect of the LSM hooks too "special". :)

Andrii Nakryiko April 13, 2023, 1:43 a.m. UTC | #7

On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > > >
> > > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > > >
> > > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > > cases.
> > > > >
> > > > > One of the hallmarks of the LSM has always been that it is
> > > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > > restrict what would have been otherwise permitted on a traditional
> > > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > > discretionary access controls, e.g. capabilities.
> > > > >
> > > > > If there is a problem with the eBPF capability-based access controls,
> > > > > that problem needs to be addressed in how the core eBPF code
> > > > > implements its capability checks, not by modifying the LSM mechanism
> > > > > to bypass these checks.
> > > >
> > > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > > capability enforcement: it's just more fine-grained access control.

Exactly. One of the motivations for this work was the need to move
some production use cases, which need extra privileges only so that
they can use BPF, into a more restrictive environment. Granting
CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
for BPF usage is too coarse-grained. These caps would allow those
applications to do way more than just use BPF. So the idea here is
finer-grained control of BPF-specific operations, granting *effective*
CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
production logic that would validate the use case.

This *is* an attempt to achieve a more secure production approach.

> > > >
> > > > For example, in many places we have things like:
> > > >
> > > >         if (!some_check(...) && !capable(...))
> > > >                 return -EPERM;
> > > >
> > > > I would expect this is a similar logic. An operation can succeed if the
> > > > access control requirement is met. The mismatch we have throughout the
> > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > yet here).
> > >
> > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > when it returns a positive value "bypasses kernel checks".  The patch
> > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > bypassing a capability check, but the description claims that to be
> > > the case.
> > >
> > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > be used to provide additional access control restrictions beyond a
> > > capability check, but a LSM hook should never be allowed to overrule
> > > an access denial due to a capability check.
> > >
> > > > The reason CAP_BPF was created was because there was nothing else that
> > > > would be fine-grained enough at the time.
> > >
> > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > argument that one of the reasons LSMs exist is to provide
> > > supplementary controls due to capability-based access controls being a
> > > poor fit for many modern use cases.
> >
> > I generally agree with what you say, but we DO have this code pattern:
> >
> >          if (!some_check(...) && !capable(...))
> >                  return -EPERM;
>
> I think we need to make this more concrete; we don't have a pattern in
> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> Simply because there is another kernel access control mechanism which
> allows a capability check to be skipped doesn't mean I want to allow a
> LSM hook to be used to skip a capability check.

This work is an attempt to tighten the security of production systems
by making it possible to drop capabilities that are too coarse-grained
and permissive (like CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN, which
inevitably allow more than production use cases are meant to be able to
do) and then grant specific BPF operations on specific BPF programs/maps
based on a custom LSM security policy, which validates application
trustworthiness using custom production-specific logic.

Isn't this goal in line with the LSM's mission to enhance system security?

>
> > It looks to me like this series can be refactored to do the same. I
> > wouldn't consider that to be a "bypass", but I would agree the current
> > series looks too much like "bypass", and makes reasoning about the
> > effect of the LSM hooks too "special". :)

Sorry, I didn't realize that the current code layout is making things
more confusing. I'll address feedback to make the intent a bit
clearer.

>
> --
> paul-moore.com

Paul Moore April 13, 2023, 2:56 a.m. UTC | #8

On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:

...

> > > > > For example, in many places we have things like:
> > > > >
> > > > >         if (!some_check(...) && !capable(...))
> > > > >                 return -EPERM;
> > > > >
> > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > access control requirement is met. The mismatch we have throughout the
> > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > yet here).
> > > >
> > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > bypassing a capability check, but the description claims that to be
> > > > the case.
> > > >
> > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > be used to provide additional access control restrictions beyond a
> > > > capability check, but a LSM hook should never be allowed to overrule
> > > > an access denial due to a capability check.
> > > >
> > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > would be fine-grained enough at the time.
> > > >
> > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > argument that one of the reasons LSMs exist is to provide
> > > > supplementary controls due to capability-based access controls being a
> > > > poor fit for many modern use cases.
> > >
> > > I generally agree with what you say, but we DO have this code pattern:
> > >
> > >          if (!some_check(...) && !capable(...))
> > >                  return -EPERM;
> >
> > I think we need to make this more concrete; we don't have a pattern in
> > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > Simply because there is another kernel access control mechanism which
> > allows a capability check to be skipped doesn't mean I want to allow a
> > LSM hook to be used to skip a capability check.
>
> This work is an attempt to tighten the security of production systems
> by allowing to drop too coarse-grained and permissive capabilities
> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> than production use cases are meant to be able to do) and then grant
> specific BPF operations on specific BPF programs/maps based on custom
> LSM security policy, which validates application trustworthiness using
> custom production-specific logic.

There are ways to leverage the LSMs to apply finer grained access
control on top of the relatively coarse capabilities that do not
require circumventing those capability controls.  One grants the
capabilities, just as one would do today, and then leverages the
security functionality of a LSM to further restrict specific users,
applications, etc. with a level of granularity beyond that offered by
the capability controls.

Andrii Nakryiko April 13, 2023, 5:16 a.m. UTC | #9

On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> ...
>
> > > > > > For example, in many places we have things like:
> > > > > >
> > > > > >         if (!some_check(...) && !capable(...))
> > > > > >                 return -EPERM;
> > > > > >
> > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > access control requirement is met. The mismatch we have throughout the
> > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > yet here).
> > > > >
> > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > bypassing a capability check, but the description claims that to be
> > > > > the case.
> > > > >
> > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > be used to provide additional access control restrictions beyond a
> > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > an access denial due to a capability check.
> > > > >
> > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > would be fine-grained enough at the time.
> > > > >
> > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > argument that one of the reasons LSMs exist is to provide
> > > > > supplementary controls due to capability-based access controls being a
> > > > > poor fit for many modern use cases.
> > > >
> > > > I generally agree with what you say, but we DO have this code pattern:
> > > >
> > > >          if (!some_check(...) && !capable(...))
> > > >                  return -EPERM;
> > >
> > > I think we need to make this more concrete; we don't have a pattern in
> > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > Simply because there is another kernel access control mechanism which
> > > allows a capability check to be skipped doesn't mean I want to allow a
> > > LSM hook to be used to skip a capability check.
> >
> > This work is an attempt to tighten the security of production systems
> > by allowing to drop too coarse-grained and permissive capabilities
> > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > than production use cases are meant to be able to do) and then grant
> > specific BPF operations on specific BPF programs/maps based on custom
> > LSM security policy, which validates application trustworthiness using
> > custom production-specific logic.
>
> There are ways to leverage the LSMs to apply finer grained access
> control on top of the relatively coarse capabilities that do not
> require circumventing those capability controls.  One grants the
> capabilities, just as one would do today, and then leverages the
> security functionality of a LSM to further restrict specific users,
> applications, etc. with a level of granularity beyond that offered by
> the capability controls.

Please help me understand something. What you and Casey are proposing,
when taken to the logical extreme, is to grant to all processes root
permissions and then use LSM to restrict specific actions, do I
understand correctly? This strikes me as a less secure and more
error-prone way of doing things. If there is some problem with
installing LSM policy, it could go unnoticed for a really long time,
while the system would be way more vulnerable. Why do you prefer such
an approach instead of going with no extra permissions by default, but
allowing custom LSM policy to grant a few exceptions for known and
trusted use cases?

By the way, even the above proposal of yours doesn't work for
production use cases when user namespaces are involved, as far as I
understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
containers running inside user namespaces, as CAP_BPF in a non-init
namespace is not enough for the bpf() syscall to allow creating BPF maps or
loading BPF programs (bpf() doesn't do ns_capable(), it's only using
capable()). What solution would you suggest for such production
setups?
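
The capable() versus ns_capable() distinction raised here can be
sketched in userspace C. This is a simplified model, not kernel code:
struct user_ns, struct task, model_capable() and model_ns_capable() are
hypothetical names, and the real kernel check has additional rules (for
example, for the owner of a namespace) that are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* A user namespace; parent is NULL for the initial namespace. */
struct user_ns {
    const struct user_ns *parent;
};

/* A task living in one namespace, with CAP_BPF raised there or not. */
struct task {
    const struct user_ns *ns;
    bool cap_bpf;
};

/*
 * Models ns_capable(): walk up from the target namespace; the task's
 * capability counts only if the task's own namespace is the target or
 * one of its ancestors.
 */
static bool model_ns_capable(const struct task *t, const struct user_ns *target)
{
    for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == t->ns)
            return t->cap_bpf;
    }
    return false;
}

/* Models capable(): the same check, always against the initial namespace. */
static bool model_capable(const struct task *t, const struct user_ns *init_ns)
{
    return model_ns_capable(t, init_ns);
}
```

In this model a task inside a child namespace passes
model_ns_capable() for its own namespace but fails model_capable(),
which is the situation described for containers.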

Also, in a previous email you said:

> Simply because there is another kernel access control mechanism which
> allows a capability check to be skipped doesn't mean I want to allow a
> LSM hook to be used to skip a capability check.

I understand your stated position, but can you please help me
understand the reasoning behind it? What would be wrong with some LSM
hooks granting effective capabilities? How would that change anything
about LSM design? As far as I can see, I'm not doing anything crazy
with my LSM hook implementation. It's reusing the standard
call_int_hook() mechanism very straightforwardly with a default result
of 0. And then just interprets 0, <0, and >0 results accordingly. Is
that abusing the LSM mechanism itself somehow?
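
As a rough sketch of the three-way result handling described above,
assuming (per the series description) that a positive hook result skips
the capability check, a negative result denies, and zero falls back to
the default check. The function name and parameters are hypothetical
and not taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * hook_rc models the aggregated LSM hook result; has_cap models the
 * outcome of the usual capability check (e.g. for CAP_BPF).
 */
static int model_map_create_perm(int hook_rc, bool has_cap)
{
    if (hook_rc < 0)
        return hook_rc;             /* policy denies outright */
    if (hook_rc > 0)
        return 0;                   /* policy allows; capability check skipped */
    return has_cap ? 0 : -EPERM;    /* no opinion: default capability check */
}
```

The second branch is the one under dispute in this thread: it lets a
task without the capability succeed.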

Does the above also mean that you'd be fine if we just don't plug into
the LSM subsystem at all and instead come up with some ad-hoc solution
to allow effectively the same policies? This sounds detrimental both
to LSM and BPF subsystems, so I hope we can talk this through before
finalizing decisions.

Lastly, you mentioned before:

> > > I think we need to make this more concrete; we don't have a pattern in
> > > the upstream kernel where 'some_check(...)' is a LSM hook, right?

Unfortunately I don't have enough familiarity with all LSM hooks, so I
can't confirm or disprove the above statement. But earlier someone
brought to my attention the case of security_vm_enough_memory_mm(),
which seems to be granting effectively CAP_SYS_ADMIN for the purposes
of memory accounting. Am I missing something subtle there or does it
grant effective caps indeed?

>
> --
> paul-moore.com

Paul Moore April 13, 2023, 3:11 p.m. UTC | #10

On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > ...
> >
> > > > > > > For example, in many places we have things like:
> > > > > > >
> > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > >                 return -EPERM;
> > > > > > >
> > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > access control requirement is met. The mismatch we have throughout the
> > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > yet here).
> > > > > >
> > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > bypassing a capability check, but the description claims that to be
> > > > > > the case.
> > > > > >
> > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > be used to provide additional access control restrictions beyond a
> > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > an access denial due to a capability check.
> > > > > >
> > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > would be fine-grained enough at the time.
> > > > > >
> > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > supplementary controls due to capability-based access controls being a
> > > > > > poor fit for many modern use cases.
> > > > >
> > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > >
> > > > >          if (!some_check(...) && !capable(...))
> > > > >                  return -EPERM;
> > > >
> > > > I think we need to make this more concrete; we don't have a pattern in
> > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > Simply because there is another kernel access control mechanism which
> > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > LSM hook to be used to skip a capability check.
> > >
> > > This work is an attempt to tighten the security of production systems
> > > by allowing to drop too coarse-grained and permissive capabilities
> > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > than production use cases are meant to be able to do) and then grant
> > > specific BPF operations on specific BPF programs/maps based on custom
> > > LSM security policy, which validates application trustworthiness using
> > > custom production-specific logic.
> >
> > There are ways to leverage the LSMs to apply finer grained access
> > control on top of the relatively coarse capabilities that do not
> > require circumventing those capability controls.  One grants the
> > capabilities, just as one would do today, and then leverages the
> > security functionality of a LSM to further restrict specific users,
> > applications, etc. with a level of granularity beyond that offered by
> > the capability controls.
>
> Please help me understand something. What you and Casey are proposing,
> when taken to the logical extreme, is to grant to all processes root
> permissions and then use LSM to restrict specific actions, do I
> understand correctly? This strikes me as a less secure and more
> error-prone way of doing things.

When taken to the "logical extreme" most concepts end up sounding a
bit absurd, but that was the point, wasn't it?

Here is a fun story which seems relevant ... in the early days of
SELinux, one of the community devs set up a system with a SELinux
policy which restricted all privileged operations from the root user,
put the system on a publicly accessible network, posted the root
password for all to see, and invited the public to log in to the system
and attempt to exercise root privilege (it's been well over 10 years
at this point so the details are a bit fuzzy).  Granted, there were
some hiccups in the beginning, mostly due to the crude state of policy
development/analysis at the time, but after a few policy revisions the
system held up quite well.

On the more practical side of things, there are several use cases
which require, by way of legal or contractual requirements, that full
root/admin privileges are decomposed into separate roles: security
admin, audit admin, backup admin, etc.  These users satisfy these
requirements by using LSMs, such as SELinux, to restrict the
administrative capabilities based on the SELinux user/role/domain.

> By the way, even the above proposal of yours doesn't work for
> production use cases when user namespaces are involved, as far as I
> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> containers running inside user namespaces, as CAP_BPF in non-init
> namespace is not enough for bpf() syscall to allow loading BPF maps or
> BPF program ...

Once again, the LSM has always been intended to be a restrictive mechanism,
not a privilege granting mechanism.  If an operation is not possible
without the LSM layer enabled, it should not be possible with the LSM
layer enabled.  The LSM is not a mechanism to circumvent other access
control mechanisms in the kernel.

> Also, in a previous email you said:
>
> > Simply because there is another kernel access control mechanism which
> > allows a capability check to be skipped doesn't mean I want to allow a
> > LSM hook to be used to skip a capability check.
>
> I understand your stated position, but can you please help me
> understand the reasoning behind it?

Keeping the LSM as a restrictive access control mechanism helps ensure
some level of sanity and consistency across different Linux
installations.  If a certain operation requires CAP_SYS_ADMIN on one
Linux system, it should require CAP_SYS_ADMIN on another Linux system.
Granted, a LSM running on one system might impose additional
constraints on that operation, but the CAP_SYS_ADMIN requirement still
applies.

There is also an issue of safety in knowing that enabling a LSM will
not degrade the access controls on a system by potentially granting
operations that were previously denied.
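
The restrictive property described here can be written down as a small
userspace sketch. The names base_allows() and allows_with_lsm() are
hypothetical and only model the idea: the LSM verdict is combined with
the base decision using AND, so it can remove access but never add it.

```c
#include <stdbool.h>

/* Base decision without any LSM: models a capable()-style check. */
static bool base_allows(bool has_cap)
{
    return has_cap;
}

/* With an LSM enabled, its verdict is ANDed in, never ORed. */
static bool allows_with_lsm(bool has_cap, bool lsm_permits)
{
    return base_allows(has_cap) && lsm_permits;
}
```

Under this composition, any operation allowed with the LSM enabled is
also allowed without it, so enabling the LSM cannot grant something
that was previously denied.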

> Does the above also mean that you'd be fine if we just don't plug into
> the LSM subsystem at all and instead come up with some ad-hoc solution
> to allow effectively the same policies? This sounds detrimental both
> to LSM and BPF subsystems, so I hope we can talk this through before
> finalizing decisions.

Based on your patches and our discussion, it seems to me that the
problem you are trying to resolve is related more to the
capability-based access controls in the eBPF subsystem, and possibly other
kernel subsystems, and not any LSM-based restrictions.  I'm happy to
work with you on a solution involving the LSM, but please understand
that I'm not going to support a solution which changes a core
philosophy of the LSM layer.

> Lastly, you mentioned before:
>
> > > > I think we need to make this more concrete; we don't have a pattern in
> > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
>
> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> can't confirm or disprove the above statement. But earlier someone
> brought to my attention the case of security_vm_enough_memory_mm(),
> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> of memory accounting. Am I missing something subtle there or does it
> grant effective caps indeed?

Some of the comments around that hook can be misleading, but if you
look at the actual code it starts to make more sense.

First, look at the LSM-disabled case and you'll see that the
security_vm_enough_memory_mm() hook ends up looking like this:

int security_vm_enough_memory_mm(...)
{
  return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages));
}

... which basically calls into the core capability code to check for
CAP_SYS_ADMIN, passing the result onto __vm_enough_memory.

If we then look at the LSM-enabled case, things are a little more
complicated, but it looks something like this:

int security_vm_enough_memory_mm(...)
{
  int cap_admin = 1;
  int rc;

  for_each_lsm_hook(...) {
    rc = lsm_hook(...);
    if (rc <= 0) {
      cap_admin = 0;
      break;
    }
  }

  return __vm_enough_memory(mm, pages, cap_admin);
}

... which as the comment says, "If all of the modules agree that it
should be set it will. If any module thinks it should not be set it
won't.".  However, if we look at which LSMs define vm_enough_memory()
hooks we see just two: the capability LSM, and SELinux.  The
capability LSM[1] just uses cap_vm_enough_memory() so that's
straightforward, and the SELinux hook is selinux_vm_enough_memory(),
which simply checks the loaded SELinux policy to see if the current
task has permission to exercise the CAP_SYS_ADMIN capability.  SELinux
can't grant CAP_SYS_ADMIN beyond what the capability code permits, it
only restricts its use.  Put another way, if the capability code does
not allow CAP_SYS_ADMIN in a call to security_vm_enough_memory() then
CAP_SYS_ADMIN will not be granted regardless of what the other LSMs
may decide.
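
A self-contained userspace model of the aggregation just described may
help. It assumes two modules, one standing in for the capability code
and one for a policy-based LSM; all names (model_ctx, model_cap_hook,
model_policy_hook, model_cap_admin) are hypothetical and the real hooks
take different arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inputs each modelled module looks at; both fields are illustrative. */
struct model_ctx {
    bool has_cap_sys_admin;   /* what the capability code would say */
    bool policy_permits;      /* what a loaded MAC policy would say */
};

typedef int (*model_vm_hook)(const struct model_ctx *ctx);

static int model_cap_hook(const struct model_ctx *ctx)
{
    return ctx->has_cap_sys_admin ? 1 : 0;
}

static int model_policy_hook(const struct model_ctx *ctx)
{
    return ctx->policy_permits ? 1 : 0;
}

/* cap_admin stays 1 only if every registered hook returns > 0. */
static int model_cap_admin(const model_vm_hook *hooks, size_t n,
                           const struct model_ctx *ctx)
{
    int cap_admin = 1;

    for (size_t i = 0; i < n; i++) {
        if (hooks[i](ctx) <= 0) {
            cap_admin = 0;
            break;
        }
    }
    return cap_admin;
}
```

Because the capability hook is one of the voters, a permissive policy
hook alone cannot make the result 1.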

I do agree that the security_vm_enough_memory() hook is structured a
bit differently than most of the other LSM hooks, but it still
operates with the same philosophy: a LSM should only be allowed to
restrict access, a LSM should never be allowed to grant access that
would otherwise be denied by the traditional Linux access controls.

Hopefully that explanation makes sense, but if things are still a bit
fuzzy I would encourage you to go look at the code; I'm sure it will
make sense once you spend a few minutes figuring out how it works.

[1] There is a long and sorta bizarre history with the capability LSM,
but just understand it is a bit "special" in many ways, and those
"special" behaviors are intentional.

--
paul-moore.com

Casey Schaufler April 13, 2023, 4:27 p.m. UTC | #11

On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>>>>
>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>>>>> implement LSM policies that could granularly enforce more restrictions on
>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>>>>> cases.
>>>>>> One of the hallmarks of the LSM has always been that it is
>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
>>>>>> restrict what would have been otherwise permitted on a traditional
>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>>>>> discretionary access controls, e.g. capabilities.
>>>>>>
>>>>>> If there is a problem with the eBPF capability-based access controls,
>>>>>> that problem needs to be addressed in how the core eBPF code
>>>>>> implements its capability checks, not by modifying the LSM mechanism
>>>>>> to bypass these checks.
>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
>>>>> capability enforcement: it's just more fine-grained access control.
> Exactly. One of the motivations for this work was the need to move
> some production use cases that are only needing extra privileges so
> that they can use BPF into a more restrictive environment. Granting
> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> for BPF usage is too coarse grained. These caps would allow those
> applications way more than just BPF usage. So the idea here is
> finer-grained control of BPF-specific operations, granting *effective*
> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> production logic that would validate the use case.

That's an authoritative model which is in direct conflict with the
design and implementation of both capabilities and LSM.

>
> This *is* an attempt to achieve a more secure production approach.
>
>>>>> For example, in many places we have things like:
>>>>>
>>>>>         if (!some_check(...) && !capable(...))
>>>>>                 return -EPERM;
>>>>>
>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>> access control requirement is met. The mismatch we have throughout the
>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>> yet here).
>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>> bypassing a capability check, but the description claims that to be
>>>> the case.
>>>>
>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>> be used to provide additional access control restrictions beyond a
>>>> capability check, but a LSM hook should never be allowed to overrule
>>>> an access denial due to a capability check.
>>>>
>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>> would be fine-grained enough at the time.
>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>> argument that one of the reasons LSMs exist is to provide
>>>> supplementary controls due to capability-based access controls being a
>>>> poor fit for many modern use cases.
>>> I generally agree with what you say, but we DO have this code pattern:
>>>
>>>          if (!some_check(...) && !capable(...))
>>>                  return -EPERM;
>> I think we need to make this more concrete; we don't have a pattern in
>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>> Simply because there is another kernel access control mechanism which
>> allows a capability check to be skipped doesn't mean I want to allow a
>> LSM hook to be used to skip a capability check.
> This work is an attempt to tighten the security of production systems
> by allowing to drop too coarse-grained and permissive capabilities
> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> than production use cases are meant to be able to do)

The BPF developers are in complete control of what CAP_BPF controls.
You can easily address the granularity issue by adding additional restrictions
on processes that have CAP_BPF. That is the intended use of LSM.
The whole point of having multiple capabilities is so that you can
grant just those that are required by the system security policy, and
do so safely. That leads to differences of opinion regarding the definition
of the system security policy. BPF chose to set itself up as an element
of security policy (you need CAP_BPF) rather than define elements such that
existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
control them.

>  and then grant
> specific BPF operations on specific BPF programs/maps based on custom
> LSM security policy,

This is backwards. The correct implementation is to require CAP_BPF and
further restrict BPF operations based on a custom LSM security policy.
That's how LSM is designed.

>  which validates application trustworthiness using
> custom production-specific logic.
>
> Isn't this goal in line with the LSM's mission to enhance system security?

We're not arguing the goal, we're discussing the implementation.

>>> It looks to me like this series can be refactored to do the same. I
>>> wouldn't consider that to be a "bypass", but I would agree the current
>>> series looks too much like "bypass", and makes reasoning about the
>>> effect of the LSM hooks too "special". :)
> Sorry, I didn't realize that the current code layout is making things
> more confusing. I'll address feedback to make the intent a bit
> clearer.
>
>> --
>> paul-moore.com

Casey Schaufler April 13, 2023, 4:54 p.m. UTC | #12

On 4/12/2023 10:16 PM, Andrii Nakryiko wrote:
> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
>> <andrii.nakryiko@gmail.com> wrote:
>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>> ...
>>
>>>>>>> For example, in many places we have things like:
>>>>>>>
>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>                 return -EPERM;
>>>>>>>
>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>> access control requirement is met. The mismatch we have throughout the
>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>> yet here).
>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>>>> bypassing a capability check, but the description claims that to be
>>>>>> the case.
>>>>>>
>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>> be used to provide additional access control restrictions beyond a
>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>> an access denial due to a capability check.
>>>>>>
>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>> would be fine-grained enough at the time.
>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>> supplementary controls due to capability-based access controls being a
>>>>>> poor fit for many modern use cases.
>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>
>>>>>          if (!some_check(...) && !capable(...))
>>>>>                  return -EPERM;
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> This work is an attempt to tighten the security of production systems
>>> by allowing to drop too coarse-grained and permissive capabilities
>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>> than production use cases are meant to be able to do) and then grant
>>> specific BPF operations on specific BPF programs/maps based on custom
>>> LSM security policy, which validates application trustworthiness using
>>> custom production-specific logic.
>> There are ways to leverage the LSMs to apply finer grained access
>> control on top of the relatively coarse capabilities that do not
>> require circumventing those capability controls.  One grants the
>> capabilities, just as one would do today, and then leverages the
>> security functionality of a LSM to further restrict specific users,
>> applications, etc. with a level of granularity beyond that offered by
>> the capability controls.
> Please help me understand something. What you and Casey are proposing,
> when taken to the logical extreme, is to grant to all processes root
> permissions and then use LSM to restrict specific actions, do I
> understand correctly?

No. You grant a process the capabilities it needs (CAP_BPF, CAP_WHATEVER)
and only those capabilities. If you want additional restrictions you include
an LSM that implements those restrictions. If you want finer control over
the operations controlled by CAP_BPF you include an LSM that implements
those controls.

>  This strikes me as a less secure and more
> error-prone way of doing things. If there is some problem with
> installing LSM policy,

LSMs are not required to have loadable or dynamic policies. That's
up to the developer.

>  it could go unnoticed for a really long time,
> while the system would be way more vulnerable.

There is no way Paul or I are going to solve the mis-configured system
problem.

>  Why do you prefer such
> an approach instead of going with no extra permissions by default, but
> allowing custom LSM policy to grant few exceptions for known and
> trusted use cases?

Because that's not how capabilities work. Capabilities are independent
of other controls. If you want to propose a change to how capabilities
work, you need to propose that to the capability maintainer.

Because that's not how LSMs work. LSMs implement additional restrictions
to the existing policy. The restrictive vs. authoritative debate was closed
long ago. It's a fundamental property of how LSMs work.

> By the way, even the above proposal of yours doesn't work for
> production use cases when user namespaces are involved, as far as I
> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> containers running inside user namespaces, as CAP_BPF in non-init
> namespace is not enough for bpf() syscall to allow loading BPF maps or
> BPF program (bpf() doesn't do ns_capable(), it's only using
> capable()). What solution would you suggest for such production
> setups?

If user namespaces don't work the way you'd like, you should take that
up with the namespace maintainers. Or, since this appears to be an issue
with BPF not being namespace aware, fix BPF's use of capable() and ns_capable().

> Also, in previous email you said:
>
>> Simply because there is another kernel access control mechanism which
>> allows a capability check to be skipped doesn't mean I want to allow a
>> LSM hook to be used to skip a capability check.
> I understand your stated position, but can you please help me
> understand the reasoning behind it? What would be wrong with some LSM
> hooks granting effective capabilities?

You keep asking the question and ignoring the answer. See above.

>  How would that change anything
> about LSM design? As far as I can see, I'm not doing anything crazy
> with my LSM hook implementation.

You keep asking the question and ignoring the answer. See above.


>  It's reusing the standard
> call_int_hook() mechanism very straightforwardly with a default result
> of 0. And then just interprets 0, <0, and >0 results accordingly. Is
> that abusing the LSM mechanism itself somehow?
>
> Does the above also mean that you'd be fine if we just don't plug into
> the LSM subsystem at all and instead come up with some ad-hoc solution
> to allow effectively the same policies?

No, because you would be breaking the capability system in that case.

There is an example of a feature that does just what you're suggesting.
POSIX ACLs aren't an LSM because they don't just add restrictions, they
change the semantics of the file mode bits. Look at that implementation
before you seriously consider going that route.

>  This sounds detrimental both
> to LSM and BPF subsystems, so I hope we can talk this through before
> finalizing decisions.
>
> Lastly, you mentioned before:
>
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> can't confirm or disprove the above statement. But earlier someone
> brought to my attention the case of security_vm_enough_memory_mm(),
> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> of memory accounting. Am I missing something subtle there or does it
> grant effective caps indeed?
>
>
>
>
>> --
>> paul-moore.com

Jonathan Corbet April 13, 2023, 7:03 p.m. UTC | #13

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Why do you prefer such
> an approach instead of going with no extra permissions by default, but
> allowing custom LSM policy to grant few exceptions for known and
> trusted use cases?

Should you be curious, you can find some of the history of the "no
authoritative hooks" policy at:

  https://lwn.net/2001/1108/kernel.php3

It was fairly heatedly discussed at the time.

jon

Dr. Greg April 14, 2023, 8:23 p.m. UTC | #14

On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:

Hi, I hope the week is ending well for everyone.

> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > objects, which are fundamental building blocks of any modern BPF application.
> > >
> > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > implement LSM policies that could granularly enforce more restrictions on
> > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > cases.
> > 
> > One of the hallmarks of the LSM has always been that it is
> > non-authoritative: it cannot unilaterally grant access, it can only
> > restrict what would have been otherwise permitted on a traditional
> > Linux system.  Put another way, a LSM should not undermine the Linux
> > discretionary access controls, e.g. capabilities.
> > 
> > If there is a problem with the eBPF capability-based access controls,
> > that problem needs to be addressed in how the core eBPF code
> > implements its capability checks, not by modifying the LSM mechanism
> > to bypass these checks.

> I think semantics matter here. I wouldn't view this as _bypassing_
> capability enforcement: it's just more fine-grained access control.
> 
> For example, in many places we have things like:
> 
> 	if (!some_check(...) && !capable(...))
> 		return -EPERM;
> 
> I would expect this is a similar logic. An operation can succeed if the
> access control requirement is met. The mismatch we have through-out the
> kernel is that capability checks aren't strictly done by LSM hooks. And
> this series conceptually, I think, doesn't violate that -- it's changing
> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> yet here).
> 
> The reason CAP_BPF was created was because there was nothing else that
> would be fine-grained enough at the time.

This was one of the issues, among others, that the TSEM LSM we are
working to upstream, was designed to address and may be an avenue
forward.

TSEM, being narratival rather than deontologically based, provides a
framework for security permissions that are based on a
characterization of the event itself.  So the permissions are as
variable as the contents of whatever BPF related information is passed
to the bpf* LSM hooks [1].

Currently, the tsem_bpf_* hooks are generically modeled.  We would
certainly entertain any discussion or suggestions as to what elements
of the structures passed to the hooks would be useful with respect
to establishing security policies useful and appropriate to the BPF
community.

We don't want to get in the middle of the restrictive
vs. authoritative debate, but it would seem that the jury is
conclusively in on that issue and LSM hooks are not going to be
allowed to dismiss, or modify, any other security controls.

Hopefully the BPF ABI isn't tied to CAP_BPF as that would seem to make
it problematic to make controls more granular.

> Kees Cook

Have a good weekend.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

[1]: Plus developers don't need to write security policies, you test
your application in order to get the desired controls for a workload.

Andrii Nakryiko April 17, 2023, 11:28 p.m. UTC | #15

On Thu, Apr 13, 2023 at 12:03 PM Jonathan Corbet <corbet@lwn.net> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > Why do you prefer such
> > an approach instead of going with no extra permissions by default, but
> > allowing custom LSM policy to grant few exceptions for known and
> > trusted use cases?
>
> Should you be curious, you can find some of the history of the "no
> authoritative hooks" policy at:
>
>   https://lwn.net/2001/1108/kernel.php3
>
> It was fairly heatedly discussed at the time.
>

Thanks, Jonathan! Yes, it was very useful to get a bit of context.


> jon

Andrii Nakryiko April 17, 2023, 11:29 p.m. UTC | #16

On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > ...
> > >
> > > > > > > > For example, in many places we have things like:
> > > > > > > >
> > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > >                 return -EPERM;
> > > > > > > >
> > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > > yet here).
> > > > > > >
> > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > the case.
> > > > > > >
> > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > an access denial due to a capability check.
> > > > > > >
> > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > would be fine-grained enough at the time.
> > > > > > >
> > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > poor fit for many modern use cases.
> > > > > >
> > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > >
> > > > > >          if (!some_check(...) && !capable(...))
> > > > > >                  return -EPERM;
> > > > >
> > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > Simply because there is another kernel access control mechanism which
> > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > LSM hook to be used to skip a capability check.
> > > >
> > > > This work is an attempt to tighten the security of production systems
> > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > than production use cases are meant to be able to do) and then grant
> > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > LSM security policy, which validates application trustworthiness using
> > > > custom production-specific logic.
> > >
> > > There are ways to leverage the LSMs to apply finer grained access
> > > control on top of the relatively coarse capabilities that do not
> > > require circumventing those capability controls.  One grants the
> > > capabilities, just as one would do today, and then leverages the
> > > security functionality of a LSM to further restrict specific users,
> > > applications, etc. with a level of granularity beyond that offered by
> > > the capability controls.
> >
> > Please help me understand something. What you and Casey are proposing,
> > when taken to the logical extreme, is to grant to all processes root
> > permissions and then use LSM to restrict specific actions, do I
> > understand correctly? This strikes me as a less secure and more
> > error-prone way of doing things.
>
> When taken to the "logical extreme" most concepts end up sounding a
> bit absurd, but that was the point, wasn't it?

Wasn't my intent to make it sound absurd, sorry. The way I see it, for
the sake of example, let's say CAP_BPF allows 20 different operations
(each with its own security_xxx hook). And let's say in production I
want to only allow 3 of them. Sure, technically it should be possible
to deny access at 17 hooks and let it through in just those 3. But if
someone adds a 21st and I forget to add a 21st restriction, that would be
bad (and quite likely with such an approach).

So my point is that for situations like this, dropping CAP_BPF, but
allowing only 3 hooks to proceed seems a safer approach, because if we
add a 21st hook, it will safely be denied without CAP_BPF *by default*.
That's what I tried to point out.
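
To make the "21st hook" argument concrete, here is a minimal user-space
sketch of the two models. The operation IDs and both helper functions are
hypothetical, made up for illustration; they are not kernel APIs. The policy
tables stand for a policy written before OP_NEW_FEATURE existed.

```c
#include <stdbool.h>

/* Hypothetical operation IDs for illustration; not real kernel constants. */
enum bpf_op {
	OP_MAP_CREATE,
	OP_PROG_LOAD,
	OP_BTF_LOAD,
	OP_NEW_FEATURE,		/* added after the policy was written */
	OP_COUNT
};

/*
 * Restrictive model: the process holds CAP_BPF and the policy lists the
 * operations to deny. Anything the policy does not mention is allowed.
 */
static bool restrictive_allows(bool has_cap_bpf,
			       const bool denied[OP_COUNT], enum bpf_op op)
{
	if (!has_cap_bpf)
		return false;
	return !denied[op];
}

/*
 * Authoritative model (the one argued for in this email): the process has
 * no CAP_BPF and the policy lists the operations to grant. Anything the
 * policy does not mention falls back to the capability check and is denied.
 */
static bool authoritative_allows(bool has_cap_bpf,
				 const bool granted[OP_COUNT], enum bpf_op op)
{
	return has_cap_bpf || granted[op];
}
```

With a deny table that only lists OP_BTF_LOAD, restrictive_allows() lets
OP_NEW_FEATURE through for a CAP_BPF holder, while authoritative_allows()
with a grant table of OP_MAP_CREATE and OP_PROG_LOAD rejects it for a process
without CAP_BPF.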

But even if we ignore this "safe by default when a new hook is added"
behavior, when taking user namespaces into account, the restrictive
LSM approach just doesn't seem to work at all for something like
CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
because we cannot ensure that a given BPF program won't access kernel
state "belonging" to another process (as one example).

Now, thanks to Jonathan, I get that there was a heated discussion 20
years ago about authoritative vs restrictive LSMs. But if I read a
summary at that time ([0]), authoritative hooks were not out of the
question *in principle*. Surely, "walk before we can run" makes sense,
but that was a long time ago.

  [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3


>
> Here is a fun story which seems relevant ... in the early days of
> SELinux, one of the community devs set up a system with a SELinux
> policy which restricted all privileged operations from the root user,
> put the system on a publicly accessible network, posted the root
> password for all to see, and invited the public to login to the system
> and attempt to exercise root privilege (it's been well over 10 years
> at this point so the details are a bit fuzzy).  Granted, there were
> some hiccups in the beginning, mostly due to the crude state of policy
> development/analysis at the time, but after a few policy revisions the
> system held up quite well.

Honest question out of curiosity: was the intent to demonstrate that
with LSM one can completely restrict root? Or that root was actually
allowed to do something useful? Because I can see how rejecting
everything would be rather simple, but actually pretty useless in
practice. Restricting only part of the power of the root, while still
allowing it to do something useful in production seems like a much
harder (but way more valuable) endeavor. Not saying it's impossible,
but see my example about missing the 21st new piece of CAP_BPF functionality.

>
> On the more practical side of things, there are several use cases
> which require, by way of legal or contractual requirements, that full
> root/admin privileges are decomposed into separate roles: security
> admin, audit admin, backup admin, etc.  These users satisfy these
> requirements by using LSMs, such as SELinux, to restrict the
> administrative capabilities based on the SELinux user/role/domain.
>
> > By the way, even the above proposal of yours doesn't work for
> > production use cases when user namespaces are involved, as far as I
> > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > containers running inside user namespaces, as CAP_BPF in non-init
> > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > BPF program ...
>
> Once again, the LSM has always intended to be a restrictive mechanism,
> not a privilege granting mechanism.  If an operation is not possible

Not according to [0] above:

  > It is our belief that these changes do not belong in the initial version of
  > LSM (especially given our limited charter and original goals), and should
  > be proposed as incremental refinements after LSM has been initially
  > accepted.
  > ...
  > It is our belief that the current LSM
  > will provide a meaningful improvement in the security infrastructure of the
  > Linux kernel, and that there is plenty of room for future expansion of LSM
  > in subsequent phases.

I don't see "always intended to be a restrictive mechanism" there.

> without the LSM layer enabled, it should not be possible with the LSM
> layer enabled.  The LSM is not a mechanism to circumvent other access
> control mechanisms in the kernel.

I understand, but it's not like we are proposing to go and bypass all
kinds of random kernel security mechanisms. These are targeted hooks,
developed by the BPF community for the BPF subsystem to allow trusted
unprivileged production use cases. Yes, we currently rely on checking
CAP_BPF to grant more dangerous/advanced features, but it's because we
can't just allow any unprivileged process to do this. But what we
really want is to answer the question "can we trust this process to
use this advanced functionality", and if there is no specific LSM
policy that cares one way (allow) or the other (disallow), fallback to
CAP_BPF enforcement.

So it's not bypassing kernel checks, but rather augmenting them with
more flexible and customizable mechanisms, while still falling back to
CAP_BPF if the user didn't install any custom LSM policy.
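
As a rough sketch of that decision logic (0, <0 and >0 hook results with a
CAP_BPF fallback), assuming the hook result and the capability state are
passed in as plain values: map_create_permitted() is a hypothetical helper
written for this email, not the code from patch 04/08.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * hook_ret models the combined result of the proposed LSM hook:
 *   < 0   some policy denies the operation
 *   == 0  no policy has an opinion (the default)
 *   > 0   a policy vouches for the caller
 * Returns 0 if the operation may proceed, a negative errno otherwise.
 */
static int map_create_permitted(int hook_ret, bool has_cap_bpf)
{
	if (hook_ret < 0)
		return hook_ret;	/* explicit denial always wins */
	if (hook_ret > 0)
		return 0;		/* explicit grant, CAP_BPF not consulted */
	return has_cap_bpf ? 0 : -EPERM;	/* no opinion: CAP_BPF decides */
}
```

The middle branch is the one under dispute in this thread: it is the only
place where the result differs from a purely restrictive hook.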

>
> > Also, in previous email you said:
> >
> > > Simply because there is another kernel access control mechanism which
> > > allows a capability check to be skipped doesn't mean I want to allow a
> > > LSM hook to be used to skip a capability check.
> >
> > I understand your stated position, but can you please help me
> > understand the reasoning behind it?
>
> Keeping the LSM as a restrictive access control mechanism helps ensure
> some level of sanity and consistency across different Linux
> installations.  If a certain operation requires CAP_SYS_ADMIN on one
> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
> Granted, a LSM running on one system might impose additional
> constraints on that operation, but the CAP_SYS_ADMIN requirement still
> applies.
>
> There is also an issue of safety in knowing that enabling a LSM will
> not degrade the access controls on a system by potentially granting
> operations that were previously denied.
>
> > Does the above also mean that you'd be fine if we just don't plug into
> > the LSM subsystem at all and instead come up with some ad-hoc solution
> > to allow effectively the same policies? This sounds detrimental both
> > to LSM and BPF subsystems, so I hope we can talk this through before
> > finalizing decisions.
>
> Based on your patches and our discussion, it seems to me that the
> problem you are trying to resolve is related more to the
> capability-based access controls in the eBPF, and possibly other
> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> work with you on a solution involving the LSM, but please understand
> that I'm not going to support a solution which changes a core
> philosophy of the LSM layer.

Great, I'd really appreciate help and suggestions on how to solve the
following problem.

We have a BPF subsystem that allows loading BPF programs. Those BPF
programs cannot be contained within a particular namespace just by their
system-wide tracing nature (they can safely read kernel and user memory
and we can't restrict whether that memory belongs to a particular
namespace), so it's like CAP_SYS_TIME, just with much broader API
surface.

The other piece of the puzzle is user namespaces. We do want to run
applications inside user namespaces, but allow them to use BPF
programs. As far as I can tell, there is no way to grant real CAP_BPF
that will be recognized by capable(CAP_BPF) (not ns_capable, see above
about system-wide nature of BPF). If there is, please help me
understand how. All my local experiments failed, and looking at
cap_capable() implementation it is not intended to even check the
initial namespace's capability if the process is running in the user
namespace.
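
A simplified user-space model of that namespace walk, based on my reading of
cap_capable(), is below. The struct layouts and model_cap_capable() are
stand-ins invented for illustration; the real kernel code works on struct
cred and struct user_namespace and checks the effective capability set. The
point it shows is that capable(CAP_BPF) is a check against the initial
namespace, and a credential living in a child user namespace fails it
regardless of which capabilities it holds there.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in types for illustration; not the kernel's definitions. */
struct user_ns {
	struct user_ns *parent;	/* unused for the initial namespace */
	int level;		/* 0 for the initial namespace */
	unsigned int owner_uid;	/* uid that created this namespace */
};

struct cred {
	struct user_ns *user_ns;	/* namespace the credential lives in */
	unsigned int euid;
	bool cap_bpf;			/* CAP_BPF held within user_ns */
};

/* Returns 0 if cred has the capability with respect to target. */
static int model_cap_capable(const struct cred *cred,
			     const struct user_ns *target)
{
	const struct user_ns *ns = target;

	for (;;) {
		/* Same namespace: the credential's own capability decides. */
		if (ns == cred->user_ns)
			return cred->cap_bpf ? 0 : -EPERM;
		/* Target is not a descendant of cred's namespace: denied. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;
		/* The owner of a child namespace has all caps inside it. */
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
			return 0;
		ns = ns->parent;
	}
}
```

Passing the initial namespace as target models capable(); passing the
caller's own namespace models an ns_capable() check, which is what succeeds
for a containerized process holding CAP_BPF.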


So, given that a) we can't make CAP_BPF namespace-aware and b) we
can't grant real CAP_BPF to processes in user namespace, how could we
allow user namespaced applications to do useful work with BPF?

>
> > Lastly, you mentioned before:
> >
> > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >
> > Unfortunately I don't have enough familiarity with all LSM hooks, so I
> > can't confirm or disprove the above statement. But earlier someone
> > brought to my attention the case of security_vm_enough_memory_mm(),
> > which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> > of memory accounting. Am I missing something subtle there or does it
> > grant effective caps indeed?
>
> Some of the comments around that hook can be misleading, but if you
> look at the actual code it starts to make more sense.
>

[...]

>
> I do agree that the security_vm_enough_memory() hook is structured a
> bit differently than most of the other LSM hooks, but it still
> operates with the same philosophy: a LSM should only be allowed to
> restrict access, a LSM should never be allowed to grant access that
> would otherwise be denied by the traditional Linux access controls.
>
> Hopefully that explanation makes sense, but if things are still a bit
> fuzzy I would encourage you to go look at the code, I'm sure it will
> make sense once you spend a few minutes figuring out how it works.
>

Yep, thanks a lot, it's much clearer after grokking the relevant pieces
of the LSM code you pointed out and the LSM infrastructure in general.
"capabilities" LSM is non-negotiable, so it effectively always
restricts a small subset of hooks, including vm_enough_memory and
capable.

Still, the problem stands. How do we marry BPF and user
namespaces? I'd really appreciate suggestions. Thank you!


> [1] There is a long and sorta bizarre history with the capability LSM,
> but just understand it is a bit "special" in many ways, and those
> "special" behaviors are intentional.
>
> --
> paul-moore.com

Andrii Nakryiko April 17, 2023, 11:31 p.m. UTC | #17

On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> >>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
> >>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
> >>>>>>> objects, which are fundamental building blocks of any modern BPF application.
> >>>>>>>
> >>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
> >>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> >>>>>>> implement LSM policies that could granularly enforce more restrictions on
> >>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> >>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
> >>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> >>>>>>> cases.
> >>>>>> One of the hallmarks of the LSM has always been that it is
> >>>>>> non-authoritative: it cannot unilaterally grant access, it can only
> >>>>>> restrict what would have been otherwise permitted on a traditional
> >>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
> >>>>>> discretionary access controls, e.g. capabilities.
> >>>>>>
> >>>>>> If there is a problem with the eBPF capability-based access controls,
> >>>>>> that problem needs to be addressed in how the core eBPF code
> >>>>>> implements its capability checks, not by modifying the LSM mechanism
> >>>>>> to bypass these checks.
> >>>>> I think semantics matter here. I wouldn't view this as _bypassing_
> >>>>> capability enforcement: it's just more fine-grained access control.
> > Exactly. One of the motivations for this work was the need to move
> > some production use cases that are only needing extra privileges so
> > that they can use BPF into a more restrictive environment. Granting
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> > for BPF usage is too coarse grained. These caps would allow those
> > applications way more than just BPF usage. So the idea here is
> > finer-grained control of BPF-specific operations, granting *effective*
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> > production logic that would validate the use case.
>
> That's an authoritative model which is in direct conflict with the
> design and implementation of both capabilities and LSM.
>
> >
> > This *is* an attempt to achieve a more secure production approach.
> >
> >>>>> For example, in many places we have things like:
> >>>>>
> >>>>>         if (!some_check(...) && !capable(...))
> >>>>>                 return -EPERM;
> >>>>>
> >>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>> access control requirement is met. The mismatch we have through-out the
> >>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>> yet here).
> >>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>> bypassing a capability check, but the description claims that to be
> >>>> the case.
> >>>>
> >>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>> be used to provide additional access control restrictions beyond a
> >>>> capability check, but a LSM hook should never be allowed to overrule
> >>>> an access denial due to a capability check.
> >>>>
> >>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>> would be fine-grained enough at the time.
> >>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>> argument that one of the reasons LSMs exist is to provide
> >>>> supplementary controls due to capability-based access controls being a
> >>>> poor fit for many modern use cases.
> >>> I generally agree with what you say, but we DO have this code pattern:
> >>>
> >>>          if (!some_check(...) && !capable(...))
> >>>                  return -EPERM;
> >> I think we need to make this more concrete; we don't have a pattern in
> >> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >> Simply because there is another kernel access control mechanism which
> >> allows a capability check to be skipped doesn't mean I want to allow a
> >> LSM hook to be used to skip a capability check.
> > This work is an attempt to tighten the security of production systems
> > by allowing to drop too coarse-grained and permissive capabilities
> > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > than production use cases are meant to be able to do)
>
> The BPF developers are in complete control of what CAP_BPF controls.
> You can easily address the granularity issue by adding additional restrictions
> on processes that have CAP_BPF. That is the intended use of LSM.
> The whole point of having multiple capabilities is so that you can
> grant just those that are required by the system security policy, and
> do so safely. That leads to differences of opinion regarding the definition
> of the system security policy. BPF chose to set itself up as an element
> of security policy (you need CAP_BPF) rather than define elements such that
> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
> control.

Please see my reply to Paul, where I explain CAP_BPF's system-wide
nature and problem with user namespaces. I don't think the problem is
in the granularity of CAP_BPF, it's more of a "non-namespaceable"
nature of the BPF subsystem in general.

>
> >  and then grant
> > specific BPF operations on specific BPF programs/maps based on custom
> > LSM security policy,
>
> This is backwards. The correct implementation is to require CAP_BPF and
> further restrict BPF operations based on a custom LSM security policy.
> That's how LSM is designed.

Please see my reply to Paul, we can't grant real CAP_BPF for
applications in user namespace (unless there is some trick that I
don't know, so please do point it out). Let's converge the discussion
in that email thread branch to not discuss the same topic multiple
times.


>
> >  which validates application trustworthiness using
> > custom production-specific logic.
> >
> > Isn't this goal in line with LSMs mission to enhance system security?
>
> We're not arguing the goal, we're discussing the implementation.
>
> >>> It looks to me like this series can be refactored to do the same. I
> >>> wouldn't consider that to be a "bypass", but I would agree the current
> >>> series looks too much like "bypass", and makes reasoning about the
> >>> effect of the LSM hooks too "special". :)
> > Sorry, I didn't realize that the current code layout is making things
> > more confusing. I'll address feedback to make the intent a bit
> > clearer.
> >
> >> --
> >> paul-moore.com

Andrii Nakryiko April 17, 2023, 11:31 p.m. UTC | #18

On Thu, Apr 13, 2023 at 9:54 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/12/2023 10:16 PM, Andrii Nakryiko wrote:
> > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> >> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> >> <andrii.nakryiko@gmail.com> wrote:
> >>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >> ...
> >>
> >>>>>>> For example, in many places we have things like:
> >>>>>>>
> >>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>                 return -EPERM;
> >>>>>>>
> >>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>> access control requirement is met. The mismatch we have throughout the
> >>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>> yet here).
> >>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>> bypassing a capability check, but the description claims that to be
> >>>>>> the case.
> >>>>>>
> >>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>> be used to provide additional access control restrictions beyond a
> >>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>> an access denial due to a capability check.
> >>>>>>
> >>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>> would be fine-grained enough at the time.
> >>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>> supplementary controls due to capability-based access controls being a
> >>>>>> poor fit for many modern use cases.
> >>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>
> >>>>>          if (!some_check(...) && !capable(...))
> >>>>>                  return -EPERM;
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> This work is an attempt to tighten the security of production systems
> >>> by allowing to drop too coarse-grained and permissive capabilities
> >>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>> than production use cases are meant to be able to do) and then grant
> >>> specific BPF operations on specific BPF programs/maps based on custom
> >>> LSM security policy, which validates application trustworthiness using
> >>> custom production-specific logic.
> >> There are ways to leverage the LSMs to apply finer grained access
> >> control on top of the relatively coarse capabilities that do not
> >> require circumventing those capability controls.  One grants the
> >> capabilities, just as one would do today, and then leverages the
> >> security functionality of a LSM to further restrict specific users,
> >> applications, etc. with a level of granularity beyond that offered by
> >> the capability controls.
> > Please help me understand something. What you and Casey are proposing,
> > when taken to the logical extreme, is to grant to all processes root
> > permissions and then use LSM to restrict specific actions, do I
> > understand correctly?
>
> No. You grant a process the capabilities it needs (CAP_BPF, CAP_WHATEVER)
> and only those capabilities. If you want additional restrictions you include
> an LSM that implements those restrictions. If you want finer control over
> the operations controlled by CAP_BPF you include an LSM that implements
> those controls.
>

See previous replies. We can't grant CAP_BPF, even if we wanted to, if
the process is in a user namespace.

> >  This strikes me as a less secure and more
> > error-prone way of doing things. If there is some problem with
> > installing LSM policy,
>
> LSMs are not required to have loadable or dynamic policies. That's
> up to the developer.
>

Sure, but having a more dynamic policy is a very attractive feature
and one of the reasons for people to use BPF LSM. So it might not be
required, but it's something that people are using in practice, so if
we can make all this less error-prone, that would be better for
everyone.

> >  it could go unnoticed for a really long time,
> > while the system would be way more vulnerable.
>
> There is no way Paul or I are going to solve the mis-configured system
> problem.
>

Please see my example about a (hypothetical) 21st added hook that is
very easy to miss: the kernel is big and there are tons of people
doing development, so it's no wonder that users might miss a new hook
they are supposed to restrict.

But again, even with all that said, granting CAP_BPF is impossible for
user namespaced applications.

> >  Why do you prefer such
> > an approach instead of going with no extra permissions by default, but
> > allowing custom LSM policy to grant few exceptions for known and
> > trusted use cases?
>
> Because that's not how capabilities work. Capabilities are independent
> of other controls. If you want to propose a change to how capabilities
> work, you need to propose that to the capability maintainer.
>
> Because that's not how LSMs work. LSMs implement additional restrictions
> to the existing policy. The restrictive vs. authoritative debate was closed
> long ago. It's a fundamental property of how LSMs work.

There doesn't seem to be anything fundamentally and technically
preventing LSM hooks from saying "yep, looks good, no need to fall back
to CAP_BPF checks due to lack of other signal". [0] also said that
authoritative hooks can be the next step; it didn't reject them
outright.

  [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
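
To make the two models in this exchange concrete, here is a minimal
userspace sketch in C. It is not kernel code: check_restrictive() and
check_authoritative() are hypothetical names, and the only thing taken
from the thread is the convention that a hook result of <0 means deny,
0 means no opinion, and >0 means allow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Restrictive model (what Paul and Casey describe): the capability is
 * always required, and the hook can only add a denial on top of it.
 * A positive hook result carries no extra meaning here.
 */
static int check_restrictive(bool has_cap, int hook_result)
{
	if (!has_cap)
		return -EPERM;
	if (hook_result < 0)
		return hook_result;
	return 0;
}

/*
 * Authoritative model (what the series proposes): a positive hook
 * result allows the operation even without the capability; a zero
 * result falls back to the capability check.
 */
static int check_authoritative(bool has_cap, int hook_result)
{
	if (hook_result < 0)
		return hook_result;
	if (hook_result > 0)
		return 0;
	return has_cap ? 0 : -EPERM;
}

/* The one input on which the two models disagree. */
static void show_difference(void)
{
	/* No capability, hook says "allow". */
	assert(check_restrictive(false, 1) == -EPERM);
	assert(check_authoritative(false, 1) == 0);
}
```

The two functions agree on every input except "no capability, hook
returns >0", which is exactly the case under dispute in this thread.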


>
> > By the way, even the above proposal of yours doesn't work for
> > production use cases when user namespaces are involved, as far as I
> > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > containers running inside user namespaces, as CAP_BPF in non-init
> > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > BPF program (bpf() doesn't do ns_capable(), it's only using
> > capable()). What solution would you suggest for such production
> > setups?
>
> If user namespaces don't work the way you'd like, you should take that
> up with the namespace maintainers. Or, since this appears to be an issue
> with BPF not being namespace aware, fix BPF's use of capable() and ns_capable().

Can't be fixed on the BPF side, unfortunately. Don't know enough about
namespaces to tell if it's a bug or feature that root CAP_BPF can't be
checked from inside userns. So yep, I should perhaps ask.
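
For readers less familiar with the capable() vs. ns_capable()
distinction, the following is a simplified userspace model in C, not
the kernel implementation. All model_* names are hypothetical, the
capability bit is a stand-in value, and the namespace-owner rule of the
real check is deliberately left out. It only illustrates why a
capability held inside a child user namespace does not satisfy a check
made against the initial namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP_BPF (1u << 0) /* stand-in bit, not the kernel's value */

struct model_user_ns {
	struct model_user_ns *parent; /* NULL for the initial namespace */
	int level;                    /* 0 for the initial namespace */
};

struct model_task {
	struct model_user_ns *user_ns; /* namespace the task's creds live in */
	unsigned int cap_effective;    /* caps held relative to user_ns */
};

static struct model_user_ns model_init_user_ns = { .parent = NULL, .level = 0 };

/* Does the task hold 'cap' with respect to the 'target' namespace? */
static bool model_ns_capable(const struct model_task *t,
			     const struct model_user_ns *target,
			     unsigned int cap)
{
	assert(t != NULL && target != NULL);

	for (const struct model_user_ns *ns = target; ns; ns = ns->parent) {
		/* Reached the task's own namespace: its caps decide. */
		if (ns == t->user_ns)
			return (t->cap_effective & cap) != 0;
		/* Target is not below the task's namespace: no privilege. */
		if (ns->level <= t->user_ns->level)
			return false;
	}
	return false;
}

/* capable() is the same check, always made against the initial namespace. */
static bool model_capable(const struct model_task *t, unsigned int cap)
{
	return model_ns_capable(t, &model_init_user_ns, cap);
}
```

In this model, a task whose credentials live in a child namespace and
hold MODEL_CAP_BPF passes model_ns_capable() for that child namespace
but fails model_capable(), which matches the behavior described above
for bpf() and containers in user namespaces.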

>
> > Also, in previous email you said:
> >
> >> Simply because there is another kernel access control mechanism which
> >> allows a capability check to be skipped doesn't mean I want to allow a
> >> LSM hook to be used to skip a capability check.
> > I understand your stated position, but can you please help me
> > understand the reasoning behind it? What would be wrong with some LSM
> > hooks granting effective capabilities?
>
> You keep asking the question and ignoring the answer. See above.
>
> >  How would that change anything
> > about LSM design? As far as I can see, I'm not doing anything crazy
> > with my LSM hook implementation.
>
> You keep asking the question and ignoring the answer. See above.
>
>
> >  It's reusing the standard
> > call_int_hook() mechanism very straightforwardly with a default result
> > of 0. And then just interprets 0, <0, and >0 results accordingly. Is
> > that abusing the LSM mechanism itself somehow?
> >
> > Does the above also mean that you'd be fine if we just don't plug into
> > the LSM subsystem at all and instead come up with some ad-hoc solution
> > to allow effectively the same policies?
>
> No, because you would be breaking the capability system in that case.
>
> There is an example of a feature that does just what you're suggesting.
> POSIX ACLs aren't an LSM because they don't just add restrictions, they
> change the semantics of the file mode bits. Look at that implementation
> before you seriously consider going that route.

Are you referring to posix_acl_permission() and fs/posix_acl.c? I'll
take a look, not familiar. Thanks for the suggestion!

I'd still prefer to avoid building a new access control system just
for BPF, of course. But let me take a look at the code and see what
you are referring to.

>
> >  This sounds detrimental both
> > to LSM and BPF subsystems, so I hope we can talk this through before
> > finalizing decisions.
> >
> > Lastly, you mentioned before:
> >
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > Unfortunately I don't have enough familiarity with all LSM hooks, so I
> > can't confirm or disprove the above statement. But earlier someone
> > brought to my attention the case of security_vm_enough_memory_mm(),
> > which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> > of memory accounting. Am I missing something subtle there or does it
> > grant effective caps indeed?
> >
> >
> >
> >
> >> --
> >> paul-moore.com

Andrii Nakryiko April 17, 2023, 11:31 p.m. UTC | #19

On Fri, Apr 14, 2023 at 1:24 PM Dr. Greg <greg@enjellic.com> wrote:
>
> On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:
>
> Hi, I hope the week is ending well for everyone.
>
> > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > objects, which are fundamental building blocks of any modern BPF application.
> > > >
> > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > implement LSM policies that could granularly enforce more restrictions on
> > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > cases.
> > >
> > > One of the hallmarks of the LSM has always been that it is
> > > non-authoritative: it cannot unilaterally grant access, it can only
> > > restrict what would have been otherwise permitted on a traditional
> > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > discretionary access controls, e.g. capabilities.
> > >
> > > If there is a problem with the eBPF capability-based access controls,
> > > that problem needs to be addressed in how the core eBPF code
> > > implements its capability checks, not by modifying the LSM mechanism
> > > to bypass these checks.
>
> > I think semantics matter here. I wouldn't view this as _bypassing_
> > capability enforcement: it's just more fine-grained access control.
> >
> > For example, in many places we have things like:
> >
> >       if (!some_check(...) && !capable(...))
> >               return -EPERM;
> >
> > I would expect this is a similar logic. An operation can succeed if the
> > access control requirement is met. The mismatch we have throughout the
> > kernel is that capability checks aren't strictly done by LSM hooks. And
> > this series conceptually, I think, doesn't violate that -- it's changing
> > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > yet here).
> >
> > The reason CAP_BPF was created was because there was nothing else that
> > would be fine-grained enough at the time.
>
> This was one of the issues, among others, that the TSEM LSM we are
> working to upstream, was designed to address and may be an avenue
> forward.
>
> TSEM, being narratival rather than deontologically based, provides a
> framework for security permissions that are based on a
> characterization of the event itself.  So the permissions are as
> variable as the contents of whatever BPF related information is passed
> to the bpf* LSM hooks [1].
>
> Currently, the tsem_bpf_* hooks are generically modeled.  We would
> certainly entertain any discussion or suggestions as to what elements
> of the structures passed to the hooks would be useful with respect
> to establishing security policies useful and appropriate to the BPF
> community.

Could you please provide some links to get a bit more context and
information? I'd like to understand at least the "narratival rather
than deontologically based" part of this.

>
> We don't want to get in the middle of the restrictive
> vs. authoritative debate, but it would seem that the jury is
> conclusively in on that issue and LSM hooks are not going to be
> allowed to dismiss, or modify, any other security controls.
>
> Hopefully the BPF ABI isn't tied to CAP_BPF as that would seem to make
> it problematic to make controls more granular.
>
> > Kees Cook
>
> Have a good weekend.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
>
> [1]: Plus developers don't need to write security policies, you test
> your application in order to get the desired controls for a workload.

Casey Schaufler April 17, 2023, 11:53 p.m. UTC | #20

On 4/17/2023 4:31 PM, Andrii Nakryiko wrote:
> On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
>>>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
>>>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
>>>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
>>>>>>>>>
>>>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
>>>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
>>>>>>>>> implement LSM policies that could granularly enforce more restrictions on
>>>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
>>>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
>>>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
>>>>>>>>> cases.
>>>>>>>> One of the hallmarks of the LSM has always been that it is
>>>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
>>>>>>>> restrict what would have been otherwise permitted on a traditional
>>>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
>>>>>>>> discretionary access controls, e.g. capabilities.
>>>>>>>>
>>>>>>>> If there is a problem with the eBPF capability-based access controls,
>>>>>>>> that problem needs to be addressed in how the core eBPF code
>>>>>>>> implements its capability checks, not by modifying the LSM mechanism
>>>>>>>> to bypass these checks.
>>>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
>>>>>>> capability enforcement: it's just more fine-grained access control.
>>> Exactly. One of the motivations for this work was the need to move
>>> some production use cases that are only needing extra privileges so
>>> that they can use BPF into a more restrictive environment. Granting
>>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
>>> for BPF usage is too coarse grained. These caps would allow those
>>> applications way more than just BPF usage. So the idea here is
>>> finer-grained control of BPF-specific operations, granting *effective*
>>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
>>> production logic that would validate the use case.
>> That's an authoritative model which is in direct conflict with the
>> design and implementation of both capabilities and LSM.
>>
>>> This *is* an attempt to achieve a more secure production approach.
>>>
>>>>>>> For example, in many places we have things like:
>>>>>>>
>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>                 return -EPERM;
>>>>>>>
>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>> access control requirement is met. The mismatch we have throughout the
>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>> yet here).
>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>>>> bypassing a capability check, but the description claims that to be
>>>>>> the case.
>>>>>>
>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>> be used to provide additional access control restrictions beyond a
>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>> an access denial due to a capability check.
>>>>>>
>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>> would be fine-grained enough at the time.
>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>> supplementary controls due to capability-based access controls being a
>>>>>> poor fit for many modern use cases.
>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>
>>>>>          if (!some_check(...) && !capable(...))
>>>>>                  return -EPERM;
>>>> I think we need to make this more concrete; we don't have a pattern in
>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> This work is an attempt to tighten the security of production systems
>>> by allowing to drop too coarse-grained and permissive capabilities
>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>> than production use cases are meant to be able to do)
>> The BPF developers are in complete control of what CAP_BPF controls.
>> You can easily address the granularity issue by adding additional restrictions
>> on processes that have CAP_BPF. That is the intended use of LSM.
>> The whole point of having multiple capabilities is so that you can
>> grant just those that are required by the system security policy, and
>> do so safely. That leads to differences of opinion regarding the definition
>> of the system security policy. BPF chose to set itself up as an element
>> of security policy (you need CAP_BPF) rather than define elements such that
>> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
>> control.
> Please see my reply to Paul, where I explain CAP_BPF's system-wide
> nature and problem with user namespaces. I don't think the problem is
> in the granularity of CAP_BPF, it's more of a "non-namespaceable"
> nature of the BPF subsystem in general.

Paul is approaching this from a different angle. Your response to Paul
does not address the issue I have raised.

>>>  and then grant
>>> specific BPF operations on specific BPF programs/maps based on custom
>>> LSM security policy,
>> This is backwards. The correct implementation is to require CAP_BPF and
>> further restrict BPF operations based on a custom LSM security policy.
>> That's how LSM is designed.
> Please see my reply to Paul, we can't grant real CAP_BPF for
> applications in user namespace (unless there is some trick that I
> don't know, so please do point it out). Let's converge the discussion
> in that email thread branch to not discuss the same topic multiple
> times.

I saw your reply to Paul. Paul's points are not my points. If they were,
I wouldn't have taken my or your time to present them.

>>>  which validates application trustworthiness using
>>> custom production-specific logic.
>>>
>>> Isn't this goal in line with LSM's mission to enhance system security?
>> We're not arguing the goal, we're discussing the implementation.
>>
>>>>> It looks to me like this series can be refactored to do the same. I
>>>>> wouldn't consider that to be a "bypass", but I would agree the current
>>>>> series looks too much like "bypass", and makes reasoning about the
>>>>> effect of the LSM hooks too "special". :)
>>> Sorry, I didn't realize that the current code layout is making things
>>> more confusing. I'll address feedback to make the intent a bit
>>> clearer.
>>>
>>>> --
>>>> paul-moore.com

Andrii Nakryiko April 18, 2023, 12:28 a.m. UTC | #21

On Mon, Apr 17, 2023 at 4:53 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/17/2023 4:31 PM, Andrii Nakryiko wrote:
> > On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> >>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> >>>>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
> >>>>>>>>> subsystem. Specifically, to control the creation of BPF maps and BTF data
> >>>>>>>>> objects, which are fundamental building blocks of any modern BPF application.
> >>>>>>>>>
> >>>>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
> >>>>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> >>>>>>>>> implement LSM policies that could granularly enforce more restrictions on
> >>>>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> >>>>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
> >>>>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> >>>>>>>>> cases.
> >>>>>>>> One of the hallmarks of the LSM has always been that it is
> >>>>>>>> non-authoritative: it cannot unilaterally grant access, it can only
> >>>>>>>> restrict what would have been otherwise permitted on a traditional
> >>>>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
> >>>>>>>> discretionary access controls, e.g. capabilities.
> >>>>>>>>
> >>>>>>>> If there is a problem with the eBPF capability-based access controls,
> >>>>>>>> that problem needs to be addressed in how the core eBPF code
> >>>>>>>> implements its capability checks, not by modifying the LSM mechanism
> >>>>>>>> to bypass these checks.
> >>>>>>> I think semantics matter here. I wouldn't view this as _bypassing_
> >>>>>>> capability enforcement: it's just more fine-grained access control.
> >>> Exactly. One of the motivations for this work was the need to move
> >>> some production use cases that are only needing extra privileges so
> >>> that they can use BPF into a more restrictive environment. Granting
> >>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> >>> for BPF usage is too coarse grained. These caps would allow those
> >>> applications way more than just BPF usage. So the idea here is
> >>> finer-grained control of BPF-specific operations, granting *effective*
> >>> CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> >>> production logic that would validate the use case.
> >> That's an authoritative model which is in direct conflict with the
> >> design and implementation of both capabilities and LSM.
> >>
> >>> This *is* an attempt to achieve a more secure production approach.
> >>>
> >>>>>>> For example, in many places we have things like:
> >>>>>>>
> >>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>                 return -EPERM;
> >>>>>>>
> >>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>> access control requirement is met. The mismatch we have throughout the
> >>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>> yet here).
> >>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>> bypassing a capability check, but the description claims that to be
> >>>>>> the case.
> >>>>>>
> >>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>> be used to provide additional access control restrictions beyond a
> >>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>> an access denial due to a capability check.
> >>>>>>
> >>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>> would be fine-grained enough at the time.
> >>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>> supplementary controls due to capability-based access controls being a
> >>>>>> poor fit for many modern use cases.
> >>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>
> >>>>>          if (!some_check(...) && !capable(...))
> >>>>>                  return -EPERM;
> >>>> I think we need to make this more concrete; we don't have a pattern in
> >>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> This work is an attempt to tighten the security of production systems
> >>> by allowing to drop too coarse-grained and permissive capabilities
> >>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>> than production use cases are meant to be able to do)
> >> The BPF developers are in complete control of what CAP_BPF controls.
> >> You can easily address the granularity issue by adding additional restrictions
> >> on processes that have CAP_BPF. That is the intended use of LSM.
> >> The whole point of having multiple capabilities is so that you can
> >> grant just those that are required by the system security policy, and
> >> do so safely. That leads to differences of opinion regarding the definition
> >> of the system security policy. BPF chose to set itself up as an element
> >> of security policy (you need CAP_BPF) rather than define elements such that
> >> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
> >> control.
> > Please see my reply to Paul, where I explain CAP_BPF's system-wide
> > nature and problem with user namespaces. I don't think the problem is
> > in the granularity of CAP_BPF, it's more of a "non-namespaceable"
> > nature of the BPF subsystem in general.
>
> Paul is approaching this from a different angle. Your response to Paul
> does not address the issue I have raised.

I see, I definitely missed this. Re-reading your reply, I'm still not
clear on what you are proposing, tbh. Can you please elaborate on what
you have in mind?

>
> >>>  and then grant
> >>> specific BPF operations on specific BPF programs/maps based on custom
> >>> LSM security policy,
> >> This is backwards. The correct implementation is to require CAP_BPF and
> >> further restrict BPF operations based on a custom LSM security policy.
> >> That's how LSM is designed.
> > Please see my reply to Paul, we can't grant real CAP_BPF for
> > applications in user namespace (unless there is some trick that I
> > don't know, so please do point it out). Let's converge the discussion
> > in that email thread branch to not discuss the same topic multiple
> > times.
>
> I saw your reply to Paul. Paul's points are not my points. If they were,
> I wouldn't have taken my or your time to present them.

Sure, sorry about that. What do you have in mind then?

>
> >>>  which validates application trustworthiness using
> >>> custom production-specific logic.
> >>>
> >>> Isn't this goal in line with LSM's mission to enhance system security?
> >> We're not arguing the goal, we're discussing the implementation.
> >>
> >>>>> It looks to me like this series can be refactored to do the same. I
> >>>>> wouldn't consider that to be a "bypass", but I would agree the current
> >>>>> series looks too much like "bypass", and makes reasoning about the
> >>>>> effect of the LSM hooks too "special". :)
> >>> Sorry, I didn't realize that the current code layout is making things
> >>> more confusing. I'll address feedback to make the intent a bit
> >>> clearer.
> >>>
> >>>> --
> >>>> paul-moore.com

Casey Schaufler April 18, 2023, 12:47 a.m. UTC | #22

On 4/17/2023 4:29 PM, Andrii Nakryiko wrote:
> On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
>> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
>> <andrii.nakryiko@gmail.com> wrote:
>>> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
>>>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
>>>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>>>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
>>>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>>>> ...
>>>>
>>>>>>>>> For example, in many places we have things like:
>>>>>>>>>
>>>>>>>>>         if (!some_check(...) && !capable(...))
>>>>>>>>>                 return -EPERM;
>>>>>>>>>
>>>>>>>>> I would expect this is a similar logic. An operation can succeed if the
>>>>>>>>> access control requirement is met. The mismatch we have throughout the
>>>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
>>>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
>>>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
>>>>>>>>> yet here).
>>>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
>>>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
>>>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
>>>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
>>>>>>>> bypassing a capability check, but the description claims that to be
>>>>>>>> the case.
>>>>>>>>
>>>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
>>>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
>>>>>>>> be used to provide additional access control restrictions beyond a
>>>>>>>> capability check, but a LSM hook should never be allowed to overrule
>>>>>>>> an access denial due to a capability check.
>>>>>>>>
>>>>>>>>> The reason CAP_BPF was created was because there was nothing else that
>>>>>>>>> would be fine-grained enough at the time.
>>>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
>>>>>>>> argument that one of the reasons LSMs exist is to provide
>>>>>>>> supplementary controls due to capability-based access controls being a
>>>>>>>> poor fit for many modern use cases.
>>>>>>> I generally agree with what you say, but we DO have this code pattern:
>>>>>>>
>>>>>>>          if (!some_check(...) && !capable(...))
>>>>>>>                  return -EPERM;
>>>>>> I think we need to make this more concrete; we don't have a pattern in
>>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>>>>> Simply because there is another kernel access control mechanism which
>>>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>>>> LSM hook to be used to skip a capability check.
>>>>> This work is an attempt to tighten the security of production systems
>>>>> by allowing to drop too coarse-grained and permissive capabilities
>>>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
>>>>> than production use cases are meant to be able to do) and then grant
>>>>> specific BPF operations on specific BPF programs/maps based on custom
>>>>> LSM security policy, which validates application trustworthiness using
>>>>> custom production-specific logic.
>>>> There are ways to leverage the LSMs to apply finer grained access
>>>> control on top of the relatively coarse capabilities that do not
>>>> require circumventing those capability controls.  One grants the
>>>> capabilities, just as one would do today, and then leverages the
>>>> security functionality of a LSM to further restrict specific users,
>>>> applications, etc. with a level of granularity beyond that offered by
>>>> the capability controls.
>>> Please help me understand something. What you and Casey are proposing,
>>> when taken to the logical extreme, is to grant to all processes root
>>> permissions and then use LSM to restrict specific actions, do I
>>> understand correctly? This strikes me as a less secure and more
>>> error-prone way of doing things.
>> When taken to the "logical extreme" most concepts end up sounding a
>> bit absurd, but that was the point, wasn't it?
> Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> the sake of example, let's say CAP_BPF allows 20 different operations
> (each with its own security_xxx hook). And let's say in production I
> want to only allow 3 of them. Sure, technically it should be possible
> to deny access at 17 hooks and let it through in just those 3. But if
> someone adds 21st and I forget to add 21st restriction, that would be
> bad (but very probable with such an approach).

That would be a flaw in the implementation of the 21st, not a problem
with the capabilities or LSM model. For the LSM model to be sufficiently
flexible it cannot be required to prevent or detect coding errors.

> So my point is that for situations like this, dropping CAP_BPF, but
> allowing only 3 hooks to proceed seems a safer approach, because if we
> add 21st hook, it will safely be denied without CAP_BPF *by default*.
> That's what I tried to point out.

When you're creating security relevant or enforcing mechanisms there has
to be a level of expectation regarding the care with which they're
developed. My expectation is that the 21st hook won't go in without
adequate review.

> But even if we ignore this "safe by default when a new hook is added"
> behavior, when taking user namespaces into account, the restrictive
> LSM approach just doesn't seem to work at all for something like
> CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> because we cannot ensure that a given BPF program won't access kernel
> state "belonging" to another process (as one example).

Time namespaces have been proposed. I would be surprised if there aren't
people working on BPF namespaces somewhere. There's a difference between
"can't" and "haven't been".

> Now, thanks to Jonathan, I get that there was a heated discussion 20
> years ago about authoritative vs restrictive LSMs. But if I read a
> summary at that time ([0]), authoritative hooks were not out of the
> question *in principle*. Surely, "walk before we can run" makes sense,
> but it's been a while ago.

Certainly. The SGI comment was mine, by the way. I wanted authoritative
hooks for cases like POSIX ACLs and systems without root. While I would
have liked the decision to go the other way, there's no way I would endorse
a hybrid, where some hooks are restrictive and others authoritative.

>   [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
>
>
>> Here is a fun story which seems relevant ... in the early days of
>> SELinux, one of the community devs set up a system with a SELinux
>> policy which restricted all privileged operations from the root user,
>> put the system on a publicly accessible network, posted the root
>> password for all to see, and invited the public to login to the system
>> and attempt to exercise root privilege (it's been well over 10 years
>> at this point so the details are a bit fuzzy).  Granted, there were
>> some hiccups in the beginning, mostly due to the crude state of policy
>> development/analysis at the time, but after a few policy revisions the
>> system held up quite well.
> Honest question out of curiosity: was the intent to demonstrate that
> with LSM one can completely restrict root? Or that root was actually
> allowed to do something useful? Because I can see how rejecting
> everything would be rather simple, but actually pretty useless in
> practice. Restricting only part of the power of the root, while still
> allowing it to do something useful in production seems like a much
> harder (but way more valuable) endeavor. Not saying it's impossible,
> but see my example about missing 21st new CAP_BPF functionality.

Capabilities are sufficient to implement a rootless system. It's been done.
Someone will point out that CAP_SYS_ADMIN is effectively root, and there's
some truth to that.

>> On the more practical side of things, there are several use cases
>> which require, by way of legal or contractual requirements, that full
>> root/admin privileges are decomposed into separate roles: security
>> admin, audit admin, backup admin, etc.  These users satisfy these
>> requirements by using LSMs, such as SELinux, to restrict the
>> administrative capabilities based on the SELinux user/role/domain.
>>
>>> By the way, even the above proposal of yours doesn't work for
>>> production use cases when user namespaces are involved, as far as I
>>> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
>>> containers running inside user namespaces, as CAP_BPF in non-init
>>> namespace is not enough for bpf() syscall to allow loading BPF maps or
>>> BPF program ...
>> Once again, the LSM has always intended to be a restrictive mechanism,
>> not a privilege granting mechanism.  If an operation is not possible
> Not according to [0] above:
>
>   > It is our belief that these changes do not belong in the initial version of
>   > LSM (especially given our limited charter and original goals), and should
>   > be proposed as incremental refinements after LSM has been initially
>   > accepted.
>   > ...
>   > It is our belief that the current LSM
>   > will provide a meaningful improvement in the security infrastructure of the
>   > Linux kernel, and that there is plenty of room for future expansion of LSM
>   > in subsequent phases.
>
> I don't see "always intended to be a restrictive mechanism" there.

Having been on the other side of the argument, the system that was accepted
was in fact "always intended to be a restrictive mechanism". The quote above
is a "never say never" statement.

>> without the LSM layer enabled, it should not be possible with the LSM
>> layer enabled.  The LSM is not a mechanism to circumvent other access
>> control mechanisms in the kernel.
> I understand, but it's not like we are proposing to go and bypass all
> kinds of random kernel security mechanisms. These are targeted hooks,
> developed by the BPF community for the BPF subsystem to allow trusted
> unprivileged production use cases. Yes, we currently rely on checking
> CAP_BPF to grant more dangerous/advanced features, but it's because we
> can't just allow any unprivileged process to do this. But what we
> really want is to answer the question "can we trust this process to
> use this advanced functionality", and if there is no specific LSM
> policy that cares one way (allow) or the other (disallow), fallback to
> CAP_BPF enforcement.
>
> So it's not bypassing kernel checks, but rather augmenting them with
> more flexible and customizable mechanisms, while still falling back to
> CAP_BPF if the user didn't install any custom LSM policy.

That would make CAP_BPF behave differently from all other capabilities.
Capabilities are hard enough to use correctly as it is. If each capability
defined its own semantics they would be completely unusable. 

>>> Also, in previous email you said:
>>>
>>>> Simply because there is another kernel access control mechanism which
>>>> allows a capability check to be skipped doesn't mean I want to allow a
>>>> LSM hook to be used to skip a capability check.
>>> I understand your stated position, but can you please help me
>>> understand the reasoning behind it?
>> Keeping the LSM as a restrictive access control mechanism helps ensure
>> some level of sanity and consistency across different Linux
>> installations.  If a certain operation requires CAP_SYS_ADMIN on one
>> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
>> Granted, a LSM running on one system might impose additional
>> constraints on that operation, but the CAP_SYS_ADMIN requirement still
>> applies.
>>
>> There is also an issue of safety in knowing that enabling a LSM will
>> not degrade the access controls on a system by potentially granting
>> operations that were previously denied.
>>
>>> Does the above also mean that you'd be fine if we just don't plug into
>>> the LSM subsystem at all and instead come up with some ad-hoc solution
>>> to allow effectively the same policies? This sounds detrimental both
>>> to LSM and BPF subsystems, so I hope we can talk this through before
>>> finalizing decisions.
>> Based on your patches and our discussion, it seems to me that the
>> problem you are trying to resolve is related more to the
>> capability-based access controls in the eBPF, and possibly other
>> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
>> work with you on a solution involving the LSM, but please understand
>> that I'm not going to support a solution which changes a core
>> philosophy of the LSM layer.
> Great, I'd really appreciate help and suggestions on how to solve the
> following problem.
>
> We have a BPF subsystem that allows loading BPF programs. Those BPF
> programs cannot be contained within a particular namespace just by its
> system-wide tracing nature (it can safely read kernel and user memory
> and we can't restrict whether that memory belongs to a particular
> namespace), so it's like CAP_SYS_TIME, just with much broader API
> surface.

This doesn't sound like a problem, it sounds like BPF is explicitly
designed to prevent interference by namespaces. But in some cases you
now want to limit it by namespaces.

It appears that the desired uses of BPF are no longer compatible with
its original security model. That's unfortunate, and likely to require
a significant change to the implementation of BPF.

>
> The other piece of a puzzle is user namespaces. We do want to run
> applications inside user namespaces, but allow them to use BPF
> programs. As far as I can tell, there is no way to grant real CAP_BPF
> that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> about system-wide nature of BPF). If there is, please help me
> understand how. All my local experiments failed, and looking at
> cap_capable() implementation it is not intended to even check the
> initial namespace's capability if the process is running in the user
> namespace.
>
>
> So, given that a) we can't make CAP_BPF namespace-aware and b) we
> can't grant real CAP_BPF to processes in user namespace, how could we
> allow user namespaced applications to do useful work with BPF?
>
>>> Lastly, you mentioned before:
>>>
>>>>>> I think we need to make this more concrete; we don't have a pattern in
>>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
>>> Unfortunately I don't have enough familiarity with all LSM hooks, so I
>>> can't confirm or disprove the above statement. But earlier someone
>>> brought to my attention the case of security_vm_enough_memory_mm(),
>>> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
>>> of memory accounting. Am I missing something subtle there or does it
>>> grant effective caps indeed?
>> Some of the comments around that hook can be misleading, but if you
>> look at the actual code it starts to make more sense.
>>
> [...]
>
>> I do agree that the security_vm_enough_memory() hook is structured a
>> bit differently than most of the other LSM hooks, but it still
>> operates with the same philosophy: a LSM should only be allowed to
>> restrict access, a LSM should never be allowed to grant access that
>> would otherwise be denied by the traditional Linux access controls.
>>
>> Hopefully that explanation makes sense, but if things are still a bit
>> fuzzy I would encourage you to go look at the code, I'm sure it will
>> make sense once you spend a few minutes figuring out how it works.
>>
> Yep, thanks a lot, it's way more clear after grokking relevant pieces
> of the LSM code you pointed out and the LSM infrastructure in general.
> "capabilities" LSM is non-negotiable, so it effectively always
> restricts a small subset of hooks, including vm_enough_memory and
> capable.
>
> Still, the problem stands. How do we marry BPF and user
> namespaces? I'd really appreciate suggestions. Thank you!
>
>
>> [1] There is a long and sorta bizarre history with the capability LSM,
>> but just understand it is a bit "special" in many ways, and those
>> "special" behaviors are intentional.
>>
>> --
>> paul-moore.com
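
Illustrative sketch (not part of the original thread): the "21st hook"
exchange above can be modeled in a few lines of C. Everything here is
hypothetical and exists only for illustration: the operation numbering,
the 20/3 split taken from the example in the mail, and the function
names. It is a toy model of the two policy styles being compared, not
kernel code, and it does not settle whether a missed 21st restriction is
a model problem or an implementation flaw, which is the point in dispute.

```c
#include <assert.h>
#include <stdbool.h>

#define N_KNOWN_OPS 20 /* operations the policy author knew about (hypothetical) */

/* Operations are numbered from 1; ops 1..3 are the ones production needs. */
static bool op_is_needed(int op)
{
	return op >= 1 && op <= 3;
}

/*
 * Restrictive style: the process holds CAP_BPF and the policy lists the
 * operations to deny (4..20). Anything the policy does not mention is
 * let through by the capability.
 */
static bool allowed_restrictive(int op)
{
	assert(op >= 1);
	bool denied_by_policy = !op_is_needed(op) && op <= N_KNOWN_OPS;
	return !denied_by_policy;
}

/*
 * Authoritative style: the process has no CAP_BPF and the policy lists
 * the operations to allow (1..3). Anything the policy does not mention
 * falls back to the failing capability check.
 */
static bool allowed_authoritative(int op)
{
	assert(op >= 1);
	return op_is_needed(op);
}
```

With these definitions a newly added operation 21 is permitted under the
restrictive style until the policy is updated, and refused under the
authoritative style; operations 1..20 are treated identically by both.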

Casey Schaufler April 18, 2023, 12:52 a.m. UTC | #23

On 4/17/2023 5:28 PM, Andrii Nakryiko wrote:
> On Mon, Apr 17, 2023 at 4:53 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> ...
>>
>> The BPF developers are in complete control of what CAP_BPF controls.
>> You can easily address the granularity issue by adding additional restrictions
>> on processes that have CAP_BPF. That is the intended use of LSM.
>> The whole point of having multiple capabilities is so that you can
>> grant just those that are required by the system security policy, and
>> do so safely. That leads to differences of opinion regarding the definition
>> of the system security policy. BPF chose to set itself up as an element
>> of security policy (you need CAP_BPF) rather than define elements such that
>> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
>> control.
>>> Please see my reply to Paul, where I explain CAP_BPF's system-wide
>>> nature and problem with user namespaces. I don't think the problem is
>>> in the granularity of CAP_BPF, it's more of a "non-namespaceable"
>>> nature of the BPF subsystem in general.
>> Paul is approaching this from a different angle. Your response to Paul
>> does not address the issue I have raised.
> I see, I definitely missed this. Re-reading your reply, I still am not
> clear on what you are proposing, tbh. Can you please elaborate what
> you have in mind?

As requested, I've moved over to the "other" thread.
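
Illustrative sketch (not part of the original thread): the user
namespace problem referred to in this subthread, where capable(CAP_BPF)
fails for a task inside a non-initial user namespace, can be modeled
roughly as below. The structures and the function name are hypothetical,
heavily reduced stand-ins, and the loop is a simplification based on one
reading of the kernel's cap_capable(); treat it as an assumption to be
checked against the actual source rather than as a reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a user namespace. */
struct user_ns {
	struct user_ns *parent; /* NULL for the initial namespace */
	int level;              /* 0 for the initial namespace */
	int owner_uid;          /* uid that created the namespace */
};

/* Hypothetical, reduced stand-in for task credentials. */
struct cred {
	struct user_ns *user_ns; /* namespace the task lives in */
	int euid;
	bool cap_bpf;            /* CAP_BPF raised in the task's own namespace */
};

/*
 * Does the task hold CAP_BPF with respect to the target namespace?
 * A plain capable() check corresponds to target == initial namespace.
 */
static int model_ns_capable(const struct cred *cred, struct user_ns *target)
{
	struct user_ns *ns = target;

	assert(cred && cred->user_ns && target);
	for (;;) {
		/* Same namespace: the task's own capability set decides. */
		if (ns == cred->user_ns)
			return cred->cap_bpf ? 0 : -EPERM;
		/* Target is not below the task's namespace: no privilege. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
			return 0;
		ns = ns->parent;
	}
}
```

In this model a task with CAP_BPF inside a child namespace passes the
check against its own namespace but gets -EPERM against the initial
namespace, which matches the behavior described in the quoted mail.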

Paul Moore April 18, 2023, 2:21 p.m. UTC | #24

On Mon, Apr 17, 2023 at 7:29 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > ...
> > > >
> > > > > > > > > For example, in many places we have things like:
> > > > > > > > >
> > > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > > >                 return -EPERM;
> > > > > > > > >
> > > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > > > access control requirement is met. The mismatch we have throughout the
> > > > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > > > yet here).
> > > > > > > >
> > > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > > the case.
> > > > > > > >
> > > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > > an access denial due to a capability check.
> > > > > > > >
> > > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > > would be fine-grained enough at the time.
> > > > > > > >
> > > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > > poor fit for many modern use cases.
> > > > > > >
> > > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > > >
> > > > > > >          if (!some_check(...) && !capable(...))
> > > > > > >                  return -EPERM;
> > > > > >
> > > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > > Simply because there is another kernel access control mechanism which
> > > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > > LSM hook to be used to skip a capability check.
> > > > >
> > > > > This work is an attempt to tighten the security of production systems
> > > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > > than production use cases are meant to be able to do) and then grant
> > > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > > LSM security policy, which validates application trustworthiness using
> > > > > custom production-specific logic.
> > > >
> > > > There are ways to leverage the LSMs to apply finer grained access
> > > > control on top of the relatively coarse capabilities that do not
> > > > require circumventing those capability controls.  One grants the
> > > > capabilities, just as one would do today, and then leverages the
> > > > security functionality of a LSM to further restrict specific users,
> > > > applications, etc. with a level of granularity beyond that offered by
> > > > the capability controls.
> > >
> > > Please help me understand something. What you and Casey are proposing,
> > > when taken to the logical extreme, is to grant to all processes root
> > > permissions and then use LSM to restrict specific actions, do I
> > > understand correctly? This strikes me as a less secure and more
> > > error-prone way of doing things.
> >
> > When taken to the "logical extreme" most concepts end up sounding a
> > bit absurd, but that was the point, wasn't it?
>
> Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> the sake of example, let's say CAP_BPF allows 20 different operations
> (each with its own security_xxx hook). And let's say in production I
> want to only allow 3 of them. Sure, technically it should be possible
> to deny access at 17 hooks and let it through in just those 3. But if
> someone adds 21st and I forget to add 21st restriction, that would be
> bad (but very probable with such an approach).

Welcome to the challenges of maintaining access controls within the
Linux Kernel, LSM or otherwise.  As we all know, the Linux Kernel
moves forward at a staggering pace sometimes, and it is not uncommon
for new features/subsystems to be added without consulting all of the
different folks who worry about access controls.  In many cases it can
be a simple misunderstanding, but in some cases it's a willful
rejection of a particular form of access control, the LSM being a
prime example.  Thankfully in almost all of those cases we have been
moderately successful in retrofitting the necessary access controls,
sometimes they are not as good/capable/granular/etc. as we would like
because of design limitations, but such is life.

I say this not because I believe this is a valid argument for
authoritative LSM hooks, I say this simply to acknowledge that this
*is* a problem.

> So my point is that for situations like this, dropping CAP_BPF, but
> allowing only 3 hooks to proceed seems a safer approach, because if we
> add 21st hook, it will safely be denied without CAP_BPF *by default*.
> That's what I tried to point out.

I believe I understand your point, I just disagree with you on
accepting authoritative LSM hooks in the upstream Linux Kernel; I
believe it would be a *big* mistake to move away from the restrictive
LSM hook philosophy at this point in time.

> But even if we ignore this "safe by default when a new hook is added"
> behavior, when taking user namespaces into account, the restrictive
> LSM approach just doesn't seem to work at all for something like
> CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> because we cannot ensure that a given BPF program won't access kernel
> state "belonging" to another process (as one example).

Once again, the root of this problem lies in the capabilities and/or
namespace mechanisms, not the LSM; if you want to fix this properly
you should be looking at how eBPF leverages capabilities for access
control.  Changing the very core behavior of the LSM layer in order to
work around an issue with another access control mechanism is a
non-starter.  I can't say this enough.

> Now, thanks to Jonathan, I get that there was a heated discussion 20
> years ago about authoritative vs restrictive LSMs. But if I read a
> summary at that time ([0]), authoritative hooks were not out of the
> question *in principle*. Surely, "walk before we can run" makes sense,
> but it's been a while ago.

... and once again, the restrictive approach has proven to work
reasonably well over the past ~20 years, why would we abandon that
simply to work around a problem with a different access control
mechanism?  Don't break the LSM layer to fix something else.

> > Here is a fun story which seems relevant ... in the early days of
> > SELinux, one of the community devs set up a system with a SELinux
> > policy which restricted all privileged operations from the root user,
> > put the system on a publicly accessible network, posted the root
> > password for all to see, and invited the public to login to the system
> > and attempt to exercise root privilege (it's been well over 10 years
> > at this point so the details are a bit fuzzy).  Granted, there were
> > some hiccups in the beginning, mostly due to the crude state of policy
> > development/analysis at the time, but after a few policy revisions the
> > system held up quite well.
>
> Honest question out of curiosity: was the intent to demonstrate that
> with LSM one can completely restrict root? Or that root was actually
> allowed to do something useful?

The intent was to show that it is possible to restrict
capability-based access controls with the LSM layer; it was the best
example of the "logical extreme" carried out in the real world that I
could think of when writing my response.

> > On the more practical side of things, there are several use cases
> > which require, by way of legal or contractual requirements, that full
> > root/admin privileges are decomposed into separate roles: security
> > admin, audit admin, backup admin, etc.  These users satisfy these
> > requirements by using LSMs, such as SELinux, to restrict the
> > administrative capabilities based on the SELinux user/role/domain.
> >
> > > By the way, even the above proposal of yours doesn't work for
> > > production use cases when user namespaces are involved, as far as I
> > > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > > containers running inside user namespaces, as CAP_BPF in non-init
> > > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > > BPF program ...
> >
> > Once again, the LSM has always intended to be a restrictive mechanism,
> > not a privilege granting mechanism.  If an operation is not possible
>
> Not according to [0] above:

When one considers what has been present in Linus' tree, then yes.
The idea of authoritative LSM hooks has been rejected for ~20 years
and I've seen nothing in this thread to make me believe that we should
change that now, and for this use case.

> > Based on your patches and our discussion, it seems to me that the
> > problem you are trying to resolve is related more to the
> > capability-based access controls in the eBPF, and possibly other
> > kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> > work with you on a solution involving the LSM, but please understand
> > that I'm not going to support a solution which changes a core
> > philosophy of the LSM layer.
>
> Great, I'd really appreciate help and suggestions on how to solve the
> following problem.
>
> We have a BPF subsystem that allows loading BPF programs. Those BPF
> programs cannot be contained within a particular namespace just by its
> system-wide tracing nature (it can safely read kernel and user memory
> and we can't restrict whether that memory belongs to a particular
> namespace), so it's like CAP_SYS_TIME, just with much broader API
> surface.
>
> The other piece of a puzzle is user namespaces. We do want to run
> applications inside user namespaces, but allow them to use BPF
> programs. As far as I can tell, there is no way to grant real CAP_BPF
> that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> about system-wide nature of BPF). If there is, please help me
> understand how. All my local experiments failed, and looking at
> cap_capable() implementation it is not intended to even check the
> initial namespace's capability if the process is running in the user
> namespace.
>
> So, given that a) we can't make CAP_BPF namespace-aware and b) we
> can't grant real CAP_BPF to processes in user namespace, how could we
> allow user namespaced applications to do useful work with BPF?

I would start by talking with the user namespace folks.  I may be
misunderstanding the problem as you've described it, but it seems like
the core issue is how capabilities, specifically CAP_BPF, are handled
in user namespaces.  To be honest, I'm not sure how much luck you'll
have there, but you stand a better chance in changing how capabilities
are handled across user namespaces than you do in getting an
authoritative LSM hook merged.

Regardless, my offer still stands, if you have a solution which sticks
to a restrictive LSM model, I'm happy to work with you further to sort
out the details and try to make that work.  I don't have any great
ideas there at the moment, but there are plenty of smart people on
this mailing list and others who might have something clever in mind.
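
Illustrative sketch (not part of the original thread): the difference
between the restrictive model defended above and the behavior attributed
to patch 04/08 (a positive hook return value bypassing the capability
check) can be written as two small decision functions. The enum, the
function names, and the choice of -EACCES for a policy denial are
hypothetical; this is a toy model of the two orderings discussed in the
thread, not the code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LSM hook verdict; not a kernel type. */
enum hook_verdict {
	HOOK_DENY = -1,      /* policy rejects the operation */
	HOOK_NO_OPINION = 0, /* policy has nothing to say */
	HOOK_ALLOW = 1,      /* policy vouches for the caller */
};

/*
 * Restrictive composition: the capability check is final when it fails;
 * the hook can only turn an "allow" into a "deny".
 */
static int map_create_restrictive(bool has_cap_bpf, enum hook_verdict v)
{
	assert(v >= HOOK_DENY && v <= HOOK_ALLOW);
	if (!has_cap_bpf)
		return -EPERM;
	if (v == HOOK_DENY)
		return -EACCES;
	return 0;
}

/*
 * Authoritative composition: a positive verdict skips the capability
 * check, a negative one denies, and "no opinion" falls back to CAP_BPF.
 */
static int map_create_authoritative(bool has_cap_bpf, enum hook_verdict v)
{
	assert(v >= HOOK_DENY && v <= HOOK_ALLOW);
	if (v == HOOK_DENY)
		return -EACCES;
	if (v == HOOK_ALLOW)
		return 0;
	return has_cap_bpf ? 0 : -EPERM;
}
```

The two functions agree on every input except one: a caller without
CAP_BPF and a hook returning HOOK_ALLOW gets -EPERM in the restrictive
version and 0 in the authoritative one. That single case is what the
disagreement in this thread is about.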

Dr. Greg April 19, 2023, 10:53 a.m. UTC | #25

On Mon, Apr 17, 2023 at 04:31:31PM -0700, Andrii Nakryiko wrote:

Hi, I hope the week is going well for everyone.

> On Fri, Apr 14, 2023 at 1:24 PM Dr. Greg <greg@enjellic.com> wrote:
> >
> > On Wed, Apr 12, 2023 at 10:47:13AM -0700, Kees Cook wrote:
> >
> > Hi, I hope the week is ending well for everyone.
> >
> > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> > > > > are meant to allow highly-granular LSM-based control over the usage of BPF
> > > > > subsystem. Specifically, to control the creation of BPF maps and BTF data
> > > > > objects, which are fundamental building blocks of any modern BPF application.
> > > > >
> > > > > These new hooks are able to override default kernel-side CAP_BPF-based (and
> > > > > sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> > > > > implement LSM policies that could granularly enforce more restrictions on
> > > > > a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> > > > > capabilities), but also, importantly, allow to *bypass kernel-side
> > > > > enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> > > > > cases.
> > > >
> > > > One of the hallmarks of the LSM has always been that it is
> > > > non-authoritative: it cannot unilaterally grant access, it can only
> > > > restrict what would have been otherwise permitted on a traditional
> > > > Linux system.  Put another way, a LSM should not undermine the Linux
> > > > discretionary access controls, e.g. capabilities.
> > > >
> > > > If there is a problem with the eBPF capability-based access controls,
> > > > that problem needs to be addressed in how the core eBPF code
> > > > implements its capability checks, not by modifying the LSM mechanism
> > > > to bypass these checks.
> >
> > > I think semantics matter here. I wouldn't view this as _bypassing_
> > > capability enforcement: it's just more fine-grained access control.
> > >
> > > For example, in many places we have things like:
> > >
> > >       if (!some_check(...) && !capable(...))
> > >               return -EPERM;
> > >
> > > I would expect this is a similar logic. An operation can succeed if the
> > > access control requirement is met. The mismatch we have through-out the
> > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > this series conceptually, I think, doesn't violate that -- it's changing
> > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > yet here).
> > >
> > > The reason CAP_BPF was created was because there was nothing else that
> > > would be fine-grained enough at the time.

> > This was one of the issues, among others, that the TSEM LSM we are
> > working to upstream, was designed to address and may be an avenue
> > forward.
> >
> > TSEM, being narratival rather than deontologically based, provides a
> > framework for security permissions that are based on a
> > characterization of the event itself.  So the permissions are as
> > variable as the contents of whatever BPF related information is passed
> > to the bpf* LSM hooks [1].
> >
> > Currently, the tsem_bpf_* hooks are generically modeled.  We would
> > certainly entertain any discussion or suggestions as to what elements
> > of the structures passed to the hooks would be useful with respect
> > to establishing security policies useful and appropriate to the BPF
> > community.

> Could you please provide some links to get a bit more context and
> information? I'd like to understand at least "narratival rather than
> deontologically based" part of this.

We don't have much in the way of links, hopefully some simple prose
will be helpful.

'Narratival vs deontological' contrasts the logic philosophy that is
being used in the design of a security architecture.

Deontological implies that the security architecture is 'rules' based.
A concept embraced by the classic mandatory access control
architectures such as SELinux.

Narratival, the logic predicate embraced by TSEM, implies that the
security architecture is events based and is constructed from a
narration of a known good workload by unit testing.

At the risk of indulging in further philosophical wonkiness, the two
bodies of logic arise from the contrasting philosophies espoused by
Immanuel Kant and Georg Wilhelm Friedrich Hegel.  It is somewhat less
precise, but a security architecture that is rules based would be
considered 'Kantian' motivated while an events based architecture
would be considered 'Hegelian' inspired.

So, departing from epistemology, what does all of this mean with
respect to security?

In a policy based architecture, the security decision is a product of
the rules, in the case of SELinux a rather complex corpus, that have
been established to regulate the interaction of a role, subject and
object label.

In an events based architecture, the security decision is a product of
the characteristics of the event.  From a granularity perspective,
which seems to be an issue in this BPF/BTF discussion, the granularity
of the security decision can be as variable as any of the characteristics
that are used to describe the LSM event at the operating system level.

In TSEM, the characteristics of the event are used to generate a unique
numeric coefficient specific to the event.  The TSEM documentation
discusses the functional generation of these coefficients.
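
As a rough illustration of the idea, and not of TSEM's actual
coefficient function (the TSEM documentation is the authority there),
an event-based model can be sketched in userspace C.  The struct
layout, the choice of FNV-1a as the hash, and all function names below
are assumptions made purely for the example: an event is reduced to a
number, a "unit test" run records the numbers seen, and enforcement
permits only events whose number was recorded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a BPF-related LSM event. */
struct bpf_event {
	uint32_t cmd;       /* bpf command */
	uint32_t map_type;  /* one attribute taken from bpf_attributes */
	uint32_t fmode;     /* access mode */
};

/* Fold one 32-bit field into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a_u32(uint64_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
	return h;
}

/* Stand-in "coefficient": a hash over the event's characteristics. */
static uint64_t event_coefficient(const struct bpf_event *ev)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */

	h = fnv1a_u32(h, ev->cmd);
	h = fnv1a_u32(h, ev->map_type);
	h = fnv1a_u32(h, ev->fmode);
	return h;
}

#define MODEL_MAX 64

/* The "security model": the set of coefficients seen during testing. */
struct model {
	uint64_t coeff[MODEL_MAX];
	size_t n;
};

/* Enforcement: an event is permitted only if its coefficient is known. */
static bool model_permits(const struct model *m, const struct bpf_event *ev)
{
	uint64_t c = event_coefficient(ev);

	for (size_t i = 0; i < m->n; i++)
		if (m->coeff[i] == c)
			return true;
	return false;
}

/* Training: record the coefficient of an event from a known good run. */
static bool model_learn(struct model *m, const struct bpf_event *ev)
{
	if (model_permits(m, ev))
		return true;
	if (m->n == MODEL_MAX)
		return false;
	m->coeff[m->n++] = event_coefficient(ev);
	return true;
}
```

Granularity here is whatever goes into the hash: adding another field
of the event to event_coefficient() makes the resulting model
distinguish on that field as well.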

In the case of the three bpf LSM hooks that are in 6.5, this would be
any of the characteristics embodied in the following variables.

bpf command
bpf_attributes
bpf_map
fmode_t
bpf_prog

With respect to your problem at hand, Paul Moore suggested elsewhere
in this thread that there were smart people hanging around on the list
that might be able to comment on the challenge of CAP_BPF lacking
granularity and being unavailable in a user namespace.

I can't claim to be very smart, but I did hook up the big screen TV
at our lake place in west-central Minnesota and it worked the first
time, so here goes some thoughts.

I can't claim a great deal of experience with BPF, but I'm assuming
that any of the characteristics above, or that would be passed to the
proposed BPF LSM hooks, would embody sufficient information about a
BPF program to fully characterize it from a security perspective.

I'm also assuming that the BPF implementation in the Linux kernel is
now sufficiently featureful for a BPF program to assist in making a
security decision by analyzing any of the attributes passed to an LSM
hook for a subsequent and subordinate BPF program.

We currently don't have support in TSEM for connecting a BPF program
to an in kernel Trusted Modeling Agent (TMA), but it is on our radar
screen, desperately seeking attention cycles.  With such hypothetical
support in place, I would propose gating the ability to attach a BPF
program to a TMA with CAP_BPF.  Said program would then assume the
role of assisting the TMA in generating the security coefficients for
subsequent BPF related security events in the modeling namespace.

At that point, the security behavior of subsequent BPF programs will
be under the control of the security model being run by the TMA
assigned to that security namespace.  It can be as granular and
restrictive as any security characteristics that would be described as
being relevant to BPF.

From a security perspective, you don't write any security policy, you
unit test the BPF application and the trust orchestrator generates the
security model that would be subsequently enforced.

With this model, you don't override any existing security controls and
the LSM implementation remains purely restrictive.  CAP_BPF regulates
whether the BPF infrastructure can be accessed and BPF itself becomes
responsible for defining the permissible security behavior of any
subordinate BPF applications.

There are undoubtedly considerations needed in the BPF implementation
to support this model but I haven't had time to look at those
particulars.

There is further discussion of the concepts involved in the 18+ page
documentation file that was included in the V0 release of TSEM.  Here
is the lore link for the original series:

https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/#t

The V1 release, currently being finalized, is a significantly enhanced
implementation but the architectural and security concepts discussed
are all still relevant, if there is a desire to dig into this further.

With respect to the thinking and writings of Kant and Hegel, Wikipedia
is your friend.... :-)

To conclude in a big picture context, if it hasn't already jumped out
at people: while TSEM operates practically from a narratival design
perspective, it is designed to do so by applying either deterministic
or machine learning models to the characterization and enforcement of
the security behavior of a platform.

The reason we have a somewhat intense interest in BPF is that HIDS
based machine learning models need to do characteristic screening in
order to be properly trained for anomaly detection.  BPF is a pathway
to achieving this with a single kernel based trusted modeling agent
implementation.

Now, back to figuring out how to hook up the stereo/hifi.

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

Andrii Nakryiko April 21, 2023, midnight UTC | #26

On Mon, Apr 17, 2023 at 5:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/17/2023 4:29 PM, Andrii Nakryiko wrote:
> > On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> >> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> >> <andrii.nakryiko@gmail.com> wrote:
> >>> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> >>>> <andrii.nakryiko@gmail.com> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >>>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >>>> ...
> >>>>
> >>>>>>>>> For example, in many places we have things like:
> >>>>>>>>>
> >>>>>>>>>         if (!some_check(...) && !capable(...))
> >>>>>>>>>                 return -EPERM;
> >>>>>>>>>
> >>>>>>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>>>>>> access control requirement is met. The mismatch we have through-out the
> >>>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> >>>>>>>>> yet here).
> >>>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>>>>>> bypassing a capability check, but the description claims that to be
> >>>>>>>> the case.
> >>>>>>>>
> >>>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>>>>>> be used to provide additional access control restrictions beyond a
> >>>>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>>>> an access denial due to a capability check.
> >>>>>>>>
> >>>>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>>>> would be fine-grained enough at the time.
> >>>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>>>> supplementary controls due to capability-based access controls being a
> >>>>>>>> poor fit for many modern use cases.
> >>>>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>>>
> >>>>>>>          if (!some_check(...) && !capable(...))
> >>>>>>>                  return -EPERM;
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>>>> Simply because there is another kernel access control mechanism which
> >>>>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>>>> LSM hook to be used to skip a capability check.
> >>>>> This work is an attempt to tighten the security of production systems
> >>>>> by allowing to drop too coarse-grained and permissive capabilities
> >>>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>>>> than production use cases are meant to be able to do) and then grant
> >>>>> specific BPF operations on specific BPF programs/maps based on custom
> >>>>> LSM security policy, which validates application trustworthiness using
> >>>>> custom production-specific logic.
> >>>> There are ways to leverage the LSMs to apply finer grained access
> >>>> control on top of the relatively coarse capabilities that do not
> >>>> require circumventing those capability controls.  One grants the
> >>>> capabilities, just as one would do today, and then leverages the
> >>>> security functionality of a LSM to further restrict specific users,
> >>>> applications, etc. with a level of granularity beyond that offered by
> >>>> the capability controls.
> >>> Please help me understand something. What you and Casey are proposing,
> >>> when taken to the logical extreme, is to grant to all processes root
> >>> permissions and then use LSM to restrict specific actions, do I
> >>> understand correctly? This strikes me as a less secure and more
> >>> error-prone way of doing things.
> >> When taken to the "logical extreme" most concepts end up sounding a
> >> bit absurd, but that was the point, wasn't it?
> > Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> > the sake of example, let's say CAP_BPF allows 20 different operations
> > (each with its own security_xxx hook). And let's say in production I
> > want to only allow 3 of them. Sure, technically it should be possible
> > to deny access at 17 hooks and let it through in just those 3. But if
> > someone adds 21st and I forget to add 21st restriction, that would be
> > bad (but very probable with such an approach).
>
> That would be a flaw in the implementation of the 21st, not a problem
> with the capabilities or LSM model. For the LSM model to be sufficiently
> flexible it cannot be required to prevent or detect coding errors.
>
> > So my point is that for situations like this, dropping CAP_BPF, but
> > allowing only 3 hooks to proceed seems a safer approach, because if we
> > add 21st hook, it will safely be denied without CAP_BPF *by default*.
> > That's what I tried to point out.
>
> When you're creating security relevant or enforcing mechanisms there has
> to be a level of expectation regarding the care with which they're
> developed. My expectation is that the 21st hook won't go in without
> adequate review.
>

That's not how it works with BPF LSM, but there is no point in arguing
about this. I agree that LSM shouldn't be prevented from adding new
hooks just because of some particular LSM implementation.

> > But even if we ignore this "safe by default when a new hook is added"
> > behavior, when taking user namespaces into account, the restrictive
> > LSM approach just doesn't seem to work at all for something like
> > CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> > because we cannot ensure that a given BPF program won't access kernel
> > state "belonging" to another process (as one example).
>
> Time namespaces have been proposed. I would be surprised if there aren't
> people working on BPF namespaces somewhere. There's a difference between
> "can't" and "haven't been".
>

It really is "can't" for BPF, as it allows tracing of kernel internals.
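
To make the capable() vs ns_capable() distinction from earlier in the
thread concrete, here is a simplified userspace model of the namespace
walk that cap_capable() performs. It is a sketch only: the struct and
function names are illustrative stand-ins, and everything except the
walk itself (securebits, LSM stacking, etc.) is omitted. CAP_BPF is
capability number 39.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a user namespace. */
struct user_ns {
	struct user_ns *parent; /* NULL for the initial namespace */
	int level;              /* 0 for the initial namespace */
	unsigned int owner;     /* uid that created the namespace */
};

/* Illustrative stand-in for a task's credentials. */
struct cred {
	struct user_ns *user_ns;          /* namespace the task lives in */
	unsigned int euid;
	unsigned long long cap_effective; /* bitmask of effective caps */
};

#define CAP_BPF_BIT 39

static struct user_ns init_user_ns = { NULL, 0, 0 };

/* Model of ns_capable(): does cred hold cap with respect to target? */
static bool model_ns_capable(const struct cred *cred,
			     struct user_ns *target, int cap)
{
	struct user_ns *ns = target;

	for (;;) {
		/* Same namespace: the effective set decides. */
		if (ns == cred->user_ns)
			return (cred->cap_effective >> cap) & 1;
		/* Target is not a descendant of the task's namespace. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The owner of a child namespace has all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return true;
		ns = ns->parent;
	}
}

/* Model of capable(): always evaluated against the initial namespace. */
static bool model_capable(const struct cred *cred, int cap)
{
	return model_ns_capable(cred, &init_user_ns, cap);
}
```

In this model a task inside a child user namespace with every bit set
in cap_effective still fails model_capable(cred, CAP_BPF_BIT), because
the walk starts at the initial namespace and never consults the child's
capability set.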

> > Now, thanks to Jonathan, I get that there was a heated discussion 20
> > years ago about authoritative vs restrictive LSMs. But if I read a
> > summary at that time ([0]), authoritative hooks were not out of the
> > question *in principle*. Surely, "walk before we can run" makes sense,
> > but it's been a while ago.
>
> Certainly. The SGI comment was mine, by the way. I wanted authoritative
> hooks for cases like POSIX ACLs and systems without root. While I would
> have liked the decision to go the other way, there's no way I would endorse
> a hybrid, where some hooks are restrictive and others authoritative.
>

Yep, saw your comments as well. Can't say I get what would be wrong
with having authoritative hooks together with restrictive ones, but oh
well.


> >   [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
> >
> >
> >> Here is a fun story which seems relevant ... in the early days of
> >> SELinux, one of the community devs set up a system with a SELinux
> >> policy which restricted all privileged operations from the root user,
> >> put the system on a publicly accessible network, posted the root
> >> password for all to see, and invited the public to login to the system
> >> and attempt to exercise root privilege (it's been well over 10 years
> >> at this point so the details are a bit fuzzy).  Granted, there were
> >> some hiccups in the beginning, mostly due to the crude state of policy
> >> development/analysis at the time, but after a few policy revisions the
> >> system held up quite well.
> > Honest question out of curiosity: was the intent to demonstrate that
> > with LSM one can completely restrict root? Or that root was actually
> > allowed to do something useful? Because I can see how rejecting
> > everything would be rather simple, but actually pretty useless in
> > practice. Restricting only part of the power of the root, while still
> > allowing it to do something useful in production seems like a much
> > harder (but way more valuable) endeavor. Not saying it's impossible,
> > but see my example about missing 21st new CAP_BPF functionality.
>
> Capabilities are sufficient to implement a rootless system. It's been done.
> Someone will point out that CAP_SYS_ADMIN is effectively root, and there's
> some truth to that.
>
> >> On the more practical side of things, there are several use cases
> >> which require, by way of legal or contractual requirements, that full
> >> root/admin privileges are decomposed into separate roles: security
> >> admin, audit admin, backup admin, etc.  These users satisfy these
> >> requirements by using LSMs, such as SELinux, to restrict the
> >> administrative capabilities based on the SELinux user/role/domain.
> >>
> >>> By the way, even the above proposal of yours doesn't work for
> >>> production use cases when user namespaces are involved, as far as I
> >>> understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> >>> containers running inside user namespaces, as CAP_BPF in non-init
> >>> namespace is not enough for bpf() syscall to allow loading BPF maps or
> >>> BPF program ...
> >> Once again, the LSM has always intended to be a restrictive mechanism,
> >> not a privilege granting mechanism.  If an operation is not possible
> > Not according to [0] above:
> >
> >   > It is our belief that these changes do not belong in the initial version of
> >   > LSM (especially given our limited charter and original goals), and should
> >   > be proposed as incremental refinements after LSM has been initially
> >   > accepted.
> >   > ...
> >   > It is our belief that the current LSM
> >   > will provide a meaningful improvement in the security infrastructure of the
> >   > Linux kernel, and that there is plenty of room for future expansion of LSM
> >   > in subsequent phases.
> >
> > I don't see "always intended to be a restrictive mechanism" there.
>
> Having been on the other side of the argument, the system that was accepted
> was in fact "always intended to be a restrictive mechanism". The quote above
> is a "never say never" statement.
>
> >> without the LSM layer enabled, it should not be possible with the LSM
> >> layer enabled.  The LSM is not a mechanism to circumvent other access
> >> control mechanisms in the kernel.
> > I understand, but it's not like we are proposing to go and bypass all
> > kinds of random kernel security mechanisms. These are targeted hooks,
> > developed by the BPF community for the BPF subsystem to allow trusted
> > unprivileged production use cases. Yes, we currently rely on checking
> > CAP_BPF to grant more dangerous/advanced features, but it's because we
> > can't just allow any unprivileged process to do this. But what we
> > really want is to answer the question "can we trust this process to
> > use this advanced functionality", and if there is no specific LSM
> > policy that cares one way (allow) or the other (disallow), fallback to
> > CAP_BPF enforcement.
> >
> > So it's not bypassing kernel checks, but rather augmenting them with
> > more flexible and customizable mechanisms, while still falling back to
> > CAP_BPF if the user didn't install any custom LSM policy.
>
> That would make CAP_BPF behave differently from all other capabilities.
> Capabilities are hard enough to use correctly as it is. If each capability
> defined its own semantics they would be completely unusable.
>
> >>> Also, in previous email you said:
> >>>
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> I understand your stated position, but can you please help me
> >>> understand the reasoning behind it?
> >> Keeping the LSM as a restrictive access control mechanism helps ensure
> >> some level of sanity and consistency across different Linux
> >> installations.  If a certain operation requires CAP_SYS_ADMIN on one
> >> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
> >> Granted, a LSM running on one system might impose additional
> >> constraints on that operation, but the CAP_SYS_ADMIN requirement still
> >> applies.
> >>
> >> There is also an issue of safety in knowing that enabling a LSM will
> >> not degrade the access controls on a system by potentially granting
> >> operations that were previously denied.
> >>
> >>> Does the above also mean that you'd be fine if we just don't plug into
> >>> the LSM subsystem at all and instead come up with some ad-hoc solution
> >>> to allow effectively the same policies? This sounds detrimental both
> >>> to LSM and BPF subsystems, so I hope we can talk this through before
> >>> finalizing decisions.
> >> Based on your patches and our discussion, it seems to me that the
> >> problem you are trying to resolve is related more to the
> >> capability-based access controls in the eBPF, and possibly other
> >> kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> >> work with you on a solution involving the LSM, but please understand
> >> that I'm not going to support a solution which changes a core
> >> philosophy of the LSM layer.
> > Great, I'd really appreciate help and suggestions on how to solve the
> > following problem.
> >
> > We have a BPF subsystem that allows loading BPF programs. Those BPF
> > programs cannot be contained within a particular namespace just by its
> > system-wide tracing nature (it can safely read kernel and user memory
> > and we can't restrict whether that memory belongs to a particular
> > namespace), so it's like CAP_SYS_TIME, just with much broader API
> > surface.
>
> This doesn't sound like a problem, it sounds like BPF is explicitly
> designed to prevent interference by namespaces. But in some cases you
> now want to limit it by namespaces.
>
> It appears that the desired uses of BPF are no longer compatible with
> its original security model. That's unfortunate, and likely to require
> a significant change to the implementation of BPF.
>

I have some new ideas, so hopefully not as significant. While I still
think that authoritative LSM hooks would be great, I'll stop arguing.
I'll get back with a different proposal that would allow BPF usage
within user namespaces. We still will want LSM hooks for fine-grained
control, but I think we'll be able to make them restrictive-only.
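
For anyone skimming the thread, the two hook models being argued about
can be contrasted in a few lines of userspace C. This is only a sketch
of the decision logic, with made-up function names; the return
convention for the authoritative variant (positive bypasses, zero
falls back to the capability check, negative denies) follows the
description of patch 04/08 given earlier in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Restrictive model: the capability check always applies and the hook
 * can only add a denial on top of it. A positive hook value is treated
 * the same as "no objection".
 */
static int check_restrictive(bool has_cap, int hook_ret)
{
	if (!has_cap)
		return -EPERM;
	return hook_ret < 0 ? hook_ret : 0;
}

/*
 * Authoritative model, as proposed in this series: a positive hook
 * value skips the capability check, zero falls back to it, and a
 * negative value denies.
 */
static int check_authoritative(bool has_cap, int hook_ret)
{
	if (hook_ret < 0)
		return hook_ret;
	if (hook_ret > 0)
		return 0;
	return has_cap ? 0 : -EPERM;
}
```

The only input on which the two differ is (has_cap == false,
hook_ret > 0): the authoritative variant allows the operation and the
restrictive one returns -EPERM. That single case is what the
disagreement in this thread is about.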

> >
> > The other piece of a puzzle is user namespaces. We do want to run
> > applications inside user namespaces, but allow them to use BPF
> > programs. As far as I can tell, there is no way to grant real CAP_BPF
> > that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> > about system-wide nature of BPF). If there is, please help me
> > understand how. All my local experiments failed, and looking at
> > cap_capable() implementation it is not intended to even check the
> > initial namespace's capability if the process is running in the user
> > namespace.
> >
> >
> > So, given that a) we can't make CAP_BPF namespace-aware and b) we
> > can't grant real CAP_BPF to processes in user namespace, how could we
> > allow user namespaced applications to do useful work with BPF?
> >
> >>> Lastly, you mentioned before:
> >>>
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> >>> can't confirm or disprove the above statement. But earlier someone
> >>> brought to my attention the case of security_vm_enough_memory_mm(),
> >>> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> >>> of memory accounting. Am I missing something subtle there or does it
> >>> grant effective caps indeed?
> >> Some of the comments around that hook can be misleading, but if you
> >> look at the actual code it starts to make more sense.
> >>
> > [...]
> >
> >> I do agree that the security_vm_enough_memory() hook is structured a
> >> bit differently than most of the other LSM hooks, but it still
> >> operates with the same philosophy: a LSM should only be allowed to
> >> restrict access, a LSM should never be allowed to grant access that
> >> would otherwise be denied by the traditional Linux access controls.
> >>
> >> Hopefully that explanation makes sense, but if things are still a bit
> >> fuzzy I would encourage you to go look at the code, I'm sure it will
> >> make sense once you spend a few minutes figuring out how it works.
> >>
> > Yep, thanks a lot, it's way more clear after grokking relevant pieces
> > of the LSM code you pointed out and LSM infrastructure in general.
> > "capabilities" LSM is non-negotiable, so it effectively always
> > restricts a small subset of hooks, including vm_enough_memory and
> > capable.
> >
> > Still, the problem stands. How do we marry BPF and user
> > namespaces? I'd really appreciate suggestions. Thank you!
> >
> >
> >> [1] There is a long and sorta bizarre history with the capability LSM,
> >> but just understand it is a bit "special" in many ways, and those
> >> "special" behaviors are intentional.
> >>
> >> --
> >> paul-moore.com

Andrii Nakryiko April 21, 2023, midnight UTC | #27

On Tue, Apr 18, 2023 at 7:21 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Apr 17, 2023 at 7:29 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > > On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> > > > > > > > > > On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> > > > > > > > > > > On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > > > > > For example, in many places we have things like:
> > > > > > > > > >
> > > > > > > > > >         if (!some_check(...) && !capable(...))
> > > > > > > > > >                 return -EPERM;
> > > > > > > > > >
> > > > > > > > > > I would expect this is a similar logic. An operation can succeed if the
> > > > > > > > > > access control requirement is met. The mismatch we have through-out the
> > > > > > > > > > kernel is that capability checks aren't strictly done by LSM hooks. And
> > > > > > > > > > this series conceptually, I think, doesn't violate that -- it's changing
> > > > > > > > > > the logic of the capability checks, not the LSM (i.e. there are no LSM hooks
> > > > > > > > > > yet here).
> > > > > > > > >
> > > > > > > > > Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> > > > > > > > > when it returns a positive value "bypasses kernel checks".  The patch
> > > > > > > > > isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> > > > > > > > > based on a eBPF tree, so I can't say with 100% certainty that it is
> > > > > > > > > bypassing a capability check, but the description claims that to be
> > > > > > > > > the case.
> > > > > > > > >
> > > > > > > > > Regardless of how you want to spin this, I'm not supportive of a LSM
> > > > > > > > > hook which allows a LSM to bypass a capability check.  A LSM hook can
> > > > > > > > > be used to provide additional access control restrictions beyond a
> > > > > > > > > capability check, but a LSM hook should never be allowed to overrule
> > > > > > > > > an access denial due to a capability check.
> > > > > > > > >
> > > > > > > > > > The reason CAP_BPF was created was because there was nothing else that
> > > > > > > > > > would be fine-grained enough at the time.
> > > > > > > > >
> > > > > > > > > The LSM layer predates CAP_BPF, and one could make a very solid
> > > > > > > > > argument that one of the reasons LSMs exist is to provide
> > > > > > > > > supplementary controls due to capability-based access controls being a
> > > > > > > > > poor fit for many modern use cases.
> > > > > > > >
> > > > > > > > I generally agree with what you say, but we DO have this code pattern:
> > > > > > > >
> > > > > > > >          if (!some_check(...) && !capable(...))
> > > > > > > >                  return -EPERM;
> > > > > > >
> > > > > > > I think we need to make this more concrete; we don't have a pattern in
> > > > > > > the upstream kernel where 'some_check(...)' is a LSM hook, right?
> > > > > > > Simply because there is another kernel access control mechanism which
> > > > > > > allows a capability check to be skipped doesn't mean I want to allow a
> > > > > > > LSM hook to be used to skip a capability check.
> > > > > >
> > > > > > This work is an attempt to tighten the security of production systems
> > > > > > by allowing to drop too coarse-grained and permissive capabilities
> > > > > > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> > > > > > than production use cases are meant to be able to do) and then grant
> > > > > > specific BPF operations on specific BPF programs/maps based on custom
> > > > > > LSM security policy, which validates application trustworthiness using
> > > > > > custom production-specific logic.
> > > > >
> > > > > There are ways to leverage the LSMs to apply finer grained access
> > > > > control on top of the relatively coarse capabilities that do not
> > > > > require circumventing those capability controls.  One grants the
> > > > > capabilities, just as one would do today, and then leverages the
> > > > > security functionality of a LSM to further restrict specific users,
> > > > > applications, etc. with a level of granularity beyond that offered by
> > > > > the capability controls.
> > > >
> > > > Please help me understand something. What you and Casey are proposing,
> > > > when taken to the logical extreme, is to grant to all processes root
> > > > permissions and then use LSM to restrict specific actions, do I
> > > > understand correctly? This strikes me as a less secure and more
> > > > error-prone way of doing things.
> > >
> > > When taken to the "logical extreme" most concepts end up sounding a
> > > bit absurd, but that was the point, wasn't it?
> >
> > Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> > the sake of example, let's say CAP_BPF allows 20 different operations
> > (each with its own security_xxx hook). And let's say in production I
> > want to only allow 3 of them. Sure, technically it should be possible
> > to deny access at 17 hooks and let it through in just those 3. But if
> > someone adds a 21st and I forget to add a 21st restriction, that would be
> > bad (but very probable with such an approach).
>
> Welcome to the challenges of maintaining access controls within the
> Linux Kernel, LSM or otherwise.  As we all know, the Linux Kernel
> moves forward at a staggering pace sometimes, and it is not uncommon
> for new features/subsystems to be added without consulting all of the
> different folks who worry about access controls.  In many cases it can
> be a simple misunderstanding, but in some cases it's a willful
> rejection of a particular form of access control, the LSM being a
> prime example.  Thankfully in almost all of those cases we have been
> moderately successful in retrofitting the necessary access controls,
> sometimes they are not as good/capable/granular/etc. as we would like
> because of design limitations, but such is life.
>
> I say this not because I believe this is a valid argument for
> authoritative LSM hooks, I say this simply to acknowledge that this
> *is* a problem.
>

Ack, thanks.

> > So my point is that for situations like this, dropping CAP_BPF, but
> > allowing only 3 hooks to proceed seems a safer approach, because if we
> > add 21st hook, it will safely be denied without CAP_BPF *by default*.
> > That's what I tried to point out.
>
> I believe I understand your point, I just disagree with you on
> accepting authoritative LSM hooks in the upstream Linux Kernel; I
> believe it would be a *big* mistake to move away from the restrictive
> LSM hook philosophy at this point in time.

Ok, understood. While unfortunate, I'll stop pushing for authoritative LSMs.
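
To make the "21st hook" concern above concrete, here is a minimal user-space
sketch in C. It is not kernel code and not taken from this patch set: the
operation numbering, struct op_policy and both helper names are hypothetical,
and it assumes the capability gates a flat set of numbered operations with
one hook each. It only illustrates how a deny-list and an allow-list behave
when an operation appears that the policy author never saw.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical policy: one bit per operation number (0..63). */
struct op_policy {
	uint64_t denied;  /* deny-list, used when the task keeps the capability */
	uint64_t allowed; /* allow-list, used when the task has no capability */
};

/*
 * Restrictive model: the task holds the capability and the policy can only
 * take permissions away. An operation the policy does not list passes.
 */
int check_with_cap_and_denylist(const struct op_policy *p, unsigned int op)
{
	if (op < 64 && (p->denied & (UINT64_C(1) << op)))
		return -EPERM;
	return 0;
}

/*
 * Allow-list model: the task holds no capability, so only operations the
 * policy lists explicitly pass. Everything else is denied by default.
 */
int check_without_cap_and_allowlist(const struct op_policy *p, unsigned int op)
{
	if (op < 64 && (p->allowed & (UINT64_C(1) << op)))
		return 0;
	return -EPERM;
}
```

With a policy written when operations 0..19 existed (allow 0..2, deny 3..19),
a newly added operation 20 passes the deny-list check and fails the
allow-list check. That difference is the whole argument; the sketch says
nothing about how such a policy would actually be wired into LSM hooks.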

>
> > But even if we ignore this "safe by default when a new hook is added"
> > behavior, when taking user namespaces into account, the restrictive
> > LSM approach just doesn't seem to work at all for something like
> > CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> > because we cannot ensure that a given BPF program won't access kernel
> > state "belonging" to another process (as one example).
>
> Once again, the root of this problem lies in the capabilities and/or
> namespace mechanisms, not the LSM; if you want to fix this properly
> you should be looking at how eBPF leverages capabilities for access
> control.  Changing the very core behavior of the LSM layer in order to
> work around an issue with another access control mechanism is a
> non-starter.  I can't say this enough.

Alright. I now do have an alternative approach in mind that will only
use restrictive LSMs and will still allow BPF usage within user
namespaces.

>
> > Now, thanks to Jonathan, I get that there was a heated discussion 20
> > years ago about authoritative vs restrictive LSMs. But if I read a
> > summary at that time ([0]), authoritative hooks were not out of the
> > question *in principle*. Surely, "walk before we can run" makes sense,
> > but that was a while ago.
>
> ... and once again, the restrictive approach has proven to work
> reasonably well over the past ~20 years, why would we abandon that
> simply to work around a problem with a different access control
> mechanism.  Don't break the LSM layer to fix something else.

There was no breakage introduced, let's call things by their proper
names. Surely, new hooks were authoritative, but they don't really
break anything, right? I understand that they go against your
restrictive-only LSM philosophy, but it's not a breakage in any proper
sense of that word. All existing hooks continue to work. New hooks
would work properly as well. It's not a breakage. I'm not saying this
to try to convince you, but let's not misrepresent what I tried to do
in this patch set.
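
For readers following along, the difference between the two models can be
sketched in a few lines of user-space C. This is only an illustration under
assumptions: enum lsm_verdict and both function names are made up here, and
the three-valued verdict is a simplification rather than the return
convention of any existing hook.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical verdicts, used only for this illustration. */
enum lsm_verdict { LSM_NO_OPINION, LSM_ALLOW, LSM_DENY };

/*
 * Restrictive model: the capability check is mandatory and the LSM can only
 * add a denial on top of it.
 */
int restrictive_check(bool has_cap, enum lsm_verdict v)
{
	if (!has_cap)
		return -EPERM;
	if (v == LSM_DENY)
		return -EPERM;
	return 0;
}

/*
 * Authoritative model: an explicit LSM_ALLOW lets the operation proceed
 * without the capability, in the spirit of
 *     if (!some_check(...) && !capable(...)) return -EPERM;
 */
int authoritative_check(bool has_cap, enum lsm_verdict v)
{
	if (v == LSM_DENY)
		return -EPERM;
	if (v != LSM_ALLOW && !has_cap)
		return -EPERM;
	return 0;
}
```

The two functions agree on every input except (no capability, LSM_ALLOW):
the restrictive one returns -EPERM and the authoritative one returns 0.
Existing restrictive behavior is a subset of the second function, which is
the sense in which nothing existing changes.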

>
> > > Here is a fun story which seems relevant ... in the early days of
> > > SELinux, one of the community devs set up a system with a SELinux
> > > policy which restricted all privileged operations from the root user,
> > > put the system on a publicly accessible network, posted the root
> > > password for all to see, and invited the public to login to the system
> > > and attempt to exercise root privilege (it's been well over 10 years
> > > at this point so the details are a bit fuzzy).  Granted, there were
> > > some hiccups in the beginning, mostly due to the crude state of policy
> > > development/analysis at the time, but after a few policy revisions the
> > > system held up quite well.
> >
> > Honest question out of curiosity: was the intent to demonstrate that
> > with LSM one can completely restrict root? Or that root was actually
> > allowed to do something useful?
>
> The intent was to show that it is possible to restrict
> capability-based access controls with the LSM layer; it was the best
> example of the "logical extreme" carried out in the real world that I
> could think of when writing my response.
>
> > > On the more practical side of things, there are several use cases
> > > which require, by way of legal or contractual requirements, that full
> > > root/admin privileges are decomposed into separate roles: security
> > > admin, audit admin, backup admin, etc.  These users satisfy these
> > > requirements by using LSMs, such as SELinux, to restrict the
> > > administrative capabilities based on the SELinux user/role/domain.
> > >
> > > > By the way, even the above proposal of yours doesn't work for
> > > > production use cases when user namespaces are involved, as far as I
> > > > understand. We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for
> > > > containers running inside user namespaces, as CAP_BPF in non-init
> > > > namespace is not enough for bpf() syscall to allow loading BPF maps or
> > > > BPF program ...
> > >
> > > Once again, the LSM has always been intended to be a restrictive mechanism,
> > > not a privilege granting mechanism.  If an operation is not possible
> >
> > Not according to [0] above:
>
> When one considers what has been present in Linus' tree, then yes.
> The idea of authoritative LSM hooks has been rejected for ~20 years
> and I've seen nothing in this thread to make me believe that we should
> change that now, and for this use case.

Ack.

>
> > > Based on your patches and our discussion, it seems to me that the
> > > problem you are trying to resolve is related more to the
> > > capability-based access controls in the eBPF, and possibly other
> > > kernel subsystems, and not any LSM-based restrictions.  I'm happy to
> > > work with you on a solution involving the LSM, but please understand
> > > that I'm not going to support a solution which changes a core
> > > philosophy of the LSM layer.
> >
> > Great, I'd really appreciate help and suggestions on how to solve the
> > following problem.
> >
> > We have a BPF subsystem that allows loading BPF programs. Those BPF
> > programs cannot be contained within a particular namespace just by its
> > system-wide tracing nature (it can safely read kernel and user memory
> > and we can't restrict whether that memory belongs to a particular
> > namespace), so it's like CAP_SYS_TIME, just with much broader API
> > surface.
> >
> > The other piece of a puzzle is user namespaces. We do want to run
> > applications inside user namespaces, but allow them to use BPF
> > programs. As far as I can tell, there is no way to grant real CAP_BPF
> > that will be recognized by capable(CAP_BPF) (not ns_capable, see above
> > about system-wide nature of BPF). If there is, please help me
> > understand how. All my local experiments failed, and looking at
> > cap_capable() implementation it is not intended to even check the
> > initial namespace's capability if the process is running in the user
> > namespace.
> >
> > So, given that a) we can't make CAP_BPF namespace-aware and b) we
> > can't grant real CAP_BPF to processes in user namespace, how could we
> > allow user namespaced applications to do useful work with BPF?
>
> I would start by talking with the user namespace folks.  I may be
> misunderstanding the problem as you've described it, but it seems like
> the core issue is how capabilities, specifically CAP_BPF, are handled
> in user namespaces.  To be honest, I'm not sure how much luck you'll
> have there, but you stand a better chance in changing how capabilities
> are handled across user namespaces than you do in getting an
> authoritative LSM hook merged.
>

You made it very clear, yes.

> Regardless, my offer still stands, if you have a solution which sticks
> to a restrictive LSM model, I'm happy to work with you further to sort
> out the details and try to make that work.  I don't have any great
> ideas there at the moment, but there are plenty of smart people on
> this mailing list and others who might have something clever in mind.

I do have a solution in mind. Stay tuned.

>
> --
> paul-moore.com

Kees Cook April 21, 2023, 6:57 p.m. UTC | #28

On Thu, Apr 20, 2023 at 05:00:55PM -0700, Andrii Nakryiko wrote:
> Alright. I now do have an alternative approach in mind that will only
> use restrictive LSMs and will still allow BPF usage within user
> namespaces.

It seems the problem with the existing kernel is that bpf_capable() is
rather inflexible. In only one place is sysctl_unprivileged_bpf_disabled
checked (outside the unprivileged_ebpf_enabled() checks in CPU errata
fixes).

Should CAP_BPF be per-namespace?
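
As background for that question, here is a simplified user-space model of
the difference between a global capable() check and a per-namespace
ns_capable() check. It loosely follows the namespace walk in cap_capable()
as I understand it, but it is a sketch under assumptions, not the kernel
implementation: struct model_user_ns, struct model_cred and both function
names are made up, uids are plain integers, and the initial namespace is
passed in explicitly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_BPF 39 /* value from include/uapi/linux/capability.h */

struct model_user_ns {
	const struct model_user_ns *parent; /* NULL for the initial namespace */
	int level;                          /* 0 for the initial namespace */
	unsigned int owner;                 /* uid that created the namespace */
};

struct model_cred {
	const struct model_user_ns *user_ns; /* namespace the task lives in */
	unsigned int euid;
	uint64_t cap_effective;              /* one bit per capability */
};

/* Does the task have `cap` with respect to the target namespace `ns`? */
int model_ns_capable(const struct model_cred *cred,
		     const struct model_user_ns *ns, int cap)
{
	for (;;) {
		/* Same namespace: the task's own capability bits decide. */
		if (ns == cred->user_ns)
			return ((cred->cap_effective >> cap) & 1) ? 0 : -EPERM;
		/* Target is not a descendant of the task's namespace. */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return 0;
		ns = ns->parent;
	}
}

/* Global check: always evaluated against the initial namespace. */
int model_capable(const struct model_cred *cred,
		  const struct model_user_ns *init_ns, int cap)
{
	return model_ns_capable(cred, init_ns, cap);
}
```

In this model a task inside a child user namespace with CAP_BPF set in its
effective set passes model_ns_capable() against its own namespace but fails
model_capable(), because the initial namespace sits at a lower level and the
walk stops with -EPERM. That matches the behavior Andrii describes above for
capable(CAP_BPF) inside a user namespace.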
