
[v2,0/4] Introduce user namespace capabilities

Message ID 20240609104355.442002-1-jcalmels@3xx0.net (mailing list archive)

Message

Jonathan Calmels June 9, 2024, 10:43 a.m. UTC
This patch series introduces a new user namespace capability set, as
well as some plumbing around it (i.e. sysctl, secbit, lsm support).

First patch goes over the motivations for this as well as prior art.

In summary, while user namespaces are a great success today in that they
avoid running a lot of code as root, they also expand the attack surface
of the kernel substantially, which attackers often abuse.
Methods exist to limit the creation of such namespaces [1]; however,
application developers often need to assume that user namespaces are
available for various tasks such as sandboxing. Thus, instead of
restricting the creation of user namespaces, we offer ways for userspace
to limit the capabilities granted to them.

Why a new capability set and not something specific to the userns (e.g.
ioctl_ns)?

    1. We can't really expect userspace to patch every single callsite
    and opt in to this new security mechanism.

    2. We don't necessarily want policies enforced at said callsites.
    For example, a service like systemd-machined or a PAM session needs
    to be able to place restrictions on any namespace spawned under it.

    3. We would need to come up with inheritance rules, querying
    capabilities, etc. At this point we're just reinventing capability
    sets.

    4. We can easily define interactions between capability sets, thus
    helping with adoption (patch 2 is an example of this).

Some examples of how this could be leveraged in userspace:

    - Prevent user from getting CAP_NET_ADMIN in user namespaces under SSH:
        echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
        echo "!cap_net_admin $USER"     >> /etc/security/capability.conf
        capsh --secbits=$((1 << 8)) -- -c /usr/sbin/sshd

    - Prevent containers from ever getting CAP_DAC_OVERRIDE:
        systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
                    -p SecureBits=userns-strict-caps \
                    /usr/bin/dockerd
        systemd-run -p UserNSCapabilities=~CAP_DAC_OVERRIDE \
                    /usr/bin/incusd

    - Kernel could be vulnerable to CAP_SYS_RAWIO exploits, prevent it:
        sysctl -w cap_bound_userns_mask=0x1fffffdffff

    - Drop CAP_SYS_ADMIN for this shell and all the user namespaces below it:
        bwrap --unshare-user --cap-drop CAP_SYS_ADMIN /bin/sh
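
The mask in the sysctl example can be derived rather than typed by hand.
The sketch below is only an illustration: the helper name userns_mask is
hypothetical, and it assumes capabilities 0 through 40 are defined
(CAP_LAST_CAP = 40), which matches the 41-bit value used above. If this
series ends up adding a capability number, CAP_LAST_CAP would need to be
adjusted accordingly.

```shell
#!/bin/sh
# Sketch: build a userns capability mask with selected capabilities cleared.
# Assumption: CAP_LAST_CAP is 40, matching the 41-bit mask in the example.
CAP_LAST_CAP=40
CAP_SYS_RAWIO=17   # value from include/uapi/linux/capability.h

# userns_mask CAP_NUMBER...
# Print, in hex, the full capability mask minus the given capability bits.
# (Hypothetical helper; not part of this series or of any existing tool.)
userns_mask() {
    # Start with every defined capability bit set.
    mask=$(( (1 << (CAP_LAST_CAP + 1)) - 1 ))
    # Clear one bit per capability number passed as an argument.
    for cap in "$@"; do
        mask=$(( mask & ~(1 << cap) ))
    done
    printf '0x%x\n' "$mask"
}

# Reproduces the value passed to cap_bound_userns_mask above.
userns_mask "$CAP_SYS_RAWIO"
```

The same arithmetic gives the securebit value passed to capsh in the
first example: $((1 << 8)) evaluates to 256.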

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7cd4c5c2101cb092db00f61f69d24380cf7a0ee8

---
Changes since v1:
- Add documentation
- Change commit wording
- Cleanup various aspects of the code based on feedback
- Add new CAP_SYS_CONTROL capability for sysctl check
- Add BPF-LSM support for modifying userns capabilities
---
Jonathan Calmels (4):
  capabilities: Add user namespace capabilities
  capabilities: Add securebit to restrict userns caps
  capabilities: Add sysctl to mask off userns caps
  bpf,lsm: Allow editing capabilities in BPF-LSM hooks

 Documentation/filesystems/proc.rst            |  1 +
 Documentation/security/credentials.rst        |  6 ++
 fs/proc/array.c                               |  9 +++
 include/linux/cred.h                          |  3 +
 include/linux/lsm_hook_defs.h                 |  2 +-
 include/linux/securebits.h                    |  1 +
 include/linux/security.h                      |  4 +-
 include/linux/user_namespace.h                |  7 ++
 include/uapi/linux/capability.h               |  6 +-
 include/uapi/linux/prctl.h                    |  7 ++
 include/uapi/linux/securebits.h               | 11 ++-
 kernel/bpf/bpf_lsm.c                          | 55 +++++++++++++
 kernel/cred.c                                 |  3 +
 kernel/sysctl.c                               | 10 +++
 kernel/umh.c                                  | 15 ++++
 kernel/user_namespace.c                       | 80 +++++++++++++++++--
 security/apparmor/lsm.c                       |  2 +-
 security/commoncap.c                          | 62 +++++++++++++-
 security/keys/process_keys.c                  |  3 +
 security/security.c                           |  6 +-
 security/selinux/hooks.c                      |  2 +-
 security/selinux/include/classmap.h           |  5 +-
 .../selftests/bpf/prog_tests/deny_namespace.c | 12 ++-
 .../selftests/bpf/progs/test_deny_namespace.c |  7 +-
 24 files changed, 291 insertions(+), 28 deletions(-)

Comments

Josef Bacik June 10, 2024, 8:12 p.m. UTC | #1
On Sun, Jun 09, 2024 at 03:43:33AM -0700, Jonathan Calmels wrote:
> This patch series introduces a new user namespace capability set, as
> well as some plumbing around it (i.e. sysctl, secbit, lsm support).
> 
> First patch goes over the motivations for this as well as prior art.
> 
> In summary, while user namespaces are a great success today in that they
> avoid running a lot of code as root, they also expand the attack surface
> of the kernel substantially, which attackers often abuse.
> Methods exist to limit the creation of such namespaces [1]; however,
> application developers often need to assume that user namespaces are
> available for various tasks such as sandboxing. Thus, instead of
> restricting the creation of user namespaces, we offer ways for userspace
> to limit the capabilities granted to them.
> 
> Why a new capability set and not something specific to the userns (e.g.
> ioctl_ns)?
> 
>     1. We can't really expect userspace to patch every single callsite
>     and opt in to this new security mechanism.
> 
>     2. We don't necessarily want policies enforced at said callsites.
>     For example, a service like systemd-machined or a PAM session needs
>     to be able to place restrictions on any namespace spawned under it.
> 
>     3. We would need to come up with inheritance rules, querying
>     capabilities, etc. At this point we're just reinventing capability
>     sets.
> 
>     4. We can easily define interactions between capability sets, thus
>     helping with adoption (patch 2 is an example of this).
> 
> Some examples of how this could be leveraged in userspace:
> 
>     - Prevent user from getting CAP_NET_ADMIN in user namespaces under SSH:
>         echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
>         echo "!cap_net_admin $USER"     >> /etc/security/capability.conf
>         capsh --secbits=$((1 << 8)) -- -c /usr/sbin/sshd
> 
>     - Prevent containers from ever getting CAP_DAC_OVERRIDE:
>         systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>                     -p SecureBits=userns-strict-caps \
>                     /usr/bin/dockerd
>         systemd-run -p UserNSCapabilities=~CAP_DAC_OVERRIDE \
>                     /usr/bin/incusd
> 
>     - Kernel could be vulnerable to CAP_SYS_RAWIO exploits, prevent it:
>         sysctl -w cap_bound_userns_mask=0x1fffffdffff
> 
>     - Drop CAP_SYS_ADMIN for this shell and all the user namespaces below it:
>         bwrap --unshare-user --cap-drop CAP_SYS_ADMIN /bin/sh
> 

Where are the tests for this patchset?  I see you updated the bpf tests for the
bpf lsm bits, but there's nothing to validate this new behavior or exercise the
new ioctl you've added.  Thanks,

Josef

Jonathan Calmels June 11, 2024, 8:33 a.m. UTC | #2
On Mon, Jun 10, 2024 at 04:12:27PM GMT, Josef Bacik wrote:
> Where are the tests for this patchset?  I see you updated the bpf tests for the
> bpf lsm bits, but there's nothing to validate this new behavior or exercise the
> new ioctl you've added.  Thanks,

Apologies, I haven't had much time to spend on it so I prioritized the
rest. But yes, we should certainly update the capabilities selftests
once we have agreed on the different behaviors.