[v1,0/3] Introduce CAP_SYS_PERFMON capability for secure Perf users groups

Message ID 283f09a5-33bd-eac3-bdfd-83d775045bf9@linux.intel.com (mailing list archive)

Message

Alexey Budankov Dec. 5, 2019, 4:15 p.m. UTC
Currently, access to perf_events functionality [1] beyond the scope
permitted by the perf_event_paranoid [1] kernel setting is available to a
privileged process [2] with the CAP_SYS_ADMIN capability enabled in its
effective set [3].

This patch set introduces the CAP_SYS_PERFMON capability, dedicated to
secure performance monitoring, so that CAP_SYS_PERFMON can assist
CAP_SYS_ADMIN in its governing role for perf_events-based performance
monitoring of a system.

CAP_SYS_PERFMON aims to harden system security and integrity when
processes and Perf privileged users [2] monitor performance using the
perf_events subsystem, thus decreasing the attack surface that is
available to CAP_SYS_ADMIN privileged processes [3].

CAP_SYS_PERFMON aims to take over the CAP_SYS_ADMIN credentials related to
the performance monitoring functionality of perf_events and to rebalance the
amount of credentials carried by CAP_SYS_ADMIN, in accordance with the
recommendation in the man page for CAP_SYS_ADMIN [3]: "Note: this capability
is overloaded; see Notes to kernel developers, below."

For backward compatibility, the performance monitoring functionality of the
perf_events subsystem remains available under CAP_SYS_ADMIN, but using
CAP_SYS_ADMIN for secure performance monitoring is discouraged in favor of
the newly introduced CAP_SYS_PERFMON capability.

In the suggested implementation, CAP_SYS_PERFMON enables Perf privileged
users [2] to conduct secure performance monitoring using perf_events across
the available online CPUs, for code executing in both kernel and user modes.
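
The access decision described in the paragraphs above can be sketched as a
toy model. This is not the kernel code from the patches: the function name,
the scope labels and the use of plain strings for capabilities are
illustrative assumptions, and the thresholds follow the perf_event_paranoid
documentation as I understand it (level >= 1 restricts CPU-wide events,
level >= 2 restricts kernel profiling).

```python
# Toy model only: this is not kernel code. Capability names are plain strings.
# Assumed thresholds, taken from the perf_event_paranoid documentation:
#   level >= 1 restricts CPU-wide events, level >= 2 restricts kernel profiling.
PARANOID_THRESHOLD = {"cpu": 1, "kernel": 2}


def may_monitor(scope, paranoid_level, effective_caps):
    """Return True if monitoring in the given scope would be permitted."""
    if paranoid_level < PARANOID_THRESHOLD[scope]:
        return True  # the sysctl setting alone already permits it
    # Beyond the sysctl limit: accept the new capability, and keep
    # accepting CAP_SYS_ADMIN for backward compatibility.
    return bool({"CAP_SYS_PERFMON", "CAP_SYS_ADMIN"} & set(effective_caps))


print(may_monitor("kernel", 2, []))                   # False
print(may_monitor("kernel", 2, ["CAP_SYS_PERFMON"]))  # True
print(may_monitor("kernel", 2, ["CAP_SYS_ADMIN"]))    # True
```

The point of the model is only that the new capability is accepted wherever
CAP_SYS_ADMIN was previously required for these two scopes.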

A possible alternative for this capability balancing and security hardening
task would be to use the existing CAP_SYS_PTRACE capability to govern the
performance monitoring functionality of perf_events, since process debugging
is similar to performance monitoring in that both provide insight into
process memory and execution details. However, CAP_SYS_PTRACE still grants
users more credentials than secure performance monitoring with the
perf_events subsystem requires, and the dedicated CAP_SYS_PERFMON capability
avoids this excess.

The libcap library utilities [4], [5] and the Perf tool can be used to apply
the CAP_SYS_PERFMON capability for secure performance monitoring beyond the
scope permitted by the system-wide perf_event_paranoid kernel setting. The
steps below can be used to evaluate the change suggested by the patch set:

  - patch, build and boot the kernel
  - patch, build Perf tool e.g. to /home/user/perf
  ...
  # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
  # pushd libcap
  # patch libcap/include/uapi/linux/capability.h with [PATCH 1/3]
  # make
  # pushd progs
  # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
  /home/user/perf: OK
  # ./getcap /home/user/perf
  /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
  # echo 2 > /proc/sys/kernel/perf_event_paranoid
  # cat /proc/sys/kernel/perf_event_paranoid 
  2
  ...
  $ /home/user/perf top
    ... works as expected ...
  $ cat /proc/`pidof perf`/status
  Name:	perf
  Umask:	0002
  State:	S (sleeping)
  Tgid:	2958
  Ngid:	0
  Pid:	2958
  PPid:	9847
  TracerPid:	0
  Uid:	500	500	500	500
  Gid:	500	500	500	500
  FDSize:	256
  ...
  CapInh:	0000000000000000
  CapPrm:	0000004400080000
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                     cap_sys_perfmon,cap_sys_ptrace,cap_syslog
  CapBnd:	0000007fffffffff
  CapAmb:	0000000000000000
  NoNewPrivs:	0
  Seccomp:	0
  Speculation_Store_Bypass:	thread vulnerable
  Cpus_allowed:	ff
  Cpus_allowed_list:	0-7
  ...
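
For scripted checks, the Cap* fields of the status output above can be read
programmatically. The following is a minimal sketch, assuming the usual
"Name:<tab>hex" layout of /proc/<pid>/status; the function name is
hypothetical, the sample text is copied from the transcript, and bit 38 is
the number this patch set proposes for CAP_SYS_PERFMON.

```python
def parse_cap_sets(status_text):
    """Return {'CapInh': int, 'CapPrm': int, ...} from /proc/<pid>/status text."""
    caps = {}
    for line in status_text.splitlines():
        key, sep, value = line.strip().partition(":")
        fields = value.split()
        if sep and key.startswith("Cap") and fields:
            # Only the first token is the hex mask; anything after it
            # (such as a hand-written annotation) is ignored.
            caps[key] = int(fields[0], 16)
    return caps


# Sample values copied from the transcript above.
SAMPLE = (
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000004400080000\n"
    "CapEff:\t0000004400080000\n"
    "CapBnd:\t0000007fffffffff\n"
    "CapAmb:\t0000000000000000\n"
)

CAP_SYS_PERFMON = 38  # bit number proposed by this patch set

caps = parse_cap_sets(SAMPLE)
has_perfmon = bool((caps["CapEff"] >> CAP_SYS_PERFMON) & 1)
print(has_perfmon)  # True
```

On a live system the same function could be fed the contents of
/proc/<pid>/status for the running perf process.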

Using cap_sys_perfmon avoids granting an excess of unused credentials:
- with cap_sys_admin:
  CapEff:	0000007fffffffff => 01111111 11111111 11111111 11111111 11111111
- with cap_sys_perfmon:
  CapEff:	0000004400080000 => 01000100 00000000 00001000 00000000 00000000
                                    38   34               19
                           sys_perfmon   syslog           sys_ptrace
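
The comparison above can be reproduced with a few lines of arithmetic. This
is a sketch only: the helper names are made up for illustration, bit numbers
19 and 34 are the existing CAP_SYS_PTRACE and CAP_SYSLOG values, and bit 38
is the value proposed here for CAP_SYS_PERFMON.

```python
# Bit number -> capability name, limited to the bits shown in this message.
CAP_NAMES = {19: "cap_sys_ptrace", 34: "cap_syslog", 38: "cap_sys_perfmon"}


def set_bits(mask):
    """Return the positions of all bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(mask):
    """Map set bits to names; unknown bits get a placeholder label."""
    return [CAP_NAMES.get(bit, "cap_%d" % bit) for bit in set_bits(mask)]


full = 0x0000007FFFFFFFFF    # CapEff shown for the cap_sys_admin case
narrow = 0x0000004400080000  # CapEff shown for the cap_sys_perfmon case

print(len(set_bits(full)))   # 39
print(set_bits(narrow))      # [19, 34, 38]
print(describe(narrow))      # ['cap_sys_ptrace', 'cap_syslog', 'cap_sys_perfmon']
```

That is, the process in the second case holds 3 capabilities in its
effective set instead of 39.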

The patch set is for tip perf/core repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
  tip sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6

[1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
[2] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[3] http://man7.org/linux/man-pages/man7/capabilities.7.html
[4] http://man7.org/linux/man-pages/man8/setcap.8.html
[5] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
[6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf

---
Alexey Budankov (3):
  capabilities: introduce CAP_SYS_PERFMON to kernel and user space
  perf/core: apply CAP_SYS_PERFMON to CPUs and kernel monitoring
  perf tool: extend Perf tool with CAP_SYS_PERFMON support

 include/linux/perf_event.h          |  6 ++++--
 include/uapi/linux/capability.h     | 10 +++++++++-
 security/selinux/include/classmap.h |  4 ++--
 tools/perf/design.txt               |  3 ++-
 tools/perf/util/cap.h               |  4 ++++
 tools/perf/util/evsel.c             | 10 +++++-----
 tools/perf/util/util.c              | 15 +++++++++++++--
 7 files changed, 39 insertions(+), 13 deletions(-)

Comments

Casey Schaufler Dec. 5, 2019, 4:49 p.m. UTC | #1
On 12/5/2019 8:15 AM, Alexey Budankov wrote:
> Currently, access to perf_events functionality [1] beyond the scope
> permitted by the perf_event_paranoid [1] kernel setting is available to a
> privileged process [2] with the CAP_SYS_ADMIN capability enabled in its
> effective set [3].
>
> This patch set introduces the CAP_SYS_PERFMON capability, dedicated to
> secure performance monitoring, so that CAP_SYS_PERFMON can assist
> CAP_SYS_ADMIN in its governing role for perf_events-based performance
> monitoring of a system.
>
> CAP_SYS_PERFMON aims to harden system security and integrity when
> processes and Perf privileged users [2] monitor performance using the
> perf_events subsystem, thus decreasing the attack surface that is
> available to CAP_SYS_ADMIN privileged processes [3].

Are there use cases where you would need CAP_SYS_PERFMON where you
would not also need CAP_SYS_ADMIN? If you separate a new capability
from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
with the new capability it is all rather pointless.

The scope you've defined for this CAP_SYS_PERFMON is very small.
Is there a larger set of privilege checks that might be applicable
for it?
 

Alexey Budankov Dec. 5, 2019, 5:05 p.m. UTC | #2
Hello Casey,
 
On 05.12.2019 19:49, Casey Schaufler wrote:
> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>> Currently, access to perf_events functionality [1] beyond the scope
>> permitted by the perf_event_paranoid [1] kernel setting is available to a
>> privileged process [2] with the CAP_SYS_ADMIN capability enabled in its
>> effective set [3].
>>
>> This patch set introduces the CAP_SYS_PERFMON capability, dedicated to
>> secure performance monitoring, so that CAP_SYS_PERFMON can assist
>> CAP_SYS_ADMIN in its governing role for perf_events-based performance
>> monitoring of a system.
>>
>> CAP_SYS_PERFMON aims to harden system security and integrity when
>> processes and Perf privileged users [2] monitor performance using the
>> perf_events subsystem, thus decreasing the attack surface that is
>> available to CAP_SYS_ADMIN privileged processes [3].
> 
> Are there use cases where you would need CAP_SYS_PERFMON where you
> would not also need CAP_SYS_ADMIN? If you separate a new capability

Actually, there are. The Perf tool, which has record, stat and top modes,
could run with the CAP_SYS_PERFMON capability as mentioned below and provide
system-wide performance data. Currently, for that to work, the tool needs to
be granted CAP_SYS_ADMIN.

> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
> with the new capability it is all rather pointless.
> 
> The scope you've defined for this CAP_SYS_PERFMON is very small.
> Is there a larger set of privilege checks that might be applicable
> for it?

CAP_SYS_PERFMON could be applied more broadly; this patch set, however,
enables the record and stat mode use cases for system-wide performance
monitoring in kernel and user modes.

Thanks,
Alexey

Casey Schaufler Dec. 5, 2019, 5:33 p.m. UTC | #3
On 12/5/2019 9:05 AM, Alexey Budankov wrote:
> Hello Casey,
>  
> On 05.12.2019 19:49, Casey Schaufler wrote:
>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>> Currently, access to perf_events functionality [1] beyond the scope
>>> permitted by the perf_event_paranoid [1] kernel setting is available to a
>>> privileged process [2] with the CAP_SYS_ADMIN capability enabled in its
>>> effective set [3].
>>>
>>> This patch set introduces the CAP_SYS_PERFMON capability, dedicated to
>>> secure performance monitoring, so that CAP_SYS_PERFMON can assist
>>> CAP_SYS_ADMIN in its governing role for perf_events-based performance
>>> monitoring of a system.
>>>
>>> CAP_SYS_PERFMON aims to harden system security and integrity when
>>> processes and Perf privileged users [2] monitor performance using the
>>> perf_events subsystem, thus decreasing the attack surface that is
>>> available to CAP_SYS_ADMIN privileged processes [3].
>> Are there use cases where you would need CAP_SYS_PERFMON where you
>> would not also need CAP_SYS_ADMIN? If you separate a new capability
> Actually, there are. The Perf tool, which has record, stat and top modes,
> could run with the CAP_SYS_PERFMON capability as mentioned below and provide
> system-wide performance data. Currently, for that to work, the tool needs to
> be granted CAP_SYS_ADMIN.

The question isn't whether the tool could use the capability, it's whether
the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
My bet is that any tool that does performance monitoring is going to need
CAP_SYS_ADMIN for other reasons.

>
>> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
>> with the new capability it is all rather pointless.
>>
>> The scope you've defined for this CAP_SYS_PERFMON is very small.
>> Is there a larger set of privilege checks that might be applicable
>> for it?
> CAP_SYS_PERFMON could be applied more broadly; this patch set, however,
> enables the record and stat mode use cases for system-wide performance
> monitoring in kernel and user modes.

The granularity of capabilities is something we have to watch
very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
if we broke it up "properly" we'd have hundreds of capabilities.
If you want control that finely we have SELinux.

Andi Kleen Dec. 5, 2019, 6:11 p.m. UTC | #4
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.

At least perf stat won't.

-Andi
Alexey Budankov Dec. 5, 2019, 6:37 p.m. UTC | #5
On 05.12.2019 20:33, Casey Schaufler wrote:
> On 12/5/2019 9:05 AM, Alexey Budankov wrote:
>> Hello Casey,
>>  
>> On 05.12.2019 19:49, Casey Schaufler wrote:
>>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>>> Currently, access to perf_events functionality [1] beyond the scope
>>>> permitted by the perf_event_paranoid [1] kernel setting is available to a
>>>> privileged process [2] with the CAP_SYS_ADMIN capability enabled in its
>>>> effective set [3].
>>>>
>>>> This patch set introduces the CAP_SYS_PERFMON capability, dedicated to
>>>> secure performance monitoring, so that CAP_SYS_PERFMON can assist
>>>> CAP_SYS_ADMIN in its governing role for perf_events-based performance
>>>> monitoring of a system.
>>>>
>>>> CAP_SYS_PERFMON aims to harden system security and integrity when
>>>> processes and Perf privileged users [2] monitor performance using the
>>>> perf_events subsystem, thus decreasing the attack surface that is
>>>> available to CAP_SYS_ADMIN privileged processes [3].
>>> Are there use cases where you would need CAP_SYS_PERFMON where you
>>> would not also need CAP_SYS_ADMIN? If you separate a new capability
>> Actually, there are. The Perf tool, which has record, stat and top modes,
>> could run with the CAP_SYS_PERFMON capability as mentioned below and provide
>> system-wide performance data. Currently, for that to work, the tool needs to
>> be granted CAP_SYS_ADMIN.
> 
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.

Yes, sorry. The tool is the perf tool (part of the kernel tree). If its
binary is granted the CAP_SYS_ADMIN capability, the tool can collect
performance data in system-wide mode on behalf of some group of unprivileged
users.

This patch set allows replacing CAP_SYS_ADMIN with CAP_SYS_PERFMON, e.g. for
the perf tool, so that the tool, once granted CAP_SYS_PERFMON, can still
provide system-wide performance data to the same group of unprivileged users.

I hope that makes it clearer. Feel free to ask more.

Thanks,
Alexey

>>>>  tools/perf/util/cap.h               |  4 ++++
>>>>  tools/perf/util/evsel.c             | 10 +++++-----
>>>>  tools/perf/util/util.c              | 15 +++++++++++++--
>>>>  7 files changed, 39 insertions(+), 13 deletions(-)
>>>>
>>>
> 
>
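As a side note on the CapEff values quoted above: the hexadecimal mask shown in /proc/<pid>/status can be decoded into capability bit numbers with a few lines of Python. This is only a minimal sketch and not part of the patch set. The helper names and the CAP_NAMES table are assumptions made for illustration; the table covers just the three bits discussed in this thread, with bit 38 being the number proposed for CAP_SYS_PERFMON by this series.

```python
# Names for the three capability bits discussed in this thread.
# Bit 38 is the number proposed for CAP_SYS_PERFMON by this patch set;
# every other bit is deliberately left unnamed (illustrative table only).
CAP_NAMES = {19: "cap_sys_ptrace", 34: "cap_syslog", 38: "cap_sys_perfmon"}


def decode_cap_mask(hex_mask):
    """Return the sorted bit numbers set in a CapEff-style hex mask."""
    value = int(hex_mask, 16)
    return [bit for bit in range(64) if (value >> bit) & 1]


def describe(hex_mask):
    """Map each set bit to a name, falling back to 'bit<N>' when unknown."""
    return [CAP_NAMES.get(bit, "bit%d" % bit) for bit in decode_cap_mask(hex_mask)]


print(decode_cap_mask("0000004400080000"))  # [19, 34, 38]
print(describe("0000004400080000"))
```

Decoding 0000004400080000 this way gives bits 19, 34 and 38, which matches the annotation under the CapEff line in the cover letter.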
Alexey Budankov Dec. 11, 2019, 10:52 a.m. UTC | #6
On 05.12.2019 20:33, Casey Schaufler wrote:
> On 12/5/2019 9:05 AM, Alexey Budankov wrote:
>> Hello Casey,
>>  
>> On 05.12.2019 19:49, Casey Schaufler wrote:
>>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>>> Currently access to perf_events functionality [1] beyond the scope permitted
>>>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process
>>>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3].
>>>>
>>>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance
>>>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its
>>>> governing role for perf_events based performance monitoring of a system.
>>>>
>>>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring
>>>> performance using perf_events subsystem by processes and Perf privileged users
>>>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN
>>>> privileged processes [3].
>>> Are there use cases where you would need CAP_SYS_PERFMON where you
>>> would not also need CAP_SYS_ADMIN? If you separate a new capability
>> Actually, there are. Perf tool that has record, stat and top modes could run with
>> CAP_SYS_PERFMON capability as mentioned below and provide system wide performance
>> data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN.
> 
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.
> 
>>
>>> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
>>> with the new capability it is all rather pointless.
>>>
>>> The scope you've defined for this CAP_SYS_PERFMON is very small.
>>> Is there a larger set of privilege checks that might be applicable
>>> for it?
>> CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record
>> and stat mode use cases for system wide performance monitoring in kernel and
>> user modes.
> 
> The granularity of capabilities is something we have to watch
> very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
> if we broke it up "properly" we'd have hundreds of capabilities.

Fully agree, and this broader discussion is really helpful for arriving at a
properly balanced solution.

> If you want control that finely we have SELinux.

Undoubtedly, SELinux is a powerful, mature layer of functionality that
could benefit more than just the perf_events subsystem. However, perf_events
is built around capabilities to provide access control to its functionality,
so perf_events would require considerable rework before it could be controlled
through SELinux. Adoption could then also require changes to the installed
infrastructure just for the sake of adopting an alternative access control mechanism.

On the other hand, there are already existing users and use cases that
are built around CAP_SYS_ADMIN based access control, and the Perf tool, which is
the native Linux kernel observability and performance profiling tool, provides
means to operate in restricted multiuser environments (HPC clusters, cloud and
virtual environments) for groups of unprivileged users under admin control [1].

In these circumstances CAP_SYS_PERFMON looks like a smart, balanced advancement that
trades off perf_events subsystem extensions, the required level of control
and configurability of perf_events, and the adoption effort for existing users, and
it brings the security hardening benefit of a decreased attack surface for the
existing users and use cases.

Well, yes, it is really good that Linux nowadays provides a variety of
security mechanisms, but proper balance is what usually makes valuable
features happen, keeps their users happy and moves things forward.

Gratefully,
Alexey

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
Peter Zijlstra Dec. 11, 2019, 3:24 p.m. UTC | #7
On Wed, Dec 11, 2019 at 01:52:15PM +0300, Alexey Budankov wrote:
> Undoubtedly, SELinux is the powerful, mature, whole level of functionality that
> could provide benefits not only for perf_events subsystem. However perf_events
> is built around capabilities to provide access control to its functionality,
> thus perf_events would require considerable rework prior it could be controlled
> thru SELinux. 

You mean this:

  da97e18458fb ("perf_event: Add support for LSM and SELinux checks")

?

> Then the adoption could also require changes to the installed
> infrastructure just for the sake of adopting alternative access control mechanism.

This is still very much true.
Alexey Budankov Dec. 11, 2019, 5 p.m. UTC | #8
On 11.12.2019 18:24, Peter Zijlstra wrote:
> On Wed, Dec 11, 2019 at 01:52:15PM +0300, Alexey Budankov wrote:
>> Undoubtedly, SELinux is the powerful, mature, whole level of functionality that
>> could provide benefits not only for perf_events subsystem. However perf_events
>> is built around capabilities to provide access control to its functionality,
>> thus perf_events would require considerable rework prior it could be controlled
>> thru SELinux. 
> 
> You mean this:
> 
>   da97e18458fb ("perf_event: Add support for LSM and SELinux checks")
> 
> ?

Yes, I do.

This feature adds a great deal to MAC access control [1], [2] for perf_events,
in addition to the already existing DAC [3]. However, there is still the whole
other part of the MAC story on the user space side.

Fortunately, MAC and DAC access control mechanisms are designed in such a way that
they are naturally layered and coexist in the system, so I don't see any contradiction
in advancing either mechanism to meet the demands of possibly diverse use cases.

There is not much rationale for favoring one mechanism over the other,
because together they provide complete security access control
and configurability for the diverse use cases of perf_events.
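
The layering of DAC and MAC described above can be pictured with a small model. This is a simplified sketch under stated assumptions, not kernel code: the function name and its boolean parameters are hypothetical, and the ordering (sysctl/capability check first, LSM hook second) is loosely modeled on how the kernel profiling check is arranged after commit da97e18458fb.

```python
def allow_kernel_profiling(paranoid, has_cap_sys_admin, lsm_allows):
    """Hypothetical model of a layered access decision.

    paranoid          -- value of the perf_event_paranoid sysctl
    has_cap_sys_admin -- whether the caller holds CAP_SYS_ADMIN (DAC layer)
    lsm_allows        -- verdict of the LSM hook, e.g. SELinux (MAC layer)
    """
    # DAC layer: an unprivileged caller is refused when paranoid > 1.
    if paranoid > 1 and not has_cap_sys_admin:
        return False
    # MAC layer: even a caller that passed the DAC check can be refused by policy.
    return lsm_allows


# Access is granted only when both layers agree.
print(allow_kernel_profiling(2, True, True))
print(allow_kernel_profiling(2, True, False))
print(allow_kernel_profiling(2, False, True))
```

In this model neither layer can override a refusal by the other, which is why advancing one mechanism does not conflict with the other.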

> 
>> Then the adoption could also require changes to the installed
>> infrastructure just for the sake of adopting alternative access control mechanism.
> 
> This is still very much true.

It is enough to imagine some HPC cluster or cloud lab with
several hundred nodes to be upgraded.

Thanks,
Alexey

[1] https://en.wikipedia.org/wiki/Security-Enhanced_Linux
[2] https://en.wikipedia.org/wiki/Mandatory_access_control
[3] https://en.wikipedia.org/wiki/Discretionary_access_control
Casey Schaufler Dec. 11, 2019, 6:09 p.m. UTC | #9
On 12/11/2019 2:52 AM, Alexey Budankov wrote:
> On 05.12.2019 20:33, Casey Schaufler wrote:
>> On 12/5/2019 9:05 AM, Alexey Budankov wrote:
>>> Hello Casey,
>>>  
>>> On 05.12.2019 19:49, Casey Schaufler wrote:
>>>> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
>>>>> Currently access to perf_events functionality [1] beyond the scope permitted
>>>>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process
>>>>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3].
>>>>>
>>>>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance
>>>>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its
>>>>> governing role for perf_events based performance monitoring of a system.
>>>>>
>>>>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring
>>>>> performance using perf_events subsystem by processes and Perf privileged users
>>>>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN
>>>>> privileged processes [3].
>>>> Are there use cases where you would need CAP_SYS_PERFMON where you
>>>> would not also need CAP_SYS_ADMIN? If you separate a new capability
>>> Actually, there are. Perf tool that has record, stat and top modes could run with
>>> CAP_SYS_PERFMON capability as mentioned below and provide system wide performance
>>> data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN.
>> The question isn't whether the tool could use the capability, it's whether
>> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
>> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?
>> My bet is that any tool that does performance monitoring is going to need
>> CAP_SYS_ADMIN for other reasons.
>>
>>>> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
>>>> with the new capability it is all rather pointless.
>>>>
>>>> The scope you've defined for this CAP_SYS_PERFMON is very small.
>>>> Is there a larger set of privilege checks that might be applicable
>>>> for it?
>>> CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record
>>> and stat mode use cases for system wide performance monitoring in kernel and
>>> user modes.
>> The granularity of capabilities is something we have to watch
>> very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
>> if we broke it up "properly" we'd have hundreds of capabilities.
> Fully agree and this broader discussion is really helpful to come up with
> properly balanced solution.
>
>> If you want control that finely we have SELinux.
> Undoubtedly, SELinux is the powerful, mature, whole level of functionality that
> could provide benefits not only for perf_events subsystem. However perf_events
> is built around capabilities to provide access control to its functionality,
> thus perf_events would require considerable rework prior it could be controlled
> thru SELinux. Then the adoption could also require changes to the installed
> infrastructure just for the sake of adopting alternative access control mechanism.
>
> On the other hand there are currently already existing users and use cases that
> are built around the CAP_SYS_ADMIN based access control, and Perf tool, which is
> the native Linux kernel observability and performance profiling tool, provides
> means to operate in restricted multiuser environments(HPC clusters, cloud and 
> virtual environments) for groups of unprivileged users under admins control [1].
>
> In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
> trade-offs between perf_events subsystem extensions, required level of control
> and configurability of perf_events, existing users adoption effort, and it brings
> security hardening benefits of decreasing attack surface for the existing users
> and use cases.

I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
privileged operations, say a dozen or so, you may have a viable argument.


>
> Well, yes, it is really good that Linux nowadays provides a handful of various
> security assuring mechanisms but proper balance is what usually makes valuable
> features happen and its users happy and moves forward. 
>
> Gratefully,
> Alexey
>
> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
Stephane Eranian Dec. 11, 2019, 7:04 p.m. UTC | #10
On Thu, Dec 5, 2019 at 9:35 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 12/5/2019 9:05 AM, Alexey Budankov wrote:
> > Hello Casey,
> >
> > On 05.12.2019 19:49, Casey Schaufler wrote:
> >> On 12/5/2019 8:15 AM, Alexey Budankov wrote:
> >>> Currently access to perf_events functionality [1] beyond the scope permitted
> >>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process
> >>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3].
> >>>
> >>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance
> >>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its
> >>> governing role for perf_events based performance monitoring of a system.
> >>>
> >>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring
> >>> performance using perf_events subsystem by processes and Perf privileged users
> >>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN
> >>> privileged processes [3].
> >> Are there use cases where you would need CAP_SYS_PERFMON where you
> >> would not also need CAP_SYS_ADMIN? If you separate a new capability
> > Actually, there are. Perf tool that has record, stat and top modes could run with
> > CAP_SYS_PERFMON capability as mentioned below and provide system wide performance
> > data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN.
>
> The question isn't whether the tool could use the capability, it's whether
> the tool would also need CAP_SYS_ADMIN to be useful. Are there existing
> tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON?

The answer is yes. I have recently been alerted to a problem with paranoid=2
and the popular rr debugger (https://rr-project.org/). This debugger uses
several perf_events features, including profiling of PMU events and
tracepoints (context-switches). With paranoid=2, it does not work anymore. We
would need a privilege between regular user and admin to make it work again.
Note that context switches tracepoint is only applied to self (not
system-wide).
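
For readers unfamiliar with the paranoid levels mentioned above, the following minimal sketch lists what each perf_event_paranoid level denies to users without CAP_SYS_ADMIN, following the kernel sysctl documentation as it stood at the time of this thread. The helper names are assumptions made for illustration, and the sketch does not model rr's specific requirements.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (minimum level, what is denied to users without CAP_SYS_ADMIN at that level)
RESTRICTIONS = [
    (0, "raw tracepoint and ftrace function tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def denied_at(level):
    """Return the restrictions in force for unprivileged users at a given level."""
    return [what for threshold, what in RESTRICTIONS if level >= threshold]


def read_level(path=PARANOID_PATH):
    """Read the current sysctl value, or None when it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_level()
    if level is None:
        print("perf_event_paranoid not readable on this system")
    else:
        print("level", level, "denies:", denied_at(level))
```

At level -1 nothing in this list is denied, while level 2 denies all three groups to unprivileged users.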


> My bet is that any tool that does performance monitoring is going to need
> CAP_SYS_ADMIN for other reasons.
>
> >
> >> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction
> >> with the new capability it is all rather pointless.
> >>
> >> The scope you've defined for this CAP_SYS_PERFMON is very small.
> >> Is there a larger set of privilege checks that might be applicable
> >> for it?
> > CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record
> > and stat mode use cases for system wide performance monitoring in kernel and
> > user modes.
>
> The granularity of capabilities is something we have to watch
> very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but
> if we broke it up "properly" we'd have hundreds of capabilities.
> If you want control that finely we have SELinux.
>
> >
> > Thanks,
> > Alexey
> >
> >>
> >>
> >>> CAP_SYS_PERFMON aims to take over CAP_SYS_ADMIN credentials related to
> >>> performance monitoring functionality of perf_events and balance amount of
> >>> CAP_SYS_ADMIN credentials in accordance with the recommendations provided in
> >>> the man page for CAP_SYS_ADMIN [3]: "Note: this capability is overloaded;
> >>> see Notes to kernel developers, below."
> >>>
> >>> For backward compatibility reasons performance monitoring functionality of
> >>> perf_events subsystem remains available under CAP_SYS_ADMIN but its usage for
> >>> secure performance monitoring use cases is discouraged with respect to the
> >>> introduced CAP_SYS_PERFMON capability.
> >>>
> >>> In the suggested implementation CAP_SYS_PERFMON enables Perf privileged users
> >>> [2] to conduct secure performance monitoring using perf_events in the scope
> >>> of available online CPUs when executing code in kernel and user modes.
> >>>
> >>> Possible alternative solution to this capabilities balancing, system security
> >>> hardening task could be to use the existing CAP_SYS_PTRACE capability to govern
> >>> perf_events' performance monitoring functionality, since process debugging is
> >>> similar to performance monitoring with respect to providing insights into
> >>> process memory and execution details. However CAP_SYS_PTRACE still provides
> >>> users with more credentials than are required for secure performance monitoring
> >>> using perf_events subsystem and this excess is avoided by using the dedicated
> >>> CAP_SYS_PERFMON capability.
> >>>
> >>> libcap library utilities [4], [5] and Perf tool can be used to apply
> >>> CAP_SYS_PERFMON capability for secure performance monitoring beyond the scope
> >>> permitted by system wide perf_event_paranoid kernel setting and below are the
> >>> steps to evaluate the advancement suggested by the patch set:
> >>>
> >>>   - patch, build and boot the kernel
> >>>   - patch, build Perf tool e.g. to /home/user/perf
> >>>   ...
> >>>   # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
> >>>   # pushd libcap
> >>>   # patch libcap/include/uapi/linux/capability.h with [PATCH 1/3]
> >>>   # make
> >>>   # pushd progs
> >>>   # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
> >>>   # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
> >>>   /home/user/perf: OK
> >>>   # ./getcap /home/user/perf
> >>>   /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep
> >>>   # echo 2 > /proc/sys/kernel/perf_event_paranoid
> >>>   # cat /proc/sys/kernel/perf_event_paranoid
> >>>   2
> >>>   ...
> >>>   $ /home/user/perf top
> >>>     ... works as expected ...
> >>>   $ cat /proc/`pidof perf`/status
> >>>   Name:     perf
> >>>   Umask:    0002
> >>>   State:    S (sleeping)
> >>>   Tgid:     2958
> >>>   Ngid:     0
> >>>   Pid:      2958
> >>>   PPid:     9847
> >>>   TracerPid:        0
> >>>   Uid:      500     500     500     500
> >>>   Gid:      500     500     500     500
> >>>   FDSize:   256
> >>>   ...
> >>>   CapInh:   0000000000000000
> >>>   CapPrm:   0000004400080000
> >>>   CapEff:   0000004400080000 => 01000100 00000000 00001000 00000000 00000000
> >>>                                      cap_sys_perfmon,cap_sys_ptrace,cap_syslog
> >>>   CapBnd:   0000007fffffffff
> >>>   CapAmb:   0000000000000000
> >>>   NoNewPrivs:       0
> >>>   Seccomp:  0
> >>>   Speculation_Store_Bypass: thread vulnerable
> >>>   Cpus_allowed:     ff
> >>>   Cpus_allowed_list:        0-7
> >>>   ...
> >>>
> >>> Usage of cap_sys_perfmon effectively avoids unused credentials excess:
> >>> - with cap_sys_admin:
> >>>   CapEff:   0000007fffffffff => 01111111 11111111 11111111 11111111 11111111
> >>> - with cap_sys_perfmon:
> >>>   CapEff:   0000004400080000 => 01000100 00000000 00001000 00000000 00000000
> >>>                                     38   34               19
> >>>                            sys_perfmon   syslog           sys_ptrace
> >>>
> >>> The patch set is for tip perf/core repository:
> >>>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
> >>>   tip sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6
> >>>
> >>> [1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html
> >>> [2] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> >>> [3] http://man7.org/linux/man-pages/man7/capabilities.7.html
> >>> [4] http://man7.org/linux/man-pages/man8/setcap.8.html
> >>> [5] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
> >>> [6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
> >>>
> >>> ---
> >>> Alexey Budankov (3):
> >>>   capabilities: introduce CAP_SYS_PERFMON to kernel and user space
> >>>   perf/core: apply CAP_SYS_PERFMON to CPUs and kernel monitoring
> >>>   perf tool: extend Perf tool with CAP_SYS_PERFMON support
> >>>
> >>>  include/linux/perf_event.h          |  6 ++++--
> >>>  include/uapi/linux/capability.h     | 10 +++++++++-
> >>>  security/selinux/include/classmap.h |  4 ++--
> >>>  tools/perf/design.txt               |  3 ++-
> >>>  tools/perf/util/cap.h               |  4 ++++
> >>>  tools/perf/util/evsel.c             | 10 +++++-----
> >>>  tools/perf/util/util.c              | 15 +++++++++++++--
> >>>  7 files changed, 39 insertions(+), 13 deletions(-)
> >>>
> >>
>
Andi Kleen Dec. 11, 2019, 8:36 p.m. UTC | #11
> > In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
> > trade-offs between perf_events subsystem extensions, required level of control
> > and configurability of perf_events, existing users adoption effort, and it brings
> > security hardening benefits of decreasing attack surface for the existing users
> > and use cases.
> 
> I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
> that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
> be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
> privileged operations, say a dozen or so, you may have a viable argument.

perf events is not a single use. It has a bazillion sub-functionalities,
including hardware tracing, software tracing, pmu counters, software counters,
uncore counters, breakpoints and various other stuff in its PMU drivers.

See it more as a whole, quite heterogeneous driver subsystem.

I guess CAP_SYS_PERFMON is not a good name because perf is much more
than just Perfmon. Perhaps call it CAP_SYS_PERF_EVENTS.

-Andi
Casey Schaufler Dec. 11, 2019, 9:25 p.m. UTC | #12
On 12/11/2019 12:36 PM, Andi Kleen wrote:
>>> In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
>>> trade-offs between perf_events subsystem extensions, required level of control
>>> and configurability of perf_events, existing users adoption effort, and it brings
>>> security hardening benefits of decreasing attack surface for the existing users
>>> and use cases.
>> I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
>> that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
>> be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
>> privileged operations, say a dozen or so, you may have a viable argument.
> perf events is not a single use.

If it is only being called in two places, it is single use.

>  It has a bazillion of sub functionalities,
> including hardware tracing, software tracing, pmu counters, software counters,
> uncore counters, break points and various other stuff in its PMU drivers.
>
> See it more as a whole quite heterogenous driver subsystem.
>
> I guess CAP_SYS_PERFMON is not a good name because perf is much more
> than just Perfmon. Perhaps call it CAP_SYS_PERF_EVENTS
>
> -Andi
Stephen Smalley Dec. 12, 2019, 2:24 p.m. UTC | #13
On 12/11/19 3:36 PM, Andi Kleen wrote:
>>> In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
>>> trade-offs between perf_events subsystem extensions, required level of control
>>> and configurability of perf_events, existing users adoption effort, and it brings
>>> security hardening benefits of decreasing attack surface for the existing users
>>> and use cases.
>>
>> I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
>> that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
>> be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
>> privileged operations, say a dozen or so, you may have a viable argument.
> 
> perf events is not a single use. It has a bazillion of sub functionalities,
> including hardware tracing, software tracing, pmu counters, software counters,
> uncore counters, break points and various other stuff in its PMU drivers.
> 
> See it more as a whole quite heterogenous driver subsystem.
> 
> I guess CAP_SYS_PERFMON is not a good name because perf is much more
> than just Perfmon. Perhaps call it CAP_SYS_PERF_EVENTS

That seems misleading since it isn't being checked for all perf_events 
operations IIUC (CAP_SYS_ADMIN is still required for some?) and it is 
even more specialized than CAP_SYS_PERFMON, making it less likely that 
we could ever use this capability as a check for other kernel 
performance monitoring facilities beyond perf_events.

I'm not as opposed to fine-grained capabilities as Casey is but I do 
recognize that there are a limited number of available bits (although we 
do have a fair number of unused ones currently given the extension to 
64 bits) and that it would be easy to consume them all if we allocated 
one for every kernel feature.  That said, this might be a sufficiently 
important use case to justify it.
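
To put a rough number on the "limited bits" concern, the sketch below counts how many of the 64 capability bits are set in the CapBnd value quoted earlier in the thread (0000007fffffffff, taken on a kernel with this patch set applied). It assumes the bounding set is full, so that the number of set bits equals the number of defined capabilities; the function name is hypothetical.

```python
CAP_WORD_BITS = 64  # capability sets are 64 bits wide


def allocated_and_free(capbnd_hex):
    """Return (bits set, bits still unset) for a CapBnd-style hex mask."""
    mask = int(capbnd_hex, 16)
    allocated = bin(mask).count("1")
    return allocated, CAP_WORD_BITS - allocated


# CapBnd from the /proc/<pid>/status output quoted in the cover letter.
print(allocated_and_free("0000007fffffffff"))  # (39, 25)
```

Under that assumption, 39 bits are in use with CAP_SYS_PERFMON included, leaving 25 of the 64 unallocated.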

Obviously I'd encourage you to consider leveraging SELinux as well but I 
understand that you are looking for a solution that doesn't depend on a 
distro using a particular LSM or a particular policy.  I will note that 
SELinux doesn't suffer from the limited bits problem because one can 
always define a new SELinux security class with its own access vector 
permissions bitmap, as has been done for the recently added LSM/SELinux 
perf_event hooks.

I don't know who actually gets to decide when/if a new capability is 
allocated.  Maybe Serge and/or James as capabilities and LSM maintainers.

I have no objections to these patches from a SELinux POV.
Alexey Budankov Dec. 15, 2019, 11:53 a.m. UTC | #14
On 12.12.2019 17:24, Stephen Smalley wrote:
> On 12/11/19 3:36 PM, Andi Kleen wrote:
>>>> In this circumstances CAP_SYS_PERFMON looks like smart balanced advancement that
>>>> trade-offs between perf_events subsystem extensions, required level of control
>>>> and configurability of perf_events, existing users adoption effort, and it brings
>>>> security hardening benefits of decreasing attack surface for the existing users
>>>> and use cases.
>>>
>>> I'm not 100% opposed to CAP_SYS_PERFMON. I am 100% opposed to new capabilities
>>> that have a single use. Surely there are other CAP_SYS_ADMIN users that [cs]ould
>>> be converted to CAP_SYS_PERFMON as well. If there is a class of system performance
>>> privileged operations, say a dozen or so, you may have a viable argument.
>>
>> perf events is not a single use. It has a bazillion of sub functionalities,
>> including hardware tracing, software tracing, pmu counters, software counters,
>> uncore counters, break points and various other stuff in its PMU drivers.
>>
>> See it more as a whole quite heterogenous driver subsystem.
>>
>> I guess CAP_SYS_PERFMON is not a good name because perf is much more
>> than just Perfmon. Perhaps call it CAP_SYS_PERF_EVENTS
> 
> That seems misleading since it isn't being checked for all perf_events operations IIUC (CAP_SYS_ADMIN is still required for some?) and it is even more specialized than CAP_SYS_PERFMON, making it less likely that we could ever use this capability as a check for other kernel performance monitoring facilities beyond perf_events.
> 
> I'm not as opposed to fine-grained capabilities as Casey is but I do recognize that there are a limited number of available bits (although we do have a fair number of unused ones currently given the extension to 64-bits) and that it would be easy to consume them all if we allocated one for every kernel feature.  That said, this might be a sufficiently important use case to justify it.
> 
> Obviously I'd encourage you to consider leveraging SELinux as well but I understand that you are looking for a solution that doesn't depend on a distro using a particular LSM or a particular policy.  I will note that SELinux doesn't suffer from the limited bits problem because one can always define a new SELinux security class with its own access vector permissions bitmap, as has been done for the recently added LSM/SELinux perf_event hooks.
> 
> I don't know who actually gets to decide when/if a new capability is allocated.  Maybe Serge and/or James as capabilities and LSM maintainers.
> 
> I have no objections to these patches from a SELinux POV.

Stephen, thanks for the meaningful input!

~Alexey