From patchwork Thu Feb 7 13:29:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Budankov X-Patchwork-Id: 10801227 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C8B513BF for ; Thu, 7 Feb 2019 13:29:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 65F1E2D61C for ; Thu, 7 Feb 2019 13:29:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 55C762D617; Thu, 7 Feb 2019 13:29:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 4ED312D617 for ; Thu, 7 Feb 2019 13:29:33 +0000 (UTC) Received: (qmail 15989 invoked by uid 550); 7 Feb 2019 13:29:31 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 15971 invoked from network); 7 Feb 2019 13:29:31 -0000 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,344,1544515200"; d="scan'208";a="297959929" Subject: [PATCH v2 1/4] perf-security: document perf_events/Perf resource control From: Alexey Budankov To: Jonatan Corbet , Kees Cook , Thomas Gleixner , Ingo Molnar , Peter Zijlstra Cc: Jann Horn , Arnaldo Carvalho de Melo , Jiri Olsa , Namhyung Kim , Alexander Shishkin , Andi Kleen , Mark Rutland , Tvrtko Ursulin , "kernel-hardening@lists.openwall.com" , "linux-doc@vger.kernel.org" , linux-kernel References: Organization: Intel Corp. Message-ID: Date: Thu, 7 Feb 2019 16:29:14 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Virus-Scanned: ClamAV using ClamSMTP Extend perf-security.rst file with perf_events/Perf resource control section describing RLIMIT_NOFILE and perf_event_mlock_kb settings for performance monitoring user processes. Signed-off-by: Alexey Budankov --- Changes in v2: - applied comments on v1 --- Documentation/admin-guide/perf-security.rst | 36 +++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index f73ebfe9bfe2..3915f07b9dea 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -84,6 +84,40 @@ governed by perf_event_paranoid [2]_ setting: locking limit is imposed but ignored for unprivileged processes with CAP_IPC_LOCK capability. +perf_events/Perf resource control +--------------------------------- + +The perf_events system call API [2]_ allocates file descriptors for every configured +PMU event. Open file descriptors are a per-process accountable resource governed +by the RLIMIT_NOFILE [11]_ limit (ulimit -n), which is usually derived from the login +shell process. When configuring Perf collection for a long list of events on a +large server system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis modifying +content of the limits.conf file [12]_ on some systems. Ordinarily, a Perf sampling session +(perf record) requires an amount of open perf_event file descriptors that is not +less than a number of monitored events multiplied by a number of monitored CPUs. + +An amount of memory available to user processes for capturing performance monitoring +data is governed by the perf_event_mlock_kb [2]_ setting. This perf_event specific +resource setting defines overall per-cpu limits of memory allowed for mapping +by the user processes to execute performance monitoring. The setting essentially +extends the RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped specially +for capturing monitored performance events and related data. + +For example, if a machine has eight cores and perf_event_mlock_kb limit is set +to 516 KiB, then a user process is provided with 516 KiB * 8 = 4128 KiB of memory +above the RLIMIT_MEMLOCK limit (ulimit -l) for perf_event mmap buffers. In particular, +this means that, if the user wants to start two or more performance monitoring +processes, the user is required to manually distribute available 4128 KiB between the +monitoring processes, for example, using the --mmap-pages Perf record mode option. +Otherwise, the first started performance monitoring process allocates all available +4128 KiB and the other processes will fail to proceed due to the lack of memory. + +RLIMIT_MEMLOCK and perf_event_mlock_kb resource costraints are ignored for +processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf privileged users +can be provided with memory above the constraints for perf_events/Perf performance +monitoring purpose by providing the Perf executable with CAP_IPC_LOCK capability. + Bibliography ------------ @@ -94,4 +128,6 @@ Bibliography .. [5] ``_ .. [6] ``_ .. [7] ``_ +.. [11] ``_ +.. [12] ``_ From patchwork Thu Feb 7 13:30:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Budankov X-Patchwork-Id: 10801231 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EA75213B4 for ; Thu, 7 Feb 2019 13:30:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D89132D6DD for ; Thu, 7 Feb 2019 13:30:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CC5952D6E0; Thu, 7 Feb 2019 13:30:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id DF0682D6DE for ; Thu, 7 Feb 2019 13:30:39 +0000 (UTC) Received: (qmail 18085 invoked by uid 550); 7 Feb 2019 13:30:38 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 18062 invoked from network); 7 Feb 2019 13:30:37 -0000 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,344,1544515200"; d="scan'208";a="317082652" Subject: [PATCH v2 2/4] perf-security: document collected perf_events/Perf data categories From: Alexey Budankov To: Jonatan Corbet , Kees Cook , Thomas Gleixner , Ingo Molnar , Peter Zijlstra Cc: Jann Horn , Arnaldo Carvalho de Melo , Jiri Olsa , Namhyung Kim , Alexander Shishkin , Andi Kleen , Mark Rutland , Tvrtko Ursulin , "kernel-hardening@lists.openwall.com" , "linux-doc@vger.kernel.org" , linux-kernel References: Organization: Intel Corp. Message-ID: <6d075f2c-ada4-e56d-5bf8-d6d617b761db@linux.intel.com> Date: Thu, 7 Feb 2019 16:30:19 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Virus-Scanned: ClamAV using ClamSMTP Document and categorize system and performance data into groups that can be captured by perf_events/Perf and explicitly indicate the group that can contain process sensitive data. Signed-off-by: Alexey Budankov --- Changes in v2: - applied comments on v1 --- Documentation/admin-guide/perf-security.rst | 32 +++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index 3915f07b9dea..e6eb7e1ee5ad 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -11,8 +11,34 @@ impose a considerable risk of leaking sensitive data accessed by monitored processes. The data leakage is possible both in scenarios of direct usage of perf_events system call API [2]_ and over data files generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ collect and expose for -performance analysis. Having that said perf_events/Perf performance monitoring +perf_events performance monitoring units (PMU) [2]_ and Perf collect and expose +for performance analysis. Collected system and performance data may be split into +several categories: + +1. System hardware and software configuration data, for example: a CPU model and + its cache configuration, an amount of available memory and its topology, used + kernel and Perf versions, performance monitoring setup including experiment + time, events configuration, Perf command line parameters, etc. + +2. User and kernel module paths and their load addresses with sizes, process and + thread names with their PIDs and TIDs, timestamps for captured hardware and + software events. + +3. Content of kernel software counters (e.g., for context switches, page faults, + CPU migrations), architectural hardware performance counters (PMC) [8]_ and + machine specific registers (MSR) [9]_ that provide execution metrics for + various monitored parts of the system (e.g., memory controller (IMC), interconnect + (QPI/UPI) or peripheral (PCIe) uncore counters) without direct attribution to any + execution context state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, RBP on + x86_64), process user and kernel space memory addresses and data, content of + various architectural MSRs that capture data from this category. + +Data that belong to the fourth category can potentially contain sensitive process +data. If PMUs in some monitoring modes capture values of execution context registers +or data from process memory then access to such monitoring capabilities requires +to be ordered and secured properly. So, perf_events/Perf performance monitoring is the subject for security access control management [5]_ . perf_events/Perf access control @@ -128,6 +154,8 @@ Bibliography .. [5] ``_ .. [6] ``_ .. [7] ``_ +.. [8] ``_ +.. [9] ``_ .. [11] ``_ .. [12] ``_ From patchwork Thu Feb 7 13:31:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Budankov X-Patchwork-Id: 10801233 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 82EC36C2 for ; Thu, 7 Feb 2019 13:31:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6D1602D6DE for ; Thu, 7 Feb 2019 13:31:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 605862D6E1; Thu, 7 Feb 2019 13:31:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 55A8A2D6DE for ; Thu, 7 Feb 2019 13:31:47 +0000 (UTC) Received: (qmail 20274 invoked by uid 550); 7 Feb 2019 13:31:46 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 20254 invoked from network); 7 Feb 2019 13:31:45 -0000 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,344,1544515200"; d="scan'208";a="114406966" Subject: [PATCH v2 3/4] perf-security: elaborate on perf_events/Perf privileged users From: Alexey Budankov To: Jonatan Corbet , Kees Cook , Thomas Gleixner , Ingo Molnar , Peter Zijlstra Cc: Jann Horn , Arnaldo Carvalho de Melo , Jiri Olsa , Namhyung Kim , Alexander Shishkin , Andi Kleen , Mark Rutland , Tvrtko Ursulin , "kernel-hardening@lists.openwall.com" , "linux-doc@vger.kernel.org" , linux-kernel References: Organization: Intel Corp. Message-ID: Date: Thu, 7 Feb 2019 16:31:27 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Virus-Scanned: ClamAV using ClamSMTP Elaborate on possible perf_event/Perf privileged users groups and document steps about creating such groups. Signed-off-by: Alexey Budankov --- Changes in v2: - applied comments on v1 --- Documentation/admin-guide/perf-security.rst | 43 +++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index e6eb7e1ee5ad..f27a62805651 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -73,6 +73,48 @@ enable capturing of additional data required for later performance analysis of monitored processes or a system. For example, CAP_SYSLOG capability permits reading kernel space memory addresses from /proc/kallsyms file. +perf_events/Perf privileged users +--------------------------------- + +Mechanisms of capabilities, privileged capability-dumb files [6]_ and file system +ACLs [10]_ can be used to create a dedicated group of perf_events/Perf privileged +users who are permitted to execute performance monitoring without scope limits. +The following steps can be taken to create such a group of privileged Perf users. + +1. Create perf_users group of privileged Perf users, assign perf_users group to + Perf tool executable and limit access to the executable for other users in the + system who are not in the perf_users group: + +:: + + # groupadd perf_users + # ls -alhF + -rwxr-xr-x 2 root root 11M Oct 19 15:12 perf + # chgrp perf_users perf + # ls -alhF + -rwxr-xr-x 2 root perf_users 11M Oct 19 15:12 perf + # chmod o-rwx perf + # ls -alhF + -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf + +2. Assign the required capabilities to the Perf tool executable file and enable + members of perf_users group with performance monitoring privileges [6]_ : + +:: + + # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + perf: OK + # getcap perf + perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep + +As a result, members of perf_users group are capable of conducting performance +monitoring by using functionality of the configured Perf tool executable that, +when executes, passes perf_events subsystem scope checks. + +This specific access control management is only available to superuser or root +running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ capabilities. + perf_events/Perf unprivileged users ----------------------------------- @@ -156,6 +198,7 @@ Bibliography .. [7] ``_ .. [8] ``_ .. [9] ``_ +.. [10] ``_ .. [11] ``_ .. [12] ``_ From patchwork Thu Feb 7 13:32:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Budankov X-Patchwork-Id: 10801247 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 715C76C2 for ; Thu, 7 Feb 2019 13:32:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5CBE42D6DE for ; Thu, 7 Feb 2019 13:32:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4EBFB2D6E1; Thu, 7 Feb 2019 13:32:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 29AD12D6DE for ; Thu, 7 Feb 2019 13:32:52 +0000 (UTC) Received: (qmail 22478 invoked by uid 550); 7 Feb 2019 13:32:51 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 22460 invoked from network); 7 Feb 2019 13:32:50 -0000 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,344,1544515200"; d="scan'208";a="141474005" Subject: [PATCH v2 4/4] perf-security: wrap paragraphs on 72 columns From: Alexey Budankov To: Jonatan Corbet , Kees Cook , Thomas Gleixner , Ingo Molnar , Peter Zijlstra Cc: Jann Horn , Arnaldo Carvalho de Melo , Jiri Olsa , Namhyung Kim , Alexander Shishkin , Andi Kleen , Mark Rutland , Tvrtko Ursulin , "kernel-hardening@lists.openwall.com" , "linux-doc@vger.kernel.org" , linux-kernel References: Organization: Intel Corp. Message-ID: <4858caa3-6f53-fb51-984e-369571f71793@linux.intel.com> Date: Thu, 7 Feb 2019 16:32:31 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Virus-Scanned: ClamAV using ClamSMTP Implemented formatting of paragraphs to be not wider than 72 columns. Signed-off-by: Alexey Budankov --- Documentation/admin-guide/perf-security.rst | 276 +++++++++++--------- 1 file changed, 148 insertions(+), 128 deletions(-) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index f27a62805651..6a26067f5fd6 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -6,84 +6,94 @@ Perf Events and tool security Overview -------- -Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can -impose a considerable risk of leaking sensitive data accessed by monitored -processes. The data leakage is possible both in scenarios of direct usage of -perf_events system call API [2]_ and over data files generated by Perf tool user -mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ and Perf collect and expose -for performance analysis. Collected system and performance data may be split into -several categories: - -1. System hardware and software configuration data, for example: a CPU model and - its cache configuration, an amount of available memory and its topology, used - kernel and Perf versions, performance monitoring setup including experiment - time, events configuration, Perf command line parameters, etc. - -2. User and kernel module paths and their load addresses with sizes, process and - thread names with their PIDs and TIDs, timestamps for captured hardware and - software events. - -3. Content of kernel software counters (e.g., for context switches, page faults, - CPU migrations), architectural hardware performance counters (PMC) [8]_ and - machine specific registers (MSR) [9]_ that provide execution metrics for - various monitored parts of the system (e.g., memory controller (IMC), interconnect - (QPI/UPI) or peripheral (PCIe) uncore counters) without direct attribution to any - execution context state. - -4. Content of architectural execution context registers (e.g., RIP, RSP, RBP on - x86_64), process user and kernel space memory addresses and data, content of - various architectural MSRs that capture data from this category. - -Data that belong to the fourth category can potentially contain sensitive process -data. If PMUs in some monitoring modes capture values of execution context registers -or data from process memory then access to such monitoring capabilities requires -to be ordered and secured properly. So, perf_events/Perf performance monitoring -is the subject for security access control management [5]_ . +Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ +can impose a considerable risk of leaking sensitive data accessed by +monitored processes. The data leakage is possible both in scenarios of +direct usage of perf_events system call API [2]_ and over data files +generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk +depends on the nature of data that perf_events performance monitoring +units (PMU) [2]_ and Perf collect and expose for performance analysis. +Collected system and performance data may be split into several +categories: + +1. System hardware and software configuration data, for example: a CPU + model and its cache configuration, an amount of available memory and + its topology, used kernel and Perf versions, performance monitoring + setup including experiment time, events configuration, Perf command + line parameters, etc. + +2. User and kernel module paths and their load addresses with sizes, + process and thread names with their PIDs and TIDs, timestamps for + captured hardware and software events. + +3. Content of kernel software counters (e.g., for context switches, page + faults, CPU migrations), architectural hardware performance counters + (PMC) [8]_ and machine specific registers (MSR) [9]_ that provide + execution metrics for various monitored parts of the system (e.g., + memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe) + uncore counters) without direct attribution to any execution context + state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, + RBP on x86_64), process user and kernel space memory addresses and + data, content of various architectural MSRs that capture data from + this category. + +Data that belong to the fourth category can potentially contain +sensitive process data. If PMUs in some monitoring modes capture values +of execution context registers or data from process memory then access +to such monitoring capabilities requires to be ordered and secured +properly. So, perf_events/Perf performance monitoring is the subject for +security access control management [5]_ . perf_events/Perf access control ------------------------------- -To perform security checks, the Linux implementation splits processes into two -categories [6]_ : a) privileged processes (whose effective user ID is 0, referred -to as superuser or root), and b) unprivileged processes (whose effective UID is -nonzero). Privileged processes bypass all kernel security permission checks so -perf_events performance monitoring is fully available to privileged processes -without access, scope and resource restrictions. - -Unprivileged processes are subject to a full security permission check based on -the process's credentials [5]_ (usually: effective UID, effective GID, and -supplementary group list). - -Linux divides the privileges traditionally associated with superuser into -distinct units, known as capabilities [6]_ , which can be independently enabled -and disabled on per-thread basis for processes and files of unprivileged users. - -Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as -privileged processes with respect to perf_events performance monitoring and -bypass *scope* permissions checks in the kernel. - -Unprivileged processes using perf_events system call API is also subject for -PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome -determines whether monitoring is permitted. So unprivileged processes provided -with CAP_SYS_PTRACE capability are effectively permitted to pass the check. - -Other capabilities being granted to unprivileged processes can effectively -enable capturing of additional data required for later performance analysis of -monitored processes or a system. For example, CAP_SYSLOG capability permits -reading kernel space memory addresses from /proc/kallsyms file. +To perform security checks, the Linux implementation splits processes +into two categories [6]_ : a) privileged processes (whose effective user +ID is 0, referred to as superuser or root), and b) unprivileged +processes (whose effective UID is nonzero). Privileged processes bypass +all kernel security permission checks so perf_events performance +monitoring is fully available to privileged processes without access, +scope and resource restrictions. + +Unprivileged processes are subject to a full security permission check +based on the process's credentials [5]_ (usually: effective UID, +effective GID, and supplementary group list). + +Linux divides the privileges traditionally associated with superuser +into distinct units, known as capabilities [6]_ , which can be +independently enabled and disabled on per-thread basis for processes and +files of unprivileged users. + +Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated +as privileged processes with respect to perf_events performance +monitoring and bypass *scope* permissions checks in the kernel. + +Unprivileged processes using perf_events system call API is also subject +for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose +outcome determines whether monitoring is permitted. So unprivileged +processes provided with CAP_SYS_PTRACE capability are effectively +permitted to pass the check. + +Other capabilities being granted to unprivileged processes can +effectively enable capturing of additional data required for later +performance analysis of monitored processes or a system. For example, +CAP_SYSLOG capability permits reading kernel space memory addresses from +/proc/kallsyms file. perf_events/Perf privileged users --------------------------------- -Mechanisms of capabilities, privileged capability-dumb files [6]_ and file system -ACLs [10]_ can be used to create a dedicated group of perf_events/Perf privileged -users who are permitted to execute performance monitoring without scope limits. -The following steps can be taken to create such a group of privileged Perf users. +Mechanisms of capabilities, privileged capability-dumb files [6]_ and +file system ACLs [10]_ can be used to create a dedicated group of +perf_events/Perf privileged users who are permitted to execute +performance monitoring without scope limits. The following steps can be +taken to create such a group of privileged Perf users. -1. Create perf_users group of privileged Perf users, assign perf_users group to - Perf tool executable and limit access to the executable for other users in the - system who are not in the perf_users group: +1. Create perf_users group of privileged Perf users, assign perf_users + group to Perf tool executable and limit access to the executable for + other users in the system who are not in the perf_users group: :: @@ -97,8 +107,9 @@ The following steps can be taken to create such a group of privileged Perf users # ls -alhF -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf -2. Assign the required capabilities to the Perf tool executable file and enable - members of perf_users group with performance monitoring privileges [6]_ : +2. Assign the required capabilities to the Perf tool executable file and + enable members of perf_users group with performance monitoring + privileges [6]_ : :: @@ -108,83 +119,92 @@ The following steps can be taken to create such a group of privileged Perf users # getcap perf perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep -As a result, members of perf_users group are capable of conducting performance -monitoring by using functionality of the configured Perf tool executable that, -when executes, passes perf_events subsystem scope checks. +As a result, members of perf_users group are capable of conducting +performance monitoring by using functionality of the configured Perf +tool executable that, when executes, passes perf_events subsystem scope +checks. -This specific access control management is only available to superuser or root -running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ capabilities. +This specific access control management is only available to superuser +or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ +capabilities. perf_events/Perf unprivileged users ----------------------------------- -perf_events/Perf *scope* and *access* control for unprivileged processes is -governed by perf_event_paranoid [2]_ setting: +perf_events/Perf *scope* and *access* control for unprivileged processes +is governed by perf_event_paranoid [2]_ setting: -1: - Impose no *scope* and *access* restrictions on using perf_events performance - monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is - ignored when allocating memory buffers for storing performance data. - This is the least secure mode since allowed monitored *scope* is - maximized and no perf_events specific limits are imposed on *resources* - allocated for performance monitoring. + Impose no *scope* and *access* restrictions on using perf_events + performance monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ + locking limit is ignored when allocating memory buffers for storing + performance data. This is the least secure mode since allowed + monitored *scope* is maximized and no perf_events specific limits + are imposed on *resources* allocated for performance monitoring. >=0: *scope* includes per-process and system wide performance monitoring - but excludes raw tracepoints and ftrace function tracepoints monitoring. - CPU and system events happened when executing either in user or - in kernel space can be monitored and captured for later analysis. - Per-user per-cpu perf_event_mlock_kb locking limit is imposed but - ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability. + but excludes raw tracepoints and ftrace function tracepoints + monitoring. CPU and system events happened when executing either in + user or in kernel space can be monitored and captured for later + analysis. Per-user per-cpu perf_event_mlock_kb locking limit is + imposed but ignored for unprivileged processes with CAP_IPC_LOCK + [6]_ capability. >=1: - *scope* includes per-process performance monitoring only and excludes - system wide performance monitoring. CPU and system events happened when - executing either in user or in kernel space can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only and + excludes system wide performance monitoring. CPU and system events + happened when executing either in user or in kernel space can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. >=2: - *scope* includes per-process performance monitoring only. CPU and system - events happened when executing in user space only can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only. CPU and + system events happened when executing in user space only can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. perf_events/Perf resource control --------------------------------- -The perf_events system call API [2]_ allocates file descriptors for every configured -PMU event. Open file descriptors are a per-process accountable resource governed -by the RLIMIT_NOFILE [11]_ limit (ulimit -n), which is usually derived from the login -shell process. When configuring Perf collection for a long list of events on a -large server system, this limit can be easily hit preventing required monitoring -configuration. RLIMIT_NOFILE limit can be increased on per-user basis modifying -content of the limits.conf file [12]_ on some systems. Ordinarily, a Perf sampling session -(perf record) requires an amount of open perf_event file descriptors that is not -less than a number of monitored events multiplied by a number of monitored CPUs. - -An amount of memory available to user processes for capturing performance monitoring -data is governed by the perf_event_mlock_kb [2]_ setting. This perf_event specific -resource setting defines overall per-cpu limits of memory allowed for mapping -by the user processes to execute performance monitoring. The setting essentially -extends the RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped specially +The perf_events system call API [2]_ allocates file descriptors for +every configured PMU event. Open file descriptors are a per-process +accountable resource governed by the RLIMIT_NOFILE [11]_ limit +(ulimit -n), which is usually derived from the login shell process. When +configuring Perf collection for a long list of events on a large server +system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis +modifying content of the limits.conf file [12]_ on some systems. +Ordinarily, a Perf sampling session (perf record) requires an amount of +open perf_event file descriptors that is not less than a number of +monitored events multiplied by a number of monitored CPUs. + +An amount of memory available to user processes for capturing +performance monitoring data is governed by the perf_event_mlock_kb [2]_ +setting. This perf_event specific resource setting defines overall +per-cpu limits of memory allowed for mapping by the user processes to +execute performance monitoring. The setting essentially extends the +RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped specially for capturing monitored performance events and related data. -For example, if a machine has eight cores and perf_event_mlock_kb limit is set -to 516 KiB, then a user process is provided with 516 KiB * 8 = 4128 KiB of memory -above the RLIMIT_MEMLOCK limit (ulimit -l) for perf_event mmap buffers. In particular, -this means that, if the user wants to start two or more performance monitoring -processes, the user is required to manually distribute available 4128 KiB between the -monitoring processes, for example, using the --mmap-pages Perf record mode option. -Otherwise, the first started performance monitoring process allocates all available -4128 KiB and the other processes will fail to proceed due to the lack of memory. - -RLIMIT_MEMLOCK and perf_event_mlock_kb resource costraints are ignored for -processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf privileged users -can be provided with memory above the constraints for perf_events/Perf performance -monitoring purpose by providing the Perf executable with CAP_IPC_LOCK capability. +For example, if a machine has eight cores and perf_event_mlock_kb limit +is set to 516 KiB, then a user process is provided with 516 KiB * 8 = +4128 KiB of memory above the RLIMIT_MEMLOCK limit (ulimit -l) for +perf_event mmap buffers. In particular, this means that, if the user +wants to start two or more performance monitoring processes, the user is +required to manually distribute available 4128 KiB between the +monitoring processes, for example, using the --mmap-pages Perf record +mode option. Otherwise, the first started performance monitoring process +allocates all available 4128 KiB and the other processes will fail to +proceed due to the lack of memory. + +RLIMIT_MEMLOCK and perf_event_mlock_kb resource costraints are ignored +for processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf +privileged users can be provided with memory above the constraints for +perf_events/Perf performance monitoring purpose by providing the Perf +executable with CAP_IPC_LOCK capability. Bibliography ------------