[v5,01/10] capabilities: introduce CAP_PERFMON to kernel and user space
diff mbox series

Message ID 9b77124b-675d-5ac7-3741-edec575bd425@linux.intel.com
State New
Headers show
Series
  • Introduce CAP_PERFMON to secure system performance monitoring and observability
Related show

Commit Message

Alexey Budankov Jan. 20, 2020, 11:23 a.m. UTC
Introduce CAP_PERFMON capability designed to secure system performance
monitoring and observability operations so that CAP_PERFMON would assist
CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems.

CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack
surface that is available to a CAP_SYS_ADMIN privileged process [1].
Providing access to system performance monitoring and observability
operations under CAP_PERFMON capability singly, without the rest of
CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
makes operation more secure.

CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
system performance monitoring and observability operations and balance
amount of CAP_SYS_ADMIN credentials following the recommendations in the
capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
overloaded; see Notes to kernel developers, below."

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official embargoed hardware issues mitigation procedure [2].
The bugs in the software itself could be fixed following the standard
kernel development process [3] to maintain and harden security of system
performance monitoring and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/linux/capability.h          | 12 ++++++++++++
 include/uapi/linux/capability.h     |  8 +++++++-
 security/selinux/include/classmap.h |  4 ++--
 3 files changed, 21 insertions(+), 3 deletions(-)

Comments

Stephen Smalley Jan. 21, 2020, 2:43 p.m. UTC | #1
On 1/20/20 6:23 AM, Alexey Budankov wrote:
> 
> Introduce CAP_PERFMON capability designed to secure system performance
> monitoring and observability operations so that CAP_PERFMON would assist
> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> and other performance monitoring and observability subsystems.
> 
> CAP_PERFMON intends to harden system security and integrity during system
> performance monitoring and observability operations by decreasing attack
> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> Providing access to system performance monitoring and observability
> operations under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> makes operation more secure.
> 
> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> overloaded; see Notes to kernel developers, below."
> 
> Although the software running under CAP_PERFMON can not ensure avoidance
> of related hardware issues, the software can still mitigate these issues
> following the official embargoed hardware issues mitigation procedure [2].
> The bugs in the software itself could be fixed following the standard
> kernel development process [3] to maintain and harden security of system
> performance monitoring and observability operations.
> 
> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>   include/linux/capability.h          | 12 ++++++++++++
>   include/uapi/linux/capability.h     |  8 +++++++-
>   security/selinux/include/classmap.h |  4 ++--
>   3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..8784969d91e1 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> +static inline bool perfmon_capable(void)
> +{
> +	struct user_namespace *ns = &init_user_ns;
> +
> +	if (ns_capable_noaudit(ns, CAP_PERFMON))
> +		return ns_capable(ns, CAP_PERFMON);
> +
> +	if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> +		return ns_capable(ns, CAP_SYS_ADMIN);
> +
> +	return false;
> +}

Why _noaudit()?  Normally only used when a permission failure is 
non-fatal to the operation.  Otherwise, we want the audit message.
Alexey Budankov Jan. 21, 2020, 5:30 p.m. UTC | #2
On 21.01.2020 17:43, Stephen Smalley wrote:
> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>
>> Introduce CAP_PERFMON capability designed to secure system performance
>> monitoring and observability operations so that CAP_PERFMON would assist
>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>> and other performance monitoring and observability subsystems.
>>
>> CAP_PERFMON intends to harden system security and integrity during system
>> performance monitoring and observability operations by decreasing attack
>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>> Providing access to system performance monitoring and observability
>> operations under CAP_PERFMON capability singly, without the rest of
>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>> makes operation more secure.
>>
>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>> system performance monitoring and observability operations and balance
>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>> overloaded; see Notes to kernel developers, below."
>>
>> Although the software running under CAP_PERFMON can not ensure avoidance
>> of related hardware issues, the software can still mitigate these issues
>> following the official embargoed hardware issues mitigation procedure [2].
>> The bugs in the software itself could be fixed following the standard
>> kernel development process [3] to maintain and harden security of system
>> performance monitoring and observability operations.
>>
>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>   include/linux/capability.h          | 12 ++++++++++++
>>   include/uapi/linux/capability.h     |  8 +++++++-
>>   security/selinux/include/classmap.h |  4 ++--
>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>> index ecce0f43c73a..8784969d91e1 100644
>> --- a/include/linux/capability.h
>> +++ b/include/linux/capability.h
>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>> +static inline bool perfmon_capable(void)
>> +{
>> +    struct user_namespace *ns = &init_user_ns;
>> +
>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>> +        return ns_capable(ns, CAP_PERFMON);
>> +
>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>> +
>> +    return false;
>> +}
> 
> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.

Some of ideas from v4 review.
Well, on the second sight, it defenitly should be logged for CAP_SYS_ADMIN.
Probably it is not so fatal for CAP_PERFMON, but personally 
I would unconditionally log it for CAP_PERFMON as well.
Good catch, thank you.

~Alexey
Alexei Starovoitov Jan. 21, 2020, 5:55 p.m. UTC | #3
On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> On 21.01.2020 17:43, Stephen Smalley wrote:
> > On 1/20/20 6:23 AM, Alexey Budankov wrote:
> >>
> >> Introduce CAP_PERFMON capability designed to secure system performance
> >> monitoring and observability operations so that CAP_PERFMON would assist
> >> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> >> and other performance monitoring and observability subsystems.
> >>
> >> CAP_PERFMON intends to harden system security and integrity during system
> >> performance monitoring and observability operations by decreasing attack
> >> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> >> Providing access to system performance monitoring and observability
> >> operations under CAP_PERFMON capability singly, without the rest of
> >> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> >> makes operation more secure.
> >>
> >> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> >> system performance monitoring and observability operations and balance
> >> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> >> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> >> overloaded; see Notes to kernel developers, below."
> >>
> >> Although the software running under CAP_PERFMON can not ensure avoidance
> >> of related hardware issues, the software can still mitigate these issues
> >> following the official embargoed hardware issues mitigation procedure [2].
> >> The bugs in the software itself could be fixed following the standard
> >> kernel development process [3] to maintain and harden security of system
> >> performance monitoring and observability operations.
> >>
> >> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> >> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> >> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
> >>
> >> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >> ---
> >>   include/linux/capability.h          | 12 ++++++++++++
> >>   include/uapi/linux/capability.h     |  8 +++++++-
> >>   security/selinux/include/classmap.h |  4 ++--
> >>   3 files changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> index ecce0f43c73a..8784969d91e1 100644
> >> --- a/include/linux/capability.h
> >> +++ b/include/linux/capability.h
> >> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> >>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> >>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> >>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> >> +static inline bool perfmon_capable(void)
> >> +{
> >> +    struct user_namespace *ns = &init_user_ns;
> >> +
> >> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
> >> +        return ns_capable(ns, CAP_PERFMON);
> >> +
> >> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> >> +        return ns_capable(ns, CAP_SYS_ADMIN);
> >> +
> >> +    return false;
> >> +}
> >
> > Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>
> Some of ideas from v4 review.

well, in the requested changes form v4 I wrote:
return capable(CAP_PERFMON);
instead of
return false;

That's what Andy suggested earlier for CAP_BPF.
I think that should resolve Stephen's concern.
Alexey Budankov Jan. 21, 2020, 6:27 p.m. UTC | #4
On 21.01.2020 20:55, Alexei Starovoitov wrote:
> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>>
>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>
>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>> and other performance monitoring and observability subsystems.
>>>>
>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>> performance monitoring and observability operations by decreasing attack
>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>> Providing access to system performance monitoring and observability
>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>> makes operation more secure.
>>>>
>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>> system performance monitoring and observability operations and balance
>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>> overloaded; see Notes to kernel developers, below."
>>>>
>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>> of related hardware issues, the software can still mitigate these issues
>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>> The bugs in the software itself could be fixed following the standard
>>>> kernel development process [3] to maintain and harden security of system
>>>> performance monitoring and observability operations.
>>>>
>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>
>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>> ---
>>>>   include/linux/capability.h          | 12 ++++++++++++
>>>>   include/uapi/linux/capability.h     |  8 +++++++-
>>>>   security/selinux/include/classmap.h |  4 ++--
>>>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>> index ecce0f43c73a..8784969d91e1 100644
>>>> --- a/include/linux/capability.h
>>>> +++ b/include/linux/capability.h
>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>> +static inline bool perfmon_capable(void)
>>>> +{
>>>> +    struct user_namespace *ns = &init_user_ns;
>>>> +
>>>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>> +        return ns_capable(ns, CAP_PERFMON);
>>>> +
>>>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>>>> +
>>>> +    return false;
>>>> +}
>>>
>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>
>> Some of ideas from v4 review.
> 
> well, in the requested changes form v4 I wrote:
> return capable(CAP_PERFMON);
> instead of
> return false;

Aww, indeed. I was concerning exactly about it when updating the patch
and simply put false, missing the fact that capable() also logs.

I suppose the idea is originally from here [1].
BTW, Has it already seen any _more optimal_ implementation?
Anyway, original or optimized version could be reused for CAP_PERFMON.

~Alexey

[1] https://patchwork.ozlabs.org/patch/1159243/

> 
> That's what Andy suggested earlier for CAP_BPF.
> I think that should resolve Stephen's concern.
>
Alexey Budankov Jan. 22, 2020, 10:45 a.m. UTC | #5
On 21.01.2020 21:27, Alexey Budankov wrote:
> 
> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>> <alexey.budankov@linux.intel.com> wrote:
>>>
>>>
>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>
>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>> and other performance monitoring and observability subsystems.
>>>>>
>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>> performance monitoring and observability operations by decreasing attack
>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>> Providing access to system performance monitoring and observability
>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>> makes operation more secure.
>>>>>
>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>> system performance monitoring and observability operations and balance
>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>> overloaded; see Notes to kernel developers, below."
>>>>>
>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>> of related hardware issues, the software can still mitigate these issues
>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>> The bugs in the software itself could be fixed following the standard
>>>>> kernel development process [3] to maintain and harden security of system
>>>>> performance monitoring and observability operations.
>>>>>
>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>
>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>> ---
>>>>>   include/linux/capability.h          | 12 ++++++++++++
>>>>>   include/uapi/linux/capability.h     |  8 +++++++-
>>>>>   security/selinux/include/classmap.h |  4 ++--
>>>>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>> --- a/include/linux/capability.h
>>>>> +++ b/include/linux/capability.h
>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>> +static inline bool perfmon_capable(void)
>>>>> +{
>>>>> +    struct user_namespace *ns = &init_user_ns;
>>>>> +
>>>>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>> +        return ns_capable(ns, CAP_PERFMON);
>>>>> +
>>>>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>
>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.

So far so good, I suggest using the simplest version for v6:

static inline bool perfmon_capable(void)
{
	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
}

It keeps the implementation simple and readable. The implementation is more
performant in the sense of calling the API - one capable() call for CAP_PERFMON
privileged process.

Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
but this bloating also advertises and leverages using more secure CAP_PERFMON
based approach to use perf_event_open system call.

~Alexey

>>>
>>> Some of ideas from v4 review.
>>
>> well, in the requested changes form v4 I wrote:
>> return capable(CAP_PERFMON);
>> instead of
>> return false;
> 
> Aww, indeed. I was concerning exactly about it when updating the patch
> and simply put false, missing the fact that capable() also logs.
> 
> I suppose the idea is originally from here [1].
> BTW, Has it already seen any _more optimal_ implementation?
> Anyway, original or optimized version could be reused for CAP_PERFMON.
> 
> ~Alexey
> 
> [1] https://patchwork.ozlabs.org/patch/1159243/
> 
>>
>> That's what Andy suggested earlier for CAP_BPF.
>> I think that should resolve Stephen's concern.
>>
Stephen Smalley Jan. 22, 2020, 2:07 p.m. UTC | #6
On 1/22/20 5:45 AM, Alexey Budankov wrote:
> 
> On 21.01.2020 21:27, Alexey Budankov wrote:
>>
>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>>
>>>>
>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>> and other performance monitoring and observability subsystems.
>>>>>>
>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>> Providing access to system performance monitoring and observability
>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>> makes operation more secure.
>>>>>>
>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>> system performance monitoring and observability operations and balance
>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>
>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>> performance monitoring and observability operations.
>>>>>>
>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>
>>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>>> ---
>>>>>>    include/linux/capability.h          | 12 ++++++++++++
>>>>>>    include/uapi/linux/capability.h     |  8 +++++++-
>>>>>>    security/selinux/include/classmap.h |  4 ++--
>>>>>>    3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>> --- a/include/linux/capability.h
>>>>>> +++ b/include/linux/capability.h
>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>>    extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>>    extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>>    extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>> +static inline bool perfmon_capable(void)
>>>>>> +{
>>>>>> +    struct user_namespace *ns = &init_user_ns;
>>>>>> +
>>>>>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>> +        return ns_capable(ns, CAP_PERFMON);
>>>>>> +
>>>>>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>
>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
> 
> So far so good, I suggest using the simplest version for v6:
> 
> static inline bool perfmon_capable(void)
> {
> 	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
> }
> 
> It keeps the implementation simple and readable. The implementation is more
> performant in the sense of calling the API - one capable() call for CAP_PERFMON
> privileged process.
> 
> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
> but this bloating also advertises and leverages using more secure CAP_PERFMON
> based approach to use perf_event_open system call.

I can live with that.  We just need to document that when you see both a 
CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only 
allowing CAP_PERFMON first and see if that resolves the issue.  We have 
a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
Alexey Budankov Jan. 22, 2020, 2:25 p.m. UTC | #7
On 22.01.2020 17:07, Stephen Smalley wrote:
> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>
>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>
>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>>> and other performance monitoring and observability subsystems.
>>>>>>>
>>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>>> Providing access to system performance monitoring and observability
>>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>>> makes operation more secure.
>>>>>>>
>>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>>> system performance monitoring and observability operations and balance
>>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>>
>>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>>> performance monitoring and observability operations.
>>>>>>>
>>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>>
>>>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>>>> ---
>>>>>>>    include/linux/capability.h          | 12 ++++++++++++
>>>>>>>    include/uapi/linux/capability.h     |  8 +++++++-
>>>>>>>    security/selinux/include/classmap.h |  4 ++--
>>>>>>>    3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>>> --- a/include/linux/capability.h
>>>>>>> +++ b/include/linux/capability.h
>>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>>>    extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>>>    extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>>>    extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>>> +static inline bool perfmon_capable(void)
>>>>>>> +{
>>>>>>> +    struct user_namespace *ns = &init_user_ns;
>>>>>>> +
>>>>>>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>>> +        return ns_capable(ns, CAP_PERFMON);
>>>>>>> +
>>>>>>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>>
>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>
>> So far so good, I suggest using the simplest version for v6:
>>
>> static inline bool perfmon_capable(void)
>> {
>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>> }
>>
>> It keeps the implementation simple and readable. The implementation is more
>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>> privileged process.
>>
>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>> based approach to use perf_event_open system call.
> 
> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.

perf security [1] document can be updated, at least, to align and document 
this audit logging specifics.

~Alexey

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
Alexey Budankov Feb. 6, 2020, 6:03 p.m. UTC | #8
On 22.01.2020 17:25, Alexey Budankov wrote:
> 
> On 22.01.2020 17:07, Stephen Smalley wrote:
>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>>>> and other performance monitoring and observability subsystems.
>>>>>>>>
>>>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>>>> Providing access to system performance monitoring and observability
>>>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>>>> makes operation more secure.
>>>>>>>>
>>>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>>>> system performance monitoring and observability operations and balance
>>>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>>>
>>>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>>>> performance monitoring and observability operations.
>>>>>>>>
>>>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
<SNIP>
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>>>>>
>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>
>>> So far so good, I suggest using the simplest version for v6:
>>>
>>> static inline bool perfmon_capable(void)
>>> {
>>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>> }
>>>
>>> It keeps the implementation simple and readable. The implementation is more
>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>> privileged process.
>>>
>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>> based approach to use perf_event_open system call.
>>
>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
> 
> perf security [1] document can be updated, at least, to align and document 
> this audit logging specifics.

And I plan to update the document right after this patch set is accepted.
Feel free to let me know of the places in the kernel docs that also
require update w.r.t CAP_PERFMON extension.

~Alexey

> 
> ~Alexey
> 
> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>
Thomas Gleixner Feb. 7, 2020, 11:38 a.m. UTC | #9
Alexey Budankov <alexey.budankov@linux.intel.com> writes:
> On 22.01.2020 17:25, Alexey Budankov wrote:
>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>> It keeps the implementation simple and readable. The implementation is more
>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>> privileged process.
>>>>
>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>> based approach to use perf_event_open system call.
>>>
>>> I can live with that.  We just need to document that when you see
>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>> try only allowing CAP_PERFMON first and see if that resolves the
>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>> CAP_DAC_OVERRIDE.
>> 
>> perf security [1] document can be updated, at least, to align and document 
>> this audit logging specifics.
>
> And I plan to update the document right after this patch set is accepted.
> Feel free to let me know of the places in the kernel docs that also
> require update w.r.t CAP_PERFMON extension.

The documentation update wants be part of the patch set and not planned
to be done _after_ the patch set is merged.

Thanks,

        tglx
Alexey Budankov Feb. 7, 2020, 1:39 p.m. UTC | #10
On 07.02.2020 14:38, Thomas Gleixner wrote:
> Alexey Budankov <alexey.budankov@linux.intel.com> writes:
>> On 22.01.2020 17:25, Alexey Budankov wrote:
>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>> privileged process.
>>>>>
>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>> based approach to use perf_event_open system call.
>>>>
>>>> I can live with that.  We just need to document that when you see
>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>>> try only allowing CAP_PERFMON first and see if that resolves the
>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>>> CAP_DAC_OVERRIDE.
>>>
>>> perf security [1] document can be updated, at least, to align and document 
>>> this audit logging specifics.
>>
>> And I plan to update the document right after this patch set is accepted.
>> Feel free to let me know of the places in the kernel docs that also
>> require update w.r.t CAP_PERFMON extension.
> 
> The documentation update wants be part of the patch set and not planned
> to be done _after_ the patch set is merged.

Well, accepted. It is going to make patches #11 and beyond.

Thanks,
Alexey

> 
> Thanks,
> 
>         tglx
>
Alexey Budankov Feb. 12, 2020, 8:53 a.m. UTC | #11
Hi Stephen,

On 22.01.2020 17:07, Stephen Smalley wrote:
> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>
>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>
<SNIP>
>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>
>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>
>> So far so good, I suggest using the simplest version for v6:
>>
>> static inline bool perfmon_capable(void)
>> {
>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>> }
>>
>> It keeps the implementation simple and readable. The implementation is more
>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>> privileged process.
>>
>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>> based approach to use perf_event_open system call.
> 
> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.

I am trying to reproduce this double logging with CAP_PERFMON.
I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
When running perf stat -a I am observing this AVC audit messages:

type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1

However there is no capability related messages around. I suppose my refpolicy should 
be modified somehow to observe capability related AVCs.

Could you please comment or clarify on how to enable caps related AVCs in order
to test the concerned logging.

Thanks,
Alexey

---
[1] https://github.com/SELinuxProject/refpolicy.git
Stephen Smalley Feb. 12, 2020, 1:32 p.m. UTC | #12
On 2/12/20 3:53 AM, Alexey Budankov wrote:
> Hi Stephen,
> 
> On 22.01.2020 17:07, Stephen Smalley wrote:
>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>
> <SNIP>
>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>
>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>
>>> So far so good, I suggest using the simplest version for v6:
>>>
>>> static inline bool perfmon_capable(void)
>>> {
>>>      return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>> }
>>>
>>> It keeps the implementation simple and readable. The implementation is more
>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>> privileged process.
>>>
>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>> based approach to use perf_event_open system call.
>>
>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
> 
> I am trying to reproduce this double logging with CAP_PERFMON.
> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
> When running perf stat -a I am observing this AVC audit messages:
> 
> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> 
> However there is no capability related messages around. I suppose my refpolicy should
> be modified somehow to observe capability related AVCs.
> 
> Could you please comment or clarify on how to enable caps related AVCs in order
> to test the concerned logging.

The new perfmon permission has to be defined in your policy; you'll have 
a message in dmesg about "Permission perfmon in class capability2 not 
defined in policy.".  You can either add it to the common cap2 
definition in refpolicy/policy/flask/access_vectors and rebuild your 
policy or extract your base module as CIL, add it there, and insert the 
updated module.
Alexey Budankov Feb. 12, 2020, 1:53 p.m. UTC | #13
On 12.02.2020 16:32, Stephen Smalley wrote:
> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>> Hi Stephen,
>>
>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>
>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>
>> <SNIP>
>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>
>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>
>>>> So far so good, I suggest using the simplest version for v6:
>>>>
>>>> static inline bool perfmon_capable(void)
>>>> {
>>>>      return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>> }
>>>>
>>>> It keeps the implementation simple and readable. The implementation is more
>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>> privileged process.
>>>>
>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>> based approach to use perf_event_open system call.
>>>
>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>
>> I am trying to reproduce this double logging with CAP_PERFMON.
>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>> When running perf stat -a I am observing this AVC audit messages:
>>
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>
>> However there is no capability related messages around. I suppose my refpolicy should
>> be modified somehow to observe capability related AVCs.
>>
>> Could you please comment or clarify on how to enable caps related AVCs in order
>> to test the concerned logging.
> 
> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.

Yes, I already have it like this:
common cap2
{
<------>mac_override<--># unused by SELinux
<------>mac_admin
<------>syslog
<------>wake_alarm
<------>block_suspend
<------>audit_read
<------>perfmon
}

dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.

~Alexey

> 
>
Stephen Smalley Feb. 12, 2020, 3:21 p.m. UTC | #14
On 2/12/20 8:53 AM, Alexey Budankov wrote:
> On 12.02.2020 16:32, Stephen Smalley wrote:
>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>> Hi Stephen,
>>>
>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>
>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>
>>> <SNIP>
>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>
>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>
>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>
>>>>> static inline bool perfmon_capable(void)
>>>>> {
>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>> }
>>>>>
>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>> privileged process.
>>>>>
>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>> based approach to use perf_event_open system call.
>>>>
>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>
>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>> When running perf stat -a I am observing this AVC audit messages:
>>>
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>
>>> However there is no capability related messages around. I suppose my refpolicy should
>>> be modified somehow to observe capability related AVCs.
>>>
>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>> to test the concerned logging.
>>
>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
> 
> Yes, I already have it like this:
> common cap2
> {
> <------>mac_override<--># unused by SELinux
> <------>mac_admin
> <------>syslog
> <------>wake_alarm
> <------>block_suspend
> <------>audit_read
> <------>perfmon
> }
> 
> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.

Some denials may be silenced by dontaudit rules; semodule -DB will strip 
those and semodule -B will restore them.  Other possibility is that the 
process doesn't have CAP_PERFMON in its effective set and therefore 
never reaches SELinux at all; denied first by the capability module.
Stephen Smalley Feb. 12, 2020, 3:45 p.m. UTC | #15
On 2/12/20 10:21 AM, Stephen Smalley wrote:
> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>> Hi Stephen,
>>>>
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>
>>>> <SNIP>
>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system 
>>>>>>>>>>> performance
>>>>>>>>>>
>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure 
>>>>>>>>>> is non-fatal to the operation.  Otherwise, we want the audit 
>>>>>>>>>> message.
>>>>>>
>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>
>>>>>> static inline bool perfmon_capable(void)
>>>>>> {
>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>> }
>>>>>>
>>>>>> It keeps the implementation simple and readable. The 
>>>>>> implementation is more
>>>>>> performant in the sense of calling the API - one capable() call 
>>>>>> for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and 
>>>>>> unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure 
>>>>>> CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see 
>>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, 
>>>>> try only allowing CAP_PERFMON first and see if that resolves the 
>>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus 
>>>>> CAP_DAC_OVERRIDE.
>>>>
>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>> I am using the refpolicy version with enabled perf_event tclass [1], 
>>>> in permissive mode.
>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  
>>>> pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t 
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } 
>>>> for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t 
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  
>>>> pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t 
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } 
>>>> for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t 
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>
>>>> However there is no capability related messages around. I suppose my 
>>>> refpolicy should
>>>> be modified somehow to observe capability related AVCs.
>>>>
>>>> Could you please comment or clarify on how to enable caps related 
>>>> AVCs in order
>>>> to test the concerned logging.
>>>
>>> The new perfmon permission has to be defined in your policy; you'll 
>>> have a message in dmesg about "Permission perfmon in class 
>>> capability2 not defined in policy.".  You can either add it to the 
>>> common cap2 definition in refpolicy/policy/flask/access_vectors and 
>>> rebuild your policy or extract your base module as CIL, add it there, 
>>> and insert the updated module.
>>
>> Yes, I already have it like this:
>> common cap2
>> {
>> <------>mac_override<--># unused by SELinux
>> <------>mac_admin
>> <------>syslog
>> <------>wake_alarm
>> <------>block_suspend
>> <------>audit_read
>> <------>perfmon
>> }
>>
>> dmesg stopped reporting perfmon as not defined but audit.log still 
>> doesn't report CAP_PERFMON denials.
>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however 
>> perfmon_capable() does check for it.
> 
> Some denials may be silenced by dontaudit rules; semodule -DB will strip 
> those and semodule -B will restore them.  Other possibility is that the 
> process doesn't have CAP_PERFMON in its effective set and therefore 
> never reaches SELinux at all; denied first by the capability module.

Also, the fact that your denials are showing up in user_systemd_t 
suggests that something is off in your policy or userspace/distro; I 
assume that is a domain type for the systemd --user instance, but your 
shell and commands shouldn't be running in that domain (user_t would be 
more appropriate for that).
Alexey Budankov Feb. 12, 2020, 4:16 p.m. UTC | #16
On 12.02.2020 18:21, Stephen Smalley wrote:
> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>> Hi Stephen,
>>>>
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>
>>>> <SNIP>
>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>
>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>
>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>
>>>>>> static inline bool perfmon_capable(void)
>>>>>> {
>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>> }
>>>>>>
>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>
>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>
>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>> be modified somehow to observe capability related AVCs.
>>>>
>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>> to test the concerned logging.
>>>
>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>
>> Yes, I already have it like this:
>> common cap2
>> {
>> <------>mac_override<--># unused by SELinux
>> <------>mac_admin
>> <------>syslog
>> <------>wake_alarm
>> <------>block_suspend
>> <------>audit_read
>> <------>perfmon
>> }
>>
>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
> 
> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.

Yes, that all makes sense.
selinux_capable() calls avc_audit() logging but cap_capable() doesn't, so proper order matters.
I am doing debug tracing of the kernel code to reveal the exact reasons.

~Alexey
Alexey Budankov Feb. 12, 2020, 4:56 p.m. UTC | #17
On 12.02.2020 18:45, Stephen Smalley wrote:
> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>> Hi Stephen,
>>>>>
>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>
>>>>> <SNIP>
>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>
>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>
>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>
>>>>>>> static inline bool perfmon_capable(void)
>>>>>>> {
>>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>> }
>>>>>>>
>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>> privileged process.
>>>>>>>
>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>> based approach to use perf_event_open system call.
>>>>>>
>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>
>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>
>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>> be modified somehow to observe capability related AVCs.
>>>>>
>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>> to test the concerned logging.
>>>>
>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>
>>> Yes, I already have it like this:
>>> common cap2
>>> {
>>> <------>mac_override<--># unused by SELinux
>>> <------>mac_admin
>>> <------>syslog
>>> <------>wake_alarm
>>> <------>block_suspend
>>> <------>audit_read
>>> <------>perfmon
>>> }
>>>
>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>
>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
> 
> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).

It is user_t for local terminal session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_t            11317 pts/9    00:00:00 bash
user_u:user_r:user_t            11796 pts/9    00:00:00 ps

For local terminal root session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps

For remote ssh session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_t             7540 pts/8    00:00:00 ps
user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash

~Alexey
Stephen Smalley Feb. 12, 2020, 5:09 p.m. UTC | #18
On 2/12/20 11:56 AM, Alexey Budankov wrote:
> 
> 
> On 12.02.2020 18:45, Stephen Smalley wrote:
>> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>>> Hi Stephen,
>>>>>>
>>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>>
>>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>>
>>>>>> <SNIP>
>>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>>
>>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>>
>>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>>
>>>>>>>> static inline bool perfmon_capable(void)
>>>>>>>> {
>>>>>>>>        return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>>> }
>>>>>>>>
>>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>>> privileged process.
>>>>>>>>
>>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>>> based approach to use perf_event_open system call.
>>>>>>>
>>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>>
>>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>>
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>
>>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>>> be modified somehow to observe capability related AVCs.
>>>>>>
>>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>>> to test the concerned logging.
>>>>>
>>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>>
>>>> Yes, I already have it like this:
>>>> common cap2
>>>> {
>>>> <------>mac_override<--># unused by SELinux
>>>> <------>mac_admin
>>>> <------>syslog
>>>> <------>wake_alarm
>>>> <------>block_suspend
>>>> <------>audit_read
>>>> <------>perfmon
>>>> }
>>>>
>>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>>
>>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
>>
>> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).
> 
> It is user_t for local terminal session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_t            11317 pts/9    00:00:00 bash
> user_u:user_r:user_t            11796 pts/9    00:00:00 ps
> 
> For local terminal root session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
> user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps
> 
> For remote ssh session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_t             7540 pts/8    00:00:00 ps
> user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash

That's a bug in either your policy or your userspace/distro integration. 
  In any event, unless user_systemd_t is allowed all capability2 
permissions by your policy, you should see the denials if CAP_PERFMON is 
set in the effective capability set of the process.
Alexey Budankov Feb. 13, 2020, 9:05 a.m. UTC | #19
On 12.02.2020 20:09, Stephen Smalley wrote:
> On 2/12/20 11:56 AM, Alexey Budankov wrote:
>>
>>
>> On 12.02.2020 18:45, Stephen Smalley wrote:
>>> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>>>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>>>
>>>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>>>
>>>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>>>> <alexey.budankov@linux.intel.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>>>
>>>>>>> <SNIP>
>>>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>>>
>>>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>>>
>>>>>>>>> static inline bool perfmon_capable(void)
>>>>>>>>> {
>>>>>>>>>        return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>>>> privileged process.
>>>>>>>>>
>>>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>>>> based approach to use perf_event_open system call.
>>>>>>>>
>>>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>>>
>>>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>>>
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>>
>>>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>>>> be modified somehow to observe capability related AVCs.
>>>>>>>
>>>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>>>> to test the concerned logging.
>>>>>>
>>>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>>>
>>>>> Yes, I already have it like this:
>>>>> common cap2
>>>>> {
>>>>> <------>mac_override<--># unused by SELinux
>>>>> <------>mac_admin
>>>>> <------>syslog
>>>>> <------>wake_alarm
>>>>> <------>block_suspend
>>>>> <------>audit_read
>>>>> <------>perfmon
>>>>> }
>>>>>
>>>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>>>
>>>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
>>>
>>> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).
>>
>> It is user_t for local terminal session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_t            11317 pts/9    00:00:00 bash
>> user_u:user_r:user_t            11796 pts/9    00:00:00 ps
>>
>> For local terminal root session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
>> user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps
>>
>> For remote ssh session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_t             7540 pts/8    00:00:00 ps
>> user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash
> 
> That's a bug in either your policy or your userspace/distro integration.  In any event, unless user_systemd_t is allowed all capability2 permissions by your policy, you should see the denials if CAP_PERFMON is set in the effective capability set of the process.
> 

That all seems to be true. After instrumentation, rebuilding and rebooting, in CAP_PERFMON case:

$ getcap perf
perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

$ perf stat -a

type=AVC msg=audit(1581580399.165:784): avc:  denied  { open } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:785): avc:  denied  { perfmon } for  pid=8859 comm="perf" capability=38  scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit(1581580399.165:786): avc:  denied  { kernel } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:787): avc:  denied  { cpu } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:788): avc:  denied  { write } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580408.078:791): avc:  denied  { read } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1

dmesg:

[  137.877713] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  137.877774] cread_has_capability(CAP_PERFMON) = 0
[  137.877775] prior avc_audit(CAP_PERFMON)
[  137.877779] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

[  137.877784] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  137.877785] cread_has_capability(CAP_PERFMON) = 0
[  137.877786] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

[  137.877794] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  137.877795] cread_has_capability(CAP_PERFMON) = 0
[  137.877796] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

...

in CAP_SYS_ADMIN case:

$ getcap perf
perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep

$ perf stat -a

type=AVC msg=audit(1581580747.928:835): avc:  denied  { open } for  pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:836): avc:  denied  { cpu } for  pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:837): avc:  denied  { kernel } for  pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:838): avc:  denied  { read } for  pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:839): avc:  denied  { write } for  pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
...

$ perf record -- ls
...
type=AVC msg=audit(1581580747.930:843): avc:  denied  { sys_ptrace } for  pid=8927 comm="perf" capability=19  scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability permissive=1
...

dmesg:

[  276.714266] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  276.714268] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[  276.714269] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[  276.714270] cread_has_capability(CAP_SYS_ADMIN) = 0
[  276.714270] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

[  276.714287] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  276.714287] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[  276.714288] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[  276.714288] cread_has_capability(CAP_SYS_ADMIN) = 0
[  276.714289] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

[  276.714294] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  276.714295] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[  276.714295] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[  276.714296] cread_has_capability(CAP_SYS_ADMIN) = 0
[  276.714296] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

...

in unprivileged case:

$ getcap perf
perf =

$ perf stat -a; perf record -a

...

dmesg:

[  947.275611] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  947.275613] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[  947.275614] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[  947.275615] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = -1

[  947.275636] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[  947.275637] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[  947.275638] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[  947.275638] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = -1

...

So it looks like CAP_PERFMON and CAP_SYS_ADMIN are not ever logged by AVC simultaneously,
in the current LSM and perfmon_capable() implementations.

If perfmon is granted:
	perfmon is not logged by capabilities, perfmon is logged by AVC,
	no check for sys_admin by perfmon_capable().

If perfmon is not granted but sys_admin is granted:
	perfmon is not logged by capabilities, AVC logging is not called for perfmon,
	sys_admin is not logged by capabilities, sys_admin is not logged by AVC, for some intended reason?

No caps are granted:
	AVC logging is not called either for perfmon or for sys_admin.

BTW, is there a way to may be drop some AV cache so denials would appear in audit in the next AV access?

Well, I guess you have initially mentioned some case similar to this (note that ids are not the same but pids= are):

type=AVC msg=audit(1581580399.165:784): avc:  denied  { open } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:785): avc:  denied  { perfmon } for  pid=8859 comm="perf" capability=38  scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit(          .   :   ): avc:  denied  { sys_admin } for  pid=8859 comm="perf" capability=21  scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit(1581580399.165:786): avc:  denied  { kernel } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:787): avc:  denied  { cpu } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:788): avc:  denied  { write } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580408.078:791): avc:  denied  { read } for  pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1

So the message could be like this:

"If audit logs for a process using perf_events related syscalls i.e. perf_event_open(), read(), write(),
 ioctl(), mmap() contain denials both for CAP_PERFMON and CAP_SYS_ADMIN capabilities then providing the
 process with CAP_PERFMON capability singly is the secure preferred approach to resolve access denials 
 to performance monitoring and observability operations."

~Alexey
Alexey Budankov Feb. 20, 2020, 1:05 p.m. UTC | #20
On 07.02.2020 16:39, Alexey Budankov wrote:
> 
> On 07.02.2020 14:38, Thomas Gleixner wrote:
>> Alexey Budankov <alexey.budankov@linux.intel.com> writes:
>>> On 22.01.2020 17:25, Alexey Budankov wrote:
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see
>>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>>>> try only allowing CAP_PERFMON first and see if that resolves the
>>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>>>> CAP_DAC_OVERRIDE.
>>>>
>>>> perf security [1] document can be updated, at least, to align and document 
>>>> this audit logging specifics.
>>>
>>> And I plan to update the document right after this patch set is accepted.
>>> Feel free to let me know of the places in the kernel docs that also
>>> require update w.r.t CAP_PERFMON extension.
>>
>> The documentation update wants be part of the patch set and not planned
>> to be done _after_ the patch set is merged.
> 
> Well, accepted. It is going to make patches #11 and beyond.

Patches #11 and #12 of v7 [1] contain information on CAP_PERFMON intention and usage.
Patch for man-pages [2] extends perf_event_open.2 documentation.

Thanks,
Alexey

---
[1] https://lore.kernel.org/lkml/c8de937a-0b3a-7147-f5ef-69f467e87a13@linux.intel.com/
[2] https://lore.kernel.org/lkml/18d1083d-efe5-f5f8-c531-d142c0e5c1a8@linux.intel.com/

Patch
diff mbox series

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..8784969d91e1 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,18 @@  extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+	struct user_namespace *ns = &init_user_ns;
+
+	if (ns_capable_noaudit(ns, CAP_PERFMON))
+		return ns_capable(ns, CAP_PERFMON);
+
+	if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
+		return ns_capable(ns, CAP_SYS_ADMIN);
+
+	return false;
+}
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..8b416e5f3afa 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@  struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ		37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_AUDIT_READ
+#define CAP_LAST_CAP         CAP_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..c599b0c2b0e7 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@ 
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon"
 
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_PERFMON
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif