[RFC] security,capability: pass object information to security_capable
diff mbox series

Message ID 20190712173404.14417-1-nhfran2@tycho.nsa.gov
State New
Headers show
Series
  • [RFC] security,capability: pass object information to security_capable
Related show

Commit Message

Nicholas Franck July 12, 2019, 5:34 p.m. UTC
At present security_capable does not pass any object information
and therefore can neither audit the particular object nor take it
into account. Augment the security_capable interface to support
passing supplementary data. Use this facility initially to convey
the inode for capability checks relevant to inodes. This only
addresses capable_wrt_inode_uidgid calls; other capability checks
relevant to inodes will be addressed in subsequent changes. In the
future, this will be further extended to pass object information for
other capability checks such as the target task for CAP_KILL.

In SELinux this new information is leveraged here to include the inode
in the audit message. In the future, it could also be used to perform
a per inode capability checks.

It would be possible to fold the existing opts argument into this new
supplementary data structure. This was omitted from this change to
minimize changes.

Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
---
 include/linux/capability.h             |  7 ++++++
 include/linux/lsm_audit.h              |  5 +++-
 include/linux/lsm_hooks.h              |  3 ++-
 include/linux/security.h               | 23 +++++++++++++-----
 kernel/capability.c                    | 33 ++++++++++++++++++--------
 kernel/seccomp.c                       |  2 +-
 security/apparmor/capability.c         |  8 ++++---
 security/apparmor/include/capability.h |  4 +++-
 security/apparmor/ipc.c                |  2 +-
 security/apparmor/lsm.c                |  5 ++--
 security/apparmor/resource.c           |  2 +-
 security/commoncap.c                   | 11 +++++----
 security/lsm_audit.c                   | 21 ++++++++++++++--
 security/safesetid/lsm.c               |  3 ++-
 security/security.c                    |  5 ++--
 security/selinux/hooks.c               | 20 +++++++++-------
 security/smack/smack_access.c          |  2 +-
 17 files changed, 110 insertions(+), 46 deletions(-)

Comments

James Morris July 12, 2019, 5:50 p.m. UTC | #1
On Fri, 12 Jul 2019, Nicholas Franck wrote:

> +	case LSM_AUDIT_DATA_CAP: {
> +		const struct inode *inode;
> +
> +		if (a->u.cap_struct.cad) {
> +			switch (a->u.cap_struct.cad->type) {
> +			case CAP_AUX_DATA_INODE: {
> +				inode = a->u.cap_struct.cad->u.inode;
> +
> +				audit_log_format(ab, " dev=");
> +				audit_log_untrustedstring(ab,
> +					inode->i_sb->s_id);
> +				audit_log_format(ab, " ino=%lu",
> +					inode->i_ino);
> +				break;
> +			}
> +			}
> +		}
> +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>  		break;

Will this break any existing userspace log parsers?
Casey Schaufler July 12, 2019, 5:58 p.m. UTC | #2
On 7/12/2019 10:34 AM, Nicholas Franck wrote:
> At present security_capable does not pass any object information
> and therefore can neither audit the particular object nor take it
> into account. Augment the security_capable interface to support
> passing supplementary data. Use this facility initially to convey
> the inode for capability checks relevant to inodes. This only
> addresses capable_wrt_inode_uidgid calls; other capability checks
> relevant to inodes will be addressed in subsequent changes. In the
> future, this will be further extended to pass object information for
> other capability checks such as the target task for CAP_KILL.

This seems wrong to me. The capability system has nothing to do
with objects. Passing object information through security_capable()
may be convenient, but isn't relevant to the purpose of the interface.
It appears that there are very few places where the object information
is actually useful.

> In SELinux this new information is leveraged here to include the inode
> in the audit message. In the future, it could also be used to perform
> a per inode capability checks.

I suggest that you want a mechanism for adding the inode information
to the audit record instead.

What would a "per inode" capability check be? Capability checks are
process checks, not object checks.

> It would be possible to fold the existing opts argument into this new
> supplementary data structure. This was omitted from this change to
> minimize changes.
>
> Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
> ---
>  include/linux/capability.h             |  7 ++++++
>  include/linux/lsm_audit.h              |  5 +++-
>  include/linux/lsm_hooks.h              |  3 ++-
>  include/linux/security.h               | 23 +++++++++++++-----
>  kernel/capability.c                    | 33 ++++++++++++++++++--------
>  kernel/seccomp.c                       |  2 +-
>  security/apparmor/capability.c         |  8 ++++---
>  security/apparmor/include/capability.h |  4 +++-
>  security/apparmor/ipc.c                |  2 +-
>  security/apparmor/lsm.c                |  5 ++--
>  security/apparmor/resource.c           |  2 +-
>  security/commoncap.c                   | 11 +++++----
>  security/lsm_audit.c                   | 21 ++++++++++++++--
>  security/safesetid/lsm.c               |  3 ++-
>  security/security.c                    |  5 ++--
>  security/selinux/hooks.c               | 20 +++++++++-------
>  security/smack/smack_access.c          |  2 +-
>  17 files changed, 110 insertions(+), 46 deletions(-)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..f72de64c179d 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -211,6 +211,8 @@ extern bool capable(int cap);
>  extern bool ns_capable(struct user_namespace *ns, int cap);
>  extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
>  extern bool ns_capable_setid(struct user_namespace *ns, int cap);
> +extern bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			     const struct inode *inode);
>  #else
>  static inline bool has_capability(struct task_struct *t, int cap)
>  {
> @@ -246,6 +248,11 @@ static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
>  {
>  	return true;
>  }
> +static bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			     const struct inode *inode)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_MULTIUSER */
>  extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
>  extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> index 915330abf6e5..15d2a62639f0 100644
> --- a/include/linux/lsm_audit.h
> +++ b/include/linux/lsm_audit.h
> @@ -79,7 +79,10 @@ struct common_audit_data {
>  		struct dentry *dentry;
>  		struct inode *inode;
>  		struct lsm_network_audit *net;
> -		int cap;
> +		struct  {
> +			int cap;

> +			struct cap_aux_data *cad;
> +		} cap_struct;
>  		int ipc_id;
>  		struct task_struct *tsk;
>  #ifdef CONFIG_KEYS
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cfb6a19..b2a37d613030 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1469,7 +1469,8 @@ union security_list_options {
>  	int (*capable)(const struct cred *cred,
>  			struct user_namespace *ns,
>  			int cap,
> -			unsigned int opts);
> +			unsigned int opts,
> +			struct cap_aux_data *cad);
>  	int (*quotactl)(int cmds, int type, int id, struct super_block *sb);
>  	int (*quota_on)(struct dentry *dentry);
>  	int (*syslog)(int type);
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 659071c2e57c..8fce5e69dc52 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -77,9 +77,18 @@ enum lsm_event {
>  	LSM_POLICY_CHANGE,
>  };
>  
> +
> +struct cap_aux_data {
> +	char type;
> +#define CAP_AUX_DATA_INODE	1
> +	union	{
> +		const struct inode *inode;
> +	} u;
> +};
> +
>  /* These functions are in security/commoncap.c */
>  extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
> -		       int cap, unsigned int opts);
> +		       int cap, unsigned int opts, struct cap_aux_data *cad);
>  extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz);
>  extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
>  extern int cap_ptrace_traceme(struct task_struct *parent);
> @@ -215,9 +224,10 @@ int security_capset(struct cred *new, const struct cred *old,
>  		    const kernel_cap_t *inheritable,
>  		    const kernel_cap_t *permitted);
>  int security_capable(const struct cred *cred,
> -		       struct user_namespace *ns,
> -		       int cap,
> -		       unsigned int opts);
> +		     struct user_namespace *ns,
> +		     int cap,
> +		     unsigned int opts,
> +		     struct cap_aux_data *cad);
>  int security_quotactl(int cmds, int type, int id, struct super_block *sb);
>  int security_quota_on(struct dentry *dentry);
>  int security_syslog(int type);
> @@ -478,9 +488,10 @@ static inline int security_capset(struct cred *new,
>  static inline int security_capable(const struct cred *cred,
>  				   struct user_namespace *ns,
>  				   int cap,
> -				   unsigned int opts)
> +				   unsigned int opts,
> +				   struct cap_aux_data *cad)
>  {
> -	return cap_capable(cred, ns, cap, opts);
> +	return cap_capable(cred, ns, cap, opts, NULL);
>  }
>  
>  static inline int security_quotactl(int cmds, int type, int id,
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 1444f3954d75..c3723443904a 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -297,7 +297,7 @@ bool has_ns_capability(struct task_struct *t,
>  	int ret;
>  
>  	rcu_read_lock();
> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE);
> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE, NULL);
>  	rcu_read_unlock();
>  
>  	return (ret == 0);
> @@ -338,7 +338,7 @@ bool has_ns_capability_noaudit(struct task_struct *t,
>  	int ret;
>  
>  	rcu_read_lock();
> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT);
> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT, NULL);
>  	rcu_read_unlock();
>  
>  	return (ret == 0);
> @@ -363,7 +363,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
>  
>  static bool ns_capable_common(struct user_namespace *ns,
>  			      int cap,
> -			      unsigned int opts)
> +			      unsigned int opts,
> +			      struct cap_aux_data *cad)
>  {
>  	int capable;
>  
> @@ -372,7 +373,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>  		BUG();
>  	}
>  
> -	capable = security_capable(current_cred(), ns, cap, opts);
> +	capable = security_capable(current_cred(), ns, cap, opts, cad);
>  	if (capable == 0) {
>  		current->flags |= PF_SUPERPRIV;
>  		return true;
> @@ -393,7 +394,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>   */
>  bool ns_capable(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_NONE);
> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable);
>  
> @@ -411,7 +412,7 @@ EXPORT_SYMBOL(ns_capable);
>   */
>  bool ns_capable_noaudit(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT);
> +	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable_noaudit);
>  
> @@ -430,7 +431,7 @@ EXPORT_SYMBOL(ns_capable_noaudit);
>   */
>  bool ns_capable_setid(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_INSETID);
> +	return ns_capable_common(ns, cap, CAP_OPT_INSETID, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable_setid);
>  
> @@ -470,7 +471,7 @@ bool file_ns_capable(const struct file *file, struct user_namespace *ns,
>  	if (WARN_ON_ONCE(!cap_valid(cap)))
>  		return false;
>  
> -	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE) == 0)
> +	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE, NULL) == 0)
>  		return true;
>  
>  	return false;
> @@ -503,7 +504,8 @@ bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
>  {
>  	struct user_namespace *ns = current_user_ns();
>  
> -	return ns_capable(ns, cap) && privileged_wrt_inode_uidgid(ns, inode);
> +	return ns_capable_inode(ns, cap, inode) &&
> +		privileged_wrt_inode_uidgid(ns, inode);
>  }
>  EXPORT_SYMBOL(capable_wrt_inode_uidgid);
>  
> @@ -524,7 +526,18 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
>  	cred = rcu_dereference(tsk->ptracer_cred);
>  	if (cred)
>  		ret = security_capable(cred, ns, CAP_SYS_PTRACE,
> -				       CAP_OPT_NOAUDIT);
> +				       CAP_OPT_NOAUDIT, NULL);
>  	rcu_read_unlock();
>  	return (ret == 0);
>  }
> +
> +bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			const struct inode *inode)
> +{
> +	struct cap_aux_data cad;
> +
> +	cad.type = CAP_AUX_DATA_INODE;
> +	cad.u.inode = inode;
> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, &cad);
> +}
> +EXPORT_SYMBOL(ns_capable_inode);
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 811b4a86cdf6..d59dd7079ece 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -446,7 +446,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>  	 */
>  	if (!task_no_new_privs(current) &&
>  	    security_capable(current_cred(), current_user_ns(),
> -				     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) != 0)
> +			     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) != 0)
>  		return ERR_PTR(-EACCES);
>  
>  	/* Allocate a new seccomp_filter */
> diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
> index 752f73980e30..c45192a16733 100644
> --- a/security/apparmor/capability.c
> +++ b/security/apparmor/capability.c
> @@ -50,7 +50,7 @@ static void audit_cb(struct audit_buffer *ab, void *va)
>  	struct common_audit_data *sa = va;
>  
>  	audit_log_format(ab, " capname=");
> -	audit_log_untrustedstring(ab, capability_names[sa->u.cap]);
> +	audit_log_untrustedstring(ab, capability_names[sa->u.cap_struct.cap]);
>  }
>  
>  /**
> @@ -148,13 +148,15 @@ static int profile_capable(struct aa_profile *profile, int cap,
>   *
>   * Returns: 0 on success, or else an error code.
>   */
> -int aa_capable(struct aa_label *label, int cap, unsigned int opts)
> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
> +	       struct cap_aux_data *cad)
>  {
>  	struct aa_profile *profile;
>  	int error = 0;
>  	DEFINE_AUDIT_DATA(sa, LSM_AUDIT_DATA_CAP, OP_CAPABLE);
>  
> -	sa.u.cap = cap;
> +	sa.u.cap_struct.cap = cap;
> +	sa.u.cap_struct.cad = cad;
>  	error = fn_for_each_confined(label, profile,
>  			profile_capable(profile, cap, opts, &sa));
>  
> diff --git a/security/apparmor/include/capability.h b/security/apparmor/include/capability.h
> index 1b3663b6ab12..d888f09d76d1 100644
> --- a/security/apparmor/include/capability.h
> +++ b/security/apparmor/include/capability.h
> @@ -20,6 +20,7 @@
>  #include "apparmorfs.h"
>  
>  struct aa_label;
> +struct cap_aux_data;
>  
>  /* aa_caps - confinement data for capabilities
>   * @allowed: capabilities mask
> @@ -40,7 +41,8 @@ struct aa_caps {
>  
>  extern struct aa_sfs_entry aa_sfs_entry_caps[];
>  
> -int aa_capable(struct aa_label *label, int cap, unsigned int opts);
> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
> +	       struct cap_aux_data *cad);
>  
>  static inline void aa_free_cap_rules(struct aa_caps *caps)
>  {
> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> index aacd1e95cb59..deb5267ca695 100644
> --- a/security/apparmor/ipc.c
> +++ b/security/apparmor/ipc.c
> @@ -108,7 +108,7 @@ static int profile_tracer_perm(struct aa_profile *tracer,
>  	aad(sa)->peer = tracee;
>  	aad(sa)->request = 0;
>  	aad(sa)->error = aa_capable(&tracer->label, CAP_SYS_PTRACE,
> -				    CAP_OPT_NONE);
> +				    CAP_OPT_NONE, NULL);
>  
>  	return aa_audit(AUDIT_APPARMOR_AUTO, tracer, sa, audit_ptrace_cb);
>  }
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index 87500bde5a92..82790accb679 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -172,14 +172,15 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
>  }
>  
>  static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
> -			    int cap, unsigned int opts)
> +			    int cap, unsigned int opts,
> +			    struct cap_aux_data *cad)
>  {
>  	struct aa_label *label;
>  	int error = 0;
>  
>  	label = aa_get_newest_cred_label(cred);
>  	if (!unconfined(label))
> -		error = aa_capable(label, cap, opts);
> +		error = aa_capable(label, cap, opts, cad);
>  	aa_put_label(label);
>  
>  	return error;
> diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
> index 552ed09cb47e..9b3d4b4437f2 100644
> --- a/security/apparmor/resource.c
> +++ b/security/apparmor/resource.c
> @@ -124,7 +124,7 @@ int aa_task_setrlimit(struct aa_label *label, struct task_struct *task,
>  	 */
>  
>  	if (label != peer &&
> -	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT) != 0)
> +	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT, NULL) != 0)
>  		error = fn_for_each(label, profile,
>  				audit_resource(profile, resource,
>  					       new_rlim->rlim_max, peer,
> diff --git a/security/commoncap.c b/security/commoncap.c
> index c477fb673701..1860ea50f473 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -68,7 +68,7 @@ static void warn_setuid_and_fcaps_mixed(const char *fname)
>   * kernel's capable() and has_capability() returns 1 for this case.
>   */
>  int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
> -		int cap, unsigned int opts)
> +		int cap, unsigned int opts, struct cap_aux_data *cad)
>  {
>  	struct user_namespace *ns = targ_ns;
>  
> @@ -226,7 +226,7 @@ static inline int cap_inh_is_capped(void)
>  	 * capability
>  	 */
>  	if (cap_capable(current_cred(), current_cred()->user_ns,
> -			CAP_SETPCAP, CAP_OPT_NONE) == 0)
> +			CAP_SETPCAP, CAP_OPT_NONE, NULL) == 0)
>  		return 0;
>  	return 1;
>  }
> @@ -1211,7 +1211,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  		    || (cap_capable(current_cred(),
>  				    current_cred()->user_ns,
>  				    CAP_SETPCAP,
> -				    CAP_OPT_NONE) != 0)			/*[4]*/
> +				    CAP_OPT_NONE,
> +				    NULL) != 0)				/*[4]*/
>  			/*
>  			 * [1] no changing of bits that are locked
>  			 * [2] no unlocking of locks
> @@ -1307,7 +1308,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
>  	int cap_sys_admin = 0;
>  
>  	if (cap_capable(current_cred(), &init_user_ns,
> -				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) == 0)
> +				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) == 0)
>  		cap_sys_admin = 1;
>  
>  	return cap_sys_admin;
> @@ -1328,7 +1329,7 @@ int cap_mmap_addr(unsigned long addr)
>  
>  	if (addr < dac_mmap_min_addr) {
>  		ret = cap_capable(current_cred(), &init_user_ns, CAP_SYS_RAWIO,
> -				  CAP_OPT_NONE);
> +				  CAP_OPT_NONE, NULL);
>  		/* set PF_SUPERPRIV if it turns out we allow the low mmap */
>  		if (ret == 0)
>  			current->flags |= PF_SUPERPRIV;
> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 33028c098ef3..4871b2508a4a 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>  	case LSM_AUDIT_DATA_IPC:
>  		audit_log_format(ab, " key=%d ", a->u.ipc_id);
>  		break;
> -	case LSM_AUDIT_DATA_CAP:
> -		audit_log_format(ab, " capability=%d ", a->u.cap);
> +	case LSM_AUDIT_DATA_CAP: {
> +		const struct inode *inode;
> +
> +		if (a->u.cap_struct.cad) {
> +			switch (a->u.cap_struct.cad->type) {
> +			case CAP_AUX_DATA_INODE: {
> +				inode = a->u.cap_struct.cad->u.inode;
> +
> +				audit_log_format(ab, " dev=");
> +				audit_log_untrustedstring(ab,
> +					inode->i_sb->s_id);
> +				audit_log_format(ab, " ino=%lu",
> +					inode->i_ino);
> +				break;
> +			}
> +			}
> +		}
> +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>  		break;
> +	}
>  	case LSM_AUDIT_DATA_PATH: {
>  		struct inode *inode;
>  
> diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
> index cecd38e2ac80..c74ed83e9501 100644
> --- a/security/safesetid/lsm.c
> +++ b/security/safesetid/lsm.c
> @@ -80,7 +80,8 @@ static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
>  static int safesetid_security_capable(const struct cred *cred,
>  				      struct user_namespace *ns,
>  				      int cap,
> -				      unsigned int opts)
> +				      unsigned int opts,
> +				      struct cap_aux_data *cad)
>  {
>  	if (cap == CAP_SETUID &&
>  	    check_setuid_policy_hashtable_key(cred->uid)) {
> diff --git a/security/security.c b/security/security.c
> index 613a5c00e602..06274a7b9c4e 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -691,9 +691,10 @@ int security_capset(struct cred *new, const struct cred *old,
>  int security_capable(const struct cred *cred,
>  		     struct user_namespace *ns,
>  		     int cap,
> -		     unsigned int opts)
> +		     unsigned int opts,
> +		     struct cap_aux_data *cad)
>  {
> -	return call_int_hook(capable, 0, cred, ns, cap, opts);
> +	return call_int_hook(capable, 0, cred, ns, cap, opts, cad);
>  }
>  
>  int security_quotactl(int cmds, int type, int id, struct super_block *sb)
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index f77b314d0575..d6c699ed06be 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1618,7 +1618,10 @@ static inline u32 signal_to_av(int sig)
>  
>  /* Check whether a task is allowed to use a capability. */
>  static int cred_has_capability(const struct cred *cred,
> -			       int cap, unsigned int opts, bool initns)
> +				int cap,
> +				unsigned int opts,
> +				bool initns,
> +				struct cap_aux_data *cad)
>  {
>  	struct common_audit_data ad;
>  	struct av_decision avd;
> @@ -1628,7 +1631,8 @@ static int cred_has_capability(const struct cred *cred,
>  	int rc;
>  
>  	ad.type = LSM_AUDIT_DATA_CAP;
> -	ad.u.cap = cap;
> +	ad.u.cap_struct.cad = cad;
> +	ad.u.cap_struct.cap = cap;
>  
>  	switch (CAP_TO_INDEX(cap)) {
>  	case 0:
> @@ -2165,9 +2169,9 @@ static int selinux_capset(struct cred *new, const struct cred *old,
>   */
>  
>  static int selinux_capable(const struct cred *cred, struct user_namespace *ns,
> -			   int cap, unsigned int opts)
> +			   int cap, unsigned int opts, struct cap_aux_data *cad)
>  {
> -	return cred_has_capability(cred, cap, opts, ns == &init_user_ns);
> +	return cred_has_capability(cred, cap, opts, ns == &init_user_ns, cad);
>  }
>  
>  static int selinux_quotactl(int cmds, int type, int id, struct super_block *sb)
> @@ -2241,7 +2245,7 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
>  	int rc, cap_sys_admin = 0;
>  
>  	rc = cred_has_capability(current_cred(), CAP_SYS_ADMIN,
> -				 CAP_OPT_NOAUDIT, true);
> +				 CAP_OPT_NOAUDIT, true, NULL);
>  	if (rc == 0)
>  		cap_sys_admin = 1;
>  
> @@ -3101,9 +3105,9 @@ static bool has_cap_mac_admin(bool audit)
>  	const struct cred *cred = current_cred();
>  	unsigned int opts = audit ? CAP_OPT_NONE : CAP_OPT_NOAUDIT;
>  
> -	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts))
> +	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts, NULL))
>  		return false;
> -	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true))
> +	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true, NULL))
>  		return false;
>  	return true;
>  }
> @@ -3563,7 +3567,7 @@ static int selinux_file_ioctl(struct file *file, unsigned int cmd,
>  	case KDSKBENT:
>  	case KDSKBSENT:
>  		error = cred_has_capability(cred, CAP_SYS_TTY_CONFIG,
> -					    CAP_OPT_NONE, true);
> +					    CAP_OPT_NONE, true, NULL);
>  		break;
>  
>  	/* default case assumes that the command will go
> diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
> index fe2ce3a65822..e961bfe8f00a 100644
> --- a/security/smack/smack_access.c
> +++ b/security/smack/smack_access.c
> @@ -640,7 +640,7 @@ bool smack_privileged_cred(int cap, const struct cred *cred)
>  	struct smack_known_list_elem *sklep;
>  	int rc;
>  
> -	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE);
> +	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE, NULL);
>  	if (rc)
>  		return false;
>
Stephen Smalley July 12, 2019, 6:02 p.m. UTC | #3
On 7/12/19 1:50 PM, James Morris wrote:
> On Fri, 12 Jul 2019, Nicholas Franck wrote:
> 
>> +	case LSM_AUDIT_DATA_CAP: {
>> +		const struct inode *inode;
>> +
>> +		if (a->u.cap_struct.cad) {
>> +			switch (a->u.cap_struct.cad->type) {
>> +			case CAP_AUX_DATA_INODE: {
>> +				inode = a->u.cap_struct.cad->u.inode;
>> +
>> +				audit_log_format(ab, " dev=");
>> +				audit_log_untrustedstring(ab,
>> +					inode->i_sb->s_id);
>> +				audit_log_format(ab, " ino=%lu",
>> +					inode->i_ino);
>> +				break;
>> +			}
>> +			}
>> +		}
>> +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>>   		break;
> 
> Will this break any existing userspace log parsers?

I'm hoping not given that we are only adding auxiliary fields and those 
are already defined for other AVC audit messages.  ausearch appeared to 
work fine.  Added the linux-audit mailing list to the cc line to get 
their view.
Stephen Smalley July 12, 2019, 6:25 p.m. UTC | #4
On 7/12/19 1:58 PM, Casey Schaufler wrote:
> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>> At present security_capable does not pass any object information
>> and therefore can neither audit the particular object nor take it
>> into account. Augment the security_capable interface to support
>> passing supplementary data. Use this facility initially to convey
>> the inode for capability checks relevant to inodes. This only
>> addresses capable_wrt_inode_uidgid calls; other capability checks
>> relevant to inodes will be addressed in subsequent changes. In the
>> future, this will be further extended to pass object information for
>> other capability checks such as the target task for CAP_KILL.
> 
> This seems wrong to me. The capability system has nothing to do
> with objects. Passing object information through security_capable()
> may be convenient, but isn't relevant to the purpose of the interface.
> It appears that there are very few places where the object information
> is actually useful.

A fair number of capabilities are checked upon some attempted object 
access (often right after comparing UIDs or other per-object state), and 
the particular object can be very helpful in both audit and in access 
control.  More below.

> 
>> In SELinux this new information is leveraged here to include the inode
>> in the audit message. In the future, it could also be used to perform
>> a per inode capability checks.
> 
> I suggest that you want a mechanism for adding the inode information
> to the audit record instead.

That is part of what we want, but not the entire picture.  But with 
respect to audit, the problem today is that one sees SELinux 
dac_read_search, dac_override, etc denials with no indication as to the 
particular file, which is unfortunate both from a security auditing 
perspective and from a policy development perspective.  The only option 
today to gain that information is by enabling system call audit and 
setting at least one audit filter so that the audit framework will 
collect that information and include it in the audit records that are 
emitted upon syscall exit after any such AVC denial.  Unfortunately, 
that is all disabled by default on most systems due to its associated 
performance overhead, and one doesn't even have the option of enabling 
it on some systems, e.g. Android devices.  And even when one can enable 
syscall audit, one must correlate the syscall audit record to the 
associated AVC record to identify the object rather than having the 
information directly included in the same record.


> What would a "per inode" capability check be? Capability checks are
> process checks, not object checks.

Ideally it would be possible to scope dac_override and other 
capabilities to specific objects rather than having to allow it for all 
or none.  Just because a process needs to override DAC on one file or 
one set of files is not a reason to allow it to do so on every file it 
can access at all.  If we want to apply least privilege, then this is a 
desirable facility.  I understand that doesn't mesh with Smack's mental 
model but it would probably be useful to both SELinux and AppArmor, 
among others.


> 
>> It would be possible to fold the existing opts argument into this new
>> supplementary data structure. This was omitted from this change to
>> minimize changes.
>>
>> Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>
>> ---
>>   include/linux/capability.h             |  7 ++++++
>>   include/linux/lsm_audit.h              |  5 +++-
>>   include/linux/lsm_hooks.h              |  3 ++-
>>   include/linux/security.h               | 23 +++++++++++++-----
>>   kernel/capability.c                    | 33 ++++++++++++++++++--------
>>   kernel/seccomp.c                       |  2 +-
>>   security/apparmor/capability.c         |  8 ++++---
>>   security/apparmor/include/capability.h |  4 +++-
>>   security/apparmor/ipc.c                |  2 +-
>>   security/apparmor/lsm.c                |  5 ++--
>>   security/apparmor/resource.c           |  2 +-
>>   security/commoncap.c                   | 11 +++++----
>>   security/lsm_audit.c                   | 21 ++++++++++++++--
>>   security/safesetid/lsm.c               |  3 ++-
>>   security/security.c                    |  5 ++--
>>   security/selinux/hooks.c               | 20 +++++++++-------
>>   security/smack/smack_access.c          |  2 +-
>>   17 files changed, 110 insertions(+), 46 deletions(-)
>>
>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>> index ecce0f43c73a..f72de64c179d 100644
>> --- a/include/linux/capability.h
>> +++ b/include/linux/capability.h
>> @@ -211,6 +211,8 @@ extern bool capable(int cap);
>>   extern bool ns_capable(struct user_namespace *ns, int cap);
>>   extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
>>   extern bool ns_capable_setid(struct user_namespace *ns, int cap);
>> +extern bool ns_capable_inode(struct user_namespace *ns, int cap,
>> +			     const struct inode *inode);
>>   #else
>>   static inline bool has_capability(struct task_struct *t, int cap)
>>   {
>> @@ -246,6 +248,11 @@ static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
>>   {
>>   	return true;
>>   }
>> +static bool ns_capable_inode(struct user_namespace *ns, int cap,
>> +			     const struct inode *inode)
>> +{
>> +	return true;
>> +}
>>   #endif /* CONFIG_MULTIUSER */
>>   extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
>>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
>> index 915330abf6e5..15d2a62639f0 100644
>> --- a/include/linux/lsm_audit.h
>> +++ b/include/linux/lsm_audit.h
>> @@ -79,7 +79,10 @@ struct common_audit_data {
>>   		struct dentry *dentry;
>>   		struct inode *inode;
>>   		struct lsm_network_audit *net;
>> -		int cap;
>> +		struct  {
>> +			int cap;
> 
>> +			struct cap_aux_data *cad;
>> +		} cap_struct;
>>   		int ipc_id;
>>   		struct task_struct *tsk;
>>   #ifdef CONFIG_KEYS
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index 47f58cfb6a19..b2a37d613030 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -1469,7 +1469,8 @@ union security_list_options {
>>   	int (*capable)(const struct cred *cred,
>>   			struct user_namespace *ns,
>>   			int cap,
>> -			unsigned int opts);
>> +			unsigned int opts,
>> +			struct cap_aux_data *cad);
>>   	int (*quotactl)(int cmds, int type, int id, struct super_block *sb);
>>   	int (*quota_on)(struct dentry *dentry);
>>   	int (*syslog)(int type);
>> diff --git a/include/linux/security.h b/include/linux/security.h
>> index 659071c2e57c..8fce5e69dc52 100644
>> --- a/include/linux/security.h
>> +++ b/include/linux/security.h
>> @@ -77,9 +77,18 @@ enum lsm_event {
>>   	LSM_POLICY_CHANGE,
>>   };
>>   
>> +
>> +struct cap_aux_data {
>> +	char type;
>> +#define CAP_AUX_DATA_INODE	1
>> +	union	{
>> +		const struct inode *inode;
>> +	} u;
>> +};
>> +
>>   /* These functions are in security/commoncap.c */
>>   extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
>> -		       int cap, unsigned int opts);
>> +		       int cap, unsigned int opts, struct cap_aux_data *cad);
>>   extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz);
>>   extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
>>   extern int cap_ptrace_traceme(struct task_struct *parent);
>> @@ -215,9 +224,10 @@ int security_capset(struct cred *new, const struct cred *old,
>>   		    const kernel_cap_t *inheritable,
>>   		    const kernel_cap_t *permitted);
>>   int security_capable(const struct cred *cred,
>> -		       struct user_namespace *ns,
>> -		       int cap,
>> -		       unsigned int opts);
>> +		     struct user_namespace *ns,
>> +		     int cap,
>> +		     unsigned int opts,
>> +		     struct cap_aux_data *cad);
>>   int security_quotactl(int cmds, int type, int id, struct super_block *sb);
>>   int security_quota_on(struct dentry *dentry);
>>   int security_syslog(int type);
>> @@ -478,9 +488,10 @@ static inline int security_capset(struct cred *new,
>>   static inline int security_capable(const struct cred *cred,
>>   				   struct user_namespace *ns,
>>   				   int cap,
>> -				   unsigned int opts)
>> +				   unsigned int opts,
>> +				   struct cap_aux_data *cad)
>>   {
>> -	return cap_capable(cred, ns, cap, opts);
>> +	return cap_capable(cred, ns, cap, opts, NULL);
>>   }
>>   
>>   static inline int security_quotactl(int cmds, int type, int id,
>> diff --git a/kernel/capability.c b/kernel/capability.c
>> index 1444f3954d75..c3723443904a 100644
>> --- a/kernel/capability.c
>> +++ b/kernel/capability.c
>> @@ -297,7 +297,7 @@ bool has_ns_capability(struct task_struct *t,
>>   	int ret;
>>   
>>   	rcu_read_lock();
>> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE);
>> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE, NULL);
>>   	rcu_read_unlock();
>>   
>>   	return (ret == 0);
>> @@ -338,7 +338,7 @@ bool has_ns_capability_noaudit(struct task_struct *t,
>>   	int ret;
>>   
>>   	rcu_read_lock();
>> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT);
>> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT, NULL);
>>   	rcu_read_unlock();
>>   
>>   	return (ret == 0);
>> @@ -363,7 +363,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
>>   
>>   static bool ns_capable_common(struct user_namespace *ns,
>>   			      int cap,
>> -			      unsigned int opts)
>> +			      unsigned int opts,
>> +			      struct cap_aux_data *cad)
>>   {
>>   	int capable;
>>   
>> @@ -372,7 +373,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>>   		BUG();
>>   	}
>>   
>> -	capable = security_capable(current_cred(), ns, cap, opts);
>> +	capable = security_capable(current_cred(), ns, cap, opts, cad);
>>   	if (capable == 0) {
>>   		current->flags |= PF_SUPERPRIV;
>>   		return true;
>> @@ -393,7 +394,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>>    */
>>   bool ns_capable(struct user_namespace *ns, int cap)
>>   {
>> -	return ns_capable_common(ns, cap, CAP_OPT_NONE);
>> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, NULL);
>>   }
>>   EXPORT_SYMBOL(ns_capable);
>>   
>> @@ -411,7 +412,7 @@ EXPORT_SYMBOL(ns_capable);
>>    */
>>   bool ns_capable_noaudit(struct user_namespace *ns, int cap)
>>   {
>> -	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT);
>> +	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT, NULL);
>>   }
>>   EXPORT_SYMBOL(ns_capable_noaudit);
>>   
>> @@ -430,7 +431,7 @@ EXPORT_SYMBOL(ns_capable_noaudit);
>>    */
>>   bool ns_capable_setid(struct user_namespace *ns, int cap)
>>   {
>> -	return ns_capable_common(ns, cap, CAP_OPT_INSETID);
>> +	return ns_capable_common(ns, cap, CAP_OPT_INSETID, NULL);
>>   }
>>   EXPORT_SYMBOL(ns_capable_setid);
>>   
>> @@ -470,7 +471,7 @@ bool file_ns_capable(const struct file *file, struct user_namespace *ns,
>>   	if (WARN_ON_ONCE(!cap_valid(cap)))
>>   		return false;
>>   
>> -	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE) == 0)
>> +	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE, NULL) == 0)
>>   		return true;
>>   
>>   	return false;
>> @@ -503,7 +504,8 @@ bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
>>   {
>>   	struct user_namespace *ns = current_user_ns();
>>   
>> -	return ns_capable(ns, cap) && privileged_wrt_inode_uidgid(ns, inode);
>> +	return ns_capable_inode(ns, cap, inode) &&
>> +		privileged_wrt_inode_uidgid(ns, inode);
>>   }
>>   EXPORT_SYMBOL(capable_wrt_inode_uidgid);
>>   
>> @@ -524,7 +526,18 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
>>   	cred = rcu_dereference(tsk->ptracer_cred);
>>   	if (cred)
>>   		ret = security_capable(cred, ns, CAP_SYS_PTRACE,
>> -				       CAP_OPT_NOAUDIT);
>> +				       CAP_OPT_NOAUDIT, NULL);
>>   	rcu_read_unlock();
>>   	return (ret == 0);
>>   }
>> +
>> +bool ns_capable_inode(struct user_namespace *ns, int cap,
>> +			const struct inode *inode)
>> +{
>> +	struct cap_aux_data cad;
>> +
>> +	cad.type = CAP_AUX_DATA_INODE;
>> +	cad.u.inode = inode;
>> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, &cad);
>> +}
>> +EXPORT_SYMBOL(ns_capable_inode);
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 811b4a86cdf6..d59dd7079ece 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -446,7 +446,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>>   	 */
>>   	if (!task_no_new_privs(current) &&
>>   	    security_capable(current_cred(), current_user_ns(),
>> -				     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) != 0)
>> +			     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) != 0)
>>   		return ERR_PTR(-EACCES);
>>   
>>   	/* Allocate a new seccomp_filter */
>> diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
>> index 752f73980e30..c45192a16733 100644
>> --- a/security/apparmor/capability.c
>> +++ b/security/apparmor/capability.c
>> @@ -50,7 +50,7 @@ static void audit_cb(struct audit_buffer *ab, void *va)
>>   	struct common_audit_data *sa = va;
>>   
>>   	audit_log_format(ab, " capname=");
>> -	audit_log_untrustedstring(ab, capability_names[sa->u.cap]);
>> +	audit_log_untrustedstring(ab, capability_names[sa->u.cap_struct.cap]);
>>   }
>>   
>>   /**
>> @@ -148,13 +148,15 @@ static int profile_capable(struct aa_profile *profile, int cap,
>>    *
>>    * Returns: 0 on success, or else an error code.
>>    */
>> -int aa_capable(struct aa_label *label, int cap, unsigned int opts)
>> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
>> +	       struct cap_aux_data *cad)
>>   {
>>   	struct aa_profile *profile;
>>   	int error = 0;
>>   	DEFINE_AUDIT_DATA(sa, LSM_AUDIT_DATA_CAP, OP_CAPABLE);
>>   
>> -	sa.u.cap = cap;
>> +	sa.u.cap_struct.cap = cap;
>> +	sa.u.cap_struct.cad = cad;
>>   	error = fn_for_each_confined(label, profile,
>>   			profile_capable(profile, cap, opts, &sa));
>>   
>> diff --git a/security/apparmor/include/capability.h b/security/apparmor/include/capability.h
>> index 1b3663b6ab12..d888f09d76d1 100644
>> --- a/security/apparmor/include/capability.h
>> +++ b/security/apparmor/include/capability.h
>> @@ -20,6 +20,7 @@
>>   #include "apparmorfs.h"
>>   
>>   struct aa_label;
>> +struct cap_aux_data;
>>   
>>   /* aa_caps - confinement data for capabilities
>>    * @allowed: capabilities mask
>> @@ -40,7 +41,8 @@ struct aa_caps {
>>   
>>   extern struct aa_sfs_entry aa_sfs_entry_caps[];
>>   
>> -int aa_capable(struct aa_label *label, int cap, unsigned int opts);
>> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
>> +	       struct cap_aux_data *cad);
>>   
>>   static inline void aa_free_cap_rules(struct aa_caps *caps)
>>   {
>> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
>> index aacd1e95cb59..deb5267ca695 100644
>> --- a/security/apparmor/ipc.c
>> +++ b/security/apparmor/ipc.c
>> @@ -108,7 +108,7 @@ static int profile_tracer_perm(struct aa_profile *tracer,
>>   	aad(sa)->peer = tracee;
>>   	aad(sa)->request = 0;
>>   	aad(sa)->error = aa_capable(&tracer->label, CAP_SYS_PTRACE,
>> -				    CAP_OPT_NONE);
>> +				    CAP_OPT_NONE, NULL);
>>   
>>   	return aa_audit(AUDIT_APPARMOR_AUTO, tracer, sa, audit_ptrace_cb);
>>   }
>> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
>> index 87500bde5a92..82790accb679 100644
>> --- a/security/apparmor/lsm.c
>> +++ b/security/apparmor/lsm.c
>> @@ -172,14 +172,15 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
>>   }
>>   
>>   static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
>> -			    int cap, unsigned int opts)
>> +			    int cap, unsigned int opts,
>> +			    struct cap_aux_data *cad)
>>   {
>>   	struct aa_label *label;
>>   	int error = 0;
>>   
>>   	label = aa_get_newest_cred_label(cred);
>>   	if (!unconfined(label))
>> -		error = aa_capable(label, cap, opts);
>> +		error = aa_capable(label, cap, opts, cad);
>>   	aa_put_label(label);
>>   
>>   	return error;
>> diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
>> index 552ed09cb47e..9b3d4b4437f2 100644
>> --- a/security/apparmor/resource.c
>> +++ b/security/apparmor/resource.c
>> @@ -124,7 +124,7 @@ int aa_task_setrlimit(struct aa_label *label, struct task_struct *task,
>>   	 */
>>   
>>   	if (label != peer &&
>> -	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT) != 0)
>> +	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT, NULL) != 0)
>>   		error = fn_for_each(label, profile,
>>   				audit_resource(profile, resource,
>>   					       new_rlim->rlim_max, peer,
>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index c477fb673701..1860ea50f473 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -68,7 +68,7 @@ static void warn_setuid_and_fcaps_mixed(const char *fname)
>>    * kernel's capable() and has_capability() returns 1 for this case.
>>    */
>>   int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
>> -		int cap, unsigned int opts)
>> +		int cap, unsigned int opts, struct cap_aux_data *cad)
>>   {
>>   	struct user_namespace *ns = targ_ns;
>>   
>> @@ -226,7 +226,7 @@ static inline int cap_inh_is_capped(void)
>>   	 * capability
>>   	 */
>>   	if (cap_capable(current_cred(), current_cred()->user_ns,
>> -			CAP_SETPCAP, CAP_OPT_NONE) == 0)
>> +			CAP_SETPCAP, CAP_OPT_NONE, NULL) == 0)
>>   		return 0;
>>   	return 1;
>>   }
>> @@ -1211,7 +1211,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>>   		    || (cap_capable(current_cred(),
>>   				    current_cred()->user_ns,
>>   				    CAP_SETPCAP,
>> -				    CAP_OPT_NONE) != 0)			/*[4]*/
>> +				    CAP_OPT_NONE,
>> +				    NULL) != 0)				/*[4]*/
>>   			/*
>>   			 * [1] no changing of bits that are locked
>>   			 * [2] no unlocking of locks
>> @@ -1307,7 +1308,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
>>   	int cap_sys_admin = 0;
>>   
>>   	if (cap_capable(current_cred(), &init_user_ns,
>> -				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) == 0)
>> +				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) == 0)
>>   		cap_sys_admin = 1;
>>   
>>   	return cap_sys_admin;
>> @@ -1328,7 +1329,7 @@ int cap_mmap_addr(unsigned long addr)
>>   
>>   	if (addr < dac_mmap_min_addr) {
>>   		ret = cap_capable(current_cred(), &init_user_ns, CAP_SYS_RAWIO,
>> -				  CAP_OPT_NONE);
>> +				  CAP_OPT_NONE, NULL);
>>   		/* set PF_SUPERPRIV if it turns out we allow the low mmap */
>>   		if (ret == 0)
>>   			current->flags |= PF_SUPERPRIV;
>> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
>> index 33028c098ef3..4871b2508a4a 100644
>> --- a/security/lsm_audit.c
>> +++ b/security/lsm_audit.c
>> @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>>   	case LSM_AUDIT_DATA_IPC:
>>   		audit_log_format(ab, " key=%d ", a->u.ipc_id);
>>   		break;
>> -	case LSM_AUDIT_DATA_CAP:
>> -		audit_log_format(ab, " capability=%d ", a->u.cap);
>> +	case LSM_AUDIT_DATA_CAP: {
>> +		const struct inode *inode;
>> +
>> +		if (a->u.cap_struct.cad) {
>> +			switch (a->u.cap_struct.cad->type) {
>> +			case CAP_AUX_DATA_INODE: {
>> +				inode = a->u.cap_struct.cad->u.inode;
>> +
>> +				audit_log_format(ab, " dev=");
>> +				audit_log_untrustedstring(ab,
>> +					inode->i_sb->s_id);
>> +				audit_log_format(ab, " ino=%lu",
>> +					inode->i_ino);
>> +				break;
>> +			}
>> +			}
>> +		}
>> +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>>   		break;
>> +	}
>>   	case LSM_AUDIT_DATA_PATH: {
>>   		struct inode *inode;
>>   
>> diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
>> index cecd38e2ac80..c74ed83e9501 100644
>> --- a/security/safesetid/lsm.c
>> +++ b/security/safesetid/lsm.c
>> @@ -80,7 +80,8 @@ static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
>>   static int safesetid_security_capable(const struct cred *cred,
>>   				      struct user_namespace *ns,
>>   				      int cap,
>> -				      unsigned int opts)
>> +				      unsigned int opts,
>> +				      struct cap_aux_data *cad)
>>   {
>>   	if (cap == CAP_SETUID &&
>>   	    check_setuid_policy_hashtable_key(cred->uid)) {
>> diff --git a/security/security.c b/security/security.c
>> index 613a5c00e602..06274a7b9c4e 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -691,9 +691,10 @@ int security_capset(struct cred *new, const struct cred *old,
>>   int security_capable(const struct cred *cred,
>>   		     struct user_namespace *ns,
>>   		     int cap,
>> -		     unsigned int opts)
>> +		     unsigned int opts,
>> +		     struct cap_aux_data *cad)
>>   {
>> -	return call_int_hook(capable, 0, cred, ns, cap, opts);
>> +	return call_int_hook(capable, 0, cred, ns, cap, opts, cad);
>>   }
>>   
>>   int security_quotactl(int cmds, int type, int id, struct super_block *sb)
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index f77b314d0575..d6c699ed06be 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -1618,7 +1618,10 @@ static inline u32 signal_to_av(int sig)
>>   
>>   /* Check whether a task is allowed to use a capability. */
>>   static int cred_has_capability(const struct cred *cred,
>> -			       int cap, unsigned int opts, bool initns)
>> +				int cap,
>> +				unsigned int opts,
>> +				bool initns,
>> +				struct cap_aux_data *cad)
>>   {
>>   	struct common_audit_data ad;
>>   	struct av_decision avd;
>> @@ -1628,7 +1631,8 @@ static int cred_has_capability(const struct cred *cred,
>>   	int rc;
>>   
>>   	ad.type = LSM_AUDIT_DATA_CAP;
>> -	ad.u.cap = cap;
>> +	ad.u.cap_struct.cad = cad;
>> +	ad.u.cap_struct.cap = cap;
>>   
>>   	switch (CAP_TO_INDEX(cap)) {
>>   	case 0:
>> @@ -2165,9 +2169,9 @@ static int selinux_capset(struct cred *new, const struct cred *old,
>>    */
>>   
>>   static int selinux_capable(const struct cred *cred, struct user_namespace *ns,
>> -			   int cap, unsigned int opts)
>> +			   int cap, unsigned int opts, struct cap_aux_data *cad)
>>   {
>> -	return cred_has_capability(cred, cap, opts, ns == &init_user_ns);
>> +	return cred_has_capability(cred, cap, opts, ns == &init_user_ns, cad);
>>   }
>>   
>>   static int selinux_quotactl(int cmds, int type, int id, struct super_block *sb)
>> @@ -2241,7 +2245,7 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
>>   	int rc, cap_sys_admin = 0;
>>   
>>   	rc = cred_has_capability(current_cred(), CAP_SYS_ADMIN,
>> -				 CAP_OPT_NOAUDIT, true);
>> +				 CAP_OPT_NOAUDIT, true, NULL);
>>   	if (rc == 0)
>>   		cap_sys_admin = 1;
>>   
>> @@ -3101,9 +3105,9 @@ static bool has_cap_mac_admin(bool audit)
>>   	const struct cred *cred = current_cred();
>>   	unsigned int opts = audit ? CAP_OPT_NONE : CAP_OPT_NOAUDIT;
>>   
>> -	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts))
>> +	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts, NULL))
>>   		return false;
>> -	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true))
>> +	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true, NULL))
>>   		return false;
>>   	return true;
>>   }
>> @@ -3563,7 +3567,7 @@ static int selinux_file_ioctl(struct file *file, unsigned int cmd,
>>   	case KDSKBENT:
>>   	case KDSKBSENT:
>>   		error = cred_has_capability(cred, CAP_SYS_TTY_CONFIG,
>> -					    CAP_OPT_NONE, true);
>> +					    CAP_OPT_NONE, true, NULL);
>>   		break;
>>   
>>   	/* default case assumes that the command will go
>> diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
>> index fe2ce3a65822..e961bfe8f00a 100644
>> --- a/security/smack/smack_access.c
>> +++ b/security/smack/smack_access.c
>> @@ -640,7 +640,7 @@ bool smack_privileged_cred(int cap, const struct cred *cred)
>>   	struct smack_known_list_elem *sklep;
>>   	int rc;
>>   
>> -	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE);
>> +	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE, NULL);
>>   	if (rc)
>>   		return false;
>>   
>
Casey Schaufler July 12, 2019, 7:54 p.m. UTC | #5
On 7/12/2019 11:25 AM, Stephen Smalley wrote:
> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>> At present security_capable does not pass any object information
>>> and therefore can neither audit the particular object nor take it
>>> into account. Augment the security_capable interface to support
>>> passing supplementary data. Use this facility initially to convey
>>> the inode for capability checks relevant to inodes. This only
>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>> relevant to inodes will be addressed in subsequent changes. In the
>>> future, this will be further extended to pass object information for
>>> other capability checks such as the target task for CAP_KILL.
>>
>> This seems wrong to me. The capability system has nothing to do
>> with objects. Passing object information through security_capable()
>> may be convenient, but isn't relevant to the purpose of the interface.
>> It appears that there are very few places where the object information
>> is actually useful.
>
> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.

I'm not disagreeing with that. What I'm saying is that the capability
check interface is not the right place to pass that information. The
capability check has no use for the object information. I would much
rather see a security_pass_object_data() hook that gets called after
(or before) the security_capable() hook in the places where you want
the extra information.

>>> In SELinux this new information is leveraged here to include the inode
>>> in the audit message. In the future, it could also be used to perform
>>> a per inode capability checks.
>>
>> I suggest that you want a mechanism for adding the inode information
>> to the audit record instead.
>
> That is part of what we want, but not the entire picture.  But with respect to audit, the problem today is that one sees SELinux dac_read_search, dac_override, etc denials with no indication as to the particular file, which is unfortunate both from a security auditing perspective and from a policy development perspective.

I can see how that is a problem.

> The only option today to gain that information is by enabling system call audit and setting at least one audit filter so that the audit framework will collect that information and include it in the audit records that are emitted upon syscall exit after any such AVC denial.  Unfortunately, that is all disabled by default on most systems due to its associated performance overhead, and one doesn't even have the option of enabling it on some systems, e.g. Android devices.  And even when one can enable syscall audit, one must correlate the syscall audit record to the associated AVC record to identify the object rather than having the information directly included in the same record.

None of which gives any rationale for adding the information
to the capability check. Sure, it's in the right place, but there
is no object interaction with the capability call.

>> What would a "per inode" capability check be? Capability checks are
>> process checks, not object checks.
>
> Ideally it would be possible to scope dac_override and other capabilities to specific objects rather than having to allow it for all or none.

That would require a major overhaul of the capability scheme,
and you're going to get arguments from people like me about
whether or not that would be ideal. Besides, isn't that what
SELinux is all about, providing that sort of privilege granularity?

> Just because a process needs to override DAC on one file or one set of files is not a reason to allow it to do so on every file it can access at all.

That's an argument for privilege bracketing within the process.
Not something I recommend (the oft referenced sendmail problems
being but one example) but the only way to do it properly without
delving into path based controls.

> If we want to apply least privilege, then this is a desirable facility.

The capability mechanism is object agnostic by design.

> I understand that doesn't mesh with Smack's mental modelbut it would probably be useful to both SELinux and AppArmor, among others.

I'm perfectly happy to have the information transmitted.
I think a separate interface for doing so is appropriate.
Stephen Smalley July 12, 2019, 8:21 p.m. UTC | #6
On 7/12/19 3:54 PM, Casey Schaufler wrote:
> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
>> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>>> At present security_capable does not pass any object information
>>>> and therefore can neither audit the particular object nor take it
>>>> into account. Augment the security_capable interface to support
>>>> passing supplementary data. Use this facility initially to convey
>>>> the inode for capability checks relevant to inodes. This only
>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>> future, this will be further extended to pass object information for
>>>> other capability checks such as the target task for CAP_KILL.
>>>
>>> This seems wrong to me. The capability system has nothing to do
>>> with objects. Passing object information through security_capable()
>>> may be convenient, but isn't relevant to the purpose of the interface.
>>> It appears that there are very few places where the object information
>>> is actually useful.
>>
>> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
> 
> I'm not disagreeing with that. What I'm saying is that the capability
> check interface is not the right place to pass that information. The
> capability check has no use for the object information. I would much
> rather see a security_pass_object_data() hook that gets called after
> (or before) the security_capable() hook in the places where you want
> the extra information.

I don't see how that would work or be advantageous.  Within the capable 
hook, the security module(s) decide whether or not to allow the 
capability, and generate any relevant LSM audit records.  It is 
precisely at those two points (deciding and auditing) that we need the 
information; both occur within the existing capable hooks.  Calling a 
separate hook before the capable call and e.g. saving the information in 
the task security structure for later consumption during the capable 
call offers only overhead, no benefit.  Calling a separate hook after 
the capable call is too late to be of use - the decision and auditing 
are already done.  And both hooks would be invoked from precisely the 
same function at the same point.  If the information wasn't already 
readily available at the point of the hook call, it might be a different 
matter, but that isn't the case here.

> 
>>>> In SELinux this new information is leveraged here to include the inode
>>>> in the audit message. In the future, it could also be used to perform
>>>> a per inode capability checks.
>>>
>>> I suggest that you want a mechanism for adding the inode information
>>> to the audit record instead.
>>
>> That is part of what we want, but not the entire picture.  But with respect to audit, the problem today is that one sees SELinux dac_read_search, dac_override, etc denials with no indication as to the particular file, which is unfortunate both from a security auditing perspective and from a policy development perspective.
> 
> I can see how that is a problem.
> 
>> The only option today to gain that information is by enabling system call audit and setting at least one audit filter so that the audit framework will collect that information and include it in the audit records that are emitted upon syscall exit after any such AVC denial.  Unfortunately, that is all disabled by default on most systems due to its associated performance overhead, and one doesn't even have the option of enabling it on some systems, e.g. Android devices.  And even when one can enable syscall audit, one must correlate the syscall audit record to the associated AVC record to identify the object rather than having the information directly included in the same record.
> 
> None of which gives any rationale for adding the information
> to the capability check. Sure, it's in the right place, but there
> is no object interaction with the capability call.

We introducing such an interaction - that's the point.

> 
>>> What would a "per inode" capability check be? Capability checks are
>>> process checks, not object checks.
>>
>> Ideally it would be possible to scope dac_override and other capabilities to specific objects rather than having to allow it for all or none.
> 
> That would require a major overhaul of the capability scheme,
> and you're going to get arguments from people like me about
> whether or not that would be ideal. Besides, isn't that what
> SELinux is all about, providing that sort of privilege granularity?

It only requires passing the information to the security modules at the 
point of the hook call, and then SELinux or other security modules can 
implement it themselves without any other changes to the kernel.  We 
aren't changing the way the base capabilities logic works.

We can't provide that degree of granularity without the additional 
information.  Let's say domain A needs DAC override for files of type B 
and for nothing else.  To support this requirement, SELinux policy has 
to include at least:
1) allow A self:capability dac_override;
2) allow A B:file { read write };

Let's say that A also reads from files of type C and writes to file of 
type D.  So SELinux policy also has to allow:
3) allow A C:file read;
4) allow A D:file write;

There are files within type C and within type D that are under different 
DAC ownerships or modes.  Only some of these files should be accessible 
to A.  But because dac_override is global and not scoped to specific 
sets of files, the combination of these permissions now allows A to 
override DAC on files of type C and of type D; thus it can read all 
files of type C and write to all files of type D even though it has no 
legitimate need to do so.


> 
>> Just because a process needs to override DAC on one file or one set of files is not a reason to allow it to do so on every file it can access at all.
> 
> That's an argument for privilege bracketing within the process.
> Not something I recommend (the oft referenced sendmail problems
> being but one example) but the only way to do it properly without
> delving into path based controls.

As you note, historically privilege bracketing hasn't worked so well, 
and fundamentally it puts the trust burden on userspace for something 
that could be enforced at the system level quite easily and in a 
race-free manner.

> 
>> If we want to apply least privilege, then this is a desirable facility.
> 
> The capability mechanism is object agnostic by design.

Some might argue that's a flawed design.

>> I understand that doesn't mesh with Smack's mental modelbut it would probably be useful to both SELinux and AppArmor, among others.
> 
> I'm perfectly happy to have the information transmitted.
> I think a separate interface for doing so is appropriate.

As above, I don't see any way to do that that isn't just adding overhead.
Casey Schaufler July 12, 2019, 10:37 p.m. UTC | #7
On 7/12/2019 1:21 PM, Stephen Smalley wrote:
> On 7/12/19 3:54 PM, Casey Schaufler wrote:
>> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
>>> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>>>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>>>> At present security_capable does not pass any object information
>>>>> and therefore can neither audit the particular object nor take it
>>>>> into account. Augment the security_capable interface to support
>>>>> passing supplementary data. Use this facility initially to convey
>>>>> the inode for capability checks relevant to inodes. This only
>>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>>> future, this will be further extended to pass object information for
>>>>> other capability checks such as the target task for CAP_KILL.
>>>>
>>>> This seems wrong to me. The capability system has nothing to do
>>>> with objects. Passing object information through security_capable()
>>>> may be convenient, but isn't relevant to the purpose of the interface.
>>>> It appears that there are very few places where the object information
>>>> is actually useful.
>>>
>>> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
>>
>> I'm not disagreeing with that. What I'm saying is that the capability
>> check interface is not the right place to pass that information. The
>> capability check has no use for the object information. I would much
>> rather see a security_pass_object_data() hook that gets called after
>> (or before) the security_capable() hook in the places where you want
>> the extra information.
>
> I don't see how that would work or be advantageous.  Within the capable hook, the security module(s) decide whether or not to allow the capability, and generate any relevant LSM audit records.  It is precisely at those two points (deciding and auditing) that we need the information; both occur within the existing capable hooks.  Calling a separate hook before the capable call and e.g. saving the information in the task security structure for later consumption during the capable call offers only overhead, no benefit.  Calling a separate hook after the capable call is too late to be of use - the decision and auditing are already done.  And both hooks would be invoked from precisely the same function at the same point.  If the information wasn't already readily available at the point of the hook call, it might be a different matter, but that isn't the case here.

If there's a problem with the audit system you should be
looking at fixing that rather than tacking information that's
useless for capabilities onto it's interfaces.

>>>>> In SELinux this new information is leveraged here to include the inode
>>>>> in the audit message. In the future, it could also be used to perform
>>>>> a per inode capability checks.
>>>>
>>>> I suggest that you want a mechanism for adding the inode information
>>>> to the audit record instead.
>>>
>>> That is part of what we want, but not the entire picture.  But with respect to audit, the problem today is that one sees SELinux dac_read_search, dac_override, etc denials with no indication as to the particular file, which is unfortunate both from a security auditing perspective and from a policy development perspective.
>>
>> I can see how that is a problem.
>>
>>> The only option today to gain that information is by enabling system call audit and setting at least one audit filter so that the audit framework will collect that information and include it in the audit records that are emitted upon syscall exit after any such AVC denial.  Unfortunately, that is all disabled by default on most systems due to its associated performance overhead, and one doesn't even have the option of enabling it on some systems, e.g. Android devices.  And even when one can enable syscall audit, one must correlate the syscall audit record to the associated AVC record to identify the object rather than having the information directly included in the same record.
>>
>> None of which gives any rationale for adding the information
>> to the capability check. Sure, it's in the right place, but there
>> is no object interaction with the capability call.
>
> We introducing such an interaction - that's the point.

And I say that you're breaking the layering.

>>>> What would a "per inode" capability check be? Capability checks are
>>>> process checks, not object checks.
>>>
>>> Ideally it would be possible to scope dac_override and other capabilities to specific objects rather than having to allow it for all or none.
>>
>> That would require a major overhaul of the capability scheme,
>> and you're going to get arguments from people like me about
>> whether or not that would be ideal. Besides, isn't that what
>> SELinux is all about, providing that sort of privilege granularity?
>
> It only requires passing the information to the security modules at the point of the hook call, and then SELinux or other security modules can implement it themselves without any other changes to the kernel.  We aren't changing the way the base capabilities logic works.

If you're not changing how capabilities work you
shouldn't be adding parameters to its interface.

> We can't provide that degree of granularity without the additional information.  Let's say domain A needs DAC override for files of type B and for nothing else.  To support this requirement, SELinux policy has to include at least:
> 1) allow A self:capability dac_override;
> 2) allow A B:file { read write };
>
> Let's say that A also reads from files of type C and writes to file of type D.  So SELinux policy also has to allow:
> 3) allow A C:file read;
> 4) allow A D:file write;
>
> There are files within type C and within type D that are under different DAC ownerships or modes.  Only some of these files should be accessible to A.  But because dac_override is global and not scoped to specific sets of files, the combination of these permissions now allows A to override DAC on files of type C and of type D; thus it can read all files of type C and write to all files of type D even though it has no legitimate need to do so.

Capabilities do not implement a fine grained policy.
This has been lamented ad nauseam regarding CAP_SYS_ADMIN.
It is also irrelevant to the issue here. 

>>> Just because a process needs to override DAC on one file or one set of files is not a reason to allow it to do so on every file it can access at all.
>>
>> That's an argument for privilege bracketing within the process.
>> Not something I recommend (the oft referenced sendmail problems
>> being but one example) but the only way to do it properly without
>> delving into path based controls.
>
> As you note, historically privilege bracketing hasn't worked so well, and fundamentally it puts the trust burden on userspace for something that could be enforced at the system level quite easily and in a race-free manner.

I'm not arguing against that goal, I'm saying that you're
going about it the wrong way.


>>> If we want to apply least privilege, then this is a desirable facility.
>>
>> The capability mechanism is object agnostic by design.
>
> Some might argue that's a flawed design.

Might? Show me someone who's looked at capabilities who
doesn't think the design is flawed! The entire POSIX 1003.1e/2c
group knew it was flawed, which is the primary reason it never
got past DRAFT status. If capabilities had been perfect no one
would ever have been tempted to add domain enforcement.

But there it is. That is not an excuse to muddle it up
with pass-through parameters in an effort to wiggle around
the issues in other facilities.

>>> I understand that doesn't mesh with Smack's mental modelbut it would probably be useful to both SELinux and AppArmor, among others.
>>
>> I'm perfectly happy to have the information transmitted.
>> I think a separate interface for doing so is appropriate.
>
> As above, I don't see any way to do that that isn't just adding overhead.

I'll see if I can squeeze some time into alternatives, but
my brain is already wrestling with audit issues of my own.
James Morris July 13, 2019, 4:29 a.m. UTC | #8
On Fri, 12 Jul 2019, Casey Schaufler wrote:

> I'm not disagreeing with that. What I'm saying is that the capability
> check interface is not the right place to pass that information. The
> capability check has no use for the object information. I would much
> rather see a security_pass_object_data() hook that gets called after
> (or before) the security_capable() hook in the places where you want
> the extra information.

Extending existing security models is a core feature of the LSM framework. 

The Linux capability code has no use for object metadata by design, but 
extending that model to MAC (and other models) via LSM hooks is well 
within scope and of course already happening e.g. mediating Linux 
capabilities wrt SELinux subject types. Adding object metadata extends the 
function of the capability hook along these lines, so that more effective 
MAC policies may be implemented by LSMs.
James Morris July 13, 2019, 4:35 a.m. UTC | #9
On Fri, 12 Jul 2019, Stephen Smalley wrote:

> > > If we want to apply least privilege, then this is a desirable facility.
> > 
> > The capability mechanism is object agnostic by design.
> 
> Some might argue that's a flawed design.

Narrator: it's a flawed design.

> > > I understand that doesn't mesh with Smack's mental modelbut it would
> > > probably be useful to both SELinux and AppArmor, among others.
> > 
> > I'm perfectly happy to have the information transmitted.
> > I think a separate interface for doing so is appropriate.
> 
> As above, I don't see any way to do that that isn't just adding overhead.
> 

Agreed, and even so, part of the point of LSM is to allow existing 
security models to be extended to meet a wider range of security 
requirements.
Casey Schaufler July 13, 2019, 6:46 p.m. UTC | #10
On 7/12/2019 9:35 PM, James Morris wrote:
> On Fri, 12 Jul 2019, Stephen Smalley wrote:
>
>>>> If we want to apply least privilege, then this is a desirable facility.
>>> The capability mechanism is object agnostic by design.
>> Some might argue that's a flawed design.
> Narrator: it's a flawed design.
>
>>>> I understand that doesn't mesh with Smack's mental modelbut it would
>>>> probably be useful to both SELinux and AppArmor, among others.
>>> I'm perfectly happy to have the information transmitted.
>>> I think a separate interface for doing so is appropriate.
>> As above, I don't see any way to do that that isn't just adding overhead.
>>
> Agreed, and even so, part of the point of LSM is to allow existing 
> security models to be extended to meet a wider range of security 
> requirements.

We bow to the wisdom of the Maintainer.
Richard Guy Briggs July 15, 2019, 6:42 p.m. UTC | #11
On 2019-07-12 14:02, Stephen Smalley wrote:
> On 7/12/19 1:50 PM, James Morris wrote:
> > On Fri, 12 Jul 2019, Nicholas Franck wrote:
> > 

This aggressive trimming dropped a bit of helpful context:

+++ b/security/lsm_audit.c
@@ -229,9 +229,26 @@ static void dump_common_audit_data(struct
audit_buffer *ab,
        case LSM_AUDIT_DATA_IPC:
                audit_log_format(ab, " key=%d ", a->u.ipc_id);
                break;
-       case LSM_AUDIT_DATA_CAP:
-               audit_log_format(ab, " capability=%d ", a->u.cap);

> > > +	case LSM_AUDIT_DATA_CAP: {
> > > +		const struct inode *inode;
> > > +
> > > +		if (a->u.cap_struct.cad) {
> > > +			switch (a->u.cap_struct.cad->type) {
> > > +			case CAP_AUX_DATA_INODE: {
> > > +				inode = a->u.cap_struct.cad->u.inode;
> > > +
> > > +				audit_log_format(ab, " dev=");
> > > +				audit_log_untrustedstring(ab,
> > > +					inode->i_sb->s_id);
> > > +				audit_log_format(ab, " ino=%lu",
> > > +					inode->i_ino);
> > > +				break;
> > > +			}
> > > +			}
> > > +		}
> > > +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
> > >   		break;

+       }
        case LSM_AUDIT_DATA_PATH: {
                struct inode *inode;

> > 
> > Will this break any existing userspace log parsers?
> 
> I'm hoping not given that we are only adding auxiliary fields and those are
> already defined for other AVC audit messages.  ausearch appeared to work
> fine.  Added the linux-audit mailing list to the cc line to get their view.

Generally, additional or optional fields should only be added after
existing ones.  Also, generally, fields should not be optional, but this
is tricky since it gobbles network and cpu bandwidth and disk space and
there are lots of precedents to contradict this.


- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
Serge E. Hallyn July 16, 2019, 2:03 p.m. UTC | #12
On Fri, Jul 12, 2019 at 12:54:02PM -0700, Casey Schaufler wrote:
> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
> > On 7/12/19 1:58 PM, Casey Schaufler wrote:
> >> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
> >>> At present security_capable does not pass any object information
> >>> and therefore can neither audit the particular object nor take it
> >>> into account. Augment the security_capable interface to support
> >>> passing supplementary data. Use this facility initially to convey
> >>> the inode for capability checks relevant to inodes. This only
> >>> addresses capable_wrt_inode_uidgid calls; other capability checks
> >>> relevant to inodes will be addressed in subsequent changes. In the
> >>> future, this will be further extended to pass object information for
> >>> other capability checks such as the target task for CAP_KILL.
> >>
> >> This seems wrong to me. The capability system has nothing to do
> >> with objects. Passing object information through security_capable()
> >> may be convenient, but isn't relevant to the purpose of the interface.
> >> It appears that there are very few places where the object information
> >> is actually useful.
> >
> > A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
> 
> I'm not disagreeing with that. What I'm saying is that the capability
> check interface is not the right place to pass that information. The
> capability check has no use for the object information. I would much

I've had to argue this before while doing the namespaced file
capabilities implementation.  Perhaps this would be worth writing something
more formal about.  My main argument is, even if we want to claim that the
capabilities model is and should be object agnostic, the implementation
of user namespaces (currently) is such that the whole view of the user's
privilege must include information which is stored with the object.

There are various user namespaces.

The Linux capabilities ( :-) ) model is user namespaced.  It must be, in
order to be useful.  If we're going to use file capabilities in distros,
and distros are going to run in containers, then the capabilities must
be namespaced.  Otherwise, capabilities will not be used, and heck, should
just be dropped.

The only way to find out which user namespace has privilege over an inode
is to look at the inode.

Therefore, object information is needed.

Until now we've sneaked around that by doing things like capable_wrt_inode_uidgid()
and rootid_from_xattr().

Again, this crucial: IMO, you have to be able to use a distro the same way in a
container and not.  Either we support using capabilities in a user namespaced
container, or we drop capabilities support will not be used, and we may as
well drop the module.

Now, yes, if someone tries to extend this stuff to do pathname parsing, then we
might have to put our footsies down.  But we've been dancing around this for a
long time anyway, so passing the inode in so we can do better logging gets a +1
from me.

-serge
Serge E. Hallyn July 16, 2019, 2:16 p.m. UTC | #13
On Fri, Jul 12, 2019 at 01:34:04PM -0400, Nicholas Franck wrote:
> At present security_capable does not pass any object information
> and therefore can neither audit the particular object nor take it
> into account. Augment the security_capable interface to support
> passing supplementary data. Use this facility initially to convey
> the inode for capability checks relevant to inodes. This only
> addresses capable_wrt_inode_uidgid calls; other capability checks
> relevant to inodes will be addressed in subsequent changes. In the
> future, this will be further extended to pass object information for
> other capability checks such as the target task for CAP_KILL.
> 
> In SELinux this new information is leveraged here to include the inode
> in the audit message. In the future, it could also be used to perform
> a per inode capability checks.
> 
> It would be possible to fold the existing opts argument into this new
> supplementary data structure. This was omitted from this change to
> minimize changes.
> 
> Signed-off-by: Nicholas Franck <nhfran2@tycho.nsa.gov>

One comment below, but overall

Acked-by: Serge Hallyn <serge@hallyn.com>

> ---
>  include/linux/capability.h             |  7 ++++++
>  include/linux/lsm_audit.h              |  5 +++-
>  include/linux/lsm_hooks.h              |  3 ++-
>  include/linux/security.h               | 23 +++++++++++++-----
>  kernel/capability.c                    | 33 ++++++++++++++++++--------
>  kernel/seccomp.c                       |  2 +-
>  security/apparmor/capability.c         |  8 ++++---
>  security/apparmor/include/capability.h |  4 +++-
>  security/apparmor/ipc.c                |  2 +-
>  security/apparmor/lsm.c                |  5 ++--
>  security/apparmor/resource.c           |  2 +-
>  security/commoncap.c                   | 11 +++++----
>  security/lsm_audit.c                   | 21 ++++++++++++++--
>  security/safesetid/lsm.c               |  3 ++-
>  security/security.c                    |  5 ++--
>  security/selinux/hooks.c               | 20 +++++++++-------
>  security/smack/smack_access.c          |  2 +-
>  17 files changed, 110 insertions(+), 46 deletions(-)
> 
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..f72de64c179d 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -211,6 +211,8 @@ extern bool capable(int cap);
>  extern bool ns_capable(struct user_namespace *ns, int cap);
>  extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
>  extern bool ns_capable_setid(struct user_namespace *ns, int cap);
> +extern bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			     const struct inode *inode);
>  #else
>  static inline bool has_capability(struct task_struct *t, int cap)
>  {
> @@ -246,6 +248,11 @@ static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
>  {
>  	return true;
>  }
> +static bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			     const struct inode *inode)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_MULTIUSER */
>  extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
>  extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> index 915330abf6e5..15d2a62639f0 100644
> --- a/include/linux/lsm_audit.h
> +++ b/include/linux/lsm_audit.h
> @@ -79,7 +79,10 @@ struct common_audit_data {
>  		struct dentry *dentry;
>  		struct inode *inode;
>  		struct lsm_network_audit *net;
> -		int cap;
> +		struct  {
> +			int cap;
> +			struct cap_aux_data *cad;
> +		} cap_struct;
>  		int ipc_id;
>  		struct task_struct *tsk;
>  #ifdef CONFIG_KEYS
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cfb6a19..b2a37d613030 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1469,7 +1469,8 @@ union security_list_options {
>  	int (*capable)(const struct cred *cred,
>  			struct user_namespace *ns,
>  			int cap,
> -			unsigned int opts);
> +			unsigned int opts,
> +			struct cap_aux_data *cad);
>  	int (*quotactl)(int cmds, int type, int id, struct super_block *sb);
>  	int (*quota_on)(struct dentry *dentry);
>  	int (*syslog)(int type);
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 659071c2e57c..8fce5e69dc52 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -77,9 +77,18 @@ enum lsm_event {
>  	LSM_POLICY_CHANGE,
>  };
>  
> +
> +struct cap_aux_data {
> +	char type;
> +#define CAP_AUX_DATA_INODE	1
> +	union	{
> +		const struct inode *inode;
> +	} u;
> +};
> +
>  /* These functions are in security/commoncap.c */
>  extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
> -		       int cap, unsigned int opts);
> +		       int cap, unsigned int opts, struct cap_aux_data *cad);
>  extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz);
>  extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
>  extern int cap_ptrace_traceme(struct task_struct *parent);
> @@ -215,9 +224,10 @@ int security_capset(struct cred *new, const struct cred *old,
>  		    const kernel_cap_t *inheritable,
>  		    const kernel_cap_t *permitted);
>  int security_capable(const struct cred *cred,
> -		       struct user_namespace *ns,
> -		       int cap,
> -		       unsigned int opts);
> +		     struct user_namespace *ns,
> +		     int cap,
> +		     unsigned int opts,
> +		     struct cap_aux_data *cad);
>  int security_quotactl(int cmds, int type, int id, struct super_block *sb);
>  int security_quota_on(struct dentry *dentry);
>  int security_syslog(int type);
> @@ -478,9 +488,10 @@ static inline int security_capset(struct cred *new,
>  static inline int security_capable(const struct cred *cred,
>  				   struct user_namespace *ns,
>  				   int cap,
> -				   unsigned int opts)
> +				   unsigned int opts,
> +				   struct cap_aux_data *cad)
>  {
> -	return cap_capable(cred, ns, cap, opts);
> +	return cap_capable(cred, ns, cap, opts, NULL);
>  }
>  
>  static inline int security_quotactl(int cmds, int type, int id,
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 1444f3954d75..c3723443904a 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -297,7 +297,7 @@ bool has_ns_capability(struct task_struct *t,
>  	int ret;
>  
>  	rcu_read_lock();
> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE);
> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE, NULL);
>  	rcu_read_unlock();
>  
>  	return (ret == 0);
> @@ -338,7 +338,7 @@ bool has_ns_capability_noaudit(struct task_struct *t,
>  	int ret;
>  
>  	rcu_read_lock();
> -	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT);
> +	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT, NULL);
>  	rcu_read_unlock();
>  
>  	return (ret == 0);
> @@ -363,7 +363,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
>  
>  static bool ns_capable_common(struct user_namespace *ns,
>  			      int cap,
> -			      unsigned int opts)
> +			      unsigned int opts,
> +			      struct cap_aux_data *cad)
>  {
>  	int capable;
>  
> @@ -372,7 +373,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>  		BUG();
>  	}
>  
> -	capable = security_capable(current_cred(), ns, cap, opts);
> +	capable = security_capable(current_cred(), ns, cap, opts, cad);
>  	if (capable == 0) {
>  		current->flags |= PF_SUPERPRIV;
>  		return true;
> @@ -393,7 +394,7 @@ static bool ns_capable_common(struct user_namespace *ns,
>   */
>  bool ns_capable(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_NONE);
> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable);
>  
> @@ -411,7 +412,7 @@ EXPORT_SYMBOL(ns_capable);
>   */
>  bool ns_capable_noaudit(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT);
> +	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable_noaudit);
>  
> @@ -430,7 +431,7 @@ EXPORT_SYMBOL(ns_capable_noaudit);
>   */
>  bool ns_capable_setid(struct user_namespace *ns, int cap)
>  {
> -	return ns_capable_common(ns, cap, CAP_OPT_INSETID);
> +	return ns_capable_common(ns, cap, CAP_OPT_INSETID, NULL);
>  }
>  EXPORT_SYMBOL(ns_capable_setid);
>  
> @@ -470,7 +471,7 @@ bool file_ns_capable(const struct file *file, struct user_namespace *ns,
>  	if (WARN_ON_ONCE(!cap_valid(cap)))
>  		return false;
>  
> -	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE) == 0)
> +	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE, NULL) == 0)
>  		return true;
>  
>  	return false;
> @@ -503,7 +504,8 @@ bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
>  {
>  	struct user_namespace *ns = current_user_ns();
>  
> -	return ns_capable(ns, cap) && privileged_wrt_inode_uidgid(ns, inode);
> +	return ns_capable_inode(ns, cap, inode) &&
> +		privileged_wrt_inode_uidgid(ns, inode);
>  }
>  EXPORT_SYMBOL(capable_wrt_inode_uidgid);
>  
> @@ -524,7 +526,18 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
>  	cred = rcu_dereference(tsk->ptracer_cred);
>  	if (cred)
>  		ret = security_capable(cred, ns, CAP_SYS_PTRACE,
> -				       CAP_OPT_NOAUDIT);
> +				       CAP_OPT_NOAUDIT, NULL);
>  	rcu_read_unlock();
>  	return (ret == 0);
>  }
> +
> +bool ns_capable_inode(struct user_namespace *ns, int cap,
> +			const struct inode *inode)
> +{
> +	struct cap_aux_data cad;
> +
> +	cad.type = CAP_AUX_DATA_INODE;
> +	cad.u.inode = inode;
> +	return ns_capable_common(ns, cap, CAP_OPT_NONE, &cad);
> +}
> +EXPORT_SYMBOL(ns_capable_inode);
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 811b4a86cdf6..d59dd7079ece 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -446,7 +446,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>  	 */
>  	if (!task_no_new_privs(current) &&
>  	    security_capable(current_cred(), current_user_ns(),
> -				     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) != 0)
> +			     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) != 0)
>  		return ERR_PTR(-EACCES);
>  
>  	/* Allocate a new seccomp_filter */
> diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
> index 752f73980e30..c45192a16733 100644
> --- a/security/apparmor/capability.c
> +++ b/security/apparmor/capability.c
> @@ -50,7 +50,7 @@ static void audit_cb(struct audit_buffer *ab, void *va)
>  	struct common_audit_data *sa = va;
>  
>  	audit_log_format(ab, " capname=");
> -	audit_log_untrustedstring(ab, capability_names[sa->u.cap]);
> +	audit_log_untrustedstring(ab, capability_names[sa->u.cap_struct.cap]);
>  }
>  
>  /**
> @@ -148,13 +148,15 @@ static int profile_capable(struct aa_profile *profile, int cap,
>   *
>   * Returns: 0 on success, or else an error code.
>   */
> -int aa_capable(struct aa_label *label, int cap, unsigned int opts)
> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
> +	       struct cap_aux_data *cad)
>  {
>  	struct aa_profile *profile;
>  	int error = 0;
>  	DEFINE_AUDIT_DATA(sa, LSM_AUDIT_DATA_CAP, OP_CAPABLE);
>  
> -	sa.u.cap = cap;
> +	sa.u.cap_struct.cap = cap;
> +	sa.u.cap_struct.cad = cad;
>  	error = fn_for_each_confined(label, profile,
>  			profile_capable(profile, cap, opts, &sa));
>  
> diff --git a/security/apparmor/include/capability.h b/security/apparmor/include/capability.h
> index 1b3663b6ab12..d888f09d76d1 100644
> --- a/security/apparmor/include/capability.h
> +++ b/security/apparmor/include/capability.h
> @@ -20,6 +20,7 @@
>  #include "apparmorfs.h"
>  
>  struct aa_label;
> +struct cap_aux_data;
>  
>  /* aa_caps - confinement data for capabilities
>   * @allowed: capabilities mask
> @@ -40,7 +41,8 @@ struct aa_caps {
>  
>  extern struct aa_sfs_entry aa_sfs_entry_caps[];
>  
> -int aa_capable(struct aa_label *label, int cap, unsigned int opts);
> +int aa_capable(struct aa_label *label, int cap, unsigned int opts,
> +	       struct cap_aux_data *cad);
>  
>  static inline void aa_free_cap_rules(struct aa_caps *caps)
>  {
> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> index aacd1e95cb59..deb5267ca695 100644
> --- a/security/apparmor/ipc.c
> +++ b/security/apparmor/ipc.c
> @@ -108,7 +108,7 @@ static int profile_tracer_perm(struct aa_profile *tracer,
>  	aad(sa)->peer = tracee;
>  	aad(sa)->request = 0;
>  	aad(sa)->error = aa_capable(&tracer->label, CAP_SYS_PTRACE,
> -				    CAP_OPT_NONE);
> +				    CAP_OPT_NONE, NULL);
>  
>  	return aa_audit(AUDIT_APPARMOR_AUTO, tracer, sa, audit_ptrace_cb);
>  }
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index 87500bde5a92..82790accb679 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -172,14 +172,15 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
>  }
>  
>  static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
> -			    int cap, unsigned int opts)
> +			    int cap, unsigned int opts,
> +			    struct cap_aux_data *cad)
>  {
>  	struct aa_label *label;
>  	int error = 0;
>  
>  	label = aa_get_newest_cred_label(cred);
>  	if (!unconfined(label))
> -		error = aa_capable(label, cap, opts);
> +		error = aa_capable(label, cap, opts, cad);
>  	aa_put_label(label);
>  
>  	return error;
> diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
> index 552ed09cb47e..9b3d4b4437f2 100644
> --- a/security/apparmor/resource.c
> +++ b/security/apparmor/resource.c
> @@ -124,7 +124,7 @@ int aa_task_setrlimit(struct aa_label *label, struct task_struct *task,
>  	 */
>  
>  	if (label != peer &&
> -	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT) != 0)
> +	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT, NULL) != 0)
>  		error = fn_for_each(label, profile,
>  				audit_resource(profile, resource,
>  					       new_rlim->rlim_max, peer,
> diff --git a/security/commoncap.c b/security/commoncap.c
> index c477fb673701..1860ea50f473 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -68,7 +68,7 @@ static void warn_setuid_and_fcaps_mixed(const char *fname)
>   * kernel's capable() and has_capability() returns 1 for this case.
>   */
>  int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
> -		int cap, unsigned int opts)
> +		int cap, unsigned int opts, struct cap_aux_data *cad)
>  {
>  	struct user_namespace *ns = targ_ns;
>  
> @@ -226,7 +226,7 @@ static inline int cap_inh_is_capped(void)
>  	 * capability
>  	 */
>  	if (cap_capable(current_cred(), current_cred()->user_ns,
> -			CAP_SETPCAP, CAP_OPT_NONE) == 0)
> +			CAP_SETPCAP, CAP_OPT_NONE, NULL) == 0)
>  		return 0;
>  	return 1;
>  }
> @@ -1211,7 +1211,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  		    || (cap_capable(current_cred(),
>  				    current_cred()->user_ns,
>  				    CAP_SETPCAP,
> -				    CAP_OPT_NONE) != 0)			/*[4]*/
> +				    CAP_OPT_NONE,
> +				    NULL) != 0)				/*[4]*/
>  			/*
>  			 * [1] no changing of bits that are locked
>  			 * [2] no unlocking of locks
> @@ -1307,7 +1308,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
>  	int cap_sys_admin = 0;
>  
>  	if (cap_capable(current_cred(), &init_user_ns,
> -				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) == 0)
> +				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) == 0)
>  		cap_sys_admin = 1;
>  
>  	return cap_sys_admin;
> @@ -1328,7 +1329,7 @@ int cap_mmap_addr(unsigned long addr)
>  
>  	if (addr < dac_mmap_min_addr) {
>  		ret = cap_capable(current_cred(), &init_user_ns, CAP_SYS_RAWIO,
> -				  CAP_OPT_NONE);
> +				  CAP_OPT_NONE, NULL);
>  		/* set PF_SUPERPRIV if it turns out we allow the low mmap */
>  		if (ret == 0)
>  			current->flags |= PF_SUPERPRIV;
> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 33028c098ef3..4871b2508a4a 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -229,9 +229,26 @@ static void dump_common_audit_data(struct audit_buffer *ab,
>  	case LSM_AUDIT_DATA_IPC:
>  		audit_log_format(ab, " key=%d ", a->u.ipc_id);
>  		break;
> -	case LSM_AUDIT_DATA_CAP:
> -		audit_log_format(ab, " capability=%d ", a->u.cap);
> +	case LSM_AUDIT_DATA_CAP: {
> +		const struct inode *inode;
> +
> +		if (a->u.cap_struct.cad) {
> +			switch (a->u.cap_struct.cad->type) {
> +			case CAP_AUX_DATA_INODE: {

Putting a { after the 'case:' seems weird to me, and leads to the suspicion
arousing two brackets in a row on same indent level down below.

> +				inode = a->u.cap_struct.cad->u.inode;
> +
> +				audit_log_format(ab, " dev=");
> +				audit_log_untrustedstring(ab,
> +					inode->i_sb->s_id);
> +				audit_log_format(ab, " ino=%lu",
> +					inode->i_ino);
> +				break;
> +			}
> +			}
> +		}
> +		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
>  		break;
> +	}
>  	case LSM_AUDIT_DATA_PATH: {
>  		struct inode *inode;
>
Andy Lutomirski July 16, 2019, 2:21 p.m. UTC | #14
On Tue, Jul 16, 2019 at 7:03 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Fri, Jul 12, 2019 at 12:54:02PM -0700, Casey Schaufler wrote:
> > On 7/12/2019 11:25 AM, Stephen Smalley wrote:
> > > On 7/12/19 1:58 PM, Casey Schaufler wrote:
> > >> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
> > >>> At present security_capable does not pass any object information
> > >>> and therefore can neither audit the particular object nor take it
> > >>> into account. Augment the security_capable interface to support
> > >>> passing supplementary data. Use this facility initially to convey
> > >>> the inode for capability checks relevant to inodes. This only
> > >>> addresses capable_wrt_inode_uidgid calls; other capability checks
> > >>> relevant to inodes will be addressed in subsequent changes. In the
> > >>> future, this will be further extended to pass object information for
> > >>> other capability checks such as the target task for CAP_KILL.
> > >>
> > >> This seems wrong to me. The capability system has nothing to do
> > >> with objects. Passing object information through security_capable()
> > >> may be convenient, but isn't relevant to the purpose of the interface.
> > >> It appears that there are very few places where the object information
> > >> is actually useful.
> > >
> > > A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
> >
> > I'm not disagreeing with that. What I'm saying is that the capability
> > check interface is not the right place to pass that information. The
> > capability check has no use for the object information. I would much
>
> I've had to argue this before while doing the namespaced file
> capabilities implementation.  Perhaps this would be worth writing something
> more formal about.  My main argument is, even if we want to claim that the
> capabilities model is and should be object agnostic, the implementation
> of user namespaces (currently) is such that the whole view of the user's
> privilege must include information which is stored with the object.
>
> There are various user namespaces.
>
> The Linux capabilities ( :-) ) model is user namespaced.  It must be, in
> order to be useful.  If we're going to use file capabilities in distros,
> and distros are going to run in containers, then the capabilities must
> be namespaced.  Otherwise, capabilities will not be used, and heck, should
> just be dropped.
>
> The only way to find out which user namespace has privilege over an inode
> is to look at the inode.
>
> Therefore, object information is needed.

Agreed.  The concept in the kernel is "capability over a namespace."

That being said, sticking a flexible object type into ns_capable()
seems prematurely general to me.  How about adding
security_capable_wrt_inode_uidgid() and allowing LSMs to hook that?
The current implementation would go into commoncap.  The obvious
extensions I can think of are security_dac_read_search(..., inode,
...) and security_dac_override(..., inode, ...).  (Or dentry or
whatever is appropriate.)

If this patch were restructured like that, the semantics would be
obvious, and it would arguably be a genuine cleanup instead of a whole
new mechanism of unknown scope.
Casey Schaufler July 16, 2019, 2:43 p.m. UTC | #15
On 7/16/2019 7:03 AM, Serge E. Hallyn wrote:
> On Fri, Jul 12, 2019 at 12:54:02PM -0700, Casey Schaufler wrote:
>> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
>>> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>>>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>>>> At present security_capable does not pass any object information
>>>>> and therefore can neither audit the particular object nor take it
>>>>> into account. Augment the security_capable interface to support
>>>>> passing supplementary data. Use this facility initially to convey
>>>>> the inode for capability checks relevant to inodes. This only
>>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>>> future, this will be further extended to pass object information for
>>>>> other capability checks such as the target task for CAP_KILL.
>>>> This seems wrong to me. The capability system has nothing to do
>>>> with objects. Passing object information through security_capable()
>>>> may be convenient, but isn't relevant to the purpose of the interface.
>>>> It appears that there are very few places where the object information
>>>> is actually useful.
>>> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
>> I'm not disagreeing with that. What I'm saying is that the capability
>> check interface is not the right place to pass that information. The
>> capability check has no use for the object information. I would much
> I've had to argue this before while doing the namespaced file
> capabilities implementation.  Perhaps this would be worth writing something
> more formal about.  My main argument is, even if we want to claim that the
> capabilities model is and should be object agnostic, the implementation
> of user namespaces (currently) is such that the whole view of the user's
> privilege must include information which is stored with the object.
>
> There are various user namespaces.
>
> The Linux capabilities ( :-) ) model is user namespaced.  It must be, in
> order to be useful.  If we're going to use file capabilities in distros,
> and distros are going to run in containers, then the capabilities must
> be namespaced.  Otherwise, capabilities will not be used, and heck, should
> just be dropped.
>
> The only way to find out which user namespace has privilege over an inode
> is to look at the inode.
>
> Therefore, object information is needed.
>
> Until now we've sneaked around that by doing things like capable_wrt_inode_uidgid()
> and rootid_from_xattr().
>
> Again, this crucial: IMO, you have to be able to use a distro the same way in a
> container and not.  Either we support using capabilities in a user namespaced
> container, or we drop capabilities support will not be used, and we may as
> well drop the module.
>
> Now, yes, if someone tries to extend this stuff to do pathname parsing, then we
> might have to put our footsies down.  But we've been dancing around this for a
> long time anyway, so passing the inode in so we can do better logging gets a +1
> from me.

I shake my head and sigh, but as I don't have a better
solution, nor the time to go looking for one, I'm not
going to place obstacles. That, and it's entirely possible
that my view is wrong.

>
> -serge
Casey Schaufler July 16, 2019, 3:03 p.m. UTC | #16
On 7/16/2019 7:21 AM, Andy Lutomirski wrote:
> On Tue, Jul 16, 2019 at 7:03 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>> On Fri, Jul 12, 2019 at 12:54:02PM -0700, Casey Schaufler wrote:
>>> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
>>>> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>>>>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>>>>> At present security_capable does not pass any object information
>>>>>> and therefore can neither audit the particular object nor take it
>>>>>> into account. Augment the security_capable interface to support
>>>>>> passing supplementary data. Use this facility initially to convey
>>>>>> the inode for capability checks relevant to inodes. This only
>>>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>>>> future, this will be further extended to pass object information for
>>>>>> other capability checks such as the target task for CAP_KILL.
>>>>> This seems wrong to me. The capability system has nothing to do
>>>>> with objects. Passing object information through security_capable()
>>>>> may be convenient, but isn't relevant to the purpose of the interface.
>>>>> It appears that there are very few places where the object information
>>>>> is actually useful.
>>>> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
>>> I'm not disagreeing with that. What I'm saying is that the capability
>>> check interface is not the right place to pass that information. The
>>> capability check has no use for the object information. I would much
>> I've had to argue this before while doing the namespaced file
>> capabilities implementation.  Perhaps this would be worth writing something
>> more formal about.  My main argument is, even if we want to claim that the
>> capabilities model is and should be object agnostic, the implementation
>> of user namespaces (currently) is such that the whole view of the user's
>> privilege must include information which is stored with the object.
>>
>> There are various user namespaces.
>>
>> The Linux capabilities ( :-) ) model is user namespaced.  It must be, in
>> order to be useful.  If we're going to use file capabilities in distros,
>> and distros are going to run in containers, then the capabilities must
>> be namespaced.  Otherwise, capabilities will not be used, and heck, should
>> just be dropped.
>>
>> The only way to find out which user namespace has privilege over an inode
>> is to look at the inode.
>>
>> Therefore, object information is needed.
> Agreed.  The concept in the kernel is "capability over a namespace."
>
> That being said, sticking a flexible object type into ns_capable()
> seems prematurely general to me.  How about adding
> security_capable_wrt_inode_uidgid() and allowing LSMs to hook that?
> The current implementation would go into commoncap.  The obvious
> extensions I can think of are security_dac_read_search(..., inode,
> ...) and security_dac_override(..., inode, ...).  (Or dentry or
> whatever is appropriate.)

Would you have an LSM interface for each capability then?
security_sysadmin()? security_chown()? Or do you want to add
security_hey_look_here_is_yet_another_special_case() for
each if () in the kernel?

Sorry, I got carried away. I've been wallowing in the LSM
for too long not to be sensitive to just how fragile the
whole thing is. Adding a bunch more single use interfaces
isn't going to help it be useful in the long run. Please,
let's not go hog wild adding LSM functions. Please.

>
> If this patch were restructured like that, the semantics would be
> obvious, and it would arguably be a genuine cleanup instead of a whole
> new mechanism of unknown scope.
Stephen Smalley July 16, 2019, 3:08 p.m. UTC | #17
On 7/16/19 10:21 AM, Andy Lutomirski wrote:
> On Tue, Jul 16, 2019 at 7:03 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>>
>> On Fri, Jul 12, 2019 at 12:54:02PM -0700, Casey Schaufler wrote:
>>> On 7/12/2019 11:25 AM, Stephen Smalley wrote:
>>>> On 7/12/19 1:58 PM, Casey Schaufler wrote:
>>>>> On 7/12/2019 10:34 AM, Nicholas Franck wrote:
>>>>>> At present security_capable does not pass any object information
>>>>>> and therefore can neither audit the particular object nor take it
>>>>>> into account. Augment the security_capable interface to support
>>>>>> passing supplementary data. Use this facility initially to convey
>>>>>> the inode for capability checks relevant to inodes. This only
>>>>>> addresses capable_wrt_inode_uidgid calls; other capability checks
>>>>>> relevant to inodes will be addressed in subsequent changes. In the
>>>>>> future, this will be further extended to pass object information for
>>>>>> other capability checks such as the target task for CAP_KILL.
>>>>>
>>>>> This seems wrong to me. The capability system has nothing to do
>>>>> with objects. Passing object information through security_capable()
>>>>> may be convenient, but isn't relevant to the purpose of the interface.
>>>>> It appears that there are very few places where the object information
>>>>> is actually useful.
>>>>
>>>> A fair number of capabilities are checked upon some attempted object access (often right after comparing UIDs or other per-object state), and the particular object can be very helpful in both audit and in access control.  More below.
>>>
>>> I'm not disagreeing with that. What I'm saying is that the capability
>>> check interface is not the right place to pass that information. The
>>> capability check has no use for the object information. I would much
>>
>> I've had to argue this before while doing the namespaced file
>> capabilities implementation.  Perhaps this would be worth writing something
>> more formal about.  My main argument is, even if we want to claim that the
>> capabilities model is and should be object agnostic, the implementation
>> of user namespaces (currently) is such that the whole view of the user's
>> privilege must include information which is stored with the object.
>>
>> There are various user namespaces.
>>
>> The Linux capabilities ( :-) ) model is user namespaced.  It must be, in
>> order to be useful.  If we're going to use file capabilities in distros,
>> and distros are going to run in containers, then the capabilities must
>> be namespaced.  Otherwise, capabilities will not be used, and heck, should
>> just be dropped.
>>
>> The only way to find out which user namespace has privilege over an inode
>> is to look at the inode.
>>
>> Therefore, object information is needed.
> 
> Agreed.  The concept in the kernel is "capability over a namespace."
> 
> That being said, sticking a flexible object type into ns_capable()
> seems prematurely general to me.  How about adding
> security_capable_wrt_inode_uidgid() and allowing LSMs to hook that?
> The current implementation would go into commoncap.  The obvious
> extensions I can think of are security_dac_read_search(..., inode,
> ...) and security_dac_override(..., inode, ...).  (Or dentry or
> whatever is appropriate.)

1) Not even all the inode-related capability checks go through 
capable_wrt_inode_uidgid(), so a hook there won't suffice.

2) Other capabilities have other kinds of objects, e.g. tasks, sysvipc, 
etc, and we'll want those to be handled too.

> 
> If this patch were restructured like that, the semantics would be
> obvious, and it would arguably be a genuine cleanup instead of a whole
> new mechanism of unknown scope.
>
Paul Moore July 24, 2019, 8:12 p.m. UTC | #18
On Fri, Jul 12, 2019 at 2:25 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> That is part of what we want, but not the entire picture.  But with
> respect to audit, the problem today is that one sees SELinux
> dac_read_search, dac_override, etc denials with no indication as to the
> particular file, which is unfortunate both from a security auditing
> perspective and from a policy development perspective.  The only option
> today to gain that information is by enabling system call audit and
> setting at least one audit filter so that the audit framework will
> collect that information and include it in the audit records that are
> emitted upon syscall exit after any such AVC denial.

It might be worth looking into adding some functionality to allow the
LSMs to have some control about when/where to enable auditing.  While
selectively adding records once the denial is triggered isn't going to
be possible (it's too late to collect some of the information we would
want, e.g. file information), it should be possible to allow the LSM
to trigger per-process audit record collection without too much
trouble.  We could potentially take it a step further and on those
processes which have had auditing enabled by the LSM we could allow
the LSM to select when these additional records would be emitted (e.g.
on access denial); you would still incur the collection overhead, but
the logs would be much less.

> And even when one can enable
> syscall audit, one must correlate the syscall audit record to the
> associated AVC record to identify the object rather than having the
> information directly included in the same record.

We've been doing a lot of work to associate related records so that
you don't have to deal with the problem you're describing.  If you are
still seeing problematic records on a modern kernel I would like to
hear about it.

Patch
diff mbox series

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..f72de64c179d 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -211,6 +211,8 @@  extern bool capable(int cap);
 extern bool ns_capable(struct user_namespace *ns, int cap);
 extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
 extern bool ns_capable_setid(struct user_namespace *ns, int cap);
+extern bool ns_capable_inode(struct user_namespace *ns, int cap,
+			     const struct inode *inode);
 #else
 static inline bool has_capability(struct task_struct *t, int cap)
 {
@@ -246,6 +248,11 @@  static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
 {
 	return true;
 }
+static bool ns_capable_inode(struct user_namespace *ns, int cap,
+			     const struct inode *inode)
+{
+	return true;
+}
 #endif /* CONFIG_MULTIUSER */
 extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 915330abf6e5..15d2a62639f0 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -79,7 +79,10 @@  struct common_audit_data {
 		struct dentry *dentry;
 		struct inode *inode;
 		struct lsm_network_audit *net;
-		int cap;
+		struct  {
+			int cap;
+			struct cap_aux_data *cad;
+		} cap_struct;
 		int ipc_id;
 		struct task_struct *tsk;
 #ifdef CONFIG_KEYS
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cfb6a19..b2a37d613030 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1469,7 +1469,8 @@  union security_list_options {
 	int (*capable)(const struct cred *cred,
 			struct user_namespace *ns,
 			int cap,
-			unsigned int opts);
+			unsigned int opts,
+			struct cap_aux_data *cad);
 	int (*quotactl)(int cmds, int type, int id, struct super_block *sb);
 	int (*quota_on)(struct dentry *dentry);
 	int (*syslog)(int type);
diff --git a/include/linux/security.h b/include/linux/security.h
index 659071c2e57c..8fce5e69dc52 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -77,9 +77,18 @@  enum lsm_event {
 	LSM_POLICY_CHANGE,
 };
 
+
+struct cap_aux_data {
+	char type;
+#define CAP_AUX_DATA_INODE	1
+	union	{
+		const struct inode *inode;
+	} u;
+};
+
 /* These functions are in security/commoncap.c */
 extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
-		       int cap, unsigned int opts);
+		       int cap, unsigned int opts, struct cap_aux_data *cad);
 extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz);
 extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
 extern int cap_ptrace_traceme(struct task_struct *parent);
@@ -215,9 +224,10 @@  int security_capset(struct cred *new, const struct cred *old,
 		    const kernel_cap_t *inheritable,
 		    const kernel_cap_t *permitted);
 int security_capable(const struct cred *cred,
-		       struct user_namespace *ns,
-		       int cap,
-		       unsigned int opts);
+		     struct user_namespace *ns,
+		     int cap,
+		     unsigned int opts,
+		     struct cap_aux_data *cad);
 int security_quotactl(int cmds, int type, int id, struct super_block *sb);
 int security_quota_on(struct dentry *dentry);
 int security_syslog(int type);
@@ -478,9 +488,10 @@  static inline int security_capset(struct cred *new,
 static inline int security_capable(const struct cred *cred,
 				   struct user_namespace *ns,
 				   int cap,
-				   unsigned int opts)
+				   unsigned int opts,
+				   struct cap_aux_data *cad)
 {
-	return cap_capable(cred, ns, cap, opts);
+	return cap_capable(cred, ns, cap, opts, NULL);
 }
 
 static inline int security_quotactl(int cmds, int type, int id,
diff --git a/kernel/capability.c b/kernel/capability.c
index 1444f3954d75..c3723443904a 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -297,7 +297,7 @@  bool has_ns_capability(struct task_struct *t,
 	int ret;
 
 	rcu_read_lock();
-	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE);
+	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NONE, NULL);
 	rcu_read_unlock();
 
 	return (ret == 0);
@@ -338,7 +338,7 @@  bool has_ns_capability_noaudit(struct task_struct *t,
 	int ret;
 
 	rcu_read_lock();
-	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT);
+	ret = security_capable(__task_cred(t), ns, cap, CAP_OPT_NOAUDIT, NULL);
 	rcu_read_unlock();
 
 	return (ret == 0);
@@ -363,7 +363,8 @@  bool has_capability_noaudit(struct task_struct *t, int cap)
 
 static bool ns_capable_common(struct user_namespace *ns,
 			      int cap,
-			      unsigned int opts)
+			      unsigned int opts,
+			      struct cap_aux_data *cad)
 {
 	int capable;
 
@@ -372,7 +373,7 @@  static bool ns_capable_common(struct user_namespace *ns,
 		BUG();
 	}
 
-	capable = security_capable(current_cred(), ns, cap, opts);
+	capable = security_capable(current_cred(), ns, cap, opts, cad);
 	if (capable == 0) {
 		current->flags |= PF_SUPERPRIV;
 		return true;
@@ -393,7 +394,7 @@  static bool ns_capable_common(struct user_namespace *ns,
  */
 bool ns_capable(struct user_namespace *ns, int cap)
 {
-	return ns_capable_common(ns, cap, CAP_OPT_NONE);
+	return ns_capable_common(ns, cap, CAP_OPT_NONE, NULL);
 }
 EXPORT_SYMBOL(ns_capable);
 
@@ -411,7 +412,7 @@  EXPORT_SYMBOL(ns_capable);
  */
 bool ns_capable_noaudit(struct user_namespace *ns, int cap)
 {
-	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT);
+	return ns_capable_common(ns, cap, CAP_OPT_NOAUDIT, NULL);
 }
 EXPORT_SYMBOL(ns_capable_noaudit);
 
@@ -430,7 +431,7 @@  EXPORT_SYMBOL(ns_capable_noaudit);
  */
 bool ns_capable_setid(struct user_namespace *ns, int cap)
 {
-	return ns_capable_common(ns, cap, CAP_OPT_INSETID);
+	return ns_capable_common(ns, cap, CAP_OPT_INSETID, NULL);
 }
 EXPORT_SYMBOL(ns_capable_setid);
 
@@ -470,7 +471,7 @@  bool file_ns_capable(const struct file *file, struct user_namespace *ns,
 	if (WARN_ON_ONCE(!cap_valid(cap)))
 		return false;
 
-	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE) == 0)
+	if (security_capable(file->f_cred, ns, cap, CAP_OPT_NONE, NULL) == 0)
 		return true;
 
 	return false;
@@ -503,7 +504,8 @@  bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
 {
 	struct user_namespace *ns = current_user_ns();
 
-	return ns_capable(ns, cap) && privileged_wrt_inode_uidgid(ns, inode);
+	return ns_capable_inode(ns, cap, inode) &&
+		privileged_wrt_inode_uidgid(ns, inode);
 }
 EXPORT_SYMBOL(capable_wrt_inode_uidgid);
 
@@ -524,7 +526,18 @@  bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 	cred = rcu_dereference(tsk->ptracer_cred);
 	if (cred)
 		ret = security_capable(cred, ns, CAP_SYS_PTRACE,
-				       CAP_OPT_NOAUDIT);
+				       CAP_OPT_NOAUDIT, NULL);
 	rcu_read_unlock();
 	return (ret == 0);
 }
+
+bool ns_capable_inode(struct user_namespace *ns, int cap,
+			const struct inode *inode)
+{
+	struct cap_aux_data cad;
+
+	cad.type = CAP_AUX_DATA_INODE;
+	cad.u.inode = inode;
+	return ns_capable_common(ns, cap, CAP_OPT_NONE, &cad);
+}
+EXPORT_SYMBOL(ns_capable_inode);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 811b4a86cdf6..d59dd7079ece 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -446,7 +446,7 @@  static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 	 */
 	if (!task_no_new_privs(current) &&
 	    security_capable(current_cred(), current_user_ns(),
-				     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) != 0)
+			     CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) != 0)
 		return ERR_PTR(-EACCES);
 
 	/* Allocate a new seccomp_filter */
diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
index 752f73980e30..c45192a16733 100644
--- a/security/apparmor/capability.c
+++ b/security/apparmor/capability.c
@@ -50,7 +50,7 @@  static void audit_cb(struct audit_buffer *ab, void *va)
 	struct common_audit_data *sa = va;
 
 	audit_log_format(ab, " capname=");
-	audit_log_untrustedstring(ab, capability_names[sa->u.cap]);
+	audit_log_untrustedstring(ab, capability_names[sa->u.cap_struct.cap]);
 }
 
 /**
@@ -148,13 +148,15 @@  static int profile_capable(struct aa_profile *profile, int cap,
  *
  * Returns: 0 on success, or else an error code.
  */
-int aa_capable(struct aa_label *label, int cap, unsigned int opts)
+int aa_capable(struct aa_label *label, int cap, unsigned int opts,
+	       struct cap_aux_data *cad)
 {
 	struct aa_profile *profile;
 	int error = 0;
 	DEFINE_AUDIT_DATA(sa, LSM_AUDIT_DATA_CAP, OP_CAPABLE);
 
-	sa.u.cap = cap;
+	sa.u.cap_struct.cap = cap;
+	sa.u.cap_struct.cad = cad;
 	error = fn_for_each_confined(label, profile,
 			profile_capable(profile, cap, opts, &sa));
 
diff --git a/security/apparmor/include/capability.h b/security/apparmor/include/capability.h
index 1b3663b6ab12..d888f09d76d1 100644
--- a/security/apparmor/include/capability.h
+++ b/security/apparmor/include/capability.h
@@ -20,6 +20,7 @@ 
 #include "apparmorfs.h"
 
 struct aa_label;
+struct cap_aux_data;
 
 /* aa_caps - confinement data for capabilities
  * @allowed: capabilities mask
@@ -40,7 +41,8 @@  struct aa_caps {
 
 extern struct aa_sfs_entry aa_sfs_entry_caps[];
 
-int aa_capable(struct aa_label *label, int cap, unsigned int opts);
+int aa_capable(struct aa_label *label, int cap, unsigned int opts,
+	       struct cap_aux_data *cad);
 
 static inline void aa_free_cap_rules(struct aa_caps *caps)
 {
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index aacd1e95cb59..deb5267ca695 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -108,7 +108,7 @@  static int profile_tracer_perm(struct aa_profile *tracer,
 	aad(sa)->peer = tracee;
 	aad(sa)->request = 0;
 	aad(sa)->error = aa_capable(&tracer->label, CAP_SYS_PTRACE,
-				    CAP_OPT_NONE);
+				    CAP_OPT_NONE, NULL);
 
 	return aa_audit(AUDIT_APPARMOR_AUTO, tracer, sa, audit_ptrace_cb);
 }
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 87500bde5a92..82790accb679 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -172,14 +172,15 @@  static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
 }
 
 static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
-			    int cap, unsigned int opts)
+			    int cap, unsigned int opts,
+			    struct cap_aux_data *cad)
 {
 	struct aa_label *label;
 	int error = 0;
 
 	label = aa_get_newest_cred_label(cred);
 	if (!unconfined(label))
-		error = aa_capable(label, cap, opts);
+		error = aa_capable(label, cap, opts, cad);
 	aa_put_label(label);
 
 	return error;
diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
index 552ed09cb47e..9b3d4b4437f2 100644
--- a/security/apparmor/resource.c
+++ b/security/apparmor/resource.c
@@ -124,7 +124,7 @@  int aa_task_setrlimit(struct aa_label *label, struct task_struct *task,
 	 */
 
 	if (label != peer &&
-	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT) != 0)
+	    aa_capable(label, CAP_SYS_RESOURCE, CAP_OPT_NOAUDIT, NULL) != 0)
 		error = fn_for_each(label, profile,
 				audit_resource(profile, resource,
 					       new_rlim->rlim_max, peer,
diff --git a/security/commoncap.c b/security/commoncap.c
index c477fb673701..1860ea50f473 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -68,7 +68,7 @@  static void warn_setuid_and_fcaps_mixed(const char *fname)
  * kernel's capable() and has_capability() returns 1 for this case.
  */
 int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
-		int cap, unsigned int opts)
+		int cap, unsigned int opts, struct cap_aux_data *cad)
 {
 	struct user_namespace *ns = targ_ns;
 
@@ -226,7 +226,7 @@  static inline int cap_inh_is_capped(void)
 	 * capability
 	 */
 	if (cap_capable(current_cred(), current_cred()->user_ns,
-			CAP_SETPCAP, CAP_OPT_NONE) == 0)
+			CAP_SETPCAP, CAP_OPT_NONE, NULL) == 0)
 		return 0;
 	return 1;
 }
@@ -1211,7 +1211,8 @@  int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 		    || (cap_capable(current_cred(),
 				    current_cred()->user_ns,
 				    CAP_SETPCAP,
-				    CAP_OPT_NONE) != 0)			/*[4]*/
+				    CAP_OPT_NONE,
+				    NULL) != 0)				/*[4]*/
 			/*
 			 * [1] no changing of bits that are locked
 			 * [2] no unlocking of locks
@@ -1307,7 +1308,7 @@  int cap_vm_enough_memory(struct mm_struct *mm, long pages)
 	int cap_sys_admin = 0;
 
 	if (cap_capable(current_cred(), &init_user_ns,
-				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) == 0)
+				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT, NULL) == 0)
 		cap_sys_admin = 1;
 
 	return cap_sys_admin;
@@ -1328,7 +1329,7 @@  int cap_mmap_addr(unsigned long addr)
 
 	if (addr < dac_mmap_min_addr) {
 		ret = cap_capable(current_cred(), &init_user_ns, CAP_SYS_RAWIO,
-				  CAP_OPT_NONE);
+				  CAP_OPT_NONE, NULL);
 		/* set PF_SUPERPRIV if it turns out we allow the low mmap */
 		if (ret == 0)
 			current->flags |= PF_SUPERPRIV;
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 33028c098ef3..4871b2508a4a 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -229,9 +229,26 @@  static void dump_common_audit_data(struct audit_buffer *ab,
 	case LSM_AUDIT_DATA_IPC:
 		audit_log_format(ab, " key=%d ", a->u.ipc_id);
 		break;
-	case LSM_AUDIT_DATA_CAP:
-		audit_log_format(ab, " capability=%d ", a->u.cap);
+	case LSM_AUDIT_DATA_CAP: {
+		const struct inode *inode;
+
+		if (a->u.cap_struct.cad) {
+			switch (a->u.cap_struct.cad->type) {
+			case CAP_AUX_DATA_INODE: {
+				inode = a->u.cap_struct.cad->u.inode;
+
+				audit_log_format(ab, " dev=");
+				audit_log_untrustedstring(ab,
+					inode->i_sb->s_id);
+				audit_log_format(ab, " ino=%lu",
+					inode->i_ino);
+				break;
+			}
+			}
+		}
+		audit_log_format(ab, " capability=%d ", a->u.cap_struct.cap);
 		break;
+	}
 	case LSM_AUDIT_DATA_PATH: {
 		struct inode *inode;
 
diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
index cecd38e2ac80..c74ed83e9501 100644
--- a/security/safesetid/lsm.c
+++ b/security/safesetid/lsm.c
@@ -80,7 +80,8 @@  static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
 static int safesetid_security_capable(const struct cred *cred,
 				      struct user_namespace *ns,
 				      int cap,
-				      unsigned int opts)
+				      unsigned int opts,
+				      struct cap_aux_data *cad)
 {
 	if (cap == CAP_SETUID &&
 	    check_setuid_policy_hashtable_key(cred->uid)) {
diff --git a/security/security.c b/security/security.c
index 613a5c00e602..06274a7b9c4e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -691,9 +691,10 @@  int security_capset(struct cred *new, const struct cred *old,
 int security_capable(const struct cred *cred,
 		     struct user_namespace *ns,
 		     int cap,
-		     unsigned int opts)
+		     unsigned int opts,
+		     struct cap_aux_data *cad)
 {
-	return call_int_hook(capable, 0, cred, ns, cap, opts);
+	return call_int_hook(capable, 0, cred, ns, cap, opts, cad);
 }
 
 int security_quotactl(int cmds, int type, int id, struct super_block *sb)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f77b314d0575..d6c699ed06be 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1618,7 +1618,10 @@  static inline u32 signal_to_av(int sig)
 
 /* Check whether a task is allowed to use a capability. */
 static int cred_has_capability(const struct cred *cred,
-			       int cap, unsigned int opts, bool initns)
+				int cap,
+				unsigned int opts,
+				bool initns,
+				struct cap_aux_data *cad)
 {
 	struct common_audit_data ad;
 	struct av_decision avd;
@@ -1628,7 +1631,8 @@  static int cred_has_capability(const struct cred *cred,
 	int rc;
 
 	ad.type = LSM_AUDIT_DATA_CAP;
-	ad.u.cap = cap;
+	ad.u.cap_struct.cad = cad;
+	ad.u.cap_struct.cap = cap;
 
 	switch (CAP_TO_INDEX(cap)) {
 	case 0:
@@ -2165,9 +2169,9 @@  static int selinux_capset(struct cred *new, const struct cred *old,
  */
 
 static int selinux_capable(const struct cred *cred, struct user_namespace *ns,
-			   int cap, unsigned int opts)
+			   int cap, unsigned int opts, struct cap_aux_data *cad)
 {
-	return cred_has_capability(cred, cap, opts, ns == &init_user_ns);
+	return cred_has_capability(cred, cap, opts, ns == &init_user_ns, cad);
 }
 
 static int selinux_quotactl(int cmds, int type, int id, struct super_block *sb)
@@ -2241,7 +2245,7 @@  static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
 	int rc, cap_sys_admin = 0;
 
 	rc = cred_has_capability(current_cred(), CAP_SYS_ADMIN,
-				 CAP_OPT_NOAUDIT, true);
+				 CAP_OPT_NOAUDIT, true, NULL);
 	if (rc == 0)
 		cap_sys_admin = 1;
 
@@ -3101,9 +3105,9 @@  static bool has_cap_mac_admin(bool audit)
 	const struct cred *cred = current_cred();
 	unsigned int opts = audit ? CAP_OPT_NONE : CAP_OPT_NOAUDIT;
 
-	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts))
+	if (cap_capable(cred, &init_user_ns, CAP_MAC_ADMIN, opts, NULL))
 		return false;
-	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true))
+	if (cred_has_capability(cred, CAP_MAC_ADMIN, opts, true, NULL))
 		return false;
 	return true;
 }
@@ -3563,7 +3567,7 @@  static int selinux_file_ioctl(struct file *file, unsigned int cmd,
 	case KDSKBENT:
 	case KDSKBSENT:
 		error = cred_has_capability(cred, CAP_SYS_TTY_CONFIG,
-					    CAP_OPT_NONE, true);
+					    CAP_OPT_NONE, true, NULL);
 		break;
 
 	/* default case assumes that the command will go
diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index fe2ce3a65822..e961bfe8f00a 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -640,7 +640,7 @@  bool smack_privileged_cred(int cap, const struct cred *cred)
 	struct smack_known_list_elem *sklep;
 	int rc;
 
-	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE);
+	rc = cap_capable(cred, &init_user_ns, cap, CAP_OPT_NONE, NULL);
 	if (rc)
 		return false;