[ghak90,V9,11/13] audit: contid check descendancy and nesting

Message ID 01229b93733d9baf6ac9bb0cc243eeb08ad579cd.1593198710.git.rgb@redhat.com
State New
Series
  • audit: implement container identifier

Commit Message

Richard Guy Briggs June 27, 2020, 1:20 p.m. UTC
Require the target task to be a descendant of the container
orchestrator/engine.

You would only change the audit container ID from one set or inherited
value to another if you were nesting containers.

When changing the contid, the container orchestrator/engine must be a
descendant of, and not the same orchestrator as, the one that set it, so
that it is not possible to change the contid of another orchestrator's
container.

Since the task_is_descendant() function is used in YAMA and in audit,
remove the duplication and pull the function into kernel/sched/core.c

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/linux/sched.h    |  3 +++
 kernel/audit.c           | 23 +++++++++++++++++++++--
 kernel/sched/core.c      | 33 +++++++++++++++++++++++++++++++++
 security/yama/yama_lsm.c | 33 ---------------------------------
 4 files changed, 57 insertions(+), 35 deletions(-)
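
The descendancy walk that this patch moves into kernel/sched/core.c can
be modelled outside the kernel. The following is a minimal userspace
sketch, not the kernel code: struct model_task and
model_task_is_descendant() are hypothetical stand-ins that keep only the
pid, group_leader and real_parent fields, and they omit the RCU locking
that the real function takes. It illustrates one property that matters
for the audit change: the walk reports a task as a descendant of itself,
which is presumably why audit_set_contid() tests task == current
separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct (assumption: only the
 * three fields used by the walk are modelled). */
struct model_task {
	int pid;                         /* pid 0 terminates the walk */
	struct model_task *group_leader; /* points to itself for a leader */
	struct model_task *real_parent;  /* parent process */
};

/* Userspace model of the walk; no RCU, so single-threaded use only.
 * Returns 1 if child is parent itself or one of its descendants. */
static int model_task_is_descendant(struct model_task *parent,
				    struct model_task *child)
{
	struct model_task *walker = child;

	if (!parent || !child)
		return 0;

	parent = parent->group_leader; /* compare processes, not threads */
	while (walker->pid > 0) {
		walker = walker->group_leader;
		if (walker == parent)
			return 1;
		walker = walker->real_parent;
	}
	return 0;
}

/* Small self-check over a hand-built process tree. */
static void model_demo(void)
{
	struct model_task root  = { 0,   &root,  NULL  };
	struct model_task orch  = { 100, &orch,  &root };
	struct model_task cont  = { 200, &cont,  &orch };
	struct model_task thr   = { 201, &cont,  &orch }; /* thread of cont */
	struct model_task other = { 300, &other, &root };

	assert(model_task_is_descendant(&orch, &cont) == 1);
	assert(model_task_is_descendant(&orch, &thr) == 1);
	assert(model_task_is_descendant(&orch, &orch) == 1); /* self counts */
	assert(model_task_is_descendant(&orch, &other) == 0);
	assert(model_task_is_descendant(&cont, &orch) == 0);
	assert(model_task_is_descendant(NULL, &cont) == 0);
}
```

Calling model_demo() from a test program exercises the walk; the third
assertion is the self-match case described above.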

Comments

Paul Moore July 5, 2020, 3:11 p.m. UTC | #1
On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Require the target task to be a descendant of the container
> orchestrator/engine.
>
> You would only change the audit container ID from one set or inherited
> value to another if you were nesting containers.
>
> If changing the contid, the container orchestrator/engine must be a
> descendant and not same orchestrator as the one that set it so it is not
> possible to change the contid of another orchestrator's container.
>
> Since the task_is_descendant() function is used in YAMA and in audit,
> remove the duplication and pull the function into kernel/sched/core.c
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/sched.h    |  3 +++
>  kernel/audit.c           | 23 +++++++++++++++++++++--
>  kernel/sched/core.c      | 33 +++++++++++++++++++++++++++++++++
>  security/yama/yama_lsm.c | 33 ---------------------------------
>  4 files changed, 57 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2213ac670386..06938d0b9e0c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2047,4 +2047,7 @@ static inline void rseq_syscall(struct pt_regs *regs)
>
>  const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
>
> +extern int task_is_descendant(struct task_struct *parent,
> +                             struct task_struct *child);
> +
>  #endif
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a862721dfd9b..efa65ec01239 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2713,6 +2713,20 @@ int audit_signal_info(int sig, struct task_struct *t)
>         return audit_signal_info_syscall(t);
>  }
>
> +static bool audit_contid_isnesting(struct task_struct *tsk)
> +{
> +       bool isowner = false;
> +       bool ownerisparent = false;
> +
> +       rcu_read_lock();
> +       if (tsk->audit && tsk->audit->cont) {
> +               isowner = current == tsk->audit->cont->owner;
> +               ownerisparent = task_is_descendant(tsk->audit->cont->owner, current);

I want to make sure I'm understanding this correctly, because I keep
mentally tripping over something: it seems that, for a given audit
container ID, a task is either the owner or a descendant and there is no
third state. Is that correct?

Assuming that is true, can the descendant check simply be a negated
owner check, given that both tasks have the same audit container ID?
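
One way to examine this question is to enumerate the cases in a small
userspace model. The sketch below is an illustration under stated
assumptions, not the kernel code: struct nest_task, nest_is_descendant()
and nest_isnesting() are hypothetical names, thread groups and locking
are left out, and the walk mirrors task_is_descendant() only in that a
task counts as its own descendant. In this model the caller can be the
owner, a strict descendant of the owner, or unrelated to the owner, and
only the middle case satisfies !isowner && ownerisparent. Whether the
unrelated case can arise in practice depends on checks made elsewhere in
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task: just a parent link (no thread groups). */
struct nest_task {
	struct nest_task *parent; /* NULL at the root of the tree */
};

/* True if child is anc itself or has anc among its ancestors. */
static bool nest_is_descendant(const struct nest_task *anc,
			       const struct nest_task *child)
{
	for (; child; child = child->parent)
		if (child == anc)
			return true;
	return false;
}

/* Model of the predicate: the caller must descend from the owner of the
 * target's container without being that owner. */
static bool nest_isnesting(const struct nest_task *owner,
			   const struct nest_task *caller)
{
	bool isowner = caller == owner;
	bool ownerisparent = nest_is_descendant(owner, caller);

	return !isowner && ownerisparent;
}

/* Enumerates the three relationships a caller can have to the owner. */
static void nest_demo(void)
{
	struct nest_task root = { NULL };
	struct nest_task owner = { &root };
	struct nest_task inner = { &owner };   /* nested orchestrator */
	struct nest_task stranger = { &root }; /* unrelated to owner */

	assert(!nest_isnesting(&owner, &owner));    /* owner itself */
	assert(nest_isnesting(&owner, &inner));     /* strict descendant */
	assert(!nest_isnesting(&owner, &stranger)); /* neither */
}
```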

> +       }
> +       rcu_read_unlock();
> +       return !isowner && ownerisparent;
> +}
> +
>  /*
>   * audit_set_contid - set current task's audit contid
>   * @task: target task
> @@ -2755,8 +2769,13 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>                 rc = -EBUSY;
>                 goto unlock;
>         }
> -       /* if contid is already set, deny */
> -       if (audit_contid_set(task))
> +       /* if task is not descendant, block */
> +       if (task == current || !task_is_descendant(current, task)) {

I'm also still fuzzy on why we can't let a task set its own audit
container ID, assuming it meets all the criteria established in patch
2/13.  It somewhat made sense when you were tracking inherited vs
explicitly set audit container IDs, but that doesn't appear to be the
case so far in this patchset, yes?

> +               rc = -EXDEV;

I'm fairly confident we had a discussion about not using all these
different error codes, but that may be a moot point given my next
comment.

> +               goto unlock;
> +       }
> +       /* only allow contid setting again if nesting */
> +       if (audit_contid_set(task) && !audit_contid_isnesting(task))
>                 rc = -EEXIST;

It seems like what we need in audit_set_contid() is a check to ensure
that the task being modified is only modified by the owner of the
audit container ID, yes?  If so, I would think we could do this quite
easily with the following, or similar, logic (NOTE: assumes both
current and tsk are properly set up):

  if ((current->audit->cont != tsk->audit->cont) ||
      (current->audit->cont->owner != current))
    return -EACCES;
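
The check suggested above can be exercised in isolation with a small
userspace model. This is only a sketch under stated assumptions:
struct own_task, struct own_cont and own_may_modify() are hypothetical
names rather than kernel definitions, locking is omitted, and the NULL
guard is an addition for the model, since the snippet above assumes both
tasks are already set up. It uses EACCES, the errno name defined in
<errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct own_task;

/* Hypothetical model of an audit container object. */
struct own_cont {
	unsigned long long id;        /* audit container ID */
	const struct own_task *owner; /* task that set the ID */
};

/* Hypothetical model of a task: only its container link. */
struct own_task {
	const struct own_cont *cont;
};

/* Returns 0 if cur may modify tsk, -EACCES otherwise. */
static int own_may_modify(const struct own_task *cur,
			  const struct own_task *tsk)
{
	/* Model-only guard (assumption): treat missing state as a denial. */
	if (!cur || !tsk || !cur->cont)
		return -EACCES;

	/* Same container, and the caller is that container's owner. */
	if (cur->cont != tsk->cont || cur->cont->owner != cur)
		return -EACCES;
	return 0;
}

/* Self-check covering the allowed case and both denial paths. */
static void own_demo(void)
{
	struct own_cont c1 = { 1, NULL }, c2 = { 2, NULL };
	struct own_task owner = { &c1 }, member = { &c1 }, outsider = { &c2 };

	c1.owner = &owner;
	c2.owner = &outsider;

	assert(own_may_modify(&owner, &member) == 0);         /* owner, same container */
	assert(own_may_modify(&member, &owner) == -EACCES);   /* caller is not the owner */
	assert(own_may_modify(&owner, &outsider) == -EACCES); /* different container */
}
```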

This is somewhat independent of the above issue, but we may also want
to add to the capability check.  Patch 2 adds a
"capable(CAP_AUDIT_CONTROL)" check, which is good, but perhaps we also
need an "ns_capable()" check for CAP_AUDIT_CONTROL to allow a given audit
container ID orchestrator/owner to control which of its descendants
can change their audit container ID, for example:

  if (!capable(CAP_AUDIT_CONTROL) ||
      !ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
    return -EPERM;

--
paul moore
www.paul-moore.com

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2213ac670386..06938d0b9e0c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2047,4 +2047,7 @@  static inline void rseq_syscall(struct pt_regs *regs)
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
 
+extern int task_is_descendant(struct task_struct *parent,
+			      struct task_struct *child);
+
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index a862721dfd9b..efa65ec01239 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2713,6 +2713,20 @@  int audit_signal_info(int sig, struct task_struct *t)
 	return audit_signal_info_syscall(t);
 }
 
+static bool audit_contid_isnesting(struct task_struct *tsk)
+{
+	bool isowner = false;
+	bool ownerisparent = false;
+
+	rcu_read_lock();
+	if (tsk->audit && tsk->audit->cont) {
+		isowner = current == tsk->audit->cont->owner;
+		ownerisparent = task_is_descendant(tsk->audit->cont->owner, current);
+	}
+	rcu_read_unlock();
+	return !isowner && ownerisparent;
+}
+
 /*
  * audit_set_contid - set current task's audit contid
  * @task: target task
@@ -2755,8 +2769,13 @@  int audit_set_contid(struct task_struct *task, u64 contid)
 		rc = -EBUSY;
 		goto unlock;
 	}
-	/* if contid is already set, deny */
-	if (audit_contid_set(task))
+	/* if task is not descendant, block */
+	if (task == current || !task_is_descendant(current, task)) {
+		rc = -EXDEV;
+		goto unlock;
+	}
+	/* only allow contid setting again if nesting */
+	if (audit_contid_set(task) && !audit_contid_isnesting(task))
 		rc = -EEXIST;
 unlock:
 	read_unlock(&tasklist_lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f360326861e..e6b24c52b3c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8012,6 +8012,39 @@  void dump_cpu_task(int cpu)
 }
 
 /*
+ * task_is_descendant - walk up a process family tree looking for a match
+ * @parent: the process to compare against while walking up from child
+ * @child: the process to start from while looking upwards for parent
+ *
+ * Returns 1 if child is a descendant of parent, 0 if not.
+ */
+int task_is_descendant(struct task_struct *parent,
+			      struct task_struct *child)
+{
+	int rc = 0;
+	struct task_struct *walker = child;
+
+	if (!parent || !child)
+		return 0;
+
+	rcu_read_lock();
+	if (!thread_group_leader(parent))
+		parent = rcu_dereference(parent->group_leader);
+	while (walker->pid > 0) {
+		if (!thread_group_leader(walker))
+			walker = rcu_dereference(walker->group_leader);
+		if (walker == parent) {
+			rc = 1;
+			break;
+		}
+		walker = rcu_dereference(walker->real_parent);
+	}
+	rcu_read_unlock();
+
+	return rc;
+}
+
+/*
  * Nice levels are multiplicative, with a gentle 10% change for every
  * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
  * nice 1, it will get ~10% less CPU time than another CPU-bound task
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 536c99646f6a..24939f765df5 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -263,39 +263,6 @@  static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 }
 
 /**
- * task_is_descendant - walk up a process family tree looking for a match
- * @parent: the process to compare against while walking up from child
- * @child: the process to start from while looking upwards for parent
- *
- * Returns 1 if child is a descendant of parent, 0 if not.
- */
-static int task_is_descendant(struct task_struct *parent,
-			      struct task_struct *child)
-{
-	int rc = 0;
-	struct task_struct *walker = child;
-
-	if (!parent || !child)
-		return 0;
-
-	rcu_read_lock();
-	if (!thread_group_leader(parent))
-		parent = rcu_dereference(parent->group_leader);
-	while (walker->pid > 0) {
-		if (!thread_group_leader(walker))
-			walker = rcu_dereference(walker->group_leader);
-		if (walker == parent) {
-			rc = 1;
-			break;
-		}
-		walker = rcu_dereference(walker->real_parent);
-	}
-	rcu_read_unlock();
-
-	return rc;
-}
-
-/**
  * ptracer_exception_found - tracer registered as exception for this tracee
  * @tracer: the task_struct of the process attempting ptrace
  * @tracee: the task_struct of the process to be ptraced