diff mbox series

[RFC,6/6] security/fbfam: Mitigate a fork brute force attack

Message ID 20200910202107.3799376-7-keescook@chromium.org (mailing list archive)
State New, archived
Headers show
Series Fork brute force attack mitigation (fbfam) | expand

Commit Message

Kees Cook Sept. 10, 2020, 8:21 p.m. UTC
From: John Wood <john.wood@gmx.com>

In order to mitigate a fork brute force attack it is necessary to kill
all the offending tasks. This tasks are all the ones that share the
statistical data with the current task (the task that has crashed).

Since the attack detection is done in the function fbfam_handle_attack()
that is called every time a core dump is triggered, only is needed to
kill the others tasks that share the same statistical data, not the
current one as this is in the path to be killed.

When the SIGKILL signal is sent to the offending tasks from the function
fbfam_kill_tasks(), this one will be called again during the core dump
due to the shared statistical data shows a quickly crashing rate. So, to
avoid kill again the same tasks due to a recursive call of this
function, it is necessary to disable the attack detection.

To disable this attack detection, add a condition in the function
fbfam_handle_attack() to not compute the crashing rate when the jiffies
stored in the statistical data are set to zero.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 security/fbfam/fbfam.c | 76 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 5 deletions(-)

Comments

Jann Horn Sept. 10, 2020, 8:55 p.m. UTC | #1
On Thu, Sep 10, 2020 at 10:22 PM Kees Cook <keescook@chromium.org> wrote:
> In order to mitigate a fork brute force attack it is necessary to kill
> all the offending tasks. This tasks are all the ones that share the
> statistical data with the current task (the task that has crashed).
>
> Since the attack detection is done in the function fbfam_handle_attack()
> that is called every time a core dump is triggered, only is needed to
> kill the others tasks that share the same statistical data, not the
> current one as this is in the path to be killed.
>
> When the SIGKILL signal is sent to the offending tasks from the function
> fbfam_kill_tasks(), this one will be called again during the core dump
> due to the shared statistical data shows a quickly crashing rate. So, to
> avoid kill again the same tasks due to a recursive call of this
> function, it is necessary to disable the attack detection.
>
> To disable this attack detection, add a condition in the function
> fbfam_handle_attack() to not compute the crashing rate when the jiffies
> stored in the statistical data are set to zero.
[...]
>  /**
> - * fbfam_handle_attack() - Fork brute force attack detection.
> + * fbfam_kill_tasks() - Kill the offending tasks
> + *
> + * When a fork brute force attack is detected it is necessary to kill all the
> + * offending tasks. Since this function is called from fbfam_handle_attack(),
> + * and so, every time a core dump is triggered, only is needed to kill the
> + * others tasks that share the same statistical data, not the current one as
> + * this is in the path to be killed.
> + *
> + * When the SIGKILL signal is sent to the offending tasks, this function will be
> + * called again during the core dump due to the shared statistical data shows a
> + * quickly crashing rate. So, to avoid kill again the same tasks due to a
> + * recursive call of this function, it is necessary to disable the attack
> + * detection setting the jiffies to zero.
> + *
> + * To improve the for_each_process loop it is possible to end it when all the
> + * tasks that shared the same statistics are found.

This is not a fastpath, there's no need to be clever and optimize
things here, please get rid of that optimization. Especially since
that fastpath looks racy against concurrent execve().

> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +static int fbfam_kill_tasks(void)
> +{
> +       struct fbfam_stats *stats = current->fbfam_stats;
> +       struct task_struct *p;
> +       unsigned int to_kill, killed = 0;
> +
> +       if (!stats)
> +               return -EFAULT;
> +
> +       to_kill = refcount_read(&stats->refc) - 1;
> +       if (!to_kill)
> +               return 0;
> +
> +       /* Disable the attack detection */
> +       stats->jiffies = 0;
> +       rcu_read_lock();
> +
> +       for_each_process(p) {
> +               if (p == current || p->fbfam_stats != stats)

p->fbfam_stats could change concurrently, you should at least use
READ_ONCE() here.

Also, if this codepath is hit by a non-leader thread, "p == current"
will always be false, and you'll end up killing the caller, too. You
may want to compare with current->group_leader instead.


> +                       continue;
> +
> +               do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> +               pr_warn("fbfam: Offending process with PID %d killed\n",
> +                       p->pid);

Normally pr_*() messages about tasks mention not just the pid, but
also the ->comm name of the task.

> +               killed += 1;
> +               if (killed >= to_kill)
> +                       break;
> +       }
> +
> +       rcu_read_unlock();
> +       return 0;
> +}
Kees Cook Sept. 10, 2020, 11:56 p.m. UTC | #2
On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> In order to mitigate a fork brute force attack it is necessary to kill
> all the offending tasks. This tasks are all the ones that share the
> statistical data with the current task (the task that has crashed).
> 
> Since the attack detection is done in the function fbfam_handle_attack()
> that is called every time a core dump is triggered, only is needed to
> kill the others tasks that share the same statistical data, not the
> current one as this is in the path to be killed.
> 
> When the SIGKILL signal is sent to the offending tasks from the function
> fbfam_kill_tasks(), this one will be called again during the core dump
> due to the shared statistical data shows a quickly crashing rate. So, to
> avoid kill again the same tasks due to a recursive call of this
> function, it is necessary to disable the attack detection.
> 
> To disable this attack detection, add a condition in the function
> fbfam_handle_attack() to not compute the crashing rate when the jiffies
> stored in the statistical data are set to zero.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  security/fbfam/fbfam.c | 76 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> index 3aa669e4ea51..173a6122390f 100644
> --- a/security/fbfam/fbfam.c
> +++ b/security/fbfam/fbfam.c
> @@ -4,8 +4,11 @@
>  #include <linux/errno.h>
>  #include <linux/gfp.h>
>  #include <linux/jiffies.h>
> +#include <linux/pid.h>
>  #include <linux/printk.h>
> +#include <linux/rcupdate.h>
>  #include <linux/refcount.h>
> +#include <linux/sched/signal.h>
>  #include <linux/signal.h>
>  #include <linux/slab.h>
>  
> @@ -24,7 +27,8 @@ unsigned long sysctl_crashing_rate_threshold = 30000;
>   * struct fbfam_stats - Fork brute force attack mitigation statistics.
>   * @refc: Reference counter.
>   * @faults: Number of crashes since jiffies.
> - * @jiffies: First fork or execve timestamp.
> + * @jiffies: First fork or execve timestamp. If zero, the attack detection is
> + *           disabled.
>   *
>   * The purpose of this structure is to manage all the necessary information to
>   * compute the crashing rate of an application. So, it holds a first fork or
> @@ -175,13 +179,69 @@ int fbfam_exit(void)
>  }
>  
>  /**
> - * fbfam_handle_attack() - Fork brute force attack detection.
> + * fbfam_kill_tasks() - Kill the offending tasks
> + *
> + * When a fork brute force attack is detected it is necessary to kill all the
> + * offending tasks. Since this function is called from fbfam_handle_attack(),
> + * and so, every time a core dump is triggered, only is needed to kill the
> + * others tasks that share the same statistical data, not the current one as
> + * this is in the path to be killed.
> + *
> + * When the SIGKILL signal is sent to the offending tasks, this function will be
> + * called again during the core dump due to the shared statistical data shows a
> + * quickly crashing rate. So, to avoid kill again the same tasks due to a
> + * recursive call of this function, it is necessary to disable the attack
> + * detection setting the jiffies to zero.
> + *
> + * To improve the for_each_process loop it is possible to end it when all the
> + * tasks that shared the same statistics are found.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +static int fbfam_kill_tasks(void)
> +{
> +	struct fbfam_stats *stats = current->fbfam_stats;
> +	struct task_struct *p;
> +	unsigned int to_kill, killed = 0;
> +
> +	if (!stats)
> +		return -EFAULT;
> +
> +	to_kill = refcount_read(&stats->refc) - 1;
> +	if (!to_kill)
> +		return 0;
> +
> +	/* Disable the attack detection */
> +	stats->jiffies = 0;
> +	rcu_read_lock();
> +
> +	for_each_process(p) {
> +		if (p == current || p->fbfam_stats != stats)
> +			continue;
> +
> +		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> +		pr_warn("fbfam: Offending process with PID %d killed\n",
> +			p->pid);

I'd make this ratelimited (along with Jann's suggestions). Also, instead
of the explicit "fbfam:" prefix, use the regular prefixing method:

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt


> +
> +		killed += 1;
> +		if (killed >= to_kill)
> +			break;
> +	}
> +
> +	rcu_read_unlock();

Can't newly created processes escape this RCU read lock? I think this
need alternate locking, or something in the task_alloc hook that will
block any new process from being created within the stats group.

> +	return 0;
> +}
> +
> +/**
> + * fbfam_handle_attack() - Fork brute force attack detection and mitigation.
>   * @signal: Signal number that causes the core dump.
>   *
>   * The crashing rate of an application is computed in milliseconds per fault in
>   * each crash. So, if this rate goes under a certain threshold there is a clear
>   * signal that the application is crashing quickly. At this moment, a fork brute
> - * force attack is happening.
> + * force attack is happening. Under this scenario it is necessary to kill all
> + * the offending tasks in order to mitigate the attack.
>   *
>   * Return: -EFAULT if the current task doesn't have statistical data. Zero
>   *         otherwise.
> @@ -195,6 +255,10 @@ int fbfam_handle_attack(int signal)
>  	if (!stats)
>  		return -EFAULT;
>  
> +	/* The attack detection is disabled */
> +	if (!stats->jiffies)
> +		return 0;
> +
>  	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
>  	      signal == SIGSEGV || signal == SIGSYS))
>  		return 0;
> @@ -205,9 +269,11 @@ int fbfam_handle_attack(int signal)
>  	delta_time = jiffies64_to_msecs(delta_jiffies);
>  	crashing_rate = delta_time / (u64)stats->faults;
>  
> -	if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
> -		pr_warn("fbfam: Fork brute force attack detected\n");
> +	if (crashing_rate >= (u64)sysctl_crashing_rate_threshold)
> +		return 0;
>  
> +	pr_warn("fbfam: Fork brute force attack detected\n");
> +	fbfam_kill_tasks();
>  	return 0;
>  }
>  
> -- 
> 2.25.1
>
Jann Horn Sept. 11, 2020, 12:20 a.m. UTC | #3
On Fri, Sep 11, 2020 at 1:56 AM Kees Cook <keescook@chromium.org> wrote:
> On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> > From: John Wood <john.wood@gmx.com>
> >
> > In order to mitigate a fork brute force attack it is necessary to kill
> > all the offending tasks. This tasks are all the ones that share the
> > statistical data with the current task (the task that has crashed).
> >
> > Since the attack detection is done in the function fbfam_handle_attack()
> > that is called every time a core dump is triggered, only is needed to
> > kill the others tasks that share the same statistical data, not the
> > current one as this is in the path to be killed.
[...]
> > +     for_each_process(p) {
> > +             if (p == current || p->fbfam_stats != stats)
> > +                     continue;
> > +
> > +             do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> > +             pr_warn("fbfam: Offending process with PID %d killed\n",
> > +                     p->pid);
[...]
> > +
> > +             killed += 1;
> > +             if (killed >= to_kill)
> > +                     break;
> > +     }
> > +
> > +     rcu_read_unlock();
>
> Can't newly created processes escape this RCU read lock? I think this
> need alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.

Good point; the proper way to deal with this would probably be to take
the tasklist_lock in read mode around this loop (with
read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
the fatal_signal_pending() check while holding the lock in
copy_process(), that would be race-free - any fork() that has not yet
inserted the new task into the global task list would wait for us to
drop the tasklist_lock, then bail out at the fatal_signal_pending()
check.
John Wood Sept. 18, 2020, 4:02 p.m. UTC | #4
On Thu, Sep 10, 2020 at 04:56:19PM -0700, Kees Cook wrote:
> On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> >  /**
> > + * fbfam_kill_tasks() - Kill the offending tasks
> > + *
> > + * When a fork brute force attack is detected it is necessary to kill all the
> > + * offending tasks. Since this function is called from fbfam_handle_attack(),
> > + * and so, every time a core dump is triggered, only is needed to kill the
> > + * others tasks that share the same statistical data, not the current one as
> > + * this is in the path to be killed.
> > + *
> > + * When the SIGKILL signal is sent to the offending tasks, this function will be
> > + * called again during the core dump due to the shared statistical data shows a
> > + * quickly crashing rate. So, to avoid kill again the same tasks due to a
> > + * recursive call of this function, it is necessary to disable the attack
> > + * detection setting the jiffies to zero.
> > + *
> > + * To improve the for_each_process loop it is possible to end it when all the
> > + * tasks that shared the same statistics are found.
> > + *
> > + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> > + *         otherwise.
> > + */
> > +static int fbfam_kill_tasks(void)
> > +{
> > +	struct fbfam_stats *stats = current->fbfam_stats;
> > +	struct task_struct *p;
> > +	unsigned int to_kill, killed = 0;
> > +
> > +	if (!stats)
> > +		return -EFAULT;
> > +
> > +	to_kill = refcount_read(&stats->refc) - 1;
> > +	if (!to_kill)
> > +		return 0;
> > +
> > +	/* Disable the attack detection */
> > +	stats->jiffies = 0;
> > +	rcu_read_lock();
> > +
> > +	for_each_process(p) {
> > +		if (p == current || p->fbfam_stats != stats)
> > +			continue;
> > +
> > +		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> > +		pr_warn("fbfam: Offending process with PID %d killed\n",
> > +			p->pid);
>
> I'd make this ratelimited (along with Jann's suggestions).

Sorry, but I don't understand what you mean with "make this ratelimited".
A clarification would be greatly appreciated.

> Also, instead of the explicit "fbfam:" prefix, use the regular
> prefixing method:
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

Understood.

> > +
> > +		killed += 1;
> > +		if (killed >= to_kill)
> > +			break;
> > +	}
> > +
> > +	rcu_read_unlock();
>
> Can't newly created processes escape this RCU read lock? I think this
> need alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.

I will work on this for the next version. Thanks.

> > +	return 0;
> > +}
>
> --
> Kees Cook

Thanks
John Wood
Kees Cook Sept. 18, 2020, 9:35 p.m. UTC | #5
On Fri, Sep 18, 2020 at 06:02:16PM +0200, John Wood wrote:
> On Thu, Sep 10, 2020 at 04:56:19PM -0700, Kees Cook wrote:
> > On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> > > +		pr_warn("fbfam: Offending process with PID %d killed\n",
> > > +			p->pid);
> >
> > I'd make this ratelimited (along with Jann's suggestions).
> 
> Sorry, but I don't understand what you mean with "make this ratelimited".
> A clarification would be greatly appreciated.

Ah! Yes, sorry for not being more clear. There are ratelimit helpers for
the pr_*() family of functions, e.g.:

	pr_warn_ratelimited("brute: Offending process with PID...
John Wood Sept. 19, 2020, 8:01 a.m. UTC | #6
On Fri, Sep 18, 2020 at 02:35:12PM -0700, Kees Cook wrote:
> On Fri, Sep 18, 2020 at 06:02:16PM +0200, John Wood wrote:
> > On Thu, Sep 10, 2020 at 04:56:19PM -0700, Kees Cook wrote:
> > > On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> > > > +		pr_warn("fbfam: Offending process with PID %d killed\n",
> > > > +			p->pid);
> > >
> > > I'd make this ratelimited (along with Jann's suggestions).
> >
> > Sorry, but I don't understand what you mean with "make this ratelimited".
> > A clarification would be greatly appreciated.
>
> Ah! Yes, sorry for not being more clear. There are ratelimit helpers for
> the pr_*() family of functions, e.g.:
>
> 	pr_warn_ratelimited("brute: Offending process with PID...

Thanks for the clarification.

> --
> Kees Cook

Regards,
John Wood
diff mbox series

Patch

diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
index 3aa669e4ea51..173a6122390f 100644
--- a/security/fbfam/fbfam.c
+++ b/security/fbfam/fbfam.c
@@ -4,8 +4,11 @@ 
 #include <linux/errno.h>
 #include <linux/gfp.h>
 #include <linux/jiffies.h>
+#include <linux/pid.h>
 #include <linux/printk.h>
+#include <linux/rcupdate.h>
 #include <linux/refcount.h>
+#include <linux/sched/signal.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
 
@@ -24,7 +27,8 @@  unsigned long sysctl_crashing_rate_threshold = 30000;
  * struct fbfam_stats - Fork brute force attack mitigation statistics.
  * @refc: Reference counter.
  * @faults: Number of crashes since jiffies.
- * @jiffies: First fork or execve timestamp.
+ * @jiffies: First fork or execve timestamp. If zero, the attack detection is
+ *           disabled.
  *
  * The purpose of this structure is to manage all the necessary information to
  * compute the crashing rate of an application. So, it holds a first fork or
@@ -175,13 +179,69 @@  int fbfam_exit(void)
 }
 
 /**
- * fbfam_handle_attack() - Fork brute force attack detection.
+ * fbfam_kill_tasks() - Kill the offending tasks
+ *
+ * When a fork brute force attack is detected it is necessary to kill all the
+ * offending tasks. Since this function is called from fbfam_handle_attack(),
+ * and so, every time a core dump is triggered, only is needed to kill the
+ * others tasks that share the same statistical data, not the current one as
+ * this is in the path to be killed.
+ *
+ * When the SIGKILL signal is sent to the offending tasks, this function will be
+ * called again during the core dump due to the shared statistical data shows a
+ * quickly crashing rate. So, to avoid kill again the same tasks due to a
+ * recursive call of this function, it is necessary to disable the attack
+ * detection setting the jiffies to zero.
+ *
+ * To improve the for_each_process loop it is possible to end it when all the
+ * tasks that shared the same statistics are found.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ *         otherwise.
+ */
+static int fbfam_kill_tasks(void)
+{
+	struct fbfam_stats *stats = current->fbfam_stats;
+	struct task_struct *p;
+	unsigned int to_kill, killed = 0;
+
+	if (!stats)
+		return -EFAULT;
+
+	to_kill = refcount_read(&stats->refc) - 1;
+	if (!to_kill)
+		return 0;
+
+	/* Disable the attack detection */
+	stats->jiffies = 0;
+	rcu_read_lock();
+
+	for_each_process(p) {
+		if (p == current || p->fbfam_stats != stats)
+			continue;
+
+		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
+		pr_warn("fbfam: Offending process with PID %d killed\n",
+			p->pid);
+
+		killed += 1;
+		if (killed >= to_kill)
+			break;
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+/**
+ * fbfam_handle_attack() - Fork brute force attack detection and mitigation.
  * @signal: Signal number that causes the core dump.
  *
  * The crashing rate of an application is computed in milliseconds per fault in
  * each crash. So, if this rate goes under a certain threshold there is a clear
  * signal that the application is crashing quickly. At this moment, a fork brute
- * force attack is happening.
+ * force attack is happening. Under this scenario it is necessary to kill all
+ * the offending tasks in order to mitigate the attack.
  *
  * Return: -EFAULT if the current task doesn't have statistical data. Zero
  *         otherwise.
@@ -195,6 +255,10 @@  int fbfam_handle_attack(int signal)
 	if (!stats)
 		return -EFAULT;
 
+	/* The attack detection is disabled */
+	if (!stats->jiffies)
+		return 0;
+
 	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
 	      signal == SIGSEGV || signal == SIGSYS))
 		return 0;
@@ -205,9 +269,11 @@  int fbfam_handle_attack(int signal)
 	delta_time = jiffies64_to_msecs(delta_jiffies);
 	crashing_rate = delta_time / (u64)stats->faults;
 
-	if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
-		pr_warn("fbfam: Fork brute force attack detected\n");
+	if (crashing_rate >= (u64)sysctl_crashing_rate_threshold)
+		return 0;
 
+	pr_warn("fbfam: Fork brute force attack detected\n");
+	fbfam_kill_tasks();
 	return 0;
 }