
[v5,7/8] Documentation: Add documentation for the Brute LSM

Message ID 20210227153013.6747-8-john.wood@gmx.com
State New, archived
Series Fork brute force attack mitigation

Commit Message

John Wood Feb. 27, 2021, 3:30 p.m. UTC
Add some info detailing what the Brute LSM is, its motivation, weak
points of existing implementations, proposed solutions, enabling,
disabling and self-tests.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 Documentation/admin-guide/LSM/Brute.rst | 224 ++++++++++++++++++++++++
 Documentation/admin-guide/LSM/index.rst |   1 +
 security/brute/Kconfig                  |   3 +-
 3 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/LSM/Brute.rst

--
2.25.1

Comments

Andi Kleen Feb. 28, 2021, 6:56 p.m. UTC | #1
John Wood <john.wood@gmx.com> writes:
> +
> +To detect a brute force attack it is necessary that the statistics shared by all
> +the fork hierarchy processes be updated in every fatal crash and the most
> +important data to update is the application crash period.

So I haven't really followed the discussion and also not completely read
the patches (so apologies if that was already explained or is documented
somewhere else).

But what I'm missing here is some indication of how much
memory these statistics can use up and how they are limited.

How much is the worst case extra memory consumption?

If there is no limit how is DoS prevented?

If there is a limit, there likely needs to be a way to throw out
information, and so the attack would just shift to forcing the kernel
to throw out this information before retrying.

e.g. if the data is held for the parent shell: restart the parent
shell all the time.
e.g. if the data is held for the sshd daemon used to log in:
Somehow cause sshd to respawn to discard the statistics.

Do I miss something here? How is that mitigated?

Instead of discussing all the low level tedious details of the
statistics it would be better to focus on these "high level"
problems here.

-Andi
John Wood March 2, 2021, 6:31 p.m. UTC | #2
On Sun, Feb 28, 2021 at 10:56:45AM -0800, Andi Kleen wrote:
> John Wood <john.wood@gmx.com> writes:
> > +
> > +To detect a brute force attack it is necessary that the statistics shared by all
> > +the fork hierarchy processes be updated in every fatal crash and the most
> > +important data to update is the application crash period.
>
> So I haven't really followed the discussion and also not completely read
> the patches (so apologies if that was already explained or is documented
> somewhere else).
>
> But what I'm missing here is some indication how much
> memory these statistics can use up and how are they limited.

The statistics shared by all the fork hierarchy processes are held by the
"brute_stats" struct.

struct brute_cred {
	kuid_t uid;
	kgid_t gid;
	kuid_t suid;
	kgid_t sgid;
	kuid_t euid;
	kgid_t egid;
	kuid_t fsuid;
	kgid_t fsgid;
};

struct brute_stats {
	spinlock_t lock;			/* serializes updates to this struct */
	refcount_t refc;			/* tasks sharing these statistics */
	unsigned char faults;			/* number of fatal crashes so far */
	u64 jiffies;				/* timestamp of the last update */
	u64 period;				/* application crash period (EMA) */
	struct brute_cred saved_cred;		/* creds snapshot to detect privilege changes */
	unsigned char network : 1;		/* network-to-local boundary flag */
	unsigned char bounds_crossed : 1;	/* a privilege boundary was crossed */
};

This is a fixed-size struct where, on every process crash (due to a fatal
signal), the application crash period (period field) is updated on an ongoing
basis using an exponential moving average (this way it is not necessary to save
the old crash period values). The jiffies and faults fields complete the basic
statistics. The saved_cred field, also fixed size, is used to fine-tune the
detection (detect whether the privileges have changed). And the remaining flags
are also used to narrow the detection (detect whether a privilege boundary has
been crossed).
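
To illustrate, a simplified sketch of this update (not the exact patch code;
the helper name and the integer weight constants are only illustrative, with
the weight written as 7/10 to avoid floating point in the kernel):

#include <linux/spinlock.h>
#include <linux/types.h>

#define BRUTE_EMA_WEIGHT_NUM	7	/* assumed weight = 7/10 */
#define BRUTE_EMA_WEIGHT_DEN	10

/*
 * Sketch: update the crash period EMA on a fatal crash.
 * period_ema = period * weight + period_ema * (1 - weight)
 */
static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
{
	u64 period;

	spin_lock(&stats->lock);
	period = now - stats->jiffies;		/* time since the last update */
	stats->jiffies = now;
	if (stats->faults < 255)
		stats->faults++;
	stats->period = (period * BRUTE_EMA_WEIGHT_NUM +
			 stats->period * (BRUTE_EMA_WEIGHT_DEN - BRUTE_EMA_WEIGHT_NUM)) /
			BRUTE_EMA_WEIGHT_DEN;
	period = stats->period;
	spin_unlock(&stats->lock);
	return period;
}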

> How much is the worst case extra memory consumption?

On every fork system call the parent statistics are shared with the child
process. On every execve system call a new brute_stats struct is allocated. So,
only one brute_stats struct is allocated per fork hierarchy (the hierarchy of
processes descending from an execve system call). The more processes are
running, the more memory is used.
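
Roughly, and only as a simplified sketch (the hook shapes follow the
task_alloc and bprm_committed_creds LSM hooks; the blob accessors and
everything else are illustrative assumptions, not the exact patch code):

#include <linux/binfmts.h>
#include <linux/jiffies.h>
#include <linux/refcount.h>
#include <linux/sched.h>
#include <linux/slab.h>

/* fork: the child shares (and pins) the parent's statistics */
static int brute_task_alloc_sketch(struct task_struct *task,
				   unsigned long clone_flags)
{
	struct brute_stats *stats = brute_stats_of(current);	/* hypothetical accessor */

	refcount_inc(&stats->refc);
	brute_set_stats(task, stats);				/* hypothetical setter */
	return 0;
}

/* execve: a new fork hierarchy starts, so new statistics are allocated */
static void brute_task_execve_sketch(struct linux_binprm *bprm)
{
	struct brute_stats *stats = kzalloc(sizeof(*stats), GFP_KERNEL);

	if (!stats)
		return;
	spin_lock_init(&stats->lock);
	refcount_set(&stats->refc, 1);
	stats->jiffies = get_jiffies_64();
	brute_set_stats(current, stats);			/* hypothetical setter */
}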

> If there is no limit how is DoS prevented?
>
> If there is a limit, there likely needs to be a way to throw out
> information, and so the attack would just shift to forcing the kernel
> to throw out this information before retrying.
>
> e.g. if the data is hold for the parent shell: restart the parent
> shell all the time.
> e.g. if the data is hold for the sshd daemon used to log in:
> Somehow cause sshd to respawn to discard the statistics.

When a process crashes due to a fatal signal delivered by the kernel (with some
user signal exceptions) the statistics shared by this process with all the fork
hierarchy processes are updated. This allows us to detect a brute force attack
through the "fork" system call. If these statistics show a fast crash rate a
mitigation is triggered. Also, these statistics are removed when all the
processes in this hierarchy have finished.

At the same time these statistics are updated, the statistics of the parent
fork hierarchy (the statistics shared by the process that exec()ed the child
process described in the last paragraph) are updated as well. This way a brute
force attack through the "execve" system call can be detected. Also, if these
new statistics show a fast crash rate the mitigation is triggered.
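
Again as a simplified sketch (illustrative only, reusing the
brute_update_crash_period() sketch above; the accessors are hypothetical):

/*
 * Sketch: on a fatal crash, update both the task's own statistics and the
 * statistics of the hierarchy that exec()ed it, so brute force through
 * execve is also visible.
 */
static void brute_task_fatal_signal_sketch(struct task_struct *task, u64 now)
{
	struct brute_stats *stats = brute_stats_of(task);		/* hypothetical */
	struct brute_stats *parent_stats = brute_parent_stats_of(task);	/* hypothetical */

	brute_update_crash_period(stats, now);
	if (parent_stats)
		brute_update_crash_period(parent_stats, now);
}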

> Do I miss something here? How is that mitigated?

As a mitigation method, all the offending tasks involved in the attack are
killed. Or in other words, all the tasks that share the same statistics
(statistics showing a fast crash rate) are killed.
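
A simplified sketch of this mitigation step (illustrative only;
brute_stats_of() is a hypothetical accessor, not the exact patch code):

#include <linux/rcupdate.h>
#include <linux/sched/signal.h>

/* Sketch: kill every task that shares the offending statistics. */
static void brute_kill_offending_tasks_sketch(struct brute_stats *stats)
{
	struct task_struct *p;

	rcu_read_lock();
	for_each_process(p) {
		if (brute_stats_of(p) == stats)			/* hypothetical accessor */
			send_sig_info(SIGKILL, SEND_SIG_PRIV, p);
	}
	rcu_read_unlock();
}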

> Instead of discussing all the low level tedious details of the
> statistics it would be better to focus on these "high level"
> problems here.

Thanks for the advice. I will improve the documentation by adding these high
level details.

> -Andi
>

I hope this info clarifies your questions. If not, I will try again.

Thanks,
John Wood
Andi Kleen March 7, 2021, 3:19 p.m. UTC | #3
Sorry for the late answer. I somehow missed your email earlier.

> As a mitigation method, all the offending tasks involved in the attack are
> killed. Or in other words, all the tasks that share the same statistics
> (statistics showing a fast crash rate) are killed.

So systemd will just restart the network daemon and then the attack works
again?

Or if it's an interactive login you log in again.

I think it might be useful even with these limitations, but it would
be good to spell out the limitations of the method more clearly.

I suspect to be useful it'll likely need some user space configuration
changes too.

-Andi
John Wood March 7, 2021, 4:45 p.m. UTC | #4
On Sun, Mar 07, 2021 at 07:19:20AM -0800, Andi Kleen wrote:
> Sorry for the late answer. I somehow missed your email earlier.
>
> > As a mitigation method, all the offending tasks involved in the attack are
> > killed. Or in other words, all the tasks that share the same statistics
> > (statistics showing a fast crash rate) are killed.
>
> So systemd will just restart the network daemon and then the attack works
> again?

Sorry, I think my last explanation was not clear enough. If the network
daemon crashes repeatedly in a short period of time, this is detected as a
brute force attack through the fork system call. Then this daemon and all the
fork processes created from it will be killed. If systemd restarts the network
daemon and it crashes again, then systemd will be killed. I think this way the
attack is fully mitigated.

> Or if it's a interactive login you log in again.

First the login will be killed (if it fails with a fatal signal) and, if it is
restarted, the process that exec()s it will be killed. In this case I think
that the threat is also completely mitigated.

> I think it might be useful even with these limitations, but it would
> be good to spell out the limitations of the method more clearly.
>
> I suspect to be useful it'll likely need some user space configuration
> changes too.

In the v2 version there were some sysctl attributes to fine-tune the
detection. The following two paragraphs are extracted from the documentation
patch of that version:

To customize the detection's sensitivity there are two new sysctl attributes
that allow setting the size of the list of last crash timestamps and the
application crash period threshold (in milliseconds). Both are accessible
through the following files, respectively.

/proc/sys/kernel/brute/timestamps_list_size
/proc/sys/kernel/brute/crash_period_threshold

However, Kees Cook suggested that if we narrow the attack detection, focusing
on the crossing of privilege boundaries and on signals delivered only by the
kernel, it does not seem necessary for the user to customize this feature. I
agree with that.

>
> -Andi

I have sent a v6 version with the documentation improved.

Thanks for your comments,
John Wood
Andi Kleen March 7, 2021, 5:25 p.m. UTC | #5
> processes created from it will be killed. If the systemd restart the network
> daemon and it will crash again, then the systemd will be killed. I think this
> way the attack is fully mitigated.

Wouldn't that panic the system? Killing init is usually a panic.

> > Or if it's a interactive login you log in again.
> 
> First the login will be killed (if it fails with a fatal signal) and if it is
> restarted, the process that exec() it again will be killed. In this case I think
> that the threat is also completely mitigated.

Okay so sshd will be killed. And if it gets restarted eventually init,
so panic again.

That's a fairly drastic consequence because even without panic 
it means nobody can fix the system anymore without a console.

So probably the mitigation means that most such attacks eventually lead
to a panic because they will reach init sooner or later.

Another somewhat worrying case is some bug that kills KVM guests.
So if the bug can be triggered frequently you can kill all the
virtualization management infrastructure.

I don't remember seeing a discussion of such drastic consequences in
your description. It might be ok depending on the use case,
but people certainly need to be aware of it.

It's probably not something you want to have enabled by default ever.

-Andi
John Wood March 7, 2021, 6:05 p.m. UTC | #6
On Sun, Mar 07, 2021 at 09:25:40AM -0800, Andi Kleen wrote:
> > processes created from it will be killed. If the systemd restart the network
> > daemon and it will crash again, then the systemd will be killed. I think this
> > way the attack is fully mitigated.
>
> Wouldn't that panic the system? Killing init is usually a panic.

The mitigation acts only on the process that crashes (the network daemon) and
the process that exec()ed it (systemd). This mitigation does not climb up the
process tree all the way to the init process.

Note: I am a kernel newbie and I don't know whether systemd is init. Sorry if
it is a stupid question. AFAIK systemd is not the init process (the first
process that is executed), but I am not sure.

>
> > > Or if it's a interactive login you log in again.
> >
> > First the login will be killed (if it fails with a fatal signal) and if it is
> > restarted, the process that exec() it again will be killed. In this case I think
> > that the threat is also completely mitigated.
>
> Okay so sshd will be killed. And if it gets restarted eventually init,
> so panic again.

In this scenario the process that exec()s the login will be killed (the sshd
process). But I think that sshd is not the init process. So no panic.

> That's a fairly drastic consequence because even without panic
> it means nobody can fix the system anymore without a console.

So, you suggest that the mitigation method for a brute force attack through
the execve system call should be different (not killing the process that
execs)? Any suggestions to improve this feature would be welcome.

> So probably the mitigation means that most such attacks eventually lead
> to a panic because they will reach init sooner or later.

I don't think that is correct. As explained earlier, the current mitigation
method only acts on the process that crashes and its parent. It does not climb
up the process tree all the way to the init process.

> Another somewhat worrying case is some bug that kills KVM guests.
> So if the bug can be triggered frequently you can kill all the
> virtualization management infrastructure.

Well, we need to work to avoid false positives.

> I don't remember seeing a discussion of such drastic consequences in
> your description. It might be ok depending on the use case,
> but people certainly need to be aware of it.
>
> It's probably not something you want to have enabled by default ever.
>
> -Andi
>
Thanks,
John Wood
Andi Kleen March 7, 2021, 10:49 p.m. UTC | #7
On Sun, Mar 07, 2021 at 07:05:41PM +0100, John Wood wrote:
> On Sun, Mar 07, 2021 at 09:25:40AM -0800, Andi Kleen wrote:
> > > processes created from it will be killed. If the systemd restart the network
> > > daemon and it will crash again, then the systemd will be killed. I think this
> > > way the attack is fully mitigated.
> >
> > Wouldn't that panic the system? Killing init is usually a panic.
> 
> The mitigation acts only over the process that crashes (network daemon) and the
> process that exec() it (systemd). This mitigation don't go up in the processes
> tree until reach the init process.

Most daemons have some supervisor that respawns them when they crash. 
(maybe read up on "supervisor trees" if you haven't, it's a standard concept)

That's usually (but not always) init, as in systemd. There might be something
in between it and init, but likely init would respawn that something in between
if it dies. One of the main tasks of init is to respawn things under it.

If you have a supervisor tree starting from init the kill should eventually
travel up to init.

At least that's the theory. Do you have some experiments that show
this doesn't happen?

> 
> Note: I am a kernel newbie and I don't know if the systemd is init. Sorry if it
> is a stupid question. AFAIK systemd is not the init process (the first process
> that is executed) but I am not sure.

At least the part of systemd that respawns is often (but not always) init.

> 
> >
> > > > Or if it's a interactive login you log in again.
> > >
> > > First the login will be killed (if it fails with a fatal signal) and if it is
> > > restarted, the process that exec() it again will be killed. In this case I think
> > > that the threat is also completely mitigated.
> >
> > Okay so sshd will be killed. And if it gets restarted eventually init,
> > so panic again.
> 
> In this scenario the process that exec() the login will be killed (sshd
> process). But I think that sshd is not the init process. So no panic.

sshd would be respawned by the supervisor, which is likely init.

> > That's a fairly drastic consequence because even without panic
> > it means nobody can fix the system anymore without a console.
> 
> So, you suggest that the mitigation method for the brute force attack through
> the execve system call should be different (not kill the process that exec).
> Any suggestions would be welcome to improve this feature.

If the system is part of some cluster, then panicking on attack or failure
could be a reasonable reaction. Some other system in the cluster should
take over. There's also a risk that all the systems get taken
out quickly one by one; in this case you might still need something
like the below.

But it's something that would need to be very carefully considered
for the environment.

The other case is when there isn't some fallback, as in a standalone
machine.

It could be used only when the supervisor daemons are aware of it.
Often they already have respawn limits, but they would need to make sure these
trigger before your algorithm triggers. Or maybe some way to opt out
per process.  Then the DoS would be only against that process, but
not against everything on the machine.

So I think it needs more work on the user space side for most usages.

-Andi
John Wood March 9, 2021, 6:40 p.m. UTC | #8
Hi,

On Sun, Mar 07, 2021 at 02:49:27PM -0800, Andi Kleen wrote:
> On Sun, Mar 07, 2021 at 07:05:41PM +0100, John Wood wrote:
> > On Sun, Mar 07, 2021 at 09:25:40AM -0800, Andi Kleen wrote:
> > > > processes created from it will be killed. If the systemd restart the network
> > > > daemon and it will crash again, then the systemd will be killed. I think this
> > > > way the attack is fully mitigated.
> > >
> > > Wouldn't that panic the system? Killing init is usually a panic.
> >
> > The mitigation acts only over the process that crashes (network daemon) and the
> > process that exec() it (systemd). This mitigation don't go up in the processes
> > tree until reach the init process.
>
> Most daemons have some supervisor that respawns them when they crash.
> (maybe read up on "supervisor trees" if you haven't, it's a standard concept)
>
> That's usually (but not) always init, as in systemd. There might be something
> inbetween it and init, but likely init would respawn the something in between
> it it. One of the main tasks of init is to respawn things under it.
>
> If you have a supervisor tree starting from init the kill should eventually
> travel up to init.

I will try to demonstrate that the mitigation doesn't travel up to init. To do
so I will use the following scenario (brute force attack through the execve
system call):

init -------exec()-------> supervisor -------exec()-----> network daemon
faults = 0                 faults = 0                     faults = 0
period = ---               period = ---                   period = ---

Now the network daemon crashes (its stats are updated, and so are the
supervisor's stats):

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 1                     faults = 1
period = ---               period = 10ms                  period = 10ms

Then the network daemon is freed and its stats are removed:

init --------------------> supervisor
faults = 0                 faults = 1
period = ---               period = 10ms

Now the supervisor respawns the daemon (the stats are initialized):

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 1                     faults = 0
period = ---               period = 10ms                  period = ---

The network daemon crashes again:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 2                     faults = 1
period = ---               period = 11ms                  period = 12ms

The network daemon is freed again:

init --------------------> supervisor
faults = 0                 faults = 2
period = ---               period = 11ms

The supervisor respawns again the daemon:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 2                     faults = 0
period = ---               period = 11ms                  period = ---

These steps are repeated a number of times until a minimum number of faults
triggers the brute force attack mitigation. At this moment:

init --------------------> supervisor ------------------> network daemon
faults = 0                 faults = 5                     faults = 1
period = ---               period = 13ms                  period = 15ms

Now the network daemon is freed and the supervisor is killed by the mitigation
method. At this point it is important to note that, before the kill signal is
sent to the supervisor, its stats are disabled. This means that when the
supervisor is killed its stats are not updated, so the init stats are not
updated either.

init
faults = 0
period = ---

From the point of view of the init process nothing has happened.

> At least that's the theory. Do you have some experiments that show
> this doesn't happen?

Yes. The kernel selftests try to emulate some scenarios: basically, brute force
attacks through the execve system call (like the case shown above) and also
brute force attacks through the fork system call, playing with the crossing of
some privilege boundaries.

For example:

In the tests an application exec()s another application that crashes. Then the
crashed application is respawned and it crashes again. The respawn is repeated
until the brute force attack through the execve system call is detected, and
then the application that exec()s is killed. But no other applications are
killed. Only the tasks involved in the attack.
>
> >
> > Note: I am a kernel newbie and I don't know if the systemd is init. Sorry if it
> > is a stupid question. AFAIK systemd is not the init process (the first process
> > that is executed) but I am not sure.
>
> At least the part of systemd that respawns is often (but not always) init.

Thanks for the clarification.

> > So, you suggest that the mitigation method for the brute force attack through
> > the execve system call should be different (not kill the process that exec).
> > Any suggestions would be welcome to improve this feature.
>
> If the system is part of some cluster, then panicing on attack or failure
> could be a reasonable reaction. Some other system in the cluster should
> take over. There's also a risk that all the systems get taken
> out quickly one by one, in this case you might still need something
> like the below.
>
> But it's something that would need to be very carefully considered
> for the environment.
>
> The other case is when there isn't some fallback, as in a standalone
> machine.
>
> It could be only used when the supervisor daemons are aware of it.
> Often they already have respawn limits, but would need to make sure they
> trigger before your algorithm trigger. Or maybe some way to opt-out
> per process.  Then the DoS would be only against that process, but
> not everything on the machine.

Thanks for the suggestions.

> So I think it needs more work on the user space side for most usages.
>

Anyway, in the case that the supervisor is init, the system will panic. So, I
think that we can add a prctl to avoid killing the parent task (the task that
execs) and only block new fork system calls from this task. When this boolean
is set, any parent task that is involved in the attack will not be killed;
instead, any following forks will be blocked. This way the system will not
crash.

What do you think?

> -Andi

Thanks for your time and patience.
John Wood
John Wood March 11, 2021, 6:22 p.m. UTC | #9
Hi,

On Tue, Mar 09, 2021 at 07:40:54PM +0100, John Wood wrote:
> On Sun, Mar 07, 2021 at 02:49:27PM -0800, Andi Kleen wrote:
>
> > So I think it needs more work on the user space side for most usages.
>
> Anyway, in the case that the supervisor is init then the system will panic. So,
> I think that we can add a prctl to avoid kill the parent task (the task that
> exec) and only block new fork system calls from this task. When this boolean is
> set, any parent task that is involved in the attack will not be killed. In this
> case, any following forks will be blocked. This way the system will not crash.

Another proposal that I think suits better:

When a brute force attack is detected through the fork or execve system call,
all the tasks involved in the attack will be killed with the exception of the
init task (the task with pid 1). Then, and only if the init task is involved in
the attack, block the fork system call from the init process during a
user-defined time (set via a sysctl attribute). This way the brute force attack
is mitigated and the system does not panic.

I think that this is a better solution than the other one since it is a
per-system solution. And I believe that, with a default value for the blocking
time (sysctl attribute), it could be useful in a generic way (for most usages).

The proposal using prctl would need more actions from userspace and it is not
generic, since it is a per-process solution.

> What do you think?

Thanks,
John Wood
Andi Kleen March 11, 2021, 8:05 p.m. UTC | #10
<scenario that init will not be killed>

Thanks.

Okay but that means that the brute force attack can just continue
because the attacked daemon will be respawned?

You need some way to stop the respawning, otherwise the
mitigation doesn't work for daemons.


-Andi
Andi Kleen March 11, 2021, 8:08 p.m. UTC | #11
> When a brute force attack is detected through the fork or execve system call,
> all the tasks involved in the attack will be killed with the exception of the
> init task (task with pid equal to zero). Now, and only if the init task is
> involved in the attack, block the fork system call from the init process during
> a user defined time (using a sysctl attribute). This way the brute force attack
> is mitigated and the system does not panic.

That means nobody can log in and fix the system during that time.

It would be better to have that policy in init. Perhaps add some way
that someone doing wait*() can know the exit was due to this mitigation
(and not something else). Then they could disable respawning of that daemon.

-Andi
John Wood March 12, 2021, 5:47 p.m. UTC | #12
On Thu, Mar 11, 2021 at 12:08:11PM -0800, Andi Kleen wrote:
> > When a brute force attack is detected through the fork or execve system call,
> > all the tasks involved in the attack will be killed with the exception of the
> > init task (task with pid equal to zero). Now, and only if the init task is
> > involved in the attack, block the fork system call from the init process during
> > a user defined time (using a sysctl attribute). This way the brute force attack
> > is mitigated and the system does not panic.
>
> That means nobody can log in and fix the system during that time.
>
> Would be better to have that policy in init. Perhaps add some way
> that someone doing wait*() can know the exit was due this mitigation
> (and not something way) Then they could disable respawning of that daemon.

Great. So, if we use wait*() to inform userspace that the exit of a process was
due to a brute force attack mitigation, then the supervisors (not only init)
can adopt the necessary policy in each case. This also allows us to deal with
respawned daemons.

As a summary of this useful discussion:

- When a brute force attack is detected through the fork or execve system call,
  all the offending tasks involved in the attack will be killed. Since the
  mitigation normally does not reach init, do nothing special in this case ->
  the system will panic if we ever kill init.

- Use wait*() to inform userspace that every process killed by the mitigation
  has exited due to a brute force attack mitigation. So, each supervisor can
  adopt its own policy regarding respawned daemons (see the sketch below).
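
As an illustration of the second point, a supervisor could do something along
these lines (userspace sketch only; how the mitigation would actually be
reported through wait*() is still an open question, so this example simply
assumes a SIGKILL-ed child should not be respawned; the daemon path is made
up):

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t spawn(const char *path)
{
	pid_t pid = fork();

	if (pid == 0) {
		execl(path, path, (char *)NULL);
		_exit(127);
	}
	return pid;
}

int main(void)
{
	const char *daemon_path = "/usr/sbin/mydaemon";	/* hypothetical */
	int status;

	for (;;) {
		pid_t pid = spawn(daemon_path);

		if (pid < 0 || waitpid(pid, &status, 0) < 0)
			return 1;

		/* Assumption: a SIGKILL-ed child is treated as mitigated */
		if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
			fprintf(stderr, "not respawning: killed by mitigation?\n");
			break;
		}
		sleep(1);	/* simple respawn back-off */
	}
	return 0;
}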

I will work in that direction for the next version.

Thanks a lot for your time, proposals, guidance and solutions.
John Wood
John Wood March 12, 2021, 5:54 p.m. UTC | #13
On Thu, Mar 11, 2021 at 12:05:17PM -0800, Andi Kleen wrote:
>
> Okay but that means that the brute force attack can just continue
> because the attacked daemon will be respawned?
>
> You need some way to stop the respawning, otherwise the
> mitigation doesn't work for daemons.
>
I will work on your solution regarding respawned daemons (use wait*() to inform
userspace that the offending processes killed by the mitigation exited due to
it -> then the supervisor can adopt its own policy).

>
> -Andi
>

Thank you very much,
John Wood

Patch

diff --git a/Documentation/admin-guide/LSM/Brute.rst b/Documentation/admin-guide/LSM/Brute.rst
new file mode 100644
index 000000000000..485966a610bb
--- /dev/null
+++ b/Documentation/admin-guide/LSM/Brute.rst
@@ -0,0 +1,224 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+===========================================================
+Brute: Fork brute force attack detection and mitigation LSM
+===========================================================
+
+Attacks against vulnerable userspace applications with the purpose of breaking
+ASLR or bypassing canaries traditionally use some level of brute force with the
+help of the fork system call. This is possible since, when creating a new
+process using fork, its memory contents are the same as those of the parent
+process (the process that called the fork system call). So, the attacker can
+test the memory as many times as needed to find the correct memory values or
+the correct memory addresses without worrying about crashing the application.
+
+Based on the above scenario it would be nice to have this detected and
+mitigated, and this is the goal of this implementation. Specifically the
+following attacks are expected to be detected:
+
+1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
+    desirable memory layout is achieved (e.g. Stack Clash).
+2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a
+    desirable memory layout is achieved (e.g. what CTFs do for simple network
+    service).
+3.- Launching processes without exec() (e.g. Android Zygote) and exposing state
+    to attack a sibling.
+4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the
+    previously shared memory layout of all the other children is exposed (e.g.
+    kind of related to HeartBleed).
+
+In each case, a privilege boundary has been crossed:
+
+Case 1: setuid/setgid process
+Case 2: network to local
+Case 3: privilege changes
+Case 4: network to local
+
+So, what really needs to be detected are fork/exec brute force attacks that
+cross any of the privilege boundaries noted above.
+
+
+Other implementations
+=====================
+
+The public version of grsecurity, as a summary, is based on the idea of delaying
+the fork system call if a child died due to some fatal signal (SIGSEGV, SIGBUS,
+SIGKILL or SIGILL). This has some issues:
+
+Bad practices
+-------------
+
+Adding delays to the kernel is, in general, a bad idea.
+
+Scenarios not detected (false negatives)
+----------------------------------------
+
+This protection acts only when the fork system call is called after a child has
+crashed. So, it would still be possible for an attacker to fork a large number
+of children (on the order of thousands), then probe all of them, and finally
+wait out the protection time before repeating the steps.
+
+Moreover, this method is based on the idea that the protection doesn't act if
+the parent crashes. So, it would still be possible for an attacker to fork a
+process and probe itself, then fork the child process and probe itself again.
+This way, these steps can be repeated indefinitely without any mitigation.
+
+Scenarios detected (false positives)
+------------------------------------
+
+Scenarios where an application rarely fails for reasons unrelated to a real
+attack.
+
+
+This implementation
+===================
+
+The main idea behind this implementation is to improve the existing ones
+focusing on the weak points noted above. Basically, the adopted solution is
+to detect a fast crash rate instead of only one simple crash and to detect both
+the crash of parent and child processes. Also, fine-tune the detection focusing
+on privilege boundary crossing. And finally, as a mitigation method, kill all
+the offending tasks involved in the attack instead of using delays.
+
+To achieve this goal, and going into more detail, this implementation is based
+on the use of some statistical data shared across all the processes that can
+have the same memory contents. Or in other words, statistical data shared
+between all the fork hierarchy processes after an execve system call.
+
+The purpose of these statistics is, basically, to collect all the necessary info
+to compute the application crash period in order to detect an attack. This crash
+period is the time between the execve system call and the first fault or the
+time between two consecutive faults, but this has a drawback. If an application
+crashes twice in a short period of time for some reason unrelated to a real
+attack, a false positive will be triggered. To avoid this scenario the
+exponential moving average (EMA) is used. This way, the application crash period
+will be a value that is not prone to change due to spurious data and follows the
+real crash period.
+
+To detect a brute force attack it is necessary that the statistics shared by all
+the fork hierarchy processes be updated on every fatal crash, and the most
+important data to update is the application crash period.
+
+There are two types of brute force attacks that need to be detected. The first
+one is an attack that happens through the fork system call and the second one is
+an attack that happens through the execve system call. The first type uses the
+statistics shared by all the fork hierarchy processes, but the second type
+cannot use this statistical data because these statistics disappear when the
+involved tasks finish. In this last scenario the attack info should be tracked
+by the statistics of a higher fork hierarchy (the hierarchy that contains the
+process that forks before the execve system call).
+
+Moreover, these two attack types have two variants. A slow brute force attack
+that is detected if a maximum number of faults per fork hierarchy is reached and
+a fast brute force attack that is detected if the application crash period falls
+below a certain threshold.
+
+Exponential moving average (EMA)
+--------------------------------
+
+This kind of average defines a weight (between 0 and 1) for the new value to add
+and applies the remainder of the weight to the current average value. This way,
+some spurious data will not excessively modify the average and only if the new
+values are persistent will the moving average tend towards them.
+
+Mathematically the application crash period's EMA can be expressed as follows:
+
+period_ema = period * weight + period_ema * (1 - weight)
+
+Related to the attack detection, the EMA must guarantee that not many crashes
+are needed. To demonstrate this, the scenario where an application has been
+running without any crashes for a month will be used.
+
+The period's EMA can be written now as:
+
+period_ema[i] = period[i] * weight + period_ema[i - 1] * (1 - weight)
+
+If the new crash periods have insignificant values relative to the first crash
+period (a month in this case), the formula can be rewritten as:
+
+period_ema[i] = period_ema[i - 1] * (1 - weight)
+
+And by extension:
+
+period_ema[i - 1] = period_ema[i - 2] * (1 - weight)
+period_ema[i - 2] = period_ema[i - 3] * (1 - weight)
+period_ema[i - 3] = period_ema[i - 4] * (1 - weight)
+
+So, if the substitution is made:
+
+period_ema[i] = period_ema[i - 1] * (1 - weight)
+period_ema[i] = period_ema[i - 2] * pow((1 - weight) , 2)
+period_ema[i] = period_ema[i - 3] * pow((1 - weight) , 3)
+period_ema[i] = period_ema[i - 4] * pow((1 - weight) , 4)
+
+And in a more generic form:
+
+period_ema[i] = period_ema[i - n] * pow((1 - weight) , n)
+
+Where n represents the number of iterations to obtain an EMA value. Or in other
+words, the number of crashes to detect an attack.
+
+So, if we isolate the number of crashes:
+
+period_ema[i] / period_ema[i - n] = pow((1 - weight), n)
+log(period_ema[i] / period_ema[i - n]) = log(pow((1 - weight), n))
+log(period_ema[i] / period_ema[i - n]) = n * log(1 - weight)
+n = log(period_ema[i] / period_ema[i - n]) / log(1 - weight)
+
+Then, in the scenario discussed above (an application has been running without
+any crashes for a month), the approximate number of crashes to detect an attack
+(using the implementation values for the weight and the crash period threshold)
+is:
+
+weight = 7 / 10
+crash_period_threshold = 30 seconds
+
+n = log(crash_period_threshold / seconds_per_month) / log(1 - weight)
+n = log(30 / (30 * 24 * 3600)) / log(1 - 0.7)
+n = 9.44
+
+So, with 10 crashes for this scenario an attack will be detected. If these steps
+are repeated for different scenarios and the results are collected:
+
+1 month without any crashes ----> 9.44 crashes to detect an attack
+1 year without any crashes -----> 11.50 crashes to detect an attack
+10 years without any crashes ---> 13.42 crashes to detect an attack
+
+However, this computation has a drawback. The first data added to the EMA do not
+yet yield a real average showing a trend. So the solution is simple: the EMA
+needs a minimum number of data points before it can be interpreted. This way,
+the case where the first few faults are fast but no more crashes follow is
+avoided.
+
+Per system enabling/disabling
+-----------------------------
+
+This feature can be enabled at build time using the CONFIG_SECURITY_FORK_BRUTE
+option or using the visual config application under the following menu:
+
+Security options  --->  Fork brute force attack detection and mitigation
+
+Also, at boot time, this feature can be disabled by changing the "lsm=" boot
+parameter.
+
+Kernel selftests
+----------------
+
+To validate all the expectations about this implementation, there is a set of
+selftests. These tests cover fork/exec brute force attacks crossing the
+following privilege boundaries:
+
+1.- setuid process
+2.- privilege changes
+3.- network to local
+
+Also, there are some tests to check that fork/exec brute force attacks that do
+not cross any of the privilege boundaries discussed above do not trigger the
+detection and mitigation stage.
+
+To build the tests:
+make -C tools/testing/selftests/ TARGETS=brute
+
+To run the tests:
+make -C tools/testing/selftests TARGETS=brute run_tests
+
+To package the tests:
+make -C tools/testing/selftests TARGETS=brute gen_tar
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index a6ba95fbaa9f..1f68982bb330 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -41,6 +41,7 @@  subdirectories.
    :maxdepth: 1

    apparmor
+   Brute
    LoadPin
    SELinux
    Smack
diff --git a/security/brute/Kconfig b/security/brute/Kconfig
index 1bd2df1e2dec..334d7e88d27f 100644
--- a/security/brute/Kconfig
+++ b/security/brute/Kconfig
@@ -7,6 +7,7 @@  config SECURITY_FORK_BRUTE
 	  vulnerable userspace processes. The detection method is based on
 	  the application crash period and as a mitigation procedure all the
 	  offending tasks are killed. Like capabilities, this security module
-	  stacks with other LSMs.
+	  stacks with other LSMs. Further information can be found in
+	  Documentation/admin-guide/LSM/Brute.rst.

 	  If you are unsure how to answer this question, answer N.