diff mbox series

[RFC,09/10] kfence, Documentation: add KFENCE documentation

Message ID 20200907134055.2878499-10-elver@google.com
State New
Headers show
Series KFENCE: A low-overhead sampling-based memory safety error detector | expand

Commit Message

Marco Elver Sept. 7, 2020, 1:40 p.m. UTC
Add KFENCE documentation in dev-tools/kfence.rst, and add to index.

Co-developed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
---
 Documentation/dev-tools/index.rst  |   1 +
 Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
 2 files changed, 286 insertions(+)
 create mode 100644 Documentation/dev-tools/kfence.rst

Comments

Andrey Konovalov Sept. 7, 2020, 3:33 p.m. UTC | #1
On Mon, Sep 7, 2020 at 3:41 PM Marco Elver <elver@google.com> wrote:
>
> Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
>
> Co-developed-by: Alexander Potapenko <glider@google.com>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  Documentation/dev-tools/index.rst  |   1 +
>  Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/dev-tools/kfence.rst
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..1b1cf4f5c9d9 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -22,6 +22,7 @@ whole; patches welcome!
>     ubsan
>     kmemleak
>     kcsan
> +   kfence
>     gdb-kernel-debugging
>     kgdb
>     kselftest
> diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
> new file mode 100644
> index 000000000000..254f4f089104
> --- /dev/null
> +++ b/Documentation/dev-tools/kfence.rst
> @@ -0,0 +1,285 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Kernel Electric-Fence (KFENCE)
> +==============================
> +
> +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
> +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
> +invalid-free errors.
> +
> +KFENCE is designed to be enabled in production kernels, and has near zero
> +performance overhead. Compared to KASAN, KFENCE trades performance for
> +precision. The main motivation behind KFENCE's design, is that with enough
> +total uptime KFENCE will detect bugs in code paths not typically exercised by
> +non-production test workloads. One way to quickly achieve a large enough total
> +uptime is when the tool is deployed across a large fleet of machines.
> +
> +Usage
> +-----
> +
> +To enable KFENCE, configure the kernel with::
> +
> +    CONFIG_KFENCE=y
> +
> +KFENCE provides several other configuration options to customize behaviour (see
> +the respective help text in ``lib/Kconfig.kfence`` for more info).
> +
> +Tuning performance
> +~~~~~~~~~~~~~~~~~~
> +
> +The most important parameter is KFENCE's sample interval, which can be set via
> +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> +sample interval determines the frequency with which heap allocations will be
> +guarded by KFENCE. The default is configurable via the Kconfig option
> +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> +disables KFENCE.
> +
> +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> +of available guarded objects can be controlled. Each object requires 2 pages,
> +one for the object itself and the other one used as a guard page; object pages
> +are interleaved with guard pages, and every object page is therefore surrounded
> +by two guard pages.
> +
> +The total memory dedicated to the KFENCE memory pool can be computed as::
> +
> +    ( #objects + 1 ) * 2 * PAGE_SIZE
> +
> +Using the default config, and assuming a page size of 4 KiB, results in
> +dedicating 2 MiB to the KFENCE memory pool.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical out-of-bounds access looks like this::
> +
> +    ==================================================================
> +    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
> +
> +    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
> +     test_out_of_bounds_read+0xa3/0x22b
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:

Does the user need to know that this is object #17? This doesn't seem
like something that can be useful for anything.

> +     __kfence_alloc+0x42d/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_out_of_bounds_read+0x98/0x22b
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the function involved in
> +the access. It is followed by more detailed information about the access and
> +its origin.
> +
> +Use-after-free accesses are reported as::
> +
> +    ==================================================================
> +    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
> +
> +    Use-after-free access at 0xffffffffb673dfe0:
> +     test_use_after_free_read+0xb3/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:

Same here.

Also, this says object #24, but the stack trace above doesn't mention
which object it is. Is it the same one?

> +     __kfence_alloc+0x277/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_use_after_free_read+0x76/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +    freed in:
> +     kfence_guarded_free+0x158/0x380
> +     __kfence_free+0x38/0xc0
> +     test_use_after_free_read+0xa8/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +KFENCE also reports on invalid frees, such as double-frees::
> +
> +    ==================================================================
> +    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
> +
> +    Invalid free of 0xffffffffb6741000:
> +     test_double_free+0xdc/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
> +     __kfence_alloc+0x42d/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_double_free+0x76/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +    freed in:
> +     kfence_guarded_free+0x158/0x380
> +     __kfence_free+0x38/0xc0
> +     test_double_free+0xa8/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +KFENCE also uses pattern-based redzones on the other side of an object's guard
> +page, to detect out-of-bounds writes on the unprotected side of the object.
> +These are reported on frees::
> +
> +    ==================================================================
> +    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
> +
> +    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:

It's not really clear what is 0xac here. Value of the corrupted byte?
What does '.' stand for?

Also, if this is to be used in production, printing kernel memory
bytes might lead to info-leaks.

> +     test_kmalloc_aligned_oob_write+0xef/0x184
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
> +     __kfence_alloc+0x277/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_kmalloc_aligned_oob_write+0x57/0x184
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +For such errors, the address where the corruption as well as the corrupt bytes
> +are shown.
> +
> +And finally, KFENCE may also report on invalid accesses to any protected page
> +where it was not possible to determine an associated object, e.g. if adjacent
> +object pages had not yet been allocated::
> +
> +    ==================================================================
> +    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
> +
> +    Invalid access at 0xffffffffb670b00a:
> +     test_invalid_access+0x26/0xe0
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +DebugFS interface
> +~~~~~~~~~~~~~~~~~
> +
> +Some debugging information is exposed via debugfs:
> +
> +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
> +
> +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
> +  allocated via KFENCE, including those already freed but protected.
> +
> +Implementation Details
> +----------------------
> +
> +Guarded allocations are set up based on the sample interval. After expiration
> +of the sample interval, a guarded allocation from the KFENCE object pool is
> +returned to the main allocator (SLAB or SLUB).

Only for freed allocations, right?

> At this point, the timer is
> +reset, and the next allocation is set up after the expiration of the interval.
> +To "gate" a KFENCE allocation through the main allocator's fast-path without
> +overhead, KFENCE relies on static branches via the static keys infrastructure.
> +The static branch is toggled to redirect the allocation to KFENCE.
> +
> +KFENCE objects each reside on a dedicated page, at either the left or right
> +page boundaries selected at random. The pages to the left and right of the
> +object page are "guard pages", whose attributes are changed to a protected
> +state, and cause page faults on any attempted access. Such page faults are then
> +intercepted by KFENCE, which handles the fault gracefully by reporting an
> +out-of-bounds access.

I'd start a new paragraph here:

> The side opposite of an object's guard page is used as a

Not a native speaker, but "The side opposite _to_" sounds better. Or
"The opposite side of".

> +pattern-based redzone, to detect out-of-bounds writes on the unprotected sed of

"sed"?

> +the object on frees (for special alignment and size combinations, both sides of
> +the object are redzoned).
> +
> +KFENCE also uses pattern-based redzones on the other side of an object's guard
> +page, to detect out-of-bounds writes on the unprotected side of the object;
> +these are reported on frees.

Not really clear, what is "other side" and how it's different from the
"opposite side" mentioned above. The figure doesn't really help.

> +
> +The following figure illustrates the page layout::
> +
> +    ---+-----------+-----------+-----------+-----------+-----------+---
> +       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> +       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> +       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> +       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> +       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> +       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> +    ---+-----------+-----------+-----------+-----------+-----------+---
> +
> +Upon deallocation of a KFENCE object, the object's page is again protected and
> +the object is marked as freed. Any further access to the object causes a fault
> +and KFENCE reports a use-after-free access. Freed objects are inserted at the
> +tail of KFENCE's freelist, so that the least recently freed objects are reused
> +first, and the chances of detecting use-after-frees of recently freed objects
> +is increased.

Seems really similar to KASAN's quarantine? Is the implementation much
different?

> +
> +Interface
> +---------
> +
> +The following describes the functions which are used by allocators as well page
> +handling code to set up and deal with KFENCE allocations.
> +
> +.. kernel-doc:: include/linux/kfence.h
> +   :functions: is_kfence_address
> +               kfence_shutdown_cache
> +               kfence_alloc kfence_free
> +               kfence_ksize kfence_object_start
> +               kfence_handle_page_fault
> +
> +Related Tools
> +-------------
> +
> +In userspace, a similar approach is taken by `GWP-ASan
> +<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
> +a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
> +directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
> +similar but non-sampling approach, that also inspired the name "KFENCE", can be
> +found in the userspace `Electric Fence Malloc Debugger
> +<https://linux.die.net/man/3/efence>`_.
> +
> +In the kernel, several tools exist to debug memory access errors, and in
> +particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
> +is more precise, relying on compiler instrumentation, this comes at a
> +performance cost. We want to highlight that KASAN and KFENCE are complementary,
> +with different target environments. For instance, KASAN is the better
> +debugging-aid, where a simple reproducer exists: due to the lower chance to
> +detect the error, it would require more effort using KFENCE to debug.
> +Deployments at scale, however, would benefit from using KFENCE to discover bugs
> +due to code paths not exercised by test cases or fuzzers.
> --
> 2.28.0.526.ge36021eeef-goog
>
Marco Elver Sept. 7, 2020, 4:33 p.m. UTC | #2
On Mon, 7 Sep 2020 at 17:34, Andrey Konovalov <andreyknvl@google.com> wrote:
>
> On Mon, Sep 7, 2020 at 3:41 PM Marco Elver <elver@google.com> wrote:
> >
> > Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
> >
> > Co-developed-by: Alexander Potapenko <glider@google.com>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  Documentation/dev-tools/index.rst  |   1 +
> >  Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
> >  2 files changed, 286 insertions(+)
> >  create mode 100644 Documentation/dev-tools/kfence.rst
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..1b1cf4f5c9d9 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -22,6 +22,7 @@ whole; patches welcome!
> >     ubsan
> >     kmemleak
> >     kcsan
> > +   kfence
> >     gdb-kernel-debugging
> >     kgdb
> >     kselftest
> > diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
> > new file mode 100644
> > index 000000000000..254f4f089104
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kfence.rst
> > @@ -0,0 +1,285 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Kernel Electric-Fence (KFENCE)
> > +==============================
> > +
> > +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
> > +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
> > +invalid-free errors.
> > +
> > +KFENCE is designed to be enabled in production kernels, and has near zero
> > +performance overhead. Compared to KASAN, KFENCE trades performance for
> > +precision. The main motivation behind KFENCE's design, is that with enough
> > +total uptime KFENCE will detect bugs in code paths not typically exercised by
> > +non-production test workloads. One way to quickly achieve a large enough total
> > +uptime is when the tool is deployed across a large fleet of machines.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KFENCE, configure the kernel with::
> > +
> > +    CONFIG_KFENCE=y
> > +
> > +KFENCE provides several other configuration options to customize behaviour (see
> > +the respective help text in ``lib/Kconfig.kfence`` for more info).
> > +
> > +Tuning performance
> > +~~~~~~~~~~~~~~~~~~
> > +
> > +The most important parameter is KFENCE's sample interval, which can be set via
> > +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> > +sample interval determines the frequency with which heap allocations will be
> > +guarded by KFENCE. The default is configurable via the Kconfig option
> > +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> > +disables KFENCE.
> > +
> > +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> > +of available guarded objects can be controlled. Each object requires 2 pages,
> > +one for the object itself and the other one used as a guard page; object pages
> > +are interleaved with guard pages, and every object page is therefore surrounded
> > +by two guard pages.
> > +
> > +The total memory dedicated to the KFENCE memory pool can be computed as::
> > +
> > +    ( #objects + 1 ) * 2 * PAGE_SIZE
> > +
> > +Using the default config, and assuming a page size of 4 KiB, results in
> > +dedicating 2 MiB to the KFENCE memory pool.
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical out-of-bounds access looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
> > +
> > +    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
> > +     test_out_of_bounds_read+0xa3/0x22b
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
>
> Does the user need to know that this is object #17? This doesn't seem
> like something that can be useful for anything.

Some arguments for keeping it:

- We need to write something like "left of <object>". And then we need
to say where <object> is allocated. Giving objects names makes it
easier to understand the link between "left of <object>" and the
stacktrace shown after "<object> allocated in". We could make <object>
just "object", but reading "left/right of object" and then "object
allocated in:" can be a little confusing.

- We can look up the object via its number in the debugfs objects list
(/sys/kernel/debug/kfence/objects). For example, if we see an OOB
access, we can then check the objects file and see if the object is
still allocated or not, or if it has been recycled.

I don't believe it's distracting anyone, and if there is a chance that
keeping this information can help debug a problem, we ought to keep
it.

> > +     __kfence_alloc+0x42d/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_out_of_bounds_read+0x98/0x22b
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the function involved in
> > +the access. It is followed by more detailed information about the access and
> > +its origin.
> > +
> > +Use-after-free accesses are reported as::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
> > +
> > +    Use-after-free access at 0xffffffffb673dfe0:
> > +     test_use_after_free_read+0xb3/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
>
> Same here.
>
> Also, this says object #24, but the stack trace above doesn't mention
> which object it is. Is it the same one?

Right, the above stacktrace should then say "kfence-#24". (But the
address also hints at this.)

> > +     __kfence_alloc+0x277/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_use_after_free_read+0x76/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +    freed in:
> > +     kfence_guarded_free+0x158/0x380
> > +     __kfence_free+0x38/0xc0
> > +     test_use_after_free_read+0xa8/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +KFENCE also reports on invalid frees, such as double-frees::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
> > +
> > +    Invalid free of 0xffffffffb6741000:
> > +     test_double_free+0xdc/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
> > +     __kfence_alloc+0x42d/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_double_free+0x76/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +    freed in:
> > +     kfence_guarded_free+0x158/0x380
> > +     __kfence_free+0x38/0xc0
> > +     test_double_free+0xa8/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +KFENCE also uses pattern-based redzones on the other side of an object's guard
> > +page, to detect out-of-bounds writes on the unprotected side of the object.
> > +These are reported on frees::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
> > +
> > +    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:
>
> It's not really clear what is 0xac here. Value of the corrupted byte?
> What does '.' stand for?

We can probably explain that better below. The values are the corrupt
bytes, the '.' are untouched bytes.

> Also, if this is to be used in production, printing kernel memory
> bytes might lead to info-leaks.

We do not print them if !CONFIG_DEBUG_KERNEL, and instead show '!' for
changed bytes. Maybe we can add this somewhere here as well.

> > +     test_kmalloc_aligned_oob_write+0xef/0x184
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
> > +     __kfence_alloc+0x277/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_kmalloc_aligned_oob_write+0x57/0x184
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +For such errors, the address where the corruption as well as the corrupt bytes
> > +are shown.
> > +
> > +And finally, KFENCE may also report on invalid accesses to any protected page
> > +where it was not possible to determine an associated object, e.g. if adjacent
> > +object pages had not yet been allocated::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
> > +
> > +    Invalid access at 0xffffffffb670b00a:
> > +     test_invalid_access+0x26/0xe0
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +DebugFS interface
> > +~~~~~~~~~~~~~~~~~
> > +
> > +Some debugging information is exposed via debugfs:
> > +
> > +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
> > +
> > +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
> > +  allocated via KFENCE, including those already freed but protected.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +Guarded allocations are set up based on the sample interval. After expiration
> > +of the sample interval, a guarded allocation from the KFENCE object pool is
> > +returned to the main allocator (SLAB or SLUB).
>
> Only for freed allocations, right?

Which "freed allocation"? What this paragraph says is that after the
sample interval elapsed, we'll return a KFENCE allocation on kmalloc.
It doesn't yet talk about freeing.

> > At this point, the timer is
> > +reset, and the next allocation is set up after the expiration of the interval.
> > +To "gate" a KFENCE allocation through the main allocator's fast-path without
> > +overhead, KFENCE relies on static branches via the static keys infrastructure.
> > +The static branch is toggled to redirect the allocation to KFENCE.
> > +
> > +KFENCE objects each reside on a dedicated page, at either the left or right
> > +page boundaries selected at random. The pages to the left and right of the
> > +object page are "guard pages", whose attributes are changed to a protected
> > +state, and cause page faults on any attempted access. Such page faults are then
> > +intercepted by KFENCE, which handles the fault gracefully by reporting an
> > +out-of-bounds access.
>
> I'd start a new paragraph here:
>
> > The side opposite of an object's guard page is used as a
>
> Not a native speaker, but "The side opposite _to_" sounds better. Or
> "The opposite side of".

All are fine. Using "to" indicates direction, which in this case is valid too.

> > +pattern-based redzone, to detect out-of-bounds writes on the unprotected sed of
>
> "sed"?

side

> > +the object on frees (for special alignment and size combinations, both sides of
> > +the object are redzoned).
> > +
> > +KFENCE also uses pattern-based redzones on the other side of an object's guard
> > +page, to detect out-of-bounds writes on the unprotected side of the object;
> > +these are reported on frees.
>
> Not really clear, what is "other side" and how it's different from the
> "opposite side" mentioned above. The figure doesn't really help.

Redzone and guard page sandwich the object. Not sure how I can make it
clearer yet, but I'll try.

> > +
> > +The following figure illustrates the page layout::
> > +
> > +    ---+-----------+-----------+-----------+-----------+-----------+---
> > +       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> > +       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> > +       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> > +       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> > +       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> > +       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> > +    ---+-----------+-----------+-----------+-----------+-----------+---
> > +
> > +Upon deallocation of a KFENCE object, the object's page is again protected and
> > +the object is marked as freed. Any further access to the object causes a fault
> > +and KFENCE reports a use-after-free access. Freed objects are inserted at the
> > +tail of KFENCE's freelist, so that the least recently freed objects are reused
> > +first, and the chances of detecting use-after-frees of recently freed objects
> > +is increased.
>
> Seems really similar to KASAN's quarantine? Is the implementation much
> different?

It's a list, and we just insert at the tail. Why does it matter?



Thanks for the comments. I'll try to fix in v2.

Thanks,
-- Marco
Andrey Konovalov Sept. 7, 2020, 5:55 p.m. UTC | #3
On Mon, Sep 7, 2020 at 6:33 PM Marco Elver <elver@google.com> wrote:
>
> On Mon, 7 Sep 2020 at 17:34, Andrey Konovalov <andreyknvl@google.com> wrote:
> >
> > On Mon, Sep 7, 2020 at 3:41 PM Marco Elver <elver@google.com> wrote:
> > >
> > > Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
> > >
> > > Co-developed-by: Alexander Potapenko <glider@google.com>
> > > Signed-off-by: Alexander Potapenko <glider@google.com>
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > ---
> > >  Documentation/dev-tools/index.rst  |   1 +
> > >  Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
> > >  2 files changed, 286 insertions(+)
> > >  create mode 100644 Documentation/dev-tools/kfence.rst
> > >
> > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > index f7809c7b1ba9..1b1cf4f5c9d9 100644
> > > --- a/Documentation/dev-tools/index.rst
> > > +++ b/Documentation/dev-tools/index.rst
> > > @@ -22,6 +22,7 @@ whole; patches welcome!
> > >     ubsan
> > >     kmemleak
> > >     kcsan
> > > +   kfence
> > >     gdb-kernel-debugging
> > >     kgdb
> > >     kselftest
> > > diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
> > > new file mode 100644
> > > index 000000000000..254f4f089104
> > > --- /dev/null
> > > +++ b/Documentation/dev-tools/kfence.rst
> > > @@ -0,0 +1,285 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +Kernel Electric-Fence (KFENCE)
> > > +==============================
> > > +
> > > +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
> > > +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
> > > +invalid-free errors.
> > > +
> > > +KFENCE is designed to be enabled in production kernels, and has near zero
> > > +performance overhead. Compared to KASAN, KFENCE trades performance for
> > > +precision. The main motivation behind KFENCE's design, is that with enough
> > > +total uptime KFENCE will detect bugs in code paths not typically exercised by
> > > +non-production test workloads. One way to quickly achieve a large enough total
> > > +uptime is when the tool is deployed across a large fleet of machines.
> > > +
> > > +Usage
> > > +-----
> > > +
> > > +To enable KFENCE, configure the kernel with::
> > > +
> > > +    CONFIG_KFENCE=y
> > > +
> > > +KFENCE provides several other configuration options to customize behaviour (see
> > > +the respective help text in ``lib/Kconfig.kfence`` for more info).
> > > +
> > > +Tuning performance
> > > +~~~~~~~~~~~~~~~~~~
> > > +
> > > +The most important parameter is KFENCE's sample interval, which can be set via
> > > +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> > > +sample interval determines the frequency with which heap allocations will be
> > > +guarded by KFENCE. The default is configurable via the Kconfig option
> > > +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> > > +disables KFENCE.
> > > +
> > > +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> > > +of available guarded objects can be controlled. Each object requires 2 pages,
> > > +one for the object itself and the other one used as a guard page; object pages
> > > +are interleaved with guard pages, and every object page is therefore surrounded
> > > +by two guard pages.
> > > +
> > > +The total memory dedicated to the KFENCE memory pool can be computed as::
> > > +
> > > +    ( #objects + 1 ) * 2 * PAGE_SIZE
> > > +
> > > +Using the default config, and assuming a page size of 4 KiB, results in
> > > +dedicating 2 MiB to the KFENCE memory pool.
> > > +
> > > +Error reports
> > > +~~~~~~~~~~~~~
> > > +
> > > +A typical out-of-bounds access looks like this::
> > > +
> > > +    ==================================================================
> > > +    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
> > > +
> > > +    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
> > > +     test_out_of_bounds_read+0xa3/0x22b
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
> >
> > Does the user need to know that this is object #17? This doesn't seem
> > like something that can be useful for anything.
>
> Some arguments for keeping it:
>
> - We need to write something like "left of <object>". And then we need
> to say where <object> is allocated. Giving objects names makes it
> easier to understand the link between "left of <object>" and the
> stacktrace shown after "<object> allocated in". We could make <object>
> just "object", but reading "left/right of object" and then "object
> allocated in:" can be a little confusing.
>
> - We can look up the object via its number in the debugfs objects list
> (/sys/kernel/debug/kfence/objects). For example, if we see an OOB
> access, we can then check the objects file and see if the object is
> still allocated or not, or if it has been recycled.
>
> I don't believe it's distracting anyone, and if there is a chance that
> keeping this information can help debug a problem, we ought to keep
> it.
>
> > > +     __kfence_alloc+0x42d/0x4c0
> > > +     __kmalloc+0x133/0x200
> > > +     test_alloc+0xf3/0x25b
> > > +     test_out_of_bounds_read+0x98/0x22b
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
> > > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > +    ==================================================================
> > > +
> > > +The header of the report provides a short summary of the function involved in
> > > +the access. It is followed by more detailed information about the access and
> > > +its origin.
> > > +
> > > +Use-after-free accesses are reported as::
> > > +
> > > +    ==================================================================
> > > +    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
> > > +
> > > +    Use-after-free access at 0xffffffffb673dfe0:
> > > +     test_use_after_free_read+0xb3/0x143
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
> >
> > Same here.
> >
> > Also, this says object #24, but the stack trace above doesn't mention
> > which object it is. Is it the same one?
>
> Right, the above stacktrace should then say "kfence-#24". (But the
> address also hints at this.)
>
> > > +     __kfence_alloc+0x277/0x4c0
> > > +     __kmalloc+0x133/0x200
> > > +     test_alloc+0xf3/0x25b
> > > +     test_use_after_free_read+0x76/0x143
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +    freed in:
> > > +     kfence_guarded_free+0x158/0x380
> > > +     __kfence_free+0x38/0xc0
> > > +     test_use_after_free_read+0xa8/0x143
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > +    ==================================================================
> > > +
> > > +KFENCE also reports on invalid frees, such as double-frees::
> > > +
> > > +    ==================================================================
> > > +    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
> > > +
> > > +    Invalid free of 0xffffffffb6741000:
> > > +     test_double_free+0xdc/0x171
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
> > > +     __kfence_alloc+0x42d/0x4c0
> > > +     __kmalloc+0x133/0x200
> > > +     test_alloc+0xf3/0x25b
> > > +     test_double_free+0x76/0x171
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +    freed in:
> > > +     kfence_guarded_free+0x158/0x380
> > > +     __kfence_free+0x38/0xc0
> > > +     test_double_free+0xa8/0x171
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > +    ==================================================================
> > > +
> > > +KFENCE also uses pattern-based redzones on the other side of an object's guard
> > > +page, to detect out-of-bounds writes on the unprotected side of the object.
> > > +These are reported on frees::
> > > +
> > > +    ==================================================================
> > > +    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
> > > +
> > > +    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:
> >
> > It's not really clear what is 0xac here. Value of the corrupted byte?
> > What does '.' stand for?
>
> We can probably explain that better below. The values are the corrupt
> bytes, the '.' are untouched bytes.
>
> > Also, if this is to be used in production, printing kernel memory
> > bytes might lead to info-leaks.
>
> We do not print them if !CONFIG_DEBUG_KERNEL, and instead show '!' for
> changed bytes. Maybe we can add this somewhere here as well.
>
> > > +     test_kmalloc_aligned_oob_write+0xef/0x184
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
> > > +     __kfence_alloc+0x277/0x4c0
> > > +     __kmalloc+0x133/0x200
> > > +     test_alloc+0xf3/0x25b
> > > +     test_kmalloc_aligned_oob_write+0x57/0x184
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > +    ==================================================================
> > > +
> > > +For such errors, the address where the corruption as well as the corrupt bytes
> > > +are shown.
> > > +
> > > +And finally, KFENCE may also report on invalid accesses to any protected page
> > > +where it was not possible to determine an associated object, e.g. if adjacent
> > > +object pages had not yet been allocated::
> > > +
> > > +    ==================================================================
> > > +    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
> > > +
> > > +    Invalid access at 0xffffffffb670b00a:
> > > +     test_invalid_access+0x26/0xe0
> > > +     kunit_try_run_case+0x51/0x85
> > > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > > +     kthread+0x137/0x160
> > > +     ret_from_fork+0x22/0x30
> > > +
> > > +    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > > +    ==================================================================
> > > +
> > > +DebugFS interface
> > > +~~~~~~~~~~~~~~~~~
> > > +
> > > +Some debugging information is exposed via debugfs:
> > > +
> > > +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
> > > +
> > > +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
> > > +  allocated via KFENCE, including those already freed but protected.
> > > +
> > > +Implementation Details
> > > +----------------------
> > > +
> > > +Guarded allocations are set up based on the sample interval. After expiration
> > > +of the sample interval, a guarded allocation from the KFENCE object pool is
> > > +returned to the main allocator (SLAB or SLUB).
> >
> > Only for freed allocations, right?
>
> Which "freed allocation"? What this paragraph says is that after the
> sample interval elapsed, we'll return a KFENCE allocation on kmalloc.
> It doesn't yet talk about freeing.

It says that an allocation is returned to the main allocator, and this
is what is usually described with the word "freed". Do you mean
something else here?

> > > At this point, the timer is
> > > +reset, and the next allocation is set up after the expiration of the interval.
> > > +To "gate" a KFENCE allocation through the main allocator's fast-path without
> > > +overhead, KFENCE relies on static branches via the static keys infrastructure.
> > > +The static branch is toggled to redirect the allocation to KFENCE.
> > > +
> > > +KFENCE objects each reside on a dedicated page, at either the left or right
> > > +page boundaries selected at random. The pages to the left and right of the
> > > +object page are "guard pages", whose attributes are changed to a protected
> > > +state, and cause page faults on any attempted access. Such page faults are then
> > > +intercepted by KFENCE, which handles the fault gracefully by reporting an
> > > +out-of-bounds access.
> >
> > I'd start a new paragraph here:
> >
> > > The side opposite of an object's guard page is used as a
> >
> > Not a native speaker, but "The side opposite _to_" sounds better. Or
> > "The opposite side of".
>
> All are fine. Using "to" indicates direction, which in this case is valid too.
>
> > > +pattern-based redzone, to detect out-of-bounds writes on the unprotected sed of
> >
> > "sed"?
>
> side
>
> > > +the object on frees (for special alignment and size combinations, both sides of
> > > +the object are redzoned).
> > > +
> > > +KFENCE also uses pattern-based redzones on the other side of an object's guard
> > > +page, to detect out-of-bounds writes on the unprotected side of the object;
> > > +these are reported on frees.
> >
> > Not really clear, what is "other side" and how it's different from the
> > "opposite side" mentioned above. The figure doesn't really help.
>
> Redzone and guard page sandwich the object. Not sure how I can make it
> clearer yet, but I'll try.
>
> > > +
> > > +The following figure illustrates the page layout::
> > > +
> > > +    ---+-----------+-----------+-----------+-----------+-----------+---
> > > +       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> > > +       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> > > +       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> > > +       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> > > +       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> > > +       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> > > +    ---+-----------+-----------+-----------+-----------+-----------+---
> > > +
> > > +Upon deallocation of a KFENCE object, the object's page is again protected and
> > > +the object is marked as freed. Any further access to the object causes a fault
> > > +and KFENCE reports a use-after-free access. Freed objects are inserted at the
> > > +tail of KFENCE's freelist, so that the least recently freed objects are reused
> > > +first, and the chances of detecting use-after-frees of recently freed objects
> > > +is increased.
> >
> > Seems really similar to KASAN's quarantine? Is the implementation much
> > different?
>
> It's a list, and we just insert at the tail. Why does it matter?

If the implementation is similar, we can then reuse quarantine. But I
guess it's not.
Marco Elver Sept. 7, 2020, 6:16 p.m. UTC | #4
On Mon, 7 Sep 2020 at 19:55, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Mon, Sep 7, 2020 at 6:33 PM Marco Elver <elver@google.com> wrote:
[...]
> > > > +Guarded allocations are set up based on the sample interval. After expiration
> > > > +of the sample interval, a guarded allocation from the KFENCE object pool is
> > > > +returned to the main allocator (SLAB or SLUB).
> > >
> > > Only for freed allocations, right?
> >
> > Which "freed allocation"? What this paragraph says is that after the
> > sample interval elapsed, we'll return a KFENCE allocation on kmalloc.
> > It doesn't yet talk about freeing.
>
> It says that an allocation is returned to the main allocator, and this
> is what is usually described with the word "freed". Do you mean
> something else here?

Ah, I see what's goin on. So the "returned to the main allocator" is
ambiguous here. I meant to say "returned" as in kfence gives sl[au]b a
kfence object to return for the next kmalloc. I'll reword this as it
seems the phrase is overloaded in this context already.

[...]
> > > > +Upon deallocation of a KFENCE object, the object's page is again protected and
> > > > +the object is marked as freed. Any further access to the object causes a fault
> > > > +and KFENCE reports a use-after-free access. Freed objects are inserted at the
> > > > +tail of KFENCE's freelist, so that the least recently freed objects are reused
> > > > +first, and the chances of detecting use-after-frees of recently freed objects
> > > > +is increased.
> > >
> > > Seems really similar to KASAN's quarantine? Is the implementation much
> > > different?
> >
> > It's a list, and we just insert at the tail. Why does it matter?
>
> If the implementation is similar, we can then reuse quarantine. But I
> guess it's not.

The concept is similar, but the implementations are very different.
Both use a list (although KASAN quarantine seems to reimplement its
own singly-linked list). We just rely on a standard doubly-linked
list, without any of the delayed freeing logic of the KASAN quarantine
as KFENCE objects just change state to "freed" until they're reused
(freed kfence objects are just inserted at the tail, and the next
object to be used for an allocation is at the head).

Thanks,
-- Marco
Dave Hansen Sept. 8, 2020, 3:54 p.m. UTC | #5
On 9/7/20 6:40 AM, Marco Elver wrote:
> +The most important parameter is KFENCE's sample interval, which can be set via
> +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> +sample interval determines the frequency with which heap allocations will be
> +guarded by KFENCE. The default is configurable via the Kconfig option
> +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> +disables KFENCE.
> +
> +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> +of available guarded objects can be controlled. Each object requires 2 pages,
> +one for the object itself and the other one used as a guard page; object pages
> +are interleaved with guard pages, and every object page is therefore surrounded
> +by two guard pages.

Is it hard to make these both tunable at runtime?

It would be nice if I hit a KFENCE error on a system to bump up the
number of objects and turn up the frequency of guarded objects to try to
hit it again.  That would be a really nice feature for development
environments.

It would also be nice to have a counter somewhere (/proc/vmstat?) to
explicitly say how many pages are currently being used.

I didn't mention it elsewhere, but this work looks really nice.  It has
very little impact on the core kernel and looks like a very nice tool to
have in the toolbox.  I don't see any major reasons we wouldn't want to
merge after our typical bikeshedding. :)
Marco Elver Sept. 8, 2020, 4:14 p.m. UTC | #6
On Tue, Sep 08, 2020 at 08:54AM -0700, Dave Hansen wrote:
> On 9/7/20 6:40 AM, Marco Elver wrote:
> > +The most important parameter is KFENCE's sample interval, which can be set via
> > +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> > +sample interval determines the frequency with which heap allocations will be
> > +guarded by KFENCE. The default is configurable via the Kconfig option
> > +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> > +disables KFENCE.
> > +
> > +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> > +of available guarded objects can be controlled. Each object requires 2 pages,
> > +one for the object itself and the other one used as a guard page; object pages
> > +are interleaved with guard pages, and every object page is therefore surrounded
> > +by two guard pages.
> 
> Is it hard to make these both tunable at runtime?

The number of objects is quite hard, because it really complicates
bookkeeping and might also have an impact on performance, which is why
we prefer the statically allocated pool (like on x86, and we're trying
to get it for arm64 as well).

The sample interval is already tunable, just write to
/sys/module/kfence/parameters/sample_interval. Although we have this
(see core.c):

	module_param_named(sample_interval, kfence_sample_interval, ulong,
			   IS_ENABLED(CONFIG_DEBUG_KERNEL) ? 0600 : 0400);

I was wondering if it should also be tweakable on non-debug kernels, but
I fear it might be abused. Sure, you need to be root to change it, but
maybe I'm being overly conservative here? If you don't see huge problems
with it we could just make it 0600 for all builds.

> It would be nice if I hit a KFENCE error on a system to bump up the
> number of objects and turn up the frequency of guarded objects to try to
> hit it again.  That would be a really nice feature for development
> environments.

Indeed, which is why we also found it might be useful to tweak
sample_interval at runtime for debug-kernels. Although I don't know how
much luck you'll have hitting it again.

My strategy at that point would be to take the stack traces, try to
construct test-cases for those code paths, and run them through KASAN
(if it isn't immediately obvious what the problem is).

> It would also be nice to have a counter somewhere (/proc/vmstat?) to
> explicitly say how many pages are currently being used.

You can check /sys/kernel/debug/kfence/stats. On a system I just booted:

	[root@syzkaller][~]# cat /sys/kernel/debug/kfence/stats
	enabled: 1
	currently allocated: 18
	total allocations: 105
	total frees: 87
	total bugs: 0

The "currently allocated" count is the currently used KFENCE objects (of
255 for the default config).

> I didn't mention it elsewhere, but this work looks really nice.  It has
> very little impact on the core kernel and looks like a very nice tool to
> have in the toolbox.  I don't see any major reasons we wouldn't want to
> merge after our typical bikeshedding. :)

Thank you!

-- Marco
Dmitry Vyukov Sept. 11, 2020, 7:14 a.m. UTC | #7
On Mon, Sep 7, 2020 at 3:41 PM Marco Elver <elver@google.com> wrote:
>
> Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
>
> Co-developed-by: Alexander Potapenko <glider@google.com>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  Documentation/dev-tools/index.rst  |   1 +
>  Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/dev-tools/kfence.rst
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..1b1cf4f5c9d9 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -22,6 +22,7 @@ whole; patches welcome!
>     ubsan
>     kmemleak
>     kcsan
> +   kfence
>     gdb-kernel-debugging
>     kgdb
>     kselftest
> diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
> new file mode 100644
> index 000000000000..254f4f089104
> --- /dev/null
> +++ b/Documentation/dev-tools/kfence.rst
> @@ -0,0 +1,285 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Kernel Electric-Fence (KFENCE)
> +==============================
> +
> +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
> +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
> +invalid-free errors.
> +
> +KFENCE is designed to be enabled in production kernels, and has near zero
> +performance overhead. Compared to KASAN, KFENCE trades performance for
> +precision. The main motivation behind KFENCE's design, is that with enough
> +total uptime KFENCE will detect bugs in code paths not typically exercised by
> +non-production test workloads. One way to quickly achieve a large enough total
> +uptime is when the tool is deployed across a large fleet of machines.
> +
> +Usage
> +-----
> +
> +To enable KFENCE, configure the kernel with::
> +
> +    CONFIG_KFENCE=y
> +
> +KFENCE provides several other configuration options to customize behaviour (see
> +the respective help text in ``lib/Kconfig.kfence`` for more info).
> +
> +Tuning performance
> +~~~~~~~~~~~~~~~~~~
> +
> +The most important parameter is KFENCE's sample interval, which can be set via
> +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> +sample interval determines the frequency with which heap allocations will be
> +guarded by KFENCE. The default is configurable via the Kconfig option
> +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> +disables KFENCE.
> +
> +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> +of available guarded objects can be controlled. Each object requires 2 pages,
> +one for the object itself and the other one used as a guard page; object pages
> +are interleaved with guard pages, and every object page is therefore surrounded
> +by two guard pages.
> +
> +The total memory dedicated to the KFENCE memory pool can be computed as::
> +
> +    ( #objects + 1 ) * 2 * PAGE_SIZE
> +
> +Using the default config, and assuming a page size of 4 KiB, results in
> +dedicating 2 MiB to the KFENCE memory pool.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical out-of-bounds access looks like this::
> +
> +    ==================================================================
> +    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
> +
> +    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
> +     test_out_of_bounds_read+0xa3/0x22b
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
> +     __kfence_alloc+0x42d/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_out_of_bounds_read+0x98/0x22b
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the function involved in
> +the access. It is followed by more detailed information about the access and
> +its origin.
> +
> +Use-after-free accesses are reported as::
> +
> +    ==================================================================
> +    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
> +
> +    Use-after-free access at 0xffffffffb673dfe0:
> +     test_use_after_free_read+0xb3/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
> +     __kfence_alloc+0x277/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_use_after_free_read+0x76/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30

Empty line between stacks for consistency and readability.

> +    freed in:
> +     kfence_guarded_free+0x158/0x380
> +     __kfence_free+0x38/0xc0
> +     test_use_after_free_read+0xa8/0x143
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +KFENCE also reports on invalid frees, such as double-frees::
> +
> +    ==================================================================
> +    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
> +
> +    Invalid free of 0xffffffffb6741000:
> +     test_double_free+0xdc/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
> +     __kfence_alloc+0x42d/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_double_free+0x76/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +    freed in:
> +     kfence_guarded_free+0x158/0x380
> +     __kfence_free+0x38/0xc0
> +     test_double_free+0xa8/0x171
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +KFENCE also uses pattern-based redzones on the other side of an object's guard
> +page, to detect out-of-bounds writes on the unprotected side of the object.
> +These are reported on frees::
> +
> +    ==================================================================
> +    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
> +
> +    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:
> +     test_kmalloc_aligned_oob_write+0xef/0x184
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
> +     __kfence_alloc+0x277/0x4c0
> +     __kmalloc+0x133/0x200
> +     test_alloc+0xf3/0x25b
> +     test_kmalloc_aligned_oob_write+0x57/0x184
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +For such errors, the address where the corruption as well as the corrupt bytes
> +are shown.
> +
> +And finally, KFENCE may also report on invalid accesses to any protected page
> +where it was not possible to determine an associated object, e.g. if adjacent
> +object pages had not yet been allocated::
> +
> +    ==================================================================
> +    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
> +
> +    Invalid access at 0xffffffffb670b00a:
> +     test_invalid_access+0x26/0xe0
> +     kunit_try_run_case+0x51/0x85
> +     kunit_generic_run_threadfn_adapter+0x16/0x30
> +     kthread+0x137/0x160
> +     ret_from_fork+0x22/0x30
> +
> +    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> +    ==================================================================
> +
> +DebugFS interface
> +~~~~~~~~~~~~~~~~~
> +
> +Some debugging information is exposed via debugfs:
> +
> +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
> +
> +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
> +  allocated via KFENCE, including those already freed but protected.
> +
> +Implementation Details
> +----------------------
> +
> +Guarded allocations are set up based on the sample interval. After expiration
> +of the sample interval, a guarded allocation from the KFENCE object pool is
> +returned to the main allocator (SLAB or SLUB). At this point, the timer is
> +reset, and the next allocation is set up after the expiration of the interval.
> +To "gate" a KFENCE allocation through the main allocator's fast-path without
> +overhead, KFENCE relies on static branches via the static keys infrastructure.
> +The static branch is toggled to redirect the allocation to KFENCE.
> +
> +KFENCE objects each reside on a dedicated page, at either the left or right

Do we mention anywhere explicitly that KFENCE currently only supports
allocations <=page_size?
May be worth mentioning. It kinda follows from implementation but
quite implicitly. One may also be confused assuming KFENCE handles
larger allocations, but then not being able to figure out.

> +page boundaries selected at random. The pages to the left and right of the
> +object page are "guard pages", whose attributes are changed to a protected
> +state, and cause page faults on any attempted access. Such page faults are then
> +intercepted by KFENCE, which handles the fault gracefully by reporting an
> +out-of-bounds access. The side opposite of an object's guard page is used as a
> +pattern-based redzone, to detect out-of-bounds writes on the unprotected sed of
> +the object on frees (for special alignment and size combinations, both sides of
> +the object are redzoned).
> +
> +KFENCE also uses pattern-based redzones on the other side of an object's guard
> +page, to detect out-of-bounds writes on the unprotected side of the object;
> +these are reported on frees.
> +
> +The following figure illustrates the page layout::
> +
> +    ---+-----------+-----------+-----------+-----------+-----------+---
> +       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> +       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> +       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> +       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> +       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> +       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> +    ---+-----------+-----------+-----------+-----------+-----------+---
> +
> +Upon deallocation of a KFENCE object, the object's page is again protected and
> +the object is marked as freed. Any further access to the object causes a fault
> +and KFENCE reports a use-after-free access. Freed objects are inserted at the
> +tail of KFENCE's freelist, so that the least recently freed objects are reused
> +first, and the chances of detecting use-after-frees of recently freed objects
> +is increased.
> +
> +Interface
> +---------
> +
> +The following describes the functions which are used by allocators as well page
> +handling code to set up and deal with KFENCE allocations.
> +
> +.. kernel-doc:: include/linux/kfence.h
> +   :functions: is_kfence_address
> +               kfence_shutdown_cache
> +               kfence_alloc kfence_free
> +               kfence_ksize kfence_object_start
> +               kfence_handle_page_fault
> +
> +Related Tools
> +-------------
> +
> +In userspace, a similar approach is taken by `GWP-ASan
> +<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
> +a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
> +directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
> +similar but non-sampling approach, that also inspired the name "KFENCE", can be
> +found in the userspace `Electric Fence Malloc Debugger
> +<https://linux.die.net/man/3/efence>`_.
> +
> +In the kernel, several tools exist to debug memory access errors, and in
> +particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
> +is more precise, relying on compiler instrumentation, this comes at a
> +performance cost. We want to highlight that KASAN and KFENCE are complementary,
> +with different target environments. For instance, KASAN is the better
> +debugging-aid, where a simple reproducer exists: due to the lower chance to
> +detect the error, it would require more effort using KFENCE to debug.
> +Deployments at scale, however, would benefit from using KFENCE to discover bugs
> +due to code paths not exercised by test cases or fuzzers.
> --
> 2.28.0.526.ge36021eeef-goog
>
Marco Elver Sept. 11, 2020, 7:46 a.m. UTC | #8
On Fri, 11 Sep 2020 at 09:14, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Mon, Sep 7, 2020 at 3:41 PM Marco Elver <elver@google.com> wrote:
> >
> > Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
> >
> > Co-developed-by: Alexander Potapenko <glider@google.com>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  Documentation/dev-tools/index.rst  |   1 +
> >  Documentation/dev-tools/kfence.rst | 285 +++++++++++++++++++++++++++++
> >  2 files changed, 286 insertions(+)
> >  create mode 100644 Documentation/dev-tools/kfence.rst
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..1b1cf4f5c9d9 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -22,6 +22,7 @@ whole; patches welcome!
> >     ubsan
> >     kmemleak
> >     kcsan
> > +   kfence
> >     gdb-kernel-debugging
> >     kgdb
> >     kselftest
> > diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
> > new file mode 100644
> > index 000000000000..254f4f089104
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kfence.rst
> > @@ -0,0 +1,285 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Kernel Electric-Fence (KFENCE)
> > +==============================
> > +
> > +Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
> > +error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
> > +invalid-free errors.
> > +
> > +KFENCE is designed to be enabled in production kernels, and has near zero
> > +performance overhead. Compared to KASAN, KFENCE trades performance for
> > +precision. The main motivation behind KFENCE's design, is that with enough
> > +total uptime KFENCE will detect bugs in code paths not typically exercised by
> > +non-production test workloads. One way to quickly achieve a large enough total
> > +uptime is when the tool is deployed across a large fleet of machines.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KFENCE, configure the kernel with::
> > +
> > +    CONFIG_KFENCE=y
> > +
> > +KFENCE provides several other configuration options to customize behaviour (see
> > +the respective help text in ``lib/Kconfig.kfence`` for more info).
> > +
> > +Tuning performance
> > +~~~~~~~~~~~~~~~~~~
> > +
> > +The most important parameter is KFENCE's sample interval, which can be set via
> > +the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
> > +sample interval determines the frequency with which heap allocations will be
> > +guarded by KFENCE. The default is configurable via the Kconfig option
> > +``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
> > +disables KFENCE.
> > +
> > +With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
> > +of available guarded objects can be controlled. Each object requires 2 pages,
> > +one for the object itself and the other one used as a guard page; object pages
> > +are interleaved with guard pages, and every object page is therefore surrounded
> > +by two guard pages.
> > +
> > +The total memory dedicated to the KFENCE memory pool can be computed as::
> > +
> > +    ( #objects + 1 ) * 2 * PAGE_SIZE
> > +
> > +Using the default config, and assuming a page size of 4 KiB, results in
> > +dedicating 2 MiB to the KFENCE memory pool.
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical out-of-bounds access looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
> > +
> > +    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
> > +     test_out_of_bounds_read+0xa3/0x22b
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
> > +     __kfence_alloc+0x42d/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_out_of_bounds_read+0x98/0x22b
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the function involved in
> > +the access. It is followed by more detailed information about the access and
> > +its origin.
> > +
> > +Use-after-free accesses are reported as::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
> > +
> > +    Use-after-free access at 0xffffffffb673dfe0:
> > +     test_use_after_free_read+0xb3/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
> > +     __kfence_alloc+0x277/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_use_after_free_read+0x76/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
>
> Empty line between stacks for consistency and readability.

Done for v2.

> > +    freed in:
> > +     kfence_guarded_free+0x158/0x380
> > +     __kfence_free+0x38/0xc0
> > +     test_use_after_free_read+0xa8/0x143
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +KFENCE also reports on invalid frees, such as double-frees::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
> > +
> > +    Invalid free of 0xffffffffb6741000:
> > +     test_double_free+0xdc/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
> > +     __kfence_alloc+0x42d/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_double_free+0x76/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +    freed in:
> > +     kfence_guarded_free+0x158/0x380
> > +     __kfence_free+0x38/0xc0
> > +     test_double_free+0xa8/0x171
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +KFENCE also uses pattern-based redzones on the other side of an object's guard
> > +page, to detect out-of-bounds writes on the unprotected side of the object.
> > +These are reported on frees::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
> > +
> > +    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:
> > +     test_kmalloc_aligned_oob_write+0xef/0x184
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
> > +     __kfence_alloc+0x277/0x4c0
> > +     __kmalloc+0x133/0x200
> > +     test_alloc+0xf3/0x25b
> > +     test_kmalloc_aligned_oob_write+0x57/0x184
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +For such errors, the address where the corruption as well as the corrupt bytes
> > +are shown.
> > +
> > +And finally, KFENCE may also report on invalid accesses to any protected page
> > +where it was not possible to determine an associated object, e.g. if adjacent
> > +object pages had not yet been allocated::
> > +
> > +    ==================================================================
> > +    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
> > +
> > +    Invalid access at 0xffffffffb670b00a:
> > +     test_invalid_access+0x26/0xe0
> > +     kunit_try_run_case+0x51/0x85
> > +     kunit_generic_run_threadfn_adapter+0x16/0x30
> > +     kthread+0x137/0x160
> > +     ret_from_fork+0x22/0x30
> > +
> > +    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +DebugFS interface
> > +~~~~~~~~~~~~~~~~~
> > +
> > +Some debugging information is exposed via debugfs:
> > +
> > +* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
> > +
> > +* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
> > +  allocated via KFENCE, including those already freed but protected.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +Guarded allocations are set up based on the sample interval. After expiration
> > +of the sample interval, a guarded allocation from the KFENCE object pool is
> > +returned to the main allocator (SLAB or SLUB). At this point, the timer is
> > +reset, and the next allocation is set up after the expiration of the interval.
> > +To "gate" a KFENCE allocation through the main allocator's fast-path without
> > +overhead, KFENCE relies on static branches via the static keys infrastructure.
> > +The static branch is toggled to redirect the allocation to KFENCE.
> > +
> > +KFENCE objects each reside on a dedicated page, at either the left or right
>
> Do we mention anywhere explicitly that KFENCE currently only supports
> allocations <=page_size?
> May be worth mentioning. It kinda follows from implementation but
> quite implicitly. One may also be confused assuming KFENCE handles
> larger allocations, but then not being able to figure out.

We can add a note that "allocation sizes up to PAGE_SIZE are supported".

Thanks,
-- Marco
diff mbox series

Patch

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..1b1cf4f5c9d9 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -22,6 +22,7 @@  whole; patches welcome!
    ubsan
    kmemleak
    kcsan
+   kfence
    gdb-kernel-debugging
    kgdb
    kselftest
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
new file mode 100644
index 000000000000..254f4f089104
--- /dev/null
+++ b/Documentation/dev-tools/kfence.rst
@@ -0,0 +1,285 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel Electric-Fence (KFENCE)
+==============================
+
+Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
+error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
+invalid-free errors.
+
+KFENCE is designed to be enabled in production kernels, and has near zero
+performance overhead. Compared to KASAN, KFENCE trades performance for
+precision. The main motivation behind KFENCE's design, is that with enough
+total uptime KFENCE will detect bugs in code paths not typically exercised by
+non-production test workloads. One way to quickly achieve a large enough total
+uptime is when the tool is deployed across a large fleet of machines.
+
+Usage
+-----
+
+To enable KFENCE, configure the kernel with::
+
+    CONFIG_KFENCE=y
+
+KFENCE provides several other configuration options to customize behaviour (see
+the respective help text in ``lib/Kconfig.kfence`` for more info).
+
+Tuning performance
+~~~~~~~~~~~~~~~~~~
+
+The most important parameter is KFENCE's sample interval, which can be set via
+the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
+sample interval determines the frequency with which heap allocations will be
+guarded by KFENCE. The default is configurable via the Kconfig option
+``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
+disables KFENCE.
+
+With the Kconfig option ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number
+of available guarded objects can be controlled. Each object requires 2 pages,
+one for the object itself and the other one used as a guard page; object pages
+are interleaved with guard pages, and every object page is therefore surrounded
+by two guard pages.
+
+The total memory dedicated to the KFENCE memory pool can be computed as::
+
+    ( #objects + 1 ) * 2 * PAGE_SIZE
+
+Using the default config, and assuming a page size of 4 KiB, results in
+dedicating 2 MiB to the KFENCE memory pool.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical out-of-bounds access looks like this::
+
+    ==================================================================
+    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
+
+    Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
+     test_out_of_bounds_read+0xa3/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
+     __kfence_alloc+0x42d/0x4c0
+     __kmalloc+0x133/0x200
+     test_alloc+0xf3/0x25b
+     test_out_of_bounds_read+0x98/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+The header of the report provides a short summary of the function involved in
+the access. It is followed by more detailed information about the access and
+its origin.
+
+Use-after-free accesses are reported as::
+
+    ==================================================================
+    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
+
+    Use-after-free access at 0xffffffffb673dfe0:
+     test_use_after_free_read+0xb3/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
+     __kfence_alloc+0x277/0x4c0
+     __kmalloc+0x133/0x200
+     test_alloc+0xf3/0x25b
+     test_use_after_free_read+0x76/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+    freed in:
+     kfence_guarded_free+0x158/0x380
+     __kfence_free+0x38/0xc0
+     test_use_after_free_read+0xa8/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+KFENCE also reports on invalid frees, such as double-frees::
+
+    ==================================================================
+    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
+
+    Invalid free of 0xffffffffb6741000:
+     test_double_free+0xdc/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
+     __kfence_alloc+0x42d/0x4c0
+     __kmalloc+0x133/0x200
+     test_alloc+0xf3/0x25b
+     test_double_free+0x76/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+    freed in:
+     kfence_guarded_free+0x158/0x380
+     __kfence_free+0x38/0xc0
+     test_double_free+0xa8/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+KFENCE also uses pattern-based redzones on the other side of an object's guard
+page, to detect out-of-bounds writes on the unprotected side of the object.
+These are reported on frees::
+
+    ==================================================================
+    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
+
+    Detected corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ]:
+     test_kmalloc_aligned_oob_write+0xef/0x184
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
+     __kfence_alloc+0x277/0x4c0
+     __kmalloc+0x133/0x200
+     test_alloc+0xf3/0x25b
+     test_kmalloc_aligned_oob_write+0x57/0x184
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+For such errors, the address where the corruption as well as the corrupt bytes
+are shown.
+
+And finally, KFENCE may also report on invalid accesses to any protected page
+where it was not possible to determine an associated object, e.g. if adjacent
+object pages had not yet been allocated::
+
+    ==================================================================
+    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
+
+    Invalid access at 0xffffffffb670b00a:
+     test_invalid_access+0x26/0xe0
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+DebugFS interface
+~~~~~~~~~~~~~~~~~
+
+Some debugging information is exposed via debugfs:
+
+* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
+
+* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
+  allocated via KFENCE, including those already freed but protected.
+
+Implementation Details
+----------------------
+
+Guarded allocations are set up based on the sample interval. After expiration
+of the sample interval, a guarded allocation from the KFENCE object pool is
+returned to the main allocator (SLAB or SLUB). At this point, the timer is
+reset, and the next allocation is set up after the expiration of the interval.
+To "gate" a KFENCE allocation through the main allocator's fast-path without
+overhead, KFENCE relies on static branches via the static keys infrastructure.
+The static branch is toggled to redirect the allocation to KFENCE.
+
+KFENCE objects each reside on a dedicated page, at either the left or right
+page boundaries selected at random. The pages to the left and right of the
+object page are "guard pages", whose attributes are changed to a protected
+state, and cause page faults on any attempted access. Such page faults are then
+intercepted by KFENCE, which handles the fault gracefully by reporting an
+out-of-bounds access. The side opposite of an object's guard page is used as a
+pattern-based redzone, to detect out-of-bounds writes on the unprotected sed of
+the object on frees (for special alignment and size combinations, both sides of
+the object are redzoned).
+
+KFENCE also uses pattern-based redzones on the other side of an object's guard
+page, to detect out-of-bounds writes on the unprotected side of the object;
+these are reported on frees.
+
+The following figure illustrates the page layout::
+
+    ---+-----------+-----------+-----------+-----------+-----------+---
+       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
+       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
+       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
+       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
+       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
+       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
+    ---+-----------+-----------+-----------+-----------+-----------+---
+
+Upon deallocation of a KFENCE object, the object's page is again protected and
+the object is marked as freed. Any further access to the object causes a fault
+and KFENCE reports a use-after-free access. Freed objects are inserted at the
+tail of KFENCE's freelist, so that the least recently freed objects are reused
+first, and the chances of detecting use-after-frees of recently freed objects
+is increased.
+
+Interface
+---------
+
+The following describes the functions which are used by allocators as well page
+handling code to set up and deal with KFENCE allocations.
+
+.. kernel-doc:: include/linux/kfence.h
+   :functions: is_kfence_address
+               kfence_shutdown_cache
+               kfence_alloc kfence_free
+               kfence_ksize kfence_object_start
+               kfence_handle_page_fault
+
+Related Tools
+-------------
+
+In userspace, a similar approach is taken by `GWP-ASan
+<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
+a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
+directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
+similar but non-sampling approach, that also inspired the name "KFENCE", can be
+found in the userspace `Electric Fence Malloc Debugger
+<https://linux.die.net/man/3/efence>`_.
+
+In the kernel, several tools exist to debug memory access errors, and in
+particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
+is more precise, relying on compiler instrumentation, this comes at a
+performance cost. We want to highlight that KASAN and KFENCE are complementary,
+with different target environments. For instance, KASAN is the better
+debugging-aid, where a simple reproducer exists: due to the lower chance to
+detect the error, it would require more effort using KFENCE to debug.
+Deployments at scale, however, would benefit from using KFENCE to discover bugs
+due to code paths not exercised by test cases or fuzzers.