[RFC,v3,05/36] kmsan: add ReST documentation


Message ID

20191122112621.204798-6-glider@google.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BDE9320674
Date: Fri, 22 Nov 2019 12:25:50 +0100
In-Reply-To: <20191122112621.204798-1-glider@google.com>
Message-Id: <20191122112621.204798-6-glider@google.com>
Mime-Version: 1.0
References: <20191122112621.204798-1-glider@google.com>
Subject: [PATCH RFC v3 05/36] kmsan: add ReST documentation
From: glider@google.com
To: Vegard Nossum <vegard.nossum@oracle.com>,
 Dmitry Vyukov <dvyukov@google.com>, linux-mm@kvack.org
Cc: glider@google.com, viro@zeniv.linux.org.uk, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, andreyknvl@google.com, aryabinin@virtuozzo.com,
	luto@kernel.org, ard.biesheuvel@linaro.org, arnd@arndb.de, hch@infradead.org,
	hch@lst.de, darrick.wong@oracle.com, davem@davemloft.net,
	dmitry.torokhov@gmail.com, ebiggers@google.com, edumazet@google.com,
	ericvh@gmail.com, gregkh@linuxfoundation.org, harry.wentland@amd.com,
	herbert@gondor.apana.org.au, iii@linux.ibm.com, mingo@elte.hu,
	jasowang@redhat.com, axboe@kernel.dk, m.szyprowski@samsung.com,
	elver@google.com, mark.rutland@arm.com, martin.petersen@oracle.com,
	schwidefsky@de.ibm.com, willy@infradead.org, mst@redhat.com,
 monstr@monstr.eu,
	pmladek@suse.com, cai@lca.pw, rdunlap@infradead.org, robin.murphy@arm.com,
	sergey.senozhatsky@gmail.com, rostedt@goodmis.org, tiwai@suse.com,
	tytso@mit.edu, tglx@linutronix.de, gor@linux.ibm.com, wsa@the-dreams.de
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

Add KernelMemorySanitizer infrastructure

Commit Message

Alexander Potapenko Nov. 22, 2019, 11:25 a.m. UTC

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: linux-mm@kvack.org

---

Change-Id: Iac6345065e6804ef811f1124fdf779c67ff1530e
---
 Documentation/dev-tools/index.rst |   1 +
 Documentation/dev-tools/kmsan.rst | 418 ++++++++++++++++++++++++++++++
 2 files changed, 419 insertions(+)
 create mode 100644 Documentation/dev-tools/kmsan.rst

Comments

Marco Elver Nov. 27, 2019, 2:22 p.m. UTC | #1

General comments:
* it's -> it is
* don't -> do not

On Fri, 22 Nov 2019 at 12:26, <glider@google.com> wrote:
[...]
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index b0522a4dd107..bc5e3fd87efa 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -19,6 +19,7 @@ whole; patches welcome!
>     kcov
>     gcov
>     kasan
> +   kmsan
>     ubsan
>     kmemleak
>     gdb-kernel-debugging
> diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
> new file mode 100644
> index 000000000000..51f9c207cc2c
> --- /dev/null
> +++ b/Documentation/dev-tools/kmsan.rst
> @@ -0,0 +1,418 @@
> +=============================
> +KernelMemorySanitizer (KMSAN)
> +=============================
> +
> +KMSAN is a dynamic memory error detector aimed at finding uses of uninitialized
> +memory.
> +It is based on compiler instrumentation, and is quite similar to the userspace
> +MemorySanitizer tool (http://clang.llvm.org/docs/MemorySanitizer.html).

These should be real links:  `Memory sanitizer tool <...url...>`_.

> +KMSAN and Clang
> +===============
> +
> +In order for KMSAN to work the kernel must be
> +built with Clang, which is so far the only compiler that has KMSAN support.

"is so far" -> "so far is"

> +The kernel instrumentation pass is based on the userspace MemorySanitizer tool
> +(http://clang.llvm.org/docs/MemorySanitizer.html). Because of the

Should also be real link: `MemorySanitizer tool <..url..>`_

> +instrumentation complexity it's unlikely that any other compiler will support
> +KMSAN soon.
> +
> +Right now the instrumentation pass supports x86_64 only.
> +
> +How to build
> +============
> +
> +In order to build a kernel with KMSAN you'll need a fresh Clang (10.0.0+, trunk
> +version r365008 or greater). Please refer to
> +https://llvm.org/docs/GettingStarted.html for the instructions on how to build
> +Clang::
> +
> +  export KMSAN_CLANG_PATH=/path/to/clang
>
> +  # Now configure and build the kernel with CONFIG_KMSAN enabled.
> +  make CC=$KMSAN_CLANG_PATH -j64

I don't think '-j64' is necessary to build.  Also the 'export' is
technically not required AFAIK, but I don't think it bothers anyone.

> +How KMSAN works
> +===============
> +
> +KMSAN shadow memory
> +-------------------
> +
> +KMSAN associates a so-called shadow byte with every byte of kernel memory.

'shadow' memory may not be a well-defined term. More intuitive would
be saying that it's metadata associated with every byte of kernel
memory. From then on you can say it's shadow memory.

> +A bit in the shadow byte is set iff the corresponding bit of the kernel memory
> +byte is uninitialized.
> +Marking the memory uninitialized (i.e. setting its shadow bytes to 0xff) is
> +called poisoning, marking it initialized (setting the shadow bytes to 0x00) is
> +called unpoisoning.
> +
> +When a new variable is allocated on the stack, it's poisoned by default by
> +instrumentation code inserted by the compiler (unless it's a stack variable that
> +is immediately initialized). Any new heap allocation done without ``__GFP_ZERO``
> +is also poisoned.
> +
> +Compiler instrumentation also tracks the shadow values with the help from the
> +runtime library in ``mm/kmsan/``.
> +
> +The shadow value of a basic or compound type is an array of bytes of the same
> +length.
> +When a constant value is written into memory, that memory is unpoisoned.
> +When a value is read from memory, its shadow memory is also obtained and
> +propagated into all the operations which use that value. For every instruction
> +that takes one or more values the compiler generates code that calculates the
> +shadow of the result depending on those values and their shadows.
> +
> +Example::
> +
> +  int a = 0xff;
> +  int b;
> +  int c = a | b;
> +
> +In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
> +shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
> +``c`` are uninitialized, while the lower byte is initialized.
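The shadow values quoted in this example can be reproduced with a small userspace sketch. This is only an illustration of bit-precise shadow propagation for ``|``, not the code Clang emits; the ``shadowed_u32`` type and both helpers are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value plus its shadow; a set shadow bit marks an uninitialized bit. */
struct shadowed_u32 {
	uint32_t value;
	uint32_t shadow;
};

/*
 * Illustrative shadow rule for bitwise OR: a result bit is poisoned when both
 * input bits are poisoned, or when one input bit is poisoned and the other
 * input bit is 0 (so it cannot force the result to 1 on its own).
 */
static struct shadowed_u32 shadowed_or(struct shadowed_u32 a,
				       struct shadowed_u32 b)
{
	struct shadowed_u32 r;

	r.value = a.value | b.value;
	r.shadow = (a.shadow & b.shadow) |
		   (a.shadow & ~b.value) |
		   (b.shadow & ~a.value);
	return r;
}

/* Reproduces the a = 0xff, b uninitialized, c = a | b example. */
static void or_example(void)
{
	struct shadowed_u32 a = { .value = 0xff, .shadow = 0 };
	/* The value of b is arbitrary; every bit of it is poisoned. */
	struct shadowed_u32 b = { .value = 0x12345678, .shadow = 0xffffffffu };
	struct shadowed_u32 c = shadowed_or(a, b);

	assert(c.shadow == 0xffffff00u);
}
```

The low byte of ``c`` is known to be 0xff regardless of ``b``, which is why its shadow byte ends up clear.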
> +
> +
> +Origin tracking
> +---------------
> +
> +Every four bytes of kernel memory also have a so-called origin assigned to
> +them.
> +This origin describes the point in program execution at which the uninitialized
> +value was created. Every origin is associated with a creation stack, which lets
> +the user figure out what's going on.
> +
> +When an uninitialized variable is allocated on stack or heap, a new origin
> +value is created, and that variable's origin is filled with that value.
> +When a value is read from memory, its origin is also read and kept together
> +with the shadow. For every instruction that takes one or more values the origin
> +of the result is one of the origins corresponding to any of the uninitialized
> +inputs.
> +If a poisoned value is written into memory, its origin is written to the
> +corresponding storage as well.
> +
> +Example 1::
> +
> +  int a = 0;
> +  int b;
> +  int c = a + b;
> +
> +In this case the origin of ``b`` is generated upon function entry, and is
> +stored to the origin of ``c`` right before the addition result is written into
> +memory.
> +
> +Several variables may share the same origin address, if they are stored in the
> +same four-byte chunk.
> +In this case every write to either variable updates the origin for all of them.
> +
> +Example 2::
> +
> +  int combine(short a, short b) {
> +    union ret_t {
> +      int i;
> +      short s[2];
> +    } ret;
> +    ret.s[0] = a;
> +    ret.s[1] = b;
> +    return ret.i;
> +  }
> +
> +If ``a`` is initialized and ``b`` is not, the shadow of the result would be
> +0xffff0000, and the origin of the result would be the origin of ``b``.
> +``ret.s[0]`` would have the same origin, but it will be never used, because
> +that variable is initialized.
> +
> +If both function arguments are uninitialized, only the origin of the second
> +argument is preserved.
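A userspace model of the shared origin slot may make the three cases easier to follow. It is a sketch under the assumption that only a poisoned store overwrites the origin of its four-byte chunk; ``struct chunk``, ``store_u16()`` and ``combine_origin()`` are hypothetical and do not exist in the runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One four-byte chunk: per-byte shadow, but a single shared origin slot. */
struct chunk {
	uint8_t data[4];
	uint8_t shadow[4];	/* 0xff: byte uninitialized, 0x00: initialized */
	uint32_t origin;	/* hypothetical origin id, 0 means "none" */
};

/* Store a 16-bit value at byte offset 0 or 2, together with its metadata. */
static void store_u16(struct chunk *c, unsigned int off, uint16_t val,
		      uint16_t shadow, uint32_t origin)
{
	assert(off == 0 || off == 2);
	memcpy(&c->data[off], &val, sizeof(val));
	memcpy(&c->shadow[off], &shadow, sizeof(shadow));
	/* Assumption: only a poisoned store overwrites the shared origin. */
	if (shadow)
		c->origin = origin;
}

/* Mirrors combine(a, b); returns the origin left in the shared slot. */
static uint32_t combine_origin(uint16_t shadow_a, uint32_t origin_a,
			       uint16_t shadow_b, uint32_t origin_b)
{
	struct chunk ret = { .shadow = { 0xff, 0xff, 0xff, 0xff } };

	store_u16(&ret, 0, 1, shadow_a, origin_a);	/* ret.s[0] = a */
	store_u16(&ret, 2, 2, shadow_b, origin_b);	/* ret.s[1] = b */
	return ret.origin;
}
```

With both arguments poisoned, the second store wins, which matches the statement that only the origin of the second argument is preserved.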
> +
> +Origin chaining
> +~~~~~~~~~~~~~~~
> +To ease the debugging, KMSAN creates a new origin for every memory store.

"the debugging" -> "debugging"

> +The new origin references both its creation stack and the previous origin the
> +memory location had.
> +This may cause increased memory consumption, so we limit the length of origin
> +chains in the runtime.
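One way to picture the chain and its length limit is the sketch below. It assumes a linked representation with caller-provided storage and an arbitrary limit of 3; the names ``origin_ex``, ``chain_origin_ex()`` and ``MAX_CHAIN_DEPTH_EX`` are made up for this illustration and say nothing about how the runtime actually stores origins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHAIN_DEPTH_EX 3	/* hypothetical limit, chosen for the example */

struct origin_ex {
	uint32_t stack_id;		/* hypothetical handle of the creation stack */
	const struct origin_ex *prev;	/* origin the location had before the store */
	unsigned int depth;		/* links in the chain, this one included */
};

/*
 * Chain a new origin onto prev using caller-provided storage. Once the chain
 * has reached the limit, the previous origin is reused and nothing is added.
 */
static const struct origin_ex *chain_origin_ex(struct origin_ex *slot,
					       uint32_t stack_id,
					       const struct origin_ex *prev)
{
	assert(slot != NULL);
	if (prev && prev->depth >= MAX_CHAIN_DEPTH_EX)
		return prev;
	slot->stack_id = stack_id;
	slot->prev = prev;
	slot->depth = prev ? prev->depth + 1 : 1;
	return slot;
}
```

Capping the depth bounds the memory spent per poisoned location at the cost of losing the most recent stores once the limit is reached.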
> +
> +Clang instrumentation API
> +-------------------------
> +
> +Clang instrumentation pass inserts calls to functions defined in
> +``mm/kmsan/kmsan_instr.c`` into the kernel code.

> +Shadow manipulation
> +~~~~~~~~~~~~~~~~~~~
> +For every memory access the compiler emits a call to a function that returns a
> +pair of pointers to the shadow and origin addresses of the given memory::
> +
> +  typedef struct {
> +    void *s, *o;
> +  } shadow_origin_ptr_t
> +
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, u64 size)
> +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, u64 size)
> +
> +The function name depends on the memory access size.
> +Each such function also checks if the shadow of the memory in the range
> +[``addr``, ``addr + n``) is contiguous and reports an error otherwise.
> +
> +The compiler makes sure that for every loaded value its shadow and origin
> +values are read from memory.
> +When a value is stored to memory, its shadow and origin are also stored using
> +the metadata pointers.
> +
> +Origin tracking
> +~~~~~~~~~~~~~~~
> +A special function is used to create a new origin value for a local variable
> +and set the origin of that variable to that value::
> +
> +  void __msan_poison_alloca(u64 address, u64 size, char *descr)
> +
> +Access to per-task data
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +At the beginning of every instrumented function KMSAN inserts a call to
> +``__msan_get_context_state()``::
> +
> +  kmsan_context_state *__msan_get_context_state(void)
> +
> +``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
> +
> +  struct kmsan_context_s {
> +    char param_tls[KMSAN_PARAM_SIZE];
> +    char retval_tls[RETVAL_SIZE];
> +    char va_arg_tls[KMSAN_PARAM_SIZE];
> +    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> +    u64 va_arg_overflow_size_tls;
> +    depot_stack_handle_t param_origin_tls[PARAM_ARRAY_SIZE];
> +    depot_stack_handle_t retval_origin_tls;
> +    depot_stack_handle_t origin_tls;
> +  };
> +
> +This structure is used by KMSAN to pass parameter shadows and origins between
> +instrumented functions.
> +
> +String functions
> +~~~~~~~~~~~~~~~~
> +
> +The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
> +following functions. These functions are also called when data structures are
> +initialized or copied, making sure shadow and origin values are copied alongside
> +with the data::
> +
> +  void *__msan_memcpy(void *dst, void *src, u64 n)
> +  void *__msan_memmove(void *dst, void *src, u64 n)
> +  void *__msan_memset(void *dst, int c, size_t n)
> +
> +Error reporting
> +~~~~~~~~~~~~~~~
> +
> +For each pointer dereference and each condition the compiler emits a shadow
> +check that calls ``__msan_warning()`` in the case a poisoned value is being
> +used::
> +
> +  void __msan_warning(u32 origin)
> +
> +``__msan_warning()`` causes KMSAN runtime to print an error report.
> +
> +Inline assembly instrumentation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +KMSAN instruments every inline assembly output with a call to::
> +
> +  void __msan_instrument_asm_store(u64 addr, u64 size)
> +
> +, which unpoisons the memory region.
> +
> +This approach may mask certain errors, but it also helps to avoid a lot of
> +false positives in bitwise operations, atomics etc.
> +
> +Sometimes the pointers passed into inline assembly don't point to valid memory.
> +In such cases they are ignored at runtime.
> +
> +Disabling the instrumentation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +A function can be marked with ``__no_sanitize_memory``.
> +Doing so doesn't remove KMSAN instrumentation from it, however it makes the
> +compiler ignore the uninitialized values coming from the function's inputs,
> +and initialize the function's outputs.
> +The compiler won't inline functions marked with this attribute into functions
> +not marked with it, and vice versa.
> +
> +It's also possible to disable KMSAN for a single file (e.g. main.o)::
> +
> +  KMSAN_SANITIZE_main.o := n
> +
> +or for the whole directory::
> +
> +  KMSAN_SANITIZE := n
> +
> +in the Makefile. This comes at a cost however: stack allocations from such files
> +and parameters of instrumented functions called from them will have incorrect
> +shadow/origin values. As a rule of thumb, avoid using KMSAN_SANITIZE.
> +
> +Runtime library
> +---------------
> +The code is located in ``mm/kmsan/``.
> +
> +Per-task KMSAN state
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Every task_struct has an associated KMSAN task state that holds the KMSAN
> +context (see above) and a per-task flag disallowing KMSAN reports::
> +
> +  struct kmsan_task_state {
> +    ...
> +    bool allow_reporting;
> +    struct kmsan_context_state cstate;
> +    ...
> +  }
> +
> +  struct task_struct {
> +    ...
> +    struct kmsan_task_state kmsan;
> +    ...
> +  }
> +
> +
> +KMSAN contexts
> +~~~~~~~~~~~~~~
> +
> +When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
> +hold the metadata for function parameters and return values.
> +
> +But in the case the kernel is running in the interrupt, softirq or NMI context,
> +where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
> +
> +  DEFINE_PER_CPU(kmsan_context_state[KMSAN_NESTED_CONTEXT_MAX],
> +                 kmsan_percpu_cstate);
> +
> +Metadata allocation
> +~~~~~~~~~~~~~~~~~~~
> +There are several places in the kernel for which the metadata is stored.
> +
> +1. Each ``struct page`` instance contains two pointers to its shadow and
> +origin pages::
> +
> +  struct page {
> +    ...
> +    struct page *shadow, *origin;
> +    ...
> +  };
> +
> +Every time a ``struct page`` is allocated, the runtime library allocates two
> +additional pages to hold its shadow and origins. This is done by adding hooks
> +to ``alloc_pages()``/``free_pages()`` in ``mm/page_alloc.c``.
> +To avoid allocating the metadata for non-interesting pages (right now only the
> +shadow/origin page themselves and stackdepot storage) the
> +``__GFP_NO_KMSAN_SHADOW`` flag is used.
> +
> +There is a problem related to this allocation algorithm: when two contiguous
> +memory blocks are allocated with two different ``alloc_pages()`` calls, their
> +shadow pages may not be contiguous. So, if a memory access crosses the boundary
> +of a memory block, accesses to shadow/origin memory may potentially corrupt
> +other pages or read incorrect values from them.
> +
> +As a workaround, we check the access size in
> +``__msan_metadata_ptr_for_XXX_YYY()`` and return a pointer to a fake shadow
> +region in the case of an error::
> +
> +  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> +  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> +
> +``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
> +All stores to ``dummy_store_page`` are ignored.
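The fallback can be sketched in userspace as follows. This is a simplification that conservatively treats every access crossing the end of a page as non-contiguous, whereas the text above only requires the shadow itself to be non-contiguous; all ``*_ex`` names and the 4096-byte page size are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_EX 4096u	/* hypothetical page size for the example */

static char dummy_load_page_ex[PAGE_SIZE_EX];	/* never written, reads as zeroes */
static char dummy_store_page_ex[PAGE_SIZE_EX];	/* stores land here and are ignored */
static char page_shadow_ex[PAGE_SIZE_EX];	/* shadow of one hypothetical page */

/*
 * Return the shadow address for an access of `size` bytes at `offset` within
 * the page, or a dummy page when the access does not fit inside the page.
 */
static char *shadow_ptr_ex(size_t offset, size_t size, bool is_store)
{
	bool fits = size > 0 && size <= PAGE_SIZE_EX &&
		    offset <= PAGE_SIZE_EX - size;

	if (!fits)
		return is_store ? dummy_store_page_ex : dummy_load_page_ex;
	return &page_shadow_ex[offset];
}
```

Because loads and stores use separate dummy pages, a redirected store can never make a later redirected load observe poisoned shadow.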
> +
> +Unfortunately at boot time we need to allocate shadow and origin pages for the
> +kernel data (``.data``, ``.bss`` etc.) and percpu memory regions, the size of
> +which is not a power of 2. As a result, we have to allocate the metadata page by
> +page, so that it is also non-contiguous, although it may be perfectly valid to
> +access the corresponding kernel memory across page boundaries.
> +This can be probably fixed by allocating 1<<N pages at once, splitting them and
> +deallocating the rest.
> +
> +LSB of the ``shadow`` pointer in a ``struct page`` may be set to 1. In this case
> +shadow and origin pages are allocated, but KMSAN ignores accesses to them by
> +falling back to dummy pages. Allocating the metadata pages is still needed to
> +support ``vmap()/vunmap()`` operations on this struct page.
> +
> +2. For vmalloc memory and modules, there's a direct mapping between the memory
> +range, its shadow and origin. KMSAN lessens the vmalloc area by 3/4, making only
> +the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
> +area contains shadow memory for the first quarter, the third one holds the
> +origins. A small part of the fourth quarter contains shadow and origins for the
> +kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
> +more details.
> +
> +When an array of pages is mapped into a contiguous virtual memory space, their
> +shadow and origin pages are similarly mapped into contiguous regions.
> +
> +3. For CPU entry area there're separate per-CPU arrays that hold its metadata::
> +
> +  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
> +  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);

For some reason rst2html complains here that this is not a literal block.

> +When calculating shadow and origin addresses for a given memory address, the
> +runtime checks whether the address belongs to the physical page range, the
> +virtual page range or CPU entry area.
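For the vmalloc case in item 2, the direct mapping amounts to adding a fixed offset. The sketch below assumes a start address and a quarter size purely for illustration (the real values are in ``arch/x86/include/asm/pgtable_64_types.h``), and it assumes the origin address is rounded down to the four-byte chunk; none of the ``*_ex`` names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants; the real ones live in the arch headers. */
#define VMALLOC_START_EX	0xffffc90000000000ULL
#define VMALLOC_QUARTER_EX	(1ULL << 43)

/* True if addr lies in the first quarter, the part usable by vmalloc(). */
static bool in_vmalloc_quarter_ex(uint64_t addr)
{
	return addr >= VMALLOC_START_EX &&
	       addr - VMALLOC_START_EX < VMALLOC_QUARTER_EX;
}

/* Shadow lives one quarter above the data. */
static uint64_t vmalloc_shadow_ex(uint64_t addr)
{
	assert(in_vmalloc_quarter_ex(addr));
	return addr + VMALLOC_QUARTER_EX;
}

/* Origins live two quarters above the data, one slot per four bytes. */
static uint64_t vmalloc_origin_ex(uint64_t addr)
{
	assert(in_vmalloc_quarter_ex(addr));
	return (addr & ~3ULL) + 2 * VMALLOC_QUARTER_EX;
}
```

A fixed offset keeps the lookup free of any per-page pointer chasing, which is what distinguishes this range from the ``struct page`` case in item 1.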
> +
> +Handling ``pt_regs``
> +~~~~~~~~~~~~~~~~~~~

The underline is one '~' too short (I found this by running the file through rst2html).

> +Many functions receive a ``struct pt_regs`` holding the register state at a
> +certain point. Registers don't have (easily calculatable) shadow or origin
> +associated with them.
> +We can assume that the registers are always initialized.
> +
> +Example report
> +--------------
> +Here's an example of a real KMSAN report in ``packet_bind_spkt()``::

Shouldn't this section be somewhere near the top, in a section such as
"Usage"? A user of KMSAN doesn't really care how KMSAN works.

> +  ==================================================================
> +  BUG: KMSAN: uninit-value in strlen
> +  CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
> +  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> +   0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
> +   ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
> +   0000000000000000 0000000000000092 00000000ec400911 0000000000000002
> +  Call Trace:
> +   [<     inline     >] __dump_stack lib/dump_stack.c:15
> +   [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
> +   [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
> +   [<ffffffff818a783b>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
> +   [<     inline     >] strlen lib/string.c:484
> +   [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
> +   [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132
> +   [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
> +   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
> +   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
> +  chained origin:
> +   [<ffffffff810bb787>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
> +   [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
> +   [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
> +   [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527
> +   [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
> +   [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
> +   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
> +   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
> +  origin description: ----address@SYSC_bind (origin=00000000eb400911)
> +  ==================================================================
> +
> +The report tells that the local variable ``address`` was created uninitialized
> +in ``SYSC_bind()`` (the ``bind`` system call implementation). The lower stack
> +trace corresponds to the place where this variable was created.
> +
> +The upper stack shows where the uninit value was used - in ``strlen()``.
> +It turned out that the contents of ``address`` were partially copied from the
> +userspace, but the buffer wasn't zero-terminated and contained some trailing
> +uninitialized bytes.
> +``packet_bind_spkt()`` didn't check the length of the buffer, but called
> +``strlcpy()`` on it, which called ``strlen()``, which started reading the
> +buffer byte by byte till it hit the uninitialized memory.
> +
> +
> +References
> +==========
> +
> +E. Stepanov, K. Serebryany. MemorySanitizer: fast detector of uninitialized
> +memory use in C++.
> +In Proceedings of CGO 2015.

This should be turned into a link.

Alexander Potapenko Dec. 3, 2019, 12:42 p.m. UTC | #2

On Wed, Nov 27, 2019 at 3:23 PM Marco Elver <elver@google.com> wrote:
>
> General comments:
> * it's -> it is
> * don't -> do not
>
> On Fri, 22 Nov 2019 at 12:26, <glider@google.com> wrote:
> [...]
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index b0522a4dd107..bc5e3fd87efa 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -19,6 +19,7 @@ whole; patches welcome!
> >     kcov
> >     gcov
> >     kasan
> > +   kmsan
> >     ubsan
> >     kmemleak
> >     gdb-kernel-debugging
> > diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
> > new file mode 100644
> > index 000000000000..51f9c207cc2c
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kmsan.rst
> > @@ -0,0 +1,418 @@
> > +=============================
> > +KernelMemorySanitizer (KMSAN)
> > +=============================
> > +
> > +KMSAN is a dynamic memory error detector aimed at finding uses of uninitialized
> > +memory.
> > +It is based on compiler instrumentation, and is quite similar to the userspace
> > +MemorySanitizer tool (http://clang.llvm.org/docs/MemorySanitizer.html).
>
> These should be real links:  `Memory sanitizer tool <...url...>`_.
Changed these links to link targets.

> > +KMSAN and Clang
> > +===============
> > +
> > +In order for KMSAN to work the kernel must be
> > +built with Clang, which is so far the only compiler that has KMSAN support.
>
> "is so far" -> "so far is"
Ack.
> > +The kernel instrumentation pass is based on the userspace MemorySanitizer tool
> > +(http://clang.llvm.org/docs/MemorySanitizer.html). Because of the
>
> Should also be real link: `MemorySanitizer tool <..url..>`_
Ack.
> > +instrumentation complexity it's unlikely that any other compiler will support
> > +KMSAN soon.
> > +
> > +Right now the instrumentation pass supports x86_64 only.
> > +
> > +How to build
> > +============
> > +
> > +In order to build a kernel with KMSAN you'll need a fresh Clang (10.0.0+, trunk
> > +version r365008 or greater). Please refer to
> > +https://llvm.org/docs/GettingStarted.html for the instructions on how to build
> > +Clang::
> > +
> > +  export KMSAN_CLANG_PATH=/path/to/clang
> >
> > +  # Now configure and build the kernel with CONFIG_KMSAN enabled.
> > +  make CC=$KMSAN_CLANG_PATH -j64
>
> I don't think '-j64' is necessary to build.  Also the 'export' is
> technically not required AFAIK, but I don't think it bothers anyone.
Ack.
> > +How KMSAN works
> > +===============
> > +
> > +KMSAN shadow memory
> > +-------------------
> > +
> > +KMSAN associates a so-called shadow byte with every byte of kernel memory.
>
> 'shadow' memory may not be a well-defined term. More intuitive would
> be saying that it's metadata associated with every byte of kernel
> memory. From then on you can say it's shadow memory.
Changed this to be:
"KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory.
A bit in the shadow byte is set..."

> > +A bit in the shadow byte is set iff the corresponding bit of the kernel memory
> > +byte is uninitialized.
> > +Marking the memory uninitialized (i.e. setting its shadow bytes to 0xff) is
> > +called poisoning, marking it initialized (setting the shadow bytes to 0x00) is
> > +called unpoisoning.
> > +
> > +When a new variable is allocated on the stack, it's poisoned by default by
> > +instrumentation code inserted by the compiler (unless it's a stack variable that
> > +is immediately initialized). Any new heap allocation done without ``__GFP_ZERO``
> > +is also poisoned.
> > +
> > +Compiler instrumentation also tracks the shadow values with the help from the
> > +runtime library in ``mm/kmsan/``.
> > +
> > +The shadow value of a basic or compound type is an array of bytes of the same
> > +length.
> > +When a constant value is written into memory, that memory is unpoisoned.
> > +When a value is read from memory, its shadow memory is also obtained and
> > +propagated into all the operations which use that value. For every instruction
> > +that takes one or more values the compiler generates code that calculates the
> > +shadow of the result depending on those values and their shadows.
> > +
> > +Example::
> > +
> > +  int a = 0xff;
> > +  int b;
> > +  int c = a | b;
> > +
> > +In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
> > +shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
> > +``c`` are uninitialized, while the lower byte is initialized.
> > +
> > +
> > +Origin tracking
> > +---------------
> > +
> > +Every four bytes of kernel memory also have a so-called origin assigned to
> > +them.
> > +This origin describes the point in program execution at which the uninitialized
> > +value was created. Every origin is associated with a creation stack, which lets
> > +the user figure out what's going on.
> > +
> > +When an uninitialized variable is allocated on stack or heap, a new origin
> > +value is created, and that variable's origin is filled with that value.
> > +When a value is read from memory, its origin is also read and kept together
> > +with the shadow. For every instruction that takes one or more values the origin
> > +of the result is one of the origins corresponding to any of the uninitialized
> > +inputs.
> > +If a poisoned value is written into memory, its origin is written to the
> > +corresponding storage as well.
> > +
> > +Example 1::
> > +
> > +  int a = 0;
> > +  int b;
> > +  int c = a + b;
> > +
> > +In this case the origin of ``b`` is generated upon function entry, and is
> > +stored to the origin of ``c`` right before the addition result is written into
> > +memory.
> > +
> > +Several variables may share the same origin address, if they are stored in the
> > +same four-byte chunk.
> > +In this case every write to either variable updates the origin for all of them.
> > +
> > +Example 2::
> > +
> > +  int combine(short a, short b) {
> > +    union ret_t {
> > +      int i;
> > +      short s[2];
> > +    } ret;
> > +    ret.s[0] = a;
> > +    ret.s[1] = b;
> > +    return ret.i;
> > +  }
> > +
> > +If ``a`` is initialized and ``b`` is not, the shadow of the result would be
> > +0xffff0000, and the origin of the result would be the origin of ``b``.
> > +``ret.s[0]`` would have the same origin, but it will be never used, because
> > +that variable is initialized.
> > +
> > +If both function arguments are uninitialized, only the origin of the second
> > +argument is preserved.
> > +
> > +Origin chaining
> > +~~~~~~~~~~~~~~~
> > +To ease the debugging, KMSAN creates a new origin for every memory store.
>
> "the debugging" -> "debugging"
Ack
> > +The new origin references both its creation stack and the previous origin the
> > +memory location had.
> > +This may cause increased memory consumption, so we limit the length of origin
> > +chains in the runtime.
> > +
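A rough userspace sketch of the chaining idea, for readers who want to see
it in code. Everything here is hypothetical: the toy_ names, the table of
origins, and the depth limit of 3. What the real runtime does once the
limit is hit is not stated in the text; keeping the old origin is an
assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CHAIN_DEPTH	3	/* hypothetical limit */
#define TOY_MAX_ORIGINS		16

struct toy_origin {
	uint32_t stack_id;	/* stands in for a saved stack trace */
	uint32_t prev;		/* id of the previous origin, 0 = none */
	unsigned int depth;	/* length of the chain ending here */
};

static struct toy_origin toy_origins[TOY_MAX_ORIGINS];
static uint32_t toy_nr_origins = 1;	/* id 0 is reserved for "no origin" */

/* Create a new origin for a store, linked to the previous one. */
static uint32_t toy_chain_origin(uint32_t prev, uint32_t stack_id)
{
	unsigned int depth = prev ? toy_origins[prev].depth + 1 : 1;

	if (depth > TOY_MAX_CHAIN_DEPTH)
		return prev;	/* assumption: stop chaining, keep old origin */
	assert(toy_nr_origins < TOY_MAX_ORIGINS);
	toy_origins[toy_nr_origins] =
		(struct toy_origin){ stack_id, prev, depth };
	return toy_nr_origins++;
}

/* Walk the chain back to the first origin and count its links. */
static unsigned int toy_chain_length(uint32_t id)
{
	unsigned int n = 0;

	while (id) {
		n++;
		id = toy_origins[id].prev;
	}
	return n;
}
```

The depth field is what bounds memory use: no matter how many stores
hit the same location, the chain never grows past the limit.
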
> > +Clang instrumentation API
> > +-------------------------
> > +
> > +Clang instrumentation pass inserts calls to functions defined in
> > +``mm/kmsan/kmsan_instr.c`` into the kernel code.
>
> > +Shadow manipulation
> > +~~~~~~~~~~~~~~~~~~~
> > +For every memory access the compiler emits a call to a function that returns a
> > +pair of pointers to the shadow and origin addresses of the given memory::
> > +
> > +  typedef struct {
> > +    void *s, *o;
> > +  } shadow_origin_ptr_t
> > +
> > +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
> > +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
> > +  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, u64 size)
> > +  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, u64 size)
> > +
> > +The function name depends on the memory access size.
> > +Each such function also checks if the shadow of the memory in the range
> > +[``addr``, ``addr + n``) is contiguous and reports an error otherwise.
> > +
> > +The compiler makes sure that for every loaded value its shadow and origin
> > +values are read from memory.
> > +When a value is stored to memory, its shadow and origin are also stored using
> > +the metadata pointers.
> > +
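The load/store protocol described above can be modelled in userspace
roughly as follows. This is a sketch under stated assumptions, not what
the compiler actually emits: toy_metadata_ptr_4() stands in for
__msan_metadata_ptr_for_{load,store}_4(), the three parallel arrays
replace real shadow/origin pages, and struct toy_val is a made-up way of
keeping a value together with its shadow and origin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	void *s, *o;
} shadow_origin_ptr_t;

#define TOY_MEM_SIZE 64

static uint8_t toy_mem[TOY_MEM_SIZE];		/* "kernel" memory */
static uint8_t toy_shadow[TOY_MEM_SIZE];	/* one shadow byte per byte */
static uint32_t toy_origin[TOY_MEM_SIZE / 4];	/* one origin per 4 bytes */

/* Toy stand-in for the 4-byte metadata getters. */
static shadow_origin_ptr_t toy_metadata_ptr_4(void *addr)
{
	size_t off = (size_t)((uint8_t *)addr - toy_mem);

	assert(off + 4 <= TOY_MEM_SIZE);
	return (shadow_origin_ptr_t){ &toy_shadow[off], &toy_origin[off / 4] };
}

/* A value travelling together with its shadow and origin. */
struct toy_val {
	uint32_t v, shadow, origin;
};

static void toy_store4(void *addr, struct toy_val x)
{
	shadow_origin_ptr_t p = toy_metadata_ptr_4(addr);

	memcpy(p.s, &x.shadow, 4);
	if (x.shadow)			/* origin matters only if poisoned */
		memcpy(p.o, &x.origin, 4);
	memcpy(addr, &x.v, 4);
}

static struct toy_val toy_load4(void *addr)
{
	shadow_origin_ptr_t p = toy_metadata_ptr_4(addr);
	struct toy_val x;

	memcpy(&x.v, addr, 4);
	memcpy(&x.shadow, p.s, 4);
	memcpy(&x.origin, p.o, 4);
	return x;
}
```
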
> > +Origin tracking
> > +~~~~~~~~~~~~~~~
> > +A special function is used to create a new origin value for a local variable
> > +and set the origin of that variable to that value::
> > +
> > +  void __msan_poison_alloca(u64 address, u64 size, char *descr)
> > +
> > +Access to per-task data
> > +~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +At the beginning of every instrumented function KMSAN inserts a call to
> > +``__msan_get_context_state()``::
> > +
> > +  kmsan_context_state *__msan_get_context_state(void)
> > +
> > +``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
> > +
> > +  struct kmsan_context_s {
> > +    char param_tls[KMSAN_PARAM_SIZE];
> > +    char retval_tls[RETVAL_SIZE];
> > +    char va_arg_tls[KMSAN_PARAM_SIZE];
> > +    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> > +    u64 va_arg_overflow_size_tls;
> > +    depot_stack_handle_t param_origin_tls[PARAM_ARRAY_SIZE];
> > +    depot_stack_handle_t retval_origin_tls;
> > +    depot_stack_handle_t origin_tls;
> > +  };
> > +
> > +This structure is used by KMSAN to pass parameter shadows and origins between
> > +instrumented functions.
> > +
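As an illustration of how parameter and return value shadows could travel
through such a structure, here is a userspace sketch. The names
(toy_context_state, toy_callee(), toy_call()) and the buffer sizes are
invented, there is a single global context instead of a per-task one, and
treating the shadow of x + 1 as the shadow of x is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PARAM_SIZE	64	/* hypothetical stand-in for KMSAN_PARAM_SIZE */
#define TOY_RETVAL_SIZE	8	/* hypothetical stand-in for RETVAL_SIZE */

struct toy_context_state {
	char param_tls[TOY_PARAM_SIZE];
	char retval_tls[TOY_RETVAL_SIZE];
};

static struct toy_context_state toy_ctx;	/* one "task" only */

/* Callee side: pick up the shadow of x, publish the shadow of the result. */
static int toy_callee(int x)
{
	uint32_t sx;

	memcpy(&sx, toy_ctx.param_tls, sizeof(sx));
	/* Simplification: x + 1 is considered as poisoned as x itself. */
	memcpy(toy_ctx.retval_tls, &sx, sizeof(sx));
	return x + 1;
}

/* Caller side: publish the argument shadow, call, read the result shadow. */
static uint32_t toy_call(int x, uint32_t x_shadow, int *ret)
{
	uint32_t sret;

	assert(ret);
	memcpy(toy_ctx.param_tls, &x_shadow, sizeof(x_shadow));
	*ret = toy_callee(x);
	memcpy(&sret, toy_ctx.retval_tls, sizeof(sret));
	return sret;
}
```
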
> > +String functions
> > +~~~~~~~~~~~~~~~~
> > +
> > +The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
> > +following functions. These functions are also called when data structures are
> > +initialized or copied, making sure shadow and origin values are copied alongside
> > +with the data::
> > +
> > +  void *__msan_memcpy(void *dst, void *src, u64 n)
> > +  void *__msan_memmove(void *dst, void *src, u64 n)
> > +  void *__msan_memset(void *dst, int c, size_t n)
> > +
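A minimal userspace sketch of "copy the metadata alongside the data".
The toy_ functions are not the kernel implementations: they only work on
one small array, handle shadow but not origins, and toy_msan_memset()
assumes the fill value itself is initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_SIZE 32

static uint8_t toy_mem[TOY_MEM_SIZE];		/* "kernel" memory */
static uint8_t toy_shadow[TOY_MEM_SIZE];	/* one shadow byte per byte */

/* Toy counterpart of __msan_memcpy() for non-overlapping ranges. */
static void *toy_msan_memcpy(void *dst, const void *src, size_t n)
{
	size_t d = (size_t)((uint8_t *)dst - toy_mem);
	size_t s = (size_t)((const uint8_t *)src - toy_mem);

	assert(d + n <= TOY_MEM_SIZE && s + n <= TOY_MEM_SIZE);
	memcpy(&toy_shadow[d], &toy_shadow[s], n);	/* metadata first */
	return memcpy(dst, src, n);			/* then the data */
}

/* Toy counterpart of __msan_memset() with an initialized fill value. */
static void *toy_msan_memset(void *dst, int c, size_t n)
{
	size_t d = (size_t)((uint8_t *)dst - toy_mem);

	assert(d + n <= TOY_MEM_SIZE);
	memset(&toy_shadow[d], 0, n);	/* written bytes become initialized */
	return memset(dst, c, n);
}
```
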
> > +Error reporting
> > +~~~~~~~~~~~~~~~
> > +
> > +For each pointer dereference and each condition the compiler emits a shadow
> > +check that calls ``__msan_warning()`` in the case a poisoned value is being
> > +used::
> > +
> > +  void __msan_warning(u32 origin)
> > +
> > +``__msan_warning()`` causes KMSAN runtime to print an error report.
> > +
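Conceptually, the check in front of a condition looks something like the
sketch below. It is hand-written to show the idea and is not compiler
output: toy_msan_warning() only counts calls instead of printing a
report, and passing the shadow and origin as extra arguments is an
illustration device.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int toy_warnings;	/* number of reports so far */
static uint32_t toy_last_origin;	/* origin passed to the last report */

/* Toy stand-in for __msan_warning(). */
static void toy_msan_warning(uint32_t origin)
{
	toy_warnings++;
	toy_last_origin = origin;
}

/* "Instrumented" version of: return x ? 1 : 0; */
static int toy_is_nonzero(int x, uint32_t x_shadow, uint32_t x_origin)
{
	/* The branch depends on x, so any poisoned bit triggers a report. */
	if (x_shadow)
		toy_msan_warning(x_origin);
	return x ? 1 : 0;
}
```
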
> > +Inline assembly instrumentation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +KMSAN instruments every inline assembly output with a call to::
> > +
> > +  void __msan_instrument_asm_store(u64 addr, u64 size)
> > +
> > +, which unpoisons the memory region.
> > +
> > +This approach may mask certain errors, but it also helps to avoid a lot of
> > +false positives in bitwise operations, atomics etc.
> > +
> > +Sometimes the pointers passed into inline assembly don't point to valid memory.
> > +In such cases they are ignored at runtime.
> > +
> > +Disabling the instrumentation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +A function can be marked with ``__no_sanitize_memory``.
> > +Doing so doesn't remove KMSAN instrumentation from it, however it makes the
> > +compiler ignore the uninitialized values coming from the function's inputs,
> > +and initialize the function's outputs.
> > +The compiler won't inline functions marked with this attribute into functions
> > +not marked with it, and vice versa.
> > +
> > +It's also possible to disable KMSAN for a single file (e.g. main.o)::
> > +
> > +  KMSAN_SANITIZE_main.o := n
> > +
> > +or for the whole directory::
> > +
> > +  KMSAN_SANITIZE := n
> > +
> > +in the Makefile. This comes at a cost however: stack allocations from such files
> > +and parameters of instrumented functions called from them will have incorrect
> > +shadow/origin values. As a rule of thumb, avoid using KMSAN_SANITIZE.
> > +
> > +Runtime library
> > +---------------
> > +The code is located in ``mm/kmsan/``.
> > +
> > +Per-task KMSAN state
> > +~~~~~~~~~~~~~~~~~~~~
> > +
> > +Every task_struct has an associated KMSAN task state that holds the KMSAN
> > +context (see above) and a per-task flag disallowing KMSAN reports::
> > +
> > +  struct kmsan_task_state {
> > +    ...
> > +    bool allow_reporting;
> > +    struct kmsan_context_state cstate;
> > +    ...
> > +  }
> > +
> > +  struct task_struct {
> > +    ...
> > +    struct kmsan_task_state kmsan;
> > +    ...
> > +  }
> > +
> > +
> > +KMSAN contexts
> > +~~~~~~~~~~~~~~
> > +
> > +When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
> > +hold the metadata for function parameters and return values.
> > +
> > +But in the case the kernel is running in the interrupt, softirq or NMI context,
> > +where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
> > +
> > +  DEFINE_PER_CPU(kmsan_context_state[KMSAN_NESTED_CONTEXT_MAX],
> > +                 kmsan_percpu_cstate);
> > +
> > +Metadata allocation
> > +~~~~~~~~~~~~~~~~~~~
> > +There are several places in the kernel for which the metadata is stored.
> > +
> > +1. Each ``struct page`` instance contains two pointers to its shadow and
> > +origin pages::
> > +
> > +  struct page {
> > +    ...
> > +    struct page *shadow, *origin;
> > +    ...
> > +  };
> > +
> > +Every time a ``struct page`` is allocated, the runtime library allocates two
> > +additional pages to hold its shadow and origins. This is done by adding hooks
> > +to ``alloc_pages()``/``free_pages()`` in ``mm/page_alloc.c``.
> > +To avoid allocating the metadata for non-interesting pages (right now only the
> > +shadow/origin page themselves and stackdepot storage) the
> > +``__GFP_NO_KMSAN_SHADOW`` flag is used.
> > +
> > +There is a problem related to this allocation algorithm: when two contiguous
> > +memory blocks are allocated with two different ``alloc_pages()`` calls, their
> > +shadow pages may not be contiguous. So, if a memory access crosses the boundary
> > +of a memory block, accesses to shadow/origin memory may potentially corrupt
> > +other pages or read incorrect values from them.
> > +
> > +As a workaround, we check the access size in
> > +``__msan_metadata_ptr_for_XXX_YYY()`` and return a pointer to a fake shadow
> > +region in the case of an error::
> > +
> > +  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> > +  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
> > +
> > +``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
> > +All stores to ``dummy_store_page`` are ignored.
> > +
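The fallback to dummy pages can be sketched in userspace as below. All
names and sizes are invented for the example (16-byte "pages", a fixed
table mapping data pages to shadow pages); the real check lives in the
__msan_metadata_ptr_for_XXX_YYY() functions and is certainly more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	16
#define TOY_NR_PAGES	4

static uint8_t toy_shadow_pool[8][TOY_PAGE_SIZE];
/* Shadow page used by each data page; pages 1 and 2 are not adjacent. */
static const int toy_shadow_of[TOY_NR_PAGES] = { 0, 1, 5, 6 };

static uint8_t toy_dummy_load_page[TOY_PAGE_SIZE];	/* always zero */
static uint8_t toy_dummy_store_page[TOY_PAGE_SIZE];	/* writes are dropped */

/* Return the shadow pointer for [addr, addr + size), or a dummy page. */
static uint8_t *toy_shadow_ptr(size_t addr, size_t size, bool is_store)
{
	size_t first, last, p;

	assert(size > 0 && size <= TOY_PAGE_SIZE);
	first = addr / TOY_PAGE_SIZE;
	last = (addr + size - 1) / TOY_PAGE_SIZE;
	assert(last < TOY_NR_PAGES);

	/* Every page boundary crossed must also be contiguous in the shadow. */
	for (p = first; p < last; p++)
		if (toy_shadow_of[p + 1] != toy_shadow_of[p] + 1)
			return is_store ? toy_dummy_store_page
					: toy_dummy_load_page;

	return &toy_shadow_pool[toy_shadow_of[first]][addr % TOY_PAGE_SIZE];
}
```

A load that hits the dummy page sees zero shadow, i.e. the value is
treated as initialized, so the workaround trades missed reports for not
corrupting unrelated metadata.
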
> > +Unfortunately at boot time we need to allocate shadow and origin pages for the
> > +kernel data (``.data``, ``.bss`` etc.) and percpu memory regions, the size of
> > +which is not a power of 2. As a result, we have to allocate the metadata page by
> > +page, so that it is also non-contiguous, although it may be perfectly valid to
> > +access the corresponding kernel memory across page boundaries.
> > +This can be probably fixed by allocating 1<<N pages at once, splitting them and
> > +deallocating the rest.
> > +
> > +LSB of the ``shadow`` pointer in a ``struct page`` may be set to 1. In this case
> > +shadow and origin pages are allocated, but KMSAN ignores accesses to them by
> > +falling back to dummy pages. Allocating the metadata pages is still needed to
> > +support ``vmap()/vunmap()`` operations on this struct page.
> > +
> > +2. For vmalloc memory and modules, there's a direct mapping between the memory
> > +range, its shadow and origin. KMSAN lessens the vmalloc area by 3/4, making only
> > +the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
> > +area contains shadow memory for the first quarter, the third one holds the
> > +origins. A small part of the fourth quarter contains shadow and origins for the
> > +kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
> > +more details.
> > +
> > +When an array of pages is mapped into a contiguous virtual memory space, their
> > +shadow and origin pages are similarly mapped into contiguous regions.
> > +
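For the vmalloc layout, the address arithmetic might look like the sketch
below. The constants are made up (the real ones are in
pgtable_64_types.h), modules are left out, and aligning the origin
address down to four bytes is an assumption based on "every four bytes
have an origin".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the original vmalloc area split into quarters. */
#define TOY_VMALLOC_START	0x100000ULL
#define TOY_VMALLOC_SIZE	0x40000ULL
#define TOY_QUARTER		(TOY_VMALLOC_SIZE / 4)

/* Only the first quarter remains usable by vmalloc(). */
static bool toy_is_vmalloc_addr(uint64_t addr)
{
	return addr >= TOY_VMALLOC_START &&
	       addr < TOY_VMALLOC_START + TOY_QUARTER;
}

/* Shadow lives in the second quarter, at the same offset. */
static uint64_t toy_vmalloc_shadow(uint64_t addr)
{
	assert(toy_is_vmalloc_addr(addr));
	return addr + TOY_QUARTER;
}

/* Origins live in the third quarter, one 4-byte slot per 4 data bytes. */
static uint64_t toy_vmalloc_origin(uint64_t addr)
{
	assert(toy_is_vmalloc_addr(addr));
	return (addr & ~(uint64_t)3) + 2 * TOY_QUARTER;
}
```
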
> > +3. For CPU entry area there're separate per-CPU arrays that hold its metadata::
> > +
> > +  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
> > +  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);
>
> For some reason rst2html complains here that this is not a literal block.
Maybe that's because the preceding paragraph only contained a single
line. Adding a line break fixed the problem.

> > +When calculating shadow and origin addresses for a given memory address, the
> > +runtime checks whether the address belongs to the physical page range, the
> > +virtual page range or CPU entry area.
> > +
> > +Handling ``pt_regs``
> > +~~~~~~~~~~~~~~~~~~~
>
> This is missing a '~' (I ran it through rst2html to find).
Ack.
> > +Many functions receive a ``struct pt_regs`` holding the register state at a
> > +certain point. Registers don't have (easily calculatable) shadow or origin
> > +associated with them.
> > +We can assume that the registers are always initialized.
> > +
> > +Example report
> > +--------------
> > +Here's an example of a real KMSAN report in ``packet_bind_spkt()``::
>
> Shouldn't this section be somewhere at the top in a section such as
> "usage". A user of KMSAN doesn't really care how KMSAN works.
Good idea, thanks!
Moved this section to the very beginning.
> > +  ==================================================================
> > +  BUG: KMSAN: uninit-value in strlen
> > +  CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
> > +  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > +   0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
> > +   ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
> > +   0000000000000000 0000000000000092 00000000ec400911 0000000000000002
> > +  Call Trace:
> > +   [<     inline     >] __dump_stack lib/dump_stack.c:15
> > +   [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
> > +   [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
> > +   [<ffffffff818a783b>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
> > +   [<     inline     >] strlen lib/string.c:484
> > +   [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
> > +   [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132
> > +   [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
> > +   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
> > +   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
> > +  chained origin:
> > +   [<ffffffff810bb787>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
> > +   [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
> > +   [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
> > +   [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527
> > +   [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
> > +   [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
> > +   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
> > +   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
> > +  origin description: ----address@SYSC_bind (origin=00000000eb400911)
> > +  ==================================================================
> > +
> > +The report tells that the local variable ``address`` was created uninitialized
> > +in ``SYSC_bind()`` (the ``bind`` system call implementation). The lower stack
> > +trace corresponds to the place where this variable was created.
> > +
> > +The upper stack shows where the uninit value was used - in ``strlen()``.
> > +It turned out that the contents of ``address`` were partially copied from the
> > +userspace, but the buffer wasn't zero-terminated and contained some trailing
> > +uninitialized bytes.
> > +``packet_bind_spkt()`` didn't check the length of the buffer, but called
> > +``strlcpy()`` on it, which called ``strlen()``, which started reading the
> > +buffer byte by byte till it hit the uninitialized memory.
> > +
> > +
> > +References
> > +==========
> > +
> > +E. Stepanov, K. Serebryany. MemorySanitizer: fast detector of uninitialized
> > +memory use in C++.
> > +In Proceedings of CGO 2015.
>
> This should be turned into a link.

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index b0522a4dd107..bc5e3fd87efa 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -19,6 +19,7 @@  whole; patches welcome!
    kcov
    gcov
    kasan
+   kmsan
    ubsan
    kmemleak
    gdb-kernel-debugging
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 000000000000..51f9c207cc2c
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,418 @@ 
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic memory error detector aimed at finding uses of uninitialized
+memory.
+It is based on compiler instrumentation, and is quite similar to the userspace
+MemorySanitizer tool (http://clang.llvm.org/docs/MemorySanitizer.html).
+
+KMSAN and Clang
+===============
+
+For KMSAN to work, the kernel must be built with Clang, which is so far the
+only compiler that has KMSAN support.
+The kernel instrumentation pass is based on the userspace MemorySanitizer tool
+(http://clang.llvm.org/docs/MemorySanitizer.html). Because of the
+instrumentation complexity it's unlikely that any other compiler will support
+KMSAN soon.
+
+Right now the instrumentation pass supports x86_64 only.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you'll need a fresh Clang (10.0.0+, trunk
+version r365008 or greater). Please refer to
+https://llvm.org/docs/GettingStarted.html for the instructions on how to build
+Clang::
+
+  export KMSAN_CLANG_PATH=/path/to/clang
+  # Now configure and build the kernel with CONFIG_KMSAN enabled.
+  make CC=$KMSAN_CLANG_PATH -j64
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a so-called shadow byte with every byte of kernel memory.
+A bit in the shadow byte is set iff the corresponding bit of the kernel memory
+byte is uninitialized.
+Marking the memory uninitialized (i.e. setting its shadow bytes to 0xff) is
+called poisoning; marking it initialized (setting the shadow bytes to 0x00) is
+called unpoisoning.
+
+When a new variable is allocated on the stack, it's poisoned by default by
+instrumentation code inserted by the compiler (unless it's a stack variable that
+is immediately initialized). Any new heap allocation done without ``__GFP_ZERO``
+is also poisoned.
+
+Compiler instrumentation also tracks the shadow values with help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length.
+When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+  int a = 0xff;
+  int b;
+  int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them.
+This origin describes the point in program execution at which the uninitialized
+value was created. Every origin is associated with a creation stack, which lets
+the user see where the uninitialized value came from.
+
+When an uninitialized variable is allocated on the stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs.
+If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+  int a = 0;
+  int b;
+  int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address if they are stored in the
+same four-byte chunk.
+In this case every write to any of them updates the origin for all of them.
+
+Example 2::
+
+  int combine(short a, short b) {
+    union ret_t {
+      int i;
+      short s[2];
+    } ret;
+    ret.s[0] = a;
+    ret.s[1] = b;
+    return ret.i;
+  }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+To ease debugging, KMSAN creates a new origin for every memory store.
+The new origin references both its creation stack and the previous origin the
+memory location had.
+This may cause increased memory consumption, so we limit the length of origin
+chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+The Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/kmsan_instr.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+  typedef struct {
+    void *s, *o;
+  } shadow_origin_ptr_t;
+
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, u64 size)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, u64 size)
+
+The function name depends on the memory access size.
+Each such function also checks if the shadow of the memory in the range
+[``addr``, ``addr + n``) is contiguous and reports an error otherwise.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory.
+When a value is stored to memory, its shadow and origin are also stored using
+the metadata pointers.
+
+Origin tracking
+~~~~~~~~~~~~~~~
+A special function is used to create a new origin value for a local variable
+and set the origin of that variable to that value::
+
+  void __msan_poison_alloca(u64 address, u64 size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+  kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+  struct kmsan_context_s {
+    char param_tls[KMSAN_PARAM_SIZE];
+    char retval_tls[RETVAL_SIZE];
+    char va_arg_tls[KMSAN_PARAM_SIZE];
+    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+    u64 va_arg_overflow_size_tls;
+    depot_stack_handle_t param_origin_tls[PARAM_ARRAY_SIZE];
+    depot_stack_handle_t retval_origin_tls;
+    depot_stack_handle_t origin_tls;
+  };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+the data::
+
+  void *__msan_memcpy(void *dst, void *src, u64 n)
+  void *__msan_memmove(void *dst, void *src, u64 n)
+  void *__msan_memset(void *dst, int c, size_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each pointer dereference and each condition the compiler emits a shadow
+check that calls ``__msan_warning()`` if a poisoned value is being
+used::
+
+  void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes the KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to the following
+function, which unpoisons the memory region::
+
+  void __msan_instrument_asm_store(u64 addr, u64 size)
+
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly don't point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A function can be marked with ``__no_sanitize_memory``.
+Doing so doesn't remove KMSAN instrumentation from it, but it makes the
+compiler ignore the uninitialized values coming from the function's inputs,
+and initialize the function's outputs.
+The compiler won't inline functions marked with this attribute into functions
+not marked with it, and vice versa.
+
+It's also possible to disable KMSAN for a single file (e.g. main.o)::
+
+  KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+  KMSAN_SANITIZE := n
+
+in the Makefile. This comes at a cost however: stack allocations from such files
+and parameters of instrumented functions called from them will have incorrect
+shadow/origin values. As a rule of thumb, avoid using KMSAN_SANITIZE.
+
+Runtime library
+---------------
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every ``task_struct`` has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag controlling KMSAN reporting::
+
+  struct kmsan_task_state {
+    ...
+    bool allow_reporting;
+    struct kmsan_context_state cstate;
+    ...
+  };
+
+  struct task_struct {
+    ...
+    struct kmsan_task_state kmsan;
+    ...
+  };
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+When the kernel is running in interrupt, softirq or NMI context, where
+``current`` is unavailable, KMSAN switches to per-CPU interrupt state::
+
+  DEFINE_PER_CPU(kmsan_context_state[KMSAN_NESTED_CONTEXT_MAX],
+                 kmsan_percpu_cstate);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+There are several kinds of kernel memory for which metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+  struct page {
+    ...
+    struct page *shadow, *origin;
+    ...
+  };
+
+Every time a ``struct page`` is allocated, the runtime library allocates two
+additional pages to hold its shadow and origins. This is done by adding hooks
+to ``alloc_pages()``/``free_pages()`` in ``mm/page_alloc.c``.
+To avoid allocating the metadata for non-interesting pages (right now only the
+shadow/origin pages themselves and stackdepot storage) the
+``__GFP_NO_KMSAN_SHADOW`` flag is used.
+
+There is a problem related to this allocation algorithm: when two contiguous
+memory blocks are allocated with two different ``alloc_pages()`` calls, their
+shadow pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+As a workaround, we check the access size in
+``__msan_metadata_ptr_for_XXX_YYY()`` and return a pointer to a fake shadow
+region in the case of an error::
+
+  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+Unfortunately at boot time we need to allocate shadow and origin pages for the
+kernel data (``.data``, ``.bss`` etc.) and percpu memory regions, the size of
+which is not a power of 2. As a result, we have to allocate the metadata page by
+page, so that it is also non-contiguous, although it may be perfectly valid to
+access the corresponding kernel memory across page boundaries.
+This can probably be fixed by allocating 1<<N pages at once, splitting them and
+deallocating the rest.
+
+The LSB of the ``shadow`` pointer in a ``struct page`` may be set to 1. In this
+case shadow and origin pages are allocated, but KMSAN ignores accesses to them
+by falling back to dummy pages. Allocating the metadata pages is still needed
+to support ``vmap()``/``vunmap()`` operations on this struct page.
+
+2. For vmalloc memory and modules, there's a direct mapping between the memory
+range, its shadow and origin. KMSAN shrinks the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+3. For the CPU entry area, separate per-CPU arrays hold its metadata::
+
+  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
+  DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);
+
+When calculating shadow and origin addresses for a given memory address, the
+runtime checks whether the address belongs to the physical page range, the
+virtual page range or the CPU entry area.
+
+Handling ``pt_regs``
+~~~~~~~~~~~~~~~~~~~~
+
+Many functions receive a ``struct pt_regs`` holding the register state at a
+certain point. Registers don't have (easily calculable) shadow or origin
+associated with them.
+We can assume that the registers are always initialized.
+
+Example report
+--------------
+Here's an example of a real KMSAN report in ``packet_bind_spkt()``::
+
+  ==================================================================
+  BUG: KMSAN: uninit-value in strlen
+  CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
+   0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
+   ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
+   0000000000000000 0000000000000092 00000000ec400911 0000000000000002
+  Call Trace:
+   [<     inline     >] __dump_stack lib/dump_stack.c:15
+   [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
+   [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
+   [<ffffffff818a783b>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
+   [<     inline     >] strlen lib/string.c:484
+   [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
+   [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132
+   [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
+   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
+   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
+  chained origin:
+   [<ffffffff810bb787>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
+   [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
+   [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
+   [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527
+   [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
+   [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
+   [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
+   [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
+  origin description: ----address@SYSC_bind (origin=00000000eb400911)
+  ==================================================================
+
+The report says that the local variable ``address`` was created uninitialized
+in ``SYSC_bind()`` (the ``bind`` system call implementation). The lower stack
+trace corresponds to the place where this variable was created.
+
+The upper stack shows where the uninit value was used - in ``strlen()``.
+It turned out that the contents of ``address`` were partially copied from
+userspace, but the buffer wasn't zero-terminated and contained some trailing
+uninitialized bytes.
+``packet_bind_spkt()`` didn't check the length of the buffer, but called
+``strlcpy()`` on it, which called ``strlen()``, which started reading the
+buffer byte by byte till it hit the uninitialized memory.
+
+
+References
+==========
+
+E. Stepanov, K. Serebryany. MemorySanitizer: fast detector of uninitialized
+memory use in C++.
+In Proceedings of CGO 2015.
