From patchwork Wed Oct 30 14:22:17 2019
X-Patchwork-Submitter: Alexander Potapenko
X-Patchwork-Id: 11219615
Date: Wed, 30 Oct 2019 15:22:17 +0100
In-Reply-To: <20191030142237.249532-1-glider@google.com>
Message-Id: <20191030142237.249532-6-glider@google.com>
Subject: [PATCH RFC v2 05/25] kmsan: add ReST documentation
From: glider@google.com
To: Vegard Nossum, Dmitry Vyukov, linux-mm@kvack.org
Cc: viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
 aryabinin@virtuozzo.com, luto@kernel.org, ard.biesheuvel@linaro.org,
 arnd@arndb.de, hch@lst.de, dmitry.torokhov@gmail.com, edumazet@google.com,
 ericvh@gmail.com, gregkh@linuxfoundation.org, harry.wentland@amd.com,
 herbert@gondor.apana.org.au, mingo@elte.hu, axboe@kernel.dk,
 martin.petersen@oracle.com, schwidefsky@de.ibm.com, mst@redhat.com,
 monstr@monstr.eu, pmladek@suse.com, sergey.senozhatsky@gmail.com,
 rostedt@goodmis.org, tiwai@suse.com, tytso@mit.edu, tglx@linutronix.de,
 wsa@the-dreams.de, gor@linux.ibm.com, iii@linux.ibm.com,
 mark.rutland@arm.com, willy@infradead.org, rdunlap@infradead.org,
 andreyknvl@google.com, elver@google.com, Alexander Potapenko

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools index.

Signed-off-by: Alexander Potapenko
To: Alexander Potapenko
Cc: Vegard Nossum
Cc: Dmitry Vyukov
Cc: linux-mm@kvack.org
---
Change-Id: Iac6345065e6804ef811f1124fdf779c67ff1530e
---
 Documentation/dev-tools/index.rst |   1 +
 Documentation/dev-tools/kmsan.rst | 418 ++++++++++++++++++++++++++++++
 2 files changed, 419 insertions(+)
 create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index b0522a4dd107..bc5e3fd87efa 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -19,6 +19,7 @@ whole; patches welcome!
    kcov
    gcov
    kasan
+   kmsan
    ubsan
    kmemleak
    gdb-kernel-debugging
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 000000000000..51f9c207cc2c
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,418 @@
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic memory error detector aimed at finding uses of uninitialized
+memory.
+It is based on compiler instrumentation, and is quite similar to the userspace
+MemorySanitizer tool (http://clang.llvm.org/docs/MemorySanitizer.html).
+
+KMSAN and Clang
+===============
+
+In order for KMSAN to work the kernel must be
+built with Clang, which is so far the only compiler that has KMSAN support.
+The kernel instrumentation pass is based on the userspace MemorySanitizer tool
+(http://clang.llvm.org/docs/MemorySanitizer.html). Because of the
+instrumentation complexity it's unlikely that any other compiler will support
+KMSAN soon.
+
+Right now the instrumentation pass supports x86_64 only.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you'll need a fresh Clang (10.0.0+, trunk
+version r365008 or greater). Please refer to
+https://llvm.org/docs/GettingStarted.html for the instructions on how to build
+Clang::
+
+  export KMSAN_CLANG_PATH=/path/to/clang
+  # Now configure and build the kernel with CONFIG_KMSAN enabled.
+  make CC=$KMSAN_CLANG_PATH -j64
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a so-called shadow byte with every byte of kernel memory.
+A bit in the shadow byte is set iff the corresponding bit of the kernel memory
+byte is uninitialized.
+Marking the memory uninitialized (i.e. setting its shadow bytes to 0xff) is
+called poisoning, marking it initialized (setting the shadow bytes to 0x00) is
+called unpoisoning.
+
+When a new variable is allocated on the stack, it's poisoned by default by
+instrumentation code inserted by the compiler (unless it's a stack variable that
+is immediately initialized). Any new heap allocation done without ``__GFP_ZERO``
+is also poisoned.
+
+Compiler instrumentation also tracks the shadow values with the help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length.
+When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+  int a = 0xff;
+  int b;
+  int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
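The ``c = a | b`` case above can be reproduced with a small userspace sketch. It assumes the bitwise-OR propagation rule of the userspace MemorySanitizer pass: a result bit counts as initialized when both input bits are initialized, or when either input holds an initialized 1 in that position. ``shadow_or()`` and ``shadow_or_example()`` are hypothetical names used for illustration only; they are not part of the KMSAN runtime.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: shadow of (va | vb), given both values and their shadows.
 * A set shadow bit means "uninitialized". A result bit is poisoned unless both
 * inputs are initialized there, or one input holds an initialized 1.
 */
static uint32_t shadow_or(uint32_t va, uint32_t sa, uint32_t vb, uint32_t sb)
{
    return (sa & sb) | (~va & sb) | (sa & ~vb);
}

/* Reproduces the documented example: a = 0xff (initialized), b uninitialized. */
static uint32_t shadow_or_example(void)
{
    uint32_t a = 0xff, shadow_a = 0x00000000;
    uint32_t b = 0, shadow_b = 0xffffffff; /* b's value does not matter here */
    uint32_t shadow_c = shadow_or(a, shadow_a, b, shadow_b);

    /* Lower byte is known to be 0xff, upper three bytes depend on b. */
    assert(shadow_c == 0xffffff00);
    return shadow_c;
}
```

Calling ``shadow_or_example()`` from a test program yields ``0xffffff00``, the shadow of ``c`` given in the text: the initialized ones in the low byte of ``a`` fix the low byte of the result regardless of ``b``.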
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them.
+This origin describes the point in program execution at which the uninitialized
+value was created. Every origin is associated with a creation stack, which lets
+the user figure out what's going on.
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs.
+If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+  int a = 0;
+  int b;
+  int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk.
+In this case every write to either variable updates the origin for all of them.
+
+Example 2::
+
+  int combine(short a, short b) {
+    union ret_t {
+      int i;
+      short s[2];
+    } ret;
+    ret.s[0] = a;
+    ret.s[1] = b;
+    return ret.i;
+  }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+To ease the debugging, KMSAN creates a new origin for every memory store.
+The new origin references both its creation stack and the previous origin the
+memory location had.
+This may cause increased memory consumption, so we limit the length of origin
+chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/kmsan_instr.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+  typedef struct {
+    void *s, *o;
+  } shadow_origin_ptr_t
+
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, u64 size)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, u64 size)
+
+The function name depends on the memory access size.
+Each such function also checks if the shadow of the memory in the range
+[``addr``, ``addr + n``) is contiguous and reports an error otherwise.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory.
+When a value is stored to memory, its shadow and origin are also stored using
+the metadata pointers.
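As a rough userspace model of the scheme described above (one shadow byte per data byte, one origin slot per four data bytes, and a helper that returns both metadata pointers for an address), consider the sketch below. Everything in it (``toy_mem``, ``toy_metadata_ptr()``, ``toy_store2()``, ``toy_load4()``, ``toy_combine_demo()`` and the origin value ``0x1234``) is hypothetical and only mimics the shape of ``__msan_metadata_ptr_for_{load,store}_N()``; the real runtime locates the metadata differently and also validates the access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_SIZE 64

/* Toy "kernel memory" with parallel shadow and origin storage. */
static uint8_t toy_mem[TOY_MEM_SIZE];
static uint8_t toy_shadow[TOY_MEM_SIZE];      /* one shadow byte per data byte */
static uint32_t toy_origin[TOY_MEM_SIZE / 4]; /* one origin per 4 data bytes */

typedef struct {
    void *s, *o;
} toy_shadow_origin_ptr_t;

/* Hypothetical stand-in for __msan_metadata_ptr_for_{load,store}_N(). */
static toy_shadow_origin_ptr_t toy_metadata_ptr(void *addr)
{
    size_t off = (size_t)((uint8_t *)addr - toy_mem);
    toy_shadow_origin_ptr_t ret = { &toy_shadow[off], &toy_origin[off / 4] };

    return ret;
}

/* Instrumented 2-byte store: write data, shadow and (if poisoned) origin. */
static void toy_store2(void *addr, uint16_t val, uint16_t shadow, uint32_t origin)
{
    toy_shadow_origin_ptr_t m = toy_metadata_ptr(addr);

    memcpy(addr, &val, sizeof(val));
    memcpy(m.s, &shadow, sizeof(shadow));
    if (shadow)
        *(uint32_t *)m.o = origin;
}

/* Instrumented 4-byte load: fetch the data together with shadow and origin. */
static uint32_t toy_load4(void *addr, uint32_t *shadow, uint32_t *origin)
{
    toy_shadow_origin_ptr_t m = toy_metadata_ptr(addr);
    uint32_t val;

    memcpy(&val, addr, sizeof(val));
    memcpy(shadow, m.s, sizeof(*shadow));
    *origin = *(uint32_t *)m.o;
    return val;
}

/* Mirrors combine(a, b) with an initialized a and an uninitialized b. */
static uint32_t toy_combine_demo(uint32_t *shadow, uint32_t *origin)
{
    toy_store2(&toy_mem[0], 1, 0x0000, 0);         /* ret.s[0] = a */
    toy_store2(&toy_mem[2], 2, 0xffff, 0x1234);    /* ret.s[1] = b, made-up origin */
    return toy_load4(&toy_mem[0], shadow, origin); /* return ret.i */
}
```

On a little-endian machine such as x86_64 the shadow loaded by ``toy_combine_demo()`` is ``0xffff0000`` and the origin is the one stored for ``b``, which matches Example 2 in the origin tracking section: both halves share one four-byte origin slot, and only the poisoned store updates it.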
+
+Origin tracking
+~~~~~~~~~~~~~~~
+A special function is used to create a new origin value for a local variable
+and set the origin of that variable to that value::
+
+  void __msan_poison_alloca(u64 address, u64 size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+  kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+  struct kmsan_context_s {
+    char param_tls[KMSAN_PARAM_SIZE];
+    char retval_tls[RETVAL_SIZE];
+    char va_arg_tls[KMSAN_PARAM_SIZE];
+    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+    u64 va_arg_overflow_size_tls;
+    depot_stack_handle_t param_origin_tls[PARAM_ARRAY_SIZE];
+    depot_stack_handle_t retval_origin_tls;
+    depot_stack_handle_t origin_tls;
+  };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+the data::
+
+  void *__msan_memcpy(void *dst, void *src, u64 n)
+  void *__msan_memmove(void *dst, void *src, u64 n)
+  void *__msan_memset(void *dst, int c, size_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each pointer dereference and each condition the compiler emits a shadow
+check that calls ``__msan_warning()`` in the case a poisoned value is being
+used::
+
+  void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes the KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+  void __msan_instrument_asm_store(u64 addr, u64 size)
+
+, which unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly don't point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A function can be marked with ``__no_sanitize_memory``.
+Doing so doesn't remove KMSAN instrumentation from it, however it makes the
+compiler ignore the uninitialized values coming from the function's inputs,
+and initialize the function's outputs.
+The compiler won't inline functions marked with this attribute into functions
+not marked with it, and vice versa.
+
+It's also possible to disable KMSAN for a single file (e.g. main.o)::
+
+  KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+  KMSAN_SANITIZE := n
+
+in the Makefile. This comes at a cost however: stack allocations from such files
+and parameters of instrumented functions called from them will have incorrect
+shadow/origin values. As a rule of thumb, avoid using KMSAN_SANITIZE.
+
+Runtime library
+---------------
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+  struct kmsan_task_state {
+    ...
+    bool allow_reporting;
+    struct kmsan_context_state cstate;
+    ...
+  }
+
+  struct task_struct {
+    ...
+    struct kmsan_task_state kmsan;
+    ...
+  }
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+But in the case the kernel is running in the interrupt, softirq or NMI context,
+where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
+
+  DEFINE_PER_CPU(kmsan_context_state[KMSAN_NESTED_CONTEXT_MAX],
+                 kmsan_percpu_cstate);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+  struct page {
+    ...
+    struct page *shadow, *origin;
+    ...
+  };
+
+Every time a ``struct page`` is allocated, the runtime library allocates two
+additional pages to hold its shadow and origins. This is done by adding hooks
+to ``alloc_pages()``/``free_pages()`` in ``mm/page_alloc.c``.
+To avoid allocating the metadata for non-interesting pages (right now only the
+shadow/origin page themselves and stackdepot storage) the
+``__GFP_NO_KMSAN_SHADOW`` flag is used.
+
+There is a problem related to this allocation algorithm: when two contiguous
+memory blocks are allocated with two different ``alloc_pages()`` calls, their
+shadow pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+As a workaround, we check the access size in
+``__msan_metadata_ptr_for_XXX_YYY()`` and return a pointer to a fake shadow
+region in the case of an error::
+
+  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+Unfortunately at boot time we need to allocate shadow and origin pages for the
+kernel data (``.data``, ``.bss`` etc.) and percpu memory regions, the size of
+which is not a power of 2. As a result, we have to allocate the metadata page by
+page, so that it is also non-contiguous, although it may be perfectly valid to
+access the corresponding kernel memory across page boundaries.
+This can be probably fixed by allocating 1<] __dump_stack lib/dump_stack.c:15
+   [] dump_stack+0x238/0x290 lib/dump_stack.c:51
+   [] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
+   [] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
+   [< inline >] strlen lib/string.c:484
+   [] strlcpy+0x9d/0x200 lib/string.c:144
+   [] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132
+   [] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
+   [] SyS_bind+0x82/0xa0 net/socket.c:1356
+   [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
+  chained origin:
+   [] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
+   [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
+   [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334
+   [] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527
+   [] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
+   [] SYSC_bind+0x129/0x5f0 net/socket.c:1356
+   [] SyS_bind+0x82/0xa0 net/socket.c:1356
+   [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:?
+  origin description: ----address@SYSC_bind (origin=00000000eb400911)
+  ==================================================================
+
+The report tells that the local variable ``address`` was created uninitialized
+in ``SYSC_bind()`` (the ``bind`` system call implementation). The lower stack
+trace corresponds to the place where this variable was created.
+
+The upper stack shows where the uninit value was used - in ``strlen()``.
+It turned out that the contents of ``address`` were partially copied from the
+userspace, but the buffer wasn't zero-terminated and contained some trailing
+uninitialized bytes.
+``packet_bind_spkt()`` didn't check the length of the buffer, but called
+``strlcpy()`` on it, which called ``strlen()``, which started reading the
+buffer byte by byte till it hit the uninitialized memory.
+
+
+References
+==========
+
+E. Stepanov, K. Serebryany. MemorySanitizer: fast detector of uninitialized
+memory use in C++.
+In Proceedings of CGO 2015.