From patchwork Sat Nov 10 01:38:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 10676797 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AA838109C for ; Sat, 10 Nov 2018 01:36:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 962842E2AF for ; Sat, 10 Nov 2018 01:36:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 88CD52F050; Sat, 10 Nov 2018 01:36:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D16C2E2AF for ; Sat, 10 Nov 2018 01:36:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A073C6B075E; Fri, 9 Nov 2018 20:36:30 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 918DC6B075B; Fri, 9 Nov 2018 20:36:30 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7E36A6B075C; Fri, 9 Nov 2018 20:36:30 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 368986B0758 for ; Fri, 9 Nov 2018 20:36:30 -0500 (EST) Received: by mail-pg1-f197.google.com with SMTP id a2so2374288pgt.11 for ; Fri, 09 Nov 2018 17:36:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:mime-version:content-transfer-encoding; bh=KoKojsnFvXv8vIimsWjENQpqVj624zGTuJiQoSijGhE=; b=YWgzeu7hsPvrx/x3wSepWCXbRk8le2CX2DpiPeOJ3bWdW1YDimZ4Ux1ikalo39mAky 3Smvhy+caoTZdgtLv8A3Nk/ey2vxz17iF1Z4OOFqf4QVUneJd4zyPN/ovKh6ULESGyYH NBj7HqcNfvC9fZXSiTolFiIQuWUndFJ676BOFh/B10eydL+QEr//GKIeI2pYik3OIhqS hYNTHY0LDC/lsurldy/jvaj4VHxHc8i+BBrCactkPhTSDHUvn7vzh4HDyUhFgKBSfvUe zFHUJLU+KkwGelF8ilfQfCOZ4Zc1S8cQ/W7tPmIaShC8YrIeYTNQUmnryPSWbpbKQjaq UvVA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AGRZ1gJBulkI87XwzXPJwEaYot0doT7QGI+aGJSw9GvN5fK6rokcXlJm pP1dtSaPX/ot+q/Vy10oLFzPN20O2N6kQR54ddXFaNSHQBgHwcrR6m58vnIxe5uyQ/P6zp62l61 I4Y9TuBWXIlmnR8ZUuPvAqICYtneuecihE2DGWFBjghTM+IDGRjg7lV8Ib0fOkHOpcA== X-Received: by 2002:a17:902:8347:: with SMTP id z7-v6mr11139158pln.340.1541813789825; Fri, 09 Nov 2018 17:36:29 -0800 (PST) X-Google-Smtp-Source: AJdET5eDrR+Em7sISGI4tz2FcO3kV+kG0GcApeF2MQtavnx6moCphVF/w5DTNXpXCs497hN/9uBY X-Received: by 2002:a17:902:8347:: with SMTP id z7-v6mr11139080pln.340.1541813788340; Fri, 09 Nov 2018 17:36:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541813788; cv=none; d=google.com; s=arc-20160816; b=w6hGVjq4+VNBxQLfgJCI+nd3B97OsulcRTeSuiwSyd9sjEJzycFanEGgRvQi3+vLBI IFAbaVmD64gDDa73xvuFhdebl0KMLHARGCWANXc2HIg6Zom4SZKUYEuT8KopKiGlcMBU GfL+GhQGt3HyeO8U12Hr4KlSZXWVE3A/YBuWr7vfpKL/3og61zHn2Izow4nKPr4pEmH+ TtUGlAn21BMWfCjiGPNzv56hZTgJxLTu9TENi/T6VuSuU9ZwrLOE0E9V4iF2eqyR6n/m 02+VT2gbKmSWhgNBTMID7UiM/rgg4Abi5HtHc3Do1vMUPnp9NIn+PtI9kEROkrKgv408 CdvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from; bh=KoKojsnFvXv8vIimsWjENQpqVj624zGTuJiQoSijGhE=; b=E8nLVsQS50G0eUgI1uF8/Cb6K1+q1/uvTDRIBSacSvYpgGbmav5DxSS2hIoBCaApG/ JOT0pxhErb089SCJ6W1mfnV8gEqHe7bPKr/2Qdkkw9hwwzxccNrNrOYkwKEgXe/lBqVZ cSUo8u1qD9scxAbzZLIXWRtfmLzEv8H3+WOeSc7/NKCWOi1wAqAMa7Og/4Jcd8CQtbz/ RFd9YqHk+OweIuqX9USZ5dlVG0wKgPnU05A2g3l7nLTzxYOMmttbmCu1Wh72GLrY8gtT LuEMDMsJO2OlSv2ya1TKPOt0wyKB+MfLI8qs/Hyt8IiWoeCvWI9AOjEQFUHIh4pJKolT vsIA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga03.intel.com (mga03.intel.com. [134.134.136.65]) by mx.google.com with ESMTPS id n32si8099080pgm.439.2018.11.09.17.36.28 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 09 Nov 2018 17:36:28 -0800 (PST) Received-SPF: pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) client-ip=134.134.136.65; Authentication-Results: mx.google.com; spf=pass (google.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Nov 2018 17:36:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,485,1534834800"; d="scan'208";a="105025657" Received: from rpedgeco-desk5.jf.intel.com ([10.54.75.168]) by fmsmga004.fm.intel.com with ESMTP; 09 Nov 2018 17:36:26 -0800 From: Rick Edgecombe To: akpm@linux-foundation.org, willy@infradead.org, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, daniel@iogearbox.net, jannh@google.com, keescook@chromium.org Cc: kristen@linux.intel.com, dave.hansen@intel.com, arjan@linux.intel.com, Rick Edgecombe Subject: [PATCH v9 0/4] KASLR feature to randomize each loadable module Date: Fri, 9 Nov 2018 17:38:03 -0800 Message-Id: <20181110013807.24903-1-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP This is V9 of the "KASLR feature to randomize each loadable module" patchset. The purpose is to increase the randomization for the module space from 10 to 17 bits, and also to make the modules randomized in relation to each other instead of just the address where the allocations begin, so that if one module leaks the location of the others can't be inferred. Why its useful ============== Randomizing the location of executable code is a defense against control flow attacks, where the kernel is tricked into jumping, or speculatively executing code other than what is intended. By randomizing the location of the code, the attacker doesn't know where to redirect the control flow. Today the RANDOMIZE_BASE feature randomizes the base address where the module allocations begin with 10 bits of entropy for this purpose. From here, a highly deterministic algorithm allocates space for the modules as they are loaded and unloaded. If an attacker can predict the order and identities for modules that will be loaded (either by the system, or controlled by the user with request_module or BPF), then a single text address leak can give the attacker access to the locations of other modules. So in this case this new algorithm can take the entropy of the other modules from ~0 to 17, making it much more robust. Another problem today is that the low 10 bits of entropy makes brute force attacks feasible, especially in the case of speculative execution where a wrong guess won't necessarily cause a crash. In this case, increasing the randomization will force attacks to take longer, and so increase the time an attacker may be detected on a system. In the past KASLR has been considered mostly a remote defense, due to available methods of de-randomizing the kernel text locally, but previous easier local de-randomizing methods have been blocked by KPTI. There are multiple efforts to apply more randomization to the core kernel text as well, and so this module space piece can be a first step to increasing randomization for all kernel space executable code. Userspace ASLR can get 28 bits of entropy or more, so at least increasing this to 17 for now improves what is currently a pretty low amount of randomization for the higher privileged kernel space. How it works ============ The algorithm is pretty simple. It just breaks the module space in two, a random area (2/3 of module space) and a backup area (1/3 of module space). It first tries to allocate up to 10000 randomly located starting pages inside the random section. If this fails, it will allocate in the backup area. The backup area base will be offset in the same way as current algorithm does for the base area, which has 10 bits of entropy. The vmalloc allocator can be used to try an allocation at a specific address, however it is usually used to try an allocation over a large address range, and so some behaviors which are non-issues in normal usage can be be sub-optimal when trying the an allocation at 10000 small ranges. So this patch also includes a new vmalloc function __vmalloc_node_try_addr and some other vmalloc tweaks that allow for more efficient trying of addresses. This algorithm targets maintaining high entropy for many 1000's of module allocations. This is because there are other users of the module space besides kernel modules, like eBPF JIT, classic BPF socket filter JIT and kprobes. Performance =========== Simulations were run using module sizes derived from the x86_64 modules to measure the allocation performance at various levels of fragmentation and whether the backup area was used. Capacity -------- There is a slight reduction in the capacity of modules as simulated by the x86_64 module sizes of <1000. Note this is a worst case, since in practice module allocations in the 1000's will consist of smaller BPF JIT allocations or kprobes which would fit better in the random area. Allocation time --------------- Below are three sets of measurements in ns of the allocation time as measured by the included kselftests. The first two columns are this new algorithm with and with out the vmalloc optimizations for trying random addresses quickly. They are included for consideration of whether the changes are worth it. The last column is the performance of the original algorithm. Modules Vmalloc optimization No Vmalloc Optimization Existing Module KASLR 1000 1433 1993 3821 2000 2295 3681 7830 3000 4424 7450 13012 4000 7746 13824 18106 5000 12721 21852 22572 6000 19724 33926 26443 7000 27638 47427 30473 8000 37745 64443 34200 These allocations are not taking very long, but it may show up on systems with very high usage of the module space (BPF JITs). If the trade-off of touching vmalloc doesn't seem worth it to people, I can remove the optimizations. Randomness ---------- Unlike the existing algorithm, the amount of randomness provided has a dependency on the number of modules allocated and the sizes of the modules text sections. The entropy provided for the Nth allocation will come from three sources of randomness, the range of addresses for the random area, the probability the section will be allocated in the backup area and randomness from the number of modules already allocated in the backup area. For computing a lower bound entropy in the following calculations, the randomness of the modules already in the backup area, or overlapping from the random area, is ignored since it is usually small and will only increase the entropy. Below is an attempt to compute a worst case value for entropy to compare to the existing algorithm. For probability of the Nth allocation being in the backup area, p, a lower bound entropy estimate is calculated here as: Random Area Slots = ((2/3)*1073741824)/4096 = 174762 Entropy = -( (1-p)*log2((1-p)/174762) + p*log2(p/1024) ) For >8000 modules the entropy remains above 17.3. For non-speculative control flow attacks, an attack might crash the system. So the probability of the first guess being right can be more important than the Nth guess. KASLR schemes usually have equal probability for each possible position, but in this scheme that is not the case. So a more conservative comparison to existing schemes is the amount of information that would have to be guessed correctly for the position that has the highest probability for having the Nth module allocated (as that would be the attackers best guess): Min Info = MIN(-log2(p/1024), -log2((1-p)/174762)) Allocations Entropy 1000 17.4 2000 17.4 3000 17.4 4000 16.8 5000 15.8 6000 14.9 7000 14.8 8000 14.2 If anyone is keeping track, these numbers are different than as reported in V2, because they are generated using the more compact allocation size heuristic that is included in the kselftest rather than the real much larger dataset. The heuristic generates randomization benchmarks that are slightly slower than the real dataset. The real dataset also isn't representative of the case of mostly smaller BPF filters, so it represents a worst case lower bound for entropy and in practice 17+ bits should be maintained to much higher number of modules. PTE usage --------- Since the allocations are spread out over a wider address space, there is increased PTE usage which should not exceed 1.3MB more than the old algorithm. Changes for V9: - Better explanations in commit messages, instructions in kselftests (Andrew Morton) Changes for V8: - Simplify code by removing logic for optimum handling of lazy free areas Changes for V7: - More 0-day build fixes, readability improvements (Kees Cook) Changes for V6: - 0-day build fixes by removing un-needed functional testing, more error handling Changes for V5: - Add module_alloc test module Changes for V4: - Fix issue caused by KASAN, kmemleak being provided different allocation lengths (padding). - Avoid kmalloc until sure its needed in __vmalloc_node_try_addr. - Fixed issues reported by 0-day. Changes for V3: - Code cleanup based on internal feedback. (thanks to Dave Hansen and Andriy Shevchenko) - Slight refactor of existing algorithm to more cleanly live along side new one. - BPF synthetic benchmark Changes for V2: - New implementation of __vmalloc_node_try_addr based on the __vmalloc_node_range implementation, that only flushes TLB when needed. - Modified module loading algorithm to try to reduce the TLB flushes further. - Increase "random area" tries in order to increase the number of modules that can get high randomness. - Increase "random area" size to 2/3 of module area in order to increase the number of modules that can get high randomness. - Fix for 0day failures on other architectures. - Fix for wrong debugfs permissions. (thanks to Jann Horn) - Spelling fix. (thanks to Jann Horn) - Data on module_alloc performance and TLB flushes. (brought up by Kees Cook and Jann Horn) - Data on memory usage. (suggested by Jann) Rick Edgecombe (4): vmalloc: Add __vmalloc_node_try_addr function x86/modules: Increase randomization for modules vmalloc: Add debugfs modfraginfo Kselftest for module text allocation benchmarking arch/x86/Kconfig | 3 + arch/x86/include/asm/kaslr_modules.h | 38 ++ arch/x86/include/asm/pgtable_64_types.h | 7 + arch/x86/kernel/module.c | 111 ++++-- include/linux/vmalloc.h | 3 + lib/Kconfig.debug | 9 + lib/Makefile | 1 + lib/test_mod_alloc.c | 375 ++++++++++++++++++ mm/vmalloc.c | 228 +++++++++-- tools/testing/selftests/bpf/test_mod_alloc.sh | 29 ++ 10 files changed, 743 insertions(+), 61 deletions(-) create mode 100644 arch/x86/include/asm/kaslr_modules.h create mode 100644 lib/test_mod_alloc.c create mode 100755 tools/testing/selftests/bpf/test_mod_alloc.sh