From patchwork Thu Oct 11 01:36:47 2018
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10635619
Subject: [PATCH v4 1/3] mm: Shuffle initial free memory
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Kees Cook, Dave Hansen, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, keescook@chromium.org
Date: Wed, 10 Oct 2018 18:36:47 -0700
Message-ID: <153922180696.838512.12621709717839260874.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

Some data exfiltration and return-oriented-programming attacks rely on
the ability to infer the location of sensitive data objects. The kernel
page allocator, especially early in system boot, has predictable
first-in-first-out behavior for physical pages. Pages are freed in
physical address order when first onlined.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to
perform a Fisher-Yates shuffle of the page allocator 'free_area' lists
when they are initially populated with free memory at boot and at
hotplug time.

Quoting Kees:
    "While we already have a base-address randomization
     (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
     memory layouts would certainly be using the predictability of
     allocation ordering (i.e. for attacks where the base address isn't
     important: only the relative positions between allocated memory).
     This is common in lots of heap-style attacks. They try to gain
     control over ordering by spraying allocations, etc.

     I'd really like to see this because it gives us something similar
     to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

Another motivation for this change is performance in the presence of a
memory-side cache. In the future, memory-side-cache technology will be
available on generally available server platforms.
The proposed randomization approach has been measured to improve the
cache conflict rate by a factor of 2.5X on a well-known Java benchmark.
It avoids performance peaks and valleys to provide more predictable
performance.

While SLAB_FREELIST_RANDOM reduces the predictability of some local
slab caches, it leaves the vast bulk of memory to be allocated
predictably in order. That ordering can be detected by a memory-side
cache.

The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e.
10 (4MB with 4KB pages). This trades off randomization granularity for
time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive
to the page allocator while still showing memory-side cache behavior
improvements, and with the expectation that the security implications
of finer granularity randomization are mitigated by
CONFIG_SLAB_FREELIST_RANDOM.

The performance impact of the shuffling appears to be in the noise
compared to other memory initialization work. Also, the bulk of the
work is done in the background as a part of deferred_init_memmap().

This initial randomization can be undone over time, so a follow-on
patch is introduced to inject entropy on page free decisions. It is
reasonable to ask whether the page free entropy alone is sufficient. It
is not, due to the in-order initial freeing of pages. At the start of
that process, putting page1 in front of or behind page0 still keeps
them close together, and page2 is still near page1 and has a high
chance of being adjacent. As more pages are added, ordering diversity
improves, but there is still high page locality for the low address
pages, and this leads to no significant impact on the cache conflict
rate.
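
For illustration only, the shuffle described above can be sketched in
userspace C as a textbook Fisher-Yates pass over an array of pfns. This
is not the code in mm/shuffle.c: the patch swaps entries on the
free_area lists and picks each partner from the whole zone span, whereas
this sketch shrinks the candidate range on every step. The name
shuffle_pfns() and the use of rand() are assumptions made for the
example. Like the patch, it does not correct for modulo bias.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch: Fisher-Yates shuffle of an array of
 * pfns. Walk from the last entry down and swap each entry with a
 * randomly chosen entry at or before it.
 */
static void shuffle_pfns(unsigned long *pfns, size_t n)
{
	size_t i;

	if (n < 2)
		return;

	for (i = n - 1; i > 0; i--) {
		/* pick j in [0, i]; modulo bias is deliberately ignored */
		size_t j = (size_t)rand() % (i + 1);
		unsigned long tmp = pfns[i];

		pfns[i] = pfns[j];
		pfns[j] = tmp;
	}
}
```

A caller would fill the array with order-aligned pfns in physical
address order (e.g. pfns[i] = i << 10 for order-10 blocks), seed the
generator, and call shuffle_pfns(). The result is a permutation of the
input, so no block is lost or duplicated.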

Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/list.h   |   17 +++++
 include/linux/mm.h     |   17 +++++
 include/linux/mmzone.h |    4 +
 init/Kconfig           |   32 +++++++++
 mm/Makefile            |    1
 mm/memblock.c          |    9 ++-
 mm/memory_hotplug.c    |    2 +
 mm/page_alloc.c        |    2 +
 mm/shuffle.c           |  170 ++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100644 mm/shuffle.c

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..43f963328d7c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -150,6 +150,23 @@ static inline void list_replace_init(struct list_head *old,
 	INIT_LIST_HEAD(old);
 }
 
+/**
+ * list_swap - replace entry1 with entry2 and re-add entry1 at entry2's position
+ * @entry1: the location to place entry2
+ * @entry2: the location to place entry1
+ */
+static inline void list_swap(struct list_head *entry1,
+			     struct list_head *entry2)
+{
+	struct list_head *pos = entry2->prev;
+
+	list_del(entry2);
+	list_replace(entry1, entry2);
+	if (pos == entry1)
+		pos = entry2;
+	list_add(entry1, pos);
+}
+
 /**
  * list_del_init - deletes entry from list and reinitialize it.
  * @entry: the element to delete from the list.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 273d4dbd3883..5891bd4e5d29 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2043,6 +2043,23 @@ extern void mem_init_print_info(const char *str);
 
 extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
 
+#ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
+extern void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
+		unsigned long end_pfn);
+extern void shuffle_zone(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn);
+#else
+static inline void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+}
+
+static inline void shuffle_zone(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+}
+#endif
+
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ea29f7081f9d..15029fedbfe6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1273,6 +1273,10 @@ void sparse_init(void);
 #else
 #define sparse_init() do {} while (0)
 #define sparse_index_init(_sec, _nid) do {} while (0)
+static inline int pfn_present(unsigned long pfn)
+{
+	return 1;
+}
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/init/Kconfig b/init/Kconfig
index 1f1bbf7540f8..64123c28eeca 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1701,6 +1701,38 @@ config SLAB_FREELIST_HARDENED
 	  sacrifies to harden the kernel slab allocator against common
 	  freelist exploit methods.
 
+config SHUFFLE_PAGE_ALLOCATOR
+	bool "Page allocator randomization"
+	default SLAB_FREELIST_RANDOM
+	help
+	  Randomization of the page allocator is both a security feature
+	  and a performance feature on platforms that have a
+	  direct-mapped memory side cache. See section 5.2.27
+	  Heterogeneous Memory Attribute Table (HMAT) in the ACPI 6.2a
+	  specification for an example of how a platform advertises the
+	  presence of a memory side cache. For the security
+	  benefits of this capability it is expected to be paired with
+	  SLAB_FREELIST_RANDOM as that adds local randomization for
+	  small objects while SHUFFLE_PAGE_ALLOCATOR adds randomization at
+	  SHUFFLE_PAGE_ORDER granularities. The runtime impact of the
+	  shuffling is negligible. The performance implications of not
+	  shuffling are significant on platforms with a direct-mapped
+	  memory-side cache.
+
+	  Say Y if unsure.
+
+config SHUFFLE_PAGE_ORDER
+	depends on SHUFFLE_PAGE_ALLOCATOR
+	int "Page allocator shuffle order"
+	range 0 10
+	default 10
+	help
+	  Specify the granularity at which shuffling (randomization) is
+	  performed. By default this is set to MAX_ORDER-1 to minimize
+	  runtime impact of randomization and with the expectation that
+	  SLAB_FREELIST_RANDOM mitigates heap attacks on smaller
+	  object granularities.
+
 config SLUB_CPU_PARTIAL
 	default y
 	depends on SLUB && SMP
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..1ffbc67f7395 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_KASAN) += kasan/
 obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
+obj-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o
 obj-$(CONFIG_MEMTEST) += memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
diff --git a/mm/memblock.c b/mm/memblock.c
index b0ebca546ba1..5b57964352a4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1985,9 +1985,16 @@ static unsigned long __init free_low_memory_core_early(void)
 	 * low ram will be on Node1
 	 */
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
-				NULL)
+				NULL) {
+		pg_data_t *pgdat;
+
 		count += __free_memory_core(start, end);
 
+		for_each_online_pgdat(pgdat)
+			shuffle_free_memory(pgdat, PHYS_PFN(start),
+					PHYS_PFN(end));
+	}
+
 	return count;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..34c9b6eb3159 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -894,6 +894,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
+	shuffle_zone(zone, pfn, zone_end_pfn(zone));
+
 	if (onlined_pages) {
 		node_states_set_node(nid, &arg);
 		if (need_zonelists_rebuild)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a02ce11c49f2..9b295b2287da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1595,6 +1595,8 @@ static int __init deferred_init_memmap(void *data)
 	}
 	pgdat_resize_unlock(pgdat, &flags);
 
+	shuffle_zone(zone, first_init_pfn, zone_end_pfn(zone));
+
 	/* Sanity check that the next zone really is unpopulated */
 	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
 
diff --git a/mm/shuffle.c b/mm/shuffle.c
new file mode 100644
index 000000000000..5ed91b5b8441
--- /dev/null
+++ b/mm/shuffle.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+/*
+ * For two pages to be swapped in the shuffle, they must be free (on a
+ * 'free_area' lru), have the same order, and have the same migratetype.
+ */
+static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+{
+	struct page *page;
+
+	/*
+	 * Given we're dealing with randomly selected pfns in a zone we
+	 * need to ask questions like...
+	 */
+
+	/* ...is the pfn even in the memmap? */
+	if (!pfn_valid_within(pfn))
+		return NULL;
+
+	/* ...is the pfn in a present section or a hole? */
+	if (!pfn_present(pfn))
+		return NULL;
+
+	/* ...is the page free and currently on a free_area list? */
+	page = pfn_to_page(pfn);
+	if (!PageBuddy(page))
+		return NULL;
+
+	/*
+	 * ...is the page on the same list as the page we will
+	 * shuffle it with?
+	 */
+	if (page_order(page) != order)
+		return NULL;
+
+	return page;
+}
+
+/*
+ * Fisher-Yates shuffle the freelist which prescribes iterating through
+ * an array, pfns in this case, and randomly swapping each entry with
+ * another in the span, end_pfn - start_pfn.
+ *
+ * To keep the implementation simple it does not attempt to correct for
+ * sources of bias in the distribution, like modulo bias or
+ * pseudo-random number generator bias. I.e. the expectation is that
+ * this shuffling raises the bar for attacks that exploit the
+ * predictability of page allocations, but need not be a perfect
+ * shuffle.
+ *
+ * Note that we don't use @z->zone_start_pfn and zone_end_pfn(@z)
+ * directly since the caller may be aware of holes in the zone and can
+ * improve the accuracy of the random pfn selection.
+ */
+#define SHUFFLE_RETRY 10
+static void __meminit shuffle_zone_order(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn, const int order)
+{
+	unsigned long i, flags;
+	const int order_pages = 1 << order;
+
+	if (start_pfn < z->zone_start_pfn)
+		start_pfn = z->zone_start_pfn;
+	if (end_pfn > zone_end_pfn(z))
+		end_pfn = zone_end_pfn(z);
+
+	/* probably means that start/end were outside the zone */
+	if (end_pfn <= start_pfn)
+		return;
+	spin_lock_irqsave(&z->lock, flags);
+	start_pfn = ALIGN(start_pfn, order_pages);
+	for (i = start_pfn; i < end_pfn; i += order_pages) {
+		unsigned long j;
+		int migratetype, retry;
+		struct page *page_i, *page_j;
+
+		/*
+		 * We expect page_i, in the sub-range of a zone being
+		 * added (@start_pfn to @end_pfn), to more likely be
+		 * valid compared to page_j randomly selected in the
+		 * span @zone_start_pfn to @spanned_pages.
+		 */
+		page_i = shuffle_valid_page(i, order);
+		if (!page_i)
+			continue;
+
+		for (retry = 0; retry < SHUFFLE_RETRY; retry++) {
+			/*
+			 * Pick a random order aligned page from the
+			 * start of the zone. Use the *whole* zone here
+			 * so that if it is freed in tiny pieces that we
+			 * randomize in the whole zone, not just within
+			 * those fragments.
+			 *
+			 * Since page_j comes from a potentially sparse
+			 * address range we want to try a bit harder to
+			 * find a shuffle point for page_i.
+			 */
+			j = z->zone_start_pfn +
+				ALIGN_DOWN(get_random_long() % z->spanned_pages,
+						order_pages);
+			page_j = shuffle_valid_page(j, order);
+			if (page_j && page_j != page_i)
+				break;
+		}
+		if (retry >= SHUFFLE_RETRY) {
+			pr_debug("%s: failed to swap %#lx\n", __func__, i);
+			continue;
+		}
+
+		/*
+		 * Each migratetype corresponds to its own list, make
+		 * sure the types match otherwise we're moving pages to
+		 * lists where they do not belong.
+		 */
+		migratetype = get_pageblock_migratetype(page_i);
+		if (get_pageblock_migratetype(page_j) != migratetype) {
+			pr_debug("%s: migratetype mismatch %#lx\n", __func__, i);
+			continue;
+		}
+
+		list_swap(&page_i->lru, &page_j->lru);
+
+		pr_debug("%s: swap: %#lx -> %#lx\n", __func__, i, j);
+
+		/* take it easy on the zone lock */
+		if ((i % (100 * order_pages)) == 0) {
+			spin_unlock_irqrestore(&z->lock, flags);
+			cond_resched();
+			spin_lock_irqsave(&z->lock, flags);
+		}
+	}
+	spin_unlock_irqrestore(&z->lock, flags);
+}
+
+void __meminit shuffle_zone(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	int i;
+
+	/* shuffle all the orders at the specified order and higher */
+	for (i = CONFIG_SHUFFLE_PAGE_ORDER; i < MAX_ORDER; i++)
+		shuffle_zone_order(z, start_pfn, end_pfn, i);
+}
+
+/**
+ * shuffle_free_memory - reduce the predictability of the page allocator
+ * @pgdat: node page data
+ * @start_pfn: Limit the shuffle to the greater of this value or zone start
+ * @end_pfn: Limit the shuffle to the less of this value or zone end
+ *
+ * While shuffle_zone() attempts to avoid holes with pfn_valid() and
+ * pfn_present() they can not report sub-section sized holes. @start_pfn
+ * and @end_pfn limit the shuffle to the exact memory pages being freed.
+ */
+void __meminit shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	struct zone *z;
+
+	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+		shuffle_zone(z, start_pfn, end_pfn);
+}

From patchwork Thu Oct 11 01:36:52 2018
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10635617
Subject: [PATCH v4 2/3] mm: Move buddy list manipulations into helpers
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Dave Hansen, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, keescook@chromium.org
Date: Wed, 10 Oct 2018 18:36:52 -0700
Message-ID: <153922181209.838512.17477991183677520315.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

In preparation for runtime randomization of the zone lists, take all
(well, most of) the list_*() functions in the buddy allocator and put
them
in helper functions. Provide a common control point for injecting
additional behavior when freeing pages.

Cc: Michal Hocko
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/mm.h       |    3 --
 include/linux/mm_types.h |    3 ++
 include/linux/mmzone.h   |   51 ++++++++++++++++++++++++++++++++++
 mm/compaction.c          |    4 +--
 mm/page_alloc.c          |   70 ++++++++++++++++++----------------------------
 5 files changed, 84 insertions(+), 47 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5891bd4e5d29..856b0530c55d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -473,9 +473,6 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma)
 struct mmu_gather;
 struct inode;
 
-#define page_private(page)		((page)->private)
-#define set_page_private(page, v)	((page)->private = (v))
-
 #if !defined(__HAVE_ARCH_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static inline int pmd_devmap(pmd_t pmd)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ed8f6292a53..72f37ea6dedb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -209,6 +209,9 @@ struct page {
 #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
 #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 
+#define page_private(page)		((page)->private)
+#define set_page_private(page, v)	((page)->private = (v))
+
 struct page_frag_cache {
 	void * va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 15029fedbfe6..0b91ce871895 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -18,6 +18,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
 /* Free memory management - zoned buddy allocator. */
@@ -98,6 +100,55 @@ struct free_area {
 	unsigned long		nr_free;
 };
 
+/* Used for pages not on another list */
+static inline void add_to_free_area(struct page *page, struct free_area *area,
+			     int migratetype)
+{
+	list_add(&page->lru, &area->free_list[migratetype]);
+	area->nr_free++;
+}
+
+/* Used for pages not on another list */
+static inline void add_to_free_area_tail(struct page *page, struct free_area *area,
+				  int migratetype)
+{
+	list_add_tail(&page->lru, &area->free_list[migratetype]);
+	area->nr_free++;
+}
+
+/* Used for pages which are on another list */
+static inline void move_to_free_area(struct page *page, struct free_area *area,
+			     int migratetype)
+{
+	list_move(&page->lru, &area->free_list[migratetype]);
+}
+
+static inline struct page *get_page_from_free_area(struct free_area *area,
+					    int migratetype)
+{
+	return list_first_entry_or_null(&area->free_list[migratetype],
+					struct page, lru);
+}
+
+static inline void rmv_page_order(struct page *page)
+{
+	__ClearPageBuddy(page);
+	set_page_private(page, 0);
+}
+
+static inline void del_page_from_free_area(struct page *page,
+		struct free_area *area, int migratetype)
+{
+	list_del(&page->lru);
+	rmv_page_order(page);
+	area->nr_free--;
+}
+
+static inline bool free_area_empty(struct free_area *area, int migratetype)
+{
+	return list_empty(&area->free_list[migratetype]);
+}
+
 struct pglist_data;
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index 7c607479de4a..44adbfa073b3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1359,13 +1359,13 @@ static enum compact_result __compact_finished(struct zone *zone,
 		bool can_steal;
 
 		/* Job done if page is free of the right migratetype */
-		if (!list_empty(&area->free_list[migratetype]))
+		if (!free_area_empty(area, migratetype))
 			return COMPACT_SUCCESS;
 
 #ifdef CONFIG_CMA
 		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
 		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
+			!free_area_empty(area, MIGRATE_CMA))
 			return COMPACT_SUCCESS;
 #endif
 		/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9b295b2287da..e1e0b54423f0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -704,12 +704,6 @@ static inline void set_page_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }
 
-static inline void rmv_page_order(struct page *page)
-{
-	__ClearPageBuddy(page);
-	set_page_private(page, 0);
-}
-
 /*
  * This function checks whether a page is free && is the buddy
  * we can coalesce a page and its buddy if
@@ -810,13 +804,11 @@ static inline void __free_one_page(struct page *page,
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
 		 */
-		if (page_is_guard(buddy)) {
+		if (page_is_guard(buddy))
 			clear_page_guard(zone, buddy, order, migratetype);
-		} else {
-			list_del(&buddy->lru);
-			zone->free_area[order].nr_free--;
-			rmv_page_order(buddy);
-		}
+		else
+			del_page_from_free_area(buddy, &zone->free_area[order],
+					migratetype);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -866,15 +858,13 @@ static inline void __free_one_page(struct page *page,
 		higher_buddy = higher_page + (buddy_pfn - combined_pfn);
 		if (pfn_valid_within(buddy_pfn) &&
 		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
-			list_add_tail(&page->lru,
-				&zone->free_area[order].free_list[migratetype]);
-			goto out;
+			add_to_free_area_tail(page, &zone->free_area[order],
+					      migratetype);
+			return;
 		}
 	}
 
-	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
-out:
-	zone->free_area[order].nr_free++;
+	add_to_free_area(page, &zone->free_area[order], migratetype);
 }
 
 /*
@@ -1819,7 +1809,7 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;
 
-		list_add(&page[size].lru, &area->free_list[migratetype]);
+		add_to_free_area(&page[size], area, migratetype);
 		area->nr_free++;
 		set_page_order(&page[size], high);
 	}
@@ -1961,13 +1951,10 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 	/* Find a page of the appropriate size in the preferred list */
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
 		area = &(zone->free_area[current_order]);
-		page = list_first_entry_or_null(&area->free_list[migratetype],
-							struct page, lru);
+		page = get_page_from_free_area(area, migratetype);
 		if (!page)
 			continue;
-		list_del(&page->lru);
-		rmv_page_order(page);
-		area->nr_free--;
+		del_page_from_free_area(page, area, migratetype);
 		expand(zone, page, order, current_order, area, migratetype);
 		set_pcppage_migratetype(page, migratetype);
 		return page;
@@ -2057,8 +2044,7 @@ static int move_freepages(struct zone *zone,
 		}
 
 		order = page_order(page);
-		list_move(&page->lru,
-			  &zone->free_area[order].free_list[migratetype]);
+		move_to_free_area(page, &zone->free_area[order], migratetype);
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -2207,7 +2193,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 
 single_page:
 	area = &zone->free_area[current_order];
-	list_move(&page->lru, &area->free_list[start_type]);
+	move_to_free_area(page, area, start_type);
 }
 
 /*
@@ -2231,7 +2217,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
 		if (fallback_mt == MIGRATE_TYPES)
 			break;
 
-		if (list_empty(&area->free_list[fallback_mt]))
+		if (free_area_empty(area, fallback_mt))
 			continue;
 
 		if (can_steal_fallback(order, migratetype))
@@ -2318,9 +2304,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 		for (order = 0; order < MAX_ORDER; order++) {
 			struct free_area *area = &(zone->free_area[order]);
 
-			page = list_first_entry_or_null(
-					&area->free_list[MIGRATE_HIGHATOMIC],
-					struct page, lru);
+			page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC);
 			if (!page)
 				continue;
 
@@ -2433,8 +2417,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	VM_BUG_ON(current_order == MAX_ORDER);
 
 do_steal:
-	page = list_first_entry(&area->free_list[fallback_mt],
-							struct page, lru);
+	page = get_page_from_free_area(area, fallback_mt);
 
 	steal_suitable_fallback(zone, page, start_migratetype, can_steal);
 
@@ -2861,6 +2844,7 @@ EXPORT_SYMBOL_GPL(split_page);
 
 int __isolate_free_page(struct page *page, unsigned int order)
 {
+	struct free_area *area = &page_zone(page)->free_area[order];
 	unsigned long watermark;
 	struct zone *zone;
 	int mt;
@@ -2885,9 +2869,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	}
 
 	/* Remove page from free list */
-	list_del(&page->lru);
-	zone->free_area[order].nr_free--;
-	rmv_page_order(page);
+
+	del_page_from_free_area(page, area, mt);
 
 	/*
 	 * Set the pageblock if the isolated page is at least half of a
@@ -3181,13 +3164,13 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			continue;
 
 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
-			if (!list_empty(&area->free_list[mt]))
+			if (!free_area_empty(area, mt))
 				return true;
 		}
 
 #ifdef CONFIG_CMA
 		if ((alloc_flags & ALLOC_CMA) &&
-		    !list_empty(&area->free_list[MIGRATE_CMA])) {
+		    !free_area_empty(area, MIGRATE_CMA)) {
 			return true;
 		}
 #endif
@@ -5022,7 +5005,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 				types[order] = 0;
 				for (type = 0; type < MIGRATE_TYPES; type++) {
-					if (!list_empty(&area->free_list[type]))
+					if (!free_area_empty(area, type))
 						types[order] |= 1 << type;
 				}
 			}
@@ -8128,6 +8111,9 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 	spin_lock_irqsave(&zone->lock, flags);
 	pfn = start_pfn;
 	while (pfn < end_pfn) {
+		struct free_area *area;
+		int mt;
+
 		if (!pfn_valid(pfn)) {
 			pfn++;
 			continue;
@@ -8146,13 +8132,13 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
 		order = page_order(page);
+		area = &zone->free_area[order];
 #ifdef CONFIG_DEBUG_VM
 		pr_info("remove from free list %lx %d %lx\n",
 			pfn, 1 << order, end_pfn);
 #endif
-		list_del(&page->lru);
-
rmv_page_order(page); - zone->free_area[order].nr_free--; + mt = get_pageblock_migratetype(page); + del_page_from_free_area(page, area, mt); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order); From patchwork Thu Oct 11 01:36:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10635615 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE6ED112B for ; Thu, 11 Oct 2018 01:48:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA8032A71F for ; Thu, 11 Oct 2018 01:48:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B20E2A739; Thu, 11 Oct 2018 01:48:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 59A5A2A71F for ; Thu, 11 Oct 2018 01:48:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 639536B000A; Wed, 10 Oct 2018 21:48:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5E7A66B000C; Wed, 10 Oct 2018 21:48:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B0806B000D; Wed, 10 Oct 2018 21:48:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 077BF6B000A 
for ; Wed, 10 Oct 2018 21:48:47 -0400 (EDT) Received: by mail-pg1-f197.google.com with SMTP id 11-v6so5000339pgd.1 for ; Wed, 10 Oct 2018 18:48:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=kx8kRChaqJUYbY5CZ/f+26XmN/O4TloimqfHjGvVJRw=; b=Anj4DSYZQd46M8ym75SZQ8cUUo/mHyltebu8wK86Je2ddqpU9MqYCVVwDAg53FavrA kUPQ1AbFLAY8bBX3manoho91Ryu1Sqy6ZHZdowo7XL68ZcImoWHqPz5mSRlOjdbqclPv /a891jFCoFpOm4zHcegfnCXTwS2NBjxnf2FJRIVElimLjdDvgRKBh9z45JdCfjEv8Bk1 Oy4pw7QmCKpXV/VlqsNYgIomhNlkOZBWZgoraVzOgqCah8TI7i8cwzxQvX72iad8+9yt F0r/YXV3mCFrp7ScwAJHBS5COC9JOO0G715igKJq0VUGBTfXeCJNSutBQ9Fh1P9FnvST e2Pg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: ABuFfojEBTHJuNyFlPK2RfjZP8ZvF1UXobxo1yfwtM1hRCK0InhJPdcc OhLyPzO6G7gde0MBYTgo3dkrLQo1X23eW6TjHjLGZ7SONM/srAc94QXPLLx5u8XNQN7DPdffxY2 6QUsv1U3/EokUyTmeKzAfArR0DDcddFfn3E9humFE23x0iJ1pRD0GTDvHWkaFypkprA== X-Received: by 2002:a17:902:9a04:: with SMTP id v4-v6mr34799993plp.247.1539222526691; Wed, 10 Oct 2018 18:48:46 -0700 (PDT) X-Google-Smtp-Source: ACcGV62fJ5J8YvVFuu/MfQhpg6INdT2sqDM6NlocbqHxqBIMmZNgJpQN32CaF5Ufz0c8Rmr0thYj X-Received: by 2002:a17:902:9a04:: with SMTP id v4-v6mr34799961plp.247.1539222525761; Wed, 10 Oct 2018 18:48:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539222525; cv=none; d=google.com; s=arc-20160816; b=rt+lPlLUf5+9nTRpJlUNrqrVHtTVowUmDZ1Hr7pPTvN5DRNqAjYeNihQFkUhRGePht NI/vBg4D1h1kgRBOI0rF/7qYWFzKMij6rOdffKSd31KjB8XcCfn/awlwa6OH0A1Hl5EJ Q9sT7QtIR+s1Ghs7KCm1U6stDNMEJ4/wahY9HhES3aM4lyfHjl6NlqdzwbQ1DzqURkcp g6r3ipROaFn5x1bkBZkpsRtIvqxQg36/F2F+SD/+QrOmuxUPZVqHQDaVdKx/DHNCASxw 
HnBIyaANq8uW4Nlvl8nbadrvXChq5qcgskcFqAYc0prTKg0KDnF4be6J8pLGLojrwcYS 6SOg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=kx8kRChaqJUYbY5CZ/f+26XmN/O4TloimqfHjGvVJRw=; b=Ri9hdkLlZrFDq0xMf97xxZx9NVAc2hApAGKGaAZOP0AdS+N5AOKjTxRQ6kV5Mkbkpu HnXILwmZZiBA66o1RUeaKNen2HCNH9botDtW9B6pEYOgUwXFvae08lQZpYGLiixOquxX ghWUTdMdjKIC3wwNZEYPm2rbBF/AutDV4v5NQ6l0KgVZR2LPmQ3WwkTAmG5UC3d776Wi gxTyHPCUIZ7lzmrOxldq/wZIjDhGzFbwbbyF3drsjI1ZCJDjVN7ngO4dr+PzplsmRslD lm8gYZ3FKmYKyvPUv5gXYqAruPqzE9iUIsq/7wIZVVvm6EfgSi7uoalND3Z3J/jRAGnk GpXA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga05.intel.com (mga05.intel.com. [192.55.52.43]) by mx.google.com with ESMTPS id e11-v6si353473plt.223.2018.10.10.18.48.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 10 Oct 2018 18:48:45 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) client-ip=192.55.52.43; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Oct 2018 18:48:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,366,1534834800"; d="scan'208";a="271364625" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga006.fm.intel.com with 
ESMTP; 10 Oct 2018 18:48:45 -0700 Subject: [PATCH v4 3/3] mm: Maintain randomization of page free lists From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Kees Cook , Dave Hansen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, keescook@chromium.org Date: Wed, 10 Oct 2018 18:36:57 -0700 Message-ID: <153922181720.838512.12133416124816480558.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com> References: <153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When freeing a page with an order >= shuffle_page_order randomly select the front or back of the list for insertion. While the mm tries to defragment physical pages into huge pages this can tend to make the page allocator more predictable over time. Inject the front-back randomness to preserve the initial randomness established by shuffle_free_memory() when the kernel was booted. The overhead of this manipulation is constrained by only being applied for MAX_ORDER sized pages by default. 
Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/mm.h     |   10 ++++++++++
 include/linux/mmzone.h |   10 ++++++++++
 mm/page_alloc.c        |   11 +++++++++--
 mm/shuffle.c           |   16 ++++++++++++++++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 856b0530c55d..91a1e7fb465a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2045,6 +2045,11 @@ extern void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
 		unsigned long end_pfn);
 extern void shuffle_zone(struct zone *z, unsigned long start_pfn,
 		unsigned long end_pfn);
+
+static inline bool is_shuffle_order(int order)
+{
+	return order >= CONFIG_SHUFFLE_PAGE_ORDER;
+}
 #else
 static inline void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
 		unsigned long end_pfn)
@@ -2055,6 +2060,11 @@ static inline void shuffle_zone(struct zone *z, unsigned long start_pfn,
 		unsigned long end_pfn)
 {
 }
+
+static inline bool is_shuffle_order(int order)
+{
+	return false;
+}
 #endif
 
 /* Free the reserved page into the buddy system, so it gets managed. */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0b91ce871895..c7abf21ed9f4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -98,6 +98,8 @@ extern int page_group_by_mobility_disabled;
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	u64			rand;
+	u8			rand_bits;
 };
 
 /* Used for pages not on another list */
@@ -116,6 +118,14 @@ static inline void add_to_free_area_tail(struct page *page, struct free_area *ar
 	area->nr_free++;
 }
 
+#ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
+/* Used to preserve page allocation order entropy */
+void add_to_free_area_random(struct page *page, struct free_area *area,
+		int migratetype);
+#else
+#define add_to_free_area_random add_to_free_area
+#endif
+
 /* Used for pages which are on another list */
 static inline void move_to_free_area(struct page *page, struct free_area *area,
 				     int migratetype)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e1e0b54423f0..eef241ceb2c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -850,7 +851,8 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)
+			&& !is_shuffle_order(order)) {
 		struct page *higher_page, *higher_buddy;
 		combined_pfn = buddy_pfn & pfn;
 		higher_page = page + (combined_pfn - pfn);
@@ -864,7 +866,12 @@ static inline void __free_one_page(struct page *page,
 		}
 	}
 
-	add_to_free_area(page, &zone->free_area[order], migratetype);
+	if (is_shuffle_order(order))
+		add_to_free_area_random(page, &zone->free_area[order],
+				migratetype);
+	else
+		add_to_free_area(page, &zone->free_area[order], migratetype);
+
 }
 
 /*
diff --git a/mm/shuffle.c b/mm/shuffle.c
index 5ed91b5b8441..3937d0bc3670 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -168,3 +168,19 @@ void __meminit shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
 	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
 		shuffle_zone(z, start_pfn, end_pfn);
 }
+
+void add_to_free_area_random(struct page *page, struct free_area *area,
+		int migratetype)
+{
+	if (area->rand_bits == 0) {
+		area->rand_bits = 64;
+		area->rand = get_random_u64();
+	}
+
+	if (area->rand & 1)
+		add_to_free_area(page, area, migratetype);
+	else
+		add_to_free_area_tail(page, area, migratetype);
+	area->rand_bits--;
+	area->rand >>= 1;
+}
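
For anyone who wants to experiment with the bit-pool idea outside the
kernel, the refill-and-shift logic of add_to_free_area_random() can be
modeled in userspace. This is only a sketch under stated assumptions:
struct bit_pool, pool_next_bit() and sketch_random_u64() are hypothetical
names that do not appear in the patch, and the xorshift generator is an
assumed stand-in for get_random_u64() with none of its security
properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's get_random_u64(): a xorshift64
 * generator. It is NOT cryptographically secure and is used here only
 * to keep the sketch self-contained.
 */
static uint64_t sketch_random_u64(void)
{
	static uint64_t state = 0x9e3779b97f4a7c15ULL;

	state ^= state << 13;
	state ^= state >> 7;
	state ^= state << 17;
	return state;
}

/* Userspace model of the two fields the patch adds to struct free_area. */
struct bit_pool {
	uint64_t rand;		/* cached random bits */
	uint8_t rand_bits;	/* number of unused bits left in rand */
};

/*
 * Return one random bit, refilling the pool from the random source only
 * when all 64 cached bits have been consumed.
 */
static bool pool_next_bit(struct bit_pool *pool)
{
	bool bit;

	assert(pool != NULL);

	if (pool->rand_bits == 0) {
		pool->rand_bits = 64;
		pool->rand = sketch_random_u64();
	}

	bit = pool->rand & 1;
	pool->rand_bits--;
	pool->rand >>= 1;
	return bit;
}
```

In this model a true bit corresponds to the add_to_free_area() (front)
branch in the patch and a false bit to add_to_free_area_tail(). Caching 64
bits at a time means the random source is consulted once per 64 frees into
a given free_area rather than on every free.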