From patchwork Sat Sep 15 16:23:08 2018
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10601555
Subject: [PATCH 1/3] mm: Shuffle initial free memory
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Kees Cook, Dave Hansen,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Sat, 15 Sep 2018 09:23:08 -0700
Message-ID: <153702858808.1603922.13788275916530966227.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0

Some data exfiltration and return-oriented-programming attacks rely on
the ability to infer the location of sensitive data objects. The kernel
page allocator, especially early in system boot, has predictable
first-in-first-out behavior for physical pages: pages are freed in
physical address order when first onlined.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to
perform a Fisher-Yates shuffle of the page allocator 'free_area' lists
when they are initially populated with free memory. The shuffling is
done in terms of 'shuffle_page_order'-sized free pages, where the
default shuffle_page_order is MAX_ORDER-1, i.e. order 10 (4MB).

The performance impact of the shuffling appears to be in the noise
compared to other memory initialization work, and the bulk of the work
is done in the background as part of deferred_init_memmap().
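[Editor's note: for readers unfamiliar with the algorithm named above, here is a
minimal userspace sketch of a Fisher-Yates shuffle. It is illustrative only and
not kernel code: the patch below iterates pfns rather than array slots,
validates each candidate with shuffle_valid_page(), and exchanges free_list
entries under the zone lock.]

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Classic Fisher-Yates: walk the span and exchange each entry with a
 * randomly chosen entry at or below it, yielding a permutation of the
 * input. The kernel variant retries a failed random pick (SHUFFLE_RETRY)
 * because a random pfn in a sparse zone is often not a valid free page.
 */
static void fisher_yates(unsigned long *pfns, size_t n)
{
	if (n < 2)
		return;

	for (size_t i = n - 1; i > 0; i--) {
		size_t j = (size_t)rand() % (i + 1);
		unsigned long tmp = pfns[i];

		pfns[i] = pfns[j];
		pfns[j] = tmp;
	}
}
```

As the comment block in the patch notes, no attempt is made to correct for
modulo or PRNG bias; the goal is to raise the bar for attacks that exploit
allocation predictability, not to produce a perfect shuffle.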
Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/list.h   |   17 +++++
 include/linux/mm.h     |    2 +
 include/linux/mmzone.h |    4 +
 mm/bootmem.c           |    9 ++-
 mm/nobootmem.c         |    7 ++
 mm/page_alloc.c        |  172 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 207 insertions(+), 4 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..43f963328d7c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -150,6 +150,23 @@ static inline void list_replace_init(struct list_head *old,
 	INIT_LIST_HEAD(old);
 }
 
+/**
+ * list_swap - replace entry1 with entry2 and re-add entry1 at entry2's position
+ * @entry1: the location to place entry2
+ * @entry2: the location to place entry1
+ */
+static inline void list_swap(struct list_head *entry1,
+			     struct list_head *entry2)
+{
+	struct list_head *pos = entry2->prev;
+
+	list_del(entry2);
+	list_replace(entry1, entry2);
+	if (pos == entry1)
+		pos = entry2;
+	list_add(entry1, pos);
+}
+
 /**
  * list_del_init - deletes entry from list and reinitialize it.
  * @entry: the element to delete from the list.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..588f34e4390e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2040,6 +2040,8 @@ extern void adjust_managed_page_count(struct page *page, long count);
 extern void mem_init_print_info(const char *str);
 extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
+extern void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
+		unsigned long end_pfn);
 
 /* Free the reserved page into the buddy system, so it gets managed.
  */
 static inline void __free_reserved_page(struct page *page)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..8f8fc7dab5cb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1277,6 +1277,10 @@ void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
+static inline int pfn_present(unsigned long pfn)
+{
+	return 1;
+}
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 97db0e8e362b..7f5ff899c622 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -210,6 +210,7 @@ void __init free_bootmem_late(unsigned long physaddr, unsigned long size)
 static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 {
 	struct page *page;
+	int nid = bdata - bootmem_node_data;
 	unsigned long *map, start, end, pages, cur, count = 0;
 
 	if (!bdata->node_bootmem_map)
@@ -219,8 +220,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 	start = bdata->node_min_pfn;
 	end = bdata->node_low_pfn;
 
-	bdebug("nid=%td start=%lx end=%lx\n",
-		bdata - bootmem_node_data, start, end);
+	bdebug("nid=%d start=%lx end=%lx\n", nid, start, end);
 
 	while (start < end) {
 		unsigned long idx, vec;
@@ -276,7 +276,10 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 		__free_pages_bootmem(page++, cur++, 0);
 	bdata->node_bootmem_map = NULL;
 
-	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
+	shuffle_free_memory(NODE_DATA(nid), bdata->node_min_pfn,
+			bdata->node_low_pfn);
+
+	bdebug("nid=%d released=%lx\n", nid, count);
 
 	return count;
 }
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 439af3b765a7..40b42434e805 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -131,6 +131,7 @@ static unsigned long __init free_low_memory_core_early(void)
 {
 	unsigned long count = 0;
 	phys_addr_t start, end;
+	pg_data_t *pgdat;
 	u64 i;
 
 	memblock_clear_hotplug(0, -1);
@@ -144,8 +145,12 @@ static unsigned long __init free_low_memory_core_early(void)
 	 * low ram will be on Node1
 	 */
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
-				NULL)
+				NULL) {
 		count += __free_memory_core(start, end);
+		for_each_online_pgdat(pgdat)
+			shuffle_free_memory(pgdat, PHYS_PFN(start),
+					PHYS_PFN(end));
+	}
 
 	return count;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 89d2a2ab3fe6..2fff9e69d8f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -72,6 +73,13 @@
 #include
 #include "internal.h"
 
+/*
+ * page_alloc.shuffle_page_order gates which page orders are shuffled by
+ * shuffle_zone() during memory initialization.
+ */
+static int __read_mostly shuffle_page_order = MAX_ORDER-1;
+module_param(shuffle_page_order, int, 0444);
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION	(8)
@@ -1035,6 +1043,168 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	return true;
 }
 
+/*
+ * For two pages to be swapped in the shuffle, they must be free (on a
+ * 'free_area' lru), have the same order, and have the same migratetype.
+ */
+static struct page * __init shuffle_valid_page(unsigned long pfn, int order)
+{
+	struct page *page;
+
+	/*
+	 * Given we're dealing with randomly selected pfns in a zone we
+	 * need to ask questions like...
+	 */
+
+	/* ...is the pfn even in the memmap? */
+	if (!pfn_valid_within(pfn))
+		return NULL;
+
+	/* ...is the pfn in a present section or a hole? */
+	if (!pfn_present(pfn))
+		return NULL;
+
+	/* ...is the page free and currently on a free_area list? */
+	page = pfn_to_page(pfn);
+	if (!PageBuddy(page))
+		return NULL;
+
+	/*
+	 * ...is the page on the same list as the page we will
+	 * shuffle it with?
+	 */
+	if (page_order(page) != order)
+		return NULL;
+
+	return page;
+}
+
+/*
+ * Fisher-Yates shuffle the freelist which prescribes iterating through
+ * an array, pfns in this case, and randomly swapping each entry with
+ * another in the span, end_pfn - start_pfn.
+ *
+ * To keep the implementation simple it does not attempt to correct for
+ * sources of bias in the distribution, like modulo bias or
+ * pseudo-random number generator bias. I.e. the expectation is that
+ * this shuffling raises the bar for attacks that exploit the
+ * predictability of page allocations, but need not be a perfect
+ * shuffle.
+ *
+ * Note that we don't use @z->zone_start_pfn and zone_end_pfn(@z)
+ * directly since the caller may be aware of holes in the zone and can
+ * improve the accuracy of the random pfn selection.
+ */
+#define SHUFFLE_RETRY 10
+static void __init shuffle_zone_order(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn, const int order)
+{
+	unsigned long i, flags;
+	const int order_pages = 1 << order;
+
+	if (start_pfn < z->zone_start_pfn)
+		start_pfn = z->zone_start_pfn;
+	if (end_pfn > zone_end_pfn(z))
+		end_pfn = zone_end_pfn(z);
+
+	/* probably means that start/end were outside the zone */
+	if (end_pfn <= start_pfn)
+		return;
+	spin_lock_irqsave(&z->lock, flags);
+	start_pfn = ALIGN(start_pfn, order_pages);
+	for (i = start_pfn; i < end_pfn; i += order_pages) {
+		unsigned long j;
+		int migratetype, retry;
+		struct page *page_i, *page_j;
+
+		/*
+		 * We expect page_i, in the sub-range of a zone being
+		 * added (@start_pfn to @end_pfn), to more likely be
+		 * valid compared to page_j randomly selected in the
+		 * span @zone_start_pfn to @spanned_pages.
+		 */
+		page_i = shuffle_valid_page(i, order);
+		if (!page_i)
+			continue;
+
+		for (retry = 0; retry < SHUFFLE_RETRY; retry++) {
+			/*
+			 * Pick a random order aligned page from the
+			 * start of the zone. Use the *whole* zone here
+			 * so that if it is freed in tiny pieces that we
+			 * randomize in the whole zone, not just within
+			 * those fragments.
+			 *
+			 * Since page_j comes from a potentially sparse
+			 * address range we want to try a bit harder to
+			 * find a shuffle point for page_i.
+			 */
+			j = z->zone_start_pfn +
+				ALIGN_DOWN(get_random_long() % z->spanned_pages,
+						order_pages);
+			page_j = shuffle_valid_page(j, order);
+			if (page_j && page_j != page_i)
+				break;
+		}
+		if (retry >= SHUFFLE_RETRY) {
+			pr_debug("%s: failed to swap %#lx\n", __func__, i);
+			continue;
+		}
+
+		/*
+		 * Each migratetype corresponds to its own list, make
+		 * sure the types match otherwise we're moving pages to
+		 * lists where they do not belong.
+		 */
+		migratetype = get_pageblock_migratetype(page_i);
+		if (get_pageblock_migratetype(page_j) != migratetype) {
+			pr_debug("%s: migratetype mismatch %#lx\n", __func__, i);
+			continue;
+		}
+
+		list_swap(&page_i->lru, &page_j->lru);
+
+		pr_debug("%s: swap: %#lx -> %#lx\n", __func__, i, j);
+
+		/* take it easy on the zone lock */
+		if ((i % (100 * order_pages)) == 0) {
+			spin_unlock_irqrestore(&z->lock, flags);
+			cond_resched();
+			spin_lock_irqsave(&z->lock, flags);
+		}
+	}
+	spin_unlock_irqrestore(&z->lock, flags);
+}
+
+static void __init shuffle_zone(struct zone *z, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	int i;
+
+	/* shuffle all the orders at the specified order and higher */
+	for (i = shuffle_page_order; i < MAX_ORDER; i++)
+		shuffle_zone_order(z, start_pfn, end_pfn, i);
+}
+
+/**
+ * shuffle_free_memory - reduce the predictability of the page allocator
+ * @pgdat: node page data
+ * @start_pfn: Limit the shuffle to the greater of this value or zone start
+ * @end_pfn: Limit the shuffle to the less of this value or zone end
+ *
+ * While shuffle_zone() attempts to avoid holes with pfn_valid() and
+ * pfn_present() they can not report sub-section sized holes.
+ * @start_pfn and @end_pfn limit the shuffle to the exact memory pages
+ * being freed.
+ */
+void __init shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	struct zone *z;
+
+	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+		shuffle_zone(z, start_pfn, end_pfn);
+}
+
 #ifdef CONFIG_DEBUG_VM
 static inline bool free_pcp_prepare(struct page *page)
 {
@@ -1583,6 +1753,8 @@ static int __init deferred_init_memmap(void *data)
 	}
 	pgdat_resize_unlock(pgdat, &flags);
 
+	shuffle_zone(zone, first_init_pfn, zone_end_pfn(zone));
+
 	/* Sanity check that the next zone really is unpopulated */
 	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));

From patchwork Sat Sep 15 16:23:13 2018
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10601551
Subject: [PATCH 2/3] mm: Move buddy list manipulations into helpers
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Dave Hansen, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Sat, 15 Sep 2018 09:23:13 -0700
Message-ID: <153702859340.1603922.13749357958673161601.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0

In preparation for runtime randomization of the zone free lists, take
most of the open-coded list_*() manipulations in the buddy allocator
and put them in helper
functions. Provide a common control point for injecting additional behavior when freeing pages. Cc: Michal Hocko Cc: Dave Hansen Signed-off-by: Dan Williams --- include/linux/mm.h | 3 -- include/linux/mm_types.h | 3 ++ include/linux/mmzone.h | 51 ++++++++++++++++++++++++++++++++++ mm/compaction.c | 4 +-- mm/page_alloc.c | 70 ++++++++++++++++++---------------------------- 5 files changed, 84 insertions(+), 47 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 588f34e4390e..9a87ab0782c3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -473,9 +473,6 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma) struct mmu_gather; struct inode; -#define page_private(page) ((page)->private) -#define set_page_private(page, v) ((page)->private = (v)) - #if !defined(__HAVE_ARCH_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE) static inline int pmd_devmap(pmd_t pmd) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index cd2bc939efd0..191610be62bd 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -209,6 +209,9 @@ struct page { #define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) #define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) +#define page_private(page) ((page)->private) +#define set_page_private(page, v) ((page)->private = (v)) + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 8f8fc7dab5cb..adf9b3a7440d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include /* Free memory management - zoned buddy allocator. 
*/ @@ -98,6 +100,55 @@ struct free_area { unsigned long nr_free; }; +/* Used for pages not on another list */ +static inline void add_to_free_area(struct page *page, struct free_area *area, + int migratetype) +{ + list_add(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages not on another list */ +static inline void add_to_free_area_tail(struct page *page, struct free_area *area, + int migratetype) +{ + list_add_tail(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages which are on another list */ +static inline void move_to_free_area(struct page *page, struct free_area *area, + int migratetype) +{ + list_move(&page->lru, &area->free_list[migratetype]); +} + +static inline struct page *get_page_from_free_area(struct free_area *area, + int migratetype) +{ + return list_first_entry_or_null(&area->free_list[migratetype], + struct page, lru); +} + +static inline void rmv_page_order(struct page *page) +{ + __ClearPageBuddy(page); + set_page_private(page, 0); +} + +static inline void del_page_from_free_area(struct page *page, + struct free_area *area, int migratetype) +{ + list_del(&page->lru); + rmv_page_order(page); + area->nr_free--; +} + +static inline bool free_area_empty(struct free_area *area, int migratetype) +{ + return list_empty(&area->free_list[migratetype]); +} + struct pglist_data; /* diff --git a/mm/compaction.c b/mm/compaction.c index faca45ebe62d..48736044f682 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1358,13 +1358,13 @@ static enum compact_result __compact_finished(struct zone *zone, bool can_steal; /* Job done if page is free of the right migratetype */ - if (!list_empty(&area->free_list[migratetype])) + if (!free_area_empty(area, migratetype)) return COMPACT_SUCCESS; #ifdef CONFIG_CMA /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */ if (migratetype == MIGRATE_MOVABLE && - !list_empty(&area->free_list[MIGRATE_CMA])) + !free_area_empty(area, MIGRATE_CMA)) return 
COMPACT_SUCCESS; #endif /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2fff9e69d8f3..175f2e5f9e50 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -705,12 +705,6 @@ static inline void set_page_order(struct page *page, unsigned int order) __SetPageBuddy(page); } -static inline void rmv_page_order(struct page *page) -{ - __ClearPageBuddy(page); - set_page_private(page, 0); -} - /* * This function checks whether a page is free && is the buddy * we can coalesce a page and its buddy if @@ -811,13 +805,11 @@ static inline void __free_one_page(struct page *page, * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page, * merge with it and move up one order. */ - if (page_is_guard(buddy)) { + if (page_is_guard(buddy)) clear_page_guard(zone, buddy, order, migratetype); - } else { - list_del(&buddy->lru); - zone->free_area[order].nr_free--; - rmv_page_order(buddy); - } + else + del_page_from_free_area(buddy, &zone->free_area[order], + migratetype); combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -867,15 +859,13 @@ static inline void __free_one_page(struct page *page, higher_buddy = higher_page + (buddy_pfn - combined_pfn); if (pfn_valid_within(buddy_pfn) && page_is_buddy(higher_page, higher_buddy, order + 1)) { - list_add_tail(&page->lru, - &zone->free_area[order].free_list[migratetype]); - goto out; + add_to_free_area_tail(page, &zone->free_area[order], + migratetype); + return; } } - list_add(&page->lru, &zone->free_area[order].free_list[migratetype]); -out: - zone->free_area[order].nr_free++; + add_to_free_area(page, &zone->free_area[order], migratetype); } /* @@ -1977,7 +1967,7 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; - list_add(&page[size].lru, &area->free_list[migratetype]); + add_to_free_area(&page[size], area, migratetype); area->nr_free++; set_page_order(&page[size], high); } @@ -2119,13 +2109,10 @@ struct page 
*__rmqueue_smallest(struct zone *zone, unsigned int order, /* Find a page of the appropriate size in the preferred list */ for (current_order = order; current_order < MAX_ORDER; ++current_order) { area = &(zone->free_area[current_order]); - page = list_first_entry_or_null(&area->free_list[migratetype], - struct page, lru); + page = get_page_from_free_area(area, migratetype); if (!page) continue; - list_del(&page->lru); - rmv_page_order(page); - area->nr_free--; + del_page_from_free_area(page, area, migratetype); expand(zone, page, order, current_order, area, migratetype); set_pcppage_migratetype(page, migratetype); return page; @@ -2215,8 +2202,7 @@ static int move_freepages(struct zone *zone, } order = page_order(page); - list_move(&page->lru, - &zone->free_area[order].free_list[migratetype]); + move_to_free_area(page, &zone->free_area[order], migratetype); page += 1 << order; pages_moved += 1 << order; } @@ -2365,7 +2351,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, single_page: area = &zone->free_area[current_order]; - list_move(&page->lru, &area->free_list[start_type]); + move_to_free_area(page, area, start_type); } /* @@ -2389,7 +2375,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (fallback_mt == MIGRATE_TYPES) break; - if (list_empty(&area->free_list[fallback_mt])) + if (free_area_empty(area, fallback_mt)) continue; if (can_steal_fallback(order, migratetype)) @@ -2476,9 +2462,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, for (order = 0; order < MAX_ORDER; order++) { struct free_area *area = &(zone->free_area[order]); - page = list_first_entry_or_null( - &area->free_list[MIGRATE_HIGHATOMIC], - struct page, lru); + page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); if (!page) continue; @@ -2591,8 +2575,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) VM_BUG_ON(current_order == MAX_ORDER); do_steal: - page = 
list_first_entry(&area->free_list[fallback_mt], - struct page, lru); + page = get_page_from_free_area(area, fallback_mt); steal_suitable_fallback(zone, page, start_migratetype, can_steal); @@ -3019,6 +3002,7 @@ EXPORT_SYMBOL_GPL(split_page); int __isolate_free_page(struct page *page, unsigned int order) { + struct free_area *area = &page_zone(page)->free_area[order]; unsigned long watermark; struct zone *zone; int mt; @@ -3043,9 +3027,8 @@ int __isolate_free_page(struct page *page, unsigned int order) } /* Remove page from free list */ - list_del(&page->lru); - zone->free_area[order].nr_free--; - rmv_page_order(page); + + del_page_from_free_area(page, area, mt); /* * Set the pageblock if the isolated page is at least half of a @@ -3339,13 +3322,13 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, continue; for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) { - if (!list_empty(&area->free_list[mt])) + if (!free_area_empty(area, mt)) return true; } #ifdef CONFIG_CMA if ((alloc_flags & ALLOC_CMA) && - !list_empty(&area->free_list[MIGRATE_CMA])) { + !free_area_empty(area, MIGRATE_CMA)) { return true; } #endif @@ -5191,7 +5174,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) types[order] = 0; for (type = 0; type < MIGRATE_TYPES; type++) { - if (!list_empty(&area->free_list[type])) + if (!free_area_empty(area, type)) types[order] |= 1 << type; } } @@ -8220,6 +8203,9 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) spin_lock_irqsave(&zone->lock, flags); pfn = start_pfn; while (pfn < end_pfn) { + struct free_area *area; + int mt; + if (!pfn_valid(pfn)) { pfn++; continue; @@ -8238,13 +8224,13 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); order = page_order(page); + area = &zone->free_area[order]; #ifdef CONFIG_DEBUG_VM pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn); #endif - list_del(&page->lru); - 
-		rmv_page_order(page);
-		zone->free_area[order].nr_free--;
+		mt = get_pageblock_migratetype(page);
+		del_page_from_free_area(page, area, mt);
 		for (i = 0; i < (1 << order); i++)
 			SetPageReserved((page+i));
 		pfn += (1 << order);

From patchwork Sat Sep 15 16:23:18 2018
Subject: [PATCH 3/3] mm: Maintain randomization of page free lists
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Kees Cook, Dave Hansen, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Sat, 15 Sep 2018 09:23:18 -0700
Message-ID: <153702859851.1603922.5390659652135091505.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <153702858249.1603922.12913911825267831671.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0

When freeing a page with an order >= shuffle_page_order, randomly select
the front or back of the free list for insertion.

While the mm tries to defragment physical pages into huge pages, that
merging tends to make the page allocator more predictable over time.
Injecting front/back randomness at free preserves the initial randomness
established by shuffle_free_memory() when the kernel was booted.

The overhead of this manipulation is constrained by applying it only to
MAX_ORDER sized pages by default.
Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/mmzone.h |    2 ++
 mm/page_alloc.c        |   27 +++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index adf9b3a7440d..4a095432843d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -98,6 +98,8 @@ extern int page_group_by_mobility_disabled;
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	u64			rand;
+	u8			rand_bits;
 };
 
 /* Used for pages not on another list */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 175f2e5f9e50..33a6b40ae463 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -746,6 +747,22 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }
 
+static void add_to_free_area_random(struct page *page, struct free_area *area,
+		int migratetype)
+{
+	if (area->rand_bits == 0) {
+		area->rand_bits = 64;
+		area->rand = get_random_u64();
+	}
+
+	if (area->rand & 1)
+		add_to_free_area(page, area, migratetype);
+	else
+		add_to_free_area_tail(page, area, migratetype);
+	area->rand_bits--;
+	area->rand >>= 1;
+}
+
 /*
  * Freeing function for a buddy system allocator.
 *
@@ -851,7 +868,8 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)
+			&& order < shuffle_page_order) {
 		struct page *higher_page, *higher_buddy;
 		combined_pfn = buddy_pfn & pfn;
 		higher_page = page + (combined_pfn - pfn);
@@ -865,7 +883,12 @@ static inline void __free_one_page(struct page *page,
 		}
 	}
 
-	add_to_free_area(page, &zone->free_area[order], migratetype);
+	if (order < shuffle_page_order)
+		add_to_free_area(page, &zone->free_area[order], migratetype);
+	else
+		add_to_free_area_random(page, &zone->free_area[order],
+				migratetype);
+
 }
 
 /*