From patchwork Mon Dec 10 16:29:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 10721653
From: Waiman Long <longman@redhat.com>
To: Andrey Ryabinin, Andrew Morton
Cc: kasan-dev@googlegroups.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Alexander Potapenko, Dmitry Vyukov,
    Michal Hocko, Waiman Long
Subject: [PATCH] mm/page_alloc: Don't call kasan_free_pages() at deferred mem init
Date: Mon, 10 Dec 2018 11:29:48 -0500
Message-Id: <1544459388-8736-1-git-send-email-longman@redhat.com>

When CONFIG_KASAN is enabled on large memory SMP systems, the deferred
pages initialization can take a long time. Below were the reported init
times on an 8-socket 96-core 4TB IvyBridge system.

1) Non-debug kernel without CONFIG_KASAN

[    8.764222] node 1 initialised, 132086516 pages in 7027ms

2) Debug kernel with CONFIG_KASAN

[  146.288115] node 1 initialised, 132075466 pages in 143052ms

So the page init time in a debug kernel was about 20X that of the
non-debug kernel. The long init time can be problematic because the
page initialization is done with interrupts disabled. In this particular
case, it caused the following warning messages as well as NMI backtraces
of all the cores that were doing the initialization.
[   68.240049] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   68.241000] rcu:     25-...0: (100 ticks this GP) idle=b72/1/0x4000000000000000 softirq=915/915 fqs=16252
[   68.241000] rcu:     44-...0: (95 ticks this GP) idle=49a/1/0x4000000000000000 softirq=788/788 fqs=16253
[   68.241000] rcu:     54-...0: (104 ticks this GP) idle=03a/1/0x4000000000000000 softirq=721/825 fqs=16253
[   68.241000] rcu:     60-...0: (103 ticks this GP) idle=cbe/1/0x4000000000000000 softirq=637/740 fqs=16253
[   68.241000] rcu:     72-...0: (105 ticks this GP) idle=786/1/0x4000000000000000 softirq=536/641 fqs=16253
[   68.241000] rcu:     84-...0: (99 ticks this GP) idle=292/1/0x4000000000000000 softirq=537/537 fqs=16253
[   68.241000] rcu:     111-...0: (104 ticks this GP) idle=bde/1/0x4000000000000000 softirq=474/476 fqs=16253
[   68.241000] rcu: (detected by 13, t=65018 jiffies, g=249, q=2)

The long init time was mainly caused by the call to kasan_free_pages()
to poison the newly initialized pages. On a 4TB system, we are talking
about almost 500GB of memory, probably on the same node.

In reality, we may not need to poison the newly initialized pages before
they are ever allocated. So KASAN poisoning of freed pages before the
completion of deferred memory initialization is now disabled. Those
pages will be properly poisoned when they are allocated or freed after
the deferred pages initialization is done.

With this change, the new page initialization time became:

[   21.948010] node 1 initialised, 132075466 pages in 18702ms

This was still about double the non-debug kernel time, but was much
better than before.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/page_alloc.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc4..941161d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -294,6 +294,32 @@ bool pm_suspended_storage(void)
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+/*
+ * During boot we initialize deferred pages on-demand, as needed, but once
+ * page_alloc_init_late() has finished, the deferred pages are all initialized,
+ * and we can permanently disable that path.
+ */
+static DEFINE_STATIC_KEY_TRUE(deferred_pages);
+
+/*
+ * Call kasan_free_pages() only after deferred memory initialization
+ * has completed. Poisoning pages during deferred memory init will greatly
+ * lengthen the process and cause problems in large memory systems as the
+ * deferred pages initialization is done with interrupts disabled.
+ *
+ * Assuming that there will be no reference to those newly initialized
+ * pages before they are ever allocated, this should have no effect on
+ * KASAN memory tracking as the poison will be properly inserted at page
+ * allocation time. The only corner case is when pages are allocated by
+ * on-demand allocation and then freed again before the deferred pages
+ * initialization is done, but this is not likely to happen.
+ */
+static inline void kasan_free_nondeferred_pages(struct page *page, int order)
+{
+	if (!static_branch_unlikely(&deferred_pages))
+		kasan_free_pages(page, order);
+}
+
 /* Returns true if the struct page for the pfn is uninitialised */
 static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 {
@@ -335,6 +361,8 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 #else
+#define kasan_free_nondeferred_pages(p, o)	kasan_free_pages(p, o)
+
 static inline bool early_page_uninitialised(unsigned long pfn)
 {
 	return false;
@@ -1037,7 +1065,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	arch_free_page(page, order);
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
-	kasan_free_pages(page, order);
+	kasan_free_nondeferred_pages(page, order);
 
 	return true;
 }
@@ -1606,13 +1634,6 @@ static int __init deferred_init_memmap(void *data)
 }
 
 /*
- * During boot we initialize deferred pages on-demand, as needed, but once
- * page_alloc_init_late() has finished, the deferred pages are all initialized,
- * and we can permanently disable that path.
- */
-static DEFINE_STATIC_KEY_TRUE(deferred_pages);
-
-/*
  * If this zone has deferred pages, try to grow it by initializing enough
  * deferred pages to satisfy the allocation specified by order, rounded up to
  * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments
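
As an aside for readers unfamiliar with jump labels, the gating pattern
the patch relies on is easy to model outside the kernel. The sketch
below is a hypothetical user-space analogue, not the kernel code: a
plain bool stands in for the DEFINE_STATIC_KEY_TRUE() jump label, and a
printf stub stands in for the real KASAN poisoning.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for DEFINE_STATIC_KEY_TRUE(deferred_pages): true while the
 * deferred struct-page initialization is still running. The real static
 * key is a self-patching branch, not a memory load like this bool. */
static bool deferred_pages = true;

/* Hypothetical stub for kasan_free_pages(); the real function poisons
 * the shadow memory of 2^order pages so stray accesses get reported. */
static void stub_kasan_free_pages(void *page, unsigned int order)
{
	printf("poisoning %u page(s) at %p\n", 1U << order, page);
}

/* Analogue of kasan_free_nondeferred_pages(): skip the (slow) poisoning
 * while deferred memory initialization is still in progress. */
static void stub_kasan_free_nondeferred_pages(void *page, unsigned int order)
{
	if (!deferred_pages)
		stub_kasan_free_pages(page, order);
}

int main(void)
{
	static char page[4096];

	/* During deferred init the free path skips poisoning entirely. */
	stub_kasan_free_nondeferred_pages((void *)page, 0);

	/* page_alloc_init_late() flips the key once init has finished
	 * (in the kernel, via static_branch_disable(&deferred_pages)). */
	deferred_pages = false;

	/* From now on every freed page is poisoned as usual. */
	stub_kasan_free_nondeferred_pages((void *)page, 0);
	return 0;
}

The performance point is that the kernel version does not even pay for
the load-and-test shown here: static_branch_unlikely() compiles down to
a branch that is patched out of the hot free path once the key is
disabled.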