From patchwork Thu Aug 23 02:29:02 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10573343
Subject: [RFC PATCH] mm: Streamline deferred_init_pages and deferred_free_pages
From: Alexander Duyck <alexander.duyck@gmail.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net,
 pasha.tatashin@oracle.com, vbabka@suse.cz, mhocko@kernel.org
Date: Wed, 22 Aug 2018 19:29:02 -0700
Message-ID: <20180823022116.4536.6104.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

From: Alexander Duyck <alexander.duyck@gmail.com>

From what I could tell, deferred_init_pages and deferred_free_pages were
running less than optimally: a number of checks that only need to be made
once per page block were being made for every page. For example, the
pfn_valid check has to be run for every page on architectures that can have
holes within a pageblock, but only once per page block on architectures
that cannot.

We can avoid the per-page cost of these checks by using pfn_valid_within
inside the loop and running pfn_valid only on the first access to any given
page block.

Also, in the case of either a node ID mismatch or the pfn_valid check
failing on an architecture that doesn't support holes, it doesn't make
sense to initialize pages for an invalid page block. To skip over those
pages I have opted to OR in the page block mask, which lets us jump
straight to the end of the given page block.

With this patch I am seeing a modest improvement in boot time, as shown
below on a system with 64GB on each node:

-- before --
[ 2.945572] node 0 initialised, 15432905 pages in 636ms
[ 2.968575] node 1 initialised, 15957078 pages in 659ms

-- after --
[ 2.770127] node 0 initialised, 15432905 pages in 457ms
[ 2.785129] node 1 initialised, 15957078 pages in 472ms

Signed-off-by: Alexander Duyck <alexander.duyck@gmail.com>
---
I'm putting this out as an RFC in order to determine if the assumptions I
have made are valid. I wasn't certain they were correct, but the definition
and usage of deferred_pfn_valid just didn't seem right to me, since it
could result in us skipping a head page and then still initializing and
freeing all of the tail pages.
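
To illustrate the page block mask trick described above, here is a small
standalone sketch (illustration only, not part of the patch; the pageblock
size of 512 pages is just an example value). ORing the mask into a pfn
moves it to the last pfn of its pageblock, so the pfn++ at the top of the
next iteration lands on the first pfn of the following block:

/*
 * Illustration only: EXAMPLE_PAGEBLOCK_NR_PAGES is an assumed example
 * value, not the kernel's pageblock_nr_pages configuration.
 */
#include <stdio.h>

#define EXAMPLE_PAGEBLOCK_NR_PAGES 512UL
#define EXAMPLE_PGMASK (EXAMPLE_PAGEBLOCK_NR_PAGES - 1)

int main(void)
{
	unsigned long pfn = 1000;	/* somewhere inside a pageblock */

	/* OR in the pageblock mask: pfn becomes the last pfn of its block */
	pfn |= EXAMPLE_PGMASK;		/* 1000 -> 1023 */

	/* the loop's pfn++ then starts the next block */
	pfn++;				/* 1023 -> 1024 */

	printf("next pageblock starts at pfn %lu\n", pfn);
	return 0;
}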
 mm/page_alloc.c |  152 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 101 insertions(+), 51 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15ea511fb41c..0aca48b377fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1410,27 +1410,17 @@ void clear_zone_contiguous(struct zone *zone)
 static void __init deferred_free_range(unsigned long pfn,
 				       unsigned long nr_pages)
 {
-	struct page *page;
-	unsigned long i;
+	struct page *page, *last_page;
 
 	if (!nr_pages)
 		return;
 
 	page = pfn_to_page(pfn);
+	last_page = page + nr_pages;
 
-	/* Free a large naturally-aligned chunk if possible */
-	if (nr_pages == pageblock_nr_pages &&
-	    (pfn & (pageblock_nr_pages - 1)) == 0) {
-		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_boot_core(page, pageblock_order);
-		return;
-	}
-
-	for (i = 0; i < nr_pages; i++, page++, pfn++) {
-		if ((pfn & (pageblock_nr_pages - 1)) == 0)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	do {
 		__free_pages_boot_core(page, 0);
-	}
+	} while (++page < last_page);
 }
 
 /* Completion tracking for deferred_init_memmap() threads */
@@ -1446,14 +1436,10 @@ static inline void __init pgdat_init_report_one_done(void)
 /*
  * Returns true if page needs to be initialized or freed to buddy allocator.
  *
- * First we check if pfn is valid on architectures where it is possible to have
- * holes within pageblock_nr_pages. On systems where it is not possible, this
- * function is optimized out.
+ * First we check if a current large page is valid by only checking the validity
+ * of the first pfn we have access to in the page.
  *
- * Then, we check if a current large page is valid by only checking the validity
- * of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
+ * Then, meminit_pfn_in_nid is checked on systems where pfns can interleave
  * within a node: a pfn is between start and end of a node, but does not belong
  * to this memory node.
  */
@@ -1461,9 +1447,7 @@ static inline void __init pgdat_init_report_one_done(void)
 deferred_pfn_valid(int nid, unsigned long pfn,
 		   struct mminit_pfnnid_cache *nid_init_state)
 {
-	if (!pfn_valid_within(pfn))
-		return false;
-	if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
+	if (!pfn_valid(pfn))
 		return false;
 	if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
 		return false;
@@ -1477,24 +1461,53 @@ static inline void __init pgdat_init_report_one_done(void)
 static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
 				       unsigned long end_pfn)
 {
-	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long nr_pgmask = pageblock_nr_pages - 1;
-	unsigned long nr_free = 0;
+	struct mminit_pfnnid_cache nid_init_state = { };
 
-	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
-			deferred_free_range(pfn - nr_free, nr_free);
-			nr_free = 0;
-		} else if (!(pfn & nr_pgmask)) {
-			deferred_free_range(pfn - nr_free, nr_free);
-			nr_free = 1;
-			touch_nmi_watchdog();
-		} else {
-			nr_free++;
+	while (pfn < end_pfn) {
+		unsigned long aligned_pfn, nr_free;
+
+		/*
+		 * Determine if our first pfn is valid, use this as a
+		 * representative value for the page block. Store the value
+		 * as either 0 or 1 to nr_free.
+		 *
+		 * If the pfn itself is valid, but this page block isn't
+		 * then we can assume the issue is the entire page block
+		 * and it can be skipped.
+		 */
+		nr_free = !!deferred_pfn_valid(nid, pfn, &nid_init_state);
+		if (!nr_free && pfn_valid_within(pfn))
+			pfn |= nr_pgmask;
+
+		/*
+		 * Move to next pfn and align the end of our next section
+		 * to process with the end of the block. If we were given
+		 * the end of a block to process we will do nothing in the
+		 * loop below.
+		 */
+		pfn++;
+		aligned_pfn = min(__ALIGN_MASK(pfn, nr_pgmask), end_pfn);
+
+		for (; pfn < aligned_pfn; pfn++) {
+			if (!pfn_valid_within(pfn)) {
+				deferred_free_range(pfn - nr_free, nr_free);
+				nr_free = 0;
+			} else {
+				nr_free++;
+			}
 		}
+
+		/* Free a large naturally-aligned chunk if possible */
+		if (nr_free == pageblock_nr_pages)
+			__free_pages_boot_core(pfn_to_page(pfn - nr_free),
+					       pageblock_order);
+		else
+			deferred_free_range(pfn - nr_free, nr_free);
+
+		/* Let the watchdog know we are still alive */
+		touch_nmi_watchdog();
 	}
-	/* Free the last block of pages to allocator */
-	deferred_free_range(pfn - nr_free, nr_free);
 }
 
 /*
@@ -1506,25 +1519,62 @@ static unsigned long __init deferred_init_pages(int nid, int zid,
 						 unsigned long pfn,
 						 unsigned long end_pfn)
 {
-	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long nr_pgmask = pageblock_nr_pages - 1;
+	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long nr_pages = 0;
-	struct page *page = NULL;
 
-	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
-			page = NULL;
-			continue;
-		} else if (!page || !(pfn & nr_pgmask)) {
+	while (pfn < end_pfn) {
+		unsigned long aligned_pfn;
+		struct page *page = NULL;
+
+		/*
+		 * Determine if our first pfn is valid, use this as a
+		 * representative value for the page block.
+		 */
+		if (deferred_pfn_valid(nid, pfn, &nid_init_state)) {
 			page = pfn_to_page(pfn);
-			touch_nmi_watchdog();
-		} else {
-			page++;
+			__init_single_page(page++, pfn, zid, nid);
+			nr_pages++;
+
+			if (!(pfn & nr_pgmask))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
+		} else if (pfn_valid_within(pfn)) {
+			/*
+			 * If the issue is the node or is not specific to
+			 * this individual pfn we can just jump to the end
+			 * of the page block and skip the entire block.
+			 */
+			pfn |= nr_pgmask;
 		}
-		__init_single_page(page, pfn, zid, nid);
-		nr_pages++;
+
+		/*
+		 * Move to next pfn and align the end of our next section
+		 * to process with the end of the block. If we were given
+		 * the end of a block to process we will do nothing in the
+		 * loop below.
+		 */
+		pfn++;
+		aligned_pfn = min(__ALIGN_MASK(pfn, nr_pgmask), end_pfn);
+
+		for (; pfn < aligned_pfn; pfn++) {
+			if (!pfn_valid_within(pfn)) {
+				page = NULL;
+				continue;
+			}
+
+			if (!page)
+				page = pfn_to_page(pfn);
+
+			__init_single_page(page++, pfn, zid, nid);
+			nr_pages++;
+		}
+
+		/* Let the watchdog know we are still alive */
+		touch_nmi_watchdog();
 	}
-	return (nr_pages);
+
+	return nr_pages;
 }
 
 /* Initialise remaining memory on a node */
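
As a further illustration of how the rewritten loops carve a pfn range into
pageblock-sized sections, a minimal userspace model follows (illustration
only, not kernel code; the pageblock size, min() and the __ALIGN_MASK()
rounding are stood in by local example definitions):

/*
 * Illustrative model of the per-section walk: the first pfn of a section
 * stands in for the whole block, and the inner per-pfn loop of the patch
 * would run between pfn and aligned_pfn.
 */
#include <stdio.h>

#define EXAMPLE_PAGEBLOCK_NR_PAGES 512UL
#define EXAMPLE_PGMASK (EXAMPLE_PAGEBLOCK_NR_PAGES - 1)

/* round x up to the next multiple of (mask + 1), like __ALIGN_MASK() */
static unsigned long align_mask(unsigned long x, unsigned long mask)
{
	return (x + mask) & ~mask;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long pfn = 100;	/* start part-way into a block */
	unsigned long end_pfn = 1100;	/* range end, not block aligned */

	while (pfn < end_pfn) {
		unsigned long aligned_pfn;

		/* advance past the representative pfn, then cap the section
		 * at either the end of its pageblock or the range end */
		pfn++;
		aligned_pfn = min_ul(align_mask(pfn, EXAMPLE_PGMASK), end_pfn);

		printf("section: %lu..%lu\n", pfn - 1, aligned_pfn - 1);

		/* skip the per-pfn work and jump to the next section */
		pfn = aligned_pfn;
	}
	return 0;
}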