From patchwork Wed Oct 17 06:33:26 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10644627
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan, Jesper Dangaard Brouer
Subject: [RFC v4 PATCH 1/5] mm/page_alloc: use helper functions to add/remove a page to/from buddy
Date: Wed, 17 Oct 2018 14:33:26 +0800
Message-Id: <20181017063330.15384-2-aaron.lu@intel.com>
In-Reply-To: <20181017063330.15384-1-aaron.lu@intel.com>
References: <20181017063330.15384-1-aaron.lu@intel.com>

There are multiple places that add a page to or remove a page from the buddy free lists, so introduce helper functions for these operations. This also makes it easier to add code that must run whenever a page is added to or removed from buddy. No functional change.
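For orientation, the end state of the two helpers can be condensed as below. This is a sketch distilled from the diff that follows, with add_to_buddy_common() folded in; add_to_buddy_tail() differs only in using list_add_tail(). It is not a substitute for the actual patch.

static inline void add_to_buddy_head(struct page *page, struct zone *zone,
                                     unsigned int order, int mt)
{
        /* common bookkeeping: record the order, set PageBuddy, bump nr_free */
        set_page_order(page, order);
        zone->free_area[order].nr_free++;
        list_add(&page->lru, &zone->free_area[order].free_list[mt]);
}

static inline void remove_from_buddy(struct page *page, struct zone *zone,
                                     unsigned int order)
{
        list_del(&page->lru);
        zone->free_area[order].nr_free--;
        rmv_page_order(page);   /* clears PageBuddy and page->private */
}

Every call site that used to open-code the list_add()/nr_free++/set_page_order() or list_del()/nr_free--/rmv_page_order() sequence now calls one of these helpers instead.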
Acked-by: Vlastimil Babka Signed-off-by: Aaron Lu Acked-by: Mel Gorman --- mm/page_alloc.c | 65 +++++++++++++++++++++++++++++-------------------- 1 file changed, 39 insertions(+), 26 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 89d2a2ab3fe6..14c20bb3a3da 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -697,12 +697,41 @@ static inline void set_page_order(struct page *page, unsigned int order) __SetPageBuddy(page); } +static inline void add_to_buddy_common(struct page *page, struct zone *zone, + unsigned int order) +{ + set_page_order(page, order); + zone->free_area[order].nr_free++; +} + +static inline void add_to_buddy_head(struct page *page, struct zone *zone, + unsigned int order, int mt) +{ + add_to_buddy_common(page, zone, order); + list_add(&page->lru, &zone->free_area[order].free_list[mt]); +} + +static inline void add_to_buddy_tail(struct page *page, struct zone *zone, + unsigned int order, int mt) +{ + add_to_buddy_common(page, zone, order); + list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]); +} + static inline void rmv_page_order(struct page *page) { __ClearPageBuddy(page); set_page_private(page, 0); } +static inline void remove_from_buddy(struct page *page, struct zone *zone, + unsigned int order) +{ + list_del(&page->lru); + zone->free_area[order].nr_free--; + rmv_page_order(page); +} + /* * This function checks whether a page is free && is the buddy * we can coalesce a page and its buddy if @@ -803,13 +832,10 @@ static inline void __free_one_page(struct page *page, * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page, * merge with it and move up one order. */ - if (page_is_guard(buddy)) { + if (page_is_guard(buddy)) clear_page_guard(zone, buddy, order, migratetype); - } else { - list_del(&buddy->lru); - zone->free_area[order].nr_free--; - rmv_page_order(buddy); - } + else + remove_from_buddy(buddy, zone, order); combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -841,8 +867,6 @@ static inline void __free_one_page(struct page *page, } done_merging: - set_page_order(page, order); - /* * If this is not the largest possible page, check if the buddy * of the next-highest order is free. 
If it is, it's possible @@ -859,15 +883,12 @@ static inline void __free_one_page(struct page *page, higher_buddy = higher_page + (buddy_pfn - combined_pfn); if (pfn_valid_within(buddy_pfn) && page_is_buddy(higher_page, higher_buddy, order + 1)) { - list_add_tail(&page->lru, - &zone->free_area[order].free_list[migratetype]); - goto out; + add_to_buddy_tail(page, zone, order, migratetype); + return; } } - list_add(&page->lru, &zone->free_area[order].free_list[migratetype]); -out: - zone->free_area[order].nr_free++; + add_to_buddy_head(page, zone, order, migratetype); } /* @@ -1805,9 +1826,7 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; - list_add(&page[size].lru, &area->free_list[migratetype]); - area->nr_free++; - set_page_order(&page[size], high); + add_to_buddy_head(&page[size], zone, high, migratetype); } } @@ -1951,9 +1970,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, struct page, lru); if (!page) continue; - list_del(&page->lru); - rmv_page_order(page); - area->nr_free--; + remove_from_buddy(page, zone, current_order); expand(zone, page, order, current_order, area, migratetype); set_pcppage_migratetype(page, migratetype); return page; @@ -2871,9 +2888,7 @@ int __isolate_free_page(struct page *page, unsigned int order) } /* Remove page from free list */ - list_del(&page->lru); - zone->free_area[order].nr_free--; - rmv_page_order(page); + remove_from_buddy(page, zone, order); /* * Set the pageblock if the isolated page is at least half of a @@ -8070,9 +8085,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn); #endif - list_del(&page->lru); - rmv_page_order(page); - zone->free_area[order].nr_free--; + remove_from_buddy(page, zone, order); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order);
From patchwork Wed Oct 17 06:33:27 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10644629
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan, Jesper Dangaard Brouer
Subject: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed
Date: Wed, 17 Oct 2018 14:33:27 +0800
Message-Id: <20181017063330.15384-3-aaron.lu@intel.com>
In-Reply-To: <20181017063330.15384-1-aaron.lu@intel.com>
References: <20181017063330.15384-1-aaron.lu@intel.com>

Running the will-it-scale/page_fault1 process-mode workload on a two-socket Intel Skylake server showed severe zone->lock contention: about 80% of CPU cycles were burnt spinning (42% on the allocation path and 35% on the free path). According to perf, the most time-consuming part inside that lock on the free path is cache misses on page structures, mostly on the to-be-freed page's buddy due to merging.

One way to avoid this overhead is to do no merging at all for order-0 pages. With this approach, zone->lock contention on the free path dropped to 1.1%, but the allocation side still shows as much as 42% lock contention. Meanwhile, the reduced contention on the free side does not translate into a performance increase; it is consumed by increased contention on the per-node lru_lock (which rose from 5% to 37%), and final performance dropped slightly, by about 1%.

Although performance dropped a little, this change almost eliminates zone->lock contention on the free path and is the foundation for the next patch, which eliminates zone->lock contention on the allocation path.

Suggested-by: Dave Hansen
Signed-off-by: Aaron Lu
---
 include/linux/mm_types.h | 9 +++-
 mm/compaction.c | 13 +++++-
 mm/internal.h | 27 ++++++++++++
 mm/page_alloc.c | 88 ++++++++++++++++++++++++++++++++++------
 4 files changed, 121 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5ed8f6292a53..aed93053ef6e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -179,8 +179,13 @@ struct page { int units; /* SLOB */ }; - /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ - atomic_t _refcount; + union { + /* Usage count. *DO NOT USE DIRECTLY*.
See page_ref.h */ + atomic_t _refcount; + + /* For pages in Buddy: if skipped merging when added to Buddy */ + bool buddy_merge_skipped; + }; #ifdef CONFIG_MEMCG struct mem_cgroup *mem_cgroup; diff --git a/mm/compaction.c b/mm/compaction.c index faca45ebe62d..0c9c7a30dde3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -777,8 +777,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, * potential isolation targets. */ if (PageBuddy(page)) { - unsigned long freepage_order = page_order_unsafe(page); + unsigned long freepage_order; + /* + * If this is a merge_skipped page, do merge now + * since high-order pages are needed. zone lock + * isn't taken for the merge_skipped check so the + * check could be wrong but the worst case is we + * lose a merge opportunity. + */ + if (page_merge_was_skipped(page)) + try_to_merge_page(page); + + freepage_order = page_order_unsafe(page); /* * Without lock, we cannot be sure that what we got is * a valid page order. Consider only values in the diff --git a/mm/internal.h b/mm/internal.h index 87256ae1bef8..c166735a559e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -527,4 +527,31 @@ static inline bool is_migrate_highatomic_page(struct page *page) void setup_zone_pageset(struct zone *zone); extern struct page *alloc_new_node_page(struct page *page, unsigned long node); + +static inline bool page_merge_was_skipped(struct page *page) +{ + return page->buddy_merge_skipped; +} + +void try_to_merge_page(struct page *page); + +#ifdef CONFIG_COMPACTION +static inline bool can_skip_merge(struct zone *zone, int order) +{ + /* Compaction has failed in this zone, we shouldn't skip merging */ + if (zone->compact_considered) + return false; + + /* Only consider no_merge for order 0 pages */ + if (order) + return false; + + return true; +} +#else /* CONFIG_COMPACTION */ +static inline bool can_skip_merge(struct zone *zone, int order) +{ + return false; +} +#endif /* CONFIG_COMPACTION */ #endif /* __MM_INTERNAL_H */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 14c20bb3a3da..76d471e0ab24 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -691,6 +691,16 @@ static inline void clear_page_guard(struct zone *zone, struct page *page, unsigned int order, int migratetype) {} #endif +static inline void set_page_merge_skipped(struct page *page) +{ + page->buddy_merge_skipped = true; +} + +static inline void clear_page_merge_skipped(struct page *page) +{ + page->buddy_merge_skipped = false; +} + static inline void set_page_order(struct page *page, unsigned int order) { set_page_private(page, order); @@ -700,6 +710,7 @@ static inline void set_page_order(struct page *page, unsigned int order) static inline void add_to_buddy_common(struct page *page, struct zone *zone, unsigned int order) { + clear_page_merge_skipped(page); set_page_order(page, order); zone->free_area[order].nr_free++; } @@ -730,6 +741,7 @@ static inline void remove_from_buddy(struct page *page, struct zone *zone, list_del(&page->lru); zone->free_area[order].nr_free--; rmv_page_order(page); + clear_page_merge_skipped(page); } /* @@ -797,7 +809,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy, * -- nyc */ -static inline void __free_one_page(struct page *page, +static inline void do_merge(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, int migratetype) @@ -809,16 +821,6 @@ static inline void __free_one_page(struct page *page, max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1); - 
VM_BUG_ON(!zone_is_initialized(zone)); - VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); - - VM_BUG_ON(migratetype == -1); - if (likely(!is_migrate_isolate(migratetype))) - __mod_zone_freepage_state(zone, 1 << order, migratetype); - - VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); - VM_BUG_ON_PAGE(bad_range(zone, page), page); - continue_merging: while (order < max_order - 1) { buddy_pfn = __find_buddy_pfn(pfn, order); @@ -891,6 +893,61 @@ static inline void __free_one_page(struct page *page, add_to_buddy_head(page, zone, order, migratetype); } +void try_to_merge_page(struct page *page) +{ + unsigned long pfn, buddy_pfn, flags; + struct page *buddy; + struct zone *zone; + + /* + * No need to do merging if buddy is not free. + * zone lock isn't taken so this could be wrong but worst case + * is we lose a merge opportunity. + */ + pfn = page_to_pfn(page); + buddy_pfn = __find_buddy_pfn(pfn, 0); + buddy = page + (buddy_pfn - pfn); + if (!PageBuddy(buddy)) + return; + + zone = page_zone(page); + spin_lock_irqsave(&zone->lock, flags); + /* Verify again after taking the lock */ + if (likely(PageBuddy(page) && page_merge_was_skipped(page) && + PageBuddy(buddy))) { + int mt = get_pageblock_migratetype(page); + + remove_from_buddy(page, zone, 0); + do_merge(page, pfn, zone, 0, mt); + } + spin_unlock_irqrestore(&zone->lock, flags); +} + +static inline void __free_one_page(struct page *page, + unsigned long pfn, + struct zone *zone, unsigned int order, + int migratetype) +{ + VM_BUG_ON(!zone_is_initialized(zone)); + VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); + + VM_BUG_ON(migratetype == -1); + if (likely(!is_migrate_isolate(migratetype))) + __mod_zone_freepage_state(zone, 1 << order, migratetype); + + VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); + VM_BUG_ON_PAGE(bad_range(zone, page), page); + + if (can_skip_merge(zone, order)) { + add_to_buddy_head(page, zone, 0, migratetype); + set_page_merge_skipped(page); + return; + } + + do_merge(page, pfn, zone, order, migratetype); +} + + /* * A bad page could be due to a number of fields. Instead of multiple branches, * try and check multiple fields with one check. The caller must do a detailed @@ -1148,9 +1205,14 @@ static void free_pcppages_bulk(struct zone *zone, int count, * can be offset by reduced memory latency later. To * avoid excessive prefetching due to large count, only * prefetch buddy for the first pcp->batch nr of pages. + * + * If merge can be skipped, no need to prefetch buddy. 
*/ - if (prefetch_nr++ < pcp->batch) - prefetch_buddy(page); + if (can_skip_merge(zone, 0) || prefetch_nr > pcp->batch) + continue; + + prefetch_buddy(page); + prefetch_nr++; } while (--count && --batch_free && !list_empty(list)); }
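To make the deferred-merge idea above easier to follow outside the kernel sources, here is a small stand-alone C model of the policy: order-0 frees skip buddy merging entirely, and merging happens lazily only when a higher-order block is actually wanted (as compaction does via try_to_merge_page() in the patch). All names and the data layout below are simplifications invented for illustration; only the policy mirrors the patch.

/* deferred_merge.c - toy model of "skip merge for order-0, merge lazily".
 * Build: cc -o deferred_merge deferred_merge.c
 */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 16                  /* tiny "memory": 16 order-0 pages */

static bool page_free[NPAGES];     /* page sits on the order-0 free list */
static bool merge_skipped[NPAGES]; /* freed without attempting a merge */

/* Free an order-0 page: just mark it free, defer buddy merging. */
static void free_page_no_merge(int pfn)
{
        page_free[pfn] = true;
        merge_skipped[pfn] = true;
}

/* Merge one buddy pair lazily, when a higher order is actually needed. */
static bool try_merge_pair(int pfn)
{
        int buddy = pfn ^ 1;       /* order-0 buddy of pfn */

        if (page_free[pfn] && page_free[buddy]) {
                merge_skipped[pfn] = merge_skipped[buddy] = false;
                printf("merged %d and %d into an order-1 block\n", pfn, buddy);
                return true;
        }
        return false;
}

int main(void)
{
        /* Frees do no merging, so no buddy cache lines are touched here. */
        free_page_no_merge(4);
        free_page_no_merge(5);
        free_page_no_merge(9);

        /* Later, a higher-order request (think compaction) merges on demand. */
        try_merge_pair(4);         /* succeeds: 4 and 5 are both free */
        try_merge_pair(9);         /* fails: 8 is not free */
        return 0;
}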
From patchwork Wed Oct 17 06:33:28 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10644631
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan, Jesper Dangaard Brouer
Subject: [RFC v4 PATCH 3/5] mm/rmqueue_bulk: alloc without touching individual page structure
Date: Wed, 17 Oct 2018 14:33:28 +0800
Message-Id: <20181017063330.15384-4-aaron.lu@intel.com>
In-Reply-To: <20181017063330.15384-1-aaron.lu@intel.com>
References: <20181017063330.15384-1-aaron.lu@intel.com>

Profiling on an Intel Skylake server shows that the most time-consuming part under zone->lock on the allocation path is accessing the to-be-returned pages' "struct page" on the free_list inside zone->lock. One explanation is that different CPUs release pages to the head of the free_list, so those pages' 'struct page' may well be cache cold for the allocating CPU when it grabs them from the head of the free_list.
The purpose here is to avoid touching these pages one by one inside zone->lock. The idea is to take the requested number of pages off the free_list with something like list_cut_position(), adjust nr_free of the free_area accordingly while still holding zone->lock, and do the remaining per-page work, such as clearing the PageBuddy flag, outside of zone->lock.

list_cut_position() needs to know where to cut, and that is what the new 'struct cluster' provides. All pages on an order-0 free_list belong to a cluster, so when a number of pages is needed, the cluster to which the head page of the free_list belongs is looked up and the tail page of that cluster is found from it. With the tail page, list_cut_position() can drop the whole cluster off the free_list. 'struct cluster' also records 'nr', the number of pages in the cluster, so nr_free of the free_area can be adjusted inside the lock as well.

This opens a race window though: from the moment zone->lock is dropped until these pages' PageBuddy flags are cleared, the pages are no longer in buddy but still have the PageBuddy flag set. That is not a problem for users that access buddy pages through the free_list, but other users test the PageBuddy flag of pages derived from a PFN. move_freepages(), which moves a pageblock's pages from one migratetype to another on the fallback allocation path, is one such user; the end result could be that pages still in the race window are moved back onto the free_list of another migratetype. For this reason a synchronization function, zone_wait_cluster_alloc(), is introduced to wait until all such pages reach their correct state. It is meant to be called with zone->lock held, so after it returns there is no need to worry about new pages entering the racy state.

Another such user is compaction, which scans a pageblock for migratable candidates and checks the PageBuddy flag of pages derived from a PFN to decide whether a page is a merge-skipped page. To avoid a racy page getting merged back into buddy, zone_wait_and_disable_cluster_alloc() is introduced to: 1) disable clustered allocation by increasing zone->cluster.disable_depth; and 2) wait for the race window to pass by calling zone_wait_cluster_alloc(). This function is also meant to be called with zone->lock held, so after it returns all pages are in their correct state and no more cluster allocation will be attempted until zone_enable_cluster_alloc() is called to decrease zone->cluster.disable_depth.

Together with the previous patch, this could eliminate zone->lock contention entirely, but pgdat->lru_lock contention rose to 82% at the same time. Final performance increased by about 8.3%.

Suggested-by: Ying Huang
Suggested-by: Dave Hansen
Signed-off-by: Aaron Lu
---
 include/linux/mm_types.h | 19 +--
 include/linux/mmzone.h | 35 +++++
 mm/compaction.c | 4 +
 mm/internal.h | 34 +++++
 mm/page_alloc.c | 288 +++++++++++++++++++++++++++++++++++++--
 5 files changed, 363 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index aed93053ef6e..3abe1515502e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -85,8 +85,14 @@ struct page { */ struct list_head lru; /* See page-flags.h for PAGE_MAPPING_FLAGS */ - struct address_space *mapping; - pgoff_t index; /* Our offset within mapping. */ + union { + struct address_space *mapping; + struct cluster *cluster; + }; + union { + pgoff_t index; /* Our offset within mapping. */ + bool buddy_merge_skipped; + }; /** * @private: Mapping-private opaque data. * Usually used for buffer_heads if PagePrivate.
@@ -179,13 +185,8 @@ struct page { int units; /* SLOB */ }; - union { - /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ - atomic_t _refcount; - - /* For pages in Buddy: if skipped merging when added to Buddy */ - bool buddy_merge_skipped; - }; + /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ + atomic_t _refcount; #ifdef CONFIG_MEMCG struct mem_cgroup *mem_cgroup; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 1e22d96734e0..765567366ddb 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -356,6 +356,40 @@ enum zone_type { #ifndef __GENERATING_BOUNDS_H +struct cluster { + struct page *tail; /* tail page of the cluster */ + int nr; /* how many pages are in this cluster */ +}; + +struct order0_cluster { + /* order 0 cluster array, dynamically allocated */ + struct cluster *array; + /* + * order 0 cluster array length, also used to indicate if cluster + * allocation is enabled for this zone(cluster allocation is disabled + * for small zones whose batch size is smaller than 1, like DMA zone) + */ + int len; + /* + * smallest position from where we search for an + * empty cluster from the cluster array + */ + int zero_bit; + /* bitmap used to quickly locate an empty cluster from cluster array */ + unsigned long *bitmap; + + /* disable cluster allocation to avoid new pages becoming racy state. */ + unsigned long disable_depth; + + /* + * used to indicate if there are pages allocated in cluster mode + * still in racy state. Caller with zone->lock held could use helper + * function zone_wait_cluster_alloc() to wait all such pages to exit + * the race window. + */ + atomic_t in_progress; +}; + struct zone { /* Read-mostly fields */ @@ -460,6 +494,7 @@ struct zone { /* free areas of different sizes */ struct free_area free_area[MAX_ORDER]; + struct order0_cluster cluster; /* zone flags, see below */ unsigned long flags; diff --git a/mm/compaction.c b/mm/compaction.c index 0c9c7a30dde3..b732136dfc4c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1601,6 +1601,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro migrate_prep_local(); + zone_wait_and_disable_cluster_alloc(zone); + while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) { int err; @@ -1699,6 +1701,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro zone->compact_cached_free_pfn = free_pfn; } + zone_enable_cluster_alloc(zone); + count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned); count_compact_events(COMPACTFREE_SCANNED, cc->total_free_scanned); diff --git a/mm/internal.h b/mm/internal.h index c166735a559e..fb4e8f7976e5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -546,12 +546,46 @@ static inline bool can_skip_merge(struct zone *zone, int order) if (order) return false; + /* + * Clustered allocation is only disabled when high-order pages + * are needed, e.g. in compaction and CMA alloc, so we should + * also skip merging in that case. 
+ */ + if (zone->cluster.disable_depth) + return false; + return true; } + +static inline void zone_wait_cluster_alloc(struct zone *zone) +{ + while (atomic_read(&zone->cluster.in_progress)) + cpu_relax(); +} + +static inline void zone_wait_and_disable_cluster_alloc(struct zone *zone) +{ + unsigned long flags; + spin_lock_irqsave(&zone->lock, flags); + zone->cluster.disable_depth++; + zone_wait_cluster_alloc(zone); + spin_unlock_irqrestore(&zone->lock, flags); +} + +static inline void zone_enable_cluster_alloc(struct zone *zone) +{ + unsigned long flags; + spin_lock_irqsave(&zone->lock, flags); + zone->cluster.disable_depth--; + spin_unlock_irqrestore(&zone->lock, flags); +} #else /* CONFIG_COMPACTION */ static inline bool can_skip_merge(struct zone *zone, int order) { return false; } +static inline void zone_wait_cluster_alloc(struct zone *zone) {} +static inline void zone_wait_and_disable_cluster_alloc(struct zone *zone) {} +static inline void zone_enable_cluster_alloc(struct zone *zone) {} #endif /* CONFIG_COMPACTION */ #endif /* __MM_INTERNAL_H */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 76d471e0ab24..e60a248030dc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -707,6 +707,82 @@ static inline void set_page_order(struct page *page, unsigned int order) __SetPageBuddy(page); } +static inline struct cluster *new_cluster(struct zone *zone, int nr, + struct page *tail) +{ + struct order0_cluster *cluster = &zone->cluster; + int n = find_next_zero_bit(cluster->bitmap, cluster->len, cluster->zero_bit); + if (n == cluster->len) { + printk_ratelimited("node%d zone %s cluster used up\n", + zone->zone_pgdat->node_id, zone->name); + return NULL; + } + cluster->zero_bit = n; + set_bit(n, cluster->bitmap); + cluster->array[n].nr = nr; + cluster->array[n].tail = tail; + return &cluster->array[n]; +} + +static inline struct cluster *add_to_cluster_common(struct page *page, + struct zone *zone, struct page *neighbor) +{ + struct cluster *c; + + if (neighbor) { + int batch = this_cpu_ptr(zone->pageset)->pcp.batch; + c = neighbor->cluster; + if (c && c->nr < batch) { + page->cluster = c; + c->nr++; + return c; + } + } + + c = new_cluster(zone, 1, page); + if (unlikely(!c)) + return NULL; + + page->cluster = c; + return c; +} + +/* + * Add this page to the cluster where the previous head page belongs. + * Called after page is added to free_list(and becoming the new head). + */ +static inline void add_to_cluster_head(struct page *page, struct zone *zone, + int order, int mt) +{ + struct page *neighbor; + + if (order || !zone->cluster.len) + return; + + neighbor = page->lru.next == &zone->free_area[0].free_list[mt] ? + NULL : list_entry(page->lru.next, struct page, lru); + add_to_cluster_common(page, zone, neighbor); +} + +/* + * Add this page to the cluster where the previous tail page belongs. + * Called after page is added to free_list(and becoming the new tail). + */ +static inline void add_to_cluster_tail(struct page *page, struct zone *zone, + int order, int mt) +{ + struct page *neighbor; + struct cluster *c; + + if (order || !zone->cluster.len) + return; + + neighbor = page->lru.prev == &zone->free_area[0].free_list[mt] ? 
+ NULL : list_entry(page->lru.prev, struct page, lru); + c = add_to_cluster_common(page, zone, neighbor); + c->tail = page; +} + static inline void add_to_buddy_common(struct page *page, struct zone *zone, unsigned int order) { @@ -720,6 +796,7 @@ static inline void add_to_buddy_head(struct page *page, struct zone *zone, { add_to_buddy_common(page, zone, order); list_add(&page->lru, &zone->free_area[order].free_list[mt]); + add_to_cluster_head(page, zone, order, mt); } static inline void add_to_buddy_tail(struct page *page, struct zone *zone, @@ -727,6 +804,7 @@ static inline void add_to_buddy_tail(struct page *page, struct zone *zone, { add_to_buddy_common(page, zone, order); list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]); + add_to_cluster_tail(page, zone, order, mt); } static inline void rmv_page_order(struct page *page) @@ -735,9 +813,29 @@ static inline void rmv_page_order(struct page *page) set_page_private(page, 0); } +/* called before removed from free_list */ +static inline void remove_from_cluster(struct page *page, struct zone *zone) +{ + struct cluster *c = page->cluster; + if (!c) + return; + + page->cluster = NULL; + c->nr--; + if (!c->nr) { + int bit = c - zone->cluster.array; + c->tail = NULL; + clear_bit(bit, zone->cluster.bitmap); + if (bit < zone->cluster.zero_bit) + zone->cluster.zero_bit = bit; + } else if (page == c->tail) + c->tail = list_entry(page->lru.prev, struct page, lru); +} + static inline void remove_from_buddy(struct page *page, struct zone *zone, unsigned int order) { + remove_from_cluster(page, zone); list_del(&page->lru); zone->free_area[order].nr_free--; rmv_page_order(page); @@ -2098,6 +2196,17 @@ static int move_freepages(struct zone *zone, if (num_movable) *num_movable = 0; + /* + * Cluster alloced pages may have their PageBuddy flag unclear yet + * after dropping zone->lock in rmqueue_bulk() and steal here could + * move them back to free_list. So it's necessary to wait till all + * those pages have their flags properly cleared. + * + * We do not need to disable cluster alloc though since we already + * held zone->lock and no allocation could happen. 
+ */ + zone_wait_cluster_alloc(zone); + for (page = start_page; page <= end_page;) { if (!pfn_valid_within(page_to_pfn(page))) { page++; @@ -2122,8 +2231,10 @@ static int move_freepages(struct zone *zone, } order = page_order(page); + remove_from_cluster(page, zone); list_move(&page->lru, &zone->free_area[order].free_list[migratetype]); + add_to_cluster_head(page, zone, order, migratetype); page += 1 << order; pages_moved += 1 << order; } @@ -2272,7 +2383,9 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, single_page: area = &zone->free_area[current_order]; + remove_from_cluster(page, zone); list_move(&page->lru, &area->free_list[start_type]); + add_to_cluster_head(page, zone, current_order, start_type); } /* @@ -2533,6 +2646,145 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype) return page; } +static int __init zone_order0_cluster_init(void) +{ + struct zone *zone; + + for_each_zone(zone) { + int len, mt, batch; + unsigned long flags; + struct order0_cluster *cluster; + + if (!managed_zone(zone)) + continue; + + /* no need to enable cluster allocation for batch<=1 zone */ + preempt_disable(); + batch = this_cpu_ptr(zone->pageset)->pcp.batch; + preempt_enable(); + if (batch <= 1) + continue; + + cluster = &zone->cluster; + /* FIXME: possible overflow of int type */ + len = DIV_ROUND_UP(zone->managed_pages, batch); + cluster->array = vzalloc(len * sizeof(struct cluster)); + if (!cluster->array) + return -ENOMEM; + cluster->bitmap = vzalloc(DIV_ROUND_UP(len, BITS_PER_LONG) * + sizeof(unsigned long)); + if (!cluster->bitmap) + return -ENOMEM; + + spin_lock_irqsave(&zone->lock, flags); + cluster->len = len; + for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) { + struct page *page; + list_for_each_entry_reverse(page, + &zone->free_area[0].free_list[mt], lru) + add_to_cluster_head(page, zone, 0, mt); + } + spin_unlock_irqrestore(&zone->lock, flags); + } + + return 0; +} +subsys_initcall(zone_order0_cluster_init); + +static inline int __rmqueue_bulk_cluster(struct zone *zone, unsigned long count, + struct list_head *list, int mt) +{ + struct list_head *head = &zone->free_area[0].free_list[mt]; + int nr = 0; + + while (nr < count) { + struct page *head_page; + struct list_head *tail, tmp_list; + struct cluster *c; + int bit; + + head_page = list_first_entry_or_null(head, struct page, lru); + if (!head_page || !head_page->cluster) + break; + + c = head_page->cluster; + tail = &c->tail->lru; + + /* drop the cluster off free_list and attach to list */ + list_cut_position(&tmp_list, head, tail); + list_splice_tail(&tmp_list, list); + + nr += c->nr; + zone->free_area[0].nr_free -= c->nr; + + /* this cluster is empty now */ + c->tail = NULL; + c->nr = 0; + bit = c - zone->cluster.array; + clear_bit(bit, zone->cluster.bitmap); + if (bit < zone->cluster.zero_bit) + zone->cluster.zero_bit = bit; + } + + return nr; +} + +static inline int rmqueue_bulk_cluster(struct zone *zone, unsigned int order, + unsigned long count, struct list_head *list, + int migratetype) +{ + int alloced; + struct page *page; + + /* + * Cluster alloc races with merging so don't try cluster alloc when we + * can't skip merging. Note that can_skip_merge() keeps the same return + * value from here till all pages have their flags properly processed, + * i.e. 
the end of the function where in_progress is incremented, even + * we have dropped the lock in the middle because the only place that + * can change can_skip_merge()'s return value is compaction code and + * compaction needs to wait on in_progress. + */ + if (!can_skip_merge(zone, 0)) + return 0; + + /* Cluster alloc is disabled, mostly compaction is already in progress */ + if (zone->cluster.disable_depth) + return 0; + + /* Cluster alloc is disabled for this zone */ + if (unlikely(!zone->cluster.len)) + return 0; + + alloced = __rmqueue_bulk_cluster(zone, count, list, migratetype); + if (!alloced) + return 0; + + /* + * Cache miss on page structure could slow things down + * dramatically so accessing these alloced pages without + * holding lock for better performance. + * + * Since these pages still have PageBuddy set, there is a race + * window between now and when PageBuddy is cleared for them + * below. Any operation that would scan a pageblock and check + * PageBuddy(page), e.g. compaction, will need to wait till all + * such pages are properly processed. in_progress is used for + * such purpose so increase it now before dropping the lock. + */ + atomic_inc(&zone->cluster.in_progress); + spin_unlock(&zone->lock); + + list_for_each_entry(page, list, lru) { + rmv_page_order(page); + page->cluster = NULL; + set_pcppage_migratetype(page, migratetype); + } + atomic_dec(&zone->cluster.in_progress); + + return alloced; +} + /* * Obtain a specified number of elements from the buddy allocator, all under * a single hold of the lock, for efficiency. Add them to the supplied list. @@ -2542,17 +2794,23 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype) { - int i, alloced = 0; + int i, alloced; + struct page *page, *tmp; spin_lock(&zone->lock); - for (i = 0; i < count; ++i) { - struct page *page = __rmqueue(zone, order, migratetype); + alloced = rmqueue_bulk_cluster(zone, order, count, list, migratetype); + if (alloced > 0) { + if (alloced >= count) + goto out; + else + spin_lock(&zone->lock); + } + + for (; alloced < count; alloced++) { + page = __rmqueue(zone, order, migratetype); if (unlikely(page == NULL)) break; - if (unlikely(check_pcp_refill(page))) - continue; - /* * Split buddy pages returned by expand() are received here in * physical page order. The page is added to the tail of @@ -2564,7 +2822,18 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->lru, list); - alloced++; + } + spin_unlock(&zone->lock); + +out: + i = alloced; + list_for_each_entry_safe(page, tmp, list, lru) { + if (unlikely(check_pcp_refill(page))) { + list_del(&page->lru); + alloced--; + continue; + } + if (is_migrate_cma(get_pcppage_migratetype(page))) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); @@ -2577,7 +2846,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages added to the pcp list. 
*/ __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); - spin_unlock(&zone->lock); return alloced; } @@ -7925,6 +8193,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, unsigned long outer_start, outer_end; unsigned int order; int ret = 0; + struct zone *zone = page_zone(pfn_to_page(start)); struct compact_control cc = { .nr_migratepages = 0, @@ -7967,6 +8236,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, if (ret) return ret; + zone_wait_and_disable_cluster_alloc(zone); /* * In case of -EBUSY, we'd like to know which page causes problem. * So, just fall through. test_pages_isolated() has a tracepoint @@ -8049,6 +8319,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, done: undo_isolate_page_range(pfn_max_align_down(start), pfn_max_align_up(end), migratetype); + + zone_enable_cluster_alloc(zone); return ret; }
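The core trick of this patch, grabbing a whole run of pages off the head of the free list in one cut and deferring the per-page work until after zone->lock is dropped, can be illustrated with a small stand-alone C model. The names cluster, tail and nr mirror the patch, but the list handling and everything else here is a deliberately simplified, hypothetical illustration rather than the kernel implementation.

/* cluster_grab.c - toy model of grabbing a "cluster" of free pages off the
 * head of a free list in one O(1) cut, instead of unlinking them one by one.
 * Build: cc -o cluster_grab cluster_grab.c
 */
#include <stdio.h>

struct pg {                 /* stand-in for struct page on the free list */
        int pfn;
        struct pg *next;
};

struct cluster {            /* what the patch's struct cluster records */
        struct pg *tail;    /* last page of the cluster on the list */
        int nr;             /* how many pages the cluster holds */
};

/* Cut cluster->nr pages off the head of *head in O(1): everything up to
 * cluster->tail leaves the free list at once; only the bookkeeping is done
 * under the "lock", per-page work can happen after it is dropped. */
static struct pg *grab_cluster(struct pg **head, struct cluster *c, int *nr_free)
{
        struct pg *first = *head;

        *head = c->tail->next;  /* free list now starts after the cluster */
        c->tail->next = NULL;   /* detach the grabbed run */
        *nr_free -= c->nr;
        return first;
}

int main(void)
{
        struct pg pages[5] = {
                { 10, &pages[1] }, { 11, &pages[2] }, { 12, &pages[3] },
                { 13, &pages[4] }, { 14, NULL },
        };
        struct pg *free_list = &pages[0];
        struct cluster c = { .tail = &pages[2], .nr = 3 };  /* pages 10..12 */
        int nr_free = 5;

        struct pg *got = grab_cluster(&free_list, &c, &nr_free);
        for (struct pg *p = got; p; p = p->next)
                printf("allocated pfn %d\n", p->pfn);
        printf("free pages left: %d (list head now pfn %d)\n",
               nr_free, free_list->pfn);
        return 0;
}

In the kernel version, the equivalent of grab_cluster() runs under zone->lock using list_cut_position(), and zone->cluster.in_progress is raised across the window in which the detached pages still carry PageBuddy, so that move_freepages() and compaction can wait the race window out.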
From patchwork Wed Oct 17 06:33:29 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10644633
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan, Jesper Dangaard Brouer
Subject: [RFC v4 PATCH 4/5] mm/free_pcppages_bulk: reduce overhead of cluster operation on free path
Date: Wed, 17 Oct 2018 14:33:29 +0800
Message-Id: <20181017063330.15384-5-aaron.lu@intel.com>
In-Reply-To: <20181017063330.15384-1-aaron.lu@intel.com>
References: <20181017063330.15384-1-aaron.lu@intel.com>

After "no_merge for order 0", the biggest overhead on the free path for order-0 pages is add_to_cluster(): because pages are freed one by one, it is called once per page. If only one migratetype pcp list has pages to free and count equals pcp->batch in free_pcppages_bulk(), we can avoid calling add_to_cluster() once per page and instead add the whole batch in one go as a single cluster, which is what this patch does. This optimization brings zone->lock contention down from 25% to almost zero again with the parallel free workload.
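The condition that gates the batched path can be sketched as below. count_mt[] stands for the per-migratetype tally the patch accumulates while draining the pcp lists; the function name is invented for illustration, and the real code additionally requires can_skip_merge() and no isolated pageblocks.

#include <stdbool.h>

#define MIGRATE_PCPTYPES 3

/* True when a single migratetype supplied every page in this batch, so the
 * whole batch can become one cluster instead of calling add_to_cluster()
 * once per page. */
static bool batch_is_single_migratetype(const int count_mt[MIGRATE_PCPTYPES],
                                        int batch_count)
{
        for (int i = 0; i < MIGRATE_PCPTYPES; i++) {
                if (count_mt[i] == batch_count)
                        return true;
        }
        return false;
}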
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e60a248030dc..204696f6c2f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1242,6 +1242,36 @@ static inline void prefetch_buddy(struct page *page)
 	prefetch(buddy);
 }
 
+static inline bool free_cluster_pages(struct zone *zone, struct list_head *list,
+				      int mt, int count)
+{
+	struct cluster *c;
+	struct page *page, *n;
+
+	if (!can_skip_merge(zone, 0))
+		return false;
+
+	if (count != this_cpu_ptr(zone->pageset)->pcp.batch)
+		return false;
+
+	c = new_cluster(zone, count, list_first_entry(list, struct page, lru));
+	if (unlikely(!c))
+		return false;
+
+	list_for_each_entry_safe(page, n, list, lru) {
+		set_page_order(page, 0);
+		set_page_merge_skipped(page);
+		page->cluster = c;
+		list_add(&page->lru, &zone->free_area[0].free_list[mt]);
+	}
+
+	INIT_LIST_HEAD(list);
+	zone->free_area[0].nr_free += count;
+	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
+
+	return true;
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
@@ -1256,10 +1286,10 @@ static inline void prefetch_buddy(struct page *page)
 static void free_pcppages_bulk(struct zone *zone, int count,
 					struct per_cpu_pages *pcp)
 {
-	int migratetype = 0;
-	int batch_free = 0;
+	int migratetype = 0, i, count_mt[MIGRATE_PCPTYPES] = {0};
+	int batch_free = 0, saved_count = count;
 	int prefetch_nr = 0;
-	bool isolated_pageblocks;
+	bool isolated_pageblocks, single_mt = false;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
 
@@ -1283,6 +1313,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		/* This is the only non-empty list. Free them all. */
 		if (batch_free == MIGRATE_PCPTYPES)
 			batch_free = count;
+		count_mt[migratetype] += batch_free;
 
 		do {
 			page = list_last_entry(list, struct page, lru);
@@ -1314,12 +1345,24 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (--count && --batch_free && !list_empty(list));
 	}
 
+	for (i = 0; i < MIGRATE_PCPTYPES; i++) {
+		if (count_mt[i] == saved_count) {
+			single_mt = true;
+			break;
+		}
+	}
+
 	spin_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);
 
+	if (!isolated_pageblocks && single_mt)
+		free_cluster_pages(zone, &head, migratetype, saved_count);
+
 	/*
 	 * Use safe version since after __free_one_page(),
 	 * page->lru.next will not point to original list.
+	 *
+	 * If free_cluster_pages() succeeds, head will be an empty list here.
 	 */
 	list_for_each_entry_safe(page, tmp, &head, lru) {
 		int mt = get_pcppage_migratetype(page);

From patchwork Wed Oct 17 06:33:30 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10644635
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan, Jesper Dangaard Brouer
Subject: [RFC v4 PATCH 5/5] mm/can_skip_merge(): make it more aggressive to attempt cluster alloc/free
Date: Wed, 17 Oct 2018 14:33:30 +0800
Message-Id: <20181017063330.15384-6-aaron.lu@intel.com>
In-Reply-To: <20181017063330.15384-1-aaron.lu@intel.com>
References: <20181017063330.15384-1-aaron.lu@intel.com>

After the system has run for a long time, it is easy for a zone to have
no suitable high-order page available, and in the current implementation
that stops cluster alloc and free because compact_considered > 0. To
favour order-0 alloc/free, relax the condition so that cluster alloc/free
is only disallowed when a problem would actually occur, e.g. when
compaction is in progress.
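For illustration only, here is a toy userspace model of the relaxed gate,
not the kernel function: can_skip_merge_old()/can_skip_merge_new() and
struct toy_zone are made-up stand-ins, and the trailing "return order == 0"
merely models whatever checks remain in the real can_skip_merge() beyond
the hunk shown below.

```c
/*
 * Toy model (not kernel code): before this patch, any non-zero
 * compact_considered disabled clustering; afterwards only an order > 0
 * request (plus, per the changelog, compaction actually being in progress,
 * handled elsewhere in the series) disables it.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_zone {
	unsigned int compact_considered;	/* deferred-compaction counter */
};

static bool can_skip_merge_old(const struct toy_zone *z, int order)
{
	if (z->compact_considered)	/* any past compaction pressure blocks clustering */
		return false;
	return order == 0;		/* stand-in for the remaining checks */
}

static bool can_skip_merge_new(const struct toy_zone *z, int order)
{
	(void)z;			/* compact_considered no longer consulted */
	return order == 0;		/* stand-in for the remaining checks */
}

int main(void)
{
	struct toy_zone z = { .compact_considered = 5 };

	/* A zone with past compaction pressure can still use cluster free now. */
	printf("old: %d, new: %d\n",
	       can_skip_merge_old(&z, 0), can_skip_merge_new(&z, 0));
	return 0;
}
```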
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/internal.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index fb4e8f7976e5..309a3f43e613 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -538,10 +538,6 @@ void try_to_merge_page(struct page *page);
 #ifdef CONFIG_COMPACTION
 static inline bool can_skip_merge(struct zone *zone, int order)
 {
-	/* Compaction has failed in this zone, we shouldn't skip merging */
-	if (zone->compact_considered)
-		return false;
-
 	/* Only consider no_merge for order 0 pages */
 	if (order)
 		return false;