From patchwork Sat Apr 29 08:27:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Yin, Fengwei"
X-Patchwork-Id: 13226930
From: Yin Fengwei
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
 kirill@shutemov.name, yuzhao@google.com, ryan.roberts@arm.com,
 ying.huang@intel.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v3 2/2] lru: allow large batched add large folio to lru list
Date: Sat, 29 Apr 2023 16:27:59 +0800
Message-Id: <20230429082759.1600796-3-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20230429082759.1600796-1-fengwei.yin@intel.com>
References: <20230429082759.1600796-1-fengwei.yin@intel.com>

Currently, large folios are not added to the lru list in batches, which
causes high lru lock contention once large folios are enabled for
anonymous mappings.

Running page_fault1 from will-it-scale with order-2 folios and 96
processes on an Ice Lake 48C/96T system, lru lock contention is around
64%:

-   64.31%     0.23%  page_fault1_pro  [kernel.kallsyms]  [k] folio_lruvec_lock_irqsave
   - 64.07% folio_lruvec_lock_irqsave
      + 64.01% _raw_spin_lock_irqsave

With this patch, lru lock contention drops to about 43% in the same
test:

-   42.67%     0.19%  page_fault1_pro  [kernel.kallsyms]  [k] folio_lruvec_lock_irqsave
   - 42.48% folio_lruvec_lock_irqsave
      + 42.42% _raw_spin_lock_irqsave

Reported-by: "Huang, Ying"
Signed-off-by: Yin Fengwei
---
 include/linux/pagevec.h | 46 ++++++++++++++++++++++++++++++++++++++---
 mm/mlock.c              |  7 +++----
 mm/swap.c               |  3 +--
 3 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index f582f7213ea5..9479b7b50bc6 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -10,6 +10,7 @@
 #define _LINUX_PAGEVEC_H
 
 #include
+#include
 
 /* 15 pointers + header align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE	15
@@ -22,6 +23,7 @@ struct address_space;
 struct pagevec {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short nr_pages;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -30,12 +32,14 @@ void __pagevec_release(struct pagevec *pvec);
 static inline void pagevec_init(struct pagevec *pvec)
 {
 	pvec->nr = 0;
+	pvec->nr_pages = 0;
 	pvec->percpu_pvec_drained = false;
 }
 
 static inline void pagevec_reinit(struct pagevec *pvec)
 {
 	pvec->nr = 0;
+	pvec->nr_pages = 0;
 }
 
 static inline unsigned pagevec_count(struct pagevec *pvec)
@@ -54,7 +58,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
 static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
 {
 	pvec->pages[pvec->nr++] = page;
-	return pagevec_space(pvec);
+	pvec->nr_pages += compound_nr(page);
+
+	if (pvec->nr_pages > PAGEVEC_SIZE)
+		return 0;
+	else
+		return pagevec_space(pvec);
 }
 
 static inline void pagevec_release(struct pagevec *pvec)
@@ -75,6 +84,7 @@ static inline void pagevec_release(struct pagevec *pvec)
 struct folio_batch {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short nr_pages;
 	struct folio *folios[PAGEVEC_SIZE];
 };
 
@@ -92,12 +102,14 @@ static_assert(offsetof(struct pagevec, pages) ==
 static inline void folio_batch_init(struct folio_batch *fbatch)
 {
 	fbatch->nr = 0;
+	fbatch->nr_pages = 0;
 	fbatch->percpu_pvec_drained = false;
 }
 
 static inline void folio_batch_reinit(struct folio_batch *fbatch)
 {
 	fbatch->nr = 0;
+	fbatch->nr_pages = 0;
 }
 
 static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -110,6 +122,32 @@ static inline unsigned int fbatch_space(struct folio_batch *fbatch)
 	return PAGEVEC_SIZE - fbatch->nr;
 }
 
+/**
+ * folio_batch_add_nr_pages() - Add a folio to a batch.
+ * @fbatch: The folio batch.
+ * @folio: The folio to add.
+ * @nr_pages: The number of pages added to batch.
+ *
+ * The folio is added to the end of the batch.
+ * The batch must have previously been initialised using folio_batch_init().
+ *
+ * Return: The number of slots still available.
+ * Note: parameter folio may not be direct reference to folio and can't
+ *       use folio_nr_pages(folio).
+ *       Currently, this function is only called in mlock.c.
+ */
+static inline unsigned folio_batch_add_nr_pages(struct folio_batch *fbatch,
+		struct folio *folio, unsigned int nr_pages)
+{
+	fbatch->folios[fbatch->nr++] = folio;
+	fbatch->nr_pages += nr_pages;
+
+	if (fbatch->nr_pages > PAGEVEC_SIZE)
+		return 0;
+	else
+		return fbatch_space(fbatch);
+}
+
 /**
  * folio_batch_add() - Add a folio to a batch.
  * @fbatch: The folio batch.
@@ -123,8 +161,10 @@ static inline unsigned int fbatch_space(struct folio_batch *fbatch)
 static inline unsigned folio_batch_add(struct folio_batch *fbatch,
 		struct folio *folio)
 {
-	fbatch->folios[fbatch->nr++] = folio;
-	return fbatch_space(fbatch);
+	unsigned int nr_pages;
+
+	nr_pages = xa_is_value(folio) ? 1 : folio_nr_pages(folio);
+	return folio_batch_add_nr_pages(fbatch, folio, nr_pages);
 }
 
 static inline void folio_batch_release(struct folio_batch *fbatch)
diff --git a/mm/mlock.c b/mm/mlock.c
index 617469fce96d..6de3e6d4639f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -243,19 +243,18 @@ bool need_mlock_drain(int cpu)
 void mlock_folio(struct folio *folio)
 {
 	struct folio_batch *fbatch;
+	unsigned int nr_pages = folio_nr_pages(folio);
 
 	local_lock(&mlock_fbatch.lock);
 	fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
 
 	if (!folio_test_set_mlocked(folio)) {
-		int nr_pages = folio_nr_pages(folio);
-
 		zone_stat_mod_folio(folio, NR_MLOCK, nr_pages);
 		__count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 	}
 
 	folio_get(folio);
-	if (!folio_batch_add(fbatch, mlock_lru(folio)) ||
+	if (!folio_batch_add_nr_pages(fbatch, mlock_lru(folio), nr_pages) ||
 	    folio_test_large(folio) || lru_cache_disabled())
 		mlock_folio_batch(fbatch);
 	local_unlock(&mlock_fbatch.lock);
@@ -278,7 +277,7 @@ void mlock_new_folio(struct folio *folio)
 	__count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 
 	folio_get(folio);
-	if (!folio_batch_add(fbatch, mlock_new(folio)) ||
+	if (!folio_batch_add_nr_pages(fbatch, mlock_new(folio), nr_pages) ||
 	    folio_test_large(folio) || lru_cache_disabled())
 		mlock_folio_batch(fbatch);
 	local_unlock(&mlock_fbatch.lock);
diff --git a/mm/swap.c b/mm/swap.c
index 57cb01b042f6..0f8554aeb338 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -228,8 +228,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
 		struct folio *folio, move_fn_t move_fn)
 {
-	if (folio_batch_add(fbatch, folio) && !folio_test_large(folio) &&
-	    !lru_cache_disabled())
+	if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
 		return;
 	folio_batch_move_lru(fbatch, move_fn);
 }