From patchwork Wed Jul 17 07:12:53 2024
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
    David Hildenbrand, Barry Song, Lance Yang, Baolin Wang, Gavin Shan,
    Pankaj Raghav, Daniel Gomez
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 1/4] mm: mTHP user controls to configure pagecache large folio sizes
Date: Wed, 17 Jul 2024 08:12:53 +0100
Message-ID: <20240717071257.4141363-2-ryan.roberts@arm.com>
In-Reply-To: <20240717071257.4141363-1-ryan.roberts@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com>

Add mTHP controls to sysfs to allow user space to configure the folio
sizes that can be considered for allocation of file-backed memory:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/file_enabled

For now, the control can be set to either `always` or `never` to enable
or disable that size. More options may be added in future.

By default, at boot, all folio sizes are enabled, and the algorithm used
to select a folio size remains conceptually unchanged: increase by 2
enabled orders each time a readahead marker is hit, then reduce to the
closest enabled order to fit within the bounds of the readahead size,
index alignment and EOF. So when all folio sizes are enabled, behaviour
should be unchanged. When folio sizes are disabled, the algorithm will
never select them.

Systems such as Android are always under extreme memory pressure, and as
a result fragmentation often causes attempts to allocate large folios to
fail and fall back to smaller folios. By fixing the pagecache to one
large folio size (e.g. 64K) plus fallback to small folios, a large source
of this fragmentation can be removed and 64K mTHP allocations succeed
more often, allowing the system to benefit from improved performance on
arm64 and other arches that support "contpte".
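As an illustration of the "reduce to the closest enabled order" step, the
following is a minimal userspace sketch (not the kernel implementation;
clamp_to_enabled() is an invented name and 4K base pages are assumed):

#include <stdio.h>

/* Hypothetical helper: closest enabled order <= max_order. */
static int clamp_to_enabled(unsigned long enabled_orders, int max_order)
{
	/* Keep only orders 0..max_order. */
	unsigned long orders = enabled_orders & ((1UL << (max_order + 1)) - 1);

	/* Order-0 (a single page) is always a valid fallback. */
	orders |= 1UL;

	/* The highest remaining set bit is the closest enabled order. */
	return 63 - __builtin_clzl(orders);
}

int main(void)
{
	/* Example: only 64K (order 4 with 4K pages) is enabled. */
	unsigned long enabled = 1UL << 4;

	printf("want order 6 -> order %d\n", clamp_to_enabled(enabled, 6)); /* 4 */
	printf("want order 3 -> order %d\n", clamp_to_enabled(enabled, 3)); /* 0 */
	return 0;
}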
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++++++
 include/linux/huge_mm.h                    | 50 +++++++++++++---------
 mm/filemap.c                               | 15 ++++---
 mm/huge_memory.c                           | 43 +++++++++++++++++++
 mm/readahead.c                             | 43 +++++++++++++++----
 5 files changed, 138 insertions(+), 34 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index d4857e457add..9f3ed504c646 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -284,6 +284,27 @@ that THP is shared. Exceeding the number would block the collapse::
 
 A higher value may increase memory footprint for some workloads.
 
+File-Backed Hugepages
+---------------------
+
+The kernel will automatically select an appropriate THP size for file-backed
+memory from a set of allowed sizes. By default all THP sizes that the page
+cache supports are allowed, but this set can be modified with one of::
+
+	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/file_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/file_enabled
+
+where <size> is the hugepage size being addressed, the available sizes for
+which vary by system. ``always`` adds the hugepage size to the set of allowed
+sizes, and ``never`` removes the hugepage size from the set of allowed sizes.
+
+In some situations, constraining the allowed sizes can reduce memory
+fragmentation, resulting in fewer allocation fallbacks and improved system
+performance.
+
+Note that any changes to the allowed set of sizes only apply to future
+file-backed THP allocations.
+
 Boot parameter
 ==============
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4f9109fcdded..19ced8192d39 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -114,6 +114,24 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define HPAGE_PUD_MASK	(~(HPAGE_PUD_SIZE - 1))
 #define HPAGE_PUD_SIZE	((1UL) << HPAGE_PUD_SHIFT)
 
+static inline int lowest_order(unsigned long orders)
+{
+	if (orders)
+		return __ffs(orders);
+	return -1;
+}
+
+static inline int highest_order(unsigned long orders)
+{
+	return fls_long(orders) - 1;
+}
+
+static inline int next_order(unsigned long *orders, int prev)
+{
+	*orders &= ~BIT(prev);
+	return highest_order(*orders);
+}
+
 enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
@@ -158,6 +176,12 @@ extern unsigned long transparent_hugepage_flags;
 extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
+extern unsigned long huge_file_orders_always;
+
+static inline unsigned long file_orders_always(void)
+{
+	return READ_ONCE(huge_file_orders_always);
+}
 
 static inline bool hugepage_global_enabled(void)
 {
@@ -172,17 +196,6 @@ static inline bool hugepage_global_always(void)
 			(1<<TRANSPARENT_HUGEPAGE_FLAG);
 }
 
-static inline int highest_order(unsigned long orders)
-{
-	return fls_long(orders) - 1;
-}
-
-static inline int next_order(unsigned long *orders, int prev)
-{
-	*orders &= ~BIT(prev);
-	return highest_order(*orders);
-}
-
 /*
  * Do the below checks:
diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
-	if (order > MAX_PAGECACHE_ORDER)
-		order = MAX_PAGECACHE_ORDER;
+
+	orders = file_orders_always() | BIT(0);
+	orders &= BIT(order + 1) - 1;
 	/* If we're not aligned, allocate a smaller folio */
 	if (index & ((1UL << order) - 1))
-		order = __ffs(index);
+		orders &= BIT(__ffs(index) + 1) - 1;
+	order = highest_order(orders);
 
-	do {
+	while (orders) {
 		gfp_t alloc_gfp = gfp;
 
 		err = -ENOMEM;
@@ -1962,7 +1965,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			break;
 		folio_put(folio);
 		folio = NULL;
-	} while (order-- > 0);
+
+		order = next_order(&orders, order);
+	};
 
 	if (err == -EEXIST)
 		goto repeat;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 26d558e3e80f..e8fe28fe9cf9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -80,6 +80,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+unsigned long huge_file_orders_always __read_mostly;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -525,6 +526,37 @@ static ssize_t anon_enabled_store(struct kobject *kobj,
 	return ret;
 }
 
+static ssize_t file_enabled_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	const char *output;
+
+	if (test_bit(order, &huge_file_orders_always))
+		output = "[always] never";
+	else
+		output = "always [never]";
+
+	return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t file_enabled_store(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	ssize_t ret = count;
+
+	if (sysfs_streq(buf, "always"))
+		set_bit(order, &huge_file_orders_always);
+	else if (sysfs_streq(buf, "never"))
+		clear_bit(order, &huge_file_orders_always);
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static struct kobj_attribute anon_enabled_attr =
 	__ATTR(enabled, 0644, anon_enabled_show, anon_enabled_store);
 
@@ -537,7 +569,11 @@ static const struct attribute_group anon_ctrl_attr_grp = {
 	.attrs = anon_ctrl_attrs,
 };
 
+static struct kobj_attribute file_enabled_attr =
+	__ATTR(file_enabled, 0644, file_enabled_show, file_enabled_store);
+
 static struct attribute *file_ctrl_attrs[] = {
+	&file_enabled_attr.attr,
 #ifdef CONFIG_SHMEM
 	&thpsize_shmem_enabled_attr.attr,
 #endif
@@ -712,6 +748,13 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 */
 	huge_anon_orders_inherit = BIT(PMD_ORDER);
 
+	/*
+	 * For pagecache, default to enabling all orders. powerpc's PMD_ORDER
+	 * (and therefore THP_ORDERS_ALL_FILE_DEFAULT) isn't a compile-time
+	 * constant so we have to do this here.
+	 */
+	huge_file_orders_always = THP_ORDERS_ALL_FILE_DEFAULT;
+
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
 		pr_err("failed to create transparent hugepage kobject\n");
diff --git a/mm/readahead.c b/mm/readahead.c
index 517c0be7ce66..e05f85974396 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -432,6 +432,34 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 	return 0;
 }
 
+static int select_new_order(int old_order, int max_order, unsigned long orders)
+{
+	unsigned long hi_orders, lo_orders;
+
+	/*
+	 * Select the next order to use from the set in `orders`, while
+	 * ensuring we don't go above max_order. Prefer the next + 1 highest
+	 * allowed order after old_order, unless there isn't one, in which
+	 * case return the closest allowed order, which is either the next
+	 * highest allowed order or less than or equal to old_order. The
+	 * "next + 1" skip behaviour is intended to allow ramping up to large
+	 * folios quickly.
+	 */
+
+	orders &= BIT(max_order + 1) - 1;
+	VM_WARN_ON(!orders);
+	hi_orders = orders & ~(BIT(old_order + 1) - 1);
+
+	if (hi_orders) {
+		old_order = lowest_order(hi_orders);
+		hi_orders &= ~BIT(old_order);
+		if (hi_orders)
+			return lowest_order(hi_orders);
+	}
+
+	lo_orders = orders & (BIT(old_order + 1) - 1);
+	return highest_order(lo_orders);
+}
+
 void page_cache_ra_order(struct readahead_control *ractl,
 		struct file_ra_state *ra, unsigned int new_order)
 {
@@ -443,17 +471,15 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
+	unsigned long orders;
 
-	if (!mapping_large_folio_support(mapping) || ra->size < 4)
+	if (!mapping_large_folio_support(mapping))
 		goto fallback;
 
 	limit = min(limit, index + ra->size - 1);
 
-	if (new_order < MAX_PAGECACHE_ORDER)
-		new_order += 2;
-
-	new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
-	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
+	orders = file_orders_always() | BIT(0);
+	new_order = select_new_order(new_order, ilog2(ra->size), orders);
 
 	/* See comment in page_cache_ra_unbounded() */
 	nofs = memalloc_nofs_save();
@@ -463,9 +489,10 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 		/* Align with smaller pages if needed */
 		if (index & ((1UL << order) - 1))
-			order = __ffs(index);
+			order = select_new_order(order, __ffs(index), orders);
 
 		/* Don't allocate pages past EOF */
-		while (index + (1UL << order) - 1 > limit)
+		while (index + (1UL << order) - 1 > limit &&
+		       (BIT(order) & orders) == 0)
 			order--;
 
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
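For reviewers who want to poke at the ramp-up behaviour of
select_new_order() above, the logic can be exercised outside the kernel.
This is a hypothetical standalone harness that mirrors the function body,
with BIT(), lowest_order() and highest_order() re-implemented locally and
VM_WARN_ON() swapped for assert():

#include <assert.h>
#include <stdio.h>

#define BIT(n) (1UL << (n))

/* Local stand-ins for the kernel helpers introduced in this series. */
static int lowest_order(unsigned long orders)
{
	return orders ? __builtin_ctzl(orders) : -1;
}

static int highest_order(unsigned long orders)
{
	return 63 - __builtin_clzl(orders);
}

/* Mirror of select_new_order() from the patch above. */
static int select_new_order(int old_order, int max_order, unsigned long orders)
{
	unsigned long hi_orders, lo_orders;

	orders &= BIT(max_order + 1) - 1;
	assert(orders);
	hi_orders = orders & ~(BIT(old_order + 1) - 1);

	if (hi_orders) {
		old_order = lowest_order(hi_orders);
		hi_orders &= ~BIT(old_order);
		if (hi_orders)
			return lowest_order(hi_orders);
	}

	lo_orders = orders & (BIT(old_order + 1) - 1);
	return highest_order(lo_orders);
}

int main(void)
{
	/* Orders 0, 2, 4 and 6 enabled. */
	unsigned long orders = BIT(0) | BIT(2) | BIT(4) | BIT(6);

	/* Ramp-up: from order 0, skip order 2 and land on order 4. */
	printf("%d\n", select_new_order(0, 6, orders)); /* 4 */
	/* Only one higher enabled order exists, so take it. */
	printf("%d\n", select_new_order(4, 6, orders)); /* 6 */
	/* max_order caps the result: closest enabled order wins. */
	printf("%d\n", select_new_order(4, 3, orders)); /* 2 */
	return 0;
}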
From patchwork Wed Jul 17 07:12:54 2024
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [RFC PATCH v1 2/4] mm: Introduce "always+exec" for mTHP file_enabled control
Date: Wed, 17 Jul 2024 08:12:54 +0100
Message-ID: <20240717071257.4141363-3-ryan.roberts@arm.com>
In-Reply-To: <20240717071257.4141363-1-ryan.roberts@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com>

In addition to `always` and `never`, add `always+exec` as an option for:
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/file_enabled

`always+exec` acts like `always` but additionally marks the hugepage size
as the preferred hugepage size for sections of any file mapped with
execute permission. A maximum of one hugepage size can be marked as `exec`
at a time, so applying it to a new size implicitly removes it from any
size it was previously set for.

Change readahead to use this flagged exec size; when a request is made for
an executable mapping, do a synchronous read of the size in a naturally
aligned manner.

On arm64, if memory is physically contiguous and naturally aligned to the
"contpte" size, we can use contpte mappings, which improves utilization of
the TLB. When paired with the "multi-size THP" changes, this works well to
reduce dTLB pressure. However, iTLB pressure is still high due to
executable mappings having a low likelihood of being in the required folio
size and mapping alignment, even when the filesystem supports readahead
into large folios (e.g. XFS).

The reason for the low likelihood is that the current readahead algorithm
starts with an order-2 folio and increases the folio order by 2 every time
the readahead mark is hit. But most executable memory is faulted in fairly
randomly, so the readahead mark is rarely hit and most executable folios
remain order-2. This is observed empirically and confirmed in discussion
with a GNU linker expert; in general, the linker does nothing to group
temporally accessed text together spatially. Additionally, with the
current read-around approach there are no alignment guarantees between the
file and the folio. This is insufficient for arm64's contpte mapping
requirement (order-4 for 4K base pages).

So it seems reasonable to special-case the read(ahead) logic for
executable mappings. The trade-off is performance improvement (due to more
efficient storage of the translations in the iTLB) vs potential read
amplification (due to reading too much data around the fault which won't
be used), and the latter is independent of base page size. Of course, if
no hugepage size is marked as `always+exec`, the old behaviour is
maintained.

Performance Benchmarking
------------------------

The below shows kernel compilation and speedometer javascript benchmarks
on an Ampere Altra arm64 system. When the patch is applied, `always+exec`
is set for 64K folios.

First, confirmation that this patch causes more memory to be contained in
64K folios (this is for all file-backed memory, so includes non-executable
memory too):

| File-backed folios       | Speedometer     | Kernel Compile  |
| by size as percentage    |-----------------|-----------------|
| of all mapped file mem   | before | after  | before | after  |
|==========================|========|========|========|========|
| file-thp-aligned-16kB    |    45% |     9% |    46% |     7% |
| file-thp-aligned-32kB    |     2% |     0% |     3% |     1% |
| file-thp-aligned-64kB    |     3% |    63% |     5% |    80% |
| file-thp-aligned-128kB   |    11% |    11% |     0% |     0% |
| file-thp-unaligned-16kB  |     1% |     0% |     3% |     1% |
| file-thp-unaligned-128kB |     1% |     0% |     0% |     0% |
| file-thp-partial         |     0% |     0% |     0% |     0% |
|--------------------------|--------|--------|--------|--------|
| file-cont-aligned-64kB   |    16% |    75% |     5% |    80% |

The above shows that, for both use cases, the amount of file memory backed
by 16K folios reduces and the amount backed by 64K folios increases
significantly. And the amount of memory that is contpte-mapped
significantly increases (last line).
And this is reflected in performance improvement:

Kernel Compilation (smaller is faster):

| kernel   | real-time | kern-time | user-time | peak memory |
|----------|-----------|-----------|-----------|-------------|
| before   |      0.0% |      0.0% |      0.0% |        0.0% |
| after    |     -1.6% |     -2.1% |     -1.7% |        0.0% |

Speedometer (bigger is faster):

| kernel   | runs_per_min | peak memory |
|----------|--------------|-------------|
| before   |         0.0% |        0.0% |
| after    |         1.3% |        1.0% |

Both benchmarks show a ~1.5% improvement once the patch is applied.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/admin-guide/mm/transhuge.rst |  6 +++++
 include/linux/huge_mm.h                    | 11 ++++++
 mm/filemap.c                               | 11 ++++++
 mm/huge_memory.c                           | 31 +++++++++++++++++-----
 4 files changed, 52 insertions(+), 7 deletions(-)

--
2.43.0
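The index alignment used for exec mappings in the mm/filemap.c hunk below
is terse, so as a reviewer's aid, here is a hypothetical standalone sketch
of the same arithmetic (exec_ra_window() is an invented name; 4K base
pages are assumed):

#include <stdio.h>

/*
 * The readahead window is 1 << exec_order pages, and its start is the
 * fault index rounded down to that size, giving natural alignment.
 */
static void exec_ra_window(unsigned long fault_index, int exec_order,
			   unsigned long *start, unsigned long *nr_pages)
{
	*nr_pages = 1UL << exec_order;
	*start = fault_index & ~(*nr_pages - 1);
}

int main(void)
{
	unsigned long start, nr;

	/* A 64K folio with 4K pages means exec_order 4, a 16-page window. */
	exec_ra_window(300, 4, &start, &nr);
	printf("fault at page 300 -> read %lu pages from page %lu\n", nr, start);
	/* Prints: fault at page 300 -> read 16 pages from page 288 */
	return 0;
}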
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 9f3ed504c646..1aaf8e3a0b5a 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -292,12 +292,18 @@ memory from a set of allowed sizes. By default all THP sizes that the page
 cache supports are allowed, but this set can be modified with one of::
 
 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/file_enabled
+	echo always+exec >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/file_enabled
 	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/file_enabled
 
 where <size> is the hugepage size being addressed, the available sizes for
 which vary by system. ``always`` adds the hugepage size to the set of allowed
 sizes, and ``never`` removes the hugepage size from the set of allowed sizes.
 
+``always+exec`` acts like ``always`` but additionally marks the hugepage size
+as the preferred hugepage size for sections of any file mapped executable. A
+maximum of one hugepage size can be marked as ``exec`` at a time, so applying
+it to a new size implicitly removes it from any size it was previously set
+for.
+
 In some situations, constraining the allowed sizes can reduce memory
 fragmentation, resulting in fewer allocation fallbacks and improved system
 performance.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 19ced8192d39..3571ea0c3d8c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -177,12 +177,18 @@ extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
 extern unsigned long huge_file_orders_always;
+extern int huge_file_exec_order;
 
 static inline unsigned long file_orders_always(void)
 {
 	return READ_ONCE(huge_file_orders_always);
 }
 
+static inline int file_exec_order(void)
+{
+	return READ_ONCE(huge_file_exec_order);
+}
+
 static inline bool hugepage_global_enabled(void)
 {
@@ -453,6 +459,11 @@ static inline unsigned long file_orders_always(void)
 	return 0;
 }
 
+static inline int file_exec_order(void)
+{
+	return -1;
+}
+
 static inline bool folio_test_pmd_mappable(struct folio *folio)
 {
 	return false;
diff --git a/mm/filemap.c b/mm/filemap.c
index 870016fcfdde..c4a3cc6a2e46 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3128,6 +3128,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	struct file *fpin = NULL;
 	unsigned long vm_flags = vmf->vma->vm_flags;
 	unsigned int mmap_miss;
+	int exec_order = file_exec_order();
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	/* Use the readahead code, even if readahead is disabled */
@@ -3147,6 +3148,16 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	}
 #endif
 
+	/* If explicit order is set for exec mappings, use it. */
+	if ((vm_flags & VM_EXEC) && exec_order >= 0) {
+		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
+		ra->size = 1UL << exec_order;
+		ra->async_size = 0;
+		ractl._index &= ~((unsigned long)ra->size - 1);
+		page_cache_ra_order(&ractl, ra, exec_order);
+		return fpin;
+	}
+
 	/* If we don't want any read-ahead, don't bother */
 	if (vm_flags & VM_RAND_READ)
 		return fpin;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e8fe28fe9cf9..4249c0bc9388 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@ unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
 unsigned long huge_file_orders_always __read_mostly;
+int huge_file_exec_order __read_mostly = -1;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -462,6 +463,7 @@ static const struct attribute_group hugepage_attr_group = {
 static void hugepage_exit_sysfs(struct kobject *hugepage_kobj);
 static void thpsize_release(struct kobject *kobj);
 static DEFINE_SPINLOCK(huge_anon_orders_lock);
+static DEFINE_SPINLOCK(huge_file_orders_lock);
 static LIST_HEAD(thpsize_list);
 
 static ssize_t anon_enabled_show(struct kobject *kobj,
@@ -531,11 +533,15 @@ static ssize_t file_enabled_show(struct kobject *kobj,
 {
 	int order = to_thpsize(kobj)->order;
 	const char *output;
+	bool exec;
 
-	if (test_bit(order, &huge_file_orders_always))
-		output = "[always] never";
-	else
-		output = "always [never]";
+	if (test_bit(order, &huge_file_orders_always)) {
+		exec = READ_ONCE(huge_file_exec_order) == order;
+		output = exec ? "always [always+exec] never" :
+				"[always] always+exec never";
+	} else {
+		output = "always always+exec [never]";
+	}
 
 	return sysfs_emit(buf, "%s\n", output);
 }
@@ -547,13 +553,24 @@ static ssize_t file_enabled_store(struct kobject *kobj,
 	int order = to_thpsize(kobj)->order;
 	ssize_t ret = count;
 
-	if (sysfs_streq(buf, "always"))
+	spin_lock(&huge_file_orders_lock);
+
+	if (sysfs_streq(buf, "always")) {
 		set_bit(order, &huge_file_orders_always);
-	else if (sysfs_streq(buf, "never"))
+		if (huge_file_exec_order == order)
+			huge_file_exec_order = -1;
+	} else if (sysfs_streq(buf, "always+exec")) {
+		set_bit(order, &huge_file_orders_always);
+		huge_file_exec_order = order;
+	} else if (sysfs_streq(buf, "never")) {
 		clear_bit(order, &huge_file_orders_always);
-	else
+		if (huge_file_exec_order == order)
+			huge_file_exec_order = -1;
+	} else {
 		ret = -EINVAL;
+	}
 
+	spin_unlock(&huge_file_orders_lock);
 	return ret;
 }
From patchwork Wed Jul 17 07:12:55 2024
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [RFC PATCH v1 3/4] mm: Override mTHP "enabled" defaults at kernel cmdline
Date: Wed, 17 Jul 2024 08:12:55 +0100
Message-ID: <20240717071257.4141363-4-ryan.roberts@arm.com>
In-Reply-To: <20240717071257.4141363-1-ryan.roberts@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com>

Add a thp_anon= cmdline parameter to allow specifying the default
enablement of each supported anon THP size. The parameter accepts the
following format and can be provided multiple times to configure each
size:

  thp_anon=<size>[KMG]:<state>

See Documentation/admin-guide/mm/transhuge.rst for more details.

Configuring the defaults at boot time is useful to allow early user space
to take advantage of mTHP before it's been configured through sysfs.
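To make the accepted grammar concrete, here is a small hypothetical
userspace mirror of the parse step (strtoul() plus a hand-rolled suffix
handler stands in for the kernel's memparse(); 4K base pages are assumed):

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SHIFT 12 /* assume 4K base pages for the example */

/*
 * Parse "<size>[KMG]:<state>" and derive the folio order, e.g.
 * "64K:always" -> order 4 with 4K base pages.
 */
static int parse_thp_arg(const char *arg, int *order, char *state, size_t len)
{
	char *end;
	unsigned long size = strtoul(arg, &end, 0);

	/* Minimal stand-in for memparse()'s K/M/G suffix handling. */
	switch (*end) {
	case 'G': size <<= 10; /* fall through */
	case 'M': size <<= 10; /* fall through */
	case 'K': size <<= 10; end++; break;
	}

	/* Reject non-power-of-2 sizes and sizes <= one base page. */
	if (*end != ':' || !size || (size & (size - 1)) ||
	    size <= (1UL << PAGE_SHIFT))
		return -1;

	*order = __builtin_ctzl(size >> PAGE_SHIFT);
	snprintf(state, len, "%s", end + 1);
	return 0;
}

int main(void)
{
	char state[16];
	int order;

	if (parse_thp_arg("64K:always", &order, state, sizeof(state)) == 0)
		printf("order=%d state=%s\n", order, state); /* order=4 state=always */
	return 0;
}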
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../admin-guide/kernel-parameters.txt      |  8 +++
 Documentation/admin-guide/mm/transhuge.rst | 26 +++++++--
 mm/huge_memory.c                           | 55 ++++++++++++++++++-
 3 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bc55fb55cd26..48443ad12e3f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6592,6 +6592,14 @@
 			<deciseconds>: poll all this frequency
 			0: no polling (default)
 
+	thp_anon=	[KNL]
+			Format: <size>[KMG]:always|madvise|never|inherit
+			Can be used to control the default behavior of the
+			system with respect to anonymous transparent hugepages.
+			Can be used multiple times for multiple anon THP sizes.
+			See Documentation/admin-guide/mm/transhuge.rst for more
+			details.
+
 	threadirqs	[KNL,EARLY]
 			Force threading of all interrupt handlers except those
 			marked explicitly IRQF_NO_THREAD.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1aaf8e3a0b5a..f53d43d986e2 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -311,13 +311,27 @@ performance.
 
 Note that any changes to the allowed set of sizes only apply to future
 file-backed THP allocations.
 
-Boot parameter
-==============
+Boot parameters
+===============
 
-You can change the sysfs boot time defaults of Transparent Hugepage
-Support by passing the parameter ``transparent_hugepage=always`` or
-``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
-to the kernel command line.
+You can change the sysfs boot time default for the top-level "enabled"
+control by passing the parameter ``transparent_hugepage=always`` or
+``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
+kernel command line.
+
+Alternatively, each supported anonymous THP size can be controlled by
+passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
+and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
+``inherit``.
+
+For example, the following will set 64K THP to ``always``::
+
+	thp_anon=64K:always
+
+``thp_anon=`` may be specified multiple times to configure all THP sizes as
+required. If ``thp_anon=`` is specified at least once, any anon THP sizes
+not explicitly configured on the command line are implicitly set to
+``never``.
 
 Hugepages in tmpfs/shmem
 ========================
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4249c0bc9388..794d2790d90d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -82,6 +82,7 @@ unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
 unsigned long huge_file_orders_always __read_mostly;
 int huge_file_exec_order __read_mostly = -1;
+static bool anon_orders_configured;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -763,7 +764,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 */
-	huge_anon_orders_inherit = BIT(PMD_ORDER);
+	if (!anon_orders_configured) {
+		huge_anon_orders_inherit = BIT(PMD_ORDER);
+		anon_orders_configured = true;
+	}
 
 	/*
 	 * For pagecache, default to enabling all orders. powerpc's PMD_ORDER
@@ -955,6 +959,55 @@ static int __init setup_transparent_hugepage(char *str)
 }
 __setup("transparent_hugepage=", setup_transparent_hugepage);
 
+static int __init setup_thp_anon(char *str)
+{
+	unsigned long size;
+	char *state;
+	int order;
+	int ret = 0;
+
+	if (!str)
+		goto out;
+
+	size = (unsigned long)memparse(str, &state);
+	order = ilog2(size >> PAGE_SHIFT);
+	if (*state != ':' || !is_power_of_2(size) || size <= PAGE_SIZE ||
+	    !(BIT(order) & THP_ORDERS_ALL_ANON))
+		goto out;
+
+	state++;
+
+	if (!strcmp(state, "always")) {
+		clear_bit(order, &huge_anon_orders_inherit);
+		clear_bit(order, &huge_anon_orders_madvise);
+		set_bit(order, &huge_anon_orders_always);
+		ret = 1;
+	} else if (!strcmp(state, "inherit")) {
+		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_madvise);
+		set_bit(order, &huge_anon_orders_inherit);
+		ret = 1;
+	} else if (!strcmp(state, "madvise")) {
+		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_inherit);
+		set_bit(order, &huge_anon_orders_madvise);
+		ret = 1;
+	} else if (!strcmp(state, "never")) {
+		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_inherit);
+		clear_bit(order, &huge_anon_orders_madvise);
+		ret = 1;
+	}
+
+	if (ret)
+		anon_orders_configured = true;
+out:
+	if (!ret)
+		pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
+	return ret;
+}
+__setup("thp_anon=", setup_thp_anon);
+
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
From patchwork Wed Jul 17 07:12:56 2024
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [RFC PATCH v1 4/4] mm: Override mTHP "file_enabled" defaults at kernel cmdline
Date: Wed, 17 Jul 2024 08:12:56 +0100
Message-ID: <20240717071257.4141363-5-ryan.roberts@arm.com>
In-Reply-To: <20240717071257.4141363-1-ryan.roberts@arm.com>
References: <20240717071257.4141363-1-ryan.roberts@arm.com>

Add a thp_file= cmdline parameter to allow specifying the default
enablement of each supported file-backed THP size.
The parameter accepts the following format and can be provided multiple
times to configure each size:

  thp_file=<size>[KMG]:<state>

See Documentation/admin-guide/mm/transhuge.rst for more details.

Configuring the defaults at boot time is often necessary because it's not
always possible to drop active executable pages from the page cache,
especially if they are well used like libc. The command line parameter
allows configuring the values before the first page is installed in the
page cache.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../admin-guide/kernel-parameters.txt      |  8 ++++
 Documentation/admin-guide/mm/transhuge.rst | 13 ++++++
 mm/huge_memory.c                           | 45 ++++++++++++++++++++-
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 48443ad12e3f..e3e99def5691 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6600,6 +6600,14 @@
 			See Documentation/admin-guide/mm/transhuge.rst for more
 			details.
 
+	thp_file=	[KNL]
+			Format: <size>[KMG]:always|always+exec|never
+			Can be used to control the default behavior of the
+			system with respect to file-backed transparent hugepages.
+			Can be used multiple times for multiple file-backed THP
+			sizes. See Documentation/admin-guide/mm/transhuge.rst
+			for more details.
+
 	threadirqs	[KNL,EARLY]
 			Force threading of all interrupt handlers except those
 			marked explicitly IRQF_NO_THREAD.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index f53d43d986e2..2379ed4ad085 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -333,6 +333,19 @@ required. If ``thp_anon=`` is specified at least once, any anon THP sizes
 not explicitly configured on the command line are implicitly set to
 ``never``.
 
+Each supported file-backed THP size can be controlled by passing
+``thp_file=<size>[KMG]:<state>``, where ``<size>`` is the THP size and
+``<state>`` is one of ``always``, ``always+exec`` or ``never``.
+
+For example, the following will set 64K THP to ``always+exec``::
+
+	thp_file=64K:always+exec
+
+``thp_file=`` may be specified multiple times to configure all THP sizes as
+required. If ``thp_file=`` is specified at least once, any file-backed THP
+sizes not explicitly configured on the command line are implicitly set to
+``never``.
+
 Hugepages in tmpfs/shmem
 ========================
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 794d2790d90d..4d963dde7aea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -83,6 +83,7 @@ unsigned long huge_anon_orders_inherit __read_mostly;
 unsigned long huge_file_orders_always __read_mostly;
 int huge_file_exec_order __read_mostly = -1;
 static bool anon_orders_configured;
+static bool file_orders_configured;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -774,7 +775,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 * (and therefore THP_ORDERS_ALL_FILE_DEFAULT) isn't a compile-time
 	 * constant so we have to do this here.
	 */
-	huge_file_orders_always = THP_ORDERS_ALL_FILE_DEFAULT;
+	if (!file_orders_configured) {
+		huge_file_orders_always = THP_ORDERS_ALL_FILE_DEFAULT;
+		file_orders_configured = true;
+	}
 
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
@@ -1008,6 +1012,45 @@ static int __init setup_thp_anon(char *str)
 }
 __setup("thp_anon=", setup_thp_anon);
 
+static int __init setup_thp_file(char *str)
+{
+	unsigned long size;
+	char *state;
+	int order;
+	int ret = 0;
+
+	if (!str)
+		goto out;
+
+	size = (unsigned long)memparse(str, &state);
+	order = ilog2(size >> PAGE_SHIFT);
+	if (*state != ':' || !is_power_of_2(size) || size <= PAGE_SIZE ||
+	    !(BIT(order) & THP_ORDERS_ALL_FILE_DEFAULT))
+		goto out;
+
+	state++;
+
+	if (!strcmp(state, "always")) {
+		set_bit(order, &huge_file_orders_always);
+		ret = 1;
+	} else if (!strcmp(state, "always+exec")) {
+		set_bit(order, &huge_file_orders_always);
+		huge_file_exec_order = order;
+		ret = 1;
+	} else if (!strcmp(state, "never")) {
+		clear_bit(order, &huge_file_orders_always);
+		ret = 1;
+	}
+
+	if (ret)
+		file_orders_configured = true;
+out:
+	if (!ret)
+		pr_warn("thp_file=%s: cannot parse, ignored\n", str);
+	return ret;
+}
+__setup("thp_file=", setup_thp_file);
+
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
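Finally, a sketch of how repeated thp_file= arguments accumulate: a
userspace model of the two state variables, not kernel code. Note that
sizes never mentioned on the command line stay disabled, matching the
implicit `never` described above:

#include <stdio.h>
#include <string.h>

#define BIT(n) (1UL << (n))

/* Model of the two pieces of state that setup_thp_file() manipulates. */
static unsigned long huge_file_orders_always;
static int huge_file_exec_order = -1;

/* Apply one already-parsed "thp_file=<size>:<state>" argument. */
static void apply(int order, const char *state)
{
	if (!strcmp(state, "always")) {
		huge_file_orders_always |= BIT(order);
	} else if (!strcmp(state, "always+exec")) {
		huge_file_orders_always |= BIT(order);
		huge_file_exec_order = order;
	} else if (!strcmp(state, "never")) {
		huge_file_orders_always &= ~BIT(order);
	}
}

int main(void)
{
	/* thp_file=64K:always+exec thp_file=128K:always (4K base pages) */
	apply(4, "always+exec");
	apply(5, "always");

	printf("orders=0x%lx exec_order=%d\n",
	       huge_file_orders_always, huge_file_exec_order);
	/* Prints: orders=0x30 exec_order=4 */
	return 0;
}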