From patchwork Thu Feb 27 23:02:10 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Prescher via B4 Relay
 <devnull+thomas.prescher.cyberus-technology.de@kernel.org>
X-Patchwork-Id: 13995332
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9D1C4C197BF
	for <linux-mm@archiver.kernel.org>; Thu, 27 Feb 2025 23:02:19 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 8202A280008; Thu, 27 Feb 2025 18:02:15 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 7CE55280001; Thu, 27 Feb 2025 18:02:15 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 584B7280008; Thu, 27 Feb 2025 18:02:15 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com
 [216.40.44.13])
	by kanga.kvack.org (Postfix) with ESMTP id 165EE280001
	for <linux-mm@kvack.org>; Thu, 27 Feb 2025 18:02:15 -0500 (EST)
Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay08.hostedemail.com (Postfix) with ESMTP id B654B141E43
	for <linux-mm@kvack.org>; Thu, 27 Feb 2025 23:02:14 +0000 (UTC)
X-FDA: 83167249788.04.5A86804
Received: from tor.source.kernel.org (tor.source.kernel.org [172.105.4.254])
	by imf05.hostedemail.com (Postfix) with ESMTP id 8F8A1100026
	for <linux-mm@kvack.org>; Thu, 27 Feb 2025 23:02:12 +0000 (UTC)
Authentication-Results: imf05.hostedemail.com;
	dkim=pass header.d=kernel.org header.s=k20201202 header.b=YSx2biNL;
	spf=pass (imf05.hostedemail.com: domain of
 devnull+thomas.prescher.cyberus-technology.de@kernel.org designates
 172.105.4.254 as permitted sender)
 smtp.mailfrom=devnull+thomas.prescher.cyberus-technology.de@kernel.org;
	dmarc=pass (policy=quarantine) header.from=kernel.org
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=hostedemail.com;
	s=arc-20220608; t=1740697332;
	h=from:from:sender:reply-to:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=amRfIu7egDuh2gQT2LLaey0nNPyqBorglSAqnFUklxU=;
	b=YKWTHKnFj6EG+A0ypn/33uuodik0nQMMCQx4Qj+DerMs56IpFkExcC1TZ9ygEXYrznLjHL
	uYPRUPIJ31vVzuRgPAPgnkOlMa6uc58f2NPJ2UU+yElIVFJdlf7rG8DtWOhf1d0YoC8+A5
	7sBiqmj7uBUjbRS8Wzp0P6YxbrOJIuc=
ARC-Authentication-Results: i=1;
	imf05.hostedemail.com;
	dkim=pass header.d=kernel.org header.s=k20201202 header.b=YSx2biNL;
	spf=pass (imf05.hostedemail.com: domain of
 devnull+thomas.prescher.cyberus-technology.de@kernel.org designates
 172.105.4.254 as permitted sender)
 smtp.mailfrom=devnull+thomas.prescher.cyberus-technology.de@kernel.org;
	dmarc=pass (policy=quarantine) header.from=kernel.org
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740697332; a=rsa-sha256;
	cv=none;
	b=7hw5xmlXNqYxD64kmg+YXmUCrz2xquQGqIfviKhDeyVyxmg1By/EW0hSY372JT4Ho5SW1/
	nn5arX1jQfysLS2LwfXcqnJCUmZQ7qhwcI94xD+Hnc03BIlKyronEmnZqVG96pa14IgrJA
	Pgze8KWZ19KCeZDhfvYaXspsoj/eerQ=
Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58])
	by tor.source.kernel.org (Postfix) with ESMTP id 6BBEC61149;
	Thu, 27 Feb 2025 23:02:03 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPS id 9F62BC4CEE5;
	Thu, 27 Feb 2025 23:02:11 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1740697331;
	bh=oJkzUc1eO6G6pDf5SWNcF21d4R6h8JEKsl5niSERdXM=;
	h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From;
	b=YSx2biNLel4X6ZP1W/gaW33feRWUqEFDPvcJOwgZYCGyt/sCa630rTTevvqyiO66w
	 lnGf595QQRRqbSLeuqZ5zRgwj6oWM32hlWfRVFxoDFO5Mf869AXc38UIR46MKeNMw+
	 Dtp39gK8ygSvOwzr3e6e29TOpIoiMbSJ7lcaDU5pe9+8TUNVTq+GE9pnqD39RZDDyV
	 uKRzDd6GiFGdho3LTgBLtUkBTbJm2FFMa8y6kOaoOfNoHCWv7S/H3rg2ukG8troDI4
	 YLYuDyjy0fsvxYtD7T+8HP1UtCpHkkDCwPOBStPQT/TSSprxnMkT3e0gIbiNqU1EK/
	 sypjP1Fl5k/+w==
Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org
 (localhost.localdomain [127.0.0.1])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8897FC19F32;
	Thu, 27 Feb 2025 23:02:11 +0000 (UTC)
From: Thomas Prescher via B4 Relay
 <devnull+thomas.prescher.cyberus-technology.de@kernel.org>
Date: Fri, 28 Feb 2025 00:02:10 +0100
Subject: [PATCH v3 1/3] mm: hugetlb: improve parallel huge page allocation
 time
MIME-Version: 1.0
Message-Id: 
 <20250228-hugepage-parameter-v3-1-2628e9b2b5c0@cyberus-technology.de>
References: 
 <20250228-hugepage-parameter-v3-0-2628e9b2b5c0@cyberus-technology.de>
In-Reply-To: 
 <20250228-hugepage-parameter-v3-0-2628e9b2b5c0@cyberus-technology.de>
To: Jonathan Corbet <corbet@lwn.net>, Muchun Song <muchun.song@linux.dev>,
 Andrew Morton <akpm@linux-foundation.org>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Thomas Prescher <thomas.prescher@cyberus-technology.de>
X-Mailer: b4 0.14.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1740697330; l=3225;
 i=thomas.prescher@cyberus-technology.de; s=20250221;
 h=from:subject:message-id;
 bh=J38/TB8riQAZ4fzuSMdQ3k6+xa/EprG/GBf8GbXV050=;
 b=vOjP4B1DdPNfhvskR5NLf5KL0dhmWvZ8fNSGh+J2B/n9nrAWy1cYAwk4sJjUbqvqK+Wq1RjLV
 KLwMJC6/XAHDMH7TZTR8U4e9ce3TwK3/dB61DC+yHT7uY6kVQvCEhBt
X-Developer-Key: i=thomas.prescher@cyberus-technology.de; a=ed25519;
 pk=T5MVdLVCc/0UUyv5IcSqGVvGcVkgWW/KtuEo2RRJwM8=
X-Endpoint-Received: by B4 Relay for
 thomas.prescher@cyberus-technology.de/20250221 with auth_id=345
X-Original-From: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Reply-To: thomas.prescher@cyberus-technology.de
X-Rspam-User: 
X-Stat-Signature: 3hn7qj6kefzjuogogyhbp8rkbka7u6hz
X-Rspamd-Queue-Id: 8F8A1100026
X-Rspamd-Server: rspam07
X-HE-Tag: 1740697332-62468
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

From: Thomas Prescher <thomas.prescher@cyberus-technology.de>

Before this patch, the kernel used a hard-coded value of 2 threads
per NUMA node when allocating huge pages in parallel at boot.

This patch changes that policy: the kernel now uses 25% of the
online hardware threads (but at least one) for these allocations.

Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
 mm/hugetlb.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)
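
For reviewers, a minimal userspace sketch of the arithmetic behind the
old and new policies. This is not kernel code: struct alloc_plan,
plan_old() and plan_new() are hypothetical names used only for
illustration, and the sketch assumes that the node and CPU counts are
passed in as plain integers instead of being read from
num_node_state(N_MEMORY) and num_online_cpus().

```c
#include <assert.h>

/* Hypothetical model of the two padata job fields the patch changes. */
struct alloc_plan {
	unsigned long max_threads;	/* models job.max_threads */
	unsigned long min_chunk;	/* models job.min_chunk, in huge pages */
};

/* Old policy: two threads per NUMA node that has memory. */
static struct alloc_plan plan_old(unsigned long max_huge_pages,
				  unsigned long mem_nodes)
{
	struct alloc_plan p;

	assert(mem_nodes > 0);
	p.max_threads = mem_nodes * 2;
	p.min_chunk = max_huge_pages / mem_nodes / 2;
	return p;
}

/* New policy: 25% of the online CPUs, but never fewer than one thread. */
static struct alloc_plan plan_new(unsigned long max_huge_pages,
				  unsigned long online_cpus)
{
	struct alloc_plan p;
	unsigned long threads = online_cpus / 4;

	if (threads < 1)
		threads = 1;
	p.max_threads = threads;
	p.min_chunk = max_huge_pages / threads;
	return p;
}
```

For example, 1 TiB of 2 MiB huge pages is 524288 pages. With 192 online
CPUs, plan_new(524288, 192) yields 48 threads and a minimum chunk of
10922 pages, whereas plan_old(524288, 2) on a two-node machine yields
4 threads and a minimum chunk of 131072 pages.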

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea16450026496c020b544877db147d1..e9b1b3e2b9d467f067d54359e1401a03f9926108 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -14,9 +14,11 @@
 #include <linux/pagemap.h>
 #include <linux/mempolicy.h>
 #include <linux/compiler.h>
+#include <linux/cpumask.h>
 #include <linux/cpuset.h>
 #include <linux/mutex.h>
 #include <linux/memblock.h>
+#include <linux/minmax.h>
 #include <linux/sysfs.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
@@ -3427,31 +3429,31 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 		.numa_aware	= true
 	};
 
+	unsigned int num_allocation_threads = max(num_online_cpus() / 4, 1);
+
 	job.thread_fn	= hugetlb_pages_alloc_boot_node;
 	job.start	= 0;
 	job.size	= h->max_huge_pages;
 
 	/*
-	 * job.max_threads is twice the num_node_state(N_MEMORY),
+	 * job.max_threads is 25% of the available cpu threads by default.
 	 *
-	 * Tests below indicate that a multiplier of 2 significantly improves
-	 * performance, and although larger values also provide improvements,
-	 * the gains are marginal.
+	 * On large servers with terabytes of memory, huge page allocation
+	 * can consume a considerable amount of time.
 	 *
-	 * Therefore, choosing 2 as the multiplier strikes a good balance between
-	 * enhancing parallel processing capabilities and maintaining efficient
-	 * resource management.
+	 * Tests below show how long it takes to allocate 1 TiB of memory with
+	 * 2 MiB huge pages. Using more threads can significantly improve allocation time.
 	 *
-	 * +------------+-------+-------+-------+-------+-------+
-	 * | multiplier |   1   |   2   |   3   |   4   |   5   |
-	 * +------------+-------+-------+-------+-------+-------+
-	 * | 256G 2node | 358ms | 215ms | 157ms | 134ms | 126ms |
-	 * | 2T   4node | 979ms | 679ms | 543ms | 489ms | 481ms |
-	 * | 50G  2node | 71ms  | 44ms  | 37ms  | 30ms  | 31ms  |
-	 * +------------+-------+-------+-------+-------+-------+
+	 * +-----------------------+-------+-------+-------+-------+-------+
+	 * | threads               |   8   |   16  |   32  |   64  |   128 |
+	 * +-----------------------+-------+-------+-------+-------+-------+
+	 * | skylake      144 cpus |   44s |   22s |   16s |   19s |   20s |
+	 * | cascade lake 192 cpus |   39s |   20s |   11s |   10s |    9s |
+	 * +-----------------------+-------+-------+-------+-------+-------+
 	 */
-	job.max_threads	= num_node_state(N_MEMORY) * 2;
-	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
+
+	job.max_threads	= num_allocation_threads;
+	job.min_chunk	= h->max_huge_pages / num_allocation_threads;
 	padata_do_multithreaded(&job);
 
 	return h->nr_huge_pages;