From patchwork Fri Jan 26 15:24:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532815
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song, Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 1/7] hugetlb: code clean for hugetlb_hstate_alloc_pages
Date: Fri, 26 Jan 2024 23:24:05 +0800
Message-Id: <20240126152411.1238072-2-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

The readability of `hugetlb_hstate_alloc_pages` is poor. Cleaning up the
code improves its readability and makes future modifications easier.

This patch extracts two functions to reduce the complexity of
`hugetlb_hstate_alloc_pages` and has no functional changes.

- hugetlb_hstate_alloc_pages_specific_nodes() iterates through each
  online node and performs node-specific allocation where one was
  requested.
- hugetlb_hstate_alloc_pages_errcheck() reports an allocation shortfall
  and updates h->max_huge_pages to the number of pages actually
  allocated.

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Muchun Song
Reviewed-by: Tim Chen
---
 mm/hugetlb.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2cf78218dfe2e..20d0494424780 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3482,6 +3482,33 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 	h->max_huge_pages_node[nid] = i;
 }
 
+static bool __init hugetlb_hstate_alloc_pages_specific_nodes(struct hstate *h)
+{
+	int i;
+	bool node_specific_alloc = false;
+
+	for_each_online_node(i) {
+		if (h->max_huge_pages_node[i] > 0) {
+			hugetlb_hstate_alloc_pages_onenode(h, i);
+			node_specific_alloc = true;
+		}
+	}
+
+	return node_specific_alloc;
+}
+
+static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated, struct hstate *h)
+{
+	if (allocated < h->max_huge_pages) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: allocating %lu of page size %s failed. Only allocated %lu hugepages.\n",
+			h->max_huge_pages, buf, allocated);
+		h->max_huge_pages = allocated;
+	}
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3499,7 +3526,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	struct folio *folio;
 	LIST_HEAD(folio_list);
 	nodemask_t *node_alloc_noretry;
-	bool node_specific_alloc = false;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3508,14 +3534,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	}
 
 	/* do node specific alloc */
-	for_each_online_node(i) {
-		if (h->max_huge_pages_node[i] > 0) {
-			hugetlb_hstate_alloc_pages_onenode(h, i);
-			node_specific_alloc = true;
-		}
-	}
-
-	if (node_specific_alloc)
+	if (hugetlb_hstate_alloc_pages_specific_nodes(h))
 		return;
 
 	/* below will do all node balanced alloc */
@@ -3558,14 +3577,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	/* list will be empty if hstate_is_gigantic */
 	prep_and_add_allocated_folios(h, &folio_list);
 
-	if (i < h->max_huge_pages) {
-		char buf[32];
-
-		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
-		pr_warn("HugeTLB: allocating %lu of page size %s failed. Only allocated %lu hugepages.\n",
-			h->max_huge_pages, buf, i);
-		h->max_huge_pages = i;
-	}
+	hugetlb_hstate_alloc_pages_errcheck(i, h);
 	kfree(node_alloc_noretry);
 }
 

From patchwork Fri Jan 26 15:24:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532816
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song, Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 2/7] hugetlb: split hugetlb_hstate_alloc_pages
Date: Fri, 26 Jan 2024 23:24:06 +0800
Message-Id: <20240126152411.1238072-3-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

1G and 2M huge pages have different allocation and initialization logic,
which leads to subtle differences in how each can be parallelized.
Therefore, split hugetlb_hstate_alloc_pages into separate gigantic and
non-gigantic paths.

This patch has no functional changes.

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Tim Chen
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 87 ++++++++++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 20d0494424780..00bbf7442eb6c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3509,6 +3509,43 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
 	}
 }
 
+static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
+{
+	unsigned long i;
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
+			break;
+		cond_resched();
+	}
+
+	return i;
+}
+
+static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
+{
+	unsigned long i;
+	struct folio *folio;
+	LIST_HEAD(folio_list);
+	nodemask_t node_alloc_noretry;
+
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+						&node_alloc_noretry);
+		if (!folio)
+			break;
+		list_add(&folio->lru, &folio_list);
+		cond_resched();
+	}
+
+	prep_and_add_allocated_folios(h, &folio_list);
+
+	return i;
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3522,10 +3559,7 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
  */
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
-	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t *node_alloc_noretry;
+	unsigned long allocated;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3538,47 +3572,12 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		return;
 
 	/* below will do all node balanced alloc */
-	if (!hstate_is_gigantic(h)) {
-		/*
-		 * Bit mask controlling how hard we retry per-node allocations.
-		 * Ignore errors as lower level routines can deal with
-		 * node_alloc_noretry == NULL. If this kmalloc fails at boot
-		 * time, we are likely in bigger trouble.
-		 */
-		node_alloc_noretry = kmalloc(sizeof(*node_alloc_noretry),
-						GFP_KERNEL);
-	} else {
-		/* allocations done at boot time */
-		node_alloc_noretry = NULL;
-	}
-
-	/* bit mask controlling how hard we retry per-node allocations */
-	if (node_alloc_noretry)
-		nodes_clear(*node_alloc_noretry);
-
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		if (hstate_is_gigantic(h)) {
-			/*
-			 * gigantic pages not added to list as they are not
-			 * added to pools now.
-			 */
-			if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
-				break;
-		} else {
-			folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-							node_alloc_noretry);
-			if (!folio)
-				break;
-			list_add(&folio->lru, &folio_list);
-		}
-		cond_resched();
-	}
-
-	/* list will be empty if hstate_is_gigantic */
-	prep_and_add_allocated_folios(h, &folio_list);
+	if (hstate_is_gigantic(h))
+		allocated = hugetlb_gigantic_pages_alloc_boot(h);
+	else
+		allocated = hugetlb_pages_alloc_boot(h);
 
-	hugetlb_hstate_alloc_pages_errcheck(i, h);
-	kfree(node_alloc_noretry);
+	hugetlb_hstate_alloc_pages_errcheck(allocated, h);
 }
 
 static void __init hugetlb_init_hstates(void)

From patchwork Fri Jan 26 15:24:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532817
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song, Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 3/7] padata: dispatch works on different nodes
Date: Fri, 26 Jan 2024 23:24:07 +0800
Message-Id: <20240126152411.1238072-4-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

When a group of tasks that access different nodes is scheduled on the
same node, the tasks may encounter memory bandwidth bottlenecks and
higher access latency.

This patch therefore introduces a numa_aware flag, which allows tasks to
be distributed across different nodes to take full advantage of
multi-node systems.

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Muchun Song
Reviewed-by: Tim Chen
---
 include/linux/padata.h |  2 ++
 kernel/padata.c        | 14 ++++++++++++--
 mm/mm_init.c           |  1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 495b16b6b4d72..8f418711351bc 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -137,6 +137,7 @@ struct padata_shell {
  *             appropriate for one worker thread to do at once.
  * @max_threads: Max threads to use for the job, actual number may be less
  *               depending on task size and minimum chunk size.
+ * @numa_aware: Distribute jobs to different nodes with CPU in a round robin fashion.
  */
 struct padata_mt_job {
 	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
@@ -146,6 +147,7 @@ struct padata_mt_job {
 	unsigned long		align;
 	unsigned long		min_chunk;
 	int			max_threads;
+	bool			numa_aware;
 };
 
 /**
diff --git a/kernel/padata.c b/kernel/padata.c
index 179fb1518070c..e3f639ff16707 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -485,7 +485,8 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	struct padata_work my_work, *pw;
 	struct padata_mt_job_state ps;
 	LIST_HEAD(works);
-	int nworks;
+	int nworks, nid;
+	static atomic_t last_used_nid __initdata;
 
 	if (job->size == 0)
 		return;
@@ -517,7 +518,16 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	ps.chunk_size = roundup(ps.chunk_size, job->align);
 
 	list_for_each_entry(pw, &works, pw_list)
-		queue_work(system_unbound_wq, &pw->pw_work);
+		if (job->numa_aware) {
+			int old_node = atomic_read(&last_used_nid);
+
+			do {
+				nid = next_node_in(old_node, node_states[N_CPU]);
+			} while (!atomic_try_cmpxchg(&last_used_nid, &old_node, nid));
+			queue_work_node(nid, system_unbound_wq, &pw->pw_work);
+		} else {
+			queue_work(system_unbound_wq, &pw->pw_work);
+		}
 
 	/* Use the current thread, which saves starting a workqueue worker. */
 	padata_work_init(&my_work, padata_mt_helper, &ps, PADATA_WORK_ONSTACK);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2c19f5515e36c..549e76af8f82a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2231,6 +2231,7 @@ static int __init deferred_init_memmap(void *data)
 			.align       = PAGES_PER_SECTION,
 			.min_chunk   = PAGES_PER_SECTION,
 			.max_threads = max_threads,
+			.numa_aware  = false,
 		};
 
 		padata_do_multithreaded(&job);

From patchwork Fri Jan 26 15:24:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532818
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song, Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 4/7] hugetlb: pass *next_nid_to_alloc directly to for_each_node_mask_to_alloc
Date: Fri, 26 Jan 2024 23:24:08 +0800
Message-Id: <20240126152411.1238072-5-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

With hugetlb allocation parallelized across threads, each thread
allocates pages from a different node instead of all threads allocating
from the common node h->next_nid_to_alloc.

To support this, each thread needs its own next_nid_to_alloc.
Consequently, hstate_next_node_to_alloc and for_each_node_mask_to_alloc
now take a *next_nid_to_alloc parameter directly, which keeps allocation
thread-specific and avoids concurrent access to the shared field.

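As an illustration of the per-caller cursor described above, here is a
minimal userspace sketch, not the kernel code. The `sketch_` names, the
`unsigned int` bitmask standing in for `nodemask_t`, and
`SKETCH_MAX_NODES` are assumptions made for this example. It only shows
why passing a pointer to the cursor lets each worker advance its own
round-robin position instead of sharing h->next_nid_to_alloc.

```c
#include <assert.h>

/* Assumption for the sketch: at most 8 nodes, tracked in a plain bitmask. */
#define SKETCH_MAX_NODES 8

/* Return the first allowed node after nid, wrapping at the end of the mask. */
static int sketch_next_node_allowed(int nid, unsigned int mask)
{
	int step;

	/* Loosely mirrors VM_BUG_ON(!nodes_allowed): an empty mask is a caller bug. */
	assert(mask != 0);

	for (step = 1; step <= SKETCH_MAX_NODES; step++) {
		int candidate = (nid + step) % SKETCH_MAX_NODES;

		if (mask & (1u << candidate))
			return candidate;
	}
	return nid; /* not reached while mask has a bit below SKETCH_MAX_NODES */
}

/* Return nid itself if it is allowed, otherwise the next allowed node. */
static int sketch_get_valid_node_allowed(int nid, unsigned int mask)
{
	if (!(mask & (1u << nid)))
		nid = sketch_next_node_allowed(nid, mask);
	return nid;
}

/*
 * Same shape as the patched hstate_next_node_to_alloc(): the cursor is
 * supplied by the caller, so two callers with two cursors never touch
 * each other's state.
 */
static int sketch_next_node_to_alloc(int *next_node, unsigned int mask)
{
	int nid = sketch_get_valid_node_allowed(*next_node, mask);

	*next_node = sketch_next_node_allowed(nid, mask);
	return nid;
}
```

With nodes 0, 1 and 3 allowed (mask 0x0b), a worker whose cursor starts
at 0 gets nodes 0, 1, 3 and then wraps, while a worker whose cursor
starts at 3 gets 3, 0, 1; neither call modifies the other worker's
cursor.
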

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Tim Chen
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 00bbf7442eb6c..e4e8ffa1c145a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1464,15 +1464,15 @@ static int get_valid_node_allowed(int nid, nodemask_t *nodes_allowed)
  * next node from which to allocate, handling wrap at end of node
  * mask.
  */
-static int hstate_next_node_to_alloc(struct hstate *h,
+static int hstate_next_node_to_alloc(int *next_node,
 					nodemask_t *nodes_allowed)
 {
 	int nid;
 
 	VM_BUG_ON(!nodes_allowed);
 
-	nid = get_valid_node_allowed(h->next_nid_to_alloc, nodes_allowed);
-	h->next_nid_to_alloc = next_node_allowed(nid, nodes_allowed);
+	nid = get_valid_node_allowed(*next_node, nodes_allowed);
+	*next_node = next_node_allowed(nid, nodes_allowed);
 
 	return nid;
 }
@@ -1495,10 +1495,10 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 	return nid;
 }
 
-#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
+#define for_each_node_mask_to_alloc(next_node, nr_nodes, node, mask)		\
 	for (nr_nodes = nodes_weight(*mask);				\
 		nr_nodes > 0 &&						\
-		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
+		((node = hstate_next_node_to_alloc(next_node, mask)) || 1);	\
 		nr_nodes--)
 
 #define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
@@ -2350,12 +2350,13 @@ static void prep_and_add_allocated_folios(struct hstate *h,
  */
 static struct folio *alloc_pool_huge_folio(struct hstate *h,
 					nodemask_t *nodes_allowed,
-					nodemask_t *node_alloc_noretry)
+					nodemask_t *node_alloc_noretry,
+					int *next_node)
 {
 	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 	int nr_nodes, node;
 
-	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+	for_each_node_mask_to_alloc(next_node, nr_nodes, node, nodes_allowed) {
 		struct folio *folio;
 
 		folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, node,
@@ -3310,7 +3311,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		goto found;
 	}
 	/* allocate from next node when distributing huge pages */
-	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
+	for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, &node_states[N_MEMORY]) {
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
@@ -3679,7 +3680,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 	VM_BUG_ON(delta != -1 && delta != 1);
 
 	if (delta < 0) {
-		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+		for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, nodes_allowed) {
 			if (h->surplus_huge_pages_node[node])
 				goto found;
 		}
@@ -3794,7 +3795,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 		cond_resched();
 
 		folio = alloc_pool_huge_folio(h, nodes_allowed,
-						node_alloc_noretry);
+						node_alloc_noretry,
+						&h->next_nid_to_alloc);
 		if (!folio) {
 			prep_and_add_allocated_folios(h, &page_list);
 			spin_lock_irq(&hugetlb_lock);

From patchwork Fri Jan 26 15:24:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532819
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
 Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 5/7] hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA
Date: Fri, 26 Jan 2024 23:24:09 +0800
Message-Id: <20240126152411.1238072-6-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

Allow hugetlb to use padata_do_multithreaded() for parallel
initialization by having CONFIG_HUGETLBFS select CONFIG_PADATA.

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Muchun Song
---
 fs/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index ea2f77446080e..3abc107ab2fbd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -261,6 +261,7 @@ menuconfig HUGETLBFS
 	depends on X86 || SPARC64 || ARCH_SUPPORTS_HUGETLBFS || BROKEN
 	depends on (SYSFS || SYSCTL)
 	select MEMFD_CREATE
+	select PADATA
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on ramfs.
 	  For architectures that support it, say Y here and read

From patchwork Fri Jan 26 15:24:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532820
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
 Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 6/7] hugetlb: parallelize 2M hugetlb allocation and initialization
Date: Fri, 26 Jan 2024 23:24:10 +0800
Message-Id: <20240126152411.1238072-7-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

Distribute both the allocation and the initialization of 2M hugetlb
pages across multiple threads. This makes 2M hugetlb initialization
faster and thereby improves boot speed.
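
The job sizing used in the diff below (twice as many threads as memory
nodes, and a minimum chunk of max_huge_pages / nodes / 2) can be worked
through in userspace C. This is a sketch only: struct job_sizing and
size_hugetlb_job() are hypothetical names invented here, and
nr_mem_nodes stands in for num_node_state(N_MEMORY) and is assumed to be
at least 1.

```c
/* Hypothetical stand-in for the padata_mt_job fields set by the patch. */
struct job_sizing {
	unsigned long size;		/* total number of pages requested */
	unsigned long min_chunk;	/* smallest piece handed to a worker */
	int max_threads;		/* upper bound on worker threads */
};

/* Apply the same integer arithmetic as hugetlb_pages_alloc_boot(). */
static struct job_sizing size_hugetlb_job(unsigned long max_huge_pages,
					  int nr_mem_nodes)
{
	struct job_sizing js;

	js.size = max_huge_pages;
	js.max_threads = nr_mem_nodes * 2;
	js.min_chunk = max_huge_pages / nr_mem_nodes / 2;
	return js;
}
```

For 1024 pages on 4 nodes this gives max_threads = 8 and
min_chunk = 128. Because the division is integral, a request smaller
than twice the node count gives min_chunk = 0; how padata treats that
value is outside the scope of this sketch.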

Here are some test results:
      test case        no patch(ms)   patched(ms)   saved
 ------------------- -------------- ------------- --------
  256c2T(4 node) 2M           3336          1051   68.52%
  128c1T(2 node) 2M           1943           716   63.15%

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 73 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e4e8ffa1c145a..385840397bce5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3510,6 +3511,30 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
 	}
 }
 
+static void __init hugetlb_pages_alloc_boot_node(unsigned long start, unsigned long end, void *arg)
+{
+	struct hstate *h = (struct hstate *)arg;
+	int i, num = end - start;
+	nodemask_t node_alloc_noretry;
+	LIST_HEAD(folio_list);
+	int next_node = first_online_node;
+
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < num; ++i) {
+		struct folio *folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+						&node_alloc_noretry, &next_node);
+		if (!folio)
+			break;
+
+		list_move(&folio->lru, &folio_list);
+		cond_resched();
+	}
+
+	prep_and_add_allocated_folios(h, &folio_list);
+}
+
 static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
 {
 	unsigned long i;
@@ -3525,26 +3550,40 @@ static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
 
 static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 {
-	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t node_alloc_noretry;
-
-	/* Bit mask controlling how hard we retry per-node allocations.*/
-	nodes_clear(node_alloc_noretry);
+	struct padata_mt_job job = {
+		.fn_arg = h,
+		.align = 1,
+		.numa_aware = true
+	};
 
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-						&node_alloc_noretry);
-		if (!folio)
-			break;
-		list_add(&folio->lru, &folio_list);
-		cond_resched();
-	}
+	job.thread_fn = hugetlb_pages_alloc_boot_node;
+	job.start = 0;
+	job.size = h->max_huge_pages;
 
-	prep_and_add_allocated_folios(h, &folio_list);
+	/*
+	 * job.max_threads is twice the num_node_state(N_MEMORY),
+	 *
+	 * Tests below indicate that a multiplier of 2 significantly improves
+	 * performance, and although larger values also provide improvements,
+	 * the gains are marginal.
+	 *
+	 * Therefore, choosing 2 as the multiplier strikes a good balance between
+	 * enhancing parallel processing capabilities and maintaining efficient
+	 * resource management.
+	 *
+	 * +------------+-------+-------+-------+-------+-------+
+	 * | multiplier |   1   |   2   |   3   |   4   |   5   |
+	 * +------------+-------+-------+-------+-------+-------+
+	 * | 256G 2node | 358ms | 215ms | 157ms | 134ms | 126ms |
+	 * | 2T   4node | 979ms | 679ms | 543ms | 489ms | 481ms |
+	 * | 50G  2node | 71ms  | 44ms  | 37ms  | 30ms  | 31ms  |
+	 * +------------+-------+-------+-------+-------+-------+
+	 */
+	job.max_threads = num_node_state(N_MEMORY) * 2;
+	job.min_chunk = h->max_huge_pages / num_node_state(N_MEMORY) / 2;
+	padata_do_multithreaded(&job);
 
-	return i;
+	return h->nr_huge_pages;
 }
 
 /*

From patchwork Fri Jan 26 15:24:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang Li
X-Patchwork-Id: 13532821
From: Gang Li
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
 Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v5 7/7] hugetlb: parallelize 1G hugetlb initialization
Date: Fri, 26 Jan 2024 23:24:11 +0800
Message-Id: <20240126152411.1238072-8-gang.li@linux.dev>
In-Reply-To: <20240126152411.1238072-1-gang.li@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev>

Speed up the initialization of 1G huge pages through parallelization.

1G hugetlb pages are allocated from bootmem, a process that is already
very fast and does not currently require optimization. Therefore, only
the initialization phase in `gather_bootmem_prealloc` is parallelized.
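
The per-node bucketing that makes this possible can be sketched in
userspace C. This is an illustration under stated assumptions, not the
kernel implementation: MAX_NODES, struct boot_page, record_boot_page()
and gather_range() are hypothetical names, a singly linked list replaces
struct list_head, and the worker returns a count only so the result can
be checked (a padata thread_fn returns void). Unlike the patch, which
uses only `start` as the node id, the worker here walks the whole
[start, end) range.

```c
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical; the patch sizes by MAX_NUMNODES */

struct boot_page {
	int id;
	struct boot_page *next;
};

/* One list per node, mirroring huge_boot_pages[MAX_NUMNODES]. */
static struct boot_page *boot_pages[MAX_NODES];

/* Record a page on the list of the node it was allocated from. */
static void record_boot_page(struct boot_page *p, int nid)
{
	p->next = boot_pages[nid];
	boot_pages[nid] = p;
}

/*
 * Worker body: visit every page on the lists of nodes in [start, end)
 * and return how many were seen. Workers given disjoint ranges never
 * walk the same list.
 */
static unsigned long gather_range(unsigned long start, unsigned long end)
{
	unsigned long seen = 0;

	for (unsigned long nid = start; nid < end && nid < MAX_NODES; nid++)
		for (struct boot_page *p = boot_pages[nid]; p; p = p->next)
			seen++;
	return seen;
}
```

Keying the lists by node lets each worker be handed a node range as its
unit of work, which is what allows the job to be described to padata as
a simple [0, number of nodes) interval.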

Here are some test results:
      test case        no patch(ms)   patched(ms)   saved
 ------------------- -------------- ------------- --------
  256c2T(4 node) 1G           4745          2024   57.34%
  128c1T(2 node) 1G           3358          1712   49.02%
      12T        1G          77000         18300   76.23%

Signed-off-by: Gang Li
Tested-by: David Rientjes
Reviewed-by: Muchun Song
---
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 include/linux/hugetlb.h       |  2 +-
 mm/hugetlb.c                  | 44 ++++++++++++++++++++++++++++-------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 0a540b37aab62..a1651d5471862 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -226,7 +226,7 @@ static int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate)
 		return 0;
 	m = phys_to_virt(gpage_freearray[--nr_gpages]);
 	gpage_freearray[nr_gpages] = 0;
-	list_add(&m->list, &huge_boot_pages);
+	list_add(&m->list, &huge_boot_pages[0]);
 	m->hstate = hstate;
 	return 1;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c1ee640d87b11..77b30a8c6076b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -178,7 +178,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
 extern int sysctl_hugetlb_shm_group;
-extern struct list_head huge_boot_pages;
+extern struct list_head huge_boot_pages[MAX_NUMNODES];
 
 /* arch callbacks */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 385840397bce5..eee0c456f6571 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -69,7 +69,7 @@ static bool hugetlb_cma_folio(struct folio *folio, unsigned int order)
 #endif
 static unsigned long hugetlb_cma_size __initdata;
 
-__initdata LIST_HEAD(huge_boot_pages);
+__initdata struct list_head huge_boot_pages[MAX_NUMNODES];
 
 /* for command line parsing */
 static struct hstate * __initdata parsed_hstate;
@@ -3301,7 +3301,7 @@ int alloc_bootmem_huge_page(struct hstate *h, int nid)
 int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 {
 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
-	int nr_nodes, node;
+	int nr_nodes, node = nid;
 
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
@@ -3339,7 +3339,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		huge_page_size(h) - PAGE_SIZE);
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
-	list_add(&m->list, &huge_boot_pages);
+	list_add(&m->list, &huge_boot_pages[node]);
 	m->hstate = h;
 	return 1;
 }
@@ -3390,8 +3390,6 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 	/* Send list for bulk vmemmap optimization processing */
 	hugetlb_vmemmap_optimize_folios(h, folio_list);
 
-	/* Add all new pool pages to free lists in one lock cycle */
-	spin_lock_irqsave(&hugetlb_lock, flags);
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
 			/*
@@ -3404,23 +3402,27 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 					HUGETLB_VMEMMAP_RESERVE_PAGES,
 					pages_per_huge_page(h));
 		}
+		/* Subdivide locks to achieve better parallel performance */
+		spin_lock_irqsave(&hugetlb_lock, flags);
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
 	}
-	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
 
 /*
  * Put bootmem huge pages into the standard lists after mem_map is up.
  * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages.
  */
-static void __init gather_bootmem_prealloc(void)
+static void __init gather_bootmem_prealloc_node(unsigned long start, unsigned long end, void *arg)
+
 {
+	int nid = start;
 	LIST_HEAD(folio_list);
 	struct huge_bootmem_page *m;
 	struct hstate *h = NULL, *prev_h = NULL;
 
-	list_for_each_entry(m, &huge_boot_pages, list) {
+	list_for_each_entry(m, &huge_boot_pages[nid], list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = (void *)page;
 
@@ -3453,6 +3455,22 @@ static void __init gather_bootmem_prealloc(void)
 	prep_and_add_bootmem_folios(h, &folio_list);
 }
 
+static void __init gather_bootmem_prealloc(void)
+{
+	struct padata_mt_job job = {
+		.thread_fn = gather_bootmem_prealloc_node,
+		.fn_arg = NULL,
+		.start = 0,
+		.size = num_node_state(N_MEMORY),
+		.align = 1,
+		.min_chunk = 1,
+		.max_threads = num_node_state(N_MEMORY),
+		.numa_aware = true,
+	};
+
+	padata_do_multithreaded(&job);
+}
+
 static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 {
 	unsigned long i;
@@ -3600,6 +3618,7 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
 	unsigned long allocated;
+	static bool initialized __initdata;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3607,6 +3626,15 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		return;
 	}
 
+	/* hugetlb_hstate_alloc_pages will be called many times, initialize huge_boot_pages once */
+	if (!initialized) {
+		int i = 0;
+
+		for (i = 0; i < MAX_NUMNODES; i++)
+			INIT_LIST_HEAD(&huge_boot_pages[i]);
+		initialized = true;
+	}
+
 	/* do node specific alloc */
 	if (hugetlb_hstate_alloc_pages_specific_nodes(h))
 		return;