Message ID | 20240102131249.76622-1-gang.li@linux.dev (mailing list archive) |
---|---|
Series | hugetlb: parallelize hugetlb page init on boot |
On Tue, 2 Jan 2024, Gang Li wrote:

> Hi all, hugetlb init parallelization has now been updated to v3.
>
> This series is tested on next-20240102 and cannot be applied to v6.7-rc8.
>
> Update Summary:
> - Select CONFIG_PADATA as we use padata_do_multithreaded
> - Fix a race condition in h->next_nid_to_alloc
> - Fix local variable initialization issues
> - Remove RFC tag
>
> Thanks to the testing by David Rientjes, we now know that this patch reduces
> hugetlb 1G initialization time from 77s to 18.3s on a 12T machine[4].
>
> # Introduction
> Hugetlb initialization during boot takes up a considerable amount of time.
> For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2
> seconds out of 10 seconds. Initializing 11,776 1GB pages on a 12TB Intel
> host takes more than 1 minute[1]. This is a noteworthy figure.
>
> Inspired by [2] and [3], hugetlb initialization can also be accelerated
> through parallelization. The kernel already has infrastructure like
> padata_do_multithreaded; this patch uses it to achieve effective results
> with minimal modifications.
>
> [1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/
> [2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
> [3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/
> [4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/
>
> # Test result
>         test           no patch(ms)   patched(ms)    saved
>  -------------------  --------------  -------------  --------
>   256c2t(4 node) 1G            4745           2024    57.34%
>   128c1t(2 node) 1G            3358           1712    49.02%
>       12t        1G           77000          18300    76.23%
>
>   256c2t(4 node) 2M            3336           1051    68.52%
>   128c1t(2 node) 2M            1943            716    63.15%
>

I tested 1GB hugetlb on a smaller AMD host with the following:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3301,7 +3301,7 @@ int alloc_bootmem_huge_page(struct hstate *h, int nid)
 int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 {
 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
-	int nr_nodes, node;
+	int nr_nodes, node = nid;
 
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {

After the build error is fixed, feel free to add:

	Tested-by: David Rientjes <rientjes@google.com>

to each patch. I think Andrew will probably take a build fix up as a
delta on top of patch 4 rather than sending a whole new series unless
there is other feedback that you receive.
On 2024/1/3 09:52, David Rientjes wrote:
>
> I tested 1GB hugetlb on a smaller AMD host with the following:
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3301,7 +3301,7 @@ int alloc_bootmem_huge_page(struct hstate *h, int nid)
>  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>  {
>  	struct huge_bootmem_page *m = NULL; /* initialize for clang */
> -	int nr_nodes, node;
> +	int nr_nodes, node = nid;
>
>  	/* do node specific alloc */
>  	if (nid != NUMA_NO_NODE) {
>

Oh, if nid != NUMA_NO_NODE and memblock_alloc_try_nid_raw succeeds,
`node` must take the value of `nid`. Otherwise,
list_add(&m->list, &huge_boot_pages[node]) would index the list array
with an uninitialized `node`.

> After the build error is fixed, feel free to add:
>
> Tested-by: David Rientjes <rientjes@google.com>
>

Thanks!

> to each patch. I think Andrew will probably take a build fix up as a
> delta on top of patch 4 rather than sending a whole new series unless
> there is other feedback that you receive.
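
The point about `node` in the reply above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `alloc_boot_page`, `try_alloc_on_node`, `boot_pages_per_node` and `MAX_NODES` are hypothetical stand-ins (the counter array plays the role of the per-node huge_boot_pages[] lists, and the stub allocator is assumed to succeed on every valid node). It shows why the node-specific path needs `node` to start out equal to `nid`: that path never assigns `node` itself, yet the final bookkeeping step indexes by it.

```c
#define NUMA_NO_NODE (-1)
#define MAX_NODES 4            /* hypothetical node count for this model */

/* Hypothetical stand-in for the per-node huge_boot_pages[] lists:
 * we only count how many pages were queued on each node. */
static int boot_pages_per_node[MAX_NODES];

/* Hypothetical allocator stub: succeeds for any valid node id. */
static int try_alloc_on_node(int nid)
{
	return nid >= 0 && nid < MAX_NODES;
}

/* Returns the node the page was queued on, or NUMA_NO_NODE on failure. */
static int alloc_boot_page(int nid)
{
	/* The fix under discussion: start from the requested node.
	 * With a plain "int node;" the node-specific path below would
	 * reach the bookkeeping step with an indeterminate index. */
	int node = nid;

	if (nid != NUMA_NO_NODE) {
		/* Node-specific allocation: 'node' is not assigned here. */
		if (!try_alloc_on_node(nid))
			return NUMA_NO_NODE;
	} else {
		/* No preference: take the first node that can allocate. */
		int found = 0;

		for (node = 0; node < MAX_NODES; node++) {
			if (try_alloc_on_node(node)) {
				found = 1;
				break;
			}
		}
		if (!found)
			return NUMA_NO_NODE;
	}

	/* Models list_add(&m->list, &huge_boot_pages[node]). */
	boot_pages_per_node[node]++;
	return node;
}
```

In this model, `alloc_boot_page(2)` queues the page on node 2 only because `node` was initialized from `nid`; `alloc_boot_page(NUMA_NO_NODE)` overwrites `node` in the loop, so the initializer is harmless on that path.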