Message ID | 20230201204338.1337562-1-usama.arif@bytedance.com (mailing list archive) |
---|---|
Series | Parallel CPU bringup for x86_64 |
On Wed, 2023-02-01 at 20:43 +0000, Usama Arif wrote:
> This patchseries is from the work done by David Woodhouse (v4: https://lore.kernel.org/all/20220201205328.123066-1-dwmw2@infradead.org/).
> The parallel CPU bringup is disabled for all AMD CPUs in this version: (see discussions: https://lore.kernel.org/all/bc3f2b1332c4bb77558df8aa36493a55542fe5b9.camel@infradead.org/ and
> https://lore.kernel.org/all/3b6ac86fdc800cac5806433daf14a9095be101e9.camel@infradead.org/).
>
> Doing INIT/SIPI/SIPI in parallel brings down the time for smpboot from ~700ms
> to 100ms (85% improvement) on a server with 128 CPUs split across 2 NUMA
> nodes.
>
> Adding another cpuhp state for do_wait_cpu_initialized to make sure cpu_init
> is reached in parallel as proposed by David in v1 will bring it down further
> to ~30ms. Making this change would be dependent on this patchseries, so they
> could be explored if this gets merged.
>
> Changes across versions:
> v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
> v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
>     in preparation for more parallelisation.
> v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
>     avoid scribbling on initial_gs in common_cpu_up(), and to allow all
>     24 bits of the physical X2APIC ID to be used. That patch still needs
>     a Signed-off-by from its original author, who once claimed not to
>     remember writing it at all. But now we've fixed it, hopefully he'll
>     admit it now :)
> v5: rebase to v6.1 and remeasure performance, disable parallel bringup
>     for AMD CPUs.

Thanks, Usama.

I've updated to v6.2-rc6 since there were a few more tweaks required (and we should double-check that the new handling of cache_ap_init from a dedicated cpuhp step works right if that ends up being done in parallel).

I also fixed up the complaints from the test robot: including <linux/smpboot.h> from smpboot.c, making do_cpu_up() static, and putting #ifdef CONFIG_SMP around the 'are we booting the AP?' check and code segment in head_64.S.

I've made the AMD thing a CPU bug as Peter suggested, and pushed it to https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc6 for you to do the real work of actually testing it :)

On 02/02/2023 10:02, David Woodhouse wrote:
> On Wed, 2023-02-01 at 20:43 +0000, Usama Arif wrote:
>> This patchseries is from the work done by David Woodhouse (v4: https://lore.kernel.org/all/20220201205328.123066-1-dwmw2@infradead.org/).
>> The parallel CPU bringup is disabled for all AMD CPUs in this version: (see discussions: https://lore.kernel.org/all/bc3f2b1332c4bb77558df8aa36493a55542fe5b9.camel@infradead.org/ and
>> https://lore.kernel.org/all/3b6ac86fdc800cac5806433daf14a9095be101e9.camel@infradead.org/).
>>
>> Doing INIT/SIPI/SIPI in parallel brings down the time for smpboot from ~700ms
>> to 100ms (85% improvement) on a server with 128 CPUs split across 2 NUMA
>> nodes.
>>
>> Adding another cpuhp state for do_wait_cpu_initialized to make sure cpu_init
>> is reached in parallel as proposed by David in v1 will bring it down further
>> to ~30ms. Making this change would be dependent on this patchseries, so they
>> could be explored if this gets merged.
>>
>> Changes across versions:
>> v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
>> v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
>>     in preparation for more parallelisation.
>> v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
>>     avoid scribbling on initial_gs in common_cpu_up(), and to allow all
>>     24 bits of the physical X2APIC ID to be used. That patch still needs
>>     a Signed-off-by from its original author, who once claimed not to
>>     remember writing it at all. But now we've fixed it, hopefully he'll
>>     admit it now :)
>> v5: rebase to v6.1 and remeasure performance, disable parallel bringup
>>     for AMD CPUs.
>
> Thanks, Usama.
>
> I've updated to v6.2-rc6 since there were a few more tweaks required
> (and we should double-check that the new handling of cache_ap_init from
> a dedicated cpuhp step works right if that ends up being done in
> parallel).
>
> I also fixed up the complaints from the test robot; including
> <linux/smpboot.h> from smpboot.c and making do_cpu_up() static, and
> putting #ifdef CONFIG_SMP around the 'are we booting the AP?' check and
> code segment in head_64.S.
>
> I've made the AMD thing a CPU bug as Peter suggested, and pushed it to
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc6
> for you to do the real work of actually testing it :)

Thanks David! I have tested and reposted the v6.2-rc6 patches.

One thing I was mistaken about, since I had rebased the patches together, was that the last 100ms to 30ms optimization was coming from parallelization in x86/cpu:wait-init, when that seems to have a negligible effect. The last 70ms optimization was coming mainly from reusing timer calibration. It's a simple patch and I have added it at the end of the series. The only thing that was missing was a sign-off from the author, who I have added to the latest series.
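
For reference, the relative gains implied by the figures in this thread can be checked with a few lines of arithmetic. This is only a sketch: it assumes the approximate ~700ms, 100ms and ~30ms smpboot times quoted above, and the function and constant names are illustrative, not taken from the series.

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Return the percentage by which a duration shrank from before to after."""
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


# Approximate smpboot times quoted in the thread (128 CPUs, 2 NUMA nodes).
# The names below are hypothetical labels for those three measurements.
SERIAL_MS = 700.0             # INIT/SIPI/SIPI issued one CPU at a time
PARALLEL_SIPI_MS = 100.0      # INIT/SIPI/SIPI issued in parallel
WITH_TIMER_REUSE_MS = 30.0    # plus reuse of timer calibration

steps = [
    ("serial -> parallel INIT/SIPI/SIPI", SERIAL_MS, PARALLEL_SIPI_MS),
    ("parallel -> timer calibration reuse", PARALLEL_SIPI_MS, WITH_TIMER_REUSE_MS),
    ("serial -> final", SERIAL_MS, WITH_TIMER_REUSE_MS),
]

for label, before, after in steps:
    print(f"{label}: {before:.0f}ms -> {after:.0f}ms "
          f"({percent_reduction(before, after):.1f}% less time)")
```

The first step corresponds to the "85% improvement" in the cover letter. The second step is the 70ms that, per the reply above, comes mainly from reusing timer calibration rather than from the x86/cpu:wait-init parallelization.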