Message ID: 20240726201332.626395-1-ankur.a.arora@oracle.com (mailing list archive)
Series:     Enable haltpoll on arm64
> Subject: [PATCH v6 00/10] Enable haltpoll on arm64
>
> This patchset enables the cpuidle-haltpoll driver and its namesake
> governor on arm64. This is specifically interesting for KVM guests,
> where it reduces IPC latencies.
>
> Comparing idle switching latencies on an arm64 KVM guest with
> perf bench sched pipe:
>
>                                usecs/op    %stdev
>
>   no haltpoll (baseline)        13.48      +- 5.19%
>   with haltpoll                  6.84      +- 22.07%

I got similar results with a VM on a Grace machine (series applied to 6.10).

[default]
# cat /sys/devices/system/cpu/cpuidle/current_driver
none
# perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 23.832 [sec]

      23.832644 usecs/op
          41959 ops/sec

[with "cpuidle-haltpoll.force=1" on the command line]
# cat /sys/devices/system/cpu/cpuidle/current_driver
haltpoll
# perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 6.340 [sec]

       6.340116 usecs/op
         157725 ops/sec

Tested-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>

Regards,
Tomohiro

> No change in performance for a similar test on x86:
>
>                                        usecs/op    %stdev
>
>   haltpoll w/ cpu_relax() (baseline)    4.75       +- 1.76%
>   haltpoll w/ smp_cond_load_relaxed()   4.78       +- 2.31%
>
> Both sets of tests were on otherwise idle systems with guest VCPUs
> pinned to specific PCPUs. One reason for the higher stdev on arm64
> is that trapping of the WFE instruction by the host KVM is contingent
> on the number of tasks on the runqueue.
>
> The patch series is organized in three parts:
>
>  - patch 1 reorganizes the poll_idle() loop, switching to
>    smp_cond_load_relaxed() in the polling loop.
>    Relatedly, patches 2-3 rework the config option ARCH_HAS_CPU_RELAX,
>    renaming it to ARCH_HAS_OPTIMIZED_POLL.
>
>  - patches 4-6 reorganize the haltpoll selection and init logic
>    to allow architecture code to select it.
>
>  - and finally, patches 7-10 add the bits for arm64 support.
>
> What is still missing: this series largely completes the haltpoll side
> of the functionality for arm64. There are, however, a few related areas
> that still need to be fleshed out:
>
>  - WFET support: WFE on arm64 does not guarantee that poll_idle()
>    terminates within halt_poll_ns. Using WFET would address this.
>  - KVM_NO_POLL support on arm64.
>  - KVM TWED support on arm64: allow the host to limit the time spent
>    in WFE.
>
> Changelog:
>
> v6:
>  - reordered the patches to keep the poll_idle() and
>    ARCH_HAS_OPTIMIZED_POLL changes together (comment from Christoph
>    Lameter)
>  - fleshed out the commit messages a bit more (comments from Christoph
>    Lameter, Sudeep Holla)
>  - reworked the selection of cpuidle-haltpoll: it is now selected based
>    on the architectural selection of ARCH_CPUIDLE_HALTPOLL
>  - moved back to arch_haltpoll_want() (comment from Joao Martins).
>    arch_haltpoll_want() now takes the force parameter and is
>    responsible for the complete selection (or not) of haltpoll
>  - fixed the build breakage on i386
>  - fixed the cpuidle-haltpoll module breakage on arm64 (comments from
>    Tomohiro Misono, Haris Okanovic)
>
> v5:
>  - reworked the poll_idle() loop around smp_cond_load_relaxed() (review
>    comment from Tomohiro Misono)
>  - also reworked the selection of cpuidle-haltpoll: now selected based
>    on the architectural selection of ARCH_CPUIDLE_HALTPOLL
>  - arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
>    arm64 now depends on the event stream being enabled
>  - limited POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris
>    Okanovic)
>  - renamed ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
>
> v4 changes from v3:
>  - changed 7/8 per Rafael's input: dropped the parens and used ret for
>    the final check
>  - added 8/8, which renames the guard for building poll_state
>
> v3 changes from v2:
>  - fixed 1/7 per Petr Mladek: removed ARCH_HAS_CPU_RELAX from
>    arch/x86/Kconfig
>  - added Acked-by from Rafael Wysocki on 2/7
>
> v2 changes from v1:
>  - added patch 7, which replaces cpu_relax() with smp_cond_load_relaxed()
>    per PeterZ (this cuts the CPU cycles consumed in the tests above by
>    about 26%: 10,716,881,137 now vs. 14,503,014,257 before)
>  - removed the ifdef from patch 1 per RafaelW
>
> Please review.
>
> Ankur Arora (5):
>   cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
>   cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
>   arm64: idle: export arch_cpu_idle
>   arm64: support cpuidle-haltpoll
>   cpuidle/poll_state: limit POLL_IDLE_RELAX_COUNT on arm64
>
> Joao Martins (4):
>   Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
>   cpuidle-haltpoll: define arch_haltpoll_want()
>   governors/haltpoll: drop kvm_para_available() check
>   arm64: define TIF_POLLING_NRFLAG
>
> Mihai Carabas (1):
>   cpuidle/poll_state: poll via smp_cond_load_relaxed()
>
>  arch/Kconfig                              |  3 +++
>  arch/arm64/Kconfig                        | 10 ++++++++++
>  arch/arm64/include/asm/cpuidle_haltpoll.h |  9 +++++++++
>  arch/arm64/include/asm/thread_info.h      |  2 ++
>  arch/arm64/kernel/cpuidle.c               | 23 +++++++++++++++++++++++
>  arch/arm64/kernel/idle.c                  |  1 +
>  arch/x86/Kconfig                          |  5 ++---
>  arch/x86/include/asm/cpuidle_haltpoll.h   |  1 +
>  arch/x86/kernel/kvm.c                     | 13 +++++++++++++
>  drivers/acpi/processor_idle.c             |  4 ++--
>  drivers/cpuidle/Kconfig                   |  5 ++---
>  drivers/cpuidle/Makefile                  |  2 +-
>  drivers/cpuidle/cpuidle-haltpoll.c        | 12 +-----------
>  drivers/cpuidle/governors/haltpoll.c      |  6 +-----
>  drivers/cpuidle/poll_state.c              | 21 ++++++++++++++++-----
>  drivers/idle/Kconfig                      |  1 +
>  include/linux/cpuidle.h                   |  2 +-
>  include/linux/cpuidle_haltpoll.h          |  5 +++++
>  18 files changed, 94 insertions(+), 31 deletions(-)
>  create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
>
> --
> 2.43.5
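
For readers following the thread, the core mechanical change in the series
is the shape of the poll_idle() loop: the cpu_relax() spin is replaced by
smp_cond_load_relaxed(), which on arm64 compiles to an LDXR/WFE wait on the
thread flags, so the CPU sleeps until the flags cacheline is written or the
event stream fires instead of spinning. The code below is a simplified
sketch of that shape, not the exact patch; the relax-count values and the
deadline handling are assumptions based on the cover letter and on the
pre-existing mainline poll_state.c.

#include <linux/cpuidle.h>
#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
#include <linux/thread_info.h>
#include <asm/barrier.h>

/*
 * On arm64, each pass through smp_cond_load_relaxed() can wait in WFE
 * until the next event-stream wakeup (~100us), so the relax count is
 * kept small there; 200 is the pre-existing mainline value. Both
 * numbers here are illustrative assumptions, not the series' finals.
 */
#ifdef CONFIG_ARM64
#define POLL_IDLE_RELAX_COUNT	1
#else
#define POLL_IDLE_RELAX_COUNT	200
#endif

static int __cpuidle poll_idle(struct cpuidle_device *dev,
			       struct cpuidle_driver *drv, int index)
{
	u64 time_start = local_clock();

	dev->poll_time_limit = false;

	raw_local_irq_enable();
	if (!current_set_polling_and_test()) {
		u64 limit = cpuidle_poll_time(drv, dev);
		unsigned int loop_count = 0;
		unsigned long flags;

		for (;;) {
			/*
			 * Wait for TIF_NEED_RESCHED to be set, breaking
			 * out every POLL_IDLE_RELAX_COUNT condition checks
			 * to re-test the time limit. The generic fallback
			 * spins with cpu_relax(); arm64 waits in WFE.
			 */
			flags = smp_cond_load_relaxed(&current_thread_info()->flags,
						      (VAL & _TIF_NEED_RESCHED) ||
						      (++loop_count == POLL_IDLE_RELAX_COUNT));
			if (flags & _TIF_NEED_RESCHED)
				break;

			loop_count = 0;
			if (local_clock() - time_start > limit) {
				dev->poll_time_limit = true;
				break;
			}
		}
	}
	raw_local_irq_disable();

	current_clr_polling();

	return index;
}

This loop structure is also why WFET appears in the "still missing" list:
without an architected timeout, a single WFE wakeup is bounded only by the
event-stream period, which is what the tighter arm64 relax count works
around.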
Tomohiro Misono (Fujitsu) <misono.tomohiro@fujitsu.com> writes:

>> Subject: [PATCH v6 00/10] Enable haltpoll on arm64
>>
>> This patchset enables the cpuidle-haltpoll driver and its namesake
>> governor on arm64. This is specifically interesting for KVM guests,
>> where it reduces IPC latencies.
>>
>> Comparing idle switching latencies on an arm64 KVM guest with
>> perf bench sched pipe:
>>
>>                                usecs/op    %stdev
>>
>>   no haltpoll (baseline)        13.48      +- 5.19%
>>   with haltpoll                  6.84      +- 22.07%
>
> I got similar results with a VM on a Grace machine (series applied to 6.10).

Great. Thanks for testing.

> [default]
> # cat /sys/devices/system/cpu/cpuidle/current_driver
> none
> # perf bench sched pipe
> # Running 'sched/pipe' benchmark:
> # Executed 1000000 pipe operations between two processes
>
>      Total time: 23.832 [sec]
>
>       23.832644 usecs/op
>           41959 ops/sec
>
> [with "cpuidle-haltpoll.force=1" on the command line]
> # cat /sys/devices/system/cpu/cpuidle/current_driver
> haltpoll
> # perf bench sched pipe
> # Running 'sched/pipe' benchmark:
> # Executed 1000000 pipe operations between two processes
>
>      Total time: 6.340 [sec]
>
>       6.340116 usecs/op
>          157725 ops/sec
>
> Tested-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>

Thanks!

--
ankur
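
A closing note on the "cpuidle-haltpoll.force=1" used in the test above:
per the v5/v6 changelog, haltpoll selection on arm64 is gated by
arch_haltpoll_want(), which takes the force parameter and depends on the
event stream being enabled. The following is a hypothetical sketch of such
a gate, assembled from the changelog rather than taken from the series;
arch_timer_evtstrm_available() is the existing mainline event-stream query.

/* Hypothetical arm64 arch_haltpoll_want(); a sketch, not the series' code. */
#include <linux/export.h>
#include <linux/types.h>
#include <clocksource/arm_arch_timer.h>

bool arch_haltpoll_want(bool force)
{
	/*
	 * WFE has no architected timeout (WFET would provide one), so
	 * poll_idle() relies on the timer event stream as a periodic
	 * wakeup; without it, smp_cond_load_relaxed() could wait far
	 * longer than halt_poll_ns. Require the event stream, and,
	 * absent a paravirt detection path, require the user to opt
	 * in via cpuidle-haltpoll.force=1.
	 */
	return arch_timer_evtstrm_available() && force;
}
EXPORT_SYMBOL_GPL(arch_haltpoll_want);

That matches the behavior seen in the test: without force=1 the current
driver stays "none", and with it haltpoll loads and cuts usecs/op roughly
in quarter on the Grace guest.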