Message ID | 20190103191849.GA24387@amt.cnet (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: kvmguest: use TSC clocksource if invariant TSC is exposed | expand |
On Thu, Jan 03, 2019 at 05:18:49PM -0200, Marcelo Tosatti wrote: > > The invariant TSC bit has the following meaning: > > "The time stamp counter in newer processors may support an enhancement, > referred to as invariant TSC. Processor's support for invariant TSC > is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run > at a constant rate in all ACPI P-, C-. and T-states. This is the > architectural behavior moving forward. On processors with invariant TSC > support, the OS may use the TSC for wall clock timer services (instead > of ACPI or HPET timers). TSC reads are much more efficient and do not > incur the overhead associated with a ring transition or access to a > platform resource." > > IOW, TSC does not change frequency. In such case, and with > TSC scaling hardware available to handle migration, it is possible > to use the TSC clocksource directly, whose system calls are > faster. > > Reduce the rating of kvmclock clocksource to allow TSC clocksource > to be the default if invariant TSC is exposed. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index 30084ec..575857c 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -35,6 +35,7 @@ > #include <asm/x86_init.h> > #include <asm/reboot.h> > #include <asm/kvmclock.h> > +#include <asm/processor.h> > > static int kvmclock __initdata = 1; > static int kvmclock_vsyscall __initdata = 1; > @@ -325,6 +326,7 @@ static int kvmclock_setup_percpu(unsigned int cpu) > void __init kvmclock_init(void) > { > u8 flags; > + struct cpuinfo_x86 *c = &boot_cpu_data; > > if (!kvm_para_available() || !kvmclock) > return; > @@ -368,6 +370,18 @@ void __init kvmclock_init(void) > machine_ops.crash_shutdown = kvm_crash_shutdown; > #endif > kvm_get_preset_lpj(); > + > + /* > + * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate > + * with P/T states and does not stop in deep C-states. > + * > + * Invariant TSC exposed by host means kvmclock is not necessary: > + * can use TSC as clocksource. > + * > + */ > + if (c->x86_power & (1 << 8)) > + kvm_clock.rating = 299; Why check x86_power manually instead of using the feature bits that all x86 CPUs set based on x86_power? And it seems like this should check tsc_unstable as well, e.g. to account for the user explicitly disabling the TSC via kernel params. I.e.: if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && !check_tsc_unstable()) kvm_clock.rating = 299; Note that I have no idea if there are other changes necessary to promote the TSC above kvmclock, i.e. someone with more clock knowledge should review this :) > + > clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); > pv_info.name = "KVM"; > } > > > >
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 30084ec..575857c 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -35,6 +35,7 @@ #include <asm/x86_init.h> #include <asm/reboot.h> #include <asm/kvmclock.h> +#include <asm/processor.h> static int kvmclock __initdata = 1; static int kvmclock_vsyscall __initdata = 1; @@ -325,6 +326,7 @@ static int kvmclock_setup_percpu(unsigned int cpu) void __init kvmclock_init(void) { u8 flags; + struct cpuinfo_x86 *c = &boot_cpu_data; if (!kvm_para_available() || !kvmclock) return; @@ -368,6 +370,18 @@ void __init kvmclock_init(void) machine_ops.crash_shutdown = kvm_crash_shutdown; #endif kvm_get_preset_lpj(); + + /* + * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate + * with P/T states and does not stop in deep C-states. + * + * Invariant TSC exposed by host means kvmclock is not necessary: + * can use TSC as clocksource. + * + */ + if (c->x86_power & (1 << 8)) + kvm_clock.rating = 299; + clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); pv_info.name = "KVM"; }
The invariant TSC bit has the following meaning: "The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." IOW, TSC does not change frequency. In such case, and with TSC scaling hardware available to handle migration, it is possible to use the TSC clocksource directly, whose system calls are faster. Reduce the rating of kvmclock clocksource to allow TSC clocksource to be the default if invariant TSC is exposed. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>