mbox series

[RFC,0/10] Cleaning up the KVM clock mess

Message ID 20240418193528.41780-1-dwmw2@infradead.org (mailing list archive)
Headers show
Series Cleaning up the KVM clock mess | expand

Message

David Woodhouse April 18, 2024, 7:34 p.m. UTC
I lied, there aren't three different definitions of the KVM clock. The
fourth is on i386, where it's still based on CLOCK_MONOTONIC (well,
boot time which might as well be the same sine we offset it anyway).

If we fix *that* to be based on CLOCK_MONOTONIC_RAW then we can rip out
a whole bunch of mechanisms which were added to cope with NTP frequency
skew.

This cleans up the mess and gets us back down to the two unavoidable
definitions of the KVM clock: when the host and guest TSCs are well
behaved and in sync, it's in "master clock" mode where it's defined as
a simple arithmetic function of the guest TSC, otherwise it's clamped
to the host's CLOCK_MONOTONIC_RAW.

It includes Jack's KVM_[GS]ET_CLOCK_GUEST patch to allow accurate
migration. Also my KVM_VCPU_TSC_SCALE which exposes the precise TSC
scaling factors. This is needed to get accurate migration of the guest
TSC, and can *also* be used by userspace to have vDSO-style access to
the KVM clock. Thus allowing hypercalls and other emulated clock devices
(e.g. PIT, HPET, ACPI timer) to be based on the KVM clock too, giving
*consistency* across a live migration.

I do still need to fix KVM_REQ_MASTERCLOCK_UPDATE so that it doesn't
clamp the clock back to CLOCK_MONOTONIC_RAW; we should *update* the
ka->kvmclock_offset when we've been running in use_master_clock mode.
Should probably do that in kvm_arch_hardware_disable() for timekeeping
across hibernation too, but I haven't finished working that one out.

I think there are still some places left where KVM reads the time twice 
in close(ish) succession and then assumes they were at the *same* time, 
which I'll audit and fix too.

I also need to flesh out the test cases and do some real testing from
VMMs, but I think it's ready for some heckling at least.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks

David Woodhouse (8):
      KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
      KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
      KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
      KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
      KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
      KVM: x86: Remove periodic global clock updates
      KVM: x86: Kill KVM_REQ_GLOBAL_CLOCK_UPDATE
      KVM: x86: Fix KVM clock precision in __get_kvmclock()

Jack Allister (2):
      KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
      KVM: selftests: Add KVM/PV clock selftest to prove timer correction

 Documentation/virt/kvm/api.rst                    |  37 ++
 Documentation/virt/kvm/devices/vcpu.rst           | 115 ++++--
 arch/x86/include/asm/kvm_host.h                   |   8 +-
 arch/x86/include/uapi/asm/kvm.h                   |   6 +
 arch/x86/kvm/svm/svm.c                            |   3 +-
 arch/x86/kvm/vmx/vmx.c                            |   2 +-
 arch/x86/kvm/x86.c                                | 438 +++++++++++++---------
 arch/x86/kvm/xen.c                                |   4 +-
 include/uapi/linux/kvm.h                          |   3 +
 tools/testing/selftests/kvm/Makefile              |   1 +
 tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++++++
 11 files changed, 600 insertions(+), 209 deletions(-)

Comments

David Woodhouse April 19, 2024, 12:52 p.m. UTC | #1
On Thu, 2024-04-18 at 20:34 +0100, David Woodhouse wrote:
> 
>       KVM: x86: Remove periodic global clock updates
>       KVM: x86: Kill KVM_REQ_GLOBAL_CLOCK_UPDATE

Meh, I might have to put those back. They were originally introduced to
cope with NTP frequency skew which is no longer a problem, but we now
know there's a systemic skew even between the host CLOCK_MONOTONIC_RAW
and the KVM clock as calculated via the guest's TSC.

So at least when !ka->use_master_clock I think the forced resync does
need to happen.