Message ID | Y8no/eAQi0QkIJqa@tpad (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: add bit to indicate correct tsc_shift | expand |
On 1/20/23 02:06, Marcelo Tosatti wrote: > Before commit 78db6a5037965429c04d708281f35a6e5562d31b, > kvm_guest_time_update() would use vcpu->virtual_tsc_khz to calculate > tsc_shift value in the vcpus pvclock structure written to guest memory. > > For those kernels, if vcpu->virtual_tsc_khz != tsc_khz (which can be the > case when guest state is restored via migration, or if tsc-khz option is > passed to QEMU), and TSC scaling is not enabled (which happens if the > difference between the frequency requested via KVM_SET_TSC_KHZ and the > host TSC KHZ is smaller than 250ppm), then there can be a difference > between what KVM_GET_CLOCK would return and what the guest reads as > kvmclock value. I don't think it's justifiable to further complicate the userspace API for a bug that's been fixed six years ago. I'd be very surprised if any combination of modern upstream {QEMU,kernel} is going to do a successful migration from such an old {QEMU,kernel}. RHEL/CentOS are able to do so because *specific pairs* have been tested, but as far as upstream is concerned this adds complexity that absolutely no one will use. I replied on the QEMU patch with further suggestions. Paolo
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6aaae18f1854..61c5876cfd87 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2160,7 +2160,8 @@ static inline int kvm_cpu_get_apicid(int mps_cpu) int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages); #define KVM_CLOCK_VALID_FLAGS \ - (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC) + (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC | \ + KVM_CLOCK_CORRECT_TSC_SHIFT) #define KVM_X86_VALID_QUIRKS \ (KVM_X86_QUIRK_LINT0_REENABLED | \ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index da4bbd043a7b..142a53d06100 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3014,6 +3014,8 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; } + data->flags |= KVM_CLOCK_CORRECT_TSC_SHIFT; + put_cpu(); } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..7b94d7bd03a6 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1301,6 +1301,9 @@ struct kvm_irqfd { #define KVM_CLOCK_TSC_STABLE 2 #define KVM_CLOCK_REALTIME (1 << 2) #define KVM_CLOCK_HOST_TSC (1 << 3) +/* whether tsc_shift as seen by the guest matches guest visible TSC */ +/* This is true since commit 78db6a5037965429c04d708281f35a6e5562d31b */ +#define KVM_CLOCK_CORRECT_TSC_SHIFT (1 << 4) struct kvm_clock_data { __u64 clock;
Before commit 78db6a5037965429c04d708281f35a6e5562d31b, kvm_guest_time_update() would use vcpu->virtual_tsc_khz to calculate tsc_shift value in the vcpus pvclock structure written to guest memory. For those kernels, if vcpu->virtual_tsc_khz != tsc_khz (which can be the case when guest state is restored via migration, or if tsc-khz option is passed to QEMU), and TSC scaling is not enabled (which happens if the difference between the frequency requested via KVM_SET_TSC_KHZ and the host TSC KHZ is smaller than 250ppm), then there can be a difference between what KVM_GET_CLOCK would return and what the guest reads as kvmclock value. When KVM_SET_CLOCK'ing what is read with KVM_GET_CLOCK, the guest can observe a forward or backwards time jump. Advertise to userspace that current kernel contains this fix, so QEMU can workaround the problem by reading pvclock via guest memory directly otherwise. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>