| Message ID | 20200710200743.3992127-1-oupton@google.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls |
On 10/07/20 22:07, Oliver Upton wrote:
> From: Peter Hornyack <peterhornyack@google.com>
>
> The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> for determining when userspace is attempting to synchronize TSCs.
> Instead of guessing at userspace's intentions in the kernel, directly
> expose control of the TSC offset field to userspace such that userspace
> may deliberately synchronize the guest TSCs.
>
> Note that TSC offset support is mandatory for KVM on both SVM and VMX.
>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |  5 +++++
>  3 files changed, 60 insertions(+)

Needless to say, a patch that comes with tests starts on the fast lane.
But I have a fundamental question that isn't answered by either the
test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
practice_ by a VMM?

Thanks,

Paolo
On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/07/20 22:07, Oliver Upton wrote:
> > From: Peter Hornyack <peterhornyack@google.com>
> >
> > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > for determining when userspace is attempting to synchronize TSCs.
> > Instead of guessing at userspace's intentions in the kernel, directly
> > expose control of the TSC offset field to userspace such that userspace
> > may deliberately synchronize the guest TSCs.
> >
> > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> >
> > Reviewed-by: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h       |  5 +++++
> >  3 files changed, 60 insertions(+)
>
> Needless to say, a patch that comes with tests starts on the fast lane.
> But I have a fundamental question that isn't answered by either the
> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> practice_ by a VMM?

One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
former.

This isn't the only undocumented dependency among the various
KVM_SET_* calls, but I agree that it would be helpful to document it.
On Fri, Jul 10, 2020 at 3:09 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 10/07/20 22:07, Oliver Upton wrote:
> > > From: Peter Hornyack <peterhornyack@google.com>
> > >
> > > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > > for determining when userspace is attempting to synchronize TSCs.
> > > Instead of guessing at userspace's intentions in the kernel, directly
> > > expose control of the TSC offset field to userspace such that userspace
> > > may deliberately synchronize the guest TSCs.
> > >
> > > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> > >
> > > Reviewed-by: Jim Mattson <jmattson@google.com>
> > > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> > >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h       |  5 +++++
> > >  3 files changed, 60 insertions(+)
> >
> > Needless to say, a patch that comes with tests starts on the fast lane.
> > But I have a fundamental question that isn't answered by either the
> > test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> > practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.
>
> This isn't the only undocumented dependency among the various
> KVM_SET_* calls, but I agree that it would be helpful to document it.

Sorry, I was AFK for a bit but it seems Jim beat me to the punch :-)

I'd be glad to add a bit of color on this to the documentation.
On 11/07/20 00:09, Jim Mattson wrote:
>> But I have a fundamental question that isn't answered by either the
>> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
>> practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.

Other questions:

1) how do you handle non-synchronized TSC between source and destination?

2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
function of the various heuristics?

Paolo
On Fri, Jul 10, 2020 at 3:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/07/20 00:09, Jim Mattson wrote:
> >> But I have a fundamental question that isn't answered by either the
> >> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> >> practice_ by a VMM?
> >
> > One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> > one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> > former.
>
> Other questions:
>
> 1) how do you handle non-synchronized TSC between source and destination?

On the source side we record a host TSC reading establishing the start
of migration, along with the TSC offsets. We recalculate new offsets
that compensate for the elapsed time + TSC difference between
source/target when restoring the VM.

> 2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
> function of the various heuristics?

We initialize all TSCs to 0 at VM creation, then restore offsets (if
any). Should the offsets be out of phase, we write a garbage value to
the TSC MSR to break the vCPU matching, then set the offset. Not the
most elegant solution; it'd be better if the {GET,SET}_TSC_OFFSET
ioctls instrumented this directly.

> Paolo
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 644e5326aa50..ef1fc109b901 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4701,6 +4701,33 @@ KVM_PV_VM_VERIFY
 
 KVM is allowed to start protected VCPUs.
 
+4.126 KVM_GET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (out)
+:Returns: 0 on success, < 0 on error
+
+This ioctl gets the TSC offset field. The offset is returned as an
+unsigned value, though it is interpreted as a signed value by hardware.
+
+
+4.127 KVM_SET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (in)
+:Returns: 0 on success, < 0 on error
+
+This ioctl sets the TSC offset field. The offset is represented as an
+unsigned value, though it is interpreted as a signed value by hardware.
+The guest's TSC value will change based on the written offset.
+
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e27d3db7e43f..563256ce5ade 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2026,6 +2026,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -3482,6 +3487,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_TIME:
 	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 	case KVM_CAP_TSC_DEADLINE_TIMER:
+	case KVM_CAP_TSC_OFFSET:
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
 	case KVM_CAP_SPLIT_IRQCHIP:
@@ -4734,6 +4740,28 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_GET_TSC_OFFSET: {
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		tsc_offset = kvm_vcpu_read_tsc_offset(vcpu);
+		if (copy_to_user(argp, &tsc_offset, sizeof(tsc_offset)))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_TSC_OFFSET: {
+		u64 __user *tsc_offset_arg = argp;
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		if (copy_from_user(&tsc_offset, tsc_offset_arg,
+				   sizeof(tsc_offset)))
+			goto out;
+		kvm_vcpu_write_tsc_offset(vcpu, tsc_offset);
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ff9b335620d0..41f387ffcd11 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1033,6 +1033,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
 #define KVM_CAP_LAST_CPU 184
+#define KVM_CAP_TSC_OFFSET 185
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1501,6 +1502,10 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO, 0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO, 0xc4)
 
+/* Available with KVM_CAP_TSC_OFFSET */
+#define KVM_GET_TSC_OFFSET	_IOR(KVMIO, 0xc5, __u64)
+#define KVM_SET_TSC_OFFSET	_IOW(KVMIO, 0xc6, __u64)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;