diff mbox series

[1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls

Message ID 20200710200743.3992127-1-oupton@google.com (mailing list archive)
State New, archived
Headers show
Series [1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls | expand

Commit Message

Oliver Upton July 10, 2020, 8:07 p.m. UTC
From: Peter Hornyack <peterhornyack@google.com>

The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
for determining when userspace is attempting to synchronize TSCs.
Instead of guessing at userspace's intentions in the kernel, directly
expose control of the TSC offset field to userspace such that userspace
may deliberately synchronize the guest TSCs.

Note that TSC offset support is mandatory for KVM on both SVM and VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Hornyack <peterhornyack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  5 +++++
 3 files changed, 60 insertions(+)

Comments

Paolo Bonzini July 10, 2020, 8:38 p.m. UTC | #1
On 10/07/20 22:07, Oliver Upton wrote:
> From: Peter Hornyack <peterhornyack@google.com>
> 
> The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> for determining when userspace is attempting to synchronize TSCs.
> Instead of guessing at userspace's intentions in the kernel, directly
> expose control of the TSC offset field to userspace such that userspace
> may deliberately synchronize the guest TSCs.
> 
> Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> 
> Reviewed-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |  5 +++++
>  3 files changed, 60 insertions(+)

Needless to say, a patch that comes with tests starts on the fast lane.
 But I have a fundamental question that isn't answered by either the
test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
practice_ by a VMM?

Thanks,

Paolo
Jim Mattson July 10, 2020, 10:09 p.m. UTC | #2
On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/07/20 22:07, Oliver Upton wrote:
> > From: Peter Hornyack <peterhornyack@google.com>
> >
> > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > for determining when userspace is attempting to synchronize TSCs.
> > Instead of guessing at userspace's intentions in the kernel, directly
> > expose control of the TSC offset field to userspace such that userspace
> > may deliberately synchronize the guest TSCs.
> >
> > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> >
> > Reviewed-by: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h       |  5 +++++
> >  3 files changed, 60 insertions(+)
>
> Needless to say, a patch that comes with tests starts on the fast lane.
>  But I have a fundamental question that isn't answered by either the
> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> practice_ by a VMM?

One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
former.

This isn't the only undocumented dependency among the various
KVM_SET_* calls, but I agree that it would be helpful to document it.
Oliver Upton July 10, 2020, 10:37 p.m. UTC | #3
On Fri, Jul 10, 2020 at 3:09 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 10/07/20 22:07, Oliver Upton wrote:
> > > From: Peter Hornyack <peterhornyack@google.com>
> > >
> > > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > > for determining when userspace is attempting to synchronize TSCs.
> > > Instead of guessing at userspace's intentions in the kernel, directly
> > > expose control of the TSC offset field to userspace such that userspace
> > > may deliberately synchronize the guest TSCs.
> > >
> > > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> > >
> > > Reviewed-by: Jim Mattson <jmattson@google.com>
> > > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> > >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h       |  5 +++++
> > >  3 files changed, 60 insertions(+)
> >
> > Needless to say, a patch that comes with tests starts on the fast lane.
> >  But I have a fundamental question that isn't answered by either the
> > test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> > practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.
>
> This isn't the only undocumented dependency among the various
> KVM_SET_* calls, but I agree that it would be helpful to document it.

Sorry, I was AFK for a bit but it seems Jim beat me to the punch :-)

I'd be glad to add a bit of color on this to the documentation.
Paolo Bonzini July 10, 2020, 10:40 p.m. UTC | #4
On 11/07/20 00:09, Jim Mattson wrote:
>>  But I have a fundamental question that isn't answered by either the
>> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
>> practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.

Other questions:

1) how do you handle non-synchronized TSC between source and destination?

2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
function of the various heuristics?

Paolo
Oliver Upton July 13, 2020, 7:54 p.m. UTC | #5
On Fri, Jul 10, 2020 at 3:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/07/20 00:09, Jim Mattson wrote:
> >>  But I have a fundamental question that isn't answered by either the
> >> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> >> practice_ by a VMM?
> >
> > One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> > one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> > former.
>
> Other questions:
>
> 1) how do you handle non-synchronized TSC between source and destination?
>

On the source side we record a host TSC reading establishing the start
of migration, along with the TSC offsets. We recalculate new offsets
that compensate for the elapsed time + TSC difference between
source/target when restoring the VM.

> 2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
> function of the various heuristics?

We initialize all TSCs to 0 at VM creation, then restore offsets (if
any). Should the offsets be out of phase, we write a garbage value to
the TSC MSR to break the vCPU matching, then set the offset. Not the
most elegant solution, it'd be better if the {GET,SET}_TSC_OFFSET
ioctls instrumented this directly.

>
> Paolo
>
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 644e5326aa50..ef1fc109b901 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4701,6 +4701,33 @@  KVM_PV_VM_VERIFY
   KVM is allowed to start protected VCPUs.
 
 
+4.126 KVM_GET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (out)
+:Returns: 0 on success, < 0 on error
+
+This ioctl gets the TSC offset field. The offset is returned as an
+unsigned value, though it is interpreted as a signed value by hardware.
+
+
+4.127 KVM_SET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (in)
+:Returns: 0 on success, < 0 on error
+
+This ioctl sets the TSC offset field. The offset is represented as an
+unsigned value, though it is interpreted as a signed value by hardware.
+The guest's TSC value will change based on the written offset.
+
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e27d3db7e43f..563256ce5ade 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2026,6 +2026,11 @@  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -3482,6 +3487,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_TIME:
 	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 	case KVM_CAP_TSC_DEADLINE_TIMER:
+	case KVM_CAP_TSC_OFFSET:
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
@@ -4734,6 +4740,28 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_GET_TSC_OFFSET: {
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		tsc_offset = kvm_vcpu_read_tsc_offset(vcpu);
+		if (copy_to_user(argp, &tsc_offset, sizeof(tsc_offset)))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_TSC_OFFSET: {
+		u64 __user *tsc_offset_arg = argp;
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		if (copy_from_user(&tsc_offset, tsc_offset_arg,
+				   sizeof(tsc_offset)))
+			goto out;
+		kvm_vcpu_write_tsc_offset(vcpu, tsc_offset);
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ff9b335620d0..41f387ffcd11 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1033,6 +1033,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
 #define KVM_CAP_LAST_CPU 184
+#define KVM_CAP_TSC_OFFSET 185
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1501,6 +1502,10 @@  struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_TSC_OFFSET */
+#define KVM_GET_TSC_OFFSET	_IOR(KVMIO, 0xc5, __u64)
+#define KVM_SET_TSC_OFFSET	_IOW(KVMIO, 0xc6, __u64)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;