
[1/2] keep guest wallclock in sync with host clock

Message ID 1251805848-17451-2-git-send-email-glommer@redhat.com (mailing list archive)
State New, archived

Commit Message

Glauber Costa Sept. 1, 2009, 11:50 a.m. UTC
KVM clock is great for avoiding drift in guest VMs running on top of KVM.
However, the current mechanism does not propagate changes in the host's
wallclock value up into the guests. This effectively means that in a large
pool of VMs that need accurate timing, all of them have to run NTP, instead
of just the host doing it.

Since the host updates the information in the shared memory area upon MSR
writes, this patch introduces a worker that writes to that MSR and calls
do_settimeofday at fixed intervals, with second resolution. An interval of 0
means that we are not interested in this behaviour. A later patch will make
this optional at runtime.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/kernel/kvmclock.c |   62 +++++++++++++++++++++++++++++++++++++------
 1 files changed, 53 insertions(+), 9 deletions(-)
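
For orientation, this is the heart of the mechanism, condensed from the diff
at the bottom of this page, with the MSR handshake from kvm_get_wall_ts()
inlined into the worker (a sketch, not a drop-in replacement):

static void kvm_sync_wall_clock(struct work_struct *work)
{
	struct pvclock_vcpu_time_info *vcpu_time;
	struct timespec now;
	u64 pa = __pa_symbol(&wall_clock);

	/* The MSR write asks the host to refresh the shared wall_clock area */
	native_write_msr(MSR_KVM_WALL_CLOCK, (u32)pa, (u32)(pa >> 32));

	vcpu_time = &get_cpu_var(hv_clock);
	pvclock_read_wallclock(&wall_clock, vcpu_time, &now);
	put_cpu_var(hv_clock);

	do_settimeofday(&now);		/* adopt the host's wallclock */
	schedule_next_update();		/* re-arm the delayed work */
}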

Comments

Avi Kivity Sept. 2, 2009, 11:44 a.m. UTC | #1
On 09/01/2009 02:50 PM, Glauber Costa wrote:
> KVM clock is great for avoiding drift in guest VMs running on top of KVM.
> However, the current mechanism does not propagate changes in the host's
> wallclock value up into the guests. This effectively means that in a large
> pool of VMs that need accurate timing, all of them have to run NTP, instead
> of just the host doing it.
>
> Since the host updates the information in the shared memory area upon MSR
> writes, this patch introduces a worker that writes to that MSR and calls
> do_settimeofday at fixed intervals, with second resolution. An interval of 0
> means that we are not interested in this behaviour. A later patch will make
> this optional at runtime.
>
> +
> +static void kvm_sync_wall_clock(struct work_struct *work)
> +{
> +	struct timespec now;
> +
> +	kvm_get_wall_ts(&now);
>    

What happens if we schedule here?

> +
> +	do_settimeofday(&now);
> +	schedule_next_update();
> +}
> +
> +static __init int init_updates(void)
> +{
> +	schedule_next_update();
> +	return 0;
> +}
> +/*
> + * It has to be run after workqueues are initialized, since we call
> + * schedule_delayed_work. Other than that, we have no specific requirements
> + */
> +late_initcall(init_updates);
>    

Should this run on bare metal too?
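
A minimal sketch of one way to guard against that, assuming the
kvm_para_available() helper from asm/kvm_para.h (already used elsewhere in
kvmclock.c); this is not part of the posted patch:

static __init int init_updates(void)
{
	/* Hypothetical guard: do nothing on bare metal or with kvmclock off */
	if (!kvm_para_available() || !kvmclock)
		return 0;

	schedule_next_update();
	return 0;
}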
Glauber Costa Sept. 2, 2009, 12:21 p.m. UTC | #2
On Wed, Sep 02, 2009 at 02:44:11PM +0300, Avi Kivity wrote:
> On 09/01/2009 02:50 PM, Glauber Costa wrote:
>> KVM clock is great for avoiding drift in guest VMs running on top of KVM.
>> However, the current mechanism does not propagate changes in the host's
>> wallclock value up into the guests. This effectively means that in a large
>> pool of VMs that need accurate timing, all of them have to run NTP, instead
>> of just the host doing it.
>>
>> Since the host updates the information in the shared memory area upon MSR
>> writes, this patch introduces a worker that writes to that MSR and calls
>> do_settimeofday at fixed intervals, with second resolution. An interval of 0
>> means that we are not interested in this behaviour. A later patch will make
>> this optional at runtime.
>>
>> +
>> +static void kvm_sync_wall_clock(struct work_struct *work)
>> +{
>> +	struct timespec now;
>> +
>> +	kvm_get_wall_ts(&now);
>>    
>
> What happens if we schedule here?
hummm, I guess disabling preemption would be enough to make us safe here?

>
>> +
>> +	do_settimeofday(&now);
>> +	schedule_next_update();
>> +}
>> +
>> +static __init int init_updates(void)
>> +{
>> +	schedule_next_update();
>> +	return 0;
>> +}
>> +/*
>> + * It has to be run after workqueues are initialized, since we call
>> + * schedule_delayed_work. Other than that, we have no specific requirements
>> + */
>> +late_initcall(init_updates);
>>    
>
> Should this run on bare metal too?
Avi Kivity Sept. 2, 2009, 12:24 p.m. UTC | #3
On 09/02/2009 03:21 PM, Glauber Costa wrote:
>
>>> +static void kvm_sync_wall_clock(struct work_struct *work)
>>> +{
>>> +	struct timespec now;
>>> +
>>> +	kvm_get_wall_ts(&now);
>>>
>>>        
>> What happens if we schedule here?
>>      
> hummm, I guess disabling preemption would be enough to make us safe here?
>    

You can't prevent host preemption.  You might read kvmclock again and 
repeat if too much time has passed.
Glauber Costa Sept. 2, 2009, 12:48 p.m. UTC | #4
On Wed, Sep 02, 2009 at 03:24:26PM +0300, Avi Kivity wrote:
> On 09/02/2009 03:21 PM, Glauber Costa wrote:
>>
>>>> +static void kvm_sync_wall_clock(struct work_struct *work)
>>>> +{
>>>> +	struct timespec now;
>>>> +
>>>> +	kvm_get_wall_ts(&now);
>>>>
>>>>        
>>> What happens if we schedule here?
>>>      
>> hummm, I guess disabling preemption would be enough to make us safe here?
>>    
>
> You can't prevent host preemption.  You might read kvmclock again and  
> repeat if too much time has passed.
But then you can be scheduled after you've done settimeofday, but before
reading kvmclock again. Since we're aiming for periodic adjustments here,
any discrepancies should not last long, so we can maybe live with it.

Avi Kivity Sept. 2, 2009, 12:56 p.m. UTC | #5
On 09/02/2009 03:48 PM, Glauber Costa wrote:
>
>> You can't prevent host preemption.  You might read kvmclock again and
>> repeat if too much time has passed.
>>      
> But then you can be scheduled after you've done settimeofday, but before
> reading kvmclock again. Since we're aiming for periodic adjustments here,
> any discrepancies should not last long, so we can maybe live with it.
>    

do {
     read_kvmclock
     settimeofday
     read_kvmclock
} while the_difference_between_the_two_reads_is_too_large
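
Expressed with this patch's helpers, that loop might look like the sketch
below; the 100ms bound is an arbitrary illustrative threshold, not a value
from the thread:

static void kvm_sync_wall_clock(struct work_struct *work)
{
	struct timespec before, after;

	do {
		kvm_get_wall_ts(&before);
		do_settimeofday(&before);
		kvm_get_wall_ts(&after);
		/* A large gap means we were preempted mid-update; redo it */
	} while (timespec_to_ns(&after) - timespec_to_ns(&before) >
		 100 * NSEC_PER_MSEC);

	schedule_next_update();
}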

Patch

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index e5efcdc..fc409e9 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@ 
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static unsigned int kvm_wall_update_interval = 5;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -39,24 +40,67 @@  early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
-/*
- * The wallclock is the time of day when we booted. Since then, some time may
- * have elapsed since the hypervisor wrote the data. So we try to account for
- * that with system time
- */
-static unsigned long kvm_get_wallclock(void)
+static void kvm_get_wall_ts(struct timespec *ts)
 {
-	struct pvclock_vcpu_time_info *vcpu_time;
-	struct timespec ts;
 	int low, high;
+	struct pvclock_vcpu_time_info *vcpu_time;
 
 	low = (int)__pa_symbol(&wall_clock);
 	high = ((u64)__pa_symbol(&wall_clock) >> 32);
 	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
 
 	vcpu_time = &get_cpu_var(hv_clock);
-	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
+	pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
 	put_cpu_var(hv_clock);
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock);
+
+static void schedule_next_update(void)
+{
+	struct timespec next;
+
+	if (kvm_wall_update_interval == 0)
+		return;
+
+	next.tv_sec = kvm_wall_update_interval;
+	next.tv_nsec = 0;
+
+	schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next));
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work)
+{
+	struct timespec now;
+
+	kvm_get_wall_ts(&now);
+
+	do_settimeofday(&now);
+	schedule_next_update();
+}
+
+static __init int init_updates(void)
+{
+	schedule_next_update();
+	return 0;
+}
+/*
+ * It has to be run after workqueues are initialized, since we call
+ * schedule_delayed_work. Other than that, we have no specific requirements
+ */
+late_initcall(init_updates);
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that with system time
+ */
+static unsigned long kvm_get_wallclock(void)
+{
+	struct timespec ts;
+
+	kvm_get_wall_ts(&ts);
 
 	return ts.tv_sec;
 }
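
The runtime knob promised by the changelog is not part of this patch; one
hypothetical shape for it, assuming linux/module.h, would be a writable
parameter on the existing variable:

/* Hypothetical follow-up: makes the interval writable at runtime via
 * /sys/module/kvmclock/parameters/wall_update_interval (0 disables). */
module_param_named(wall_update_interval, kvm_wall_update_interval, uint, 0644);

Note that since schedule_next_update() only re-arms from inside the worker,
raising the value from 0 back to nonzero would additionally need something to
re-kick the delayed work.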