Message ID | alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 21.04.16 at 15:29, <sstabellini@kernel.org> wrote: > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -784,7 +784,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) > struct cpu_time *t; > struct vcpu_time_info *u, _u = {}; > struct domain *d = v->domain; > - s_time_t tsc_stamp; > + s_time_t stime_stamp, tsc_stamp = 0; I don't see why the initializer needs adding here. > @@ -807,7 +808,11 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) > tsc_stamp = -gtime_to_gtsc(d, -stime); > } > else > + { > tsc_stamp = gtime_to_gtsc(d, stime); > + if (!tsc_stamp) Coding style. > + stime_stamp = d->arch.vtsc_offset; > + } While I can see this being the right thing for getting the two stamps in sync, is that really helping the guest? Time ought to be not moving forward until getting past vtsc_offset afaict, and that can't be good. I.e. it would seem to me that it's gtime_to_gtsc() that needs adjustment to properly deal with time < d->arch.vtsc_offset. Plus I can't see why, in the worst case, the gTSC value can't be wrapped through zero into negative (or really huge positive) range: Such TSC values are certainly not invalid, and guests shouldn't really make assumptions on TSC values being in the small positive range when they boot. Also, looking at all the involved code, I once again wonder whether all the is_hvm_*() there shouldn't be has_hvm_container_*(). Jan
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index 687e39b..27b0e5c 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -784,7 +784,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) struct cpu_time *t; struct vcpu_time_info *u, _u = {}; struct domain *d = v->domain; - s_time_t tsc_stamp; + s_time_t stime_stamp, tsc_stamp = 0; if ( v->vcpu_info == NULL ) return; @@ -792,6 +792,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) t = &this_cpu(cpu_time); u = &vcpu_info(v, time); + stime_stamp = t->stime_local_stamp; if ( d->arch.vtsc ) { s_time_t stime = t->stime_local_stamp; @@ -807,7 +808,11 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) tsc_stamp = -gtime_to_gtsc(d, -stime); } else + { tsc_stamp = gtime_to_gtsc(d, stime); + if (!tsc_stamp) + stime_stamp = d->arch.vtsc_offset; + } _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac; _u.tsc_shift = d->arch.vtsc_to_ns.shift; @@ -829,7 +834,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) } _u.tsc_timestamp = tsc_stamp; - _u.system_time = t->stime_local_stamp; + _u.system_time = stime_stamp; if ( is_hvm_domain(d) ) _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time() using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct vcpu_time_info, is calculated from stime_local_stamp using gtime_to_gtsc. However gtime_to_gtsc can return 0, if time < vtsc_offset, which can actually happen when gtime_to_gtsc is called passing stime_local_stamp (the caller function is __update_vcpu_system_time). In that case the pvclock protocol doesn't work properly and the guest is unable to calculate the system time correctly. As a consequence when the guest tries to set a timer event (for example calling the VCPUOP_set_singleshot_timer hypercall), the event will be in the past causing Linux to hang. The purpose of the pvclock protocol is to allow the guest to calculate the system_time in nanosec correctly. The guest calculates as follow: from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time Given that with vtsc=1: rdtsc = to_vtsc_scale(NOW() - vtsc_offset) vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset) The expression evaluates to NOW(), which is what we want. However when stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is actually 0, because it cannot be negative (the field is uint64_t). As a consequence the calculated overall system_time is not correct. This patch fixes the issue by passing vtsc_offset as system_time when vcpu_time_info.tsc_timestamp is 0: rdtsc = to_vtsc_scale(NOW() - vtsc_offset) vcpu_time_info.tsc_timestamp = 0 vcpu_time_info.system_time = vtsc_offset The pvclock expression evaluates to NOW(), which is what we want. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>