diff mbox

xen/time: fix system_time for vtsc=1 PV guests

Message ID alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260 (mailing list archive)
State New, archived
Headers show

Commit Message

Stefano Stabellini April 21, 2016, 1:29 p.m. UTC
For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
vcpu_time_info, is calculated from stime_local_stamp using
gtime_to_gtsc.

However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
actually happen when gtime_to_gtsc is called passing stime_local_stamp
(the caller function is __update_vcpu_system_time).

In that case the pvclock protocol doesn't work properly and the guest is
unable to calculate the system time correctly. As a consequence when the
guest tries to set a timer event (for example calling the
VCPUOP_set_singleshot_timer hypercall), the event will be in the past
causing Linux to hang.

The purpose of the pvclock protocol is to allow the guest to calculate
the system_time in nanosec correctly. The guest calculates as follow:

  from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time

Given that with vtsc=1:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)

The expression evaluates to NOW(), which is what we want.
However when stime_local_stamp < vtsc_offset,
vcpu_time_info.tsc_timestamp is actually 0, because it cannot be
negative (the field is uint64_t). As a consequence the calculated
overall system_time is not correct.

This patch fixes the issue by passing vtsc_offset as system_time when
vcpu_time_info.tsc_timestamp is 0:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = 0
  vcpu_time_info.system_time = vtsc_offset

The pvclock expression evaluates to NOW(), which is what we want.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Comments

Jan Beulich April 22, 2016, 9:21 a.m. UTC | #1
>>> On 21.04.16 at 15:29, <sstabellini@kernel.org> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -784,7 +784,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>      struct cpu_time       *t;
>      struct vcpu_time_info *u, _u = {};
>      struct domain *d = v->domain;
> -    s_time_t tsc_stamp;
> +    s_time_t stime_stamp, tsc_stamp = 0;

I don't see why the initializer needs adding here.

> @@ -807,7 +808,11 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>                  tsc_stamp = -gtime_to_gtsc(d, -stime);
>          }
>          else
> +        {
>              tsc_stamp = gtime_to_gtsc(d, stime);
> +            if (!tsc_stamp)

Coding style.

> +                stime_stamp = d->arch.vtsc_offset;
> +        }

While I can see this being the right thing for getting the two stamps
in sync, is that really helping the guest? Time ought to be not moving
forward until getting past vtsc_offset afaict, and that can't be good.
I.e. it would seem to me that it's gtime_to_gtsc() that needs
adjustment to properly deal with time < d->arch.vtsc_offset.

Plus I can't see why, in the worst case, the gTSC value can't be
wrapped through zero into negative (or really huge positive) range:
Such TSC values are certainly not invalid, and guests shouldn't really
make assumptions on TSC values being in the small positive range
when they boot.

Also, looking at all the involved code, I once again wonder whether
all the is_hvm_*() there shouldn't be has_hvm_container_*().

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..27b0e5c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -784,7 +784,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     struct cpu_time       *t;
     struct vcpu_time_info *u, _u = {};
     struct domain *d = v->domain;
-    s_time_t tsc_stamp;
+    s_time_t stime_stamp, tsc_stamp = 0;
 
     if ( v->vcpu_info == NULL )
         return;
@@ -792,6 +792,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     t = &this_cpu(cpu_time);
     u = &vcpu_info(v, time);
 
+    stime_stamp = t->stime_local_stamp;
     if ( d->arch.vtsc )
     {
         s_time_t stime = t->stime_local_stamp;
@@ -807,7 +808,11 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
                 tsc_stamp = -gtime_to_gtsc(d, -stime);
         }
         else
+        {
             tsc_stamp = gtime_to_gtsc(d, stime);
+            if (!tsc_stamp)
+                stime_stamp = d->arch.vtsc_offset;
+        }
 
         _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
         _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
@@ -829,7 +834,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     }
 
     _u.tsc_timestamp = tsc_stamp;
-    _u.system_time   = t->stime_local_stamp;
+    _u.system_time   = stime_stamp;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;