xen/time: fix system_time for vtsc=1 PV guests

Message ID	alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260 (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> Date: Thu, 21 Apr 2016 14:29:35 +0100 (BST) From: Stefano Stabellini <sstabellini@kernel.org> To: xen-devel@lists.xenproject.org Message-ID: <alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Cc: andrew.cooper3@citrix.com, sstabellini@kernel.org, jbeulich@suse.com Subject: [Xen-devel] [PATCH] xen/time: fix system_time for vtsc=1 PV guests Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Message ID

alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260 (mailing list archive)

State

New, archived

Headers

Date: Thu, 21 Apr 2016 14:29:35 +0100 (BST)
From: Stefano Stabellini <sstabellini@kernel.org>
To: xen-devel@lists.xenproject.org
Message-ID: <alpine.DEB.2.10.1604211359220.23110@sstabellini-ThinkPad-X260>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Cc: andrew.cooper3@citrix.com, sstabellini@kernel.org, jbeulich@suse.com
Subject: [Xen-devel] [PATCH] xen/time: fix system_time for vtsc=1 PV guests
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Commit Message

Stefano Stabellini April 21, 2016, 1:29 p.m. UTC

For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
vcpu_time_info, is calculated from stime_local_stamp using
gtime_to_gtsc.

However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
actually happen when gtime_to_gtsc is called passing stime_local_stamp
(the caller function is __update_vcpu_system_time).

In that case the pvclock protocol doesn't work properly and the guest is
unable to calculate the system time correctly. As a consequence when the
guest tries to set a timer event (for example calling the
VCPUOP_set_singleshot_timer hypercall), the event will be in the past
causing Linux to hang.

The purpose of the pvclock protocol is to allow the guest to calculate
the system_time in nanosec correctly. The guest calculates as follow:

  from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time

Given that with vtsc=1:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)

The expression evaluates to NOW(), which is what we want.
However when stime_local_stamp < vtsc_offset,
vcpu_time_info.tsc_timestamp is actually 0, because it cannot be
negative (the field is uint64_t). As a consequence the calculated
overall system_time is not correct.

This patch fixes the issue by passing vtsc_offset as system_time when
vcpu_time_info.tsc_timestamp is 0:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = 0
  vcpu_time_info.system_time = vtsc_offset

The pvclock expression evaluates to NOW(), which is what we want.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Comments

Jan Beulich April 22, 2016, 9:21 a.m. UTC | #1

>>> On 21.04.16 at 15:29, <sstabellini@kernel.org> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -784,7 +784,7 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>      struct cpu_time       *t;
>      struct vcpu_time_info *u, _u = {};
>      struct domain *d = v->domain;
> -    s_time_t tsc_stamp;
> +    s_time_t stime_stamp, tsc_stamp = 0;

I don't see why the initializer needs adding here.

> @@ -807,7 +808,11 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>                  tsc_stamp = -gtime_to_gtsc(d, -stime);
>          }
>          else
> +        {
>              tsc_stamp = gtime_to_gtsc(d, stime);
> +            if (!tsc_stamp)

Coding style.

> +                stime_stamp = d->arch.vtsc_offset;
> +        }

While I can see this being the right thing for getting the two stamps
in sync, is that really helping the guest? Time ought to be not moving
forward until getting past vtsc_offset afaict, and that can't be good.
I.e. it would seem to me that it's gtime_to_gtsc() that needs
adjustment to properly deal with time < d->arch.vtsc_offset.

Plus I can't see why, in the worst case, the gTSC value can't be
wrapped through zero into negative (or really huge positive) range:
Such TSC values are certainly not invalid, and guests shouldn't really
make assumptions on TSC values being in the small positive range
when they boot.

Also, looking at all the involved code, I once again wonder whether
all the is_hvm_*() there shouldn't be has_hvm_container_*().

Jan

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..27b0e5c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -784,7 +784,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     struct cpu_time       *t;
     struct vcpu_time_info *u, _u = {};
     struct domain *d = v->domain;
-    s_time_t tsc_stamp;
+    s_time_t stime_stamp, tsc_stamp = 0;
 
     if ( v->vcpu_info == NULL )
         return;
@@ -792,6 +792,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     t = &this_cpu(cpu_time);
     u = &vcpu_info(v, time);
 
+    stime_stamp = t->stime_local_stamp;
     if ( d->arch.vtsc )
     {
         s_time_t stime = t->stime_local_stamp;
@@ -807,7 +808,11 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
                 tsc_stamp = -gtime_to_gtsc(d, -stime);
         }
         else
+        {
             tsc_stamp = gtime_to_gtsc(d, stime);
+            if (!tsc_stamp)
+                stime_stamp = d->arch.vtsc_offset;
+        }
 
         _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
         _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
@@ -829,7 +834,7 @@  static void __update_vcpu_system_time(struct vcpu *v, int force)
     }
 
     _u.tsc_timestamp = tsc_stamp;
-    _u.system_time   = t->stime_local_stamp;
+    _u.system_time   = stime_stamp;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;

xen/time: fix system_time for vtsc=1 PV guests

Commit Message

Comments

Patch