xen/time: fix gtime_to_gtsc for vtsc=1 PV guests

Message ID alpine.DEB.2.10.1604251211370.24872@sstabellini-ThinkPad-X260 (mailing list archive)
State New, archived

Commit Message

Stefano Stabellini April 25, 2016, 11:18 a.m. UTC
From: Jan Beulich <JBeulich@suse.com>

For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
vcpu_time_info, is calculated from stime_local_stamp using
gtime_to_gtsc.

However, gtime_to_gtsc returns 0 if time < vtsc_offset, which can
actually happen when it is called with stime_local_stamp (the caller
is __update_vcpu_system_time).

In that case the pvclock protocol doesn't work properly and the guest is
unable to calculate the system time correctly. As a consequence, when
the guest tries to set a timer event (for example by calling the
VCPUOP_set_singleshot_timer hypercall), the event is in the past,
causing Linux to hang.

The purpose of the pvclock protocol is to allow the guest to calculate
the system_time in nanoseconds correctly. The guest calculates it as
follows:

  from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time

Given that with vtsc=1:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)

The expression evaluates to NOW(), which is what we want.  However when
stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
actually 0. As a consequence the calculated overall system_time is not
correct.
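
The cancellation can be written out as a short derivation. This is only a sketch: it assumes to_vtsc_scale and from_vtsc_scale are linear and exact inverses of each other (rounding ignored), and the symbols S, t_sys and o are shorthand introduced here for to_vtsc_scale, vcpu_time_info.system_time and vtsc_offset.

```latex
\begin{align*}
S^{-1}\bigl(S(\mathrm{NOW} - o) - S(t_{\mathrm{sys}} - o)\bigr) + t_{\mathrm{sys}}
  &= (\mathrm{NOW} - o) - (t_{\mathrm{sys}} - o) + t_{\mathrm{sys}} = \mathrm{NOW} \\
\intertext{whereas with tsc\_timestamp clamped to 0 (the case $t_{\mathrm{sys}} < o$):}
S^{-1}\bigl(S(\mathrm{NOW} - o) - 0\bigr) + t_{\mathrm{sys}}
  &= \mathrm{NOW} - (o - t_{\mathrm{sys}}) < \mathrm{NOW}
\end{align*}
```

Under these assumptions the guest's clock lags by exactly o - t_sys in the clamped case, which is consistent with timer deadlines landing in the past.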

This patch fixes the issue by letting gtime_to_gtsc return a negative
integer in the form of a wrapped-around unsigned integer, so that when
the guest subtracts vcpu_time_info.tsc_timestamp from rdtsc it
calculates the right value.
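
To make the effect concrete, the guest-side computation can be sketched in C. This is a minimal illustration, not Xen or Linux code: it assumes an identity vtsc scale (one tick per nanosecond) so that the scaling helpers drop out, and the names guest_system_time, stamp_clamped and stamp_wrapped are hypothetical.

```c
#include <stdint.h>

/*
 * Guest-side pvclock computation, simplified. The vtsc scale is assumed
 * to be the identity, so from_vtsc_scale() is a no-op in this sketch.
 */
static uint64_t guest_system_time(uint64_t rdtsc, uint64_t tsc_timestamp,
                                  uint64_t system_time)
{
    /*
     * Unsigned subtraction is modulo 2^64, so a "negative" tsc_timestamp
     * stored as a wrapped-around value still yields the right delta.
     */
    return (rdtsc - tsc_timestamp) + system_time;
}

/* Behaviour before the patch: clamp the timestamp at zero. */
static uint64_t stamp_clamped(uint64_t time, uint64_t vtsc_offset)
{
    return time < vtsc_offset ? 0 : time - vtsc_offset;
}

/* Behaviour the patch aims for: let the value wrap around. */
static uint64_t stamp_wrapped(uint64_t time, uint64_t vtsc_offset)
{
    return time - vtsc_offset;
}
```

With vtsc_offset = 1000, system_time = 400 and NOW() = 1500, rdtsc reads 500. The clamped timestamp makes guest_system_time(500, 0, 400) return 900, while the wrapped timestamp gives guest_system_time(500, stamp_wrapped(400, 1000), 400) == 1500, which matches NOW().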

Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Comments

Jan Beulich April 25, 2016, 12:05 p.m. UTC | #1
>>> On 25.04.16 at 13:18, <sstabellini@kernel.org> wrote:

Assuming you mean for this to go into 4.7, I've added Wei to Cc
(and you should do so in case of re-submission).

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1663,7 +1663,13 @@ custom_param("tsc", tsc_parse);
>  u64 gtime_to_gtsc(struct domain *d, u64 time)
>  {
>      if ( !is_hvm_domain(d) )
> +    {
>          time = max_t(s64, time - d->arch.vtsc_offset, 0);

This line should have been deleted. While I'd be happy to do this
while committing, its presence raises the question of whether
things actually work as expected.

Jan

> +        if ( time < d->arch.vtsc_offset )
> +            return -scale_delta(d->arch.vtsc_offset - time,
> +                                &d->arch.ns_to_vtsc);
> +        time -= d->arch.vtsc_offset;
> +    }
>      return scale_delta(time, &d->arch.ns_to_vtsc);
>  }
>
Wei Liu April 25, 2016, 1:28 p.m. UTC | #2
On Mon, Apr 25, 2016 at 06:05:50AM -0600, Jan Beulich wrote:
> >>> On 25.04.16 at 13:18, <sstabellini@kernel.org> wrote:
> 
> Assuming you mean for this to go into 4.7, I've added Wei to Cc
> (and you should do so in case of re-submission).
> 

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

(only skimmed the commit message)
Andrew Cooper April 28, 2016, 12:49 p.m. UTC | #3
On 25/04/16 12:18, Stefano Stabellini wrote:

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Patch

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..6a77a90 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1663,7 +1663,13 @@  custom_param("tsc", tsc_parse);
 u64 gtime_to_gtsc(struct domain *d, u64 time)
 {
     if ( !is_hvm_domain(d) )
+    {
         time = max_t(s64, time - d->arch.vtsc_offset, 0);
+        if ( time < d->arch.vtsc_offset )
+            return -scale_delta(d->arch.vtsc_offset - time,
+                                &d->arch.ns_to_vtsc);
+        time -= d->arch.vtsc_offset;
+    }
     return scale_delta(time, &d->arch.ns_to_vtsc);
 }
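
Following the review remark that the max_t line should have been deleted, the intended behaviour of the non-HVM path can be sketched as a standalone C function. This is an illustration under stated assumptions, not the committed Xen code: struct toy_scale and toy_scale_delta are hypothetical stand-ins for Xen's time scale structure and scale_delta(), and the domain fields are passed as plain parameters.

```c
#include <stdint.h>

/* Hypothetical stand-in for Xen's scale factor: a simple ratio num/den. */
struct toy_scale {
    uint64_t num;
    uint64_t den;
};

/* Hypothetical stand-in for scale_delta(); ignores overflow for brevity. */
static uint64_t toy_scale_delta(uint64_t delta, const struct toy_scale *s)
{
    return delta * s->num / s->den;
}

/*
 * Sketch of gtime_to_gtsc without the max_t clamp: for non-HVM domains a
 * time earlier than vtsc_offset produces a negated (wrapped-around)
 * unsigned result instead of 0.
 */
static uint64_t gtime_to_gtsc_sketch(int is_hvm, uint64_t vtsc_offset,
                                     const struct toy_scale *ns_to_vtsc,
                                     uint64_t time)
{
    if ( !is_hvm )
    {
        if ( time < vtsc_offset )
            /* Unary minus on an unsigned value wraps modulo 2^64. */
            return -toy_scale_delta(vtsc_offset - time, ns_to_vtsc);
        time -= vtsc_offset;
    }
    return toy_scale_delta(time, ns_to_vtsc);
}
```

The magnitude is scaled before negating so that the scaling helper only ever sees a non-negative delta; in this sketch, leaving the clamp in place first would make the time < vtsc_offset comparison operate on an already adjusted value.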