Message ID | alpine.DEB.2.10.1604251211370.24872@sstabellini-ThinkPad-X260 (mailing list archive) |
---|---|
State | New, archived |

>>> On 25.04.16 at 13:18, <sstabellini@kernel.org> wrote:
> From: Jan Beulich <JBeulich@suse.com>
>
> For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
> using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
> vcpu_time_info, is calculated from stime_local_stamp using
> gtime_to_gtsc.
>
> However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
> actually happen when gtime_to_gtsc is called passing stime_local_stamp
> (the caller function is __update_vcpu_system_time).
>
> In that case the pvclock protocol doesn't work properly and the guest is
> unable to calculate the system time correctly. As a consequence when the
> guest tries to set a timer event (for example calling the
> VCPUOP_set_singleshot_timer hypercall), the event will be in the past
> causing Linux to hang.
>
> The purpose of the pvclock protocol is to allow the guest to calculate
> the system_time in nanosec correctly. The guest calculates as follow:
>
>   from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) +
>   vcpu_time_info.system_time
>
> Given that with vtsc=1:
>   rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
>   vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time -
>   vtsc_offset)
>
> The expression evaluates to NOW(), which is what we want. However when
> stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
> actually 0. As a consequence the calculated overall system_time is not
> correct.
>
> This patch fixes the issue by letting gtime_to_gtsc return a negative
> integer in the form of a wrapped around unsigned integer, thus when the
> guest subtracts vcpu_time_info.tsc_timestamp from rdtsc will calculate
> the right value.
>
> Signed-off-by: Jan Beulich <JBeulich@suse.com>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Assuming you mean for this to go into 4.7, I've added Wei to Cc
(and you should do so in case of re-submission).

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1663,7 +1663,13 @@ custom_param("tsc", tsc_parse);
>  u64 gtime_to_gtsc(struct domain *d, u64 time)
>  {
>      if ( !is_hvm_domain(d) )
> +    {
>          time = max_t(s64, time - d->arch.vtsc_offset, 0);

This line should have been deleted. While I'd be happy to do this
while committing, its presence raises the question of whether
things actually work as expected.

Jan

> +        if ( time < d->arch.vtsc_offset )
> +            return -scale_delta(d->arch.vtsc_offset - time,
> +                                &d->arch.ns_to_vtsc);
> +        time -= d->arch.vtsc_offset;
> +    }
>      return scale_delta(time, &d->arch.ns_to_vtsc);
>  }
>
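
To make the concern about the leftover max_t() line concrete, here is a minimal standalone sketch of the PV path of the hunk as quoted, next to the same path with that line deleted. It is only a reading of the quoted diff, not the committed code. The names scale_identity, gtsc_as_posted and gtsc_without_clamp are hypothetical, and scale_delta() is replaced by an assumed factor of one tick per nanosecond so the numbers are easy to follow.

```c
#include <stdint.h>

/* Hypothetical stand-in for scale_delta(): one tick per nanosecond. */
static inline uint64_t scale_identity(uint64_t ns)
{
    return ns;
}

/* PV path of the hunk as quoted, with the old clamp line still present. */
static inline uint64_t gtsc_as_posted(uint64_t time, uint64_t vtsc_offset)
{
    /* Equivalent of: time = max_t(s64, time - vtsc_offset, 0); */
    int64_t clamped = (int64_t)(time - vtsc_offset);
    time = clamped > 0 ? (uint64_t)clamped : 0;

    /* The new code then compares and subtracts the offset a second time. */
    if (time < vtsc_offset)
        return -scale_identity(vtsc_offset - time);
    time -= vtsc_offset;
    return scale_identity(time);
}

/* Same path with the clamp line deleted, as the review asks for. */
static inline uint64_t gtsc_without_clamp(uint64_t time, uint64_t vtsc_offset)
{
    if (time < vtsc_offset)
        return -scale_identity(vtsc_offset - time);
    return scale_identity(time - vtsc_offset);
}
```

Under these assumptions, with time = 1200 and vtsc_offset = 1000, the variant without the clamp returns 200, while the as-posted variant first reduces time to 200, then treats it as being below the offset and returns the wrapped value of -800. For time = 3000 the as-posted variant returns 1000 rather than 2000, because the offset is subtracted twice.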

On Mon, Apr 25, 2016 at 06:05:50AM -0600, Jan Beulich wrote:
> >>> On 25.04.16 at 13:18, <sstabellini@kernel.org> wrote:
> > From: Jan Beulich <JBeulich@suse.com>
> >
> > For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
> > using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
> > vcpu_time_info, is calculated from stime_local_stamp using
> > gtime_to_gtsc.
> >
> > However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
> > actually happen when gtime_to_gtsc is called passing stime_local_stamp
> > (the caller function is __update_vcpu_system_time).
> >
> > In that case the pvclock protocol doesn't work properly and the guest is
> > unable to calculate the system time correctly. As a consequence when the
> > guest tries to set a timer event (for example calling the
> > VCPUOP_set_singleshot_timer hypercall), the event will be in the past
> > causing Linux to hang.
> >
> > The purpose of the pvclock protocol is to allow the guest to calculate
> > the system_time in nanosec correctly. The guest calculates as follow:
> >
> >   from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) +
> >   vcpu_time_info.system_time
> >
> > Given that with vtsc=1:
> >   rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
> >   vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time -
> >   vtsc_offset)
> >
> > The expression evaluates to NOW(), which is what we want. However when
> > stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
> > actually 0. As a consequence the calculated overall system_time is not
> > correct.
> >
> > This patch fixes the issue by letting gtime_to_gtsc return a negative
> > integer in the form of a wrapped around unsigned integer, thus when the
> > guest subtracts vcpu_time_info.tsc_timestamp from rdtsc will calculate
> > the right value.
> >
> > Signed-off-by: Jan Beulich <JBeulich@suse.com>
> > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
>
> Assuming you mean for this to go into 4.7, I've added Wei to Cc
> (and you should do so in case of re-submission).
>

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

(only skimmed the commit message)

On 25/04/16 12:18, Stefano Stabellini wrote:
> From: Jan Beulich <JBeulich@suse.com>
>
> For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
> using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
> vcpu_time_info, is calculated from stime_local_stamp using
> gtime_to_gtsc.
>
> However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
> actually happen when gtime_to_gtsc is called passing stime_local_stamp
> (the caller function is __update_vcpu_system_time).
>
> In that case the pvclock protocol doesn't work properly and the guest is
> unable to calculate the system time correctly. As a consequence when the
> guest tries to set a timer event (for example calling the
> VCPUOP_set_singleshot_timer hypercall), the event will be in the past
> causing Linux to hang.
>
> The purpose of the pvclock protocol is to allow the guest to calculate
> the system_time in nanosec correctly. The guest calculates as follow:
>
>   from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time
>
> Given that with vtsc=1:
>   rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
>   vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)
>
> The expression evaluates to NOW(), which is what we want. However when
> stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
> actually 0. As a consequence the calculated overall system_time is not
> correct.
>
> This patch fixes the issue by letting gtime_to_gtsc return a negative
> integer in the form of a wrapped around unsigned integer, thus when the
> guest subtracts vcpu_time_info.tsc_timestamp from rdtsc will calculate
> the right value.
>
> Signed-off-by: Jan Beulich <JBeulich@suse.com>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..6a77a90 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1663,7 +1663,13 @@ custom_param("tsc", tsc_parse);
 u64 gtime_to_gtsc(struct domain *d, u64 time)
 {
     if ( !is_hvm_domain(d) )
+    {
         time = max_t(s64, time - d->arch.vtsc_offset, 0);
+        if ( time < d->arch.vtsc_offset )
+            return -scale_delta(d->arch.vtsc_offset - time,
+                                &d->arch.ns_to_vtsc);
+        time -= d->arch.vtsc_offset;
+    }
     return scale_delta(time, &d->arch.ns_to_vtsc);
 }
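
The wrap-around argument in the commit message can be checked with a small standalone sketch of the guest-side pvclock calculation. Everything here is illustrative: to_vtsc, from_vtsc, gtsc_clamped, gtsc_wrapped and guest_system_time are hypothetical names, and a fixed scale of 2 ticks per nanosecond is assumed in place of scale_delta() and ns_to_vtsc.

```c
#include <stdint.h>

/* Assumed fixed scale: 2 vTSC ticks per nanosecond (illustration only). */
static inline uint64_t to_vtsc(uint64_t ns)      { return ns * 2; }
static inline uint64_t from_vtsc(uint64_t ticks) { return ticks / 2; }

/* Old behaviour: a time earlier than vtsc_offset is clamped to tick 0. */
static inline uint64_t gtsc_clamped(uint64_t time, uint64_t vtsc_offset)
{
    int64_t delta = (int64_t)(time - vtsc_offset);
    return to_vtsc(delta > 0 ? (uint64_t)delta : 0);
}

/* Intended behaviour: return the negative tick count as a wrapped u64. */
static inline uint64_t gtsc_wrapped(uint64_t time, uint64_t vtsc_offset)
{
    if (time < vtsc_offset)
        return -to_vtsc(vtsc_offset - time);  /* unsigned negation wraps */
    return to_vtsc(time - vtsc_offset);
}

/* Guest side: from_vtsc_scale(rdtsc - tsc_timestamp) + system_time. */
static inline uint64_t guest_system_time(uint64_t rdtsc,
                                         uint64_t tsc_timestamp,
                                         uint64_t system_time)
{
    /* The subtraction is modulo 2^64, so a wrapped timestamp cancels out. */
    return from_vtsc(rdtsc - tsc_timestamp) + system_time;
}
```

Take vtsc_offset = 1000, a stamp of 900 (earlier than the offset) and NOW() = 1200, so rdtsc is 400 ticks. With the clamped timestamp of 0 the guest computes 400 / 2 + 900 = 1100, which is 100 ns behind. With the wrapped timestamp of -200 the difference is 600 ticks, and the guest computes 600 / 2 + 900 = 1200, matching NOW().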