diff mbox series

[v2] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

Message ID 20240206151950.31174-1-vkuznets@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK | expand

Commit Message

Vitaly Kuznetsov Feb. 6, 2024, 3:19 p.m. UTC
xen_shinfo_test is observed to be flaky failing sporadically with
"VM time too old". With min_ts/max_ts debug print added:

Wall clock (v 3269818) 1704906491.986255664
Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
min_ts: 1704906491.986312153
max_ts: 1704906506.001006963
==== Test Assertion Failure ====
  x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
  pid=32724 tid=32724 errno=4 - Interrupted system call
     1	0x00000000004030ad: main at xen_shinfo_test.c:1003
     2	0x00007fca6b23feaf: ?? ??:0
     3	0x00007fca6b23ff5f: ?? ??:0
     4	0x0000000000405e04: _start at ??:?
  VM time too old

The test compares wall clock data from shinfo (which is the output of
kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the
host system before the VM is created. In the example above, it compares

 shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153

and fails as the later is greater than the former.  While this sounds like
a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch()
calculates guest's epoch (realtime when the guest was created) by
subtracting kvmclock from the current realtime and the calculation happens
when shinfo is setup. The problem is that kvmclock is a raw clock and
realtime clock is affected by NTP. This means that if realtime ticks with a
slightly reduced frequency, "guest's epoch" calculated by
kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big
issue from guest's perspective as the guest can't really observe this but
this epoch can't be compared with a fixed clock_gettime() on the host.

Replace the check with comparing wall clock data from shinfo to
KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
can be calculated by subtraction. Note, CLOCK_REALTIME is susceptible to
leap seconds jumps but there's no better alternative in KVM at this
moment. Leave a comment and accept 1s delta.

Reported-by: Jan Richter <jarichte@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1 [David Woodhouse]:
- Check for KVM_CLOCK_REALTIME and skip testing shinfo epoch sanity in case
it's missing (e.g. when a non-TSC clocksource is used)
- Add a comment about a potential jump of CLOCK_REALTIME because of a leap
second injection and accept 1s delta. While this is very unlikely anyone
will even hit this in testing, this mostly serves a documentation purpose.
---
 .../selftests/kvm/x86_64/xen_shinfo_test.c    | 50 +++++++++++--------
 1 file changed, 28 insertions(+), 22 deletions(-)

Comments

Sean Christopherson April 29, 2024, 8:45 p.m. UTC | #1
On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote:
> xen_shinfo_test is observed to be flaky failing sporadically with
> "VM time too old". With min_ts/max_ts debug print added:
> 
> Wall clock (v 3269818) 1704906491.986255664
> Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> min_ts: 1704906491.986312153
> max_ts: 1704906506.001006963
> ==== Test Assertion Failure ====
>   x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
>   pid=32724 tid=32724 errno=4 - Interrupted system call
>      1	0x00000000004030ad: main at xen_shinfo_test.c:1003
>      2	0x00007fca6b23feaf: ?? ??:0
>      3	0x00007fca6b23ff5f: ?? ??:0
>      4	0x0000000000405e04: _start at ??:?
>   VM time too old
> 
> [...]

Applied to kvm-x86 selftests, thanks!

[1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
      https://github.com/kvm-x86/linux/commit/201142d16010

--
https://github.com/kvm-x86/linux/tree/next
David Woodhouse April 29, 2024, 11:32 p.m. UTC | #2
On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote:
> On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote:
> > xen_shinfo_test is observed to be flaky failing sporadically with
> > "VM time too old". With min_ts/max_ts debug print added:
> > 
> > Wall clock (v 3269818) 1704906491.986255664
> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > min_ts: 1704906491.986312153
> > max_ts: 1704906506.001006963
> > ==== Test Assertion Failure ====
> >    x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> >    pid=32724 tid=32724 errno=4 - Interrupted system call
> >       1        0x00000000004030ad: main at xen_shinfo_test.c:1003
> >       2        0x00007fca6b23feaf: ?? ??:0
> >       3        0x00007fca6b23ff5f: ?? ??:0
> >       4        0x0000000000405e04: _start at ??:?
> >    VM time too old
> > 
> > [...]
> 
> Applied to kvm-x86 selftests, thanks!
> 
> [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
>       https://github.com/kvm-x86/linux/commit/201142d16010

Of course, this just highlights the fact that the very *definition* of
the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK
is entirely broken now. 

When the KVM clock was based on CLOCK_MONOTONIC, the delta between that
and wallclock time was constant (well, apart from leap seconds but KVM
has *always* been utterly hosed for that, so that's just par for the
course). So that made sense.

But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to
express wallclock time in terms of the KVM clock became silly. They run
at different rates, so the value returned by kvm_get_wall_clock_epoch()
will be constantly changing.

As I work through cleaning up the KVM clock mess, it occurred to me
that we should maybe *refresh* the wallclock time we report to the
guest. But I think it's just been hosed for so long that no guest could
ever trust it for anything but knowing roughly what year it is when
first booting, and it isn't worth fixing.

What we *should* do is expose something new which exposes the NTP-
calibrated relationship between the arch counter (or TSC) and the real
time, being explicit about TAI and about live migration (a guest needs
to know when it's been migrated and should throw away any NTP
refinement that it's done for *itself*).

I know we have the PTP paired reading thing, but that's *still* not
TAI, it makes guests do the work for themselves and doesn't give a
clean signal when live migration disrupts them.
Sean Christopherson April 30, 2024, 4:02 p.m. UTC | #3
On Mon, Apr 29, 2024, David Woodhouse wrote:
> On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote:
> > On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote:
> > > xen_shinfo_test is observed to be flaky failing sporadically with
> > > "VM time too old". With min_ts/max_ts debug print added:
> > > 
> > > Wall clock (v 3269818) 1704906491.986255664
> > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > > min_ts: 1704906491.986312153
> > > max_ts: 1704906506.001006963
> > > ==== Test Assertion Failure ====
> > >    x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> > >    pid=32724 tid=32724 errno=4 - Interrupted system call
> > >       1        0x00000000004030ad: main at xen_shinfo_test.c:1003
> > >       2        0x00007fca6b23feaf: ?? ??:0
> > >       3        0x00007fca6b23ff5f: ?? ??:0
> > >       4        0x0000000000405e04: _start at ??:?
> > >    VM time too old
> > > 
> > > [...]
> > 
> > Applied to kvm-x86 selftests, thanks!
> > 
> > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
> >       https://github.com/kvm-x86/linux/commit/201142d16010
> 
> Of course, this just highlights the fact that the very *definition* of
> the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK
> is entirely broken now. 
> 
> When the KVM clock was based on CLOCK_MONOTONIC, the delta between that
> and wallclock time was constant (well, apart from leap seconds but KVM
> has *always* been utterly hosed for that, so that's just par for the
> course). So that made sense.
> 
> But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to
> express wallclock time in terms of the KVM clock became silly. They run
> at different rates, so the value returned by kvm_get_wall_clock_epoch()
> will be constantly changing.
> 
> As I work through cleaning up the KVM clock mess, it occurred to me
> that we should maybe *refresh* the wallclock time we report to the
> guest. But I think it's just been hosed for so long that no guest could
> ever trust it for anything but knowing roughly what year it is when
> first booting, and it isn't worth fixing.
> 
> What we *should* do is expose something new which exposes the NTP-
> calibrated relationship between the arch counter (or TSC) and the real
> time, being explicit about TAI and about live migration (a guest needs
> to know when it's been migrated and should throw away any NTP
> refinement that it's done for *itself*).
> 
> I know we have the PTP paired reading thing, but that's *still* not
> TAI, it makes guests do the work for themselves and doesn't give a
> clean signal when live migration disrupts them.

Is the above an objection to the selftest change, or just an aside?  Honest
question, I have no preference either way; I just don't have my head wrapped
around all of the clock stuff enough to have an informed opinion on what the
right/best way forward is.
David Woodhouse April 30, 2024, 5:39 p.m. UTC | #4
On Tue, 2024-04-30 at 09:02 -0700, Sean Christopherson wrote:
> On Mon, Apr 29, 2024, David Woodhouse wrote:
> > On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote:
> > > On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote:
> > > > xen_shinfo_test is observed to be flaky failing sporadically with
> > > > "VM time too old". With min_ts/max_ts debug print added:
> > > > 
> > > > Wall clock (v 3269818) 1704906491.986255664
> > > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > > > min_ts: 1704906491.986312153
> > > > max_ts: 1704906506.001006963
> > > > ==== Test Assertion Failure ====
> > > >    x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> > > >    pid=32724 tid=32724 errno=4 - Interrupted system call
> > > >       1        0x00000000004030ad: main at xen_shinfo_test.c:1003
> > > >       2        0x00007fca6b23feaf: ?? ??:0
> > > >       3        0x00007fca6b23ff5f: ?? ??:0
> > > >       4        0x0000000000405e04: _start at ??:?
> > > >    VM time too old
> > > > 
> > > > [...]
> > > 
> > > Applied to kvm-x86 selftests, thanks!
> > > 
> > > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
> > >       https://github.com/kvm-x86/linux/commit/201142d16010
> > 
> > Of course, this just highlights the fact that the very *definition* of
> > the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK
> > is entirely broken now. 
> > 
> > When the KVM clock was based on CLOCK_MONOTONIC, the delta between that
> > and wallclock time was constant (well, apart from leap seconds but KVM
> > has *always* been utterly hosed for that, so that's just par for the
> > course). So that made sense.
> > 
> > But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to
> > express wallclock time in terms of the KVM clock became silly. They run
> > at different rates, so the value returned by kvm_get_wall_clock_epoch()
> > will be constantly changing.
> > 
> > As I work through cleaning up the KVM clock mess, it occurred to me
> > that we should maybe *refresh* the wallclock time we report to the
> > guest. But I think it's just been hosed for so long that no guest could
> > ever trust it for anything but knowing roughly what year it is when
> > first booting, and it isn't worth fixing.
> > 
> > What we *should* do is expose something new which exposes the NTP-
> > calibrated relationship between the arch counter (or TSC) and the real
> > time, being explicit about TAI and about live migration (a guest needs
> > to know when it's been migrated and should throw away any NTP
> > refinement that it's done for *itself*).
> > 
> > I know we have the PTP paired reading thing, but that's *still* not
> > TAI, it makes guests do the work for themselves and doesn't give a
> > clean signal when live migration disrupts them.
> 
> Is the above an objection to the selftest change, or just an aside?  Honest
> question, I have no preference either way; I just don't have my head wrapped
> around all of the clock stuff enough to have an informed opinion on what the
> right/best way forward is.

Sorry for not being clearer. The selftest change is fine; this is an
aside (well, more of a corollary).

Put differently: The selftest commit is entirely correct. We can't rely
on the accuracy of the wallclock time that KVM advertises to the guest
because it's fundamentally hosed. All the selftest can do is check that
we put something *plausible* there. 

The corollary is that we do need a *proper* way of conveying the
wallclock time to guests now that the old one is useless. And the PTP
clock pairing ioctl isn't sufficient either.

My previous email launched straight into the latter without taking the
time to say "yes, that commit is entirely right but also serves to
highlight the fact that..."
diff mbox series

Patch

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 9ec9ab60b63e..e6e3553633b1 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -375,20 +375,6 @@  static void guest_code(void)
 	GUEST_SYNC(TEST_DONE);
 }
 
-static int cmp_timespec(struct timespec *a, struct timespec *b)
-{
-	if (a->tv_sec > b->tv_sec)
-		return 1;
-	else if (a->tv_sec < b->tv_sec)
-		return -1;
-	else if (a->tv_nsec > b->tv_nsec)
-		return 1;
-	else if (a->tv_nsec < b->tv_nsec)
-		return -1;
-	else
-		return 0;
-}
-
 static struct vcpu_info *vinfo;
 static struct kvm_vcpu *vcpu;
 
@@ -425,7 +411,6 @@  static void *juggle_shinfo_state(void *arg)
 
 int main(int argc, char *argv[])
 {
-	struct timespec min_ts, max_ts, vm_ts;
 	struct kvm_xen_hvm_attr evt_reset;
 	struct kvm_vm *vm;
 	pthread_t thread;
@@ -443,8 +428,6 @@  int main(int argc, char *argv[])
 	bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
 	bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);
 
-	clock_gettime(CLOCK_REALTIME, &min_ts);
-
 	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
 
 	/* Map a region for the shared_info page */
@@ -969,7 +952,6 @@  int main(int argc, char *argv[])
 	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
 
 	alarm(0);
-	clock_gettime(CLOCK_REALTIME, &max_ts);
 
 	/*
 	 * Just a *really* basic check that things are being put in the
@@ -978,6 +960,8 @@  int main(int argc, char *argv[])
 	 */
 	struct pvclock_wall_clock *wc;
 	struct pvclock_vcpu_time_info *ti, *ti2;
+	struct kvm_clock_data kcdata;
+	long long delta;
 
 	wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00);
 	ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20);
@@ -993,12 +977,34 @@  int main(int argc, char *argv[])
 		       ti2->tsc_shift, ti2->flags);
 	}
 
-	vm_ts.tv_sec = wc->sec;
-	vm_ts.tv_nsec = wc->nsec;
 	TEST_ASSERT(wc->version && !(wc->version & 1),
 		    "Bad wallclock version %x", wc->version);
-	TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old");
-	TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new");
+
+	vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
+
+	if (kcdata.flags & KVM_CLOCK_REALTIME) {
+		if (verbose) {
+			printf("KVM_GET_CLOCK clock: %lld.%09lld\n",
+			       kcdata.clock / NSEC_PER_SEC, kcdata.clock % NSEC_PER_SEC);
+			printf("KVM_GET_CLOCK realtime: %lld.%09lld\n",
+			       kcdata.realtime / NSEC_PER_SEC, kcdata.realtime % NSEC_PER_SEC);
+		}
+
+		delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
+
+		/*
+		 * KVM_GET_CLOCK gives CLOCK_REALTIME which jumps on leap seconds updates but
+		 * unfortunately KVM doesn't currently offer a CLOCK_TAI alternative. Accept 1s
+		 * delta as testing clock accuracy is not the goal here. The test just needs to
+		 * check that the value in shinfo is somewhat sane.
+		 */
+		TEST_ASSERT(llabs(delta) < NSEC_PER_SEC,
+			    "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
+			    wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
+			    (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
+	} else {
+		pr_info("Missing KVM_CLOCK_REALTIME, skipping shinfo epoch sanity check\n");
+	}
 
 	TEST_ASSERT(ti->version && !(ti->version & 1),
 		    "Bad time_info version %x", ti->version);