Message ID | 20240206151950.31174-1-vkuznets@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK | expand |
On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote: > xen_shinfo_test is observed to be flaky failing sporadically with > "VM time too old". With min_ts/max_ts debug print added: > > Wall clock (v 3269818) 1704906491.986255664 > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > min_ts: 1704906491.986312153 > max_ts: 1704906506.001006963 > ==== Test Assertion Failure ==== > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > pid=32724 tid=32724 errno=4 - Interrupted system call > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > 2 0x00007fca6b23feaf: ?? ??:0 > 3 0x00007fca6b23ff5f: ?? ??:0 > 4 0x0000000000405e04: _start at ??:? > VM time too old > > [...] Applied to kvm-x86 selftests, thanks! [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK https://github.com/kvm-x86/linux/commit/201142d16010 -- https://github.com/kvm-x86/linux/tree/next
On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote: > On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote: > > xen_shinfo_test is observed to be flaky failing sporadically with > > "VM time too old". With min_ts/max_ts debug print added: > > > > Wall clock (v 3269818) 1704906491.986255664 > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > min_ts: 1704906491.986312153 > > max_ts: 1704906506.001006963 > > ==== Test Assertion Failure ==== > > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > > pid=32724 tid=32724 errno=4 - Interrupted system call > > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > > 2 0x00007fca6b23feaf: ?? ??:0 > > 3 0x00007fca6b23ff5f: ?? ??:0 > > 4 0x0000000000405e04: _start at ??:? > > VM time too old > > > > [...] > > Applied to kvm-x86 selftests, thanks! > > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK > https://github.com/kvm-x86/linux/commit/201142d16010 Of course, this just highlights the fact that the very *definition* of the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK is entirely broken now. When the KVM clock was based on CLOCK_MONOTONIC, the delta between that and wallclock time was constant (well, apart from leap seconds but KVM has *always* been utterly hosed for that, so that's just par for the course). So that made sense. But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to express wallclock time in terms of the KVM clock became silly. They run at different rates, so the value returned by kvm_get_wall_clock_epoch() will be constantly changing. As I work through cleaning up the KVM clock mess, it occurred to me that we should maybe *refresh* the wallclock time we report to the guest. But I think it's just been hosed for so long that no guest could ever trust it for anything but knowing roughly what year it is when first booting, and it isn't worth fixing. What we *should* do is expose something new which exposes the NTP- calibrated relationship between the arch counter (or TSC) and the real time, being explicit about TAI and about live migration (a guest needs to know when it's been migrated and should throw away any NTP refinement that it's done for *itself*). I know we have the PTP paired reading thing, but that's *still* not TAI, it makes guests do the work for themselves and doesn't give a clean signal when live migration disrupts them.
On Mon, Apr 29, 2024, David Woodhouse wrote: > On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote: > > On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote: > > > xen_shinfo_test is observed to be flaky failing sporadically with > > > "VM time too old". With min_ts/max_ts debug print added: > > > > > > Wall clock (v 3269818) 1704906491.986255664 > > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > > min_ts: 1704906491.986312153 > > > max_ts: 1704906506.001006963 > > > ==== Test Assertion Failure ==== > > > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > > > pid=32724 tid=32724 errno=4 - Interrupted system call > > > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > > > 2 0x00007fca6b23feaf: ?? ??:0 > > > 3 0x00007fca6b23ff5f: ?? ??:0 > > > 4 0x0000000000405e04: _start at ??:? > > > VM time too old > > > > > > [...] > > > > Applied to kvm-x86 selftests, thanks! > > > > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK > > https://github.com/kvm-x86/linux/commit/201142d16010 > > Of course, this just highlights the fact that the very *definition* of > the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK > is entirely broken now. > > When the KVM clock was based on CLOCK_MONOTONIC, the delta between that > and wallclock time was constant (well, apart from leap seconds but KVM > has *always* been utterly hosed for that, so that's just par for the > course). So that made sense. > > But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to > express wallclock time in terms of the KVM clock became silly. They run > at different rates, so the value returned by kvm_get_wall_clock_epoch() > will be constantly changing. > > As I work through cleaning up the KVM clock mess, it occurred to me > that we should maybe *refresh* the wallclock time we report to the > guest. But I think it's just been hosed for so long that no guest could > ever trust it for anything but knowing roughly what year it is when > first booting, and it isn't worth fixing. > > What we *should* do is expose something new which exposes the NTP- > calibrated relationship between the arch counter (or TSC) and the real > time, being explicit about TAI and about live migration (a guest needs > to know when it's been migrated and should throw away any NTP > refinement that it's done for *itself*). > > I know we have the PTP paired reading thing, but that's *still* not > TAI, it makes guests do the work for themselves and doesn't give a > clean signal when live migration disrupts them. Is the above an objection to the selftest change, or just an aside? Honest question, I have no preference either way; I just don't have my head wrapped around all of the clock stuff enough to have an informed opinion on what the right/best way forward is.
On Tue, 2024-04-30 at 09:02 -0700, Sean Christopherson wrote: > On Mon, Apr 29, 2024, David Woodhouse wrote: > > On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote: > > > On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote: > > > > xen_shinfo_test is observed to be flaky failing sporadically with > > > > "VM time too old". With min_ts/max_ts debug print added: > > > > > > > > Wall clock (v 3269818) 1704906491.986255664 > > > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > > > min_ts: 1704906491.986312153 > > > > max_ts: 1704906506.001006963 > > > > ==== Test Assertion Failure ==== > > > > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > > > > pid=32724 tid=32724 errno=4 - Interrupted system call > > > > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > > > > 2 0x00007fca6b23feaf: ?? ??:0 > > > > 3 0x00007fca6b23ff5f: ?? ??:0 > > > > 4 0x0000000000405e04: _start at ??:? > > > > VM time too old > > > > > > > > [...] > > > > > > Applied to kvm-x86 selftests, thanks! > > > > > > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK > > > https://github.com/kvm-x86/linux/commit/201142d16010 > > > > Of course, this just highlights the fact that the very *definition* of > > the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK > > is entirely broken now. > > > > When the KVM clock was based on CLOCK_MONOTONIC, the delta between that > > and wallclock time was constant (well, apart from leap seconds but KVM > > has *always* been utterly hosed for that, so that's just par for the > > course). So that made sense. > > > > But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to > > express wallclock time in terms of the KVM clock became silly. They run > > at different rates, so the value returned by kvm_get_wall_clock_epoch() > > will be constantly changing. > > > > As I work through cleaning up the KVM clock mess, it occurred to me > > that we should maybe *refresh* the wallclock time we report to the > > guest. But I think it's just been hosed for so long that no guest could > > ever trust it for anything but knowing roughly what year it is when > > first booting, and it isn't worth fixing. > > > > What we *should* do is expose something new which exposes the NTP- > > calibrated relationship between the arch counter (or TSC) and the real > > time, being explicit about TAI and about live migration (a guest needs > > to know when it's been migrated and should throw away any NTP > > refinement that it's done for *itself*). > > > > I know we have the PTP paired reading thing, but that's *still* not > > TAI, it makes guests do the work for themselves and doesn't give a > > clean signal when live migration disrupts them. > > Is the above an objection to the selftest change, or just an aside? Honest > question, I have no preference either way; I just don't have my head wrapped > around all of the clock stuff enough to have an informed opinion on what the > right/best way forward is. Sorry for not being clearer. The selftest change is fine; this is an aside (well, more of a corollary). Put differently: The selftest commit is entirely correct. We can't rely on the accuracy of the wallclock time that KVM advertises to the guest because it's fundamentally hosed. All the selftest can do is check that we put something *plausible* there. The corollary is that we do need a *proper* way of conveying the wallclock time to guests now that the old one is useless. And the PTP clock pairing ioctl isn't sufficient either. My previous email launched straight into the latter without taking the time to say "yes, that commit is entirely right but also serves to highlight the fact that..."
diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c index 9ec9ab60b63e..e6e3553633b1 100644 --- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c +++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c @@ -375,20 +375,6 @@ static void guest_code(void) GUEST_SYNC(TEST_DONE); } -static int cmp_timespec(struct timespec *a, struct timespec *b) -{ - if (a->tv_sec > b->tv_sec) - return 1; - else if (a->tv_sec < b->tv_sec) - return -1; - else if (a->tv_nsec > b->tv_nsec) - return 1; - else if (a->tv_nsec < b->tv_nsec) - return -1; - else - return 0; -} - static struct vcpu_info *vinfo; static struct kvm_vcpu *vcpu; @@ -425,7 +411,6 @@ static void *juggle_shinfo_state(void *arg) int main(int argc, char *argv[]) { - struct timespec min_ts, max_ts, vm_ts; struct kvm_xen_hvm_attr evt_reset; struct kvm_vm *vm; pthread_t thread; @@ -443,8 +428,6 @@ int main(int argc, char *argv[]) bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL); bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND); - clock_gettime(CLOCK_REALTIME, &min_ts); - vm = vm_create_with_one_vcpu(&vcpu, guest_code); /* Map a region for the shared_info page */ @@ -969,7 +952,6 @@ int main(int argc, char *argv[]) vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset); alarm(0); - clock_gettime(CLOCK_REALTIME, &max_ts); /* * Just a *really* basic check that things are being put in the @@ -978,6 +960,8 @@ int main(int argc, char *argv[]) */ struct pvclock_wall_clock *wc; struct pvclock_vcpu_time_info *ti, *ti2; + struct kvm_clock_data kcdata; + long long delta; wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00); ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20); @@ -993,12 +977,34 @@ int main(int argc, char *argv[]) ti2->tsc_shift, ti2->flags); } - vm_ts.tv_sec = wc->sec; - vm_ts.tv_nsec = wc->nsec; TEST_ASSERT(wc->version && !(wc->version & 1), "Bad wallclock version %x", wc->version); - TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old"); - TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new"); + + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); + + if (kcdata.flags & KVM_CLOCK_REALTIME) { + if (verbose) { + printf("KVM_GET_CLOCK clock: %lld.%09lld\n", + kcdata.clock / NSEC_PER_SEC, kcdata.clock % NSEC_PER_SEC); + printf("KVM_GET_CLOCK realtime: %lld.%09lld\n", + kcdata.realtime / NSEC_PER_SEC, kcdata.realtime % NSEC_PER_SEC); + } + + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); + + /* + * KVM_GET_CLOCK gives CLOCK_REALTIME which jumps on leap seconds updates but + * unfortunately KVM doesn't currently offer a CLOCK_TAI alternative. Accept 1s + * delta as testing clock accuracy is not the goal here. The test just needs to + * check that the value in shinfo is somewhat sane. + */ + TEST_ASSERT(llabs(delta) < NSEC_PER_SEC, + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); + } else { + pr_info("Missing KVM_CLOCK_REALTIME, skipping shinfo epoch sanity check\n"); + } TEST_ASSERT(ti->version && !(ti->version & 1), "Bad time_info version %x", ti->version);
xen_shinfo_test is observed to be flaky failing sporadically with "VM time too old". With min_ts/max_ts debug print added: Wall clock (v 3269818) 1704906491.986255664 Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 min_ts: 1704906491.986312153 max_ts: 1704906506.001006963 ==== Test Assertion Failure ==== x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 pid=32724 tid=32724 errno=4 - Interrupted system call 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 2 0x00007fca6b23feaf: ?? ??:0 3 0x00007fca6b23ff5f: ?? ??:0 4 0x0000000000405e04: _start at ??:? VM time too old The test compares wall clock data from shinfo (which is the output of kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the host system before the VM is created. In the example above, it compares shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153 and fails as the later is greater than the former. While this sounds like a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch() calculates guest's epoch (realtime when the guest was created) by subtracting kvmclock from the current realtime and the calculation happens when shinfo is setup. The problem is that kvmclock is a raw clock and realtime clock is affected by NTP. This means that if realtime ticks with a slightly reduced frequency, "guest's epoch" calculated by kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big issue from guest's perspective as the guest can't really observe this but this epoch can't be compared with a fixed clock_gettime() on the host. Replace the check with comparing wall clock data from shinfo to KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch can be calculated by subtraction. Note, CLOCK_REALTIME is susceptible to leap seconds jumps but there's no better alternative in KVM at this moment. Leave a comment and accept 1s delta. Reported-by: Jan Richter <jarichte@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- Changes since v1 [David Woodhouse]: - Check for KVM_CLOCK_REALTIME and skip testing shinfo epoch sanity in case it's missing (e.g. when a non-TSC clocksource is used) - Add a comment about a potential jump of CLOCK_REALTIME because of a leap second injection and accept 1s delta. While this is very unlikely anyone will even hit this in testing, this mostly serves a documentation purpose. --- .../selftests/kvm/x86_64/xen_shinfo_test.c | 50 +++++++++++-------- 1 file changed, 28 insertions(+), 22 deletions(-)