
[v2] KVM: x86/xen: improve accuracy of Xen timers

Message ID 74f32bfae7243a78d0e74b1ba3a2d1ea4a4a7518.camel@infradead.org (mailing list archive)
State New, archived
Series [v2] KVM: x86/xen: improve accuracy of Xen timers

Commit Message

David Woodhouse Oct. 31, 2023, 11:13 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

A test program such as http://david.woodhou.se/timerlat.c confirms user
reports that timers are increasingly inaccurate as the lifetime of a
guest increases. Reporting the actual delay observed when asking for
100µs of sleep, it starts off OK on a newly-launched guest but gets
worse over time, giving incorrect sleep times:

root@ip-10-0-193-21:~# ./timerlat -c -n 5
00000000 latency 103243/100000 (3.2430%)
00000001 latency 103243/100000 (3.2430%)
00000002 latency 103242/100000 (3.2420%)
00000003 latency 103245/100000 (3.2450%)
00000004 latency 103245/100000 (3.2450%)

The biggest problem is that get_kvmclock_ns() returns inaccurate values
when the guest TSC is scaled. The guest sees a TSC value scaled from the
host TSC by a mul/shift conversion (hopefully done in hardware). The
guest then converts that guest TSC value into nanoseconds using the
mul/shift conversion given to it by the KVM pvclock information.

But get_kvmclock_ns() performs only a single conversion directly from
host TSC to nanoseconds, giving a different result. A test program at
http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
over a day.

It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
that. The actual guest hv_clock is per-CPU, and *theoretically* each
vCPU could be running at a *different* frequency. But this patch is
needed anyway because...

The other issue with Xen timers was that the code would snapshot the
host CLOCK_MONOTONIC at some point in time, and then... after a few
interrupts may have occurred, some preemption perhaps... would also read
the guest's kvmclock. Then it would proceed under the false assumption
that those two happened at the *same* time. Any time which *actually*
elapsed between reading the two clocks was introduced as inaccuracies
in the time at which the timer fired.

Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
host TSC just *once*, then use the returned TSC value to calculate the
kvmclock (making sure to do that the way the guest would instead of
making the same mistake get_kvmclock_ns() does).

Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
timers still have to use CLOCK_MONOTONIC. In practice the difference
between the two won't matter over the timescales involved, as the
*absolute* values don't matter; just the delta.

This does mean a new variant of kvm_get_time_and_clockread() is needed;
called kvm_get_monotonic_and_clockread() because that's what it does.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

---
v2: 
  • Fall back to get_kvmclock_ns() if vcpu->arch.hv_clock isn't set up
    yet, with a big comment explaining why that's actually OK.
  • Fix do_monotonic() *not* to add the boot time offset.
  • Rename do_monotonic_raw() → do_kvmclock_base() and add a comment
    to make it clear that it *does* add the boot time offset. That
    was just left as a bear trap for the unwary developer, wasn't it?

 arch/x86/kvm/x86.c |  61 +++++++++++++++++++++--
 arch/x86/kvm/x86.h |   1 +
 arch/x86/kvm/xen.c | 121 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 149 insertions(+), 34 deletions(-)

Comments

Dongli Zhang Nov. 7, 2023, 1:44 a.m. UTC | #1
Hi David,

On 10/31/23 16:13, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> A test program such as http://david.woodhou.se/timerlat.c confirms user
> reports that timers are increasingly inaccurate as the lifetime of a
> guest increases. Reporting the actual delay observed when asking for
> 100µs of sleep, it starts off OK on a newly-launched guest but gets
> worse over time, giving incorrect sleep times:
> 
> root@ip-10-0-193-21:~# ./timerlat -c -n 5
> 00000000 latency 103243/100000 (3.2430%)
> 00000001 latency 103243/100000 (3.2430%)
> 00000002 latency 103242/100000 (3.2420%)
> 00000003 latency 103245/100000 (3.2450%)
> 00000004 latency 103245/100000 (3.2450%)
> 
> The biggest problem is that get_kvmclock_ns() returns inaccurate values
> when the guest TSC is scaled. The guest sees a TSC value scaled from the
> host TSC by a mul/shift conversion (hopefully done in hardware). The
> guest then converts that guest TSC value into nanoseconds using the
> mul/shift conversion given to it by the KVM pvclock information.
> 
> But get_kvmclock_ns() performs only a single conversion directly from
> host TSC to nanoseconds, giving a different result. A test program at
> http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
> over a day.
> 
> It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
> that. The actual guest hv_clock is per-CPU, and *theoretically* each
> vCPU could be running at a *different* frequency. But this patch is
> needed anyway because...
> 
> The other issue with Xen timers was that the code would snapshot the
> host CLOCK_MONOTONIC at some point in time, and then... after a few
> interrupts may have occurred, some preemption perhaps... would also read
> the guest's kvmclock. Then it would proceed under the false assumption
> that those two happened at the *same* time. Any time which *actually*
> elapsed between reading the two clocks was introduced as inaccuracies
> in the time at which the timer fired.
> 
> Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
> host TSC just *once*, then use the returned TSC value to calculate the
> kvmclock (making sure to do that the way the guest would instead of
> making the same mistake get_kvmclock_ns() does).
> 
> Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
> timers still have to use CLOCK_MONOTONIC. In practice the difference
> between the two won't matter over the timescales involved, as the
> *absolute* values don't matter; just the delta.
> 
> This does mean a new variant of kvm_get_time_and_clockread() is needed;
> called kvm_get_monotonic_and_clockread() because that's what it does.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Paul Durrant <paul@xen.org>
> 
> ---
> v2: 
>   • Fall back to get_kvmclock_ns() if vcpu->arch.hv_clock isn't set up
>     yet, with a big comment explaining why that's actually OK.
>   • Fix do_monotonic() *not* to add the boot time offset.
>   • Rename do_monotonic_raw() → do_kvmclock_base() and add a comment
>     to make it clear that it *does* add the boot time offset. That
>     was just left as a bear trap for the unwary developer, wasn't it?
> 
>  arch/x86/kvm/x86.c |  61 +++++++++++++++++++++--
>  arch/x86/kvm/x86.h |   1 +
>  arch/x86/kvm/xen.c | 121 ++++++++++++++++++++++++++++++++++-----------
>  3 files changed, 149 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6eaab714d90a..e479637af42c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2844,7 +2844,11 @@ static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
>  	return v * clock->mult;
>  }
>  
> -static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
> +/*
> + * As with get_kvmclock_base_ns(), this counts from boot time, at the
> + * frequency of CLOCK_MONOTONIC_RAW (hence adding gtos->offs_boot).
> + */
> +static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
>  {
>  	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
>  	unsigned long seq;
> @@ -2863,6 +2867,29 @@ static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
>  	return mode;
>  }
>  
> +/*
> + * This calculates CLOCK_MONOTONIC at the time of the TSC snapshot, with
> + * no boot time offset.
> + */
> +static int do_monotonic(s64 *t, u64 *tsc_timestamp)
> +{
> +	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> +	unsigned long seq;
> +	int mode;
> +	u64 ns;
> +
> +	do {
> +		seq = read_seqcount_begin(&gtod->seq);
> +		ns = gtod->clock.base_cycles;
> +		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
> +		ns >>= gtod->clock.shift;
> +		ns += ktime_to_ns(gtod->clock.offset);
> +	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
> +	*t = ns;
> +
> +	return mode;
> +}
> +
>  static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
>  {
>  	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> @@ -2884,18 +2911,42 @@ static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
>  	return mode;
>  }
>  
> -/* returns true if host is using TSC based clocksource */
> +/*
> + * Calculates the kvmclock_base_ns (CLOCK_MONOTONIC_RAW + boot time) and
> + * reports the TSC value from which it did so. Returns true if host is
> + * using TSC based clocksource.
> + */
>  static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
>  {
>  	/* checked again under seqlock below */
>  	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
>  		return false;
>  
> -	return gtod_is_based_on_tsc(do_monotonic_raw(kernel_ns,
> -						      tsc_timestamp));
> +	return gtod_is_based_on_tsc(do_kvmclock_base(kernel_ns,
> +						     tsc_timestamp));
>  }
>  
> -/* returns true if host is using TSC based clocksource */
> +/*
> + * Calculates CLOCK_MONOTONIC and reports the TSC value from which it did
> + * so. Returns true if host is using TSC based clocksource.
> + */
> +bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
> +{
> +	/* checked again under seqlock below */
> +	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
> +		return false;
> +
> +	return gtod_is_based_on_tsc(do_monotonic(kernel_ns,
> +						 tsc_timestamp));
> +}
> +
> +/*
> + * Calculates CLOCK_REALTIME and reports the TSC value from which it did
> + * so. Returns true if host is using TSC based clocksource.
> + *
> + * DO NOT USE this for anything related to migration. You want CLOCK_TAI
> + * for that.
> + */
>  static bool kvm_get_walltime_and_clockread(struct timespec64 *ts,
>  					   u64 *tsc_timestamp)
>  {
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 1e7be1f6ab29..c08c6f729965 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -293,6 +293,7 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
>  void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
>  
>  u64 get_kvmclock_ns(struct kvm *kvm);
> +bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
>  
>  int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
>  	gva_t addr, void *val, unsigned int bytes,
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 751d9a984668..e3d2d63eef34 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -24,6 +24,7 @@
>  #include <xen/interface/sched.h>
>  
>  #include <asm/xen/cpuid.h>
> +#include <asm/pvclock.h>
>  
>  #include "cpuid.h"
>  #include "trace.h"
> @@ -158,8 +159,93 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
>  	return HRTIMER_NORESTART;
>  }
>  
> -static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns)
> +static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
> +				bool linux_wa)
>  {
> +	uint64_t guest_now;
> +	int64_t kernel_now, delta;
> +
> +	/*
> +	 * The guest provides the requested timeout in absolute nanoseconds
> +	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
> +	 * the pvclock information provided by KVM.
> +	 *
> +	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
> +	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
> +	 * difference won't matter much as there is no cumulative effect.
> +	 *
> +	 * Calculate the time for some arbitrary point in time around "now"
> +	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
> +	 * delta between the kvmclock "now" value and the guest's requested
> +	 * timeout, apply the "Linux workaround" described below, and add
> +	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
> +	 * the absolute CLOCK_MONOTONIC time at which the timer should
> +	 * fire.
> +	 */
> +	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
> +	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {

Is there any reason to use both vcpu->kvm->arch.use_master_clock and
X86_FEATURE_CONSTANT_TSC?

I think even __get_kvmclock() would not require both cases at the same time?

 3071         if (ka->use_master_clock &&
 3072             (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {

> +		uint64_t host_tsc, guest_tsc;
> +
> +		if (!IS_ENABLED(CONFIG_64BIT) ||
> +		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
> +			/*
> +			 * Don't fall back to get_kvmclock_ns() because it's
> +			 * broken; it has a systemic error in its results
> +			 * because it scales directly from host TSC to
> +			 * nanoseconds, and doesn't scale first to guest TSC
> +			 * and *then* to nanoseconds as the guest does.
> +			 *
> +			 * There is a small error introduced here because time
> +			 * continues to elapse between the ktime_get() and the
> +			 * subsequent rdtsc(). But not the systemic drift due
> +			 * to get_kvmclock_ns().
> +			 */
> +			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
> +			host_tsc = rdtsc();
> +		}
> +
> +		/* Calculate the guest kvmclock as the guest would do it. */
> +		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> +		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
> +						  guest_tsc);
> +	} else {
> +		/*
> +		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
> +		 *
> +		 * Also if the guest PV clock hasn't been set up yet, as is
> +		 * likely to be the case during migration when the vCPU has
> +		 * not been run yet. It would be possible to calculate the
> +		 * scaling factors properly in that case but there's not much
> +		 * point in doing so. The get_kvmclock_ns() drift accumulates
> +		 * over time, so it's OK to use it at startup. Besides, on
> +		 * migration there's going to be a little bit of skew in the
> +		 * precise moment at which timers fire anyway. Often they'll
> +		 * be in the "past" by the time the VM is running again after
> +		 * migration.
> +		 */
> +		guest_now = get_kvmclock_ns(vcpu->kvm);
> +		kernel_now = ktime_get();

1. Can I assume the issue is still there if we fall into the "else" case? That
is, the increasing inaccuracy as the VM has been up for a longer and longer time?

If that is the case, which may be better?

(1) get_kvmclock_ns(), or
(2) always get_kvmclock_base_ns() + ka->kvmclock_offset, when pvclock is not
enabled, regardless of whether the master clock is used. At least, the inaccuracy
is not going to increase over guest time?


2. I see 3 scenarios here:

(1) vcpu->arch.hv_clock.version and master clock is used.

In this case, the bugfix looks good.

(2) The master clock is used. However, pv clock is not enabled.

In this case, the bug is not resolved? ... even though the master clock is used.

(3) The master clock is not used.

We fall into get_kvmclock_base_ns() + ka->kvmclock_offset. The behavior is not
changed. This looks good.


Just from my own point of view: as this patch involves relatively complex changes,
I would suggest resolving the issue rather than using a temporary solution :)

(I see you mentioned that you will be back with get_kvmclock_ns())


Based on your bug fix, I see the below cases:

If master clock is not used:
    get_kvmclock_base_ns() + ka->kvmclock_offset

If master clock is used:
    If pvclock is enabled:
        use the &vcpu->arch.hv_clock to get current guest time
    Else
        create a temporary hv_clock, based on masterclock.


Regarding the temporary solution, I would suggest creating a new API to
encapsulate and fulfill the above scenarios, if we are not going to touch
__get_kvmclock() at this point.


Thank you very much!

Dongli Zhang


> +	}
> +
> +	delta = guest_abs - guest_now;
> +
> +	/* Xen has a 'Linux workaround' in do_set_timer_op() which
> +	 * checks for negative absolute timeout values (caused by
> +	 * integer overflow), and for values about 13 days in the
> +	 * future (2^50ns) which would be caused by jiffies
> +	 * overflow. For those cases, it sets the timeout 100ms in
> +	 * the future (not *too* soon, since if a guest really did
> +	 * set a long timeout on purpose we don't want to keep
> +	 * churning CPU time by waking it up).
> +	 */
> +	if (linux_wa) {
> +		if ((unlikely((int64_t)guest_abs < 0 ||
> +			      (delta > 0 && (uint32_t) (delta >> 50) != 0)))) {
> +			delta = 100 * NSEC_PER_MSEC;
> +			guest_abs = guest_now + delta;
> +		}
> +	}
> +
>  	/*
>  	 * Avoid races with the old timer firing. Checking timer_expires
>  	 * to avoid calling hrtimer_cancel() will only have false positives
> @@ -171,12 +257,11 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_
>  	atomic_set(&vcpu->arch.xen.timer_pending, 0);
>  	vcpu->arch.xen.timer_expires = guest_abs;
>  
> -	if (delta_ns <= 0) {
> +	if (delta <= 0) {
>  		xen_timer_callback(&vcpu->arch.xen.timer);
>  	} else {
> -		ktime_t ktime_now = ktime_get();
>  		hrtimer_start(&vcpu->arch.xen.timer,
> -			      ktime_add_ns(ktime_now, delta_ns),
> +			      ktime_add_ns(kernel_now, delta),
>  			      HRTIMER_MODE_ABS_HARD);
>  	}
>  }
> @@ -945,8 +1030,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
>  		/* Start the timer if the new value has a valid vector+expiry. */
>  		if (data->u.timer.port && data->u.timer.expires_ns)
>  			kvm_xen_start_timer(vcpu, data->u.timer.expires_ns,
> -					    data->u.timer.expires_ns -
> -					    get_kvmclock_ns(vcpu->kvm));
> +					    false);
>  
>  		r = 0;
>  		break;
> @@ -1389,7 +1473,6 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
>  {
>  	struct vcpu_set_singleshot_timer oneshot;
>  	struct x86_exception e;
> -	s64 delta;
>  
>  	if (!kvm_xen_timer_enabled(vcpu))
>  		return false;
> @@ -1423,9 +1506,7 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
>  			return true;
>  		}
>  
> -		/* A delta <= 0 results in an immediate callback, which is what we want */
> -		delta = oneshot.timeout_abs_ns - get_kvmclock_ns(vcpu->kvm);
> -		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, delta);
> +		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, false);
>  		*r = 0;
>  		return true;
>  
> @@ -1449,25 +1530,7 @@ static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout,
>  		return false;
>  
>  	if (timeout) {
> -		uint64_t guest_now = get_kvmclock_ns(vcpu->kvm);
> -		int64_t delta = timeout - guest_now;
> -
> -		/* Xen has a 'Linux workaround' in do_set_timer_op() which
> -		 * checks for negative absolute timeout values (caused by
> -		 * integer overflow), and for values about 13 days in the
> -		 * future (2^50ns) which would be caused by jiffies
> -		 * overflow. For those cases, it sets the timeout 100ms in
> -		 * the future (not *too* soon, since if a guest really did
> -		 * set a long timeout on purpose we don't want to keep
> -		 * churning CPU time by waking it up).
> -		 */
> -		if (unlikely((int64_t)timeout < 0 ||
> -			     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
> -			delta = 100 * NSEC_PER_MSEC;
> -			timeout = guest_now + delta;
> -		}
> -
> -		kvm_xen_start_timer(vcpu, timeout, delta);
> +		kvm_xen_start_timer(vcpu, timeout, true);
>  	} else {
>  		kvm_xen_stop_timer(vcpu);
>  	}
David Woodhouse Nov. 7, 2023, 8:17 a.m. UTC | #2
On Mon, 2023-11-06 at 17:44 -0800, Dongli Zhang wrote:
> > +       if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
> > +           static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> 
> If there any reason to use both vcpu->kvm->arch.use_master_clock and
> X86_FEATURE_CONSTANT_TSC?

Er, paranoia? I'll recheck.

> I think even __get_kvmclock() would not require both cases at the same time?
> 
>  3071         if (ka->use_master_clock &&
>  3072             (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
> 

But it does. That requires ka->use_master_clock (line 3071) AND that we
know the current CPU's TSC frequency (line 3072).

My code insists on the CONSTANT_TSC form of "knowing the current CPU's
TSC frequency" because even with a get_cpu(), it's not clear the guest
*was* running on this vCPU when it did its calculations. So I don't
want to go anywhere near the !CONSTANT_TSC case; it can use the
fallback.


> > +       } else {
> > +               /*
> > +                * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
> > +                *
> > +                * Also if the guest PV clock hasn't been set up yet, as is
> > +                * likely to be the case during migration when the vCPU has
> > +                * not been run yet. It would be possible to calculate the
> > +                * scaling factors properly in that case but there's not much
> > +                * point in doing so. The get_kvmclock_ns() drift accumulates
> > +                * over time, so it's OK to use it at startup. Besides, on
> > +                * migration there's going to be a little bit of skew in the
> > +                * precise moment at which timers fire anyway. Often they'll
> > +                * be in the "past" by the time the VM is running again after
> > +                * migration.
> > +                */
> > +               guest_now = get_kvmclock_ns(vcpu->kvm);
> > +               kernel_now = ktime_get();
> 
> 1. Can I assume the issue is still there if we fall into the "else" case? That
> is, the increasing inaccuracy as the VM has been up for longer and longer time?
> 
> If that is the case, which may be better?
> 
> (1) get_kvmclock_ns(), or
> (2) always get_kvmclock_base_ns() + ka->kvmclock_offset, when pvclock is not
> enabled, regardless whether master clock is used. At least, the inaccurary is
> not going to increase over guest time?

No, those are both wrong, and drifting further away over time. They are
each *differently* wrong, which is why periodically clamping (1) to (2)
is also broken, as previously discussed. I know you've got a patch to
do that clamping more *often* which would slightly reduce the pain
because the kvmclock wouldn't jump backwards so *far* each time... but
it's still wrong to do so at all (in either direction).

> 
> 2. I see 3 scenarios here:
> 
> (1) vcpu->arch.hv_clock.version and master clock is used.
> 
> In this case, the bugfix looks good.
> 
> (2) The master clock is used. However, pv clock is not enabled.
> 
> In this case, the bug is not resolved? ... even the master clock is used.

Under Xen the PV clock is always enabled. It's in the vcpu_info
structure which is required for any kind of event channel setup.

> 
> (3) The master clock is not used.
> 
> We fall into get_kvmclock_base_ns() + ka->kvmclock_offset. The behavior is not
> changed. This looks good.
> 
> 
> Just from my own point: as this patch involves relatively complex changes, I
> would suggest resolve the issue, but not use a temporary solution :)
> 

This is the conversation I had with Paul on Tuesday, when he wanted me
to fix up this "(3) / behaviour is not changed" case. And yes, I argued
that we *don't* want a temporary solution for this case. Because yes:

> (I see you mentioned that you will be back with get_kvmclock_ns())

We need to fix get_kvmclock_ns() anyway. The systemic drift *and* the
fact that we periodically clamp it to a different clock and make it
jump. I was working on the former and have something half-done but was
preempted by the realisation that the QEMU soft freeze is today, and I
needed to flush my QEMU patch queue.

But even once we fix get_kvmclock_ns(), *this* patch stands. Because it
*also* addresses the "now" problem, where we get the time by one clock
... and then some time passes ... and we get the time by another clock,
and subtract one from the other as if they were the *same* time.

Using kvm_get_monotonic_and_clockread() gives us a single TSC read
corresponding to the CLOCK_MONOTONIC time, from which we can calculate
the kvmclock time. We just *happen* to calculate it correctly here,
unlike anywhere else in KVM.

> Based on your bug fix, I see the below cases:
> 
> If master clock is not used:
>     get_kvmclock_base_ns() + ka->kvmclock_offset
> 
> If master clock is used:
>     If pvclock is enabled:
>         use the &vcpu->arch.hv_clock to get current guest time
>     Else
>         create a temporary hv_clock, based on masterclock.

I don't want to do that last 'else' clause yet, because that feels like
a temporary workaround. It should be enough to call get_kvmclock_ns(),
once we fix it.
Dongli Zhang Nov. 7, 2023, 11:07 p.m. UTC | #3
Hi David,

On 11/7/23 00:17, David Woodhouse wrote:
> On Mon, 2023-11-06 at 17:44 -0800, Dongli Zhang wrote:
>>> +       if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
>>> +           static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
>>
>> If there any reason to use both vcpu->kvm->arch.use_master_clock and
>> X86_FEATURE_CONSTANT_TSC?
> 
> Er, paranoia? I'll recheck.
> 
>> I think even __get_kvmclock() would not require both cases at the same time?
>>
>>  3071         if (ka->use_master_clock &&
>>  3072             (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
>>
> 
> But it does. That requires ka->use_master_clock (line 3071) AND that we
> know the current CPU's TSC frequency (line 3072).
> 
> My code insists on the CONSTANT_TSC form of "knowing the current CPU's
> TSC frequency" because even with a get_cpu(), it's not clear the guest
> *was* running on this vCPU when it did its calculations. So I don't
> want to go anywhere near the !CONSTANT_TSC case; it can use the
> fallback.
> 
> 
>>> +       } else {
>>> +               /*
>>> +                * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
>>> +                *
>>> +                * Also if the guest PV clock hasn't been set up yet, as is
>>> +                * likely to be the case during migration when the vCPU has
>>> +                * not been run yet. It would be possible to calculate the
>>> +                * scaling factors properly in that case but there's not much
>>> +                * point in doing so. The get_kvmclock_ns() drift accumulates
>>> +                * over time, so it's OK to use it at startup. Besides, on
>>> +                * migration there's going to be a little bit of skew in the
>>> +                * precise moment at which timers fire anyway. Often they'll
>>> +                * be in the "past" by the time the VM is running again after
>>> +                * migration.
>>> +                */
>>> +               guest_now = get_kvmclock_ns(vcpu->kvm);
>>> +               kernel_now = ktime_get();
>>
>> 1. Can I assume the issue is still there if we fall into the "else" case? That
>> is, the increasing inaccuracy as the VM has been up for longer and longer time?
>>
>> If that is the case, which may be better?
>>
>> (1) get_kvmclock_ns(), or
>> (2) always get_kvmclock_base_ns() + ka->kvmclock_offset, when pvclock is not
>> enabled, regardless whether master clock is used. At least, the inaccurary is
>> not going to increase over guest time?
> 
> No, those are both wrong, and drifting further away over time. They are
> each *differently* wrong, which is why periodically clamping (1) to (2)
> is also broken, as previously discussed. I know you've got a patch to
> do that clamping more *often* which would slightly reduce the pain
> because the kvmclock wouldn't jump backwards so *far* each time... but
> it's still wrong to do so at all (in either direction).
> 
>>
>> 2. I see 3 scenarios here:
>>
>> (1) vcpu->arch.hv_clock.version and master clock is used.
>>
>> In this case, the bugfix looks good.
>>
>> (2) The master clock is used. However, pv clock is not enabled.
>>
>> In this case, the bug is not resolved? ... even the master clock is used.
> 
> Under Xen the PV clock is always enabled. It's in the vcpu_info
> structure which is required for any kind of event channel setup.
> 
>>
>> (3) The master clock is not used.
>>
>> We fall into get_kvmclock_base_ns() + ka->kvmclock_offset. The behavior is not
>> changed. This looks good.
>>
>>
>> Just from my own point: as this patch involves relatively complex changes, I
>> would suggest resolve the issue, but not use a temporary solution :)
>>
> 
> This is the conversation I had with Paul on Tuesday, when he wanted me
> to fix up this "(3) / behaviour is not changed" case. And yes, I argued
> that we *don't* want a temporary solution for this case. Because yes:
> 
>> (I see you mentioned that you will be back with get_kvmclock_ns())
> 
> We need to fix get_kvmclock_ns() anyway. The systemic drift *and* the
> fact that we periodically clamp it to a different clock and make it
> jump. I was working on the former and have something half-done but was
> preempted by the realisation that the QEMU soft freeze is today, and I
> needed to flush my QEMU patch queue.
> 
> But even once we fix get_kvmclock_ns(), *this* patch stands. Because it
> *also* addresses the "now" problem, where we get the time by one clock
> ... and then some time passes ... and we get the time by another clock,
> and subtract one from the other as if they were the *same* time.
> 
> Using kvm_get_monotonic_and_clockread() gives us a single TSC read
> corresponding to the CLOCK_MONOTONIC time, from which we can calculate
> the kvmclock time. We just *happen* to calculate it correctly here,
> unlike anywhere else in KVM.
> 
>> Based on your bug fix, I see the below cases:
>>
>> If master clock is not used:
>>     get_kvmclock_base_ns() + ka->kvmclock_offset
>>
>> If master clock is used:
>>     If pvclock is enabled:
>>         use the &vcpu->arch.hv_clock to get current guest time
>>     Else
>>         create a temporary hv_clock, based on masterclock.
> 
> I don't want to do that last 'else' clause yet, because that feels like
> a temporary workaround. It should be enough to call get_kvmclock_ns(),
> once we fix it.
> 
> 
> 

Thank you very much for the detailed explanation.

I agree it is important to resolve the "now" problem. I guess the KVM lapic
deadline timer has the "now" problem as well.


I just noticed my question missed a key prerequisite:

Would you mind helping explain the time domain of the "oneshot.timeout_abs_ns"?

While it is an absolute nanosecond value on the VM side, which time domain
is it based on?

1. Is oneshot.timeout_abs_ns based on the xen pvclock (freq=NSEC_PER_SEC)?

2. Is oneshot.timeout_abs_ns based on tsc from VM side?

3. Is oneshot.timeout_abs_ns based on monotonic/raw clock at VM side?

4. Or is it based on the wallclock?

I think the OS does not have a concept of nanoseconds. It is derived from a
clocksource.



If it is based on pvclock, is it based on the pvclock from a specific vCPU, as
both the pvclock and the timer are per-vCPU?


E.g., in the KVM lapic deadline timer, all values are based on (1) the
tsc value, and (2) the current vCPU.


1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
1950 {
1951         struct kvm_timer *ktimer = &apic->lapic_timer;
1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
1953         u64 ns = 0;
1954         ktime_t expire;
1955         struct kvm_vcpu *vcpu = apic->vcpu;
1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
1957         unsigned long flags;
1958         ktime_t now;
1959
1960         if (unlikely(!tscdeadline || !this_tsc_khz))
1961                 return;
1962
1963         local_irq_save(flags);
1964
1965         now = ktime_get();
1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
1967
1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
1969         do_div(ns, this_tsc_khz);


Sorry if I made the question very confusing. The core question is: where is the
abs nanosecond value from, i.e. from which clocksource is it derived? What will
happen if the Xen VM uses HPET as the clocksource, while using the xen timer as
the clock event?

Regardless of the clocksource, a KVM VM may always use the current vCPU's tsc in
the lapic deadline timer.

Thank you very much!

Dongli Zhang
David Woodhouse Nov. 7, 2023, 11:24 p.m. UTC | #4
On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
> Thank you very much for the detailed explanation.
> 
> I agree it is important to resolve the "now" problem. I guess the KVM lapic
> deadline timer has the "now" problem as well.

I think so. And quite gratuitously so, since it just does:

	now = ktime_get();
	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());


Couldn't that trivially be changed to kvm_get_monotonic_and_clockread()?
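
Something like this, perhaps? (An untested sketch only, reusing the
kvm_get_monotonic_and_clockread() helper added by this patch, and falling
back to the existing pair of reads when the clocksource isn't TSC-based.)

	u64 host_tsc;
	s64 kernel_ns;

	if (IS_ENABLED(CONFIG_64BIT) &&
	    kvm_get_monotonic_and_clockread(&kernel_ns, &host_tsc)) {
		/* One TSC read, and the CLOCK_MONOTONIC value derived from it */
		now = ns_to_ktime(kernel_ns);
		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
	} else {
		/* Clocksource isn't TSC-based; keep the two separate reads */
		now = ktime_get();
		guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
	}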

Thankfully, it's defined in the time domain of the guest TSC, not the
kvmclock, so it doesn't suffer the same drift issue as the Xen timer.

> I just notice my question missed a key prerequisite:
> 
> Would you mind helping explain the time domain of the "oneshot.timeout_abs_ns"?
> 
> While it is the absolute nanosecond value at the VM side, on which time domain
> it is based?

It's the kvmclock. Xen offers a Xen PV clock to its guests using
*precisely* the same pvclock structure as KVM does.


> 1. Is oneshot.timeout_abs_ns based on the xen pvclock (freq=NSEC_PER_SEC)?
> 
> 2. Is oneshot.timeout_abs_ns based on tsc from VM side?
> 
> 3. Is oneshot.timeout_abs_ns based on monotonic/raw clock at VM side?
> 
> 4. Or it is based on wallclock?
> 
> I think the OS does not have a concept of nanoseconds. It is derived from a
> clocksource.

It's the kvmclock.

The guest derives it from the guest TSC using the pvclock information
(mul/shift/offset) that KVM provides to the guest.
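
Roughly like this (a simplified sketch of the pvclock maths; the real
__pvclock_read_cycles() uses a widening multiply so the scaling can't
overflow, but the structure is the same):

	u64 pvclock_to_ns(const struct pvclock_vcpu_time_info *src, u64 guest_tsc)
	{
		u64 delta = guest_tsc - src->tsc_timestamp;

		if (src->tsc_shift >= 0)
			delta <<= src->tsc_shift;
		else
			delta >>= -src->tsc_shift;

		/* 32.32 fixed-point multiply by tsc_to_system_mul */
		return src->system_time + ((delta * src->tsc_to_system_mul) >> 32);
	}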

The kvm_setup_guest_pvclock() function is potentially called *three*
times from kvm_guest_time_update(). Once for the KVM pv time MSR, once
for the pvclock structure in the Xen vcpu_info, and finally for the
pvclock structure which Xen makes available to userspace for vDSO
timekeeping.

> If it is based on pvclock, is it based on the pvclock from a specific vCPU, as
> both pvclock and timer are per-vCPU.

Yes, it is per-vCPU. Although in the sane case the TSCs on all vCPUs
will match and the mul/shift/offset provided by KVM won't actually
differ. Even in the insane case where guest TSCs are out of sync,
surely the pvclock information will differ only in order to ensure that
the *result* in nanoseconds does not?

I conveniently ducked this question in my patch by only supporting the
CONSTANT_TSC case, and not the case where we happen to know the
(potentially different) TSC frequencies on all the different pCPUs and
vCPUs.


> 
> E.g., according to the KVM lapic deadline timer, all values are based on (1) the
> tsc value, (2)on the current vCPU.
> 
> 
> 1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
> 1950 {
> 1951         struct kvm_timer *ktimer = &apic->lapic_timer;
> 1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
> 1953         u64 ns = 0;
> 1954         ktime_t expire;
> 1955         struct kvm_vcpu *vcpu = apic->vcpu;
> 1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
> 1957         unsigned long flags;
> 1958         ktime_t now;
> 1959
> 1960         if (unlikely(!tscdeadline || !this_tsc_khz))
> 1961                 return;
> 1962
> 1963         local_irq_save(flags);
> 1964
> 1965         now = ktime_get();
> 1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> 1967
> 1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
> 1969         do_div(ns, this_tsc_khz);
> 
> 
> Sorry if I make the question very confusing. The core question is: where and
> from which clocksource the abs nanosecond value is from? What will happen if the
> Xen VM uses HPET as clocksource, while xen timer as clock event?

If the guest uses HPET as clocksource and Xen timer as clockevents,
then keeping itself in sync is the *guest's* problem. The Xen timer is
defined in terms of nanoseconds since guest start, as provided in the
pvclock information described above. Hope that helps!
Dongli Zhang Nov. 8, 2023, 1:43 a.m. UTC | #5
Hi David,

On 11/7/23 15:24, David Woodhouse wrote:
> On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
>> Thank you very much for the detailed explanation.
>>
>> I agree it is important to resolve the "now" problem. I guess the KVM lapic
>> deadline timer has the "now" problem as well.
> 
> I think so. And quite gratuitously so, since it just does:
> 
> 	now = ktime_get();
> 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> 
> 
> Couldn't that trivially be changed to kvm_get_monotonic_and_clockread()?

The core idea is to always capture the pair of (tsc, ns) at exactly the same
time point.

I have no idea how much accuracy it can improve, considering the extra costs to
inject the timer interrupt into the vCPU.

> 
> Thankfully, it's defined in the time domain of the guest TSC, not the
> kvmclock, so it doesn't suffer the same drift issue as the Xen timer.
> 
>> I just notice my question missed a key prerequisite:
>>
>> Would you mind helping explain the time domain of the "oneshot.timeout_abs_ns"?
>>
>> While it is the absolute nanosecond value at the VM side, on which time domain
>> it is based?
> 
> It's the kvmclock. Xen offers as Xen PV clock to its guests using
> *precisely* the same pvclock structure as KVM does. 
> 
> 
>> 1. Is oneshot.timeout_abs_ns based on the xen pvclock (freq=NSEC_PER_SEC)?
>>
>> 2. Is oneshot.timeout_abs_ns based on tsc from VM side?
>>
>> 3. Is oneshot.timeout_abs_ns based on monotonic/raw clock at VM side?
>>
>> 4. Or it is based on wallclock?
>>
>> I think the OS does not have a concept of nanoseconds. It is derived from a
>> clocksource.
> 
> It's the kvmclock.

Thank you very much!

Both the xen clock and the kvm clock are pvclocks based on the same equations.

> 
> The guest derives it from the guest TSC using the pvclock information
> (mul/shift/offset) that KVM provides to the guest.
> 
> The kvm_setup_guest_pvclock() function is potentially called *three*
> times from kvm_guest_time_update(). Once for the KVM pv time MSR, once
> for the pvclock structure in the Xen vcpu_info, and finally for the
> pvclock structure which Xen makes available to userspace for vDSO
> timekeeping.
> 
>> If it is based on pvclock, is it based on the pvclock from a specific vCPU, as
>> both pvclock and timer are per-vCPU.
> 
> Yes, it is per-vCPU. Although in the sane case the TSCs on all vCPUs
> will match and the mul/shift/offset provided by KVM won't actually
> differ. Even in the insane case where guest TSCs are out of sync,
> surely the pvclock information will differ only in order to ensure that
> the *result* in nanoseconds does not?
> 
> I conveniently ducked this question in my patch by only supporting the
> CONSTANT_TSC case, and not the case where we happen to know the
> (potentially different) TSC frequencies on all the different pCPUs and
> vCPUs.

This is also my question: why support only the CONSTANT_TSC case?

For the lapic timer case:

The timer is always calculated based on the *current* vCPU's tsc virtualization,
regardless of CONSTANT_TSC or not.

For the xen timer case:

Why not always calculate the expiry based on the *current* vCPU's time
virtualization? That is, why not always use the current vCPU's hv_clock,
regardless of CONSTANT_TSC/masterclock?

That is: kvm lapic method with kvm_get_monotonic_and_clockread().

> 
> 
>>
>> E.g., according to the KVM lapic deadline timer, all values are based on (1) the
>> tsc value, (2)on the current vCPU.
>>
>>
>> 1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
>> 1950 {
>> 1951         struct kvm_timer *ktimer = &apic->lapic_timer;
>> 1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
>> 1953         u64 ns = 0;
>> 1954         ktime_t expire;
>> 1955         struct kvm_vcpu *vcpu = apic->vcpu;
>> 1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
>> 1957         unsigned long flags;
>> 1958         ktime_t now;
>> 1959
>> 1960         if (unlikely(!tscdeadline || !this_tsc_khz))
>> 1961                 return;
>> 1962
>> 1963         local_irq_save(flags);
>> 1964
>> 1965         now = ktime_get();
>> 1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>> 1967
>> 1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
>> 1969         do_div(ns, this_tsc_khz);
>>
>>
>> Sorry if I make the question very confusing. The core question is: where and
>> from which clocksource the abs nanosecond value is from? What will happen if the
>> Xen VM uses HPET as clocksource, while xen timer as clock event?
> 
> If the guest uses HPET as clocksource and Xen timer as clockevents,
> then keeping itself in sync is the *guest's* problem. The Xen timer is
> defined in terms of nanoseconds since guest start, as provided in the
> pvclock information described above. Hope that helps!
>

The "in terms of nanoseconds since guest start" refers to *one* global value.
Should we use wallclock when we are referring to a global value shared by all vCPUs?


Based on the following piece of code, I do not think we can assume all vCPUs
have the same pvclock at the same time point: lines 104-108, when
PVCLOCK_TSC_STABLE_BIT is not set.


 67 static __always_inline
 68 u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
 69 {
 70         unsigned version;
 71         u64 ret;
 72         u64 last;
 73         u8 flags;
 74
 75         do {
 76                 version = pvclock_read_begin(src);
 77                 ret = __pvclock_read_cycles(src, rdtsc_ordered());
 78                 flags = src->flags;
 79         } while (pvclock_read_retry(src, version));
... ...
104         last = raw_atomic64_read(&last_value);
105         do {
106                 if (ret <= last)
107                         return last;
108         } while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));
109
110         return ret;
111 }


That's why I would appreciate a definition of the abs nanoseconds used by the xen
timer (e.g., derived from pvclock). If it is per-vCPU, we may not be able to use
it as a global "nanoseconds since guest start" when PVCLOCK_TSC_STABLE_BIT
is not set.


Thank you very much!

Dongli Zhang
David Woodhouse Nov. 8, 2023, 3:25 p.m. UTC | #6
On Tue, 2023-11-07 at 17:43 -0800, Dongli Zhang wrote:
> Hi David,
> 
> On 11/7/23 15:24, David Woodhouse wrote:
> > On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
> > > Thank you very much for the detailed explanation.
> > > 
> > > I agree it is important to resolve the "now" problem. I guess the KVM lapic
> > > deadline timer has the "now" problem as well.
> > 
> > I think so. And quite gratuitously so, since it just does:
> > 
> >         now = ktime_get();
> >         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > 
> > 
> > Couldn't that trivially be changed to kvm_get_monotonic_and_clockread()?
> 
> The core idea is to always capture the pair of (tsc, ns) at exactly the same
> time point.
> 
> I have no idea how much accuracy it can improve, considering the extra costs to
> inject the timer interrupt into the vCPU.

Right. It's probably in the noise most of the time, unless you're
unlucky enough to get preempted between the two TSC reads which are
supposed to be happening "at the same time".
> 

> > I conveniently ducked this question in my patch by only supporting the
> > CONSTANT_TSC case, and not the case where we happen to know the
> > (potentially different) TSC frequencies on all the different pCPUs and
> > vCPUs.
> 
> This is also my question that why to support only the CONSTANT_TSC case.
> 
> For the lapic timer case:
> 
> The timer is always calculated based on the *current* vCPU's tsc virtualization,
> regardless CONSTANT_TSC or not.
> 
> For the xen timer case:
> 
> Why not always calculate the expire based on the *current* vCPU's time
> virtualization? That is, why not always use the current vCPU's hv_clock,
> regardless CONSTANT_TSC/masteclock?

The simple answer is because I wasn't sure it would work correctly in
all cases, and didn't *care* enough about the non-CONSTANT_TSC case to
prove it to myself.

Let's think about it...

In the non-CONSTANT_TSC case, each physical CPU can have a different
TSC frequency, yes? And KVM has a cpufreq notifier which triggers when
the TSC frequency changes, and makes a KVM_REQ_CLOCK_UPDATE request to any vCPU
running on the affected pCPU. With an associated IPI to ensure the vCPU
exits guest mode and will process the update before executing any
further guest code.

If a vCPU had *previously* been running on the affected pCPU but wasn't
running when the notifier happened, then kvm_arch_vcpu_load() will make
a KVM_REQ_GLOBAL_CLOCK_UPDATE request, which will immediately update
the vCPU in question, and then trigger a deferred KVM_REQ_CLOCK_UPDATE
for the others.

So the vCPU itself, in guest mode, is always going to see *correct*
pvclock information corresponding to the pCPU it is running on at the
time.

(I *believe* the way this works is that when a vCPU runs on a pCPU
which has a TSC frequency lower than the vCPU should have, it runs in
'always catchup' mode. Where the TSC offset is bumped *every* time the
vCPU enters guest mode, so the TSC is about right on every entry, might
seem to run a little slow if the vCPU does a tight loop of rdtsc, but
will catch up again on next vmexit/entry?)

But we aren't talking about the vCPU running in guest mode. The code in
kvm_xen_start_timer() and in start_sw_tscdeadline() is running in the
host kernel. How can we be sure that it's running on the *same*
physical CPU that the vCPU was previously running on, and thus how can
we be sure that the vcpu->arch.hv_clock is valid with respect to a
rdtsc on the current pCPU? I don't know that we can know that.

As far as I can tell, the code in start_sw_tscdeadline() makes no
attempt to do the 'catchup' thing, and just converts the pCPU's TSC to
guest TSC using kvm_read_l1_tsc() — which uses a multiplier that's set
once and *never* recalculated when the host TSC frequency changes.

On the whole, now that I *have* thought about it, I'm even more convinced I
was right in the first place that I didn't want to know :)

I think I stand by my original decision that the Xen timer code in the
non-CONSTANT_TSC case can just use get_kvmclock_ns(). The "now" problem
is going to be in the noise if the TSC isn't constant anyway, and we
need to fix the drift and jumps of get_kvmclock_ns() *anyway* rather
than adding a temporary special case for the Xen timers.

> That is: kvm lapic method with kvm_get_monotonic_and_clockread().
> 
> > 
> > 
> > > 
> > > E.g., according to the KVM lapic deadline timer, all values are based on (1) the
> > > tsc value, (2)on the current vCPU.
> > > 
> > > 
> > > 1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
> > > 1950 {
> > > 1951         struct kvm_timer *ktimer = &apic->lapic_timer;
> > > 1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
> > > 1953         u64 ns = 0;
> > > 1954         ktime_t expire;
> > > 1955         struct kvm_vcpu *vcpu = apic->vcpu;
> > > 1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
> > > 1957         unsigned long flags;
> > > 1958         ktime_t now;
> > > 1959
> > > 1960         if (unlikely(!tscdeadline || !this_tsc_khz))
> > > 1961                 return;
> > > 1962
> > > 1963         local_irq_save(flags);
> > > 1964
> > > 1965         now = ktime_get();
> > > 1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > > 1967
> > > 1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
> > > 1969         do_div(ns, this_tsc_khz);
> > > 
> > > 
> > > Sorry if I make the question very confusing. The core question is: where and
> > > from which clocksource the abs nanosecond value is from? What will happen if the
> > > Xen VM uses HPET as clocksource, while xen timer as clock event?
> > 
> > If the guest uses HPET as clocksource and Xen timer as clockevents,
> > then keeping itself in sync is the *guest's* problem. The Xen timer is
> > defined in terms of nanoseconds since guest start, as provided in the
> > pvclock information described above. Hope that helps!
> > 
> 
> The "in terms of nanoseconds since guest start" refers to *one* global value.
> Should we use wallclock when we are referring to a global value shared by all vCPUs?
> 
> 
> Based on the following piece of code, I do not think we may assume all vCPUs
> have the same pvclock at the same time point: line 104-108, when
> PVCLOCK_TSC_STABLE_BIT is not set.
> 

The *result* of calculating the pvclock should be the same on all vCPUs
at any given moment in time.

The precise *calculation* may differ, depending on the frequency of the
TSC for that particular vCPU and the last time the pvclock information
was created for that vCPU.


> 
>  67 static __always_inline
>  68 u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
>  69 {
>  70         unsigned version;
>  71         u64 ret;
>  72         u64 last;
>  73         u8 flags;
>  74
>  75         do {
>  76                 version = pvclock_read_begin(src);
>  77                 ret = __pvclock_read_cycles(src, rdtsc_ordered());
>  78                 flags = src->flags;
>  79         } while (pvclock_read_retry(src, version));
> ... ...
> 104         last = raw_atomic64_read(&last_value);
> 105         do {
> 106                 if (ret <= last)
> 107                         return last;
> 108         } while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));
> 109
> 110         return ret;
> 111 }
> 
> 
> That's why I appreciate a definition of the abs nanoseconds used by the xen
> timer (e.g., derived from pvclock). If it is per-vCPU, we may not use it for a
> global "in terms of nanoseconds since guest start", when PVCLOCK_TSC_STABLE_BIT
> is not set.

It is only per-vCPU if the vCPUs have *different* TSC frequencies.
That's because of the scaling; the guest calculates the nanoseconds
from the *guest* TSC of course, scaled according to the pvclock
information given to the guest by KVM.

As discussed and demonstrated by http://david.woodhou.se/tsdrift.c , if
KVM scales directly to nanoseconds from the *host* TSC at its known
frequency, that introduces a systemic drift between what the guest
calculates, and what KVM calculates — even in the CONSTANT_TSC case.

How do we reconcile the two? Well, it makes no sense for the definition
of the pvclock to be something that the guest *cannot* calculate, so
obviously KVM must do the same calculations the guest does; scale to
the guest TSC (kvm_read_l1_tsc()) and then apply the same pvclock
information from vcpu->arch.hv_clock to get the nanoseconds.
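
Put another way (an illustration only, not the actual code; 'host_hv_clock'
here stands for the mul/shift pair that __get_kvmclock() builds for the
*host* TSC frequency):

	/* One conversion: host TSC -> ns. What get_kvmclock_ns() does today. */
	kvm_ns    = __pvclock_read_cycles(&host_hv_clock, host_tsc);

	/* Two conversions: host TSC -> guest TSC -> ns. What the guest does. */
	guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
	guest_ns  = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);

	/*
	 * Each mul/shift stage is only an approximation of a frequency
	 * ratio, and the approximations don't compose exactly, so kvm_ns
	 * and guest_ns drift apart linearly as host_tsc grows.
	 */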

In the sane world where the guest vCPUs all have the *same* TSC
frequency, that's fine. The kvmclock isn't *really* per-vCPU because
they're all the same. 

If the VMM sets up different vCPUs to have different TSC frequencies
then yes, their kvmclock will drift slightly apart over time. That
might be the *one* case where I will accept that the guest pvclock
might ever change, even in the CONSTANT_TSC environment (without host
suspend or any other nonsense).

In that patch (the one I started typing on Monday and *still* haven't got
round to finishing because other things keep catching fire), I'm using the
*KVM-wide* guest TSC frequency as the definition for the kvmclock.
Dongli Zhang Nov. 8, 2023, 6:22 p.m. UTC | #7
Hi David,

On 11/8/23 07:25, David Woodhouse wrote:
> On Tue, 2023-11-07 at 17:43 -0800, Dongli Zhang wrote:
>> Hi David,
>>
>> On 11/7/23 15:24, David Woodhouse wrote:
>>> On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
>>>> Thank you very much for the detailed explanation.
>>>>
>>>> I agree it is important to resolve the "now" problem. I guess the KVM lapic
>>>> deadline timer has the "now" problem as well.
>>>
>>> I think so. And quite gratuitously so, since it just does:
>>>
>>>         now = ktime_get();
>>>         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>>
>>>
>>> Couldn't that trivially be changed to kvm_get_monotonic_and_clockread()?
>>
>> The core idea is to always capture the pair of (tsc, ns) at exactly the same
>> time point.
>>
>> I have no idea how much accuracy it can improve, considering the extra costs to
>> inject the timer interrupt into the vCPU.
> 
> Right. It's probably in the noise most of the time, unless you're
> unlucky enough to get preempted between the two TSC reads which are
> supposed to be happening "at the same time".
>>
> 
>>> I conveniently ducked this question in my patch by only supporting the
>>> CONSTANT_TSC case, and not the case where we happen to know the
>>> (potentially different) TSC frequencies on all the different pCPUs and
>>> vCPUs.
>>
>> This is also my question that why to support only the CONSTANT_TSC case.
>>
>> For the lapic timer case:
>>
>> The timer is always calculated based on the *current* vCPU's tsc virtualization,
>> regardless CONSTANT_TSC or not.
>>
>> For the xen timer case:
>>
>> Why not always calculate the expire based on the *current* vCPU's time
>> virtualization? That is, why not always use the current vCPU's hv_clock,
>> regardless CONSTANT_TSC/masteclock?
> 
> The simple answer is because I wasn't sure it would work correctly in
> all cases, and didn't *care* enough about the non-CONSTANT_TSC case to
> prove it to myself.
> 
> Let's think about it...
> 
> In the non-CONSTANT_TSC case, each physical CPU can have a different
> TSC frequency, yes? And KVM has a cpufreq notifier which triggers when
> the TSC changes, and make a KVM_REQ_CLOCK_UPDATE request to any vCPU
> running on the affected pCPU. With an associated IPI to ensure the vCPU
> exits guest mode and will processes the update before executing any
> further guest code.
> 
> If a vCPU had *previously* been running on the affected pCPU but wasn't
> running when the notifier happened, then kvm_arch_vcpu_load() will make
> a KVM_REQ_GLOBAL_CLOCK_UPDATE request, which will immediately update
> the vCPU in question, and then trigger a deferred KVM_REQ_CLOCK_UPDATE
> for the others.
> 
> So the vCPU itself, in guest mode, is always going to see *correct*
> pvclock information corresponding to the pCPU it is running on at the
> time.
> 
> (I *believe* the way this works is that when a vCPU runs on a pCPU
> which has a TSC frequency lower than the vCPU should have, it runs in
> 'always catchup' mode. Where the TSC offset is bumped *every* time the
> vCPU enters guest mode, so the TSC is about right on every entry, might
> seem to run a little slow if the vCPU does a tight loop of rdtsc, but
> will catch up again on next vmexit/entry?)
> 
> But we aren't talking about the vCPU running in guest mode. The code in
> kvm_xen_start_timer() and in start_sw_tscdeadline() is running in the
> host kernel. How can we be sure that it's running on the *same*
> physical CPU that the vCPU was previously running on, and thus how can
> we be sure that the vcpu->arch.hv_clock is valid with respect to a
> rdtsc on the current pCPU? I don't know that we can know that.
> 
> As far as I can tell, the code in start_sw_tscdeadline() makes no
> attempt to do the 'catchup' thing, and just converts the pCPU's TSC to
> guest TSC using kvm_read_l1_tsc() — which uses a multiplier that's set
> once and *never* recalculated when the host TSC frequency changes.
> 
> On the whole, now I *have* thought about it, I'm even more convinced I
> was right in the first place that I didn't want to know :)
> 
> I think I stand by my original decision that the Xen timer code in the
> non-CONSTANT_TSC case can just use get_kvmclock_ns(). The "now" problem
> is going to be in the noise if the TSC isn't constant anyway, and we
> need to fix the drift and jumps of get_kvmclock_ns() *anyway* rather
> than adding a temporary special case for the Xen timers.
> 
>> That is: kvm lapic method with kvm_get_monotonic_and_clockread().
>>
>>>
>>>
>>>>
>>>> E.g., according to the KVM lapic deadline timer, all values are based on (1) the
>>>> tsc value, (2)on the current vCPU.
>>>>
>>>>
>>>> 1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
>>>> 1950 {
>>>> 1951         struct kvm_timer *ktimer = &apic->lapic_timer;
>>>> 1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
>>>> 1953         u64 ns = 0;
>>>> 1954         ktime_t expire;
>>>> 1955         struct kvm_vcpu *vcpu = apic->vcpu;
>>>> 1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
>>>> 1957         unsigned long flags;
>>>> 1958         ktime_t now;
>>>> 1959
>>>> 1960         if (unlikely(!tscdeadline || !this_tsc_khz))
>>>> 1961                 return;
>>>> 1962
>>>> 1963         local_irq_save(flags);
>>>> 1964
>>>> 1965         now = ktime_get();
>>>> 1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>>> 1967
>>>> 1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
>>>> 1969         do_div(ns, this_tsc_khz);
>>>>
>>>>
>>>> Sorry if I made the question very confusing. The core question is: where does
>>>> the absolute nanosecond value come from, and from which clocksource? What will
>>>> happen if the Xen VM uses HPET as its clocksource, but the Xen timer as its
>>>> clock event device?
>>>
>>> If the guest uses HPET as clocksource and Xen timer as clockevents,
>>> then keeping itself in sync is the *guest's* problem. The Xen timer is
>>> defined in terms of nanoseconds since guest start, as provided in the
>>> pvclock information described above. Hope that helps!
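>>>
>>> As an illustration (a hand-written sketch, not taken from any real
>>> guest; pvclock_now_ns, delta_ns and cpu are assumed local variables),
>>> a guest arms the singleshot timer with an absolute value in exactly
>>> those kvmclock nanoseconds:
>>>
>>> 	/* Sketch: fire 'delta_ns' after the pvclock "now" value */
>>> 	struct vcpu_set_singleshot_timer single = {
>>> 		.timeout_abs_ns = pvclock_now_ns + delta_ns,
>>> 		.flags = 0,
>>> 	};
>>> 	HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &single);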
>>>
>>
>> The "in terms of nanoseconds since guest start" refers to *one* global value.
>> Should we use wallclock when we are referring to a global value shared by all vCPUs?
>>
>>
>> Based on the following piece of code, I do not think we can assume all vCPUs
>> have the same pvclock value at the same point in time: lines 104-108, when
>> PVCLOCK_TSC_STABLE_BIT is not set.
>>
> 
> The *result* of calculating the pvclock should be the same on all vCPUs
> at any given moment in time.
> 
> The precise *calculation* may differ, depending on the frequency of the
> TSC for that particular vCPU and the last time the pvclock information
> was created for that vCPU.
> 
> 
>>
>>  67 static __always_inline
>>  68 u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
>>  69 {
>>  70         unsigned version;
>>  71         u64 ret;
>>  72         u64 last;
>>  73         u8 flags;
>>  74
>>  75         do {
>>  76                 version = pvclock_read_begin(src);
>>  77                 ret = __pvclock_read_cycles(src, rdtsc_ordered());
>>  78                 flags = src->flags;
>>  79         } while (pvclock_read_retry(src, version));
>> ... ...
>> 104         last = raw_atomic64_read(&last_value);
>> 105         do {
>> 106                 if (ret <= last)
>> 107                         return last;
>> 108         } while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));
>> 109
>> 110         return ret;
>> 111 }
>>
>>
>> That's why I would appreciate a definition of the absolute nanoseconds used by
>> the Xen timer (e.g., derived from the pvclock). If it is per-vCPU, we may not
>> be able to use it as a global "nanoseconds since guest start" when
>> PVCLOCK_TSC_STABLE_BIT is not set.
> 
> It is only per-vCPU if the vCPUs have *different* TSC frequencies.
> That's because of the scaling; the guest calculates the nanoseconds
> from the *guest* TSC of course, scaled according to the pvclock
> information given to the guest by KVM.
> 
> As discussed and demonstrated by http://david.woodhou.se/tsdrift.c , if
> KVM scales directly to nanoseconds from the *host* TSC at its known
> frequency, that introduces a systemic drift between what the guest
> calculates, and what KVM calculates — even in the CONSTANT_TSC case.
> 
> How do we reconcile the two? Well, it makes no sense for the definition
> of the pvclock to be something that the guest *cannot* calculate, so
> obviously KVM must do the same calculation the guest does: scale to
> the guest TSC (kvm_read_l1_tsc()) and then apply the same pvclock
> information from vcpu->arch.hv_clock to get the nanoseconds.
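> 
> (A rough sketch of that guest-side calculation, not the exact kernel
> code; mul_u64_u32_shr() here stands in for the real pvclock_scale_delta()
> helper, and the function name is made up:
> 
> 	static u64 guest_kvmclock_ns(const struct pvclock_vcpu_time_info *src,
> 				     u64 guest_tsc)
> 	{
> 		u64 delta = guest_tsc - src->tsc_timestamp;
> 
> 		/* Apply the mul/shift that KVM advertised in the pvclock info */
> 		if (src->tsc_shift < 0)
> 			delta >>= -src->tsc_shift;
> 		else
> 			delta <<= src->tsc_shift;
> 
> 		return src->system_time +
> 		       mul_u64_u32_shr(delta, src->tsc_to_system_mul, 32);
> 	}
> 
> so the result depends only on the guest TSC and the pvclock parameters
> KVM last wrote for that vCPU.)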
> 
> In the sane world where the guest vCPUs all have the *same* TSC
> frequency, that's fine. The kvmclock isn't *really* per-vCPU because
> they're all the same. 
> 
> If the VMM sets up different vCPUs to have different TSC frequencies
> then yes, their kvmclock will drift slightly apart over time. That
> might be the *one* case where I will accept that the guest pvclock
> might ever change, even in the CONSTANT_TSC environment (without host
> suspend or any other nonsense).
> 
> In that patch, which I started typing on Monday and *still* haven't got
> round to finishing because other things keep catching fire, I'm using
> the *KVM-wide* guest TSC frequency as the definition for the kvmclock.
> 
> 

Thank you very much for the explanation.

I understand you may use different methods to obtain the 'expire' value in
different cases.

Maybe add some comments to the KVM Xen timer emulation code? E.g.:

- When the TSC is reliable, follow the standard/protocol that the Xen timer is
based on the per-vCPU pvclock: that is, always scale host_tsc with
kvm_read_l1_tsc().

- However, sometimes the TSC is not reliable. In that case, use the legacy
method get_kvmclock_ns().

This may help developers understand the standard/protocol used by the Xen timer.
The core idea will be: the implementation tries to follow the Xen timer
nanoseconds definition (per-vCPU pvclock), and may fall back to a legacy
solution in special cases, in order to improve the accuracy.

TBH, I had never thought about what the definition of a nanosecond is in the
Xen timer (even though I used to work on Xen, and am still working on some Xen
issues).

Thank you very much!

Dongli Zhang
David Woodhouse Nov. 8, 2023, 7:14 p.m. UTC | #8
On Wed, 2023-11-08 at 10:22 -0800, Dongli Zhang wrote:
> 
> Thank you very much for the explanation.
> 
> I understand you may use different methods to obtain the 'expire' value in
> different cases.
> 
> Maybe add some comments to the KVM Xen timer emulation code? E.g.:
> 
> - When the TSC is reliable, follow the standard/protocol that the Xen timer is
> based on the per-vCPU pvclock: that is, always scale host_tsc with
> kvm_read_l1_tsc().
> 
> - However, sometimes the TSC is not reliable. In that case, use the legacy
> method get_kvmclock_ns().

After the patch we're discussing here, kvm_xen_start_timer() is *more*
comment than code. I think it covers both of the points you mention
above, and more.


static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
				bool linux_wa)
{
	uint64_t guest_now;
	int64_t kernel_now, delta;

	/*
	 * The guest provides the requested timeout in absolute nanoseconds
	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
	 * the pvclock information provided by KVM.
	 *
	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
	 * difference won't matter much as there is no cumulative effect.
	 *
	 * Calculate the time for some arbitrary point in time around "now"
	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
	 * delta between the kvmclock "now" value and the guest's requested
	 * timeout, apply the "Linux workaround" described below, and add
	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
	 * the absolute CLOCK_MONOTONIC time at which the timer should
	 * fire.
	 */
	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
		uint64_t host_tsc, guest_tsc;

		if (!IS_ENABLED(CONFIG_64BIT) ||
		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
			/*
			 * Don't fall back to get_kvmclock_ns() because it's
			 * broken; it has a systemic error in its results
			 * because it scales directly from host TSC to
			 * nanoseconds, and doesn't scale first to guest TSC
			 * and *then* to nanoseconds as the guest does.
			 *
			 * There is a small error introduced here because time
			 * continues to elapse between the ktime_get() and the
			 * subsequent rdtsc(). But not the systemic drift due
			 * to get_kvmclock_ns().
			 */
			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
			host_tsc = rdtsc();
		}

		/* Calculate the guest kvmclock as the guest would do it. */
		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
						  guest_tsc);
	} else {
		/*
		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
		 *
		 * Also if the guest PV clock hasn't been set up yet, as is
		 * likely to be the case during migration when the vCPU has
		 * not been run yet. It would be possible to calculate the
		 * scaling factors properly in that case but there's not much
		 * point in doing so. The get_kvmclock_ns() drift accumulates
		 * over time, so it's OK to use it at startup. Besides, on
		 * migration there's going to be a little bit of skew in the
		 * precise moment at which timers fire anyway. Often they'll
		 * be in the "past" by the time the VM is running again after
		 * migration.
		 */
		guest_now = get_kvmclock_ns(vcpu->kvm);
		kernel_now = ktime_get();
	}

	delta = guest_abs - guest_now;

	/* Xen has a 'Linux workaround' in do_set_timer_op() which
	 * checks for negative absolute timeout values (caused by
	 * integer overflow), and for values about 13 days in the
	 * future (2^50ns) which would be caused by jiffies
	 * overflow. For those cases, it sets the timeout 100ms in
	 * the future (not *too* soon, since if a guest really did
	 * set a long timeout on purpose we don't want to keep
	 * churning CPU time by waking it up).
	 */
	if (linux_wa) {
		if ((unlikely((int64_t)guest_abs < 0 ||
			      (delta > 0 && (uint32_t) (delta >> 50) != 0)))) {
			delta = 100 * NSEC_PER_MSEC;
			guest_abs = guest_now + delta;
		}
	}

	/*
	 * Avoid races with the old timer firing. Checking timer_expires
	 * to avoid calling hrtimer_cancel() will only have false positives
	 * so is fine.
	 */
	if (vcpu->arch.xen.timer_expires)
		hrtimer_cancel(&vcpu->arch.xen.timer);

	atomic_set(&vcpu->arch.xen.timer_pending, 0);
	vcpu->arch.xen.timer_expires = guest_abs;

	if (delta <= 0) {
		xen_timer_callback(&vcpu->arch.xen.timer);
	} else {
		hrtimer_start(&vcpu->arch.xen.timer,
			      ktime_add_ns(kernel_now, delta),
			      HRTIMER_MODE_ABS_HARD);
	}
}



> This may help developers understand the standard/protocol used by the Xen timer.
> The core idea will be: the implementation tries to follow the Xen timer
> nanoseconds definition (per-vCPU pvclock), and may fall back to a legacy
> solution in special cases, in order to improve the accuracy.

This isn't special to the Xen timers. We really are just using the
kvmclock for this. It's the same thing.

> TBH, I had never thought about what the definition of a nanosecond is in the
> Xen timer (even though I used to work on Xen, and am still working on some Xen
> issues).

You never had to think about it before because it was never quite so
catastrophically broken as it is under KVM. Nanoseconds were
nanoseconds, and we didn't have the problems that come from scaling and
calculating them three *different* ways and periodically clamping the
guest-visible one to a fourth. :)
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6eaab714d90a..e479637af42c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2844,7 +2844,11 @@  static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
 	return v * clock->mult;
 }
 
-static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
+/*
+ * As with get_kvmclock_base_ns(), this counts from boot time, at the
+ * frequency of CLOCK_MONOTONIC_RAW (hence adding gtod->offs_boot).
+ */
+static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	unsigned long seq;
@@ -2863,6 +2867,29 @@  static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
 	return mode;
 }
 
+/*
+ * This calculates CLOCK_MONOTONIC at the time of the TSC snapshot, with
+ * no boot time offset.
+ */
+static int do_monotonic(s64 *t, u64 *tsc_timestamp)
+{
+	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+	unsigned long seq;
+	int mode;
+	u64 ns;
+
+	do {
+		seq = read_seqcount_begin(&gtod->seq);
+		ns = gtod->clock.base_cycles;
+		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
+		ns >>= gtod->clock.shift;
+		ns += ktime_to_ns(gtod->clock.offset);
+	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+	*t = ns;
+
+	return mode;
+}
+
 static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
@@ -2884,18 +2911,42 @@  static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
 	return mode;
 }
 
-/* returns true if host is using TSC based clocksource */
+/*
+ * Calculates the kvmclock_base_ns (CLOCK_MONOTONIC_RAW + boot time) and
+ * reports the TSC value from which it did so. Returns true if host is
+ * using TSC based clocksource.
+ */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
 	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return gtod_is_based_on_tsc(do_monotonic_raw(kernel_ns,
-						      tsc_timestamp));
+	return gtod_is_based_on_tsc(do_kvmclock_base(kernel_ns,
+						     tsc_timestamp));
 }
 
-/* returns true if host is using TSC based clocksource */
+/*
+ * Calculates CLOCK_MONOTONIC and reports the TSC value from which it did
+ * so. Returns true if host is using TSC based clocksource.
+ */
+bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
+{
+	/* checked again under seqlock below */
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
+		return false;
+
+	return gtod_is_based_on_tsc(do_monotonic(kernel_ns,
+						 tsc_timestamp));
+}
+
+/*
+ * Calculates CLOCK_REALTIME and reports the TSC value from which it did
+ * so. Returns true if host is using TSC based clocksource.
+ *
+ * DO NOT USE this for anything related to migration. You want CLOCK_TAI
+ * for that.
+ */
 static bool kvm_get_walltime_and_clockread(struct timespec64 *ts,
 					   u64 *tsc_timestamp)
 {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1e7be1f6ab29..c08c6f729965 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -293,6 +293,7 @@  static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
+bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
 
 int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
 	gva_t addr, void *val, unsigned int bytes,
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 751d9a984668..e3d2d63eef34 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -24,6 +24,7 @@ 
 #include <xen/interface/sched.h>
 
 #include <asm/xen/cpuid.h>
+#include <asm/pvclock.h>
 
 #include "cpuid.h"
 #include "trace.h"
@@ -158,8 +159,93 @@  static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns)
+static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
+				bool linux_wa)
 {
+	uint64_t guest_now;
+	int64_t kernel_now, delta;
+
+	/*
+	 * The guest provides the requested timeout in absolute nanoseconds
+	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
+	 * the pvclock information provided by KVM.
+	 *
+	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
+	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
+	 * difference won't matter much as there is no cumulative effect.
+	 *
+	 * Calculate the time for some arbitrary point in time around "now"
+	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
+	 * delta between the kvmclock "now" value and the guest's requested
+	 * timeout, apply the "Linux workaround" described below, and add
+	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
+	 * the absolute CLOCK_MONOTONIC time at which the timer should
+	 * fire.
+	 */
+	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
+	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+		uint64_t host_tsc, guest_tsc;
+
+		if (!IS_ENABLED(CONFIG_64BIT) ||
+		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
+			/*
+			 * Don't fall back to get_kvmclock_ns() because it's
+			 * broken; it has a systemic error in its results
+			 * because it scales directly from host TSC to
+			 * nanoseconds, and doesn't scale first to guest TSC
+			 * and *then* to nanoseconds as the guest does.
+			 *
+			 * There is a small error introduced here because time
+			 * continues to elapse between the ktime_get() and the
+			 * subsequent rdtsc(). But not the systemic drift due
+			 * to get_kvmclock_ns().
+			 */
+			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
+			host_tsc = rdtsc();
+		}
+
+		/* Calculate the guest kvmclock as the guest would do it. */
+		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
+		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
+						  guest_tsc);
+	} else {
+		/*
+		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
+		 *
+		 * Also if the guest PV clock hasn't been set up yet, as is
+		 * likely to be the case during migration when the vCPU has
+		 * not been run yet. It would be possible to calculate the
+		 * scaling factors properly in that case but there's not much
+		 * point in doing so. The get_kvmclock_ns() drift accumulates
+		 * over time, so it's OK to use it at startup. Besides, on
+		 * migration there's going to be a little bit of skew in the
+		 * precise moment at which timers fire anyway. Often they'll
+		 * be in the "past" by the time the VM is running again after
+		 * migration.
+		 */
+		guest_now = get_kvmclock_ns(vcpu->kvm);
+		kernel_now = ktime_get();
+	}
+
+	delta = guest_abs - guest_now;
+
+	/* Xen has a 'Linux workaround' in do_set_timer_op() which
+	 * checks for negative absolute timeout values (caused by
+	 * integer overflow), and for values about 13 days in the
+	 * future (2^50ns) which would be caused by jiffies
+	 * overflow. For those cases, it sets the timeout 100ms in
+	 * the future (not *too* soon, since if a guest really did
+	 * set a long timeout on purpose we don't want to keep
+	 * churning CPU time by waking it up).
+	 */
+	if (linux_wa) {
+		if ((unlikely((int64_t)guest_abs < 0 ||
+			      (delta > 0 && (uint32_t) (delta >> 50) != 0)))) {
+			delta = 100 * NSEC_PER_MSEC;
+			guest_abs = guest_now + delta;
+		}
+	}
+
 	/*
 	 * Avoid races with the old timer firing. Checking timer_expires
 	 * to avoid calling hrtimer_cancel() will only have false positives
@@ -171,12 +257,11 @@  static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_
 	atomic_set(&vcpu->arch.xen.timer_pending, 0);
 	vcpu->arch.xen.timer_expires = guest_abs;
 
-	if (delta_ns <= 0) {
+	if (delta <= 0) {
 		xen_timer_callback(&vcpu->arch.xen.timer);
 	} else {
-		ktime_t ktime_now = ktime_get();
 		hrtimer_start(&vcpu->arch.xen.timer,
-			      ktime_add_ns(ktime_now, delta_ns),
+			      ktime_add_ns(kernel_now, delta),
 			      HRTIMER_MODE_ABS_HARD);
 	}
 }
@@ -945,8 +1030,7 @@  int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		/* Start the timer if the new value has a valid vector+expiry. */
 		if (data->u.timer.port && data->u.timer.expires_ns)
 			kvm_xen_start_timer(vcpu, data->u.timer.expires_ns,
-					    data->u.timer.expires_ns -
-					    get_kvmclock_ns(vcpu->kvm));
+					    false);
 
 		r = 0;
 		break;
@@ -1389,7 +1473,6 @@  static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
 {
 	struct vcpu_set_singleshot_timer oneshot;
 	struct x86_exception e;
-	s64 delta;
 
 	if (!kvm_xen_timer_enabled(vcpu))
 		return false;
@@ -1423,9 +1506,7 @@  static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
 			return true;
 		}
 
-		/* A delta <= 0 results in an immediate callback, which is what we want */
-		delta = oneshot.timeout_abs_ns - get_kvmclock_ns(vcpu->kvm);
-		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, delta);
+		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, false);
 		*r = 0;
 		return true;
 
@@ -1449,25 +1530,7 @@  static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout,
 		return false;
 
 	if (timeout) {
-		uint64_t guest_now = get_kvmclock_ns(vcpu->kvm);
-		int64_t delta = timeout - guest_now;
-
-		/* Xen has a 'Linux workaround' in do_set_timer_op() which
-		 * checks for negative absolute timeout values (caused by
-		 * integer overflow), and for values about 13 days in the
-		 * future (2^50ns) which would be caused by jiffies
-		 * overflow. For those cases, it sets the timeout 100ms in
-		 * the future (not *too* soon, since if a guest really did
-		 * set a long timeout on purpose we don't want to keep
-		 * churning CPU time by waking it up).
-		 */
-		if (unlikely((int64_t)timeout < 0 ||
-			     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
-			delta = 100 * NSEC_PER_MSEC;
-			timeout = guest_now + delta;
-		}
-
-		kvm_xen_start_timer(vcpu, timeout, delta);
+		kvm_xen_start_timer(vcpu, timeout, true);
 	} else {
 		kvm_xen_stop_timer(vcpu);
 	}