
KVM: x86/xen: improve accuracy of Xen timers

Message ID 96da7273adfff2a346de9a4a27ce064f6fe0d0a1.camel@infradead.org (mailing list archive)
State New, archived
Series KVM: x86/xen: improve accuracy of Xen timers

Commit Message

David Woodhouse Oct. 30, 2023, 3:50 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

A test program such as http://david.woodhou.se/timerlat.c confirms user
reports that timers are increasingly inaccurate as the lifetime of a
guest increases. Reporting the actual delay observed when asking for
100µs of sleep, it starts off OK on a newly-launched guest but gets
worse over time, giving incorrect sleep times:

root@ip-10-0-193-21:~# ./timerlat -c -n 5
00000000 latency 103243/100000 (3.2430%)
00000001 latency 103243/100000 (3.2430%)
00000002 latency 103242/100000 (3.2420%)
00000003 latency 103245/100000 (3.2450%)
00000004 latency 103245/100000 (3.2450%)

The biggest problem is that get_kvmclock_ns() returns inaccurate values
when the guest TSC is scaled. The guest sees a TSC value scaled from the
host TSC by a mul/shift conversion (hopefully done in hardware). The
guest then converts that guest TSC value into nanoseconds using the
mul/shift conversion given to it by the KVM pvclock information.

But get_kvmclock_ns() performs only a single conversion directly from
host TSC to nanoseconds, giving a different result. A test program at
http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
over a day.
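
The effect of the double scaling can be illustrated with a small standalone
program (a sketch, not kernel code: the frequencies are made up and the
fixed-point helper only mirrors the shape of the pvclock mul/shift
conversion):

#include <stdint.h>
#include <stdio.h>

/* (delta << shift) * mul >> 32, the same shape as the guest pvclock math */
static uint64_t scale(uint64_t delta, uint32_t mul, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Build a pvclock-style multiplier for "ticks at hz -> ns", paired with shift = -1 */
static uint32_t ns_mul_for(uint64_t hz)
{
	return (uint32_t)(((2ULL * 1000000000ULL) << 32) / hz);
}

int main(void)
{
	const uint64_t host_hz = 2900000000ULL, guest_hz = 2600000000ULL;
	const uint64_t host_tsc_delta = 24ULL * 3600 * host_hz;	/* ~one day */

	/* Host TSC scaled to guest TSC by a 32.32 fixed-point ratio */
	uint32_t tsc_ratio_mul = (uint32_t)((guest_hz << 32) / host_hz);
	uint64_t guest_tsc = scale(host_tsc_delta, tsc_ratio_mul, 0);

	/* What the guest computes from its own pvclock mul/shift */
	uint64_t two_step = scale(guest_tsc, ns_mul_for(guest_hz), -1);
	/* What a single host-TSC -> ns conversion computes */
	uint64_t one_step = scale(host_tsc_delta, ns_mul_for(host_hz), -1);

	printf("guest's own calculation: %llu ns\n", (unsigned long long)two_step);
	printf("single host conversion:  %llu ns\n", (unsigned long long)one_step);
	printf("difference after a day:  %lld ns\n", (long long)(two_step - one_step));
	return 0;
}

Each mul/shift step truncates differently, so the two results diverge as the
TSC delta grows; that is the systemic error avoided by always converting via
the guest TSC.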

It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
that. The actual guest hv_clock is per-CPU, and *theoretically* each
vCPU could be running at a *different* frequency. But this patch is
needed anyway because...

The other issue with Xen timers was that the code would snapshot the
host CLOCK_MONOTONIC at some point in time, and then... after a few
interrupts may have occurred, some preemption perhaps... would also read
the guest's kvmclock. Then it would proceed under the false assumption
that those two happened at the *same* time. Any time which *actually*
elapsed between reading the two clocks was introduced as inaccuracies
in the time at which the timer fired.

Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
host TSC just *once*, then use the returned TSC value to calculate the
kvmclock (making sure to do that the way the guest would instead of
making the same mistake get_kvmclock_ns() does).

Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
timers still have to use CLOCK_MONOTONIC. In practice the difference
between the two won't matter over the timescales involved, as the
*absolute* values don't matter; just the delta.

This does mean a new variant of kvm_get_time_and_clockread() is needed;
called kvm_get_monotonic_and_clockread() because that's what it does.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/x86.c |  30 ++++++++++++
 arch/x86/kvm/x86.h |   1 +
 arch/x86/kvm/xen.c | 111 +++++++++++++++++++++++++++++++--------------
 3 files changed, 109 insertions(+), 33 deletions(-)

Comments

Paul Durrant Oct. 31, 2023, 10:42 a.m. UTC | #1
On 30/10/2023 15:50, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> A test program such as http://david.woodhou.se/timerlat.c confirms user
> reports that timers are increasingly inaccurate as the lifetime of a
> guest increases. Reporting the actual delay observed when asking for
> 100µs of sleep, it starts off OK on a newly-launched guest but gets
> worse over time, giving incorrect sleep times:
> 
> root@ip-10-0-193-21:~# ./timerlat -c -n 5
> 00000000 latency 103243/100000 (3.2430%)
> 00000001 latency 103243/100000 (3.2430%)
> 00000002 latency 103242/100000 (3.2420%)
> 00000003 latency 103245/100000 (3.2450%)
> 00000004 latency 103245/100000 (3.2450%)
> 
> The biggest problem is that get_kvmclock_ns() returns inaccurate values
> when the guest TSC is scaled. The guest sees a TSC value scaled from the
> host TSC by a mul/shift conversion (hopefully done in hardware). The
> guest then converts that guest TSC value into nanoseconds using the
> mul/shift conversion given to it by the KVM pvclock information.
> 
> But get_kvmclock_ns() performs only a single conversion directly from
> host TSC to nanoseconds, giving a different result. A test program at
> http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
> over a day.
> 
> It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
> that. The actual guest hv_clock is per-CPU, and *theoretically* each
> vCPU could be running at a *different* frequency. But this patch is
> needed anyway because...
> 
> The other issue with Xen timers was that the code would snapshot the
> host CLOCK_MONOTONIC at some point in time, and then... after a few
> interrupts may have occurred, some preemption perhaps... would also read
> the guest's kvmclock. Then it would proceed under the false assumption
> that those two happened at the *same* time. Any time which *actually*
> elapsed between reading the two clocks was introduced as inaccuracies
> in the time at which the timer fired.
> 
> Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
> host TSC just *once*, then use the returned TSC value to calculate the
> kvmclock (making sure to do that the way the guest would instead of
> making the same mistake get_kvmclock_ns() does).
> 
> Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
> timers still have to use CLOCK_MONOTONIC. In practice the difference
> between the two won't matter over the timescales involved, as the
> *absolute* values don't matter; just the delta.
> 
> This does mean a new variant of kvm_get_time_and_clockread() is needed;
> called kvm_get_monotonic_and_clockread() because that's what it does.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   arch/x86/kvm/x86.c |  30 ++++++++++++
>   arch/x86/kvm/x86.h |   1 +
>   arch/x86/kvm/xen.c | 111 +++++++++++++++++++++++++++++++--------------
>   3 files changed, 109 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 41cce5031126..aeede83d65dc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2863,6 +2863,25 @@ static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
>   	return mode;
>   }
>   
> +static int do_monotonic(s64 *t, u64 *tsc_timestamp)
> +{
> +	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> +	unsigned long seq;
> +	int mode;
> +	u64 ns;
> +
> +	do {
> +		seq = read_seqcount_begin(&gtod->seq);
> +		ns = gtod->clock.base_cycles;
> +		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
> +		ns >>= gtod->clock.shift;
> +		ns += ktime_to_ns(ktime_add(gtod->clock.offset, gtod->offs_boot));
> +	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
> +	*t = ns;
> +
> +	return mode;
> +}
> +
>   static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
>   {
>   	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> @@ -2895,6 +2914,17 @@ static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
>   						      tsc_timestamp));
>   }
>   
> +/* returns true if host is using TSC based clocksource */
> +bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
> +{
> +	/* checked again under seqlock below */
> +	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
> +		return false;
> +
> +	return gtod_is_based_on_tsc(do_monotonic(kernel_ns,
> +						 tsc_timestamp));
> +}
> +
>   /* returns true if host is using TSC based clocksource */
>   static bool kvm_get_walltime_and_clockread(struct timespec64 *ts,
>   					   u64 *tsc_timestamp)
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 1e7be1f6ab29..c08c6f729965 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -293,6 +293,7 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
>   void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
>   
>   u64 get_kvmclock_ns(struct kvm *kvm);
> +bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
>   
>   int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
>   	gva_t addr, void *val, unsigned int bytes,
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 0ea6016ad132..00a1e924a717 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -24,6 +24,7 @@
>   #include <xen/interface/sched.h>
>   
>   #include <asm/xen/cpuid.h>
> +#include <asm/pvclock.h>
>   
>   #include "cpuid.h"
>   #include "trace.h"
> @@ -144,17 +145,87 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
>   	return HRTIMER_NORESTART;
>   }
>   
> -static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns)
> +static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
> +				bool linux_wa)
>   {
> +	uint64_t guest_now;
> +	int64_t kernel_now, delta;
> +
> +	 /*
> +	 * The guest provides the requested timeout in absolute nanoseconds
> +	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
> +	 * the pvclock information provided by KVM.
> +	 *
> +	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
> +	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
> +	 * difference won't matter much as there is no cumulative effect.
> +	 *
> +	 * Calculate the time for some arbitrary point in time around "now"
> +	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
> +	 * delta between the kvmclock "now" value and the guest's requested
> +	 * timeout, apply the "Linux workaround" described below, and add
> +	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
> +	 * the absolute CLOCK_MONOTONIC time at which the timer should
> +	 * fire.
> +	 */
> +	if (vcpu->kvm->arch.use_master_clock &&
> +	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +		uint64_t host_tsc, guest_tsc;
> +
> +		if (!IS_ENABLED(CONFIG_64BIT) ||
> +		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
> +			/*
> +			 * Don't fall back to get_kvmclock_ns() because it's
> +			 * broken; it has a systemic error in its results
> +			 * because it scales directly from host TSC to
> +			 * nanoseconds, and doesn't scale first to guest TSC
> +			 * and *then* to nanoseconds as the guest does.
> +			 *
> +			 * There is a small error introduced here because time
> +			 * continues to elapse between the ktime_get() and the
> +			 * subsequent rdtsc(). But not the systemic drift due
> +			 * to get_kvmclock_ns().
> +			 */
> +			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
> +			host_tsc = rdtsc();
> +		}
> +
> +		/* Calculate the guest kvmclock as the guest would do it. */
> +		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> +		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);
> +	} else {
> +		/* Without CONSTANT_TSC, get_kvmclock_ns() is the only option */
> +		guest_now = get_kvmclock_ns(vcpu->kvm);
> +		kernel_now = ktime_get();
> +	}
> +
> +	delta = guest_abs - guest_now;
> +
> +	/* Xen has a 'Linux workaround' in do_set_timer_op() which
> +	 * checks for negative absolute timeout values (caused by
> +	 * integer overflow), and for values about 13 days in the
> +	 * future (2^50ns) which would be caused by jiffies
> +	 * overflow. For those cases, it sets the timeout 100ms in
> +	 * the future (not *too* soon, since if a guest really did
> +	 * set a long timeout on purpose we don't want to keep
> +	 * churning CPU time by waking it up).
> +	 */
> +	if (linux_wa) {
> +		if ((unlikely((int64_t)guest_abs < 0 ||
> +			      (delta > 0 && (uint32_t) (delta >> 50) != 0)))) {
> +			delta = 100 * NSEC_PER_MSEC;
> +			guest_abs = guest_now + delta;
> +		}
> +	}
> +
>   	atomic_set(&vcpu->arch.xen.timer_pending, 0);
>   	vcpu->arch.xen.timer_expires = guest_abs;
>   
> -	if (delta_ns <= 0) {
> +	if (delta <= 0) {
>   		xen_timer_callback(&vcpu->arch.xen.timer);
>   	} else {
> -		ktime_t ktime_now = ktime_get();
>   		hrtimer_start(&vcpu->arch.xen.timer,
> -			      ktime_add_ns(ktime_now, delta_ns),
> +			      ktime_add_ns(kernel_now, delta),
>   			      HRTIMER_MODE_ABS_HARD);
>   	}
>   }
> @@ -923,8 +994,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
>   		/* Start the timer if the new value has a valid vector+expiry. */
>   		if (data->u.timer.port && data->u.timer.expires_ns)
>   			kvm_xen_start_timer(vcpu, data->u.timer.expires_ns,
> -					    data->u.timer.expires_ns -
> -					    get_kvmclock_ns(vcpu->kvm));
> +					    false);

There is no documented ordering requirement on setting 
KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or 
KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the 
vCPU's pvclock to be valid. Should actually starting the timer not be 
deferred until then? (Or simply add a check here and have the attribute 
setting fail if the pvclock is not valid).

   Paul

>   
>   		r = 0;
>   		break;
> @@ -1340,7 +1410,6 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
>   {
>   	struct vcpu_set_singleshot_timer oneshot;
>   	struct x86_exception e;
> -	s64 delta;
>   
>   	if (!kvm_xen_timer_enabled(vcpu))
>   		return false;
> @@ -1374,13 +1443,7 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
>   			return true;
>   		}
>   
> -		delta = oneshot.timeout_abs_ns - get_kvmclock_ns(vcpu->kvm);
> -		if ((oneshot.flags & VCPU_SSHOTTMR_future) && delta < 0) {
> -			*r = -ETIME;
> -			return true;
> -		}
> -
> -		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, delta);
> +		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, false);
>   		*r = 0;
>   		return true;
>   
> @@ -1404,25 +1467,7 @@ static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout,
>   		return false;
>   
>   	if (timeout) {
> -		uint64_t guest_now = get_kvmclock_ns(vcpu->kvm);
> -		int64_t delta = timeout - guest_now;
> -
> -		/* Xen has a 'Linux workaround' in do_set_timer_op() which
> -		 * checks for negative absolute timeout values (caused by
> -		 * integer overflow), and for values about 13 days in the
> -		 * future (2^50ns) which would be caused by jiffies
> -		 * overflow. For those cases, it sets the timeout 100ms in
> -		 * the future (not *too* soon, since if a guest really did
> -		 * set a long timeout on purpose we don't want to keep
> -		 * churning CPU time by waking it up).
> -		 */
> -		if (unlikely((int64_t)timeout < 0 ||
> -			     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
> -			delta = 100 * NSEC_PER_MSEC;
> -			timeout = guest_now + delta;
> -		}
> -
> -		kvm_xen_start_timer(vcpu, timeout, delta);
> +		kvm_xen_start_timer(vcpu, timeout, true);
>   	} else {
>   		kvm_xen_stop_timer(vcpu);
>   	}
David Woodhouse Oct. 31, 2023, 11:42 a.m. UTC | #2
On 31 October 2023 10:42:42 GMT, Paul Durrant <xadimgnik@gmail.com> wrote:
>There is no documented ordering requirement on setting KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the vCPU's pvclock to be valid. Should actually starting the timer not be deferred until then? (Or simply add a check here and have the attribute setting fail if the pvclock is not valid).


There are no such dependencies and I don't want there to be. That would be the *epitome* of what my "if it needs documenting, fix it first" mantra is intended to correct.

The fact that this broke on migration because the hv_clock isn't set up yet, as we saw in our overnight testing, is just a bug. In my tree I've fixed it thus:

index 63531173dad1..e3d2d63eef34 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -182,7 +182,7 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 	 * the absolute CLOCK_MONOTONIC time at which the timer should
 	 * fire.
 	 */
-	if (vcpu->kvm->arch.use_master_clock &&
+	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
 	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 		uint64_t host_tsc, guest_tsc;

@@ -206,9 +206,23 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,

 		/* Calculate the guest kvmclock as the guest would do it. */
 		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
-		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);
+		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
+						  guest_tsc);
 	} else {
-		/* Without CONSTANT_TSC, get_kvmclock_ns() is the only option */
+		/*
+		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
+		 *
+		 * Also if the guest PV clock hasn't been set up yet, as is
+		 * likely to be the case during migration when the vCPU has
+		 * not been run yet. It would be possible to calculate the
+		 * scaling factors properly in that case but there's not much
+		 * point in doing so. The get_kvmclock_ns() drift accumulates
+		 * over time, so it's OK to use it at startup. Besides, on
+		 * migration there's going to be a little bit of skew in the
+		 * precise moment at which timers fire anyway. Often they'll
+		 * be in the "past" by the time the VM is running again after
+		 * migration.
+		 */
 		guest_now = get_kvmclock_ns(vcpu->kvm);
 		kernel_now = ktime_get();
 	}
--
2.41.0

We *could* reset the timer when the vCPU starts to run and handles the KVM_REQ_CLOCK_UPDATE event, but I don't want to for two reasons.

Firstly, we just don't need that complexity. This approach is OK, as the newly-added comment says. And we do need to fix get_kvmclock_ns() anyway, so it should work fine. Most of this patch will still be useful as it uses a single TSC read and we *do* need to do that part even after all the kvmclock brokenness is fixed. But the complexity on KVM_REQ_CLOCK_UPDATE isn't needed in the long term.

Secondly, it's also the wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.
Paul Durrant Oct. 31, 2023, 12:06 p.m. UTC | #3
On 31/10/2023 11:42, David Woodhouse wrote:
> 
> 
> On 31 October 2023 10:42:42 GMT, Paul Durrant <xadimgnik@gmail.com> wrote:
>> There is no documented ordering requirement on setting KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the vCPU's pvclock to be valid. Should actually starting the timer not be deferred until then? (Or simply add a check here and have the attribute setting fail if the pvclock is not valid).
> 
> 
> There are no such dependencies and I don't want there to be. That would be the *epitome* of what my "if it needs documenting, fix it first" mantra is intended to correct.
> 
> The fact that this broke on migration because the hv_clock isn't set up yet, as we saw in our overnight testing, is just a bug. In my tree I've fixed it thus:
> 
> index 63531173dad1..e3d2d63eef34 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -182,7 +182,7 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>  	 * the absolute CLOCK_MONOTONIC time at which the timer should
>  	 * fire.
>  	 */
> -	if (vcpu->kvm->arch.use_master_clock &&
> +	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
>  	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
>  		uint64_t host_tsc, guest_tsc;
> 
> @@ -206,9 +206,23 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
> 
>  		/* Calculate the guest kvmclock as the guest would do it. */
>  		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> -		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);
> +		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
> +						  guest_tsc);
>  	} else {
> -		/* Without CONSTANT_TSC, get_kvmclock_ns() is the only option */
> +		/*
> +		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
> +		 *
> +		 * Also if the guest PV clock hasn't been set up yet, as is
> +		 * likely to be the case during migration when the vCPU has
> +		 * not been run yet. It would be possible to calculate the
> +		 * scaling factors properly in that case but there's not much
> +		 * point in doing so. The get_kvmclock_ns() drift accumulates
> +		 * over time, so it's OK to use it at startup. Besides, on
> +		 * migration there's going to be a little bit of skew in the
> +		 * precise moment at which timers fire anyway. Often they'll
> +		 * be in the "past" by the time the VM is running again after
> +		 * migration.
> +		 */
>  		guest_now = get_kvmclock_ns(vcpu->kvm);
>  		kernel_now = ktime_get();
>  	}
> --
> 2.41.0
> 
> We *could* reset the timer when the vCPU starts to run and handles the KVM_REQ_CLOCK_UPDATE event, but I don't want to for two reasons.
> 
> Firstly, we just don't need that complexity. This approach is OK, as the newly-added comment says. And we do need to fix get_kvmclock_ns() anyway, so it should work fine. Most of this patch will still be useful as it uses a single TSC read and we *do* need to do that part even after all the kvmclock brokenness is fixed. But the complexity on KVM_REQ_CLOCK_UPDATE isn't needed in the long term.
> 
> Secondly, it's also the wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.

Do we not want to do exactly that? If the master clock is changed, why 
would we not want to re-interpret the guest's idea of time? That update 
will be visible to the guest when it re-reads the PV clock source.

   Paul
David Woodhouse Oct. 31, 2023, 12:11 p.m. UTC | #4
On 31 October 2023 12:06:17 GMT, Paul Durrant <xadimgnik@gmail.com> wrote:
>On 31/10/2023 11:42, David Woodhouse wrote:
>> Secondly, it's also the wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.
>
>Do we not want to do exactly that? If the master clock is changed, why would we not want to re-interpret the guest's idea of time? That update will be visible to the guest when it re-reads the PV clock source.

Well no, because the guest set that timer *before* we yanked the clock from under it, and probably wants it interpreted in the time scale which was in force at the time it was set.

But more to the point, KVM shouldn't be doing that! We need to *fix* the kvmclock brokenness, not design further band-aids around it.

As I said, this patch stands even *after* we fix kvmclock, because it handles the timer delta calculation from a single TSC read.

But overengineering a timer reset on KVM_REQ_CLOCK_UPDATE would not.
Paul Durrant Oct. 31, 2023, 12:22 p.m. UTC | #5
On 31/10/2023 12:11, David Woodhouse wrote:
> 
> 
> On 31 October 2023 12:06:17 GMT, Paul Durrant <xadimgnik@gmail.com> wrote:
>> On 31/10/2023 11:42, David Woodhouse wrote:
>>> Secondly, it's also the wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.
>>
>> Do we not want to do exactly that? If the master clock is changed, why would we not want to re-interpret the guest's idea of time? That update will be visible to the guest when it re-reads the PV clock source.
> 
> Well no, because the guest set that timer *before* we yanked the clock from under it, and probably wants it interpreted in the time scale which was in force at the time it was set.
> 
> But more to the point, KVM shouldn't be doing that! We need to *fix* the kvmclock brokenness, not design further band-aids around it.

Ok, fair enough.

> 
> As I said, this patch stands even *after* we fix kvmclock, because it handles the timer delta calculation from a single TSC read.
> 
> But overengineering a timer reset on KVM_REQ_CLOCK_UPDATE would not.

I'm not sure what you intend to do to kvmclock, so not sure whether we'll 
still need the __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc) 
but this patch (with the extra check on validity of hv_clock) does fix 
the drift so...

Reviewed-by: Paul Durrant <paul@xen.org>
David Woodhouse Oct. 31, 2023, 4:34 p.m. UTC | #6
On Tue, 2023-10-31 at 12:22 +0000, Paul Durrant wrote:
> 
> > 
> > As I said, this patch stands even *after* we fix kvmclock, because
> > it handles the timer delta calculation from a single TSC read.
> > 
> > But overengineering a timer reset on KVM_REQ_CLOCK_UPDATE would not.
> 
> I'm not sure what you intend to do to kvmclock, so not sure whether we'll 
> still need the __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc)
> but this patch (with the extra check on validity of hv_clock) does fix 
> the drift so...
> 
> Reviewed-by: Paul Durrant <paul@xen.org>

Ta. And no, I'm not quite sure what I'm going to do with kvmclock for
the general case yet. The more I look at it, the more I realise how
broken it is.

Last week I thought it was just about the way KVM 'jumps' the kvmclock
and yanks it back to the host's CLOCK_MONOTONIC_RAW, but I thought KVM
at *least* managed to do it right between those times. But no, this
patch is addressing the fact that even *without* those clock jumps, KVM
doesn't manage to calculate the guest clock the same way that it tells
the guest to... and thus doesn't get the same results :)

I think it involves get_kvmclock_ns() using the frequency given in the
KVM-wide (not per-vCPU) KVM_SET_TSC_KHZ ioctl, and scaling via that to
get the guest clock. That should match, without having to have a
specific vCPU's hv_clock to play with. And maybe we can also have a
get_kvmclock_ns_at() which takes a host TSC value, and the timer code
from this patch can use that instead of using __pvclock_read_cycles()
directly.

That's probably the easy part. Fixing the 'resync' to
CLOCK_MONOTONIC_RAW, while keeping things working for the now-
considered-pathological !CONSTANT_TSC case, will be slightly more fun.
As well as suspend etc. in the CONSTANT_TSC case, of course.

And replacing that stupid KVM_CLOCK_REALTIME with something that uses
CLOCK_TAI. Or maybe just making it export the tai_offset at the same
moment?
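
For illustration, the calculation this patch makes from a single host TSC
read could be wrapped in a tiny helper of roughly this shape, using only
calls that already appear in the patch (kvm_read_l1_tsc() and
__pvclock_read_cycles()). The name is made up; the get_kvmclock_ns_at()
idea above would differ in using the KVM-wide scaling information from
KVM_SET_TSC_KHZ rather than a specific vCPU's hv_clock:

/* Hypothetical helper, not part of the posted series. */
static u64 kvm_xen_clock_ns_at(struct kvm_vcpu *vcpu, u64 host_tsc)
{
	/* Host TSC -> guest TSC, as the hardware scaling would do it... */
	u64 guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);

	/* ...then guest TSC -> nanoseconds with the guest's own pvclock info. */
	return __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);
}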
David Woodhouse Oct. 31, 2023, 7:45 p.m. UTC | #7
On Mon, 2023-10-30 at 15:50 +0000, David Woodhouse wrote:
> 
> +static int do_monotonic(s64 *t, u64 *tsc_timestamp)
> +{
> +	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> +	unsigned long seq;
> +	int mode;
> +	u64 ns;
> +
> +	do {
> +		seq = read_seqcount_begin(&gtod->seq);
> +		ns = gtod->clock.base_cycles;
> +		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
> +		ns >>= gtod->clock.shift;
> +		ns += ktime_to_ns(ktime_add(gtod->clock.offset,
> gtod->offs_boot));
> +	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
> +	*t = ns;
> +
> +	return mode;
> +}
> +

Hrm, that's basically cargo-culted from do_monotonic_raw() immediately
above it. Should it be adding gtod->offs_boot?

Empirically the answer would appear to be 'no'. When gtod->offs_boot is
non-zero, I see kvm_get_monotonic_and_clockread() returning values
which are precisely that far in advance of what ktime_get() reports.
David Woodhouse Oct. 31, 2023, 10:38 p.m. UTC | #8
On Tue, 2023-10-31 at 19:45 +0000, David Woodhouse wrote:
> On Mon, 2023-10-30 at 15:50 +0000, David Woodhouse wrote:
> > 
> > +static int do_monotonic(s64 *t, u64 *tsc_timestamp)
> > +{
> > +	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> > +	unsigned long seq;
> > +	int mode;
> > +	u64 ns;
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&gtod->seq);
> > +		ns = gtod->clock.base_cycles;
> > +		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
> > +		ns >>= gtod->clock.shift;
> > +		ns += ktime_to_ns(ktime_add(gtod->clock.offset, gtod->offs_boot));
> > +	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
> > +	*t = ns;
> > +
> > +	return mode;
> > +}
> > +
> 
> Hrm, that's basically cargo-culted from do_monotonic_raw() immediately
> above it. Should it be adding gtod->offs_boot?
> 
> Empirically the answer would appear to be 'no'. When gtod->offs_boot is
> non-zero, I see kvm_get_monotonic_and_clockread() returning values
> which are precisely that far in advance of what ktime_get() reports.


.... because the do_monotonic_raw() function, despite the simple
clarity of its name... doesn't actually return the CLOCK_MONOTONIC_RAW
time. Of course it doesn't. Why would a function with that name return
the MONOTONIC_RAW clock?

It actually returns the same as get_kvmclock_base_ns(), which is

	/* Count up from boot time, but with the frequency of the raw clock.  */
	return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));

I feel that Grey's Law is starting to apply to this clock stuff. This
is starting to be indistinguishable from malice ;)
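
If that's the case, the CLOCK_MONOTONIC variant presumably shouldn't add
offs_boot at all. A sketch of what do_monotonic() would then look like —
the same loop as posted, minus the offs_boot addition; not a patch that was
sent in this thread:

static int do_monotonic(s64 *t, u64 *tsc_timestamp)
{
	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
	unsigned long seq;
	int mode;
	u64 ns;

	do {
		seq = read_seqcount_begin(&gtod->seq);
		ns = gtod->clock.base_cycles;
		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
		ns >>= gtod->clock.shift;
		/* No offs_boot: adding it gives boot time, not CLOCK_MONOTONIC */
		ns += ktime_to_ns(gtod->clock.offset);
	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
	*t = ns;

	return mode;
}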

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41cce5031126..aeede83d65dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2863,6 +2863,25 @@  static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp)
 	return mode;
 }
 
+static int do_monotonic(s64 *t, u64 *tsc_timestamp)
+{
+	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+	unsigned long seq;
+	int mode;
+	u64 ns;
+
+	do {
+		seq = read_seqcount_begin(&gtod->seq);
+		ns = gtod->clock.base_cycles;
+		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
+		ns >>= gtod->clock.shift;
+		ns += ktime_to_ns(ktime_add(gtod->clock.offset, gtod->offs_boot));
+	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
+	*t = ns;
+
+	return mode;
+}
+
 static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
@@ -2895,6 +2914,17 @@  static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
 						      tsc_timestamp));
 }
 
+/* returns true if host is using TSC based clocksource */
+bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
+{
+	/* checked again under seqlock below */
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
+		return false;
+
+	return gtod_is_based_on_tsc(do_monotonic(kernel_ns,
+						 tsc_timestamp));
+}
+
 /* returns true if host is using TSC based clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec64 *ts,
 					   u64 *tsc_timestamp)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1e7be1f6ab29..c08c6f729965 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -293,6 +293,7 @@  static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
+bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
 
 int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
 	gva_t addr, void *val, unsigned int bytes,
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 0ea6016ad132..00a1e924a717 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -24,6 +24,7 @@ 
 #include <xen/interface/sched.h>
 
 #include <asm/xen/cpuid.h>
+#include <asm/pvclock.h>
 
 #include "cpuid.h"
 #include "trace.h"
@@ -144,17 +145,87 @@  static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, s64 delta_ns)
+static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
+				bool linux_wa)
 {
+	uint64_t guest_now;
+	int64_t kernel_now, delta;
+
+	 /*
+	 * The guest provides the requested timeout in absolute nanoseconds
+	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
+	 * the pvclock information provided by KVM.
+	 *
+	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
+	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
+	 * difference won't matter much as there is no cumulative effect.
+	 *
+	 * Calculate the time for some arbitrary point in time around "now"
+	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
+	 * delta between the kvmclock "now" value and the guest's requested
+	 * timeout, apply the "Linux workaround" described below, and add
+	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
+	 * the absolute CLOCK_MONOTONIC time at which the timer should
+	 * fire.
+	 */
+	if (vcpu->kvm->arch.use_master_clock &&
+	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+		uint64_t host_tsc, guest_tsc;
+
+		if (!IS_ENABLED(CONFIG_64BIT) ||
+		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
+			/*
+			 * Don't fall back to get_kvmclock_ns() because it's
+			 * broken; it has a systemic error in its results
+			 * because it scales directly from host TSC to
+			 * nanoseconds, and doesn't scale first to guest TSC
+			 * and *then* to nanoseconds as the guest does.
+			 *
+			 * There is a small error introduced here because time
+			 * continues to elapse between the ktime_get() and the
+			 * subsequent rdtsc(). But not the systemic drift due
+			 * to get_kvmclock_ns().
+			 */
+			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
+			host_tsc = rdtsc();
+		}
+
+		/* Calculate the guest kvmclock as the guest would do it. */
+		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
+		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc);
+	} else {
+		/* Without CONSTANT_TSC, get_kvmclock_ns() is the only option */
+		guest_now = get_kvmclock_ns(vcpu->kvm);
+		kernel_now = ktime_get();
+	}
+
+	delta = guest_abs - guest_now;
+
+	/* Xen has a 'Linux workaround' in do_set_timer_op() which
+	 * checks for negative absolute timeout values (caused by
+	 * integer overflow), and for values about 13 days in the
+	 * future (2^50ns) which would be caused by jiffies
+	 * overflow. For those cases, it sets the timeout 100ms in
+	 * the future (not *too* soon, since if a guest really did
+	 * set a long timeout on purpose we don't want to keep
+	 * churning CPU time by waking it up).
+	 */
+	if (linux_wa) {
+		if ((unlikely((int64_t)guest_abs < 0 ||
+			      (delta > 0 && (uint32_t) (delta >> 50) != 0)))) {
+			delta = 100 * NSEC_PER_MSEC;
+			guest_abs = guest_now + delta;
+		}
+	}
+
 	atomic_set(&vcpu->arch.xen.timer_pending, 0);
 	vcpu->arch.xen.timer_expires = guest_abs;
 
-	if (delta_ns <= 0) {
+	if (delta <= 0) {
 		xen_timer_callback(&vcpu->arch.xen.timer);
 	} else {
-		ktime_t ktime_now = ktime_get();
 		hrtimer_start(&vcpu->arch.xen.timer,
-			      ktime_add_ns(ktime_now, delta_ns),
+			      ktime_add_ns(kernel_now, delta),
 			      HRTIMER_MODE_ABS_HARD);
 	}
 }
@@ -923,8 +994,7 @@  int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		/* Start the timer if the new value has a valid vector+expiry. */
 		if (data->u.timer.port && data->u.timer.expires_ns)
 			kvm_xen_start_timer(vcpu, data->u.timer.expires_ns,
-					    data->u.timer.expires_ns -
-					    get_kvmclock_ns(vcpu->kvm));
+					    false);
 
 		r = 0;
 		break;
@@ -1340,7 +1410,6 @@  static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
 {
 	struct vcpu_set_singleshot_timer oneshot;
 	struct x86_exception e;
-	s64 delta;
 
 	if (!kvm_xen_timer_enabled(vcpu))
 		return false;
@@ -1374,13 +1443,7 @@  static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd,
 			return true;
 		}
 
-		delta = oneshot.timeout_abs_ns - get_kvmclock_ns(vcpu->kvm);
-		if ((oneshot.flags & VCPU_SSHOTTMR_future) && delta < 0) {
-			*r = -ETIME;
-			return true;
-		}
-
-		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, delta);
+		kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, false);
 		*r = 0;
 		return true;
 
@@ -1404,25 +1467,7 @@  static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout,
 		return false;
 
 	if (timeout) {
-		uint64_t guest_now = get_kvmclock_ns(vcpu->kvm);
-		int64_t delta = timeout - guest_now;
-
-		/* Xen has a 'Linux workaround' in do_set_timer_op() which
-		 * checks for negative absolute timeout values (caused by
-		 * integer overflow), and for values about 13 days in the
-		 * future (2^50ns) which would be caused by jiffies
-		 * overflow. For those cases, it sets the timeout 100ms in
-		 * the future (not *too* soon, since if a guest really did
-		 * set a long timeout on purpose we don't want to keep
-		 * churning CPU time by waking it up).
-		 */
-		if (unlikely((int64_t)timeout < 0 ||
-			     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
-			delta = 100 * NSEC_PER_MSEC;
-			timeout = guest_now + delta;
-		}
-
-		kvm_xen_start_timer(vcpu, timeout, delta);
+		kvm_xen_start_timer(vcpu, timeout, true);
 	} else {
 		kvm_xen_stop_timer(vcpu);
 	}