From patchwork Mon Apr 8 22:07:03 2024
X-Patchwork-Submitter: Jack Allister
X-Patchwork-Id: 13621611
From: Jack Allister
To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H.
Peter Anvin"
CC: David Woodhouse, Paul Durrant, "Jack Allister"
Subject: [PATCH 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for KVM clock drift fixup
Date: Mon, 8 Apr 2024 22:07:03 +0000
Message-ID: <20240408220705.7637-2-jalliste@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240408220705.7637-1-jalliste@amazon.com>
References: <20240408220705.7637-1-jalliste@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

There is a potential for drift between the TSC and a KVM/PV clock when the
guest TSC is scaled (as seen previously in [1], which fixed drift between
timers over the lifetime of a VM). However, there is another factor which
will cause a drift.

In a situation such as a kexec/live-update of the kernel or a live-migration
of a VM, the PV clock information is recalculated by KVM
(KVM_REQ_MASTERCLOCK_UPDATE). This update samples a new system_time &
tsc_timestamp to be used in the structure.

For example, when a guest is running with a TSC frequency of 1.5GHz but the
host frequency is 3.0GHz, a delta of ~3500ns is observed between the TSC and
the KVM/PV clock upon an update of the PV time information. There is no
reason why a fixup creating an accuracy of ±1ns cannot be achieved.

Additional interfaces are added to retrieve & fix up the PV time information
when a VMM believes it is appropriate (deserialization after
live-update/migration). KVM_GET_CLOCK_GUEST can be used by the VMM to
retrieve the currently used PV time information. When the VMM believes a
drift may occur, it can then instruct KVM to perform a correction via the
setter KVM_SET_CLOCK_GUEST.

The KVM_SET_CLOCK_GUEST ioctl works under the following premise. The host
TSC & kernel timestamp are sampled at a singular point in time. Using the
already known scaling/offset for L1, the guest TSC is then derived from this
information.
From here two PV time information structures are used. The first is the
original time information structure from before whatever caused the PV clock
re-calculation (live-update/migration). The second is built from the singular
point in time sampled just prior. A KVM/PV clock value is then calculated for
each of the two structures using the same guest TSC. The delta between the
two calculated PV times is then used as a correction offset added onto the
kvmclock_offset for the VM.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=451a707813ae

Suggested-by: David Woodhouse
Signed-off-by: Jack Allister
CC: Paul Durrant
---
 Documentation/virt/kvm/api.rst | 43 +++++++++++++++++
 arch/x86/kvm/x86.c             | 87 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  3 ++
 3 files changed, 133 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..5f74d8ac1002 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,49 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_GET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (out)
+:Returns: 0 on success, <0 on error
+
+Retrieves the current time information structure used for KVM/PV clocks.
+On x86 a PV clock is derived from the current TSC and is then scaled based
+upon a specified multiplier and shift. The result of this is then added
+to a system time.
+
+The guest needs a way to determine the system time, multiplier and shift. This
+can be done in multiple ways; for KVM guests this can be via an MSR write to
+MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW which defines the guest physical
+address where KVM shall put the structure. On Xen guests this can be found in
+the Xen vcpu_info.
+
+This structure is useful information for a VMM to also know when taking into
+account potential timer drift on live-update/migration.
+
+4.144 KVM_SET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (in)
+:Returns: 0 on success, <0 on error
+
+Triggers KVM to perform a correction of the KVM/PV clock structure based upon a
+known prior PV clock structure (see KVM_GET_CLOCK_GUEST).
+
+If a VM is utilizing TSC scaling there is a potential for a drift between the
+KVM/PV clock and the TSC itself. This is due to the loss of precision when
+performing a multiply and shift rather than divide for the TSC.
+
+To perform the correction a delta is calculated between the original time info
+(which is assumed correct) and the current time info at a singular point in
+time X. The KVM clock offset is then adjusted by this delta.
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..5d2e10cd1c30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6988,6 +6988,87 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	return 0;
 }
 
+static struct kvm_vcpu *kvm_get_bsp_vcpu(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; i++) {
+		vcpu = kvm_get_vcpu_by_id(kvm, i);
+		if (!vcpu)
+			continue;
+
+		if (kvm_vcpu_is_reset_bsp(vcpu))
+			break;
+	}
+
+	return vcpu;
+}
+
+static int kvm_vm_ioctl_get_clock_guest(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kvm_get_bsp_vcpu(kvm);
+	if (!vcpu)
+		return -EINVAL;
+
+	if (!vcpu->arch.hv_clock.tsc_timestamp || !vcpu->arch.hv_clock.system_time)
+		return -EIO;
+
+	if (copy_to_user(argp, &vcpu->arch.hv_clock, sizeof(vcpu->arch.hv_clock)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock_guest(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_vcpu *vcpu;
+	struct pvclock_vcpu_time_info orig_pvti;
+	struct pvclock_vcpu_time_info dummy_pvti;
+	int64_t kernel_ns;
+	uint64_t host_tsc, guest_tsc;
+	uint64_t clock_orig, clock_dummy;
+	int64_t correction;
+	unsigned long i;
+
+	vcpu = kvm_get_bsp_vcpu(kvm);
+	if (!vcpu)
+		return -EINVAL;
+
+	if (copy_from_user(&orig_pvti, argp, sizeof(orig_pvti)))
+		return -EFAULT;
+
+	/*
+	 * Sample the kernel time and host TSC at a singular point.
+	 * We then calculate the guest TSC using this exact point in time.
+	 * From here we can then determine the delta using the
+	 * PV time info requested from the user and what we currently have
+	 * using the fixed point in time. This delta is then used as a
+	 * correction factor to fixup the potential drift.
+	 */
+	if (!kvm_get_time_and_clockread(&kernel_ns, &host_tsc))
+		return -EFAULT;
+
+	guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
+
+	dummy_pvti = orig_pvti;
+	dummy_pvti.tsc_timestamp = guest_tsc;
+	dummy_pvti.system_time = kernel_ns + kvm->arch.kvmclock_offset;
+
+	clock_orig = __pvclock_read_cycles(&orig_pvti, guest_tsc);
+	clock_dummy = __pvclock_read_cycles(&dummy_pvti, guest_tsc);
+
+	correction = clock_orig - clock_dummy;
+	kvm->arch.kvmclock_offset += correction;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+	return 0;
+}
+
 int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
@@ -7246,6 +7327,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 	case KVM_GET_CLOCK:
 		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
+	case KVM_SET_CLOCK_GUEST:
+		r = kvm_vm_ioctl_set_clock_guest(kvm, argp);
+		break;
+	case KVM_GET_CLOCK_GUEST:
+		r = kvm_vm_ioctl_get_clock_guest(kvm, argp);
+		break;
 	case KVM_SET_TSC_KHZ: {
 		u32 user_tsc_khz;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..0d306311e4d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1548,4 +1548,7 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_SET_CLOCK_GUEST _IOW(KVMIO, 0xd5, struct pvclock_vcpu_time_info)
+#define KVM_GET_CLOCK_GUEST _IOR(KVMIO, 0xd6, struct pvclock_vcpu_time_info)
+
 #endif /* __LINUX_KVM_H */

From patchwork Mon Apr 8 22:07:04 2024
X-Patchwork-Submitter: Jack Allister
X-Patchwork-Id: 13621612
From: Jack Allister
To: Paolo
Bonzini, Jonathan Corbet, Sean Christopherson, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Shuah Khan
CC: David Woodhouse, Paul Durrant, "Jack Allister"
Subject: [PATCH 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer drift correction
Date: Mon, 8 Apr 2024 22:07:04 +0000
Message-ID: <20240408220705.7637-3-jalliste@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240408220705.7637-1-jalliste@amazon.com>
References: <20240408220705.7637-1-jalliste@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

This test proves that there is an inherent KVM/PV clock drift away from the
guest TSC when KVM decides to update the PV time information structure due
to a KVM_REQ_MASTERCLOCK_UPDATE. This drift is exacerbated when a guest is
using TSC scaling and running at a different frequency to the host TSC [1].
It also proves that the KVM_[GS]ET_CLOCK_GUEST API works to mitigate the
drift from the TSC to within ±1ns.

The test simply records the PVTI (PV time information) at three stages: at
time of guest creation, after KVM has updated its mapped PVTI structure, and
once the correction has taken place.

A singular point in time is then recorded via the guest TSC and is used to
calculate a PV clock value using each of the 3 PVTI structures.

As seen below, a drift of ~3500ns is observed if no correction has taken
place after KVM has updated the PVTI via master clock update. However, after
the correction a delta of at most 1ns can be seen.

* selftests: kvm: pvclock_test
* scaling tsc from 2999999KHz to 1499999KHz
* before=5038374946 uncorrected=5038371437 corrected=5038374945
* delta_uncorrected=3509 delta_corrected=1

Clocksource check code has been borrowed from [2].
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=451a707813ae
[2]: https://lore.kernel.org/kvm/20240106083346.29180-1-dongli.zhang@oracle.com/

Signed-off-by: Jack Allister
CC: David Woodhouse
CC: Paul Durrant
---
 tools/testing/selftests/kvm/Makefile               |   1 +
 .../selftests/kvm/x86_64/pvclock_test.c            | 223 ++++++++++++++++++
 2 files changed, 224 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/pvclock_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 741c7dc16afc..02ee1205bbed 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
 TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
+TEST_GEN_PROGS_x86_64 += x86_64/pvclock_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
diff --git a/tools/testing/selftests/kvm/x86_64/pvclock_test.c b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
new file mode 100644
index 000000000000..172ef4d19c60
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2024, Amazon.com, Inc. or its affiliates.
+ *
+ * Tests for pvclock API
+ * KVM_SET_CLOCK_GUEST/KVM_GET_CLOCK_GUEST
+ */
+#include
+#include
+#include
+#include
+#include
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+enum {
+	STAGE_FIRST_BOOT,
+	STAGE_UNCORRECTED,
+	STAGE_CORRECTED,
+	NUM_STAGES
+};
+
+#define KVMCLOCK_GPA 0xc0000000ull
+#define KVMCLOCK_SIZE sizeof(struct pvclock_vcpu_time_info)
+
+static void trigger_pvti_update(vm_paddr_t pvti_pa)
+{
+	/*
+	 * We need a way to trigger KVM to update the fields
+	 * in the PV time info. The easiest way to do this is
+	 * to temporarily switch to the old KVM system time
+	 * method and then switch back to the new one.
+	 */
+	wrmsr(MSR_KVM_SYSTEM_TIME, pvti_pa | KVM_MSR_ENABLED);
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
+}
+
+static void guest_code(vm_paddr_t pvti_pa)
+{
+	struct pvclock_vcpu_time_info *pvti_va =
+		(struct pvclock_vcpu_time_info *)pvti_pa;
+
+	struct pvclock_vcpu_time_info pvti_boot;
+	struct pvclock_vcpu_time_info pvti_uncorrected;
+	struct pvclock_vcpu_time_info pvti_corrected;
+	uint64_t cycles_boot;
+	uint64_t cycles_uncorrected;
+	uint64_t cycles_corrected;
+	uint64_t tsc_guest;
+
+	/*
+	 * Setup the KVMCLOCK in the guest & store the original
+	 * PV time structure that is used.
+	 */
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
+	pvti_boot = *pvti_va;
+	GUEST_SYNC(STAGE_FIRST_BOOT);
+
+	/*
+	 * Trigger an update of the PVTI, if we calculate
+	 * the KVM clock using this structure we'll see
+	 * a drift from the TSC.
+	 */
+	trigger_pvti_update(pvti_pa);
+	pvti_uncorrected = *pvti_va;
+	GUEST_SYNC(STAGE_UNCORRECTED);
+
+	/*
+	 * The test should have triggered the correction by this
+	 * point in time. We have a copy of each of the PVTI structs
+	 * at each stage now.
+	 *
+	 * Let's sample the timestamp at a SINGLE point in time and
+	 * then calculate what the KVM clock would be using the PVTI
+	 * from each stage.
+	 *
+	 * Then return each of these values to the tester.
+	 */
+	pvti_corrected = *pvti_va;
+	tsc_guest = rdtsc();
+
+	cycles_boot = __pvclock_read_cycles(&pvti_boot, tsc_guest);
+	cycles_uncorrected = __pvclock_read_cycles(&pvti_uncorrected, tsc_guest);
+	cycles_corrected = __pvclock_read_cycles(&pvti_corrected, tsc_guest);
+
+	GUEST_SYNC_ARGS(STAGE_CORRECTED, cycles_boot, cycles_uncorrected,
+			cycles_corrected, 0);
+}
+
+static void run_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+	uint64_t ucall_reason;
+	struct pvclock_vcpu_time_info pvti_before;
+	uint64_t before, uncorrected, corrected;
+	int64_t delta_uncorrected, delta_corrected;
+
+	/* Loop through each stage of the test. */
+	while (true) {
+
+		/* Start/restart the running vCPU code. */
+		vcpu_run(vcpu);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+		/* Retrieve and verify our stage. */
+		ucall_reason = get_ucall(vcpu, &uc);
+		TEST_ASSERT(ucall_reason == UCALL_SYNC,
+			    "Unhandled ucall reason=%lu",
+			    ucall_reason);
+
+		/* Run host specific code relating to stage. */
+		switch (uc.args[1]) {
+		case STAGE_FIRST_BOOT:
+			/* Store the KVM clock values before an update. */
+			vm_ioctl(vm, KVM_GET_CLOCK_GUEST, &pvti_before);
+
+			/* Sleep for a set amount of time to induce drift. */
+			sleep(5);
+			break;
+
+		case STAGE_UNCORRECTED:
+			/* Restore the KVM clock values. */
+			vm_ioctl(vm, KVM_SET_CLOCK_GUEST, &pvti_before);
+			break;
+
+		case STAGE_CORRECTED:
+			/* Query the clock information and verify delta. */
+			before = uc.args[2];
+			uncorrected = uc.args[3];
+			corrected = uc.args[4];
+
+			delta_uncorrected = before - uncorrected;
+			delta_corrected = before - corrected;
+
+			pr_info("before=%lu uncorrected=%lu corrected=%lu\n",
+				before, uncorrected, corrected);
+
+			pr_info("delta_uncorrected=%ld delta_corrected=%ld\n",
+				delta_uncorrected, delta_corrected);
+
+			TEST_ASSERT((delta_corrected <= 1) && (delta_corrected >= -1),
+				    "larger than expected delta detected = %ld", delta_corrected);
+			return;
+		}
+	}
+}
+
+#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"
+
+static void check_clocksource(void)
+{
+	char *clk_name;
+	struct stat st;
+	FILE *fp;
+
+	fp = fopen(CLOCKSOURCE_PATH, "r");
+	if (!fp) {
+		pr_info("failed to open clocksource file: %d; assuming TSC.\n",
+			errno);
+		return;
+	}
+
+	if (fstat(fileno(fp), &st)) {
+		pr_info("failed to stat clocksource file: %d; assuming TSC.\n",
+			errno);
+		goto out;
+	}
+
+	clk_name = malloc(st.st_size);
+	TEST_ASSERT(clk_name, "failed to allocate buffer to read file\n");
+
+	if (!fgets(clk_name, st.st_size, fp)) {
+		pr_info("failed to read clocksource file: %d; assuming TSC.\n",
+			ferror(fp));
+		goto out;
+	}
+
+	TEST_ASSERT(!strncmp(clk_name, "tsc\n", st.st_size),
+		    "clocksource not supported: %s", clk_name);
+out:
+	fclose(fp);
+}
+
+static void configure_pvclock(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	unsigned int gpages;
+
+	gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, KVMCLOCK_SIZE);
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    KVMCLOCK_GPA, 1, gpages, 0);
+	virt_map(vm, KVMCLOCK_GPA, KVMCLOCK_GPA, gpages);
+
+	vcpu_args_set(vcpu, 1, KVMCLOCK_GPA);
+}
+
+static void configure_scaled_tsc(struct kvm_vcpu *vcpu)
+{
+	uint64_t tsc_khz;
+
+	tsc_khz = __vcpu_ioctl(vcpu, KVM_GET_TSC_KHZ, NULL);
+	pr_info("scaling tsc from %ldKHz to %ldKHz\n", tsc_khz, tsc_khz / 2);
+	tsc_khz /= 2;
+	vcpu_ioctl(vcpu, KVM_SET_TSC_KHZ, (void *)tsc_khz);
+}
+
+int main(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+
+	check_clocksource();
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	configure_pvclock(vm, vcpu);
+	configure_scaled_tsc(vcpu);
+
+	run_test(vm, vcpu);
+
+	return 0;
+}