From patchwork Thu Sep 16 18:15:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12499895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32EF1C433EF for ; Thu, 16 Sep 2021 18:21:09 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F3B52611C4 for ; Thu, 16 Sep 2021 18:21:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org F3B52611C4 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=o3iHDVk0XyJUd62VV0Q6V5IOy9OQbvfsYyinQ5n2BiY=; b=zenI7X6Wt7BqpBSuOqlKCLEA97 ySYSd6ym/lfjlztoMRNbhjDkH7FQRJG3QNL44b0KVk6x8ZSXokktOF/YmqNM3SrZ7YL3BUz7LO7ij tH7ZggoPSOvcS4XMiPOluQiE2x3GVp8mXNsIhCQHSRwu2cul2+PzUn7Vr7eaAT9nJIwQBogn5MfES 
vfTmJQODmQigypFLNh0YgsbrndJ6EzQGcSrVZ/eqtoD07p09gkOCsLrE291Ruq/8R6t42tvRkAjdR 1YRPRNftKflHFyUaiRUtn/HjxGsg9KJTQic9JndIPydOeCAXDZ+V77QSOpjPD/rQ2PAqbDsHcIHBg gpJyjbeQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mQvyz-00C3o1-BW; Thu, 16 Sep 2021 18:19:38 +0000 Received: from mail-qk1-x74a.google.com ([2607:f8b0:4864:20::74a]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mQvvE-00C1wc-Sn for linux-arm-kernel@lists.infradead.org; Thu, 16 Sep 2021 18:15:46 +0000 Received: by mail-qk1-x74a.google.com with SMTP id q13-20020a05620a038d00b003d38f784161so44874832qkm.8 for ; Thu, 16 Sep 2021 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=EqfJ4nsZ26w2OY+w0/ZRQGVHhFarrWqQ77C9H9qe4wk=; b=jCgTfuzY3Xi/FfyO0lx2uvq5fA0lVuB+u2I9budg5JdSmdUW6fkucj/6Iswn5+9u81 S5Hii45+Igo8DnjUigtICHwVREClRsu6Tt94jvAfHqx/W4EdvzOWJfxyBJRSO/PtALZt 511OJi7qp+hjaP2T2gNjcMes9QckyDHsJh+2BGWhGKw0M1JamoY1Gp59mcwwonYWw3jo YzlI/iWllIFsVn66gc/48kc2LhIAYzj4gD09j2aRz8ifMS/+7cKghKSzkamlf41sDmLH xwJx2x4VOpX8THrVq3tJaaLDKzBQd47kw3OExONFC73ii1Dcr8WZjUXHO6zWaFv0mYYj 2w9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=EqfJ4nsZ26w2OY+w0/ZRQGVHhFarrWqQ77C9H9qe4wk=; b=6GjP9XEFEgI50s48srycMayj+UP8yBP/lFdxM5C67Btusn5N9i1iSTitTV07n15wap 2WPsgxxuZhAxbJbFFXKSQjgRX6elCaa1wKfL0i6l6BQih3KLNRSjhc45kii/VHqrV6ir erGJhq40XIzXAGWpYt+YaZWmgWcXkdbEY4b8XrMGJRrZN6ZwDq1HpPFdinU87WD6PjZO 3SvBZPHqctGiHs1txJyUjx5qV5FwSX7tEGf42GkPFcb5KXXo2wxQQMz51fSGO80GbyIP T0omdIRz94UD5FBWLnMUNjhegeu/gIy3+kWJPnje1R4BQMU7W7fo+ZEUkEz+3VjQegWS 5AeA== X-Gm-Message-State: AOAM530390cwnNtivPmPnE7dyv7H41Di4jkVzS71qpGt0jHtifuWcSPY 18bwxlQjBTiUbWAfGa6LreaSd3ortbQ= 
X-Google-Smtp-Source: ABdhPJzJVShjUP9eV6I0zKA9rrR+wmqb3KVny8yYAxYLYNtJU8sisB/xFAD1ErUlrDRx7ismNi4ONuO4ygA=
X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404])
 (user=oupton job=sendgmr) by 2002:a05:6214:431:: with SMTP id
 a17mr6549014qvy.48.1631816142691; Thu, 16 Sep 2021 11:15:42 -0700 (PDT)
Date: Thu, 16 Sep 2021 18:15:32 +0000
In-Reply-To: <20210916181538.968978-1-oupton@google.com>
Message-Id: <20210916181538.968978-2-oupton@google.com>
Mime-Version: 1.0
References: <20210916181538.968978-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH v8 1/7] kvm: x86: abstract locking around pvclock_update_vm_gtod_copy
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20210916_111544_999147_CDECBC59
X-CRM114-Status: GOOD ( 14.28 )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

From: Paolo Bonzini

Updates to the kvmclock parameters need to do a complicated dance of
KVM_REQ_MCLOCK_INPROGRESS and KVM_REQ_CLOCK_UPDATE in addition to taking
pvclock_gtod_sync_lock. Place that logic in two functions that can be
called from all three paths: master clock update, KVM_SET_CLOCK, and
Hyper-V reenlightenment.

Signed-off-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/x86.c              | 62 +++++++++++++++------------------
 2 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8f48a7ec577..be6805fc0260 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1866,7 +1866,6 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
-void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 28ef14155726..1082b48418c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2755,35 +2755,42 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 #endif
 }
 
-void kvm_make_mclock_inprogress_request(struct kvm *kvm)
+static void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
-static void kvm_gen_update_masterclock(struct kvm *kvm)
+static void kvm_start_pvclock_update(struct kvm *kvm)
 {
-#ifdef CONFIG_X86_64
-	int i;
-	struct kvm_vcpu *vcpu;
 	struct kvm_arch *ka = &kvm->arch;
-	unsigned long flags;
-
-	kvm_hv_invalidate_tsc_page(kvm);
 
 	kvm_make_mclock_inprogress_request(kvm);
 
 	/* no guest entries from this point */
-	spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
-	pvclock_update_vm_gtod_copy(kvm);
-	spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
+	spin_lock_irq(&ka->pvclock_gtod_sync_lock);
+}
 
+static void kvm_end_pvclock_update(struct kvm *kvm)
+{
+	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	spin_unlock_irq(&ka->pvclock_gtod_sync_lock);
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
 	/* guest entries allowed */
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
-#endif
+}
+
+static void kvm_update_masterclock(struct kvm *kvm)
+{
+	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_start_pvclock_update(kvm);
+	pvclock_update_vm_gtod_copy(kvm);
+	kvm_end_pvclock_update(kvm);
 }
 
 u64 get_kvmclock_ns(struct kvm *kvm)
@@ -6079,12 +6086,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		/*
-		 * TODO: userspace has to take care of races with VCPU_RUN, so
-		 * kvm_gen_update_masterclock() can be cut down to locked
-		 * pvclock_update_vm_gtod_copy().
-		 */
-		kvm_gen_update_masterclock(kvm);
+
+		kvm_hv_invalidate_tsc_page(kvm);
+		kvm_start_pvclock_update(kvm);
+		pvclock_update_vm_gtod_copy(kvm);
 
 		/*
 		 * This pairs with kvm_guest_time_update(): when masterclock is
@@ -6093,15 +6098,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		 * is slightly ahead) here we risk going negative on unsigned
 		 * 'system_time' when 'user_ns.clock' is very small.
 		 */
-		spin_lock_irq(&ka->pvclock_gtod_sync_lock);
 		if (kvm->arch.use_master_clock)
 			now_ns = ka->master_kernel_ns;
 		else
 			now_ns = get_kvmclock_base_ns();
 		ka->kvmclock_offset = user_ns.clock - now_ns;
-		spin_unlock_irq(&ka->pvclock_gtod_sync_lock);
-
-		kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE);
+		kvm_end_pvclock_update(kvm);
 		break;
 	}
 	case KVM_GET_CLOCK: {
@@ -8107,14 +8109,13 @@ static void tsc_khz_changed(void *data)
 static void kvm_hyperv_tsc_notifier(void)
 {
 	struct kvm *kvm;
-	struct kvm_vcpu *vcpu;
 	int cpu;
-	unsigned long flags;
 
 	mutex_lock(&kvm_lock);
 	list_for_each_entry(kvm, &vm_list, vm_list)
 		kvm_make_mclock_inprogress_request(kvm);
 
+	/* no guest entries from this point */
 	hyperv_stop_tsc_emulation();
 
 	/* TSC frequency always matches when on Hyper-V */
@@ -8125,16 +8126,11 @@ static void kvm_hyperv_tsc_notifier(void)
 	list_for_each_entry(kvm, &vm_list, vm_list) {
 		struct kvm_arch *ka = &kvm->arch;
 
-		spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
+		spin_lock_irq(&ka->pvclock_gtod_sync_lock);
 		pvclock_update_vm_gtod_copy(kvm);
-		spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-
-		kvm_for_each_vcpu(cpu, vcpu, kvm)
-			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
-
-		kvm_for_each_vcpu(cpu, vcpu, kvm)
-			kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+		kvm_end_pvclock_update(kvm);
 	}
+
 	mutex_unlock(&kvm_lock);
 }
 #endif
@@ -9418,7 +9414,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
 			__kvm_migrate_timers(vcpu);
 		if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
-			kvm_gen_update_masterclock(vcpu->kvm);
+			kvm_update_masterclock(vcpu->kvm);
 		if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu))
 			kvm_gen_kvmclock_update(vcpu);
 		if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {

From patchwork Thu Sep 16 18:15:33 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12499897
Date: Thu, 16 Sep 2021 18:15:33 +0000
In-Reply-To: <20210916181538.968978-1-oupton@google.com>
Message-Id: <20210916181538.968978-3-oupton@google.com>
Mime-Version: 1.0
References: <20210916181538.968978-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH v8 2/7] KVM: x86: extract KVM_GET_CLOCK/KVM_SET_CLOCK to separate functions
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

From: Paolo Bonzini

No functional change intended.

Signed-off-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 99 ++++++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1082b48418c3..c910cf31958f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5829,6 +5829,54 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 }
 #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */
 
+static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_clock_data data;
+	u64 now_ns;
+
+	now_ns = get_kvmclock_ns(kvm);
+	user_ns.clock = now_ns;
+	user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
+	memset(&user_ns.pad, 0, sizeof(user_ns.pad));
+
+	if (copy_to_user(argp, &data, sizeof(data)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_clock_data data;
+	u64 now_ns;
+
+	if (copy_from_user(&data, argp, sizeof(data)))
+		return -EFAULT;
+
+	if (data.flags)
+		return -EINVAL;
+
+	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_start_pvclock_update(kvm);
+	pvclock_update_vm_gtod_copy(kvm);
+
+	/*
+	 * This pairs with kvm_guest_time_update(): when masterclock is
+	 * in use, we use master_kernel_ns + kvmclock_offset to set
+	 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
+	 * is slightly ahead) here we risk going negative on unsigned
+	 * 'system_time' when 'data.clock' is very small.
+	 */
+	if (kvm->arch.use_master_clock)
+		now_ns = ka->master_kernel_ns;
+	else
+		now_ns = get_kvmclock_base_ns();
+	ka->kvmclock_offset = data.clock - now_ns;
+	kvm_end_pvclock_update(kvm);
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -6072,55 +6120,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
-	case KVM_SET_CLOCK: {
-		struct kvm_arch *ka = &kvm->arch;
-		struct kvm_clock_data user_ns;
-		u64 now_ns;
-
-		r = -EFAULT;
-		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
-			goto out;
-
-		r = -EINVAL;
-		if (user_ns.flags)
-			goto out;
-
-		r = 0;
-
-		kvm_hv_invalidate_tsc_page(kvm);
-		kvm_start_pvclock_update(kvm);
-		pvclock_update_vm_gtod_copy(kvm);
-
-		/*
-		 * This pairs with kvm_guest_time_update(): when masterclock is
-		 * in use, we use master_kernel_ns + kvmclock_offset to set
-		 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
-		 * is slightly ahead) here we risk going negative on unsigned
-		 * 'system_time' when 'user_ns.clock' is very small.
-		 */
-		if (kvm->arch.use_master_clock)
-			now_ns = ka->master_kernel_ns;
-		else
-			now_ns = get_kvmclock_base_ns();
-		ka->kvmclock_offset = user_ns.clock - now_ns;
-		kvm_end_pvclock_update(kvm);
+	case KVM_SET_CLOCK:
+		r = kvm_vm_ioctl_set_clock(kvm, argp);
 		break;
-	}
-	case KVM_GET_CLOCK: {
-		struct kvm_clock_data user_ns;
-		u64 now_ns;
-
-		now_ns = get_kvmclock_ns(kvm);
-		user_ns.clock = now_ns;
-		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
-		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
-
-		r = -EFAULT;
-		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			goto out;
-		r = 0;
+	case KVM_GET_CLOCK:
+		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
-	}
 	case KVM_MEMORY_ENCRYPT_OP: {
 		r = -ENOTTY;
 		if (kvm_x86_ops.mem_enc_op)

From patchwork Thu Sep 16 18:15:34 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12499899
Date: Thu, 16 Sep 2021 18:15:34 +0000
In-Reply-To: <20210916181538.968978-1-oupton@google.com>
Message-Id: <20210916181538.968978-4-oupton@google.com>
Mime-Version: 1.0
References: <20210916181538.968978-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH v8 3/7] KVM: x86: Fix potential race in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
outside of the pvclock sync lock.
This is problematic, as the clock value written to the user may or may
not actually correspond to a stable TSC.

Fix the race by populating the entire kvm_clock_data structure behind
the pvclock_gtod_sync_lock.

Suggested-by: Sean Christopherson
Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c910cf31958f..523c4e5c109f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2793,19 +2793,20 @@ static void kvm_update_masterclock(struct kvm *kvm)
 	kvm_end_pvclock_update(kvm);
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
-	u64 ret;
 
 	spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
 	if (!ka->use_master_clock) {
 		spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-		return get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		return;
 	}
 
+	data->flags |= KVM_CLOCK_TSC_STABLE;
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
 	spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2817,13 +2818,26 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		ret = __pvclock_read_cycles(&hv_clock, rdtsc());
-	} else
-		ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+	} else {
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+	}
 
 	put_cpu();
+}
 
-	return ret;
+u64 get_kvmclock_ns(struct kvm *kvm)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it's accessed RMW, leave everything else uninitialized
+	 * as clock is always written and no other fields are consumed.
+	 */
+	data.flags = 0;
+
+	get_kvmclock(kvm, &data);
+	return data.clock;
 }
 
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
@@ -5832,13 +5846,9 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_clock_data data;
-	u64 now_ns;
-
-	now_ns = get_kvmclock_ns(kvm);
-	user_ns.clock = now_ns;
-	user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
-	memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
+	memset(&data, 0, sizeof(data));
+	get_kvmclock(kvm, &data);
 	if (copy_to_user(argp, &data, sizeof(data)))
 		return -EFAULT;
 

From patchwork Thu Sep 16 18:15:35 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12499901
Date: Thu, 16 Sep 2021 18:15:35 +0000
In-Reply-To: <20210916181538.968978-1-oupton@google.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Mime-Version: 1.0
References: <20210916181538.968978-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
 Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
 Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose,
 linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
 Catalin Marinas, Oliver Upton

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
 include/uapi/linux/kvm.h        |  7 +++++-
 4 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a6729c8cf063..d0b9c986cf6c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called. If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE. If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME. If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC. If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
 	__u64 clock;  /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REALTIME. If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+ :: struct kvm_clock_data { __u64 clock; /* kvmclock current value */ __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index be6805fc0260..9c34b5b63e39 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void); int alloc_all_memslots_rmaps(struct kvm *kvm); +#define KVM_CLOCK_VALID_FLAGS \ + (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC) + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 523c4e5c109f..cb5d5cad5124 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) get_cpu(); if (__this_cpu_read(cpu_tsc_khz)) { +#ifdef CONFIG_X86_64 + struct timespec64 ts; + + if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) { + data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec; + data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC; + } else +#endif + data->host_tsc = rdtsc(); + kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); - data->clock = __pvclock_read_cycles(&hv_clock, rdtsc()); + data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc); } else { data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; } @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_SYNC_X86_VALID_FIELDS; break; case KVM_CAP_ADJUST_CLOCK: - r = KVM_CLOCK_TSC_STABLE; + r = KVM_CLOCK_VALID_FLAGS; break; case KVM_CAP_X86_DISABLE_EXITS: r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE | @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) { struct kvm_arch *ka = &kvm->arch; struct kvm_clock_data data; - u64 now_ns; + u64 now_raw_ns; if 
(copy_from_user(&data, argp, sizeof(data))) return -EFAULT; - if (data.flags) + if (data.flags & ~KVM_CLOCK_REALTIME) return -EINVAL; kvm_hv_invalidate_tsc_page(kvm); @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) * is slightly ahead) here we risk going negative on unsigned * 'system_time' when 'data.clock' is very small. */ - if (kvm->arch.use_master_clock) - now_ns = ka->master_kernel_ns; + if (data.flags & KVM_CLOCK_REALTIME) { + u64 now_real_ns = ktime_get_real_ns(); + + /* + * Avoid stepping the kvmclock backwards. + */ + if (now_real_ns > data.realtime) + data.clock += now_real_ns - data.realtime; + } + + if (ka->use_master_clock) + now_raw_ns = ka->master_kernel_ns; else - now_ns = get_kvmclock_base_ns(); - ka->kvmclock_offset = data.clock - now_ns; + now_raw_ns = get_kvmclock_base_ns(); + ka->kvmclock_offset = data.clock - now_raw_ns; kvm_end_pvclock_update(kvm); return 0; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index a067410ebea5..d228bf394465 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1223,11 +1223,16 @@ struct kvm_irqfd { /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. 
*/ #define KVM_CLOCK_TSC_STABLE 2 +#define KVM_CLOCK_REALTIME (1 << 2) +#define KVM_CLOCK_HOST_TSC (1 << 3) struct kvm_clock_data { __u64 clock; __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; /* For KVM_CAP_SW_TLB */ From patchwork Thu Sep 16 18:15:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12499927 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6FA2C433EF for ; Thu, 16 Sep 2021 18:23:59 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B628D611C4 for ; Thu, 16 Sep 2021 18:23:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B628D611C4 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc 
+PTDDfAjhbWW5VNc8gljzMKhMzNAg/CiA8loguxIdXpftGGxTomTAnaojbMqmcEvrT4I 59sj+bPa4Cg9xLcMvLDKf1cqf7COWDVOt26eq+6csQn/SelCiG+sIGsCMlS9vPJYoSfx Z6KQ== X-Gm-Message-State: AOAM5319x7Cj81a//PkCPe1W8eHnuGXzA/rQ94ol7NNCN2xPeXTXJmUZ Tx721SVxmoBzlF9eo1w8EMeF/DLF2ww= X-Google-Smtp-Source: ABdhPJz+iJU5f6z/grqrVZ21yLMPrc5CcINLjptZt9ThEios+Cu4etZJHmTZs6+key3lJTsf+GErso+7PVs= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6214:3ca:: with SMTP id ce10mr6813032qvb.12.1631816147367; Thu, 16 Sep 2021 11:15:47 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:36 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-6-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 5/7] kvm: x86: protect masterclock with a seqcount From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210916_111549_155666_130D7AC1 X-CRM114-Status: GOOD ( 21.35 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org From: Paolo Bonzini Protect the reference point for kvmclock with a seqcount, so that kvmclock updates for all vCPUs can proceed in parallel. Xen runstate updates will also run in parallel and not bounce the kvmclock cacheline. 
nr_vcpus_matched_tsc is updated outside pvclock_update_vm_gtod_copy though, so a spinlock must be kept for that one. Signed-off-by: Paolo Bonzini [Oliver - drop unused locals, don't double acquire tsc_write_lock] Signed-off-by: Oliver Upton --- arch/x86/include/asm/kvm_host.h | 7 ++- arch/x86/kvm/x86.c | 83 +++++++++++++++++---------------- 2 files changed, 49 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9c34b5b63e39..5accfe7246ce 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1087,6 +1087,11 @@ struct kvm_arch { unsigned long irq_sources_bitmap; s64 kvmclock_offset; + + /* + * This also protects nr_vcpus_matched_tsc which is read from a + * preemption-disabled region, so it must be a raw spinlock. + */ raw_spinlock_t tsc_write_lock; u64 last_tsc_nsec; u64 last_tsc_write; @@ -1097,7 +1102,7 @@ struct kvm_arch { u64 cur_tsc_generation; int nr_vcpus_matched_tsc; - spinlock_t pvclock_gtod_sync_lock; + seqcount_raw_spinlock_t pvclock_sc; bool use_master_clock; u64 master_kernel_ns; u64 master_cycle_now; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cb5d5cad5124..29156c49cd11 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2533,9 +2533,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; kvm_vcpu_write_tsc_offset(vcpu, offset); - raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { kvm->arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { @@ -2543,7 +2541,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) } kvm_track_tsc_matching(vcpu); - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, @@ -2731,9 +2729,6 
@@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) int vclock_mode; bool host_tsc_clocksource, vcpus_matched; - vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == - atomic_read(&kvm->online_vcpus)); - /* * If the host uses TSC clock, then passthrough TSC as stable * to the guest. @@ -2742,6 +2737,10 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) &ka->master_kernel_ns, &ka->master_cycle_now); + lockdep_assert_held(&kvm->arch.tsc_write_lock); + vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == + atomic_read(&kvm->online_vcpus)); + ka->use_master_clock = host_tsc_clocksource && vcpus_matched && !ka->backwards_tsc_observed && !ka->boot_vcpu_runs_old_kvmclock; @@ -2760,14 +2759,18 @@ static void kvm_make_mclock_inprogress_request(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS); } -static void kvm_start_pvclock_update(struct kvm *kvm) +static void __kvm_start_pvclock_update(struct kvm *kvm) { - struct kvm_arch *ka = &kvm->arch; + raw_spin_lock_irq(&kvm->arch.tsc_write_lock); + write_seqcount_begin(&kvm->arch.pvclock_sc); +} +static void kvm_start_pvclock_update(struct kvm *kvm) +{ kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ - spin_lock_irq(&ka->pvclock_gtod_sync_lock); + __kvm_start_pvclock_update(kvm); } static void kvm_end_pvclock_update(struct kvm *kvm) @@ -2776,7 +2779,8 @@ static void kvm_end_pvclock_update(struct kvm *kvm) struct kvm_vcpu *vcpu; int i; - spin_unlock_irq(&ka->pvclock_gtod_sync_lock); + write_seqcount_end(&ka->pvclock_sc); + raw_spin_unlock_irq(&ka->tsc_write_lock); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -2797,20 +2801,12 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) { struct kvm_arch *ka = &kvm->arch; struct pvclock_vcpu_time_info hv_clock; - unsigned long flags; - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); if (!ka->use_master_clock) { - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, 
flags); data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; return; } - data->flags |= KVM_CLOCK_TSC_STABLE; - hv_clock.tsc_timestamp = ka->master_cycle_now; - hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); - /* both __this_cpu_read() and rdtsc() should be on the same cpu */ get_cpu(); @@ -2825,6 +2821,9 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) #endif data->host_tsc = rdtsc(); + data->flags |= KVM_CLOCK_TSC_STABLE; + hv_clock.tsc_timestamp = ka->master_cycle_now; + hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); @@ -2839,14 +2838,14 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) u64 get_kvmclock_ns(struct kvm *kvm) { struct kvm_clock_data data; + struct kvm_arch *ka = &kvm->arch; + unsigned seq; - /* - * Zero flags as it's accessed RMW, leave everything else uninitialized - * as clock is always written and no other fields are consumed. - */ - data.flags = 0; - - get_kvmclock(kvm, &data); + do { + seq = read_seqcount_begin(&ka->pvclock_sc); + data.flags = 0; + get_kvmclock(kvm, &data); + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); return data.clock; } @@ -2912,6 +2911,7 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v, static int kvm_guest_time_update(struct kvm_vcpu *v) { unsigned long flags, tgt_tsc_khz; + unsigned seq; struct kvm_vcpu_arch *vcpu = &v->arch; struct kvm_arch *ka = &v->kvm->arch; s64 kernel_ns; @@ -2926,13 +2926,14 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) * If the host uses TSC clock, then passthrough TSC as stable * to the guest. 
*/ - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); - use_master_clock = ka->use_master_clock; - if (use_master_clock) { - host_tsc = ka->master_cycle_now; - kernel_ns = ka->master_kernel_ns; - } - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + do { + seq = read_seqcount_begin(&ka->pvclock_sc); + use_master_clock = ka->use_master_clock; + if (use_master_clock) { + host_tsc = ka->master_cycle_now; + kernel_ns = ka->master_kernel_ns; + } + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); @@ -5855,10 +5856,15 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state) static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp) { - struct kvm_clock_data data; + struct kvm_clock_data data = { 0 }; + unsigned seq; + + do { + seq = read_seqcount_begin(&kvm->arch.pvclock_sc); + data.flags = 0; + get_kvmclock(kvm, &data); + } while (read_seqcount_retry(&kvm->arch.pvclock_sc, seq)); - memset(&data, 0, sizeof(data)); - get_kvmclock(kvm, &data); if (copy_to_user(argp, &data, sizeof(data))) return -EFAULT; @@ -8159,9 +8165,7 @@ static void kvm_hyperv_tsc_notifier(void) kvm_max_guest_tsc_khz = tsc_khz; list_for_each_entry(kvm, &vm_list, vm_list) { - struct kvm_arch *ka = &kvm->arch; - - spin_lock_irq(&ka->pvclock_gtod_sync_lock); + __kvm_start_pvclock_update(kvm); pvclock_update_vm_gtod_copy(kvm); kvm_end_pvclock_update(kvm); } @@ -11188,8 +11192,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) raw_spin_lock_init(&kvm->arch.tsc_write_lock); mutex_init(&kvm->arch.apic_map_lock); - spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); - + seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock); kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); pvclock_update_vm_gtod_copy(kvm); From patchwork Thu Sep 16 18:15:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12499929 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03FF5C433F5 for ; Thu, 16 Sep 2021 18:25:02 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C5C93611C4 for ; Thu, 16 Sep 2021 18:25:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org C5C93611C4 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=5PoEt+Iq39NZrOXk05bf6G1fu5sFb8S2oN2YeP3QltU=; b=jqgnHJuRZlBG9JWD5Iz18YUWwK 0c8gBE06wuMHUXLCL5prZGvtjNwb1ykKxBSUCqFX7eEJ8I+mr9/xU9qqSKMonj30Zt94fPCmvWqsy BnL5f/Xu/SonCISPGY+mS6Fj36w8LDvhaXMpTZyHvusjYQqkiR3gKZiScEtNL7sOpamZDGhsdV3ZN zQSje9jA9gIluw9OS4/WDD59fgdAAlLUOpsI2lAUc+/pe08N2uySrUpnm9hvgzIo2kuTkbw1iKwOo 
ABdhPJxkIq2pjE4hQkHq1iuBeO8+WA+QPDKfz8Iecr8PSvZkKcaqo8w22TSP1+JhIpwxYCAaCvA+D8YErlY= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a25:1345:: with SMTP id 66mr8429706ybt.502.1631816148619; Thu, 16 Sep 2021 11:15:48 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:37 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-7-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 6/7] KVM: x86: Refactor tsc synchronization code From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210916_111550_238310_ACC79C87 X-CRM114-Status: GOOD ( 15.52 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Refactor kvm_synchronize_tsc to make a new function that allows callers to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly for the sake of participating in TSC synchronization. 
Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 100 ++++++++++++++++++++++++++------------------- 1 file changed, 58 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 29156c49cd11..1ea65bb2e74d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2447,13 +2447,68 @@ static inline bool kvm_check_tsc_unstable(void) return check_tsc_unstable(); } +/* + * Infers attempts to synchronize the guest's tsc from host writes. Sets the + * offset for the vcpu and tracks the TSC matching generation that the vcpu + * participates in. + */ +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, + u64 ns, bool matched) +{ + struct kvm *kvm = vcpu->kvm; + bool already_matched; + + lockdep_assert_held(&kvm->arch.tsc_write_lock); + + already_matched = + (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); + + /* + * We also track the most recent recorded KHZ, write and time to + * allow the matching interval to be extended at each write. + */ + kvm->arch.last_tsc_nsec = ns; + kvm->arch.last_tsc_write = tsc; + kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; + + vcpu->arch.last_guest_tsc = tsc; + + /* Keep track of which generation this VCPU has synchronized to */ + vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; + vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; + vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; + + kvm_vcpu_write_tsc_offset(vcpu, offset); + + if (!matched) { + /* + * We split periods of matched TSC writes into generations. + * For each generation, we track the original measured + * nanosecond time, offset, and write, so if TSCs are in + * sync, we can match exact offset, and if not, we can match + * exact software computation in compute_guest_tsc() + * + * These values are tracked in kvm->arch.cur_xxx variables. 
+ */ + kvm->arch.cur_tsc_generation++; + kvm->arch.cur_tsc_nsec = ns; + kvm->arch.cur_tsc_write = tsc; + kvm->arch.cur_tsc_offset = offset; + + kvm->arch.nr_vcpus_matched_tsc = 0; + } else if (!already_matched) { + kvm->arch.nr_vcpus_matched_tsc++; + } + + kvm_track_tsc_matching(vcpu); +} + static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) { struct kvm *kvm = vcpu->kvm; u64 offset, ns, elapsed; unsigned long flags; - bool matched; - bool already_matched; + bool matched = false; bool synchronizing = false; raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); @@ -2499,48 +2554,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) offset = kvm_compute_l1_tsc_offset(vcpu, data); } matched = true; - already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); - } else { - /* - * We split periods of matched TSC writes into generations. - * For each generation, we track the original measured - * nanosecond time, offset, and write, so if TSCs are in - * sync, we can match exact offset, and if not, we can match - * exact software computation in compute_guest_tsc() - * - * These values are tracked in kvm->arch.cur_xxx variables. - */ - kvm->arch.cur_tsc_generation++; - kvm->arch.cur_tsc_nsec = ns; - kvm->arch.cur_tsc_write = data; - kvm->arch.cur_tsc_offset = offset; - matched = false; } - /* - * We also track th most recent recorded KHZ, write and time to - * allow the matching interval to be extended at each write. 
- */ - kvm->arch.last_tsc_nsec = ns; - kvm->arch.last_tsc_write = data; - kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; - - vcpu->arch.last_guest_tsc = data; - - /* Keep track of which generation this VCPU has synchronized to */ - vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; - vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; - vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; - - kvm_vcpu_write_tsc_offset(vcpu, offset); - - if (!matched) { - kvm->arch.nr_vcpus_matched_tsc = 0; - } else if (!already_matched) { - kvm->arch.nr_vcpus_matched_tsc++; - } - - kvm_track_tsc_matching(vcpu); + __kvm_synchronize_tsc(vcpu, offset, data, ns, matched); raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } From patchwork Thu Sep 16 18:15:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12499931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCCA9C433EF for ; Thu, 16 Sep 2021 18:25:56 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9BCFF6120C for ; Thu, 16 Sep 2021 18:25:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9BCFF6120C Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com 
6w+z4T6zxfrjDwejeOtUAXI2kPK6bJ6Hht9x6+Vf+RBgR9x1eGXhY8waT0lVBOOZaPEP qSGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=VH3IppIPyJlk4geF0mSGYs2zxrFbb9GgoOtLmZm5/FE=; b=v/opzS3nyRNt8oNg0xbO19mA5LSc+Oq/1cQCFwX1SLJxnlHxJ8GaTR5w1pQZgxsQ1Y N7LZqtj1qZnmuc1aBuVxvF0uTrYEa2iVPAGjkOGsQwW8rN6bAePuOss89p29o5YOP3kJ MgxkVO58obenFyTBEiszPLzRfxMs5+RLSqH8Ika1o1T7CfB6bSA9wM1eou2XKhh8dHQk 40OrVy+pd+fEw/xPzakDUdo3SX5jRDXyec/zMXBC0chjU5RaVPg97WI43j2oe2CoszaS 1exbvnPW54pMxbeHHac7jzOkF6zVMi9JcH8qf8URnLtTUq/F/DtXVCW9Gay/VR/0MJ/M qWlw== X-Gm-Message-State: AOAM533xaretz+AdBbHPsvzJaLtS7RCsfeSfCVGvuGvClR4ThhboB1wO IaNylGIUlnJxpEo16JoaIqqHWz7PSmw= X-Google-Smtp-Source: ABdhPJyyi3f8L4dINe/JGUv1TgQCmr/+VgcXNgWZR5IAkOZSRC0ZPXjKWQDpqH2hkIGMziW/Cu2fLSPG0Bs= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a25:d2ce:: with SMTP id j197mr9192007ybg.160.1631816149749; Thu, 16 Sep 2021 11:15:49 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:38 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-8-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210916_111551_731469_F7CA2AF3 X-CRM114-Status: GOOD ( 24.61 ) X-BeenThere: 
linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org To date, VMM-directed TSC synchronization and migration has been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused. Cc: David Matlack Cc: Sean Christopherson Signed-off-by: Oliver Upton --- Documentation/virt/kvm/devices/vcpu.rst | 57 ++++++++++++ arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 4 + arch/x86/kvm/x86.c | 110 ++++++++++++++++++++++++ 4 files changed, 172 insertions(+) diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 2acec3b9ef65..3b399d727c11 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The base address must be 64 byte aligned and exist within a valid guest memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information including the layout of the stolen time structure. + +4. 
GROUP: KVM_VCPU_TSC_CTRL +=========================== + +:Architectures: x86 + +4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET + +:Parameters: 64-bit unsigned TSC offset + +Returns: + + ======= ====================================== + -EFAULT Error reading/writing the provided + parameter address. + -ENXIO Attribute not supported + ======= ====================================== + +Specifies the guest's TSC offset relative to the host's TSC. The guest's +TSC is then derived by the following equation: + + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET + +This attribute is useful for the precise migration of a guest's TSC. The +following describes a possible algorithm to use for the migration of a +guest's TSC: + +From the source VMM process: + +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + +2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the + guest TSC offset (off_n). + +3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the + guest's TSC (freq). + +From the destination VMM process: + +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. + +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and + kvmclock nanoseconds (k_1). + +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 + +7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the + respective value derived in the previous step. 
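The offset adjustment in step 6 of the documented algorithm can be
sketched as a small C helper. This is only an illustrative sketch and not
part of the patch: the names ns_to_tsc_ticks() and migrated_tsc_offset()
are hypothetical. It assumes that freq is the value in kHz reported by
KVM_GET_TSC_KHZ and that k_0 and k_1 are kvmclock values in nanoseconds,
so the elapsed-time term is divided by 10^6 to obtain TSC ticks. Offsets
are treated as unsigned 64-bit values that wrap around.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a kvmclock delta in nanoseconds to guest TSC ticks.
 * Assumption: tsc_khz is in kHz, so ticks = ns * tsc_khz / 10^6.
 * Splitting the delta into whole and fractional milliseconds keeps the
 * intermediate products within 64 bits for any realistic pause length.
 */
static uint64_t ns_to_tsc_ticks(uint64_t ns, uint32_t tsc_khz)
{
	const uint64_t ns_per_ms = 1000000ULL;

	return (ns / ns_per_ms) * tsc_khz +
	       (ns % ns_per_ms) * tsc_khz / ns_per_ms;
}

/*
 * Hypothetical helper for step 6:
 *   new_off_n = t_0 + off_n + elapsed_ticks - t_1
 * where elapsed_ticks is (k_1 - k_0) converted from ns to TSC ticks.
 */
static uint64_t migrated_tsc_offset(uint64_t t_0, uint64_t off_n,
				    uint64_t k_0, uint64_t t_1,
				    uint64_t k_1, uint32_t freq_khz)
{
	/* Step 5 runs after step 4, so kvmclock should not have gone backwards. */
	assert(k_1 >= k_0);

	/* Unsigned arithmetic wraps modulo 2^64, matching a 64-bit TSC offset. */
	return t_0 + off_n + ns_to_tsc_ticks(k_1 - k_0, freq_khz) - t_1;
}
```

Under these assumptions, a guest with a 3 GHz TSC (freq = 3000000) that
was paused for 1 ms of kvmclock time has 3000000 ticks added to its
offset before the difference between t_0 and t_1 is applied.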
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5accfe7246ce..09c678f2e616 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1096,6 +1096,7 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
 	u32 last_tsc_khz;
+	u64 last_tsc_offset;
 	u64 cur_tsc_nsec;
 	u64 cur_tsc_write;
 	u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 2ef1f6513c68..5a776a08f78c 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -504,4 +504,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1ea65bb2e74d..1177604c805a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2470,6 +2470,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm->arch.last_tsc_nsec = ns;
 	kvm->arch.last_tsc_write = tsc;
 	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	kvm->arch.last_tsc_offset = offset;
 
 	vcpu->arch.last_guest_tsc = tsc;
 
@@ -4069,6 +4070,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
 	case KVM_CAP_SREGS2:
 	case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
+	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
@@ -4933,6 +4935,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = -EFAULT;
+		if (put_user(vcpu->arch.l1_tsc_offset, uaddr))
+			break;
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	struct kvm *kvm = vcpu->kvm;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET: {
+		u64 offset, tsc, ns;
+		unsigned long flags;
+		bool matched;
+
+		r = -EFAULT;
+		if (get_user(offset, uaddr))
+			break;
+
+		raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+		matched = (vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_offset == offset);
+
+		tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
+		ns = get_kvmclock_base_ns();
+
+		__kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
+		raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+
+		r = 0;
+		break;
+	}
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
+				      unsigned int ioctl,
+				      void __user *argp)
+{
+	struct kvm_device_attr attr;
+	int r;
+
+	if (copy_from_user(&attr, argp, sizeof(attr)))
+		return -EFAULT;
+
+	if (attr.group != KVM_VCPU_TSC_CTRL)
+		return -ENXIO;
+
+	switch (ioctl) {
+	case KVM_HAS_DEVICE_ATTR:
+		r = kvm_arch_tsc_has_attr(vcpu, &attr);
+		break;
+	case KVM_GET_DEVICE_ATTR:
+		r = kvm_arch_tsc_get_attr(vcpu, &attr);
+		break;
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_arch_tsc_set_attr(vcpu, &attr);
+		break;
+	}
+
+	return r;
+}
+
 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 				     struct kvm_enable_cap *cap)
 {
@@ -5387,6 +5492,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = __set_sregs2(vcpu, u.sregs2);
 		break;
 	}
+	case KVM_HAS_DEVICE_ATTR:
+	case KVM_GET_DEVICE_ATTR:
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp);
+		break;
 	default:
 		r = -EINVAL;
 	}