From patchwork Thu Sep 16 18:15:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500019 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA312C433EF for ; Thu, 16 Sep 2021 18:56:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0EDA6058D for ; Thu, 16 Sep 2021 18:56:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344089AbhIPS5u (ORCPT ); Thu, 16 Sep 2021 14:57:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245508AbhIPS53 (ORCPT ); Thu, 16 Sep 2021 14:57:29 -0400 Received: from mail-qk1-x749.google.com (mail-qk1-x749.google.com [IPv6:2607:f8b0:4864:20::749]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9501DC04A155 for ; Thu, 16 Sep 2021 11:15:43 -0700 (PDT) Received: by mail-qk1-x749.google.com with SMTP id e22-20020a05620a209600b003d5ff97bff7so44934123qka.1 for ; Thu, 16 Sep 2021 11:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=EqfJ4nsZ26w2OY+w0/ZRQGVHhFarrWqQ77C9H9qe4wk=; b=jCgTfuzY3Xi/FfyO0lx2uvq5fA0lVuB+u2I9budg5JdSmdUW6fkucj/6Iswn5+9u81 S5Hii45+Igo8DnjUigtICHwVREClRsu6Tt94jvAfHqx/W4EdvzOWJfxyBJRSO/PtALZt 511OJi7qp+hjaP2T2gNjcMes9QckyDHsJh+2BGWhGKw0M1JamoY1Gp59mcwwonYWw3jo YzlI/iWllIFsVn66gc/48kc2LhIAYzj4gD09j2aRz8ifMS/+7cKghKSzkamlf41sDmLH xwJx2x4VOpX8THrVq3tJaaLDKzBQd47kw3OExONFC73ii1Dcr8WZjUXHO6zWaFv0mYYj 2w9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=EqfJ4nsZ26w2OY+w0/ZRQGVHhFarrWqQ77C9H9qe4wk=; b=N01Is4iHvMqi5KY3fjT1c4Af0I2SqQYlFuXt/gK3rEaKgayQX5OeEToB6saX138ro8 QH0fBlOPPoVIBcW1cxjLwDRnZ9qEI5DINgpK9l8BFY5aFlpWwg8+xuUrsssNkpzJjTMr 5uCRjcT7nYTIvQ6nLY8BAYdibnEIjJnLP6LP0WoYN7gpeAMh54v5j/Svg805GuVPXdvj AjlF+SBK/HHXj272ZXQkmoDC/G0iOuW9CYhicvyjYFxfjgq86N9CABgAMgCr4Yn29Yed pn/mab6UWxWoaX6YJPyTaKapn7TwlNARHHgLGfX9OEhKVBtKA4kPnvjs+BOznc8X+xdt vRfQ== X-Gm-Message-State: AOAM533z1XFMRLuU5PBUdXNMtpPeJbt2OZsSa+in93N7THad7effb1eV XHEEJqCtCk/SVpypyWHKXDbFDrWGKaYcN/XvWlTGKEeQJPIu1oGPr5/9eH3Vtb8mBsG8W7vnkpS FXg61HDjYbc3ZApZoD2OwJh75Bbgx/C52KX1fgdZBfifnEl++mBIfdEkQDw== X-Google-Smtp-Source: ABdhPJzJVShjUP9eV6I0zKA9rrR+wmqb3KVny8yYAxYLYNtJU8sisB/xFAD1ErUlrDRx7ismNi4ONuO4ygA= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6214:431:: with SMTP id a17mr6549014qvy.48.1631816142691; Thu, 16 Sep 2021 11:15:42 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:32 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-2-oupton@google.com> Mime-Version: 1.0 References: 
<20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 1/7] kvm: x86: abstract locking around pvclock_update_vm_gtod_copy From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Paolo Bonzini Updates to the kvmclock parameters needs to do a complicated dance of KVM_REQ_MCLOCK_INPROGRESS and KVM_REQ_CLOCK_UPDATE in addition to taking pvclock_gtod_sync_lock. Place that in two functions that can be called on all of master clock update, KVM_SET_CLOCK, and Hyper-V reenlightenment. Signed-off-by: Paolo Bonzini Signed-off-by: Oliver Upton --- arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/x86.c | 62 +++++++++++++++------------------ 2 files changed, 29 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f8f48a7ec577..be6805fc0260 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1866,7 +1866,6 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier); unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu); bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); -void kvm_make_mclock_inprogress_request(struct kvm *kvm); void kvm_make_scan_ioapic_request(struct kvm *kvm); void kvm_make_scan_ioapic_request_mask(struct kvm *kvm, unsigned long *vcpu_bitmap); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 28ef14155726..1082b48418c3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2755,35 +2755,42 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) #endif } -void kvm_make_mclock_inprogress_request(struct kvm *kvm) +static void kvm_make_mclock_inprogress_request(struct kvm *kvm) { kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS); } -static void kvm_gen_update_masterclock(struct kvm *kvm) +static void kvm_start_pvclock_update(struct kvm *kvm) { -#ifdef CONFIG_X86_64 - int i; - struct kvm_vcpu *vcpu; struct kvm_arch *ka = &kvm->arch; - unsigned long flags; - - kvm_hv_invalidate_tsc_page(kvm); kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); - pvclock_update_vm_gtod_copy(kvm); - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + spin_lock_irq(&ka->pvclock_gtod_sync_lock); +} +static void kvm_end_pvclock_update(struct kvm *kvm) +{ + struct kvm_arch *ka = &kvm->arch; + struct kvm_vcpu *vcpu; + int i; + + spin_unlock_irq(&ka->pvclock_gtod_sync_lock); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); /* guest entries allowed */ kvm_for_each_vcpu(i, vcpu, kvm) kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); -#endif +} + +static void kvm_update_masterclock(struct kvm *kvm) +{ + kvm_hv_invalidate_tsc_page(kvm); + kvm_start_pvclock_update(kvm); + pvclock_update_vm_gtod_copy(kvm); + kvm_end_pvclock_update(kvm); } u64 get_kvmclock_ns(struct kvm *kvm) @@ -6079,12 +6086,10 @@ long kvm_arch_vm_ioctl(struct file *filp, goto out; r = 0; - /* - * TODO: userspace has to take care of races with VCPU_RUN, so - * kvm_gen_update_masterclock() can be cut down to locked - * 
pvclock_update_vm_gtod_copy(). - */ - kvm_gen_update_masterclock(kvm); + + kvm_hv_invalidate_tsc_page(kvm); + kvm_start_pvclock_update(kvm); + pvclock_update_vm_gtod_copy(kvm); /* * This pairs with kvm_guest_time_update(): when masterclock is @@ -6093,15 +6098,12 @@ long kvm_arch_vm_ioctl(struct file *filp, * is slightly ahead) here we risk going negative on unsigned * 'system_time' when 'user_ns.clock' is very small. */ - spin_lock_irq(&ka->pvclock_gtod_sync_lock); if (kvm->arch.use_master_clock) now_ns = ka->master_kernel_ns; else now_ns = get_kvmclock_base_ns(); ka->kvmclock_offset = user_ns.clock - now_ns; - spin_unlock_irq(&ka->pvclock_gtod_sync_lock); - - kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); + kvm_end_pvclock_update(kvm); break; } case KVM_GET_CLOCK: { @@ -8107,14 +8109,13 @@ static void tsc_khz_changed(void *data) static void kvm_hyperv_tsc_notifier(void) { struct kvm *kvm; - struct kvm_vcpu *vcpu; int cpu; - unsigned long flags; mutex_lock(&kvm_lock); list_for_each_entry(kvm, &vm_list, vm_list) kvm_make_mclock_inprogress_request(kvm); + /* no guest entries from this point */ hyperv_stop_tsc_emulation(); /* TSC frequency always matches when on Hyper-V */ @@ -8125,16 +8126,11 @@ static void kvm_hyperv_tsc_notifier(void) list_for_each_entry(kvm, &vm_list, vm_list) { struct kvm_arch *ka = &kvm->arch; - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); + spin_lock_irq(&ka->pvclock_gtod_sync_lock); pvclock_update_vm_gtod_copy(kvm); - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); - - kvm_for_each_vcpu(cpu, vcpu, kvm) - kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); - - kvm_for_each_vcpu(cpu, vcpu, kvm) - kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); + kvm_end_pvclock_update(kvm); } + mutex_unlock(&kvm_lock); } #endif @@ -9418,7 +9414,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu)) __kvm_migrate_timers(vcpu); if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu)) - kvm_gen_update_masterclock(vcpu->kvm); + kvm_update_masterclock(vcpu->kvm); if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu)) kvm_gen_kvmclock_update(vcpu); if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) { From patchwork Thu Sep 16 18:15:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500023 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43804C433EF for ; Thu, 16 Sep 2021 18:56:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2C70760724 for ; Thu, 16 Sep 2021 18:56:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345196AbhIPS5y (ORCPT ); Thu, 16 Sep 2021 14:57:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245570AbhIPS53 (ORCPT ); Thu, 16 Sep 2021 14:57:29 -0400 Received: from 
mail-qt1-x84a.google.com (mail-qt1-x84a.google.com [IPv6:2607:f8b0:4864:20::84a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BAF17C04A156 for ; Thu, 16 Sep 2021 11:15:44 -0700 (PDT) Received: by mail-qt1-x84a.google.com with SMTP id x11-20020ac86b4b000000b00299d7592d31so63203429qts.0 for ; Thu, 16 Sep 2021 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ompqiNMWI2hga8pPlI1DdLrAhAHA2ErmtWHohcgkDnQ=; b=EIa4PYxeIB/jdTr4Bj3rabymZMfiRDUda4Z/1Om5oXzv+tZ0fVo/aVqKHI2InGS1OU j9lrVhRl1gkVZxnjRo2Omv6GbaDFwuTFsTyhpJYtmlVggm0E+aaYOkyBo2IY8jh1btCe GCVbynojpKzy/F6leYHtS6kaa6OBYmyZ/yg5VpAo7eUg7Ay9lq3XlUxJEOuwtyxYRwM7 U8r/qmsNggetc0Jtt0lQFuvAKtC4IYRkZQuGIYG8PCS2CcI+gaR0uAWDft1vZecQzTlV 1E3D6XDkidTe09oTgISDpS3sGZjyIeQ2ZsLL11W4oTK5YDPcDw9j6SfA/hsX0uy5AbmI bz8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ompqiNMWI2hga8pPlI1DdLrAhAHA2ErmtWHohcgkDnQ=; b=i3UtFNEVrV7weC1loiTRIvWg83dEGpPYc4rVRUBWWzErv6sn2YrN9gfp+k6dCDsnsJ oxhSAMVAcfhKr4OlStYOMzaigKPsQg+iwWw3bjD3RoIg5yBhFezSxr6yMF0FSHkJ5udb SepHDeXAdmrkqxIaybZz/1PaW5TH0xHp7qWUA1UL6vpGA4sl3E971IKjxhV2L3tdfIgO iSqnes+TN1zOVl2Xuk4z08loGmb1P9exx0u/ytc+AZcnVWbltzkozGrYwf+FpB2QJaeU LbNZfcN7td/3zccSRVS5n+zsT0sDaZ8rpFEkszk4mSwNArQTtIxdGzjF2cPrXhh/jgb8 Ds3g== X-Gm-Message-State: AOAM532VmJGB5in5+4LMhtM6NBB89EMLEWqVB7GWkI7GCEweUEbvRV7+ Doxdpvi2paGhVhMEoZIBZPMInfdB137GJb2Zb3oTQ5TVrX6puvy2EX/4WehsSdp9npi9JoBt6v0 I/KoAFd4RsUDn3PEBOWRJZoVwbqJtpCFyqthqnRcgj/CWe0/a9OqUJsyYHg== X-Google-Smtp-Source: ABdhPJyMtDkmedSBXQ9JneBR51sdoiXY3FTPAb7hN3Y4cUearQHH1ZnZgDcI1O9vcz/G/4rXwSUCJqg2mWg= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6214:13ee:: with SMTP id ch14mr6861648qvb.43.1631816143780; Thu, 16 Sep 2021 11:15:43 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:33 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-3-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 2/7] KVM: x86: extract KVM_GET_CLOCK/KVM_SET_CLOCK to separate functions From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Paolo Bonzini no functional change intended. 
Signed-off-by: Paolo Bonzini Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 99 ++++++++++++++++++++++++---------------------- 1 file changed, 52 insertions(+), 47 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1082b48418c3..c910cf31958f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5829,6 +5829,54 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state) } #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */ +static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp) +{ + struct kvm_clock_data data; + u64 now_ns; + + now_ns = get_kvmclock_ns(kvm); + user_ns.clock = now_ns; + user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; + memset(&user_ns.pad, 0, sizeof(user_ns.pad)); + + if (copy_to_user(argp, &data, sizeof(data))) + return -EFAULT; + + return 0; +} + +static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) +{ + struct kvm_arch *ka = &kvm->arch; + struct kvm_clock_data data; + u64 now_ns; + + if (copy_from_user(&data, argp, sizeof(data))) + return -EFAULT; + + if (data.flags) + return -EINVAL; + + kvm_hv_invalidate_tsc_page(kvm); + kvm_start_pvclock_update(kvm); + pvclock_update_vm_gtod_copy(kvm); + + /* + * This pairs with kvm_guest_time_update(): when masterclock is + * in use, we use master_kernel_ns + kvmclock_offset to set + * unsigned 'system_time' so if we use get_kvmclock_ns() (which + * is slightly ahead) here we risk going negative on unsigned + * 'system_time' when 'data.clock' is very small. + */ + if (kvm->arch.use_master_clock) + now_ns = ka->master_kernel_ns; + else + now_ns = get_kvmclock_base_ns(); + ka->kvmclock_offset = data.clock - now_ns; + kvm_end_pvclock_update(kvm); + return 0; +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -6072,55 +6120,12 @@ long kvm_arch_vm_ioctl(struct file *filp, break; } #endif - case KVM_SET_CLOCK: { - struct kvm_arch *ka = &kvm->arch; - struct kvm_clock_data user_ns; - u64 now_ns; - - r = -EFAULT; - if (copy_from_user(&user_ns, argp, sizeof(user_ns))) - goto out; - - r = -EINVAL; - if (user_ns.flags) - goto out; - - r = 0; - - kvm_hv_invalidate_tsc_page(kvm); - kvm_start_pvclock_update(kvm); - pvclock_update_vm_gtod_copy(kvm); - - /* - * This pairs with kvm_guest_time_update(): when masterclock is - * in use, we use master_kernel_ns + kvmclock_offset to set - * unsigned 'system_time' so if we use get_kvmclock_ns() (which - * is slightly ahead) here we risk going negative on unsigned - * 'system_time' when 'user_ns.clock' is very small. - */ - if (kvm->arch.use_master_clock) - now_ns = ka->master_kernel_ns; - else - now_ns = get_kvmclock_base_ns(); - ka->kvmclock_offset = user_ns.clock - now_ns; - kvm_end_pvclock_update(kvm); + case KVM_SET_CLOCK: + r = kvm_vm_ioctl_set_clock(kvm, argp); break; - } - case KVM_GET_CLOCK: { - struct kvm_clock_data user_ns; - u64 now_ns; - - now_ns = get_kvmclock_ns(kvm); - user_ns.clock = now_ns; - user_ns.flags = kvm->arch.use_master_clock ? 
KVM_CLOCK_TSC_STABLE : 0; - memset(&user_ns.pad, 0, sizeof(user_ns.pad)); - - r = -EFAULT; - if (copy_to_user(argp, &user_ns, sizeof(user_ns))) - goto out; - r = 0; + case KVM_GET_CLOCK: + r = kvm_vm_ioctl_get_clock(kvm, argp); break; - } case KVM_MEMORY_ENCRYPT_OP: { r = -ENOTTY; if (kvm_x86_ops.mem_enc_op) From patchwork Thu Sep 16 18:15:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500027 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9BE6C433F5 for ; Thu, 16 Sep 2021 18:56:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C649060E52 for ; Thu, 16 Sep 2021 18:56:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344712AbhIPS55 (ORCPT ); Thu, 16 Sep 2021 14:57:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245110AbhIPS5a (ORCPT ); Thu, 16 Sep 2021 14:57:30 -0400 Received: from mail-oo1-xc49.google.com (mail-oo1-xc49.google.com [IPv6:2607:f8b0:4864:20::c49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C1854C04A157 for ; Thu, 16 Sep 2021 11:15:45 -0700 (PDT) Received: by mail-oo1-xc49.google.com with SMTP id e17-20020a056820061100b002910b1828a0so34384119oow.16 for ; Thu, 16 Sep 2021 11:15:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=/xw1HPEE3yg6hDZiSRSN4mqpJW+LQ6wXxySnmKRFyJ8=; b=aeIy+Fj9Ddx5QaJjlHoSsG2pqbF8w2bOPA1NUF6/sz/meu/pc2y7XH5Frw9KOhU9jX /kH/TSaeEzKpU3E9F4mu2P4UwMb8CA8hVkOuGgpkMxluodvQx0JB40MGxrC/28jBzSpS tOcue5Y0S/zW65/M/S+fci0vMm5hJ5aXdp3cKtcIkEKR27UTz5eqwFNzBMVkFzM01TOD rlmCukD6+Sq5NdvlmiFK9SoTsk9V7CF9hxsEpXwf31yhlcLh4USsTGsziVGOqmjspN8o 0Ky6X+cH1ZOM4CgbuHJIjBmRAlEsdJzbNTWMO+pHL8eK5LJOZ0KcX4fxKffaaQujOn7V WyoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=/xw1HPEE3yg6hDZiSRSN4mqpJW+LQ6wXxySnmKRFyJ8=; b=uie8DFHgysaFaBZtacoOObCeASNNw+M6I4+cWgOqSu7ve7EucI4yn/7M/xhPgHauxG KnmA/oIj4FAoHYbJjIy/J6KttA8WStPLVL0B15KXlSYihMotCscNIPPPHlmJZZKU1S4/ UmFynFfBeDyHAq71083WC5HN1BgIfIXKeXfRsM3KiiwBcuQfoawGNcHzQIeEWPvLGhbB ZMkodpf5LILHVTrzyE4zEFjK6/z8qwhnGGMOZl2gxxGlV3D1pt1IyAs29JQFbUP38YTV gXuq1oKi+6UPQrCUteq9DsB+RaekmpZa0RSSEQPKpAtdb327hNRFCZLHxM3m19s9Oxsh ajyg== X-Gm-Message-State: AOAM531/rrlpX+YoXYLvr5Uh55qGiaO41gedVe1RWmwZ3SS8SV3QcUIQ qnEboZJ3SDB+21mzMz+Ne21wI3jAzYgwmfe2GUmRAdj9/iSsOYOID7dkPPUrGheHexN2Cr1uZN6 eGtghMWzkHBZ0UQG7/wWclvXGxATYBs5PGbGcA0XAdbQFhmj8Y0rkA9+/2Q== X-Google-Smtp-Source: ABdhPJwhSdiutQtzePdJzjaNwKrgENSZ7USY45/SDq1qTsydz5Pm4qLqm2IgqkpSSQFsIfTfuENKHUrRnnM= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 
2002:a9d:7f07:: with SMTP id j7mr5952745otq.84.1631816144992; Thu, 16 Sep 2021 11:15:44 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:34 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-4-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 3/7] KVM: x86: Fix potential race in KVM_GET_CLOCK From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock outside of the pvclock sync lock. This is problematic, as the clock value written to the user may or may not actually correspond to a stable TSC. Fix the race by populating the entire kvm_clock_data structure behind the pvclock_gtod_sync_lock. Suggested-by: Sean Christopherson Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++------------- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c910cf31958f..523c4e5c109f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2793,19 +2793,20 @@ static void kvm_update_masterclock(struct kvm *kvm) kvm_end_pvclock_update(kvm); } -u64 get_kvmclock_ns(struct kvm *kvm) +static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) { struct kvm_arch *ka = &kvm->arch; struct pvclock_vcpu_time_info hv_clock; unsigned long flags; - u64 ret; spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); if (!ka->use_master_clock) { spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); - return get_kvmclock_base_ns() + ka->kvmclock_offset; + data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; + return; } + data->flags |= KVM_CLOCK_TSC_STABLE; hv_clock.tsc_timestamp = ka->master_cycle_now; hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); @@ -2817,13 +2818,26 @@ u64 get_kvmclock_ns(struct kvm *kvm) kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); - ret = __pvclock_read_cycles(&hv_clock, rdtsc()); - } else - ret = get_kvmclock_base_ns() + ka->kvmclock_offset; + data->clock = __pvclock_read_cycles(&hv_clock, rdtsc()); + } else { + data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; + } put_cpu(); +} - return ret; +u64 get_kvmclock_ns(struct kvm *kvm) +{ + struct kvm_clock_data data; + + /* + * Zero flags as it's accessed RMW, leave everything else uninitialized + * as clock is always written and no other fields are consumed. + */ + data.flags = 0; + + get_kvmclock(kvm, &data); + return data.clock; } static void kvm_setup_pvclock_page(struct kvm_vcpu *v, @@ -5832,13 +5846,9 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state) static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp) { struct kvm_clock_data data; - u64 now_ns; - - now_ns = get_kvmclock_ns(kvm); - user_ns.clock = now_ns; - user_ns.flags = kvm->arch.use_master_clock ? 
KVM_CLOCK_TSC_STABLE : 0; - memset(&user_ns.pad, 0, sizeof(user_ns.pad)); + memset(&data, 0, sizeof(data)); + get_kvmclock(kvm, &data); if (copy_to_user(argp, &data, sizeof(data))) return -EFAULT; From patchwork Thu Sep 16 18:15:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BEB8C433FE for ; Thu, 16 Sep 2021 18:56:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 83E6660724 for ; Thu, 16 Sep 2021 18:56:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244907AbhIPS54 (ORCPT ); Thu, 16 Sep 2021 14:57:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245637AbhIPS5a (ORCPT ); Thu, 16 Sep 2021 14:57:30 -0400 Received: from mail-io1-xd4a.google.com (mail-io1-xd4a.google.com [IPv6:2607:f8b0:4864:20::d4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A55DC04A158 for ; Thu, 16 Sep 2021 11:15:47 -0700 (PDT) Received: by mail-io1-xd4a.google.com with SMTP id d23-20020a056602281700b005b5b34670c7so13749461ioe.12 for ; Thu, 16 Sep 2021 11:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=krN97T8iTB8WO/XETWX7lQA+HuuTaZ6C799hkR3prSE=; b=EgpiipNExMeS0O6JHK/KOeOmoQxQDPxdhxTDLEreXijkJtqK15vvxnXOS+WpkPozOl 8Kaokdvg+5IKqrTmPoQhdzA+8+RjiHb+8l+cvULTx7gWE4ARJqzzjqmLEp9pA77qKDx4 6U9355QrUdSGV2iM7jv/yqK18b+Y5xGvyyDtvSffTu9F+4OYJ5ApqCOUalUFxjwj0c4T mOBgaqp4+wH9BA4gqrhO2Mrv0ZEheWDoVDlhcurg1TrK9g2Ny2z3fjuy+VL2nOKagqgL 5Cuq1OUPWXfMZe4uHTEuQ3t7QGhj6KcIk41CDj6ETNgAzYewx5EYTIEj9qWuHgzxRMgc 2V+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=krN97T8iTB8WO/XETWX7lQA+HuuTaZ6C799hkR3prSE=; b=4v+dEr47ycL3X9CvoPAwlpLNEJKq9X/MEW6ITeqkzmEa1mZqY5BL/iR/8Vy411xlEw Wv9b56jKoMwjVuKHw1EwGZsy8k3YsjZJQLE2V4eDGzoq/HFZGvnYGxl5DtkYh4Pi7uwa Yriwg4nYeIHrL19wb2mIRQTwngxmAk8Oat+0TSjMUtDBLxKCxlDxJqqdijCdeq9IW/mB +IMGLkr6DEMIZvuXlCrT8jseHHEoJ4LgfYiaW3QSpHzkS7FmhvymhZq+iPJfUMwNy81k BGeLxba4sQRGFgTuGO4tmktet8OC1gRO0Gy1R9h9B3sljppVCnY8QPvM45QLyK25ES+u RDkw== X-Gm-Message-State: AOAM533PeFvNKoIsEV70Z72m6NR13l4k25GZl4pTXuUDvLF33CJfBi+T dK2Dh/44t6vnWOwUS8jXg85uUbPBJq06ansQ0L2FxVG3dCpRECpHMrWGbgoIo/pxL46qT42EIbL m36OrfTEon2FEibKY4Pb4IuHiLNQ/eWo5Du9fdvNLF2pQnLNVJOxRfsuSoA== X-Google-Smtp-Source: ABdhPJxzOT8GhRabWgsG2ePckhnurZrQANgwlQ79Q3hcycMD98sVM5endWV3vInm9fMl1vqXDCA1rX+7OQ0= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6e02:1564:: with SMTP id k4mr5041760ilu.146.1631816146209; Thu, 16 Sep 2021 11:15:46 -0700 (PDT) Date: Thu, 16 
Sep 2021 18:15:35 +0000
In-Reply-To: <20210916181538.968978-1-oupton@google.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Mime-Version: 1.0
References: <20210916181538.968978-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog
Subject: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
    Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
    Raghavendra Rao Anata, James Morse, Alexandru Elisei,
    Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
    Andrew Jones, Will Deacon, Catalin Marinas, Oliver Upton
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a
(TSC, realtime) clock pair for a single instant in time. In lieu of a
more convenient facility, KVM can report similar information in the
kvm_clock structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
 include/uapi/linux/kvm.h        |  7 +++++-
 4 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a6729c8cf063..d0b9c986cf6c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called. If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE. If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME. If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC. If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
:: struct kvm_clock_data { __u64 clock; /* kvmclock current value */ __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter. In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios such as migration. +FLAGS: + +KVM_CLOCK_REALTIME. If set, KVM will compare the value of the `realtime` field +with the value of the host's real time clocksource at the instant when +KVM_SET_CLOCK was called. The difference in elapsed time is added to the final +kvmclock value that will be provided to guests. + :: struct kvm_clock_data { __u64 clock; /* kvmclock current value */ __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index be6805fc0260..9c34b5b63e39 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void); int alloc_all_memslots_rmaps(struct kvm *kvm); +#define KVM_CLOCK_VALID_FLAGS \ + (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC) + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 523c4e5c109f..cb5d5cad5124 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) get_cpu(); if (__this_cpu_read(cpu_tsc_khz)) { +#ifdef CONFIG_X86_64 + struct timespec64 ts; + + if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) { + data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec; + data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC; + } else +#endif + data->host_tsc = rdtsc(); + kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); - data->clock = __pvclock_read_cycles(&hv_clock, rdtsc()); + data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc); } else { data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; } @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_SYNC_X86_VALID_FIELDS; break; case KVM_CAP_ADJUST_CLOCK: - r = KVM_CLOCK_TSC_STABLE; + r = KVM_CLOCK_VALID_FLAGS; break; case KVM_CAP_X86_DISABLE_EXITS: r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE | @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) { struct kvm_arch *ka = &kvm->arch; struct kvm_clock_data data; - u64 now_ns; + u64 now_raw_ns; if (copy_from_user(&data, argp, sizeof(data))) return -EFAULT; - if (data.flags) + if (data.flags & ~KVM_CLOCK_REALTIME) return -EINVAL; kvm_hv_invalidate_tsc_page(kvm); @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) * is slightly ahead) here we risk going negative on unsigned * 'system_time' when 'data.clock' is very small. */ - if (kvm->arch.use_master_clock) - now_ns = ka->master_kernel_ns; + if (data.flags & KVM_CLOCK_REALTIME) { + u64 now_real_ns = ktime_get_real_ns(); + + /* + * Avoid stepping the kvmclock backwards. 
+ */ + if (now_real_ns > data.realtime) + data.clock += now_real_ns - data.realtime; + } + + if (ka->use_master_clock) + now_raw_ns = ka->master_kernel_ns; else - now_ns = get_kvmclock_base_ns(); - ka->kvmclock_offset = data.clock - now_ns; + now_raw_ns = get_kvmclock_base_ns(); + ka->kvmclock_offset = data.clock - now_raw_ns; kvm_end_pvclock_update(kvm); return 0; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index a067410ebea5..d228bf394465 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1223,11 +1223,16 @@ struct kvm_irqfd { /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */ #define KVM_CLOCK_TSC_STABLE 2 +#define KVM_CLOCK_REALTIME (1 << 2) +#define KVM_CLOCK_HOST_TSC (1 << 3) struct kvm_clock_data { __u64 clock; __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; /* For KVM_CAP_SW_TLB */ From patchwork Thu Sep 16 18:15:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500029 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83E09C433EF for ; Thu, 16 Sep 2021 18:56:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69C3B60E52 for ; Thu, 16 Sep 2021 18:56:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230452AbhIPS56 (ORCPT ); Thu, 16 Sep 2021 14:57:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243594AbhIPS5a (ORCPT ); Thu, 16 Sep 2021 14:57:30 -0400 Received: from mail-qk1-x749.google.com (mail-qk1-x749.google.com [IPv6:2607:f8b0:4864:20::749]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 740A1C05BD0D for ; Thu, 16 Sep 2021 11:15:48 -0700 (PDT) Received: by mail-qk1-x749.google.com with SMTP id c27-20020a05620a165b00b003d3817c7c23so44672205qko.16 for ; Thu, 16 Sep 2021 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=RkkbGR7s65RKrojatUhUcN7AGOVGILJpEmrP2I02yFk=; b=XdE2G/H9rYPulSHBgAfwPKILoMSC5zHl88dGQYBg/i+k3tK/Lp6vKr7uKIi1bUdViU nO6mVeTKlL84xNt3/RjeQER4b5z04fYNOneo+sSl8hyWRLgRWx95iUtG4Rob9UELNUs5 49ZPhvIMW2J95O8+ZRMZ3wDpTpXFprQFUHnOo4v6XjG42qe48CzDlEWHDnLjkorS+fCE 2ennJ2nPGZu9b8rVUjiN2JEl+neb/jih0MVrD8xU0NtCs3//Q3k6ELLj1/lJZgN7Ksp/ Lpds1Nu+kE2B0NxTAYgjUdDyF4jGzjT36AYL5urhrF1eOhrQKfi0tf3X+b8/KR2bkpcx jocg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=RkkbGR7s65RKrojatUhUcN7AGOVGILJpEmrP2I02yFk=; b=qRfUbmAb2QIhFHnc9Lj/76RINq0vAf24fsbPycxiPodCermqnABAmjWI5w+87tso3H pB51/Xz7DDR7/Jju1eD7zXZgSCCov0KA2F4hF0rUZ4HoQ8wWVyMNMcuErrAkBOLJx7qM 
GNgskQuFpXIgP4euC0+yVLOtwP7dsTz3lOe7UPvPrLovsB/QYmaIAY0V0ZjXZ7EqPqQ8 NSV3AP9erbhJ0bDui/bU77xK6GUkflpVAuGSTfjwZtM76N7VWgVDwe6Ze5CqutYNLM9w xZQnluhpgfQ0CwQLRZXLQMkyFsUNnmTPwRonLH52Y7xtRAXLY3oWBKgb/zIiiGT6xfmo XsSQ== X-Gm-Message-State: AOAM530f/UYRu2kc8dmTq1etO088ae27kT/40gvrmbeBVCpgUQhXKFME PLjHbNVDYKytFNBekfC3IJWP0MQVzew5xP4wd7bHa14sg6IfQtF+OH/k8fAOkC0vezVZB+yuDjv 7JMxjfq3+x8rdkmEG+YIkpVvzdGJgbHg5WcISqwUfie9peDPMtlqUyQ96VA== X-Google-Smtp-Source: ABdhPJz+iJU5f6z/grqrVZ21yLMPrc5CcINLjptZt9ThEios+Cu4etZJHmTZs6+key3lJTsf+GErso+7PVs= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a05:6214:3ca:: with SMTP id ce10mr6813032qvb.12.1631816147367; Thu, 16 Sep 2021 11:15:47 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:36 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-6-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 5/7] kvm: x86: protect masterclock with a seqcount From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Paolo Bonzini Protect the reference point for kvmclock with a seqcount, so that kvmclock updates for all vCPUs can proceed in parallel. Xen runstate updates will also run in parallel and not bounce the kvmclock cacheline. nr_vcpus_matched_tsc is updated outside pvclock_update_vm_gtod_copy though, so a spinlock must be kept for that one. Signed-off-by: Paolo Bonzini [Oliver - drop unused locals, don't double acquire tsc_write_lock] Signed-off-by: Oliver Upton --- arch/x86/include/asm/kvm_host.h | 7 ++- arch/x86/kvm/x86.c | 83 +++++++++++++++++---------------- 2 files changed, 49 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9c34b5b63e39..5accfe7246ce 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1087,6 +1087,11 @@ struct kvm_arch { unsigned long irq_sources_bitmap; s64 kvmclock_offset; + + /* + * This also protects nr_vcpus_matched_tsc which is read from a + * preemption-disabled region, so it must be a raw spinlock. 
+ */ raw_spinlock_t tsc_write_lock; u64 last_tsc_nsec; u64 last_tsc_write; @@ -1097,7 +1102,7 @@ struct kvm_arch { u64 cur_tsc_generation; int nr_vcpus_matched_tsc; - spinlock_t pvclock_gtod_sync_lock; + seqcount_raw_spinlock_t pvclock_sc; bool use_master_clock; u64 master_kernel_ns; u64 master_cycle_now; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cb5d5cad5124..29156c49cd11 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2533,9 +2533,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; kvm_vcpu_write_tsc_offset(vcpu, offset); - raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { kvm->arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { @@ -2543,7 +2541,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) } kvm_track_tsc_matching(vcpu); - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, @@ -2731,9 +2729,6 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) int vclock_mode; bool host_tsc_clocksource, vcpus_matched; - vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == - atomic_read(&kvm->online_vcpus)); - /* * If the host uses TSC clock, then passthrough TSC as stable * to the guest. @@ -2742,6 +2737,10 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) &ka->master_kernel_ns, &ka->master_cycle_now); + lockdep_assert_held(&kvm->arch.tsc_write_lock); + vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == + atomic_read(&kvm->online_vcpus)); + ka->use_master_clock = host_tsc_clocksource && vcpus_matched && !ka->backwards_tsc_observed && !ka->boot_vcpu_runs_old_kvmclock; @@ -2760,14 +2759,18 @@ static void kvm_make_mclock_inprogress_request(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS); } -static void kvm_start_pvclock_update(struct kvm *kvm) +static void __kvm_start_pvclock_update(struct kvm *kvm) { - struct kvm_arch *ka = &kvm->arch; + raw_spin_lock_irq(&kvm->arch.tsc_write_lock); + write_seqcount_begin(&kvm->arch.pvclock_sc); +} +static void kvm_start_pvclock_update(struct kvm *kvm) +{ kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ - spin_lock_irq(&ka->pvclock_gtod_sync_lock); + __kvm_start_pvclock_update(kvm); } static void kvm_end_pvclock_update(struct kvm *kvm) @@ -2776,7 +2779,8 @@ static void kvm_end_pvclock_update(struct kvm *kvm) struct kvm_vcpu *vcpu; int i; - spin_unlock_irq(&ka->pvclock_gtod_sync_lock); + write_seqcount_end(&ka->pvclock_sc); + raw_spin_unlock_irq(&ka->tsc_write_lock); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -2797,20 +2801,12 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) { struct kvm_arch *ka = &kvm->arch; struct pvclock_vcpu_time_info hv_clock; - unsigned long flags; - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); if (!ka->use_master_clock) { - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; return; } - data->flags |= KVM_CLOCK_TSC_STABLE; - hv_clock.tsc_timestamp = ka->master_cycle_now; - hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); - /* both __this_cpu_read() and rdtsc() should be on the same cpu */ 
get_cpu(); @@ -2825,6 +2821,9 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) #endif data->host_tsc = rdtsc(); + data->flags |= KVM_CLOCK_TSC_STABLE; + hv_clock.tsc_timestamp = ka->master_cycle_now; + hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); @@ -2839,14 +2838,14 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) u64 get_kvmclock_ns(struct kvm *kvm) { struct kvm_clock_data data; + struct kvm_arch *ka = &kvm->arch; + unsigned seq; - /* - * Zero flags as it's accessed RMW, leave everything else uninitialized - * as clock is always written and no other fields are consumed. - */ - data.flags = 0; - - get_kvmclock(kvm, &data); + do { + seq = read_seqcount_begin(&ka->pvclock_sc); + data.flags = 0; + get_kvmclock(kvm, &data); + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); return data.clock; } @@ -2912,6 +2911,7 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v, static int kvm_guest_time_update(struct kvm_vcpu *v) { unsigned long flags, tgt_tsc_khz; + unsigned seq; struct kvm_vcpu_arch *vcpu = &v->arch; struct kvm_arch *ka = &v->kvm->arch; s64 kernel_ns; @@ -2926,13 +2926,14 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) * If the host uses TSC clock, then passthrough TSC as stable * to the guest. */ - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); - use_master_clock = ka->use_master_clock; - if (use_master_clock) { - host_tsc = ka->master_cycle_now; - kernel_ns = ka->master_kernel_ns; - } - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + seq = read_seqcount_begin(&ka->pvclock_sc); + do { + use_master_clock = ka->use_master_clock; + if (use_master_clock) { + host_tsc = ka->master_cycle_now; + kernel_ns = ka->master_kernel_ns; + } + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); @@ -5855,10 +5856,15 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state) static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp) { - struct kvm_clock_data data; + struct kvm_clock_data data = { 0 }; + unsigned seq; + + do { + seq = read_seqcount_begin(&kvm->arch.pvclock_sc); + data.flags = 0; + get_kvmclock(kvm, &data); + } while (read_seqcount_retry(&kvm->arch.pvclock_sc, seq)); - memset(&data, 0, sizeof(data)); - get_kvmclock(kvm, &data); if (copy_to_user(argp, &data, sizeof(data))) return -EFAULT; @@ -8159,9 +8165,7 @@ static void kvm_hyperv_tsc_notifier(void) kvm_max_guest_tsc_khz = tsc_khz; list_for_each_entry(kvm, &vm_list, vm_list) { - struct kvm_arch *ka = &kvm->arch; - - spin_lock_irq(&ka->pvclock_gtod_sync_lock); + __kvm_start_pvclock_update(kvm); pvclock_update_vm_gtod_copy(kvm); kvm_end_pvclock_update(kvm); } @@ -11188,8 +11192,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) raw_spin_lock_init(&kvm->arch.tsc_write_lock); mutex_init(&kvm->arch.apic_map_lock); - spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); - + seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock); kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); pvclock_update_vm_gtod_copy(kvm); From patchwork Thu Sep 16 18:15:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500031 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3E95C433F5 for ; Thu, 16 Sep 2021 18:56:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B7F6960E08 for ; Thu, 16 Sep 2021 18:56:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343782AbhIPS6A (ORCPT ); Thu, 16 Sep 2021 14:58:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245671AbhIPS5a (ORCPT ); Thu, 16 Sep 2021 14:57:30 -0400 Received: from mail-qv1-xf4a.google.com (mail-qv1-xf4a.google.com [IPv6:2607:f8b0:4864:20::f4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A13AC04A159 for ; Thu, 16 Sep 2021 11:15:49 -0700 (PDT) Received: by mail-qv1-xf4a.google.com with SMTP id r18-20020a056214069200b0037a291a6081so63338396qvz.18 for ; Thu, 16 Sep 2021 11:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=kz3OtxRFyvuoMc+GBdo3WelMyguFcbw6nvQK4NKbTKo=; b=WGjrebjRTcERQwIM8hSLM7Nl5SVAOazwDndoUi9mjyNGkCx02qOHis4a1hPszhq0yW Hpks5WNyxwB/kMmupaieFv3feod8Wdj0B75fYzVvYKmlfFMCSpPeDxRIS+Aa/ODBqSIE hpy6lg5Dx2K7qebKQQ13Fn6X7I55PjoakM4zKMArOaD14HZ/KFbrRTx+vhAhJBeFR4YI rJ04K4maFwajOHK2c7o8CmX2izEPYvVchseyrggB+NJRzs7hghKXo7nPzRWu4qIPajs7 vIDtLSVoJTB9QMBi2AJIevhOF/EgVIob83DLDBKfYj2DEaTzF1zs3va4+qd8WE3NeXba uxKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=kz3OtxRFyvuoMc+GBdo3WelMyguFcbw6nvQK4NKbTKo=; b=kfSX7ESuB9H/eQgb2hcbWUKFPGQfpnfScrOB8KmQewVrBhNp/F4PLWtW/W5JOa/axM Wo42CrGQmPfuyT3qDAkHA3jSEQ9UtSevYw8lBwFq0v0na/k/sjWsz1tFDnOGbIkZEaOM eMJ8gU20AmopyRyl1TI/zdlO2BGLO0wyoefFIf1EntthTwbXv1oortTmpJUr2R5cRIpS CTcn9MQG9Io9vbCzbKCMw7cYF2wRJQNTNy0LUz3Nvacyi8qO029pqu8KGwitJ4HX0rZB RemFlO3lyCOnvGrgbNjve/iKTt7m+m4qC26rjphkf66QxK7DZIlTinViKdlN17lhYUi1 BMCQ== X-Gm-Message-State: AOAM530TMu1sg4Y7BxEqBwB4rr3jOZc3PW4Z4lWsbH+TStxh+FtPTb54 eCilPwsC0nwcHGWyvfHRb5aoAMFNYb4trBOcF/UxljrWWZP5i2wOnr5i/HBMTaw5aUg88m7BspV GvcSXc4tQN25zoE+rHqdS5hm9melrVjP9QTNPVtunLQaEOhOGtHQ7LexeNw== X-Google-Smtp-Source: ABdhPJxkIq2pjE4hQkHq1iuBeO8+WA+QPDKfz8Iecr8PSvZkKcaqo8w22TSP1+JhIpwxYCAaCvA+D8YErlY= X-Received: from oupton.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:404]) (user=oupton job=sendgmr) by 2002:a25:1345:: with SMTP id 66mr8429706ybt.502.1631816148619; Thu, 16 Sep 2021 11:15:48 -0700 (PDT) Date: Thu, 16 Sep 2021 18:15:37 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-7-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 6/7] KVM: x86: Refactor tsc synchronization code From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter 
Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Refactor kvm_synchronize_tsc to make a new function that allows callers to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly for the sake of participating in TSC synchronization. Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 100 ++++++++++++++++++++++++++------------------- 1 file changed, 58 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 29156c49cd11..1ea65bb2e74d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2447,13 +2447,68 @@ static inline bool kvm_check_tsc_unstable(void) return check_tsc_unstable(); } +/* + * Infers attempts to synchronize the guest's tsc from host writes. Sets the + * offset for the vcpu and tracks the TSC matching generation that the vcpu + * participates in. + */ +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, + u64 ns, bool matched) +{ + struct kvm *kvm = vcpu->kvm; + bool already_matched; + + lockdep_assert_held(&kvm->arch.tsc_write_lock); + + already_matched = + (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); + + /* + * We also track th most recent recorded KHZ, write and time to + * allow the matching interval to be extended at each write. + */ + kvm->arch.last_tsc_nsec = ns; + kvm->arch.last_tsc_write = tsc; + kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; + + vcpu->arch.last_guest_tsc = tsc; + + /* Keep track of which generation this VCPU has synchronized to */ + vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; + vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; + vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; + + kvm_vcpu_write_tsc_offset(vcpu, offset); + + if (!matched) { + /* + * We split periods of matched TSC writes into generations. + * For each generation, we track the original measured + * nanosecond time, offset, and write, so if TSCs are in + * sync, we can match exact offset, and if not, we can match + * exact software computation in compute_guest_tsc() + * + * These values are tracked in kvm->arch.cur_xxx variables. + */ + kvm->arch.cur_tsc_generation++; + kvm->arch.cur_tsc_nsec = ns; + kvm->arch.cur_tsc_write = tsc; + kvm->arch.cur_tsc_offset = offset; + + kvm->arch.nr_vcpus_matched_tsc = 0; + } else if (!already_matched) { + kvm->arch.nr_vcpus_matched_tsc++; + } + + kvm_track_tsc_matching(vcpu); +} + static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) { struct kvm *kvm = vcpu->kvm; u64 offset, ns, elapsed; unsigned long flags; - bool matched; - bool already_matched; + bool matched = false; bool synchronizing = false; raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); @@ -2499,48 +2554,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) offset = kvm_compute_l1_tsc_offset(vcpu, data); } matched = true; - already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); - } else { - /* - * We split periods of matched TSC writes into generations. - * For each generation, we track the original measured - * nanosecond time, offset, and write, so if TSCs are in - * sync, we can match exact offset, and if not, we can match - * exact software computation in compute_guest_tsc() - * - * These values are tracked in kvm->arch.cur_xxx variables. 
- */ - kvm->arch.cur_tsc_generation++; - kvm->arch.cur_tsc_nsec = ns; - kvm->arch.cur_tsc_write = data; - kvm->arch.cur_tsc_offset = offset; - matched = false; } - /* - * We also track th most recent recorded KHZ, write and time to - * allow the matching interval to be extended at each write. - */ - kvm->arch.last_tsc_nsec = ns; - kvm->arch.last_tsc_write = data; - kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; - - vcpu->arch.last_guest_tsc = data; - - /* Keep track of which generation this VCPU has synchronized to */ - vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; - vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; - vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; - - kvm_vcpu_write_tsc_offset(vcpu, offset); - - if (!matched) { - kvm->arch.nr_vcpus_matched_tsc = 0; - } else if (!already_matched) { - kvm->arch.nr_vcpus_matched_tsc++; - } - - kvm_track_tsc_matching(vcpu); + __kvm_synchronize_tsc(vcpu, offset, data, ns, matched); raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } From patchwork Thu Sep 16 18:15:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12500033 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F559C433EF for ; Thu, 16 Sep 2021 18:56:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ECB3F60E54 for ; Thu, 16 Sep 2021 18:56:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343589AbhIPS6B (ORCPT ); Thu, 16 Sep 2021 14:58:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234301AbhIPS5a (ORCPT ); Thu, 16 Sep 2021 14:57:30 -0400 Received: from mail-qv1-xf4a.google.com (mail-qv1-xf4a.google.com [IPv6:2607:f8b0:4864:20::f4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA5CEC04A15A for ; Thu, 16 Sep 2021 11:15:50 -0700 (PDT) Received: by mail-qv1-xf4a.google.com with SMTP id p12-20020ad4496c000000b0037a535cb8b2so63311332qvy.15 for ; Thu, 16 Sep 2021 11:15:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=VH3IppIPyJlk4geF0mSGYs2zxrFbb9GgoOtLmZm5/FE=; b=Vo12tVG3RtAsNDi6OV+F7wIMBXsbuG2RxD+tMf8urHKLJkoeF18pDK2CPPZTQkJBcL 2Lw388abngZfO0pIbiSW8ZcNRvyKq10NftP2ptQskf10hUrglOX2PmHXrvPS+OE/Kqs/ kAtvXBz609yV4sGic0f2pViCVPp/SVveee+DXedXCDIrR08cQdflIP2426oys4UkiloS w4YOIsa+x8H7Z3/C7b8ZuRpNyV364AAshpwVVYK5Ylp8nXSrE/erz52mGfnxoggGDSro 6w+z4T6zxfrjDwejeOtUAXI2kPK6bJ6Hht9x6+Vf+RBgR9x1eGXhY8waT0lVBOOZaPEP qSGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=VH3IppIPyJlk4geF0mSGYs2zxrFbb9GgoOtLmZm5/FE=; 
Date: Thu, 16 Sep 2021 18:15:38 +0000 In-Reply-To: <20210916181538.968978-1-oupton@google.com> Message-Id: <20210916181538.968978-8-oupton@google.com> Mime-Version: 1.0 References: <20210916181538.968978-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.464.g1972c5931b-goog Subject: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org To date, VMM-directed TSC synchronization and migration have been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused. Cc: David Matlack Cc: Sean Christopherson Signed-off-by: Oliver Upton --- Documentation/virt/kvm/devices/vcpu.rst | 57 ++++++++++++ arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 4 + arch/x86/kvm/x86.c | 110 ++++++++++++++++++++++++ 4 files changed, 172 insertions(+) diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 2acec3b9ef65..3b399d727c11 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The base address must be 64 byte aligned and exist within a valid guest memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information including the layout of the stolen time structure. + +4. GROUP: KVM_VCPU_TSC_CTRL +=========================== + +:Architectures: x86 + +4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET + +:Parameters: 64-bit unsigned TSC offset + +Returns: + + ======= ====================================== + -EFAULT Error reading/writing the provided + parameter address.
+ -ENXIO Attribute not supported + ======= ====================================== + +Specifies the guest's TSC offset relative to the host's TSC. The guest's +TSC is then derived by the following equation: + + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET + +This attribute is useful for the precise migration of a guest's TSC. The +following describes a possible algorithm to use for the migration of a +guest's TSC: + +From the source VMM process: + +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + +2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the + guest TSC offset (off_n). + +3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the + guest's TSC (freq). + +From the destination VMM process: + +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. + +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and + kvmclock nanoseconds (k_1). + +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 + +7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the + respective value derived in the previous step. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5accfe7246ce..09c678f2e616 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1096,6 +1096,7 @@ struct kvm_arch { u64 last_tsc_nsec; u64 last_tsc_write; u32 last_tsc_khz; + u64 last_tsc_offset; u64 cur_tsc_nsec; u64 cur_tsc_write; u64 cur_tsc_offset; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 2ef1f6513c68..5a776a08f78c 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -504,4 +504,8 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */ +#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ +#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1ea65bb2e74d..1177604c805a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2470,6 +2470,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, kvm->arch.last_tsc_nsec = ns; kvm->arch.last_tsc_write = tsc; kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; + kvm->arch.last_tsc_offset = offset; vcpu->arch.last_guest_tsc = tsc; @@ -4069,6 +4070,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: case KVM_CAP_SREGS2: case KVM_CAP_EXIT_ON_EMULATION_FAILURE: + case KVM_CAP_VCPU_ATTRIBUTES: r = 1; break; case KVM_CAP_EXIT_HYPERCALL: @@ -4933,6 +4935,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) return 0; } +static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: + r = 0; + break; + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr 
*attr) +{ + u64 __user *uaddr = (u64 __user *)attr->addr; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: + r = -EFAULT; + if (put_user(vcpu->arch.l1_tsc_offset, uaddr)) + break; + r = 0; + break; + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + u64 __user *uaddr = (u64 __user *)attr->addr; + struct kvm *kvm = vcpu->kvm; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: { + u64 offset, tsc, ns; + unsigned long flags; + bool matched; + + r = -EFAULT; + if (get_user(offset, uaddr)) + break; + + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); + + matched = (vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_offset == offset); + + tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset; + ns = get_kvmclock_base_ns(); + + __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); + + r = 0; + break; + } + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu, + unsigned int ioctl, + void __user *argp) +{ + struct kvm_device_attr attr; + int r; + + if (copy_from_user(&attr, argp, sizeof(attr))) + return -EFAULT; + + if (attr.group != KVM_VCPU_TSC_CTRL) + return -ENXIO; + + switch (ioctl) { + case KVM_HAS_DEVICE_ATTR: + r = kvm_arch_tsc_has_attr(vcpu, &attr); + break; + case KVM_GET_DEVICE_ATTR: + r = kvm_arch_tsc_get_attr(vcpu, &attr); + break; + case KVM_SET_DEVICE_ATTR: + r = kvm_arch_tsc_set_attr(vcpu, &attr); + break; + } + + return r; +} + static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, struct kvm_enable_cap *cap) { @@ -5387,6 +5492,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = __set_sregs2(vcpu, u.sregs2); break; } + case KVM_HAS_DEVICE_ATTR: + case KVM_GET_DEVICE_ATTR: + case KVM_SET_DEVICE_ATTR: + r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); + break; default: r = -EINVAL; }