From patchwork Mon Aug 16 00:11:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver Upton
X-Patchwork-Id: 12437457
Date: Mon, 16 Aug 2021 00:11:25 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-2-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog
Subject: [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse,
	Alexandru Elisei, Suzuki K Poulose,
	linux-arm-kernel@lists.infradead.org, Andrew Jones, Will Deacon,
	Catalin Marinas, Oliver Upton
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
outside of the pvclock sync lock. This is problematic, as the clock
value written to the user may or may not actually correspond to a stable
TSC.

Fix the race by populating the entire kvm_clock_data structure behind
the pvclock_gtod_sync_lock.

Suggested-by: Sean Christopherson
Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc0c18339fb..2f3929bd5f58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2787,19 +2787,20 @@ static void kvm_update_masterclock(struct kvm *kvm)
 	kvm_end_pvclock_update(kvm);
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
-	u64 ret;
 
 	spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
 	if (!ka->use_master_clock) {
 		spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-		return get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		return;
 	}
 
+	data->flags |= KVM_CLOCK_TSC_STABLE;
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
 	spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2811,13 +2812,26 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		ret = __pvclock_read_cycles(&hv_clock, rdtsc());
-	} else
-		ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+	} else {
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+	}
 
 	put_cpu();
+}
 
-	return ret;
+u64 get_kvmclock_ns(struct kvm *kvm)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it's accessed RMW, leave everything else uninitialized
+	 * as clock is always written and no other fields are consumed.
+	 */
+	data.flags = 0;
+
+	get_kvmclock(kvm, &data);
+	return data.clock;
 }
 
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
@@ -6098,11 +6112,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 	case KVM_GET_CLOCK: {
 		struct kvm_clock_data user_ns;
-		u64 now_ns;
 
-		now_ns = get_kvmclock_ns(kvm);
-		user_ns.clock = now_ns;
-		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
+		/*
+		 * Zero flags as it is accessed RMW, leave everything else
+		 * uninitialized as clock is always written and no other fields
+		 * are consumed.
+		 */
+		user_ns.flags = 0;
+		get_kvmclock(kvm, &user_ns);
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;

From patchwork Mon Aug 16 00:11:26 2021
X-Patchwork-Id: 12437459
Date: Mon, 16 Aug 2021 00:11:26 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-3-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
Subject: [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
	Andrew Jones, Will Deacon, Catalin Marinas, Oliver Upton

Wrap the existing implementation of the KVM_{GET,SET}_CLOCK ioctls in
helper methods.

No functional change intended.

Signed-off-by: Oliver Upton
---
 arch/x86/kvm/x86.c | 107 ++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f3929bd5f58..39eaa2fb2001 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5833,12 +5833,65 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 }
 #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */
 
+static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it is accessed RMW, leave everything else
+	 * uninitialized as clock is always written and no other fields
+	 * are consumed.
+	 */
+	data.flags = 0;
+	get_kvmclock(kvm, &data);
+	memset(&data.pad, 0, sizeof(data.pad));
+
+	if (copy_to_user(argp, &data, sizeof(data)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_clock_data data;
+	u64 now_ns;
+
+	if (copy_from_user(&data, argp, sizeof(data)))
+		return -EFAULT;
+
+	if (data.flags)
+		return -EINVAL;
+
+	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_start_pvclock_update(kvm);
+	pvclock_update_vm_gtod_copy(kvm);
+
+	/*
+	 * This pairs with kvm_guest_time_update(): when masterclock is
+	 * in use, we use master_kernel_ns + kvmclock_offset to set
+	 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
+	 * is slightly ahead) here we risk going negative on unsigned
+	 * 'system_time' when 'data.clock' is very small.
+	 */
+	if (kvm->arch.use_master_clock)
+		now_ns = ka->master_kernel_ns;
+	else
+		now_ns = get_kvmclock_base_ns();
+	ka->kvmclock_offset = data.clock - now_ns;
+	kvm_end_pvclock_update(kvm);
+
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	int r = -ENOTTY;
+
 	/*
 	 * This union makes it completely explicit to gcc-3.x
 	 * that these two variables' stack usage should be
@@ -6076,58 +6129,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
-	case KVM_SET_CLOCK: {
-		struct kvm_arch *ka = &kvm->arch;
-		struct kvm_clock_data user_ns;
-		u64 now_ns;
-
-		r = -EFAULT;
-		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
-			goto out;
-
-		r = -EINVAL;
-		if (user_ns.flags)
-			goto out;
-
-		r = 0;
-
-		kvm_hv_invalidate_tsc_page(kvm);
-		kvm_start_pvclock_update(kvm);
-		pvclock_update_vm_gtod_copy(kvm);
-
-		/*
-		 * This pairs with kvm_guest_time_update(): when masterclock is
-		 * in use, we use master_kernel_ns + kvmclock_offset to set
-		 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
-		 * is slightly ahead) here we risk going negative on unsigned
-		 * 'system_time' when 'user_ns.clock' is very small.
-		 */
-		if (kvm->arch.use_master_clock)
-			now_ns = ka->master_kernel_ns;
-		else
-			now_ns = get_kvmclock_base_ns();
-		ka->kvmclock_offset = user_ns.clock - now_ns;
-		kvm_end_pvclock_update(kvm);
+	case KVM_SET_CLOCK:
+		r = kvm_vm_ioctl_set_clock(kvm, argp);
 		break;
-	}
-	case KVM_GET_CLOCK: {
-		struct kvm_clock_data user_ns;
-
-		/*
-		 * Zero flags as it is accessed RMW, leave everything else
-		 * uninitialized as clock is always written and no other fields
-		 * are consumed.
-		 */
-		user_ns.flags = 0;
-		get_kvmclock(kvm, &user_ns);
-		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
-
-		r = -EFAULT;
-		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			goto out;
-		r = 0;
+	case KVM_GET_CLOCK:
+		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
-	}
 	case KVM_MEMORY_ENCRYPT_OP: {
 		r = -ENOTTY;
 		if (kvm_x86_ops.mem_enc_op)

From patchwork Mon Aug 16 00:11:27 2021
X-Patchwork-Id: 12437463
Date: Mon, 16 Aug 2021 00:11:27 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-4-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
Subject: [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
	Andrew Jones, Will Deacon, Catalin Marinas, Oliver Upton

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
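
For illustration only (not part of the patch): the forward-only adjustment
described above can be sketched as a small standalone C function. The
name advance_kvmclock() and its parameter names are hypothetical; the
sketch assumes all three values are nanosecond counts and mirrors the
KVM_CLOCK_REALTIME branch that this patch adds to the KVM_SET_CLOCK path.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * clock_ns:          kvmclock value saved via KVM_GET_CLOCK
 * saved_realtime_ns: host realtime value saved alongside it
 * now_realtime_ns:   host realtime when KVM_SET_CLOCK is processed
 */
static uint64_t advance_kvmclock(uint64_t clock_ns,
				 uint64_t saved_realtime_ns,
				 uint64_t now_realtime_ns)
{
	/*
	 * Add the elapsed realtime only when time moved forward, so the
	 * resulting kvmclock value is never smaller than the saved one.
	 */
	if (now_realtime_ns > saved_realtime_ns)
		clock_ns += now_realtime_ns - saved_realtime_ns;

	return clock_ns;
}
```

If the destination host's realtime clock is behind the saved value, the
saved kvmclock value is used unchanged rather than being reduced.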

Suggested-by: Paolo Bonzini
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 34 ++++++++++++++++++--------
 include/uapi/linux/kvm.h        |  7 +++++-
 4 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86d7ad3a126c..b3d12bf9fbf5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called. If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE. If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME. If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC. If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
 	__u64 clock; /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REALTIME. If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+
 ::
 
   struct kvm_clock_data {
 	__u64 clock; /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..7fad2615f4a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1916,4 +1916,7 @@ int kvm_cpu_dirty_log_size(void);
 
 int alloc_all_memslots_rmaps(struct kvm *kvm);
 
+#define KVM_CLOCK_VALID_FLAGS						\
+	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39eaa2fb2001..b1e9a4885be6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2809,10 +2809,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 	get_cpu();
 
 	if (__this_cpu_read(cpu_tsc_khz)) {
+#ifdef CONFIG_X86_64
+		struct timespec64 ts;
+
+		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
+			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
+			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
+		} else
+#endif
+		data->host_tsc = rdtsc();
+
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
 	} else {
 		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
 	}
@@ -4052,7 +4062,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_SYNC_X86_VALID_FIELDS;
 		break;
 	case KVM_CAP_ADJUST_CLOCK:
-		r = KVM_CLOCK_TSC_STABLE;
+		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
@@ -5837,14 +5847,8 @@ static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_clock_data data;
 
-	/*
-	 * Zero flags as it is accessed RMW, leave everything else
-	 * uninitialized as clock is always written and no other fields
-	 * are consumed.
-	 */
-	data.flags = 0;
+	memset(&data, 0, sizeof(data));
 	get_kvmclock(kvm, &data);
-	memset(&data.pad, 0, sizeof(data.pad));
 
 	if (copy_to_user(argp, &data, sizeof(data)))
 		return -EFAULT;
@@ -5861,13 +5865,23 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(&data, argp, sizeof(data)))
 		return -EFAULT;
 
-	if (data.flags)
+	if (data.flags & ~KVM_CLOCK_REALTIME)
 		return -EINVAL;
 
 	kvm_hv_invalidate_tsc_page(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 
+	if (data.flags & KVM_CLOCK_REALTIME) {
+		u64 now_real_ns = ktime_get_real_ns();
+
+		/*
+		 * Avoid stepping the kvmclock backwards.
+		 */
+		if (now_real_ns > data.realtime)
+			data.clock += now_real_ns - data.realtime;
+	}
+
 	/*
 	 * This pairs with kvm_guest_time_update(): when masterclock is
 	 * in use, we use master_kernel_ns + kvmclock_offset to set
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..d228bf394465 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1223,11 +1223,16 @@ struct kvm_irqfd {
 
 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */
 #define KVM_CLOCK_TSC_STABLE		2
+#define KVM_CLOCK_REALTIME		(1 << 2)
+#define KVM_CLOCK_HOST_TSC		(1 << 3)
 
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
 };
 
 /* For KVM_CAP_SW_TLB */

From patchwork Mon Aug 16 00:11:28 2021
X-Patchwork-Id: 12437461
Date: Mon, 16 Aug 2021 00:11:28 +0000
In-Reply-To: <20210816001130.3059564-1-oupton@google.com>
Message-Id: <20210816001130.3059564-5-oupton@google.com>
References: <20210816001130.3059564-1-oupton@google.com>
Subject: [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
From: Oliver Upton
To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu
Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel@lists.infradead.org,
	Andrew Jones, Will Deacon, Catalin Marinas, Oliver Upton

A later change requires that the pvclock sync lock be taken while
holding the tsc_write_lock. Change the locking in kvm_synchronize_tsc()
to meet that requirement now, so that the locking change is isolated in
its own commit.

Cc: Sean Christopherson
Signed-off-by: Oliver Upton
---
 Documentation/virt/kvm/locking.rst | 11 +++++++++++
 arch/x86/kvm/x86.c                 |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..0bf346adac2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -36,6 +36,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -222,6 +225,14 @@ time it will be set using the Dirty tracking mechanism described above.
:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt migration. +:Name: kvm_arch::pvclock_gtod_sync_lock +:Type: raw_spinlock_t +:Arch: x86 +:Protects: kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write, + cur_tsc_offset,nr_vcpus_matched_tsc} +:Comment: 'raw' because updating the kvm master clock must not be + preempted. + :Name: kvm_arch::tsc_write_lock :Type: raw_spinlock :Arch: x86 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b1e9a4885be6..f1434cd388b9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2533,7 +2533,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; kvm_vcpu_write_tsc_offset(vcpu, offset); - raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { @@ -2544,6 +2543,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) kvm_track_tsc_matching(vcpu); spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
From patchwork Mon Aug 16 00:11:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12437465
Date: Mon, 16 Aug 2021 00:11:29 +0000 In-Reply-To: <20210816001130.3059564-1-oupton@google.com> Message-Id: <20210816001130.3059564-6-oupton@google.com> Mime-Version: 1.0 References: <20210816001130.3059564-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog Subject: [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Refactor kvm_synchronize_tsc to make a new function that allows callers to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly for the sake of participating in TSC synchronization.
Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 105 ++++++++++++++++++++++++++------------------- 1 file changed, 61 insertions(+), 44 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f1434cd388b9..9d0445527dad 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2447,13 +2447,71 @@ static inline bool kvm_check_tsc_unstable(void) return check_tsc_unstable(); } +/* + * Infers attempts to synchronize the guest's tsc from host writes. Sets the + * offset for the vcpu and tracks the TSC matching generation that the vcpu + * participates in. + */ +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, + u64 ns, bool matched) +{ + struct kvm *kvm = vcpu->kvm; + bool already_matched; + + lockdep_assert_held(&kvm->arch.tsc_write_lock); + + already_matched = + (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); + + /* + * We track the most recent recorded KHZ, write and time to + * allow the matching interval to be extended at each write. + */ + kvm->arch.last_tsc_nsec = ns; + kvm->arch.last_tsc_write = tsc; + kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; + + vcpu->arch.last_guest_tsc = tsc; + + /* Keep track of which generation this VCPU has synchronized to */ + vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; + vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; + vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; + + kvm_vcpu_write_tsc_offset(vcpu, offset); + + if (!matched) { + /* + * We split periods of matched TSC writes into generations. + * For each generation, we track the original measured + * nanosecond time, offset, and write, so if TSCs are in + * sync, we can match exact offset, and if not, we can match + * exact software computation in compute_guest_tsc() + * + * These values are tracked in kvm->arch.cur_xxx variables. 
+ */ + kvm->arch.cur_tsc_generation++; + kvm->arch.cur_tsc_nsec = ns; + kvm->arch.cur_tsc_write = tsc; + kvm->arch.cur_tsc_offset = offset; + + spin_lock(&kvm->arch.pvclock_gtod_sync_lock); + kvm->arch.nr_vcpus_matched_tsc = 0; + } else if (!already_matched) { + spin_lock(&kvm->arch.pvclock_gtod_sync_lock); + kvm->arch.nr_vcpus_matched_tsc++; + } + + kvm_track_tsc_matching(vcpu); + spin_unlock(&kvm->arch.pvclock_gtod_sync_lock); +} + static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) { struct kvm *kvm = vcpu->kvm; u64 offset, ns, elapsed; unsigned long flags; - bool matched; - bool already_matched; + bool matched = false; bool synchronizing = false; raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); @@ -2499,50 +2557,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) offset = kvm_compute_l1_tsc_offset(vcpu, data); } matched = true; - already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); - } else { - /* - * We split periods of matched TSC writes into generations. - * For each generation, we track the original measured - * nanosecond time, offset, and write, so if TSCs are in - * sync, we can match exact offset, and if not, we can match - * exact software computation in compute_guest_tsc() - * - * These values are tracked in kvm->arch.cur_xxx variables. - */ - kvm->arch.cur_tsc_generation++; - kvm->arch.cur_tsc_nsec = ns; - kvm->arch.cur_tsc_write = data; - kvm->arch.cur_tsc_offset = offset; - matched = false; } - /* - * We also track th most recent recorded KHZ, write and time to - * allow the matching interval to be extended at each write. 
- */ - kvm->arch.last_tsc_nsec = ns; - kvm->arch.last_tsc_write = data; - kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; - - vcpu->arch.last_guest_tsc = data; - - /* Keep track of which generation this VCPU has synchronized to */ - vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; - vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; - vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; - - kvm_vcpu_write_tsc_offset(vcpu, offset); - - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); - if (!matched) { - kvm->arch.nr_vcpus_matched_tsc = 0; - } else if (!already_matched) { - kvm->arch.nr_vcpus_matched_tsc++; - } - - kvm_track_tsc_matching(vcpu); - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + __kvm_synchronize_tsc(vcpu, offset, data, ns, matched); raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } From patchwork Mon Aug 16 00:11:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12437467
Date: Mon, 16 Aug 2021 00:11:30 +0000 In-Reply-To: <20210816001130.3059564-1-oupton@google.com> Message-Id: <20210816001130.3059564-7-oupton@google.com> Mime-Version: 1.0 References: <20210816001130.3059564-1-oupton@google.com> X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog Subject: [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace From: Oliver Upton To: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu Cc: Paolo Bonzini , Sean Christopherson , Marc Zyngier , Peter Shier , Jim Mattson , David Matlack , Ricardo Koller , Jing Zhang , Raghavendra Rao Anata , James Morse , Alexandru Elisei , Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, Andrew Jones , Will Deacon , Catalin Marinas , Oliver Upton Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org To date, VMM-directed TSC synchronization and migration has been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused.
Cc: David Matlack Cc: Sean Christopherson Signed-off-by: Oliver Upton --- Documentation/virt/kvm/devices/vcpu.rst | 57 +++++++++++++ arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 4 + arch/x86/kvm/x86.c | 109 ++++++++++++++++++++++++ 4 files changed, 171 insertions(+) diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 2acec3b9ef65..3b399d727c11 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The base address must be 64 byte aligned and exist within a valid guest memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information including the layout of the stolen time structure. + +4. GROUP: KVM_VCPU_TSC_CTRL +=========================== + +:Architectures: x86 + +4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET + +:Parameters: 64-bit unsigned TSC offset + +Returns: + + ======= ====================================== + -EFAULT Error reading/writing the provided + parameter address. + -ENXIO Attribute not supported + ======= ====================================== + +Specifies the guest's TSC offset relative to the host's TSC. The guest's +TSC is then derived by the following equation: + + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET + +This attribute is useful for the precise migration of a guest's TSC. The +following describes a possible algorithm to use for the migration of a +guest's TSC: + +From the source VMM process: + +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + +2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the + guest TSC offset (off_n). + +3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the + guest's TSC (freq). + +From the destination VMM process: + +4. 
Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. + +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and + kvmclock nanoseconds (k_1). + +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 + +7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the + respective value derived in the previous step. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7fad2615f4a9..376b26a294c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1071,6 +1071,7 @@ struct kvm_arch { u64 last_tsc_nsec; u64 last_tsc_write; u32 last_tsc_khz; + u64 last_tsc_offset; u64 cur_tsc_nsec; u64 cur_tsc_write; u64 cur_tsc_offset; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index a6c327f8ad9e..0b22e1e84e78 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -503,4 +503,8 @@ struct kvm_pmu_event_filter { #define KVM_PMU_EVENT_ALLOW 0 #define KVM_PMU_EVENT_DENY 1 +/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */ +#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ +#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9d0445527dad..0b1398d439c0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2470,6 +2470,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, kvm->arch.last_tsc_nsec = ns; kvm->arch.last_tsc_write = tsc; kvm->arch.last_tsc_khz = 
vcpu->arch.virtual_tsc_khz; + kvm->arch.last_tsc_offset = offset; vcpu->arch.last_guest_tsc = tsc; @@ -4923,6 +4924,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) return 0; } +static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: + r = 0; + break; + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + u64 __user *uaddr = (u64 __user *)attr->addr; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: + r = -EFAULT; + if (put_user(vcpu->arch.l1_tsc_offset, uaddr)) + break; + r = 0; + break; + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + u64 __user *uaddr = (u64 __user *)attr->addr; + struct kvm *kvm = vcpu->kvm; + int r; + + switch (attr->attr) { + case KVM_VCPU_TSC_OFFSET: { + u64 offset, tsc, ns; + unsigned long flags; + bool matched; + + r = -EFAULT; + if (get_user(offset, uaddr)) + break; + + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); + + matched = (vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz && + kvm->arch.last_tsc_offset == offset); + + tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset; + ns = get_kvmclock_base_ns(); + + __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched); + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); + + r = 0; + break; + } + default: + r = -ENXIO; + } + + return r; +} + +static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu, + unsigned int ioctl, + void __user *argp) +{ + struct kvm_device_attr attr; + int r; + + if (copy_from_user(&attr, argp, sizeof(attr))) + return -EFAULT; + + if (attr.group != KVM_VCPU_TSC_CTRL) + return -ENXIO; + + switch (ioctl) { + case KVM_HAS_DEVICE_ATTR: + r = kvm_arch_tsc_has_attr(vcpu, &attr); + break; + 
case KVM_GET_DEVICE_ATTR: + r = kvm_arch_tsc_get_attr(vcpu, &attr); + break; + case KVM_SET_DEVICE_ATTR: + r = kvm_arch_tsc_set_attr(vcpu, &attr); + break; + } + + return r; +} + static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, struct kvm_enable_cap *cap) { @@ -5377,6 +5481,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = __set_sregs2(vcpu, u.sregs2); break; } + case KVM_HAS_DEVICE_ATTR: + case KVM_GET_DEVICE_ATTR: + case KVM_SET_DEVICE_ATTR: + r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); + break; default: r = -EINVAL; }
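
The offset adjustment in step 6 of the vcpu.rst text added by patch 6/6
(new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1) can be sketched in
userspace C as below. This is only an illustration, not code from the
series: the helper names ns_to_tsc_ticks() and new_tsc_offset() are
hypothetical, and it assumes that "freq" in the formula means guest TSC
ticks per nanosecond, so the kHz value reported by KVM_GET_TSC_KHZ has to
be divided by 10^6 when applied to a nanosecond interval. Offsets are
treated as 64-bit unsigned values with wraparound, matching the
":Parameters: 64-bit unsigned TSC offset" description.

```c
#include <stdint.h>

/* A TSC frequency in kHz is ticks per millisecond, i.e. per 10^6 ns. */
#define NSEC_PER_MSEC 1000000ULL

/*
 * Convert a kvmclock interval in nanoseconds to guest TSC ticks,
 * i.e. floor(delta_ns * tsc_khz / 10^6). Splitting delta_ns into whole
 * milliseconds and a remainder keeps the intermediate products small,
 * so long intervals do not overflow the multiplication.
 */
static uint64_t ns_to_tsc_ticks(uint64_t delta_ns, uint32_t tsc_khz)
{
	uint64_t whole_ms = delta_ns / NSEC_PER_MSEC;
	uint64_t rem_ns = delta_ns % NSEC_PER_MSEC;

	return whole_ms * tsc_khz + rem_ns * tsc_khz / NSEC_PER_MSEC;
}

/*
 * Step 6: derive the destination offset for one vCPU.
 *   t_0, k_0: host TSC and kvmclock ns recorded on the source (step 1)
 *   off_n:    source TSC offset of this vCPU (step 2)
 *   tsc_khz:  guest TSC frequency in kHz (step 3)
 *   t_1, k_1: host TSC and kvmclock ns read on the destination (step 5)
 * Unsigned arithmetic wraps modulo 2^64, so "negative" offsets work.
 */
static uint64_t new_tsc_offset(uint64_t t_0, uint64_t off_n,
			       uint64_t k_0, uint64_t k_1,
			       uint64_t t_1, uint32_t tsc_khz)
{
	return t_0 + off_n + ns_to_tsc_ticks(k_1 - k_0, tsc_khz) - t_1;
}
```

A VMM following the documented algorithm would call the second helper
once per vCPU and write the result back through the KVM_VCPU_TSC_OFFSET
attribute (step 7). The unit conversion is the part most easily missed
when applying the formula as written.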