From patchwork Mon Jan 11 19:57:22 2021
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 12011615
From: David Woodhouse
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Ankur Arora, Joao Martins, Boris Ostrovsky,
    Sean Christopherson, graf@amazon.com, iaslan@amazon.de,
    pdurrant@amazon.com, aagch@amazon.com, fandree@amazon.com,
    hch@infradead.org
Subject: [PATCH v5 13/16] KVM: x86/xen: register runstate info
Date: Mon, 11 Jan 2021 19:57:22 +0000
Message-Id: <20210111195725.4601-14-dwmw2@infradead.org>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20210111195725.4601-1-dwmw2@infradead.org>
References: <20210111195725.4601-1-dwmw2@infradead.org>
X-Mailing-List: kvm@vger.kernel.org

From: Joao Martins

Allow the emulator to register a vCPU's runstate area, which Xen guests
use in place of the KVM steal clock. The 'preempted' state of the KVM
steal clock equates to the Xen 'runnable' state, 'running' has a similar
meaning in both, and 'offline' is used when the system administrator
needs to take a vCPU offline or hotplug it.
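
Userspace opts in per vCPU with the new KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE
attribute, roughly as in the sketch below (vm_fd, vcpu_id and
runstate_gpa are assumed to come from the VMM's existing setup, and
KVM_XEN_HVM_SET_ATTR is the attribute ioctl introduced earlier in this
series):

	struct kvm_xen_hvm_attr ha = {
		.type = KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE,
		.u.vcpu_attr.vcpu_id = vcpu_id,
		/* Guest physical address of the vcpu_runstate_info */
		.u.vcpu_attr.gpa = runstate_gpa,
	};

	if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &ha))
		err(1, "KVM_XEN_HVM_SET_ATTR");
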
Signed-off-by: Joao Martins
Signed-off-by: David Woodhouse
---
 arch/x86/include/asm/kvm_host.h |   5 ++
 arch/x86/kvm/x86.c              |  10 +++
 arch/x86/kvm/xen.c              | 148 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/xen.h              |   8 ++
 include/uapi/linux/kvm.h        |   1 +
 5 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd65bd43fc5f..73f285ebb181 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,10 +523,15 @@ struct kvm_vcpu_hv {
 /* Xen HVM per vcpu emulation context */
 struct kvm_vcpu_xen {
 	u64 hypercall_rip;
+	u32 current_runstate;
 	bool vcpu_info_set;
 	bool vcpu_time_info_set;
+	bool runstate_set;
 	struct gfn_to_hva_cache vcpu_info_cache;
 	struct gfn_to_hva_cache vcpu_time_info_cache;
+	struct gfn_to_hva_cache runstate_cache;
+	u64 last_steal;
+	u64 last_state_ns;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cf503d559eb..f3f07b0265fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2946,6 +2946,11 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
 
+	if (vcpu->arch.xen.runstate_set) {
+		kvm_xen_setup_runstate_page(vcpu);
+		return;
+	}
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -3999,6 +4004,11 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
 
+	if (vcpu->arch.xen.runstate_set) {
+		kvm_xen_runstate_set_preempted(vcpu);
+		return;
+	}
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 1cca46effec8..17cbb4462b7e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -11,9 +11,11 @@
 #include "hyperv.h"
 
 #include <linux/kvm_host.h>
+#include <linux/sched/stat.h>
 
 #include <trace/events/kvm.h>
 #include <xen/interface/xen.h>
+#include <xen/interface/vcpu.h>
 
 #include "trace.h"
 
@@ -56,6 +58,124 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 	return 0;
 }
 
+static void kvm_xen_update_runstate(struct kvm_vcpu *v, int state, u64 steal_ns)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	struct vcpu_runstate_info runstate;
+	unsigned int offset = offsetof(struct compat_vcpu_runstate_info, state_entry_time);
+	u64 now, delta;
+
+	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
+
+#ifdef CONFIG_X86_64
+	/*
+	 * The only difference is the alignment of uint64_t in 32-bit.
+	 * So the first field 'state' sits at an unmodified offset,
+	 * while every field after it sits four bytes further along,
+	 * which the adjusted 'offset' accounts for.
+	 */
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4);
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) !=
+		     offsetof(struct compat_vcpu_runstate_info, time) + 4);
+
+	offset = offsetof(struct vcpu_runstate_info, state_entry_time);
+#endif
+	/*
+	 * Although it's called "state_entry_time" and explicitly documented
+	 * as being "the system time at which the VCPU was last scheduled to
+	 * run", Xen just treats it as a counter for HVM domains too.
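+	 * Here that counter doubles as the update marker: its top bit,
+	 * XEN_RUNSTATE_UPDATE, is set below while the area is being
+	 * rewritten, and the guest is expected to retry its reads until
+	 * the bit is clear and state_entry_time is stable again.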
+	 */
+	if (kvm_read_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					 &runstate.state_entry_time, offset,
+					 sizeof(u64) * 5))
+		return;
+
+	runstate.state_entry_time = XEN_RUNSTATE_UPDATE |
+		(runstate.state_entry_time + 1);
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.state_entry_time, offset,
+					  sizeof(u64)))
+		return;
+	smp_wmb();
+
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) !=
+		     offsetof(struct compat_vcpu_runstate_info, state));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->state) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->state));
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &state,
+					  offsetof(struct vcpu_runstate_info, state),
+					  sizeof(runstate.state)))
+		return;
+
+	now = ktime_get_ns();
+	delta = now - vcpu_xen->last_state_ns - steal_ns;
+	runstate.time[vcpu_xen->current_runstate] += delta;
+	if (steal_ns)
+		runstate.time[RUNSTATE_runnable] += steal_ns;
+
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct vcpu_runstate_info, time) - sizeof(u64));
+	BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state_entry_time) !=
+		     offsetof(struct compat_vcpu_runstate_info, time) - sizeof(u64));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->time) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->time));
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.time[0],
+					  offset + sizeof(u64),
+					  sizeof(runstate.time)))
+		return;
+	smp_wmb();
+	vcpu_xen->current_runstate = state;
+	vcpu_xen->last_state_ns = now;
+
+	runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+	if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache,
+					  &runstate.state_entry_time, offset,
+					  sizeof(u64)))
+		return;
+}
+
+void kvm_xen_runstate_set_preempted(struct kvm_vcpu *v)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	int new_state;
+
+	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
+	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) !=
+		     offsetof(struct compat_vcpu_runstate_info, state));
+	BUILD_BUG_ON(sizeof(((struct vcpu_runstate_info *)0)->state) !=
+		     sizeof(((struct compat_vcpu_runstate_info *)0)->state));
+
+	if (v->preempted) {
+		new_state = RUNSTATE_runnable;
+	} else {
+		new_state = RUNSTATE_blocked;
+		vcpu_xen->last_steal = current->sched_info.run_delay;
+	}
+
+	kvm_xen_update_runstate(v, new_state, 0);
+}
+
+void kvm_xen_setup_runstate_page(struct kvm_vcpu *v)
+{
+	struct kvm_vcpu_xen *vcpu_xen = &v->arch.xen;
+	u64 steal_time = 0;
+
+	/*
+	 * If the CPU was blocked when it last stopped, presumably
+	 * it became unblocked at some point because it's being run
+	 * again now. The scheduler run_delay is the runnable time,
+	 * to be subtracted from the blocked time.
+	 */
+	if (vcpu_xen->current_runstate == RUNSTATE_blocked)
+		steal_time = current->sched_info.run_delay -
+			vcpu_xen->last_steal;
+
+	kvm_xen_update_runstate(v, RUNSTATE_running, steal_time);
+}
+
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	struct kvm_vcpu *v;
@@ -78,7 +198,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
 		if (!v)
 			return -EINVAL;
-
 		/* No compat necessary here. */
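+		/*
+		 * (Unlike the runstate area above, struct vcpu_info has
+		 * the same size for 32-bit and 64-bit guests, which the
+		 * BUILD_BUG_ON below asserts.)
+		 */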
 		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
 			     sizeof(struct compat_vcpu_info));
@@ -110,6 +229,22 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		r = kvm_gfn_to_hva_cache_init(kvm, &v->arch.xen.runstate_cache,
+					      data->u.vcpu_attr.gpa,
+					      sizeof(struct vcpu_runstate_info));
+		if (r)
+			return r;
+
+		v->arch.xen.runstate_set = true;
+		v->arch.xen.current_runstate = RUNSTATE_blocked;
+		v->arch.xen.last_state_ns = ktime_get_ns();
+		break;
+
 	default:
 		break;
 	}
@@ -157,6 +292,17 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
+	case KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE:
+		v = kvm_get_vcpu_by_id(kvm, data->u.vcpu_attr.vcpu_id);
+		if (!v)
+			return -EINVAL;
+
+		if (v->arch.xen.runstate_set) {
+			data->u.vcpu_attr.gpa = v->arch.xen.runstate_cache.gpa;
+			r = 0;
+		}
+		break;
+
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index 317c1325dade..407e717476d6 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -9,6 +9,8 @@
 #ifndef __ARCH_X86_KVM_XEN_H__
 #define __ARCH_X86_KVM_XEN_H__
 
+void kvm_xen_setup_runstate_page(struct kvm_vcpu *vcpu);
+void kvm_xen_runstate_set_preempted(struct kvm_vcpu *vcpu);
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
 int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
@@ -56,4 +58,10 @@ struct compat_shared_info {
 	struct compat_arch_shared_info arch;
 };
 
+struct compat_vcpu_runstate_info {
+	int state;
+	uint64_t state_entry_time;
+	uint64_t time[4];
+} __attribute__((packed));
+
 #endif /* __ARCH_X86_KVM_XEN_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6e91c004ae68..0571a7bbb13b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1608,6 +1608,7 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO	0x1
 #define KVM_XEN_ATTR_TYPE_VCPU_INFO	0x2
 #define KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO	0x3
+#define KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE	0x4
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
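
For reference, the guest-side consumer of this area follows the usual
Xen snapshot protocol, retrying until the XEN_RUNSTATE_UPDATE flag is
clear and state_entry_time is stable. A rough sketch of that read loop
(mirroring what Linux's xen_get_runstate_snapshot() does; 'area' is the
guest's mapping of its vcpu_runstate_info, and a 64-bit guest is assumed
so the u64 reads are single-copy atomic):

	struct vcpu_runstate_info snap;
	u64 entry;

	do {
		entry = READ_ONCE(area->state_entry_time);
		rmb();		/* Read the flag before the payload. */
		snap = *area;
		rmb();		/* Read the payload before re-checking. */
	} while (READ_ONCE(area->state_entry_time) != entry ||
		 (entry & XEN_RUNSTATE_UPDATE));

If the flag is set, or the counter has moved, the copy may mix old and
new values of the time[] array, so the guest simply retries.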