From patchwork Thu Feb 10 00:27:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4291AC433EF for ; Thu, 10 Feb 2022 01:32:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232767AbiBJBcS (ORCPT ); Wed, 9 Feb 2022 20:32:18 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:39848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232966AbiBJBcQ (ORCPT ); Wed, 9 Feb 2022 20:32:16 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 242D010E0 for ; Wed, 9 Feb 2022 17:32:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=Wx2SHMFRFNms0QAIPNmRLyuGN49B+s0O45YYcLdb6nk=; b=jBqaFP1EKztmuOW0FaHuKBHzu4 VGqKIFCGg+JhafHtZN7glL9ZOEs9CezVnMtp8tbxdTo1PSxuq7RHOUuEDHHMATOZ1q2DyH9UmjcJq JAv0D6zNDS3agg2vnmVeCzhYVgmfpIzw5iWkzj6Qh9quRyLlhAbtszCypjA7iWgcERv7ijHw6ynk7 dyV+aIopDfDyRkP6/A4M/ZN071LwfgLOkz+ne0I7pPRnA7M96E9HrHtFs62ZR3m1f1Vsh3uYyxIqv +/t2Q9coXGhvu8Mo/LlnlR1DUNvCUVLpd+m5cqtOWLcBtfxunLCBLgraldL7zBKMIlKIo50mz9bWn xoZyT1RA==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl1-Dk; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIx-0019Cj-Uj; Thu, 10 Feb 2022 00:27:23 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 01/15] KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU Date: Thu, 10 Feb 2022 00:27:07 +0000 Message-Id: <20220210002721.273608-2-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse There are circumstances whem kvm_xen_update_runstate_guest() should not sleep because it ends up being called from __schedule() when the vCPU is preempted: [ 222.830825] kvm_xen_update_runstate_guest+0x24/0x100 [ 222.830878] kvm_arch_vcpu_put+0x14c/0x200 [ 222.830920] kvm_sched_out+0x30/0x40 [ 222.830960] __schedule+0x55c/0x9f0 To handle this, make it use the same trick as __kvm_xen_has_interrupt(), of using the hva from the gfn_to_hva_cache directly. Then it can use pagefault_disable() around the accesses and just bail out if the page is absent (which is unlikely). After first looking at this, there followed a long path of discovery which culminated in removing the existing gfn_to_pfn_cache and replacing it with something more fit for purpose in 5.17. A future patch can convert the runstate code to use that, but this simpler fix can be applied more easily to the older stable kernels. Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse Cc: stable@vger.kernel.org --- arch/x86/kvm/xen.c | 102 ++++++++++++++++++++++++++++++--------------- 1 file changed, 68 insertions(+), 34 deletions(-) diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index bad57535fad0..39b319f428bc 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -133,36 +133,60 @@ static void kvm_xen_update_runstate(struct kvm_vcpu *v, int state) void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) { struct kvm_vcpu_xen *vx = &v->arch.xen; + struct gfn_to_hva_cache *ghc = &vx->runstate_cache; + struct kvm_memslots *slots = kvm_memslots(v->kvm); + bool atomic = (state == RUNSTATE_runnable); uint64_t state_entry_time; - unsigned int offset; + int __user *user_state; + uint64_t __user *user_times; kvm_xen_update_runstate(v, state); if (!vx->runstate_set) return; - BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c); + if (unlikely(slots->generation != ghc->generation || kvm_is_error_hva(ghc->hva)) && + kvm_gfn_to_hva_cache_init(v->kvm, ghc, ghc->gpa, ghc->len)) + return; + + /* We made sure it fits in a single page */ + BUG_ON(!ghc->memslot); + + if (atomic) + pagefault_disable(); - offset = offsetof(struct compat_vcpu_runstate_info, state_entry_time); -#ifdef CONFIG_X86_64 /* - * The only difference is alignment of uint64_t in 32-bit. - * So the first field 'state' is accessed directly using - * offsetof() (where its offset happens to be zero), while the - * remaining fields which are all uint64_t, start at 'offset' - * which we tweak here by adding 4. + * The only difference between 32-bit and 64-bit versions of the + * runstate struct is the alignment of uint64_t in 32-bit, which + * means that the 64-bit version has an additional 4 bytes of + * padding after the first field 'state'. + * + * So we use 'int __user *user_state' to point to the state field, + * and 'uint64_t __user *user_times' for runstate_entry_time. So + * the actual array of time[] in each state starts at user_times[1]. */ + BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != 0); + BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state) != 0); + user_state = (int __user *)ghc->hva; + + BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c); + + user_times = (uint64_t __user *)(ghc->hva + + offsetof(struct compat_vcpu_runstate_info, + state_entry_time)); +#ifdef CONFIG_X86_64 BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) != offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4); BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) != offsetof(struct compat_vcpu_runstate_info, time) + 4); if (v->kvm->arch.xen.long_mode) - offset = offsetof(struct vcpu_runstate_info, state_entry_time); + user_times = (uint64_t __user *)(ghc->hva + + offsetof(struct vcpu_runstate_info, + state_entry_time)); #endif /* - * First write the updated state_entry_time at the appropriate - * location determined by 'offset'. + * First write the updated state_entry_time to the guest area. */ state_entry_time = vx->runstate_entry_time; state_entry_time |= XEN_RUNSTATE_UPDATE; @@ -172,28 +196,21 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state_entry_time) != sizeof(state_entry_time)); - if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache, - &state_entry_time, offset, - sizeof(state_entry_time))) - return; + if (__put_user(state_entry_time, user_times)) + goto out; smp_wmb(); /* * Next, write the new runstate. This is in the *same* place * for 32-bit and 64-bit guests, asserted here for paranoia. */ - BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != - offsetof(struct compat_vcpu_runstate_info, state)); BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, state) != sizeof(vx->current_runstate)); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state) != sizeof(vx->current_runstate)); - if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache, - &vx->current_runstate, - offsetof(struct vcpu_runstate_info, state), - sizeof(vx->current_runstate))) - return; + if (__put_user(vx->current_runstate, user_state)) + goto out; /* * Write the actual runstate times immediately after the @@ -208,24 +225,23 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, time) != sizeof(vx->runstate_times)); - if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache, - &vx->runstate_times[0], - offset + sizeof(u64), - sizeof(vx->runstate_times))) - return; - + if (__copy_to_user(user_times + 1, vx->runstate_times, sizeof(vx->runstate_times))) + goto out; smp_wmb(); /* * Finally, clear the XEN_RUNSTATE_UPDATE bit in the guest's * runstate_entry_time field. */ - state_entry_time &= ~XEN_RUNSTATE_UPDATE; - if (kvm_write_guest_offset_cached(v->kvm, &v->arch.xen.runstate_cache, - &state_entry_time, offset, - sizeof(state_entry_time))) - return; + __put_user(state_entry_time, user_times); + smp_wmb(); + + out: + mark_page_dirty_in_slot(v->kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT); + + if (atomic) + pagefault_enable(); } int __kvm_xen_has_interrupt(struct kvm_vcpu *v) @@ -443,6 +459,12 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; } + /* It must fit within a single page */ + if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct vcpu_info) > PAGE_SIZE) { + r = -EINVAL; + break; + } + r = kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache, data->u.gpa, @@ -460,6 +482,12 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; } + /* It must fit within a single page */ + if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct pvclock_vcpu_time_info) > PAGE_SIZE) { + r = -EINVAL; + break; + } + r = kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache, data->u.gpa, @@ -481,6 +509,12 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; } + /* It must fit within a single page */ + if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct vcpu_runstate_info) > PAGE_SIZE) { + r = -EINVAL; + break; + } + r = kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.xen.runstate_cache, data->u.gpa, From patchwork Thu Feb 10 00:27:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741181 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07CD5C433F5 for ; Thu, 10 Feb 2022 02:09:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232179AbiBJCI4 (ORCPT ); Wed, 9 Feb 2022 21:08:56 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:55296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235872AbiBJCIl (ORCPT ); Wed, 9 Feb 2022 21:08:41 -0500 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27799B07 for ; Wed, 9 Feb 2022 18:08:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=EPVVj8e5VYc/sxbpTFSedbue+iktt5ulk1qYFSL0wR4=; b=gVCywl6q+o+dD7XBDboVDhrtd9 lrsCEzfo80qqxnVR1ty55VgUIabx91Z3RFKeF8lrhlOO/nBzq4VAmATUEE+GcOtktBgnIqmobOv1j ncps3rDVEgRnj70ostGughl87Z+c3Aqt9fofiqATyM+7XFrkE2fH5Go4B15XsZymWI11kCEOzE7t3 wGwDpAqThAjHL/UhnjdOOnGJA8vXRfYdGsjigLO/KXgTcHpz84sJeH+pyzD9oHVYALPWM48UWFc1l tRdGGj6Umc2begwfgIgiBUh2Q/DwtOQoumU9Iw7h9RLdYd5SmjPc2aPfS+VWhwo80gt63rZ0nbnRe oogzri0A==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by desiato.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008YWE-7w; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIx-0019Co-Vc; Thu, 10 Feb 2022 00:27:23 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 02/15] KVM: x86/xen: Use gfn_to_pfn_cache for runstate area Date: Thu, 10 Feb 2022 00:27:08 +0000 Message-Id: <20220210002721.273608-3-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/x86.c | 1 + arch/x86/kvm/xen.c | 111 ++++++++++++++++---------------- arch/x86/kvm/xen.h | 6 +- 4 files changed, 62 insertions(+), 59 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6e7c545bc7ee..1e73053fd2bf 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -603,10 +603,9 @@ struct kvm_vcpu_xen { u32 current_runstate; bool vcpu_info_set; bool vcpu_time_info_set; - bool runstate_set; struct gfn_to_hva_cache vcpu_info_cache; struct gfn_to_hva_cache vcpu_time_info_cache; - struct gfn_to_hva_cache runstate_cache; + struct gfn_to_pfn_cache runstate_cache; u64 last_steal; u64 runstate_entry_time; u64 runstate_times[4]; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 74b53a16f38a..5d0191bf30b3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11195,6 +11195,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); fpu_free_guest_fpstate(&vcpu->arch.guest_fpu); + kvm_xen_destroy_vcpu(vcpu); kvm_hv_vcpu_uninit(vcpu); kvm_pmu_destroy(vcpu); kfree(vcpu->arch.mce_banks); diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 39b319f428bc..5d40d6521440 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -133,27 +133,37 @@ static void kvm_xen_update_runstate(struct kvm_vcpu *v, int state) void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) { struct kvm_vcpu_xen *vx = &v->arch.xen; - struct gfn_to_hva_cache *ghc = &vx->runstate_cache; - struct kvm_memslots *slots = kvm_memslots(v->kvm); - bool atomic = (state == RUNSTATE_runnable); - uint64_t state_entry_time; - int __user *user_state; - uint64_t __user *user_times; + struct gfn_to_pfn_cache *gpc = &vx->runstate_cache; + uint64_t *user_times; + unsigned long flags; + size_t user_len; + int *user_state; kvm_xen_update_runstate(v, state); - if (!vx->runstate_set) + if (!vx->runstate_cache.active) return; - if (unlikely(slots->generation != ghc->generation || kvm_is_error_hva(ghc->hva)) && - kvm_gfn_to_hva_cache_init(v->kvm, ghc, ghc->gpa, ghc->len)) - return; + if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) + user_len = sizeof(struct vcpu_runstate_info); + else + user_len = sizeof(struct compat_vcpu_runstate_info); - /* We made sure it fits in a single page */ - BUG_ON(!ghc->memslot); + read_lock_irqsave(&gpc->lock, flags); + while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa, + user_len)) { + read_unlock_irqrestore(&gpc->lock, flags); - if (atomic) - pagefault_disable(); + /* When invoked from kvm_sched_out() we cannot sleep */ + if (state == RUNSTATE_runnable) + return; + + if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, + user_len, false)) + return; + + read_lock_irqsave(&gpc->lock, flags); + } /* * The only difference between 32-bit and 64-bit versions of the @@ -167,37 +177,32 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) */ BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != 0); BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state) != 0); - user_state = (int __user *)ghc->hva; - BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c); - - user_times = (uint64_t __user *)(ghc->hva + - offsetof(struct compat_vcpu_runstate_info, - state_entry_time)); #ifdef CONFIG_X86_64 BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) != offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4); BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) != offsetof(struct compat_vcpu_runstate_info, time) + 4); - - if (v->kvm->arch.xen.long_mode) - user_times = (uint64_t __user *)(ghc->hva + - offsetof(struct vcpu_runstate_info, - state_entry_time)); #endif + + user_state = gpc->khva; + + if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) + user_times = gpc->khva + offsetof(struct vcpu_runstate_info, + state_entry_time); + else + user_times = gpc->khva + offsetof(struct compat_vcpu_runstate_info, + state_entry_time); + /* * First write the updated state_entry_time to the guest area. */ - state_entry_time = vx->runstate_entry_time; - state_entry_time |= XEN_RUNSTATE_UPDATE; - BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, state_entry_time) != - sizeof(state_entry_time)); + sizeof(user_times[0])); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state_entry_time) != - sizeof(state_entry_time)); + sizeof(user_times[0])); - if (__put_user(state_entry_time, user_times)) - goto out; + user_times[0] = vx->runstate_entry_time | XEN_RUNSTATE_UPDATE; smp_wmb(); /* @@ -209,8 +214,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state) != sizeof(vx->current_runstate)); - if (__put_user(vx->current_runstate, user_state)) - goto out; + *user_state = vx->current_runstate; /* * Write the actual runstate times immediately after the @@ -225,23 +229,19 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, time) != sizeof(vx->runstate_times)); - if (__copy_to_user(user_times + 1, vx->runstate_times, sizeof(vx->runstate_times))) - goto out; + memcpy(user_times + 1, vx->runstate_times, sizeof(vx->runstate_times)); smp_wmb(); /* * Finally, clear the XEN_RUNSTATE_UPDATE bit in the guest's * runstate_entry_time field. */ - state_entry_time &= ~XEN_RUNSTATE_UPDATE; - __put_user(state_entry_time, user_times); + user_times[0] &= ~XEN_RUNSTATE_UPDATE; smp_wmb(); - out: - mark_page_dirty_in_slot(v->kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT); + read_unlock_irqrestore(&gpc->lock, flags); - if (atomic) - pagefault_enable(); + mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); } int __kvm_xen_has_interrupt(struct kvm_vcpu *v) @@ -504,24 +504,17 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; } if (data->u.gpa == GPA_INVALID) { - vcpu->arch.xen.runstate_set = false; + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, + &vcpu->arch.xen.runstate_cache); r = 0; break; } - /* It must fit within a single page */ - if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct vcpu_runstate_info) > PAGE_SIZE) { - r = -EINVAL; - break; - } - - r = kvm_gfn_to_hva_cache_init(vcpu->kvm, + r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.xen.runstate_cache, - data->u.gpa, - sizeof(struct vcpu_runstate_info)); - if (!r) { - vcpu->arch.xen.runstate_set = true; - } + NULL, false, true, data->u.gpa, + sizeof(struct vcpu_runstate_info), + false); break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT: @@ -656,7 +649,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = -EOPNOTSUPP; break; } - if (vcpu->arch.xen.runstate_set) { + if (vcpu->arch.xen.runstate_cache.active) { data->u.gpa = vcpu->arch.xen.runstate_cache.gpa; r = 0; } @@ -1054,3 +1047,9 @@ int kvm_xen_setup_evtchn(struct kvm *kvm, return 0; } + +void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) +{ + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, + &vcpu->arch.xen.runstate_cache); +} diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index adbcc9ed59db..54b2bf4c3001 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -23,7 +23,7 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data); int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); void kvm_xen_destroy_vm(struct kvm *kvm); - +void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu); int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm); int kvm_xen_setup_evtchn(struct kvm *kvm, @@ -65,6 +65,10 @@ static inline void kvm_xen_destroy_vm(struct kvm *kvm) { } +static inline void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) +{ +} + static inline bool kvm_xen_msr_enabled(struct kvm *kvm) { return false; From patchwork Thu Feb 10 00:27:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741122 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6416BC433EF for ; Thu, 10 Feb 2022 01:31:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232834AbiBJBbx (ORCPT ); Wed, 9 Feb 2022 20:31:53 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232684AbiBJBbn (ORCPT ); Wed, 9 Feb 2022 20:31:43 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B37C81A0 for ; Wed, 9 Feb 2022 17:31:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=mELgNS1+ruG4c62o2IDV3AR0D1jff1f/1QgrgjYj/SU=; b=HpPbvpEj82nGjGCSz5U3QL1GSx NaHsKsMUQCQIYxeEDJ4Kp1q+GU93U3C46iQplpu3n51tOuWutaGreZlG6nwlDqQK5qtrHjhTgHEBx hpYh/HcYoSEfkB8b44m/7q2Dl6oZhCYUgi09ONVeVg9h08yomflxPyafYg18hVkJjCvsKIBSnY+DP lQp293ZxPxSRJsYtG/ExQVq8eWVnYmPdIdvAbPuNM7CXi4ICUqquYRwK6/Sl/LZcCohYRe8d4b2K9 hpsfyHz37Pk6T+wqUJ4ovQKAH/oiPSe4KVLHR2i1JLdJn8viSlMJIjmr+bDoWCDXSQ7ssMoHjJMzL GC759Ycw==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl2-FD; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Ct-0G; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 03/15] KVM: x86: Use gfn_to_pfn_cache for pv_time Date: Thu, 10 Feb 2022 00:27:09 +0000 Message-Id: <20220210002721.273608-4-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Add a new kvm_setup_guest_pvclock() which parallels the existing kvm_setup_pvclock_page(). The latter will be removed once we convert all users to the gfn_to_pfn_cache version. Using the new cache, we can potentially let kvm_set_guest_paused() set the PVCLOCK_GUEST_STOPPED bit directly rather than having to delegate to the vCPU via KVM_REQ_CLOCK_UPDATE. But not yet. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/x86.c | 76 +++++++++++++++++++++++++++------ 2 files changed, 64 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1e73053fd2bf..c64b80c07fdd 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -747,8 +747,7 @@ struct kvm_vcpu_arch { gpa_t time; struct pvclock_vcpu_time_info hv_clock; unsigned int hw_tsc_khz; - struct gfn_to_hva_cache pv_time; - bool pv_time_enabled; + struct gfn_to_pfn_cache pv_time; /* set guest stopped flag in pvclock flags field */ bool pvclock_set_guest_stopped_request; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5d0191bf30b3..992baebf0a58 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2209,14 +2209,14 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time, kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu); /* we verify if the enable bit is set... */ - vcpu->arch.pv_time_enabled = false; - if (!(system_time & 1)) - return; - - if (!kvm_gfn_to_hva_cache_init(vcpu->kvm, - &vcpu->arch.pv_time, system_time & ~1ULL, - sizeof(struct pvclock_vcpu_time_info))) - vcpu->arch.pv_time_enabled = true; + if (system_time & 1) { + kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.pv_time, vcpu, + false, true, system_time & ~1ULL, + sizeof(struct pvclock_vcpu_time_info), + false); + } else { + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time); + } return; } @@ -2919,6 +2919,56 @@ u64 get_kvmclock_ns(struct kvm *kvm) return data.clock; } +static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, + struct gfn_to_pfn_cache *gpc, + unsigned int offset) +{ + struct kvm_vcpu_arch *vcpu = &v->arch; + struct pvclock_vcpu_time_info *guest_hv_clock; + unsigned long flags; + + read_lock_irqsave(&gpc->lock, flags); + while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa, + offset + sizeof(*guest_hv_clock))) { + read_unlock_irqrestore(&gpc->lock, flags); + + if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, + offset + sizeof(*guest_hv_clock), + true)) + return; + + read_lock_irqsave(&gpc->lock, flags); + } + + guest_hv_clock = (void *)(gpc->khva + offset); + + /* + * This VCPU is paused, but it's legal for a guest to read another + * VCPU's kvmclock, so we really have to follow the specification where + * it says that version is odd if data is being modified, and even after + * it is consistent. + */ + + guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1; + smp_wmb(); + + /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ + vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED); + + if (vcpu->pvclock_set_guest_stopped_request) { + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED; + vcpu->pvclock_set_guest_stopped_request = false; + } + + memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock)); + smp_wmb(); + + guest_hv_clock->version = ++vcpu->hv_clock.version; + read_unlock_irqrestore(&gpc->lock, flags); + + trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); +} + static void kvm_setup_pvclock_page(struct kvm_vcpu *v, struct gfn_to_hva_cache *cache, unsigned int offset) @@ -3064,8 +3114,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) vcpu->hv_clock.flags = pvclock_flags; - if (vcpu->pv_time_enabled) - kvm_setup_pvclock_page(v, &vcpu->pv_time, 0); + if (vcpu->pv_time.active) + kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0); if (vcpu->xen.vcpu_info_set) kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache, offsetof(struct compat_vcpu_info, time)); @@ -3259,7 +3309,7 @@ static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data) static void kvmclock_reset(struct kvm_vcpu *vcpu) { - vcpu->arch.pv_time_enabled = false; + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time); vcpu->arch.time = 0; } @@ -5058,7 +5108,7 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, */ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) { - if (!vcpu->arch.pv_time_enabled) + if (!vcpu->arch.pv_time.active) return -EINVAL; vcpu->arch.pvclock_set_guest_stopped_request = true; kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -6122,7 +6172,7 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm) mutex_lock(&kvm->lock); kvm_for_each_vcpu(i, vcpu, kvm) { - if (!vcpu->arch.pv_time_enabled) + if (!vcpu->arch.pv_time.active) continue; ret = kvm_set_guest_paused(vcpu); From patchwork Thu Feb 10 00:27:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741128 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA81EC433EF for ; Thu, 10 Feb 2022 01:32:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233015AbiBJBcW (ORCPT ); Wed, 9 Feb 2022 20:32:22 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:40094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233075AbiBJBcO (ORCPT ); Wed, 9 Feb 2022 20:32:14 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71D9A22539 for ; Wed, 9 Feb 2022 17:32:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=HALpeGgwypjw8MPeLKmaRZHjwCcQf1V3jfzDjp9cnpY=; b=BiwDSeCRV8kkY4kv8LLUvBlEHF s5Phbg6paTRSL19uIwZeVf05nTTgIDx7qdnTtS5obPsIvh9ecTf+LiH30j1HroDnL5YUtTvYQIzkR 0qtXaGR2QlyoVUNx3eqtiZv6mxBeGDF4DykCxCl+xdn+PK3Qys6VxwMfXkdchOfc0zCQVG0Mp9jjq 6Ao+Vr1sgFOxonbUEArytPegTQaBjBUnM70Jbdt1Hq0VT8NGylQEFxN3MjJeOsQg2mvdI3FVFZLvF TKrcbmc5rzYSGLre9euXIuasQyaCmSNKYlusY5skA7TOP94mJFPtsLIpGf3Iy/cCaiOdBJjY1S/F5 yP03wlQA==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl3-Go; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Cy-16; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 04/15] KVM: x86/xen: Use gfn_to_pfn_cache for vcpu_info Date: Thu, 10 Feb 2022 00:27:10 +0000 Message-Id: <20220210002721.273608-5-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Currently, the fast path of kvm_xen_set_evtchn_fast() doesn't set the index bits in the target vCPU's evtchn_pending_sel, because it only has a userspace virtual address with which to do so. It just sets them in the kernel, and kvm_xen_has_interrupt() then completes the delivery to the actual vcpu_info structure when the vCPU runs. Using a gfn_to_pfn_cache allows kvm_xen_set_evtchn_fast() to do the full delivery in the common case. Clean up the fallback case too, by moving the deferred delivery out into a separate kvm_xen_inject_pending_events() function which isn't ever called in atomic contexts as __kvm_xen_has_interrupt() is. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/x86.c | 9 +- arch/x86/kvm/xen.c | 246 +++++++++++++++++--------------- arch/x86/kvm/xen.h | 20 ++- 4 files changed, 157 insertions(+), 121 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c64b80c07fdd..118c9ce8e821 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -601,9 +601,8 @@ struct kvm_vcpu_hv { struct kvm_vcpu_xen { u64 hypercall_rip; u32 current_runstate; - bool vcpu_info_set; bool vcpu_time_info_set; - struct gfn_to_hva_cache vcpu_info_cache; + struct gfn_to_pfn_cache vcpu_info_cache; struct gfn_to_hva_cache vcpu_time_info_cache; struct gfn_to_pfn_cache runstate_cache; u64 last_steal; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 992baebf0a58..268f64b70768 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3116,9 +3116,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) if (vcpu->pv_time.active) kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0); - if (vcpu->xen.vcpu_info_set) - kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_info_cache, - offsetof(struct compat_vcpu_info, time)); + if (vcpu->xen.vcpu_info_cache.active) + kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache, + offsetof(struct compat_vcpu_info, time)); if (vcpu->xen.vcpu_time_info_set) kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0); if (!v->vcpu_idx) @@ -10295,6 +10295,9 @@ static int vcpu_run(struct kvm_vcpu *vcpu) break; kvm_clear_request(KVM_REQ_UNBLOCK, vcpu); + if (kvm_xen_has_pending_events(vcpu)) + kvm_xen_inject_pending_events(vcpu); + if (kvm_cpu_has_pending_timer(vcpu)) kvm_inject_pending_timer_irqs(vcpu); diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 5d40d6521440..545c1d5c070e 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -8,6 +8,7 @@ #include "x86.h" #include "xen.h" +#include "lapic.h" #include "hyperv.h" #include @@ -244,23 +245,80 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); } -int __kvm_xen_has_interrupt(struct kvm_vcpu *v) +/* + * On event channel delivery, the vcpu_info may not have been accessible. + * In that case, there are bits in vcpu->arch.xen.evtchn_pending_sel which + * need to be marked into the vcpu_info (and evtchn_upcall_pending set). + * Do so now that we can sleep in the context of the vCPU to bring the + * page in, and refresh the pfn cache for it. + */ +void kvm_xen_inject_pending_events(struct kvm_vcpu *v) { unsigned long evtchn_pending_sel = READ_ONCE(v->arch.xen.evtchn_pending_sel); - bool atomic = in_atomic() || !task_is_running(current); - int err; + struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache; + unsigned long flags; + + if (!evtchn_pending_sel) + return; + + /* + * Yes, this is an open-coded loop. But that's just what put_user() + * does anyway. Page it in and retry the instruction. We're just a + * little more honest about it. + */ + read_lock_irqsave(&gpc->lock, flags); + while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa, + sizeof(struct vcpu_info))) { + read_unlock_irqrestore(&gpc->lock, flags); + + if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, + sizeof(struct vcpu_info), + false)) + return; + + read_lock_irqsave(&gpc->lock, flags); + } + + /* Now gpc->khva is a valid kernel address for the vcpu_info */ + if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) { + struct vcpu_info *vi = gpc->khva; + + asm volatile(LOCK_PREFIX "orq %0, %1\n" + "notq %0\n" + LOCK_PREFIX "andq %0, %2\n" + : "=r" (evtchn_pending_sel), + "+m" (vi->evtchn_pending_sel), + "+m" (v->arch.xen.evtchn_pending_sel) + : "0" (evtchn_pending_sel)); + WRITE_ONCE(vi->evtchn_upcall_pending, 1); + } else { + u32 evtchn_pending_sel32 = evtchn_pending_sel; + struct compat_vcpu_info *vi = gpc->khva; + + asm volatile(LOCK_PREFIX "orl %0, %1\n" + "notl %0\n" + LOCK_PREFIX "andl %0, %2\n" + : "=r" (evtchn_pending_sel32), + "+m" (vi->evtchn_pending_sel), + "+m" (v->arch.xen.evtchn_pending_sel) + : "0" (evtchn_pending_sel32)); + WRITE_ONCE(vi->evtchn_upcall_pending, 1); + } + read_unlock_irqrestore(&gpc->lock, flags); + + mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); +} + +int __kvm_xen_has_interrupt(struct kvm_vcpu *v) +{ + struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache; + unsigned long flags; u8 rc = 0; /* * If the global upcall vector (HVMIRQ_callback_vector) is set and * the vCPU's evtchn_upcall_pending flag is set, the IRQ is pending. */ - struct gfn_to_hva_cache *ghc = &v->arch.xen.vcpu_info_cache; - struct kvm_memslots *slots = kvm_memslots(v->kvm); - bool ghc_valid = slots->generation == ghc->generation && - !kvm_is_error_hva(ghc->hva) && ghc->memslot; - - unsigned int offset = offsetof(struct vcpu_info, evtchn_upcall_pending); /* No need for compat handling here */ BUILD_BUG_ON(offsetof(struct vcpu_info, evtchn_upcall_pending) != @@ -270,101 +328,36 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) BUILD_BUG_ON(sizeof(rc) != sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending)); - /* - * For efficiency, this mirrors the checks for using the valid - * cache in kvm_read_guest_offset_cached(), but just uses - * __get_user() instead. And falls back to the slow path. - */ - if (!evtchn_pending_sel && ghc_valid) { - /* Fast path */ - pagefault_disable(); - err = __get_user(rc, (u8 __user *)ghc->hva + offset); - pagefault_enable(); - if (!err) - return rc; - } - - /* Slow path */ + read_lock_irqsave(&gpc->lock, flags); + while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa, + sizeof(struct vcpu_info))) { + read_unlock_irqrestore(&gpc->lock, flags); - /* - * This function gets called from kvm_vcpu_block() after setting the - * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately - * from a HLT. So we really mustn't sleep. If the page ended up absent - * at that point, just return 1 in order to trigger an immediate wake, - * and we'll end up getting called again from a context where we *can* - * fault in the page and wait for it. - */ - if (atomic) - return 1; + /* + * This function gets called from kvm_vcpu_block() after setting the + * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately + * from a HLT. So we really mustn't sleep. If the page ended up absent + * at that point, just return 1 in order to trigger an immediate wake, + * and we'll end up getting called again from a context where we *can* + * fault in the page and wait for it. + */ + if (in_atomic() || !task_is_running(current)) + return 1; - if (!ghc_valid) { - err = kvm_gfn_to_hva_cache_init(v->kvm, ghc, ghc->gpa, ghc->len); - if (err || !ghc->memslot) { + if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, + sizeof(struct vcpu_info), + false)) { /* * If this failed, userspace has screwed up the * vcpu_info mapping. No interrupts for you. */ return 0; } + read_lock_irqsave(&gpc->lock, flags); } - /* - * Now we have a valid (protected by srcu) userspace HVA in - * ghc->hva which points to the struct vcpu_info. If there - * are any bits in the in-kernel evtchn_pending_sel then - * we need to write those to the guest vcpu_info and set - * its evtchn_upcall_pending flag. If there aren't any bits - * to add, we only want to *check* evtchn_upcall_pending. - */ - if (evtchn_pending_sel) { - bool long_mode = v->kvm->arch.xen.long_mode; - - if (!user_access_begin((void __user *)ghc->hva, sizeof(struct vcpu_info))) - return 0; - - if (IS_ENABLED(CONFIG_64BIT) && long_mode) { - struct vcpu_info __user *vi = (void __user *)ghc->hva; - - /* Attempt to set the evtchn_pending_sel bits in the - * guest, and if that succeeds then clear the same - * bits in the in-kernel version. */ - asm volatile("1:\t" LOCK_PREFIX "orq %0, %1\n" - "\tnotq %0\n" - "\t" LOCK_PREFIX "andq %0, %2\n" - "2:\n" - _ASM_EXTABLE_UA(1b, 2b) - : "=r" (evtchn_pending_sel), - "+m" (vi->evtchn_pending_sel), - "+m" (v->arch.xen.evtchn_pending_sel) - : "0" (evtchn_pending_sel)); - } else { - struct compat_vcpu_info __user *vi = (void __user *)ghc->hva; - u32 evtchn_pending_sel32 = evtchn_pending_sel; - - /* Attempt to set the evtchn_pending_sel bits in the - * guest, and if that succeeds then clear the same - * bits in the in-kernel version. */ - asm volatile("1:\t" LOCK_PREFIX "orl %0, %1\n" - "\tnotl %0\n" - "\t" LOCK_PREFIX "andl %0, %2\n" - "2:\n" - _ASM_EXTABLE_UA(1b, 2b) - : "=r" (evtchn_pending_sel32), - "+m" (vi->evtchn_pending_sel), - "+m" (v->arch.xen.evtchn_pending_sel) - : "0" (evtchn_pending_sel32)); - } - rc = 1; - unsafe_put_user(rc, (u8 __user *)ghc->hva + offset, err); - - err: - user_access_end(); - - mark_page_dirty_in_slot(v->kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT); - } else { - __get_user(rc, (u8 __user *)ghc->hva + offset); - } - + rc = ((struct vcpu_info *)gpc->khva)->evtchn_upcall_pending; + read_unlock_irqrestore(&gpc->lock, flags); return rc; } @@ -454,25 +447,18 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) offsetof(struct compat_vcpu_info, time)); if (data->u.gpa == GPA_INVALID) { - vcpu->arch.xen.vcpu_info_set = false; + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache); r = 0; break; } - /* It must fit within a single page */ - if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct vcpu_info) > PAGE_SIZE) { - r = -EINVAL; - break; - } - - r = kvm_gfn_to_hva_cache_init(vcpu->kvm, + r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache, - data->u.gpa, - sizeof(struct vcpu_info)); - if (!r) { - vcpu->arch.xen.vcpu_info_set = true; + NULL, false, true, data->u.gpa, + sizeof(struct vcpu_info), false); + if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); - } + break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: @@ -629,7 +615,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) switch (data->type) { case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO: - if (vcpu->arch.xen.vcpu_info_set) + if (vcpu->arch.xen.vcpu_info_cache.active) data->u.gpa = vcpu->arch.xen.vcpu_info_cache.gpa; else data->u.gpa = GPA_INVALID; @@ -902,16 +888,17 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, if (!vcpu) return -1; - if (!vcpu->arch.xen.vcpu_info_set) + if (!vcpu->arch.xen.vcpu_info_cache.active) return -1; if (e->xen_evtchn.port >= max_evtchn_port(kvm)) return -1; rc = -EWOULDBLOCK; - read_lock_irqsave(&gpc->lock, flags); idx = srcu_read_lock(&kvm->srcu); + + read_lock_irqsave(&gpc->lock, flags); if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE)) goto out_rcu; @@ -939,17 +926,44 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, } else if (test_bit(e->xen_evtchn.port, mask_bits)) { rc = -1; /* Masked */ } else { - rc = 1; /* Delivered. But was the vCPU waking already? */ - if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel)) - kick_vcpu = true; + rc = 1; /* Delivered to the bitmap in shared_info. */ + /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ + read_unlock_irqrestore(&gpc->lock, flags); + gpc = &vcpu->arch.xen.vcpu_info_cache; + + read_lock_irqsave(&gpc->lock, flags); + if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, sizeof(struct vcpu_info))) { + /* + * Could not access the vcpu_info. Set the bit in-kernel + * and prod the vCPU to deliver it for itself. + */ + if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel)) + kick_vcpu = true; + goto out_rcu; + } + + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { + struct vcpu_info *vcpu_info = gpc->khva; + if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) { + WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); + kick_vcpu = true; + } + } else { + struct compat_vcpu_info *vcpu_info = gpc->khva; + if (!test_and_set_bit(port_word_bit, + (unsigned long *)&vcpu_info->evtchn_pending_sel)) { + WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); + kick_vcpu = true; + } + } } out_rcu: - srcu_read_unlock(&kvm->srcu, idx); read_unlock_irqrestore(&gpc->lock, flags); + srcu_read_unlock(&kvm->srcu, idx); if (kick_vcpu) { - kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); } @@ -1052,4 +1066,6 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.runstate_cache); + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, + &vcpu->arch.xen.vcpu_info_cache); } diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 54b2bf4c3001..7dd0590f93e1 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -15,6 +15,7 @@ extern struct static_key_false_deferred kvm_xen_enabled; int __kvm_xen_has_interrupt(struct kvm_vcpu *vcpu); +void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu); int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data); int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data); int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data); @@ -46,11 +47,19 @@ static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm) static inline int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu) { if (static_branch_unlikely(&kvm_xen_enabled.key) && - vcpu->arch.xen.vcpu_info_set && vcpu->kvm->arch.xen.upcall_vector) + vcpu->arch.xen.vcpu_info_cache.active && + vcpu->kvm->arch.xen.upcall_vector) return __kvm_xen_has_interrupt(vcpu); return 0; } + +static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) +{ + return static_branch_unlikely(&kvm_xen_enabled.key) && + vcpu->arch.xen.evtchn_pending_sel; +} + #else static inline int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) { @@ -83,6 +92,15 @@ static inline int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu) { return 0; } + +static inline void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu) +{ +} + +static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) +{ + return false; +} #endif int kvm_xen_hypercall(struct kvm_vcpu *vcpu); From patchwork Thu Feb 10 00:27:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741119 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0531EC433F5 for ; Thu, 10 Feb 2022 01:31:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232733AbiBJBbp (ORCPT ); Wed, 9 Feb 2022 20:31:45 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232620AbiBJBbk (ORCPT ); Wed, 9 Feb 2022 20:31:40 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D921F25D0 for ; Wed, 9 Feb 2022 17:31:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=1qcdLnfj9OsEH3dcLdWFRjxr11eA1hrs7qlmndQN3IQ=; b=d3lZpqBCatZxI7wcxkN0DITmLt drTRgMWfUjueRhgupzNf8d2Jafjq1MxkVBdhQB7NkO0CbLuXJPZCrkRkWp1tfy6aOeah7EpUkCYTn 7EaCeKyK1rWs8tfCxmJJANTb1iZsBBfpU5nrMSjCmyVeVDdSQISzRRlVrEt1XxN13wWQ0hYMhlwWj Z05b1v4i+xODfP4Ns0QqL0W/TCnCh6r4SXS3RJc1AB79dkoY5MHYla+ioYU+NkGSY/Ye8Phz3RiNB OR3xicMbROQc6ULQJH4ZlzNJNg7WM0tO2mtb5mDdyfy0TmsUdLph0E8oxZ4BOErFZF5KtiklrUh2s DSNpU/4Q==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl4-HR; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019D3-2D; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 05/15] KVM: x86/xen: Use gfn_to_pfn_cache for vcpu_time_info Date: Thu, 10 Feb 2022 00:27:11 +0000 Message-Id: <20220210002721.273608-6-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse This switches the final pvclock to kvm_setup_pvclock_pfncache() and now the old kvm_setup_pvclock_page() can be removed. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/x86.c | 63 ++------------------------------- arch/x86/kvm/xen.c | 24 ++++++------- 3 files changed, 13 insertions(+), 77 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 118c9ce8e821..1c63296d3951 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -601,9 +601,8 @@ struct kvm_vcpu_hv { struct kvm_vcpu_xen { u64 hypercall_rip; u32 current_runstate; - bool vcpu_time_info_set; struct gfn_to_pfn_cache vcpu_info_cache; - struct gfn_to_hva_cache vcpu_time_info_cache; + struct gfn_to_pfn_cache vcpu_time_info_cache; struct gfn_to_pfn_cache runstate_cache; u64 last_steal; u64 runstate_entry_time; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 268f64b70768..72c72a36bd4d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2969,65 +2969,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); } -static void kvm_setup_pvclock_page(struct kvm_vcpu *v, - struct gfn_to_hva_cache *cache, - unsigned int offset) -{ - struct kvm_vcpu_arch *vcpu = &v->arch; - struct pvclock_vcpu_time_info guest_hv_clock; - - if (unlikely(kvm_read_guest_offset_cached(v->kvm, cache, - &guest_hv_clock, offset, sizeof(guest_hv_clock)))) - return; - - /* This VCPU is paused, but it's legal for a guest to read another - * VCPU's kvmclock, so we really have to follow the specification where - * it says that version is odd if data is being modified, and even after - * it is consistent. - * - * Version field updates must be kept separate. This is because - * kvm_write_guest_cached might use a "rep movs" instruction, and - * writes within a string instruction are weakly ordered. So there - * are three writes overall. - * - * As a small optimization, only write the version field in the first - * and third write. The vcpu->pv_time cache is still valid, because the - * version field is the first in the struct. - */ - BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); - - if (guest_hv_clock.version & 1) - ++guest_hv_clock.version; /* first time write, random junk */ - - vcpu->hv_clock.version = guest_hv_clock.version + 1; - kvm_write_guest_offset_cached(v->kvm, cache, - &vcpu->hv_clock, offset, - sizeof(vcpu->hv_clock.version)); - - smp_wmb(); - - /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ - vcpu->hv_clock.flags |= (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); - - if (vcpu->pvclock_set_guest_stopped_request) { - vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED; - vcpu->pvclock_set_guest_stopped_request = false; - } - - trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); - - kvm_write_guest_offset_cached(v->kvm, cache, - &vcpu->hv_clock, offset, - sizeof(vcpu->hv_clock)); - - smp_wmb(); - - vcpu->hv_clock.version++; - kvm_write_guest_offset_cached(v->kvm, cache, - &vcpu->hv_clock, offset, - sizeof(vcpu->hv_clock.version)); -} - static int kvm_guest_time_update(struct kvm_vcpu *v) { unsigned long flags, tgt_tsc_khz; @@ -3119,8 +3060,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) if (vcpu->xen.vcpu_info_cache.active) kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache, offsetof(struct compat_vcpu_info, time)); - if (vcpu->xen.vcpu_time_info_set) - kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0); + if (vcpu->xen.vcpu_time_info_cache.active) + kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0); if (!v->vcpu_idx) kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock); return 0; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 545c1d5c070e..d176d7e15c50 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -463,25 +463,19 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: if (data->u.gpa == GPA_INVALID) { - vcpu->arch.xen.vcpu_time_info_set = false; + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, + &vcpu->arch.xen.vcpu_time_info_cache); r = 0; break; } - /* It must fit within a single page */ - if ((data->u.gpa & ~PAGE_MASK) + sizeof(struct pvclock_vcpu_time_info) > PAGE_SIZE) { - r = -EINVAL; - break; - } - - r = kvm_gfn_to_hva_cache_init(vcpu->kvm, + r = kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache, - data->u.gpa, - sizeof(struct pvclock_vcpu_time_info)); - if (!r) { - vcpu->arch.xen.vcpu_time_info_set = true; + NULL, false, true, data->u.gpa, + sizeof(struct pvclock_vcpu_time_info), + false); + if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); - } break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR: @@ -623,7 +617,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: - if (vcpu->arch.xen.vcpu_time_info_set) + if (vcpu->arch.xen.vcpu_time_info_cache.active) data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa; else data->u.gpa = GPA_INVALID; @@ -1068,4 +1062,6 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) &vcpu->arch.xen.runstate_cache); kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache); + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, + &vcpu->arch.xen.vcpu_time_info_cache); } From patchwork Thu Feb 10 00:27:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741126 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 977FAC433F5 for ; Thu, 10 Feb 2022 01:32:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232956AbiBJBcO (ORCPT ); Wed, 9 Feb 2022 20:32:14 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:39584 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232953AbiBJBcD (ORCPT ); Wed, 9 Feb 2022 20:32:03 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D789825D0 for ; Wed, 9 Feb 2022 17:32:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=C2FYqCfa343t1EFUPyqJ8P8ZrP+FazJ1u8EFj229RSU=; b=OcAQ0qkB/rZ0jAn+e4NopCGtwj yKHOlOqdPe3JiMhNrI2xEITYWZo6VbJ7bq6xgjcKKnfu0LkR8pF20k2AoVDy1XdbpSiS/OKUUgP92 SE/kS3EoOhx3b5UoGmV/kw7igOzk4Cia7FgkUGUrK4SVZr/59NS0WI8eMDEg2lY1ISsNEk1WtT+8X j/TNZvlvTbg0aLGZJE45+1YyyIxE+hrHkYhacpAIIktjKLn/76RFvQz6xlYgD3O83TZwDRcfIhFOP /8n5QREWDhKX3t6CUdd0tgCcpbr1Q+kXrfdSvhs6D1Wq45v8bvP5nBsBvhxswZHBFsYzR/xFkItGQ biTonRHQ==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl5-IJ; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019D8-30; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 06/15] KVM: x86/xen: Make kvm_xen_set_evtchn() reusable from other places Date: Thu, 10 Feb 2022 00:27:12 +0000 Message-Id: <20220210002721.273608-7-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Clean it up to return -errno on error consistently, while still being compatible with the return conventions for kvm_arch_set_irq_inatomic() and the kvm_set_irq() callback. We use -ENOTCONN to indicate when the port is masked. No existing users care, except that it's negative. Also allow it to optimise the vCPU lookup. Unless we abuse the lapic map, there is no quick lookup from APIC ID to a vCPU; the logic in kvm_get_vcpu_by_id() will just iterate over all vCPUs till it finds the one it wants. So do that just once and stash the result in the struct kvm_xen_evtchn for next time. Signed-off-by: David Woodhouse --- arch/x86/kvm/irq_comm.c | 2 +- arch/x86/kvm/xen.c | 83 ++++++++++++++++++++++++++++------------ arch/x86/kvm/xen.h | 2 +- include/linux/kvm_host.h | 3 +- 4 files changed, 62 insertions(+), 28 deletions(-) diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index 6e0dab04320e..0687162c4f22 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -181,7 +181,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e, if (!level) return -1; - return kvm_xen_set_evtchn_fast(e, kvm); + return kvm_xen_set_evtchn_fast(&e->xen_evtchn, kvm); #endif default: break; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index d176d7e15c50..ac3aeecad56e 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -861,13 +861,16 @@ static inline int max_evtchn_port(struct kvm *kvm) } /* - * This follows the kvm_set_irq() API, so it returns: + * The return value from this function is propagated to kvm_set_irq() API, + * so it returns: * < 0 Interrupt was ignored (masked or not delivered for other reasons) * = 0 Interrupt was coalesced (previous irq is still pending) * > 0 Number of CPUs interrupt was delivered to + * + * It is also called directly from kvm_arch_set_irq_inatomic(), where the + * only check on its return value is a comparison with -EWOULDBLOCK'. */ -int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, - struct kvm *kvm) +int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) { struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; struct kvm_vcpu *vcpu; @@ -875,18 +878,23 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, unsigned long flags; int port_word_bit; bool kick_vcpu = false; - int idx; - int rc; + int vcpu_idx, idx, rc; - vcpu = kvm_get_vcpu_by_id(kvm, e->xen_evtchn.vcpu); - if (!vcpu) - return -1; + vcpu_idx = READ_ONCE(xe->vcpu_idx); + if (vcpu_idx >= 0) + vcpu = kvm_get_vcpu(kvm, vcpu_idx); + else { + vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id); + if (!vcpu) + return -EINVAL; + WRITE_ONCE(xe->vcpu_idx, kvm_vcpu_get_idx(vcpu)); + } if (!vcpu->arch.xen.vcpu_info_cache.active) - return -1; + return -EINVAL; - if (e->xen_evtchn.port >= max_evtchn_port(kvm)) - return -1; + if (xe->port >= max_evtchn_port(kvm)) + return -EINVAL; rc = -EWOULDBLOCK; @@ -900,12 +908,12 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, struct shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; - port_word_bit = e->xen_evtchn.port / 64; + port_word_bit = xe->port / 64; } else { struct compat_shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; - port_word_bit = e->xen_evtchn.port / 32; + port_word_bit = xe->port / 32; } /* @@ -915,10 +923,10 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, * already set, then we kick the vCPU in question to write to the * *real* evtchn_pending_sel in its own guest vcpu_info struct. */ - if (test_and_set_bit(e->xen_evtchn.port, pending_bits)) { + if (test_and_set_bit(xe->port, pending_bits)) { rc = 0; /* It was already raised */ - } else if (test_bit(e->xen_evtchn.port, mask_bits)) { - rc = -1; /* Masked */ + } else if (test_bit(xe->port, mask_bits)) { + rc = -ENOTCONN; /* Masked */ } else { rc = 1; /* Delivered to the bitmap in shared_info. */ /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ @@ -964,17 +972,12 @@ int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, return rc; } -/* This is the version called from kvm_set_irq() as the .set function */ -static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, - int irq_source_id, int level, bool line_status) +static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm) { bool mm_borrowed = false; int rc; - if (!level) - return -1; - - rc = kvm_xen_set_evtchn_fast(e, kvm); + rc = kvm_xen_set_evtchn_fast(xe, kvm); if (rc != -EWOULDBLOCK) return rc; @@ -1018,7 +1021,7 @@ static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; int idx; - rc = kvm_xen_set_evtchn_fast(e, kvm); + rc = kvm_xen_set_evtchn_fast(xe, kvm); if (rc != -EWOULDBLOCK) break; @@ -1036,11 +1039,27 @@ static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm return rc; } +/* This is the version called from kvm_set_irq() as the .set function */ +static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, + int irq_source_id, int level, bool line_status) +{ + if (!level) + return -EINVAL; + + return kvm_xen_set_evtchn(&e->xen_evtchn, kvm); +} + +/* + * Set up an event channel interrupt from the KVM IRQ routing table. + * Used for e.g. PIRQ from passed through physical devices. + */ int kvm_xen_setup_evtchn(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, const struct kvm_irq_routing_entry *ue) { + struct kvm_vcpu *vcpu; + if (ue->u.xen_evtchn.port >= max_evtchn_port(kvm)) return -EINVAL; @@ -1048,8 +1067,22 @@ int kvm_xen_setup_evtchn(struct kvm *kvm, if (ue->u.xen_evtchn.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) return -EINVAL; + /* + * Xen gives us interesting mappings from vCPU index to APIC ID, + * which means kvm_get_vcpu_by_id() has to iterate over all vCPUs + * to find it. Do that once at setup time, instead of every time. + * But beware that on live update / live migration, the routing + * table might be reinstated before the vCPU threads have finished + * recreating their vCPUs. + */ + vcpu = kvm_get_vcpu_by_id(kvm, ue->u.xen_evtchn.vcpu); + if (vcpu) + e->xen_evtchn.vcpu_idx = kvm_vcpu_get_idx(vcpu); + else + e->xen_evtchn.vcpu_idx = -1; + e->xen_evtchn.port = ue->u.xen_evtchn.port; - e->xen_evtchn.vcpu = ue->u.xen_evtchn.vcpu; + e->xen_evtchn.vcpu_id = ue->u.xen_evtchn.vcpu; e->xen_evtchn.priority = ue->u.xen_evtchn.priority; e->set = evtchn_set_fn; diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 7dd0590f93e1..e28feb32add6 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -25,7 +25,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); void kvm_xen_destroy_vm(struct kvm *kvm); void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu); -int kvm_xen_set_evtchn_fast(struct kvm_kernel_irq_routing_entry *e, +int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm); int kvm_xen_setup_evtchn(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 06912d6b39d0..5142b5795fea 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -496,7 +496,8 @@ struct kvm_hv_sint { struct kvm_xen_evtchn { u32 port; - u32 vcpu; + u32 vcpu_id; + int vcpu_idx; u32 priority; }; From patchwork Thu Feb 10 00:27:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741121 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F180C433EF for ; Thu, 10 Feb 2022 01:31:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232813AbiBJBbu (ORCPT ); Wed, 9 Feb 2022 20:31:50 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232735AbiBJBbh (ORCPT ); Wed, 9 Feb 2022 20:31:37 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAE07205F2 for ; Wed, 9 Feb 2022 17:31:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=J+XshkmIT/StMwJqjcuIYm/D2WKwNqr6xvXhipXBTTA=; b=Cuna54O38hm49NZbs8gNMJYQ8F DIHQljzdiVBjgA5y093qL6a68crZCgjWzBY1U2YaDxrw+sxk+9k1cL3Rs9dBWWH9cw7jzVPGTSRE4 UvIV71JSVGbFabtfuFIubo7ZafyoJXMUmrOMl6ksYie0hp0rPbb/0j/gLSGhr9Y+K0/qJPw+RLkC4 eYb1+0UJ10UYUlz33soGiB4tltNsD0QGLVHbndlwDnNPXPx5bjE2JR4rfJN5bliGf8Cqgrx+7Rnzy fiUlVBnyiiSFszGX1SliWo+NhrT3GDRjOgOIa2J5O2CP76SNOpZoVUcjnn+QCU8zwvJaO8Ql9AYbE Axu3wlgA==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl6-JO; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019DD-3q; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 07/15] KVM: x86/xen: Support direct injection of event channel events Date: Thu, 10 Feb 2022 00:27:13 +0000 Message-Id: <20220210002721.273608-8-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection of events given an explicit { vcpu, port, priority } in precisely the same form that those fields are given in the IRQ routing table. Userspace is currently able to inject 2-level events purely by setting the bits in the shared_info and vcpu_info, but FIFO event channels are harder to deal with; we will need the kernel to take sole ownership of delivery when we support those. A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM ioctl will be added in a subsequent patch. Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 9 +++++++++ arch/x86/kvm/xen.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/xen.h | 1 + include/uapi/linux/kvm.h | 3 +++ 4 files changed, 45 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 72c72a36bd4d..a54bd731cf41 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6440,6 +6440,15 @@ long kvm_arch_vm_ioctl(struct file *filp, r = kvm_xen_hvm_set_attr(kvm, &xha); break; } + case KVM_XEN_HVM_EVTCHN_SEND: { + struct kvm_irq_routing_xen_evtchn uxe; + + r = -EFAULT; + if (copy_from_user(&uxe, argp, sizeof(uxe))) + goto out; + r = kvm_xen_hvm_evtchn_send(kvm, &uxe); + break; + } #endif case KVM_SET_CLOCK: r = kvm_vm_ioctl_set_clock(kvm, argp); diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index ac3aeecad56e..90fffca518f0 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -1089,6 +1089,38 @@ int kvm_xen_setup_evtchn(struct kvm *kvm, return 0; } +/* + * Explicit event sending from userspace with KVM_XEN_HVM_EVTCHN_SEND ioctl. + */ +int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn *uxe) +{ + struct kvm_xen_evtchn e; + int ret; + + if (!uxe->port || uxe->port >= max_evtchn_port(kvm)) + return -EINVAL; + + /* We only support 2 level event channels for now */ + if (uxe->priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) + return -EINVAL; + + e.port = uxe->port; + e.vcpu_id = uxe->vcpu; + e.vcpu_idx = -1; + e.priority = uxe->priority; + + ret = kvm_xen_set_evtchn(&e, kvm); + + /* + * None of that 'return 1 if it actually got delivered' nonsense. + * We don't care if it was masked (-ENOTCONN) either. + */ + if (ret > 0 || ret == -ENOTCONN) + ret = 0; + + return ret; +} + void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index e28feb32add6..852286de574e 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -20,6 +20,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data); int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data); int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data); +int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn *evt); int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data); int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index b46bcdb0cab1..f82b503e26e2 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1692,6 +1692,9 @@ struct kvm_xen_hvm_attr { #define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr) #define KVM_XEN_VCPU_SET_ATTR _IOW(KVMIO, 0xcb, struct kvm_xen_vcpu_attr) +/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ +#define KVM_XEN_HVM_EVTCHN_SEND _IOW(KVMIO, 0xd0, struct kvm_irq_routing_xen_evtchn) + #define KVM_GET_SREGS2 _IOR(KVMIO, 0xcc, struct kvm_sregs2) #define KVM_SET_SREGS2 _IOW(KVMIO, 0xcd, struct kvm_sregs2) From patchwork Thu Feb 10 00:27:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741182 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2554C433FE for ; Thu, 10 Feb 2022 02:09:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233609AbiBJCJD (ORCPT ); Wed, 9 Feb 2022 21:09:03 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:55262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235871AbiBJCIl (ORCPT ); Wed, 9 Feb 2022 21:08:41 -0500 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33A56286 for ; Wed, 9 Feb 2022 18:08:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=Iw9MIDIagznm07FHPCkl35WaGQDZFSqo3TgOuINQbc0=; b=B24qu5uKawiTz66GhrdWIFWDCe RWffP8Wo373pyLreZKMFFVrF31BCx40boW3u0nCqVvdWMHGnlmFqU/F1xpygdPR9N7e9Qv2FJTlSQ JqrZUcFtJlymCd8eQ/gd7HWO2p1MU1IPp7KFOdDXghhUu9nOHE2Lm633pvhedtf73wrh27PvTQXgt KKZUomHUq1eGoHyHweQcNgPLXzFAoIAstWHeWucLRVwa3iJDY35TH2eV17FOvGunBJJfMk3w0Wvo/ /InS1rXcIrXME9nzLKGysdcwPMwk3nl3RFHn9WFEgdQBfotM5SjY60PKwrEvZD1SGlVZvPvmPRSib TV1rb5pA==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by desiato.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008YWF-DE; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019DI-4c; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 08/15] KVM: x86/xen: intercept EVTCHNOP_send from guests Date: Thu, 10 Feb 2022 00:27:14 +0000 Message-Id: <20220210002721.273608-9-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Joao Martins Userspace registers a sending @port to either deliver to an @eventfd or directly back to a local event channel port. After binding events the guest or host may wish to bind those events to a particular vcpu. This is usually done for unbound and and interdomain events. Update requests are handled via the KVM_XEN_EVTCHN_UPDATE flag. Unregistered ports are handled by the emulator. Co-developed-by: Ankur Arora Co-developed-By: David Woodhouse Signed-off-by: Joao Martins Signed-off-by: Ankur Arora Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/xen.c | 273 ++++++++++++++++++++++++++++++-- include/uapi/linux/kvm.h | 27 ++++ 3 files changed, 286 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1c63296d3951..dca0842d6926 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1020,6 +1020,7 @@ struct kvm_xen { bool long_mode; u8 upcall_vector; struct gfn_to_pfn_cache shinfo_cache; + struct idr evtchn_ports; }; enum kvm_irqchip_mode { diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 90fffca518f0..909c25521554 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -11,6 +11,7 @@ #include "lapic.h" #include "hyperv.h" +#include #include #include @@ -21,6 +22,9 @@ #include "trace.h" +static int kvm_xen_setattr_evtchn(struct kvm *kvm, struct kvm_xen_hvm_attr *data); +static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r); + DEFINE_STATIC_KEY_DEFERRED_FALSE(kvm_xen_enabled, HZ); static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn) @@ -365,36 +369,44 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { int r = -ENOENT; - mutex_lock(&kvm->lock); switch (data->type) { case KVM_XEN_ATTR_TYPE_LONG_MODE: if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) { r = -EINVAL; } else { + mutex_lock(&kvm->lock); kvm->arch.xen.long_mode = !!data->u.long_mode; + mutex_unlock(&kvm->lock); r = 0; } break; case KVM_XEN_ATTR_TYPE_SHARED_INFO: + mutex_lock(&kvm->lock); r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn); + mutex_unlock(&kvm->lock); break; case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR: if (data->u.vector && data->u.vector < 0x10) r = -EINVAL; else { + mutex_lock(&kvm->lock); kvm->arch.xen.upcall_vector = data->u.vector; + mutex_unlock(&kvm->lock); r = 0; } break; + case KVM_XEN_ATTR_TYPE_EVTCHN: + r = kvm_xen_setattr_evtchn(kvm, data); + break; + default: break; } - mutex_unlock(&kvm->lock); return r; } @@ -772,18 +784,6 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) return 0; } -void kvm_xen_init_vm(struct kvm *kvm) -{ -} - -void kvm_xen_destroy_vm(struct kvm *kvm) -{ - kvm_gfn_to_pfn_cache_destroy(kvm, &kvm->arch.xen.shinfo_cache); - - if (kvm->arch.xen_hvm_config.msr) - static_branch_slow_dec_deferred(&kvm_xen_enabled); -} - static int kvm_xen_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result) { kvm_rax_write(vcpu, result); @@ -803,7 +803,8 @@ static int kvm_xen_hypercall_complete_userspace(struct kvm_vcpu *vcpu) int kvm_xen_hypercall(struct kvm_vcpu *vcpu) { bool longmode; - u64 input, params[6]; + u64 input, params[6], r = -ENOSYS; + bool handled = false; input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX); @@ -834,6 +835,19 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu) trace_kvm_xen_hypercall(input, params[0], params[1], params[2], params[3], params[4], params[5]); + switch (input) { + case __HYPERVISOR_event_channel_op: + if (params[0] == EVTCHNOP_send) + handled = kvm_xen_hcall_evtchn_send(vcpu, params[1], &r); + break; + + default: + break; + } + + if (handled) + return kvm_xen_hypercall_set_result(vcpu, r); + vcpu->run->exit_reason = KVM_EXIT_XEN; vcpu->run->xen.type = KVM_EXIT_XEN_HCALL; vcpu->run->xen.u.hcall.longmode = longmode; @@ -1121,6 +1135,212 @@ int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn * return ret; } +/* + * Support for *outbound* event channel events via the EVTCHNOP_send hypercall. + */ +struct evtchnfd { + u32 send_port; + u32 type; + union { + struct kvm_xen_evtchn port; + struct { + u32 port; /* zero */ + struct eventfd_ctx *ctx; + } eventfd; + } deliver; +}; + +/* + * Update target vCPU or priority for a registered sending channel. + */ +static int kvm_xen_eventfd_update(struct kvm *kvm, + struct kvm_xen_hvm_attr *data) +{ + u32 port = data->u.evtchn.send_port; + struct evtchnfd *evtchnfd; + + if (!port || port >= max_evtchn_port(kvm)) + return -EINVAL; + + mutex_lock(&kvm->lock); + evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port); + mutex_unlock(&kvm->lock); + + if (!evtchnfd) + return -ENOENT; + + /* For an UPDATE, nothing may change except the priority/vcpu */ + if (evtchnfd->type != data->u.evtchn.type) + return -EINVAL; + + /* + * Port cannot change, and if it's zero that was an eventfd + * which can't be changed either. + */ + if (!evtchnfd->deliver.port.port || + evtchnfd->deliver.port.port != data->u.evtchn.deliver.port.port) + return -EINVAL; + + /* We only support 2 level event channels for now */ + if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) + return -EINVAL; + + mutex_lock(&kvm->lock); + evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority; + if (evtchnfd->deliver.port.vcpu_id != data->u.evtchn.deliver.port.vcpu) { + evtchnfd->deliver.port.vcpu_id = data->u.evtchn.deliver.port.vcpu; + evtchnfd->deliver.port.vcpu_idx = -1; + } + mutex_unlock(&kvm->lock); + return 0; +} + +/* + * Configure the target (eventfd or local port delivery) for sending on + * a given event channel. + */ +static int kvm_xen_eventfd_assign(struct kvm *kvm, + struct kvm_xen_hvm_attr *data) +{ + u32 port = data->u.evtchn.send_port; + struct eventfd_ctx *eventfd = NULL; + struct evtchnfd *evtchnfd = NULL; + int ret = -EINVAL; + + if (!port || port >= max_evtchn_port(kvm)) + return -EINVAL; + + evtchnfd = kzalloc(sizeof(struct evtchnfd), GFP_KERNEL); + if (!evtchnfd) + return -ENOMEM; + + switch(data->u.evtchn.type) { + case EVTCHNSTAT_ipi: + /* IPI must map back to the same port# */ + if (data->u.evtchn.deliver.port.port != data->u.evtchn.send_port) + goto out; /* -EINVAL */ + break; + + case EVTCHNSTAT_interdomain: + if (data->u.evtchn.deliver.port.port) { + if (data->u.evtchn.deliver.port.port >= max_evtchn_port(kvm)) + goto out; /* -EINVAL */ + } else { + eventfd = eventfd_ctx_fdget(data->u.evtchn.deliver.eventfd.fd); + if (IS_ERR(eventfd)) { + ret = PTR_ERR(eventfd); + goto out; + } + } + break; + + case EVTCHNSTAT_virq: + case EVTCHNSTAT_closed: + case EVTCHNSTAT_unbound: + case EVTCHNSTAT_pirq: + default: /* Unknown event channel type */ + goto out; /* -EINVAL */ + } + + if (eventfd) { + evtchnfd->deliver.eventfd.ctx = eventfd; + } else { + /* We only support 2 level event channels for now */ + if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) + goto out; /* -EINVAL; */ + + evtchnfd->deliver.port.port = data->u.evtchn.deliver.port.port; + evtchnfd->deliver.port.vcpu_id = data->u.evtchn.deliver.port.vcpu; + evtchnfd->deliver.port.vcpu_idx = -1; + evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority; + } + + mutex_lock(&kvm->lock); + ret = idr_alloc(&kvm->arch.xen.evtchn_ports, evtchnfd, port, port + 1, + GFP_KERNEL); + mutex_unlock(&kvm->lock); + + if (ret >= 0) + return 0; + + if (ret == -ENOSPC) + ret = -EEXIST; +out: + if (eventfd) + eventfd_ctx_put(eventfd); + kfree(evtchnfd); + return ret; +} + +static int kvm_xen_eventfd_deassign(struct kvm *kvm, u32 port) +{ + struct evtchnfd *evtchnfd; + + mutex_lock(&kvm->lock); + evtchnfd = idr_remove(&kvm->arch.xen.evtchn_ports, port); + mutex_unlock(&kvm->lock); + + if (!evtchnfd) + return -ENOENT; + + if (kvm) + synchronize_srcu(&kvm->srcu); + if (!evtchnfd->deliver.port.port) + eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); + kfree(evtchnfd); + return 0; +} + +static int kvm_xen_setattr_evtchn(struct kvm *kvm, struct kvm_xen_hvm_attr *data) +{ + u32 port = data->u.evtchn.send_port; + + if (!port || port >= max_evtchn_port(kvm)) + return -EINVAL; + + if (data->u.evtchn.flags == KVM_XEN_EVTCHN_DEASSIGN) + return kvm_xen_eventfd_deassign(kvm, port); + if (data->u.evtchn.flags == KVM_XEN_EVTCHN_UPDATE) + return kvm_xen_eventfd_update(kvm, data); + if (data->u.evtchn.flags) + return -EINVAL; + + return kvm_xen_eventfd_assign(kvm, data); +} + +static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r) +{ + struct evtchnfd *evtchnfd; + struct evtchn_send send; + gpa_t gpa; + int idx; + + idx = srcu_read_lock(&vcpu->kvm->srcu); + gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, &send, sizeof(send))) { + *r = -EFAULT; + return true; + } + + /* The evtchn_ports idr is protected by vcpu->kvm->srcu */ + evtchnfd = idr_find(&vcpu->kvm->arch.xen.evtchn_ports, send.port); + if (!evtchnfd) + return false; + + if (evtchnfd->deliver.port.port) { + int ret = kvm_xen_set_evtchn(&evtchnfd->deliver.port, vcpu->kvm); + if (ret < 0 && ret != -ENOTCONN) + return false; + } else { + eventfd_signal(evtchnfd->deliver.eventfd.ctx, 1); + } + + *r = 0; + return true; +} + void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, @@ -1130,3 +1350,26 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache); } + +void kvm_xen_init_vm(struct kvm *kvm) +{ + idr_init(&kvm->arch.xen.evtchn_ports); +} + +void kvm_xen_destroy_vm(struct kvm *kvm) +{ + struct evtchnfd *evtchnfd; + int i; + + kvm_gfn_to_pfn_cache_destroy(kvm, &kvm->arch.xen.shinfo_cache); + + idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) { + if (!evtchnfd->deliver.port.port) + eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); + kfree(evtchnfd); + } + idr_destroy(&kvm->arch.xen.evtchn_ports); + + if (kvm->arch.xen_hvm_config.msr) + static_branch_slow_dec_deferred(&kvm_xen_enabled); +} diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f82b503e26e2..c0d6f49e3fd8 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1679,6 +1679,31 @@ struct kvm_xen_hvm_attr { struct { __u64 gfn; } shared_info; + struct { + __u32 send_port; + __u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */ + __u32 flags; +#define KVM_XEN_EVTCHN_DEASSIGN (1 << 0) +#define KVM_XEN_EVTCHN_UPDATE (1 << 1) + /* + * Events sent by the guest are either looped back to + * the guest itself (potentially on a different port#) + * or signalled via an eventfd. + */ + union { + struct { + __u32 port; + __u32 vcpu; + __u32 priority; + } port; + struct { + __u32 port; /* Zero for eventfd */ + __s32 fd; + } eventfd; + __u32 padding[4]; + } deliver; + } evtchn; + __u64 pad[8]; } u; }; @@ -1687,6 +1712,8 @@ struct kvm_xen_hvm_attr { #define KVM_XEN_ATTR_TYPE_LONG_MODE 0x0 #define KVM_XEN_ATTR_TYPE_SHARED_INFO 0x1 #define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR 0x2 +/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ +#define KVM_XEN_ATTR_TYPE_EVTCHN 0x3 /* Per-vCPU Xen attributes */ #define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr) From patchwork Thu Feb 10 00:27:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741180 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DB82C433F5 for ; Thu, 10 Feb 2022 02:08:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231315AbiBJCIz (ORCPT ); Wed, 9 Feb 2022 21:08:55 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:55304 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235899AbiBJCIn (ORCPT ); Wed, 9 Feb 2022 21:08:43 -0500 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 833C6B6B for ; Wed, 9 Feb 2022 18:08:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=TqNdvpCfmSXhqFJSoKVWx4TQwHdp3FSWSSRfSqxyXY0=; b=V7Nms1pwSXbtAyYom22/soVSvo cohnmRSFlz1JwUS7vYiR58/ecKFcj+TZdv8CXnRU0Mf08TlvF9h/KcLFL+Emg1AkmyCFEDJ/9seX3 sZQ2r2m9dKBo4xhuWGJa6zdaG7GQxxSgOnFd2JemlLhOe3Mgxw3B7lK+A/S2MB3SD2sOhn5Obvhwb +TXVx4SoShwtcV8VXu7vCeJwYfXs2WnJSGJj5uzoZvV1qqI/hmR0rfRK+xoLPxWizuZKeZFlVPq2B u8TqXE/RoBdr1wrO2i5MpibWii8a0+6lk960hehwP3gC2LA1U5P5VhxFV/ar8H0dgxCaokCsYWYYP RYzdq8fw==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by desiato.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008YWG-Dl; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019DN-5U; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 09/15] KVM: x86/xen: handle PV IPI vcpu yield Date: Thu, 10 Feb 2022 00:27:15 +0000 Message-Id: <20220210002721.273608-10-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Joao Martins Cooperative Linux guests after an IPI-many may yield vcpu if any of the IPI'd vcpus were preempted (i.e. runstate is 'runnable'.) Support SCHEDOP_yield for handling yield. Signed-off-by: Joao Martins Signed-off-by: David Woodhouse --- arch/x86/kvm/xen.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 909c25521554..3df44c996d66 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "trace.h" @@ -800,6 +801,20 @@ static int kvm_xen_hypercall_complete_userspace(struct kvm_vcpu *vcpu) return kvm_xen_hypercall_set_result(vcpu, run->xen.u.hcall.result); } +static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, int cmd, u64 param, u64 *r) +{ + switch (cmd) { + case SCHEDOP_yield: + kvm_vcpu_on_spin(vcpu, true); + *r = 0; + return true; + default: + break; + } + + return false; +} + int kvm_xen_hypercall(struct kvm_vcpu *vcpu) { bool longmode; @@ -840,7 +855,9 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu) if (params[0] == EVTCHNOP_send) handled = kvm_xen_hcall_evtchn_send(vcpu, params[1], &r); break; - + case __HYPERVISOR_sched_op: + handled = kvm_xen_hcall_sched_op(vcpu, params[0], params[1], &r); + break; default: break; } From patchwork Thu Feb 10 00:27:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741120 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1FDAC433F5 for ; Thu, 10 Feb 2022 01:31:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232788AbiBJBbr (ORCPT ); Wed, 9 Feb 2022 20:31:47 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232874AbiBJBbi (ORCPT ); Wed, 9 Feb 2022 20:31:38 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 92FD610C4 for ; Wed, 9 Feb 2022 17:31:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=GsHOqOyhR3mAI8yaB48JEMlYdYgqkUbJ8nAunLp9SEg=; b=mT5xiR7IK5VXb7QAT6ywjhvZYb 8jLJi9nMhNDHVuKNBGkcMuyxFLI1414NGesDuJHViiNQSbOS1VrwcJBi+Xs57J0pFhUYIS33Qt4YM +vOJsUwzxuMsXM6nu6tC19/AGUmqV05ZtpvbonPkBy+L8vYCHI7sBoQGXT9ClaL8pEfbP4leoxe+T ImAQHtWHOx1sMJ7rDa0pSCA0/ywYkjP/XpTa+1ZipzDRz9In7h1Gnkm/XTF84mgordFMrSzk6HRyi pfer/LvgxLrfbWZ5nQQOdodL0OIiK5nrtUd4EuNOT0ldb7yUCPbO5G+Qby7nh7loebc55JhXPEaSv Bx8rNEtQ==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl7-Kn; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019DS-64; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 10/15] KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID Date: Thu, 10 Feb 2022 00:27:16 +0000 Message-Id: <20220210002721.273608-11-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we need to be aware of the Xen CPU numbering. This looks a lot like the Hyper-V handling of vpidx, for obvious reasons. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 1 + arch/x86/kvm/xen.c | 45 +++++++++++++++++++++++++++++++++ arch/x86/kvm/xen.h | 5 ++++ include/uapi/linux/kvm.h | 3 +++ 5 files changed, 55 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dca0842d6926..3c58d0bf5f9b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -608,6 +608,7 @@ struct kvm_vcpu_xen { u64 runstate_entry_time; u64 runstate_times[4]; unsigned long evtchn_pending_sel; + u32 vcpu_id; /* The Xen / ACPI vCPU ID */ }; struct kvm_vcpu_arch { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a54bd731cf41..2143c2652f8f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11141,6 +11141,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.arch_capabilities = kvm_get_arch_capabilities(); vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT; + kvm_xen_init_vcpu(vcpu); kvm_vcpu_mtrr_init(vcpu); vcpu_load(vcpu); kvm_set_tsc_khz(vcpu, max_tsc_khz); diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 3df44c996d66..8ee4bc648bcb 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -605,6 +605,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = 0; break; + case KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID: + if (data->u.vcpu_id >= KVM_MAX_VCPUS) + r = -EINVAL; + else { + vcpu->arch.xen.vcpu_id = data->u.vcpu_id; + r = 0; + } + break; + default: break; } @@ -680,6 +689,11 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = -EINVAL; break; + case KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID: + data->u.vcpu_id = vcpu->arch.xen.vcpu_id; + r = 0; + break; + default: break; } @@ -785,6 +799,32 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) return 0; } +static struct kvm_vcpu *kvm_xen_vcpu_by_id(struct kvm *kvm, u32 vcpu_id, + struct kvm_vcpu *vcpu) +{ + unsigned long i; + + if (vcpu && vcpu->arch.xen.vcpu_id == vcpu_id) + return vcpu; + + if (vcpu_id >= KVM_MAX_VCPUS) + return NULL; + + /* Try the matching index first */ + if (!vcpu || vcpu->vcpu_idx == vcpu_id) { + vcpu = kvm_get_vcpu(kvm, vcpu_id); + + if (vcpu && vcpu->arch.xen.vcpu_id == vcpu_id) + return vcpu; + } + + kvm_for_each_vcpu(i, vcpu, kvm) + if (vcpu->arch.xen.vcpu_id == vcpu_id) + return vcpu; + + return NULL; +} + static int kvm_xen_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result) { kvm_rax_write(vcpu, result); @@ -1358,6 +1398,11 @@ static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r) return true; } +void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) +{ + vcpu->arch.xen.vcpu_id = vcpu->vcpu_idx; +} + void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 852286de574e..54d587aae85b 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -25,6 +25,7 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data); int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); void kvm_xen_destroy_vm(struct kvm *kvm); +void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu); void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu); int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm); @@ -75,6 +76,10 @@ static inline void kvm_xen_destroy_vm(struct kvm *kvm) { } +static inline void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) +{ +} + static inline void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index c0d6f49e3fd8..284645e1a872 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1739,6 +1739,7 @@ struct kvm_xen_vcpu_attr { __u64 time_blocked; __u64 time_offline; } runstate; + __u32 vcpu_id; } u; }; @@ -1749,6 +1750,8 @@ struct kvm_xen_vcpu_attr { #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT 0x3 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA 0x4 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST 0x5 +/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ +#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID 0x6 /* Secure Encrypted Virtualization command */ enum sev_cmd_id { From patchwork Thu Feb 10 00:27:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84229C433F5 for ; Thu, 10 Feb 2022 01:31:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232674AbiBJBbm (ORCPT ); Wed, 9 Feb 2022 20:31:42 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232690AbiBJBbh (ORCPT ); Wed, 9 Feb 2022 20:31:37 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6882810E9 for ; Wed, 9 Feb 2022 17:31:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=JuR6ocG++h2vQeBZIhv4RWMz/wWGcYEFpw+qNMIi0Pk=; b=kzkhIMERSgPZ52g9jQzESeoNtx dqes54p+fRspR36yk09h80rtiU8XEF0zXpza2DUP/1G9EZgElQ60PfS5KqOAsD+7EO7XB3utXihJL +t3ZlFVP1mO42KspHOqfo6/C6lUo5YwF3uPsuMTA4IwfjSLvEpKbc0k1zcwXS1+8NJKbnUJwD48Q8 oDhJU3TRB8VEIaTUSzT2M7zTjGEBIFC5aE+sgkRV3+U4xV9EIJ7YHGU3ZzPG8938tf+0QE+d4T2E+ C4nfTdOG51yLdSNsvmBfB8hbBYLRh5t6Nha3F1Bvw0ysYRE9Hss1NEAB0EhfpAPmCbGhqumEXzlCe wV00EUlg==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xl9-MC; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019DX-6m; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 11/15] KVM: x86/xen: handle PV timers oneshot mode Date: Thu, 10 Feb 2022 00:27:17 +0000 Message-Id: <20220210002721.273608-12-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Joao Martins If the guest has offloaded the timer virq, handle the following hypercalls for programming the timer: VCPUOP_set_singleshot_timer VCPUOP_stop_singleshot_timer set_timer_op(timestamp_ns) The event channel corresponding to the timer virq is then used to inject events once timer deadlines are met. For now we back the PV timer with hrtimer. Signed-off-by: Joao Martins Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 3 + arch/x86/kvm/irq.c | 11 ++- arch/x86/kvm/x86.c | 3 + arch/x86/kvm/xen.c | 168 ++++++++++++++++++++++++++++++++ arch/x86/kvm/xen.h | 30 ++++++ include/uapi/linux/kvm.h | 6 ++ 6 files changed, 219 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3c58d0bf5f9b..ddfdc8cc8e60 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -609,6 +609,9 @@ struct kvm_vcpu_xen { u64 runstate_times[4]; unsigned long evtchn_pending_sel; u32 vcpu_id; /* The Xen / ACPI vCPU ID */ + u32 timer_virq; + atomic_t timer_pending; + struct hrtimer timer; }; struct kvm_vcpu_arch { diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 172b05343cfd..af2d26fc5458 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -22,10 +22,14 @@ */ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) { + int r = 0; + if (lapic_in_kernel(vcpu)) - return apic_has_pending_timer(vcpu); + r = apic_has_pending_timer(vcpu); + if (kvm_xen_timer_enabled(vcpu)) + r += kvm_xen_has_pending_timer(vcpu); - return 0; + return r; } EXPORT_SYMBOL(kvm_cpu_has_pending_timer); @@ -143,6 +147,8 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) { if (lapic_in_kernel(vcpu)) kvm_inject_apic_timer_irqs(vcpu); + if (kvm_xen_timer_enabled(vcpu)) + kvm_xen_inject_timer_irqs(vcpu); } EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs); @@ -150,6 +156,7 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu) { __kvm_migrate_apic_timer(vcpu); __kvm_migrate_pit_timer(vcpu); + __kvm_migrate_xen_timer(vcpu); static_call_cond(kvm_x86_migrate_timers)(vcpu); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2143c2652f8f..7240d791e4ab 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12063,6 +12063,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) kvm_x86_ops.nested_ops->hv_timer_pending(vcpu)) return true; + if (kvm_xen_has_pending_timer(vcpu)) + return true; + return false; } diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 8ee4bc648bcb..c042c7d6ee02 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -23,6 +23,7 @@ #include "trace.h" +static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm); static int kvm_xen_setattr_evtchn(struct kvm *kvm, struct kvm_xen_hvm_attr *data); static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r); @@ -108,6 +109,80 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn) return ret; } +void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) +{ + if (atomic_read(&vcpu->arch.xen.timer_pending) > 0) { + struct kvm_xen_evtchn e; + + e.vcpu_id = vcpu->vcpu_id; + e.vcpu_idx = vcpu->vcpu_idx; + e.port = vcpu->arch.xen.timer_virq; + e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; + + kvm_xen_set_evtchn(&e, vcpu->kvm); + atomic_set(&vcpu->arch.xen.timer_pending, 0); + } +} + +static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer) +{ + struct kvm_vcpu *vcpu = container_of(timer, struct kvm_vcpu, + arch.xen.timer); + struct kvm_xen_evtchn e; + + if (atomic_read(&vcpu->arch.xen.timer_pending)) + return HRTIMER_NORESTART; + + e.vcpu_id = vcpu->vcpu_id; + e.vcpu_idx = vcpu->vcpu_idx; + e.port = vcpu->arch.xen.timer_virq; + e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; + + if (kvm_xen_set_evtchn_fast(&e, vcpu->kvm) != -EWOULDBLOCK) + return HRTIMER_NORESTART; + + atomic_inc(&vcpu->arch.xen.timer_pending); + kvm_make_request(KVM_REQ_UNBLOCK, vcpu); + kvm_vcpu_kick(vcpu); + + return HRTIMER_NORESTART; +} + +void __kvm_migrate_xen_timer(struct kvm_vcpu *vcpu) +{ + struct hrtimer *timer; + + if (!kvm_xen_timer_enabled(vcpu)) + return; + + timer = &vcpu->arch.xen.timer; + if (hrtimer_cancel(timer)) + hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED); +} + +static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 delta_ns) +{ + ktime_t ktime_now; + + atomic_set(&vcpu->arch.xen.timer_pending, 0); + ktime_now = ktime_get(); + hrtimer_start(&vcpu->arch.xen.timer, + ktime_add_ns(ktime_now, delta_ns), + HRTIMER_MODE_ABS_PINNED); +} + +static void kvm_xen_stop_timer(struct kvm_vcpu *vcpu) +{ + hrtimer_cancel(&vcpu->arch.xen.timer); +} + +void kvm_xen_init_timer(struct kvm_vcpu *vcpu) +{ + hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, + HRTIMER_MODE_ABS_PINNED); + vcpu->arch.xen.timer.function = xen_timer_callback; +} + static void kvm_xen_update_runstate(struct kvm_vcpu *v, int state) { struct kvm_vcpu_xen *vx = &v->arch.xen; @@ -614,6 +689,21 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) } break; + case KVM_XEN_VCPU_ATTR_TYPE_TIMER: + if (data->u.timer.port) { + if (data->u.timer.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) { + r = -EINVAL; + break; + } + kvm_xen_init_timer(vcpu); + } else if (kvm_xen_timer_enabled(vcpu)) { + kvm_xen_stop_timer(vcpu); + } + vcpu->arch.xen.timer_virq = data->u.timer.port; + // XXX Start timer if it's given + r = 0; + break; + default: break; } @@ -694,6 +784,13 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = 0; break; + case KVM_XEN_VCPU_ATTR_TYPE_TIMER: + data->u.timer.port = vcpu->arch.xen.timer_virq; + data->u.timer.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; + data->u.timer.expires_ns = 0; // XXX + r = 0; + break; + default: break; } @@ -855,6 +952,67 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, int cmd, u64 param, u6 return false; } +static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, int cmd, int vcpu_id, + u64 param, u64 *r) +{ + struct vcpu_set_singleshot_timer oneshot; + struct kvm_vcpu *target_vcpu; + long delta; + gpa_t gpa; + int idx; + + target_vcpu = kvm_xen_vcpu_by_id(vcpu->kvm, vcpu_id, vcpu); + if (!target_vcpu || !kvm_xen_timer_enabled(target_vcpu)) + return false; + + switch (cmd) { + case VCPUOP_set_singleshot_timer: + idx = srcu_read_lock(&vcpu->kvm->srcu); + gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, &oneshot, + sizeof(oneshot))) { + *r = -EFAULT; + return true; + } + + delta = oneshot.timeout_abs_ns - get_kvmclock_ns(vcpu->kvm); + kvm_xen_start_timer(target_vcpu, delta); + *r = 0; + return true; + + case VCPUOP_stop_singleshot_timer: + kvm_xen_stop_timer(target_vcpu); + *r = 0; + return true; + } + + return false; +} + +static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout, + u64 *r) +{ + ktime_t ktime_now = ktime_get(); + long delta = timeout - get_kvmclock_ns(vcpu->kvm); + + if (!kvm_xen_timer_enabled(vcpu)) + return false; + + if (timeout == 0) { + kvm_xen_stop_timer(vcpu); + } else if (unlikely(timeout < ktime_now) || + ((uint32_t) (delta >> 50) != 0)) { + kvm_xen_start_timer(vcpu, 50000000); + } else { + kvm_xen_start_timer(vcpu, delta); + } + + *r = 0; + return true; +} + int kvm_xen_hypercall(struct kvm_vcpu *vcpu) { bool longmode; @@ -898,6 +1056,13 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu) case __HYPERVISOR_sched_op: handled = kvm_xen_hcall_sched_op(vcpu, params[0], params[1], &r); break; + case __HYPERVISOR_vcpu_op: + handled = kvm_xen_hcall_vcpu_op(vcpu, params[0], params[1], + params[2], &r); + break; + case __HYPERVISOR_set_timer_op: + handled = kvm_xen_hcall_set_timer_op(vcpu, params[0], &r); + break; default: break; } @@ -1405,6 +1570,9 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { + if (kvm_xen_timer_enabled(vcpu)) + kvm_xen_stop_timer(vcpu); + kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.runstate_cache); kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 54d587aae85b..616fe751c8fc 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -62,6 +62,21 @@ static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) vcpu->arch.xen.evtchn_pending_sel; } +static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) +{ + return !!vcpu->arch.xen.timer_virq; +} + +static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) +{ + if (kvm_xen_hypercall_enabled(vcpu->kvm) && kvm_xen_timer_enabled(vcpu)) + return atomic_read(&vcpu->arch.xen.timer_pending); + + return 0; +} + +void __kvm_migrate_xen_timer(struct kvm_vcpu *vcpu); +void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu); #else static inline int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) { @@ -104,6 +119,21 @@ static inline void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu) } static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) + +static inline void __kvm_migrate_xen_timer(struct kvm_vcpu *vcpu) +{ +} + +static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) +{ + return 0; +} + +static inline void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) +{ +} + +static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) { return false; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 284645e1a872..f485b536112f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1740,6 +1740,11 @@ struct kvm_xen_vcpu_attr { __u64 time_offline; } runstate; __u32 vcpu_id; + struct { + __u32 port; + __u32 priority; + __u64 expires_ns; + } timer; } u; }; @@ -1752,6 +1757,7 @@ struct kvm_xen_vcpu_attr { #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST 0x5 /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ #define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID 0x6 +#define KVM_XEN_VCPU_ATTR_TYPE_TIMER 0x7 /* Secure Encrypted Virtualization command */ enum sev_cmd_id { From patchwork Thu Feb 10 00:27:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741124 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D78D1C433EF for ; Thu, 10 Feb 2022 01:32:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232972AbiBJBcB (ORCPT ); Wed, 9 Feb 2022 20:32:01 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:39104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232735AbiBJBbv (ORCPT ); Wed, 9 Feb 2022 20:31:51 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32685F47 for ; Wed, 9 Feb 2022 17:31:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=JG1kCAvlUbp2aaGqscsx7yUF7CXhQvJDhw2H3fyYAAQ=; b=c/peA/xEgqpvnKbFzSCx3g/NNV C7mvSZCDUR7oxuyfNXzZ1e2hMM/3nBgVglmUxbTB3Wu4kj3b2BC3A6lmJeeNf/Bjnl28yW5qJcclf RBFr4v/gf3gJrLlBqtPLYP1qavlgvmm35X4zzrZtR7vawYM7EULLSTkHy4FRqT9MPPJ3z/L7FHUZl +LSuTOEc4Z1N5uTlNXENd+fc5K9oh+fgQuo9nqX0FcyP9qiQnt+bUQg1SdA4wHhPj7LshcfKROjqP BQq2DbulXplRI6HYU7YJjwBDFmmUGncPKSsxIiSCxUCDvI3EhkkAgE08CFKWeBmLNKFYTIAbDUqKm J/TffRfQ==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xlB-P3; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Dc-7Y; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 12/15] KVM: x86/xen: Kernel acceleration for XENVER_version Date: Thu, 10 Feb 2022 00:27:18 +0000 Message-Id: <20220210002721.273608-13-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Turns out this is a fast path for PV guests because they use it to trigger the event channel upcall. So letting it bounce all the way up to userspace is not great. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/xen.c | 18 ++++++++++++++++++ include/uapi/linux/kvm.h | 3 ++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ddfdc8cc8e60..1c064c2ec937 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1021,6 +1021,7 @@ struct msr_bitmap_range { /* Xen emulation context */ struct kvm_xen { + u32 xen_version; bool long_mode; u8 upcall_vector; struct gfn_to_pfn_cache shinfo_cache; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index c042c7d6ee02..735fccd44488 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -479,6 +480,12 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data) r = kvm_xen_setattr_evtchn(kvm, data); break; + case KVM_XEN_ATTR_TYPE_XEN_VERSION: + mutex_lock(&kvm->lock); + kvm->arch.xen.xen_version = data->u.xen_version; + mutex_unlock(&kvm->lock); + break; + default: break; } @@ -511,6 +518,11 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data) r = 0; break; + case KVM_XEN_ATTR_TYPE_XEN_VERSION: + data->u.xen_version = kvm->arch.xen.xen_version; + r = 0; + break; + default: break; } @@ -1049,6 +1061,12 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu) params[3], params[4], params[5]); switch (input) { + case __HYPERVISOR_xen_version: + if (params[0] == XENVER_version && vcpu->kvm->arch.xen.xen_version) { + r = vcpu->kvm->arch.xen.xen_version; + handled = true; + } + break; case __HYPERVISOR_event_channel_op: if (params[0] == EVTCHNOP_send) handled = kvm_xen_hcall_evtchn_send(vcpu, params[1], &r); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f485b536112f..243917851e0d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1703,7 +1703,7 @@ struct kvm_xen_hvm_attr { __u32 padding[4]; } deliver; } evtchn; - + __u32 xen_version; __u64 pad[8]; } u; }; @@ -1714,6 +1714,7 @@ struct kvm_xen_hvm_attr { #define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR 0x2 /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ #define KVM_XEN_ATTR_TYPE_EVTCHN 0x3 +#define KVM_XEN_ATTR_TYPE_XEN_VERSION 0x4 /* Per-vCPU Xen attributes */ #define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr) From patchwork Thu Feb 10 00:27:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741118 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4473FC433EF for ; Thu, 10 Feb 2022 01:31:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232676AbiBJBbo (ORCPT ); Wed, 9 Feb 2022 20:31:44 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:38510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232869AbiBJBbi (ORCPT ); Wed, 9 Feb 2022 20:31:38 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45CF8BE7 for ; Wed, 9 Feb 2022 17:31:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=3rLfMCSe7vev48x81sQgwmtolCxeLUwn1CMNLy33ODg=; b=YslOGmLYBV4n8Gb5vWB4q7fnCN GBu2+mm4XLE4QGhVCRPDLiHFmXybadByvoOPPEGf36H2/jgPSVOfwXbvV2ZNGIEnKAKzZv2+wln4+ qPY2K0kzcpL0sPRni3buR2j1nWjdV+TRSF9Enj2LnAcMEGS1v1mYptnPCYMbQ59w5vQ9pvNAzuJCc iFUsGqnVFO3EQZ6zs39S4irs1sADO45pdp2Btg6LTQ9kYv9v+IlF7qdy6Hot9/KjDk3eBXwpXjufh 7qcHBv/e2vmgwINbFuefvjrHkx9pLjKn1lsrOZJuqFtjX0IvTPRONxQT6LPZrloNex2+FBMMis7Sr cYCAD1ow==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xlC-Pl; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Dh-8A; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 13/15] KVM: x86/xen: Support per-vCPU event channel upcall via local APIC Date: Thu, 10 Feb 2022 00:27:19 +0000 Message-Id: <20220210002721.273608-14-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Windows uses a per-vCPU vector, and it's delivered via the local APIC basically like an MSI (with associated EOI) unlike the traditional guest-wide vector which is just magically asserted by Xen (and in the KVM case by kvm_xen_has_interrupt() / kvm_cpu_get_extint()). Now that the kernel is able to raise event channel events for itself, being able to do so for Windows guests is also going to be useful. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/xen.c | 40 +++++++++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 2 ++ 3 files changed, 43 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1c064c2ec937..50cbea89739b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -601,6 +601,7 @@ struct kvm_vcpu_hv { struct kvm_vcpu_xen { u64 hypercall_rip; u32 current_runstate; + u8 upcall_vector; struct gfn_to_pfn_cache vcpu_info_cache; struct gfn_to_pfn_cache vcpu_time_info_cache; struct gfn_to_pfn_cache runstate_cache; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 735fccd44488..60459e52a2fd 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -326,6 +326,22 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state) mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); } +static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v) +{ + struct kvm_lapic_irq irq = { }; + int r; + + irq.dest_id = v->vcpu_id; + irq.vector = v->arch.xen.upcall_vector; + irq.dest_mode = APIC_DEST_PHYSICAL; + irq.shorthand = APIC_DEST_NOSHORT; + irq.delivery_mode = APIC_DM_FIXED; + irq.level = 1; + + /* The fast version will always work for physical unicast */ + WARN_ON_ONCE(!kvm_irq_delivery_to_apic_fast(v->kvm, NULL, &irq, &r, NULL)); +} + /* * On event channel delivery, the vcpu_info may not have been accessible. * In that case, there are bits in vcpu->arch.xen.evtchn_pending_sel which @@ -387,6 +403,10 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v) } read_unlock_irqrestore(&gpc->lock, flags); + /* For the per-vCPU lapic vector, deliver it as MSI. */ + if (v->arch.xen.upcall_vector) + kvm_xen_inject_vcpu_vector(v); + mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); } @@ -716,6 +736,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = 0; break; + case KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR: + if (data->u.vector && data->u.vector < 0x10) + r = -EINVAL; + else { + vcpu->arch.xen.upcall_vector = data->u.vector; + r = 0; + } + break; + default: break; } @@ -803,6 +832,11 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) r = 0; break; + case KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR: + data->u.vector = vcpu->arch.xen.upcall_vector; + r = 0; + break; + default: break; } @@ -1212,6 +1246,12 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) kick_vcpu = true; } } + + /* For the per-vCPU lapic vector, deliver it as MSI. */ + if (kick_vcpu && vcpu->arch.xen.upcall_vector) { + kvm_xen_inject_vcpu_vector(vcpu); + kick_vcpu = false; + } } out_rcu: diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 243917851e0d..3506186ad16f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1746,6 +1746,7 @@ struct kvm_xen_vcpu_attr { __u32 priority; __u64 expires_ns; } timer; + __u8 vector; } u; }; @@ -1759,6 +1760,7 @@ struct kvm_xen_vcpu_attr { /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */ #define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID 0x6 #define KVM_XEN_VCPU_ATTR_TYPE_TIMER 0x7 +#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR 0x8 /* Secure Encrypted Virtualization command */ enum sev_cmd_id { From patchwork Thu Feb 10 00:27:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3039C433F5 for ; Thu, 10 Feb 2022 01:32:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232944AbiBJBcD (ORCPT ); Wed, 9 Feb 2022 20:32:03 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:39366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232953AbiBJBb6 (ORCPT ); Wed, 9 Feb 2022 20:31:58 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A01820F51 for ; Wed, 9 Feb 2022 17:31:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc: To:From:Reply-To:Content-ID:Content-Description; bh=WG3prxj2crGx1WaeF+/kJumw/841xmevKri51CniynQ=; b=XWDrCdiNp/HtP/j2iQHa92uAI2 Wqy5qdxxJT+zINWdSwIWw7qvE3o0cSUwDbRlSVI6AmnWl2O9yNf/09FoJPRqeQZZPs54eE4HFxInT 4PNGoLBNFLcGEVV+uw51SMnJJz8wBY5z5nihgM+ousTMeibf5Lkw/Mm/OphNnMtJD5R33asq2hQ1e U3r2CEuIPbI5c7lXGfA2NKAakDl3/fIi7k1I/OORyQncw1VzsMDfjCWv41bh5zMN/o2w+uWO5JnLv 2Yn3OUIOnniTB6CYHvqf8yZcpSJRjZZ6Rtt/RQjjnQoIjCPh4959Bof8WW4TnvDPlJVsh/azwkARu iq3wXW3Q==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008xlD-PF; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Dm-8p; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 14/15] KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND Date: Thu, 10 Feb 2022 00:27:20 +0000 Message-Id: <20220210002721.273608-15-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse At the end of the patch series adding this batch of event channel acceleration features, finally add the feature bit which advertises them and document it all. For SCHEDOP_poll we need to wake a polling vCPU when a given port is triggered, even when it's masked — and we want to implement that in the kernel, for efficiency. So we want the kernel to know that it has sole ownership of event channel delivery. Thus, we allow userspace to make the 'promise' by setting the corresponding feature bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll bypass later, we will do so only if that promise has been made by userspace. Signed-off-by: David Woodhouse --- Documentation/virt/kvm/api.rst | 129 ++++++++++++++++++++++++++++++--- arch/x86/kvm/x86.c | 3 +- arch/x86/kvm/xen.c | 6 +- include/uapi/linux/kvm.h | 1 + 4 files changed, 127 insertions(+), 12 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index a4267104db50..046b386f6ce3 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -988,12 +988,22 @@ memory. __u8 pad2[30]; }; -If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the -KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl. -This requests KVM to generate the contents of the hypercall page -automatically; hypercalls will be intercepted and passed to userspace -through KVM_EXIT_XEN. In this case, all of the blob size and address -fields must be zero. +If certain flags are returned from the KVM_CAP_XEN_HVM check, they may +be set in the flags field of this ioctl: + +The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate +the contents of the hypercall page automatically; hypercalls will be +intercepted and passed to userspace through KVM_EXIT_XEN. In this +ase, all of the blob size and address fields must be zero. + +The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace +will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event +channel interrupts rather than manipulating the guest's shared_info +structures directly. This, in turn, may allow KVM to enable features +such as intercepting the SCHEDOP_poll hypercall to accelerate PV +spinlock operation for the guest. Userspace may still use the ioctl +to deliver events if it was advertised, even if userspace does not +send this indication that it will always do so No other flags are currently valid in the struct kvm_xen_hvm_config. @@ -5149,7 +5159,25 @@ have deterministic behavior. struct { __u64 gfn; } shared_info; - __u64 pad[4]; + struct { + __u32 send_port; + __u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */ + __u32 flags; + union { + struct { + __u32 port; + __u32 vcpu; + __u32 priority; + } port; + struct { + __u32 port; /* Zero for eventfd */ + __s32 fd; + } eventfd; + __u32 padding[4]; + } deliver; + } evtchn; + __u32 xen_version; + __u64 pad[8]; } u; }; @@ -5180,6 +5208,30 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO KVM_XEN_ATTR_TYPE_UPCALL_VECTOR Sets the exception vector used to deliver Xen event channel upcalls. + This is the HVM-wide vector injected directly by the hypervisor + (not through the local APIC), typically configured by a guest via + HVM_PARAM_CALLBACK_IRQ. + +KVM_XEN_ATTR_TYPE_EVTCHN + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures + an outbound port number for interception of EVTCHNOP_send requests + from the guest. A given sending port number may be directed back + to a specified vCPU (by APIC ID) / port / priority on the guest, + or to trigger events on an eventfd. The vCPU and priority can be + changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, + but other fields cannot change for a given sending port. A port + mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags + field. + +KVM_XEN_ATTR_TYPE_XEN_VERSION + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures + the 32-bit version code returned to the guest when it invokes the + XENVER_version call; typically (XEN_MAJOR << 16 | XEN_MINOR). PV + Xen guests will often use this to as a dummy hypercall to trigger + event channel delivery, so responding within the kernel without + exiting to userspace is beneficial. 4.127 KVM_XEN_HVM_GET_ATTR -------------------------- @@ -5191,7 +5243,8 @@ KVM_XEN_ATTR_TYPE_UPCALL_VECTOR :Returns: 0 on success, < 0 on error Allows Xen VM attributes to be read. For the structure and types, -see KVM_XEN_HVM_SET_ATTR above. +see KVM_XEN_HVM_SET_ATTR above. The KVM_XEN_ATTR_TYPE_EVTCHN +attribute cannot be read. 4.128 KVM_XEN_VCPU_SET_ATTR --------------------------- @@ -5218,6 +5271,13 @@ see KVM_XEN_HVM_SET_ATTR above. __u64 time_blocked; __u64 time_offline; } runstate; + __u32 vcpu_id; + struct { + __u32 port; + __u32 priority; + __u64 expires_ns; + } timer; + __u8 vector; } u; }; @@ -5255,6 +5315,27 @@ KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST or RUNSTATE_offline) to set the current accounted state as of the adjusted state_entry_time. +KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the Xen + vCPU ID of the given vCPU, to allow timer-related VCPU operations to + be intercepted by KVM. + +KVM_XEN_VCPU_ATTR_TYPE_TIMER + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the + event channel port/priority for the VIRQ_TIMER of the vCPU, as well + as allowing a pending timer to be saved/restored. + +KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the + per-vCPU local APIC upcall vector, configured by a Xen guest with + the HVMOP_set_evtchn_upcall_vector hypercall. This is typically + used by Windows guests, and is distinct from the HVM-wide upcall + vector configured with HVM_PARAM_CALLBACK_IRQ. + + 4.129 KVM_XEN_VCPU_GET_ATTR --------------------------- @@ -5574,6 +5655,25 @@ enabled with ``arch_prctl()``, but this may change in the future. The offsets of the state save areas in struct kvm_xsave follow the contents of CPUID leaf 0xD on the host. +4.135 KVM_XEN_HVM_EVTCHN_SEND +----------------------------- + +:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_irq_routing_xen_evtchn +:Returns: 0 on success, < 0 on error + + +:: + + struct kvm_irq_routing_xen_evtchn { + __u32 port; + __u32 vcpu; + __u32 priority; + }; + +This ioctl injects an event channel interrupt directly to the guest vCPU. 5. The kvm_run structure ======================== @@ -7472,8 +7572,9 @@ PVHVM guests. Valid flags are:: #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0) #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1) #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) - #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 2) - #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 3) + #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) + #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) + #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG ioctl is available, for the guest to set its hypercall page. @@ -7497,6 +7598,14 @@ The KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL flag indicates that IRQ routing entries of the type KVM_IRQ_ROUTING_XEN_EVTCHN are supported, with the priority field set to indicate 2 level event channel delivery. +The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates that KVM supports +injecting event channel events directly into the guest with the +KVM_XEN_HVM_EVTCHN_SEND ioctl. It also indicates support for the +KVM_XEN_ATTR_TYPE_EVTCHN/XEN_VERSION HVM attributes and the +KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes. +related to event channel delivery, timers, and the XENVER_version +interception. + 8.31 KVM_CAP_PPC_MULTITCE ------------------------- diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7240d791e4ab..8ef276fed7fe 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4234,7 +4234,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR | KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_SHARED_INFO | - KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL; + KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL | + KVM_XEN_HVM_CONFIG_EVTCHN_SEND; if (sched_info_on()) r |= KVM_XEN_HVM_CONFIG_RUNSTATE; break; diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 60459e52a2fd..87706bcecaef 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -917,7 +917,11 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) { - if (xhc->flags & ~KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL) + /* Only some feature flags need to be *enabled* by userspace */ + u32 permitted_flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | + KVM_XEN_HVM_CONFIG_EVTCHN_SEND; + + if (xhc->flags & ~permitted_flags) return -EINVAL; /* diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 3506186ad16f..540495642467 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1222,6 +1222,7 @@ struct kvm_x86_mce { #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) +#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) struct kvm_xen_hvm_config { __u32 flags; From patchwork Thu Feb 10 00:27:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 12741179 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DBDB7C433F5 for ; Thu, 10 Feb 2022 02:08:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231841AbiBJCIw (ORCPT ); Wed, 9 Feb 2022 21:08:52 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:56188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231329AbiBJCIq (ORCPT ); Wed, 9 Feb 2022 21:08:46 -0500 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF688262 for ; Wed, 9 Feb 2022 18:08:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=eBT2CqKCMBC53Y0I7CjLXvjtNTKCe8D0c69/7Tbfma8=; b=XK1fXQ/lecZKAJxrn3izfv309f Z+jr5oEqAZbgORnHOG3rieSAv+/dd8rNAC3hVF8Rnzgt20DzFEXFUOS4Q0sldiGlWG35EO1k2MUYl GknTGtWmKUjqhvE+xNhFkXzytsJIeSR3/0ntt0FcWU3lnCw78n7olnc3gos0eloTpz4VIr0FNr7xT fBG7TdlKFgfSBmLZ3AVtx1qvZM7+Ak2YHSoev83wZG0S8plEghvHFtyAiiWzyro/IgElDVfxcqqqA WQnsA7VyLjbyPHAjN1fhAYc7hd5MA25B7O3CBqB/iDoSXEu+aAoxrCB0ZTFnQAOqn6lGsOlqXPt/T uV6MWqXw==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by desiato.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-008YWK-K2; Thu, 10 Feb 2022 00:27:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nHxIy-0019Dr-9c; Thu, 10 Feb 2022 00:27:24 +0000 From: David Woodhouse To: kvm@vger.kernel.org, Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Joao Martins , Boris Ostrovsky , Metin Kaya , Paul Durrant Subject: [PATCH v0 15/15] KVM: x86/xen: handle PV spinlocks slowpath Date: Thu, 10 Feb 2022 00:27:21 +0000 Message-Id: <20220210002721.273608-16-dwmw2@infradead.org> X-Mailer: git-send-email 2.33.1 In-Reply-To: <20220210002721.273608-1-dwmw2@infradead.org> References: <20220210002721.273608-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Boris Ostrovsky Add support for SCHEDOP_poll hypercall. This implementation is optimized for polling for a single channel, which is what Linux does. Polling for multiple channels is not especially efficient (and has not been tested). PV spinlocks slow path uses this hypercall, and explicitly crash if it's not supported. Signed-off-by: Boris Ostrovsky Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 3 + arch/x86/kvm/x86.c | 2 + arch/x86/kvm/xen.c | 140 ++++++++++++++++++ arch/x86/kvm/xen.h | 5 + .../selftests/kvm/x86_64/xen_shinfo_test.c | 6 + 5 files changed, 156 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 50cbea89739b..7aa46b906fdc 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -613,6 +613,8 @@ struct kvm_vcpu_xen { u32 timer_virq; atomic_t timer_pending; struct hrtimer timer; + int poll_evtchn; + struct timer_list poll_timer; }; struct kvm_vcpu_arch { @@ -1027,6 +1029,7 @@ struct kvm_xen { u8 upcall_vector; struct gfn_to_pfn_cache shinfo_cache; struct idr evtchn_ports; + unsigned long poll_mask[BITS_TO_LONGS(KVM_MAX_VCPUS)]; }; enum kvm_irqchip_mode { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8ef276fed7fe..b43adb84d30b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11132,6 +11132,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.pending_external_vector = -1; vcpu->arch.preempted_in_kernel = false; + kvm_xen_init_vcpu(vcpu); + #if IS_ENABLED(CONFIG_HYPERV) vcpu->arch.hv_root_tdp = INVALID_PAGE; #endif diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 87706bcecaef..49059ca085b1 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -10,6 +10,7 @@ #include "xen.h" #include "lapic.h" #include "hyperv.h" +#include "lapic.h" #include #include @@ -988,9 +989,133 @@ static int kvm_xen_hypercall_complete_userspace(struct kvm_vcpu *vcpu) return kvm_xen_hypercall_set_result(vcpu, run->xen.u.hcall.result); } +static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports, + evtchn_port_t *ports) +{ + struct kvm *kvm = vcpu->kvm; + struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; + unsigned long *pending_bits; + unsigned long flags; + bool ret = true; + int idx, i; + + read_lock_irqsave(&gpc->lock, flags); + idx = srcu_read_lock(&kvm->srcu); + if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE)) + goto out_rcu; + + ret = false; + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { + struct shared_info *shinfo = gpc->khva; + pending_bits = (unsigned long *)&shinfo->evtchn_pending; + } else { + struct compat_shared_info *shinfo = gpc->khva; + pending_bits = (unsigned long *)&shinfo->evtchn_pending; + } + + for (i = 0; i < nr_ports; i++) { + if (test_bit(ports[i], pending_bits)) { + ret = true; + break; + } + } + + out_rcu: + srcu_read_unlock(&kvm->srcu, idx); + read_unlock_irqrestore(&gpc->lock, flags); + + return ret; +} + +static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, u64 param, u64 *r) +{ + int idx, i; + struct sched_poll sched_poll; + evtchn_port_t port, *ports; + int ret = 0; + gpa_t gpa; + + idx = srcu_read_lock(&vcpu->kvm->srcu); + gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, &sched_poll, + sizeof(sched_poll))) { + *r = -EFAULT; + return true; + } + + if (unlikely(sched_poll.nr_ports > 1)) { + /* Xen (unofficially) limits number of pollers to 128 */ + if (sched_poll.nr_ports > 128) + return -EINVAL; + + ports = kmalloc_array(sched_poll.nr_ports, + sizeof(*ports), GFP_KERNEL); + if (!ports) + return -ENOMEM; + } else + ports = &port; + + for (i = 0; i < sched_poll.nr_ports; i++) { + idx = srcu_read_lock(&vcpu->kvm->srcu); + gpa = kvm_mmu_gva_to_gpa_system(vcpu, + (gva_t)(sched_poll.ports + i), + NULL); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, + &ports[i], sizeof(port))) { + ret = -EFAULT; + goto out; + } + } + + if (sched_poll.nr_ports == 1) + vcpu->arch.xen.poll_evtchn = port; + else + vcpu->arch.xen.poll_evtchn = -1; + + set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.xen.poll_mask); + + if (!wait_pending_event(vcpu, sched_poll.nr_ports, ports)) { + vcpu->arch.mp_state = KVM_MP_STATE_HALTED; + + if (sched_poll.timeout) + mod_timer(&vcpu->arch.xen.poll_timer, jiffies + nsecs_to_jiffies(sched_poll.timeout)); + + kvm_vcpu_halt(vcpu); + + if (sched_poll.timeout) + del_timer(&vcpu->arch.xen.poll_timer); + } + + vcpu->arch.xen.poll_evtchn = 0; + +out: + /* Really, this is only needed in case of timeout */ + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.xen.poll_mask); + + if (unlikely(sched_poll.nr_ports > 1)) + kfree(ports); + return ret; +} + +static void cancel_evtchn_poll(struct timer_list *t) +{ + struct kvm_vcpu *vcpu = from_timer(vcpu, t, arch.xen.poll_timer); + + kvm_make_request(KVM_REQ_UNHALT, vcpu); +} + static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, int cmd, u64 param, u64 *r) { switch (cmd) { + case SCHEDOP_poll: + if ((vcpu->kvm->arch.xen_hvm_config.flags & + KVM_XEN_HVM_CONFIG_EVTCHN_SEND) && lapic_in_kernel(vcpu)) + return kvm_xen_schedop_poll(vcpu, param, r); + fallthrough; case SCHEDOP_yield: kvm_vcpu_on_spin(vcpu, true); *r = 0; @@ -1152,6 +1277,17 @@ static inline int max_evtchn_port(struct kvm *kvm) return COMPAT_EVTCHN_2L_NR_CHANNELS; } +static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port) +{ + int poll_evtchn = vcpu->arch.xen.poll_evtchn; + + if ((poll_evtchn == port || poll_evtchn == -1) && + test_and_clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.xen.poll_mask)) { + kvm_make_request(KVM_REQ_UNBLOCK, vcpu); + kvm_vcpu_kick(vcpu); + } +} + /* * The return value from this function is propagated to kvm_set_irq() API, * so it returns: @@ -1219,6 +1355,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) rc = 0; /* It was already raised */ } else if (test_bit(xe->port, mask_bits)) { rc = -ENOTCONN; /* Masked */ + kvm_xen_check_poller(vcpu, xe->port); } else { rc = 1; /* Delivered to the bitmap in shared_info. */ /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ @@ -1628,6 +1765,8 @@ static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r) void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) { vcpu->arch.xen.vcpu_id = vcpu->vcpu_idx; + vcpu->arch.xen.poll_evtchn = 0; + timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); } void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) @@ -1641,6 +1780,7 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) &vcpu->arch.xen.vcpu_info_cache); kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache); + del_timer_sync(&vcpu->arch.xen.poll_timer); } void kvm_xen_init_vm(struct kvm *kvm) diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 616fe751c8fc..eaddbeb2f923 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -24,6 +24,7 @@ int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn * int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data); int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); +void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu); void kvm_xen_destroy_vm(struct kvm *kvm); void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu); void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu); @@ -83,6 +84,10 @@ static inline int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) return 1; } +static inline void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) +{ +} + static inline void kvm_xen_init_vm(struct kvm *kvm) { } diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c index 865e17146815..376c611443cd 100644 --- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c +++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c @@ -233,6 +233,12 @@ int main(int argc, char *argv[]) .flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL, .msr = XEN_HYPERCALL_MSR, }; + + /* Let the kernel know that we *will* use it for sending all + * event channels, which lets it intercept SCHEDOP_poll */ + if (xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND) + hvmc.flags |= KVM_XEN_HVM_CONFIG_EVTCHN_SEND; + vm_ioctl(vm, KVM_XEN_HVM_CONFIG, &hvmc); struct kvm_xen_hvm_attr lm = {