From patchwork Thu Apr 21 05:12:35 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821080
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Wanpeng Li, David Airlie, dri-devel@lists.freedesktop.org,
 "H. Peter Anvin", Joerg Roedel, x86@kernel.org, Maxim Levitsky,
 Ingo Molnar, Zhi Wang, Dave Hansen, intel-gfx@lists.freedesktop.org,
 Borislav Petkov, Rodrigo Vivi, Thomas Gleixner,
 intel-gvt-dev@lists.freedesktop.org, Jim Mattson, Tvrtko Ursulin,
 Sean Christopherson, linux-kernel@vger.kernel.org, Paolo Bonzini,
 Vitaly Kuznetsov
Subject: [RFC PATCH v2 01/10] KVM: x86: mmu: allow enabling write tracking
 externally
Date: Thu, 21 Apr 2022 08:12:35 +0300
Message-Id: <20220421051244.187733-2-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This will be used to enable write tracking from the nested AVIC code, and
can also be used by the GVT-g module to enable write tracking only when it
is actually needed, instead of unconditionally whenever the module is
compiled in.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h       |  2 +-
 arch/x86/include/asm/kvm_page_track.h |  1 +
 arch/x86/kvm/mmu.h                    |  8 +++++---
 arch/x86/kvm/mmu/mmu.c                | 17 ++++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++++++--
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c20f715f0094..ae41d2df69fe9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1234,7 +1234,7 @@ struct kvm_arch {
	 * is used as one input when determining whether certain memslot
	 * related allocations are necessary.
	 */
-	bool shadow_root_allocated;
+	bool mmu_page_tracking_enabled;

 #if IS_ENABLED(CONFIG_HYPERV)
	hpa_t	hv_root_tdp;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a9..955a5ae07b10e 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -50,6 +50,7 @@ int kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);

 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_enable(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);

 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 671cfeccf04e9..44d15551f7156 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -269,7 +269,7 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);

-static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
+static inline bool mmu_page_tracking_enabled(struct kvm *kvm)
 {
	/*
	 * Read shadow_root_allocated before related pointers. Hence, threads
@@ -277,9 +277,11 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
	 * see the pointers. Pairs with smp_store_release in
	 * mmu_first_shadow_root_alloc.
	 */
-	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
+	return smp_load_acquire(&kvm->arch.mmu_page_tracking_enabled);
 }

+int mmu_enable_write_tracking(struct kvm *kvm);
+
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 #else
@@ -288,7 +290,7 @@ static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }

 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+	return !is_tdp_mmu_enabled(kvm) || mmu_page_tracking_enabled(kvm);
 }

 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69a30d6d1e2b9..2c4edae4b026d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3368,7 +3368,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
	return r;
 }

-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+int mmu_enable_write_tracking(struct kvm *kvm)
 {
	struct kvm_memslots *slots;
	struct kvm_memory_slot *slot;
@@ -3378,21 +3378,20 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
	 * Check if this is the first shadow root being allocated before
	 * taking the lock.
	 */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
		return 0;

	mutex_lock(&kvm->slots_arch_lock);

	/* Recheck, under the lock, whether this is the first shadow root. */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
		goto out_unlock;

	/*
	 * Check if anything actually needs to be allocated, e.g. all metadata
	 * will be allocated upfront if TDP is disabled.
	 */
-	if (kvm_memslots_have_rmaps(kvm) &&
-	    kvm_page_track_write_tracking_enabled(kvm))
+	if (kvm_memslots_have_rmaps(kvm) && mmu_page_tracking_enabled(kvm))
		goto out_success;

	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
@@ -3422,7 +3421,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
	 * all the related pointers are set.
	 */
 out_success:
-	smp_store_release(&kvm->arch.shadow_root_allocated, true);
+	smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);

 out_unlock:
	mutex_unlock(&kvm->slots_arch_lock);
@@ -3459,7 +3458,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
		}
	}

-	r = mmu_first_shadow_root_alloc(vcpu->kvm);
+	r = mmu_enable_write_tracking(vcpu->kvm);
	if (r)
		return r;

@@ -5727,6 +5726,10 @@ int kvm_mmu_init_vm(struct kvm *kvm)
	node->track_write = kvm_mmu_pte_write;
	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
	kvm_page_track_register_notifier(kvm, node);
+
+	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+		mmu_enable_write_tracking(kvm);
+
	return 0;
 }

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f3..8857d629036d7 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -21,10 +21,16 @@

 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
-	return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
-	       !tdp_enabled || kvm_shadow_root_allocated(kvm);
+	return mmu_page_tracking_enabled(kvm);
 }

+int kvm_page_track_write_tracking_enable(struct kvm *kvm)
+{
+	return mmu_enable_write_tracking(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_write_tracking_enable);
+
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
	int i;
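[Editor's note] The memory-ordering contract in this patch is easy to miss in the diff: the flag is published with release semantics only after the tracking metadata has been allocated, so lockless readers that observe the flag set are guaranteed to also see the metadata. A minimal sketch of that pattern (names follow the patch; the allocation step is elided):

```c
/* Reader side: no lock taken. */
static bool tracking_enabled(struct kvm *kvm)
{
	/* Pairs with the smp_store_release() below. */
	return smp_load_acquire(&kvm->arch.mmu_page_tracking_enabled);
}

/* Writer side: serialized by slots_arch_lock. */
static int enable_tracking(struct kvm *kvm)
{
	if (tracking_enabled(kvm))
		return 0;	/* fast path: already enabled */

	mutex_lock(&kvm->slots_arch_lock);
	if (!tracking_enabled(kvm)) {
		/* ... allocate rmaps / gfn_track arrays for all memslots ... */

		/* Publish the flag only after the metadata is visible. */
		smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);
	}
	mutex_unlock(&kvm->slots_arch_lock);
	return 0;
}
```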
From patchwork Thu Apr 21 05:12:36 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821081
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 02/10] x86: KVMGT: use
 kvm_page_track_write_tracking_enable
Date: Thu, 21 Apr 2022 08:12:36 +0300
Message-Id: <20220421051244.187733-3-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This allows enabling write tracking only when KVMGT is actually in use,
so the feature carries no penalty otherwise.

Tested by booting a VM with a kvmgt mdev device.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/Kconfig             | 3 ---
 arch/x86/kvm/mmu/mmu.c           | 2 +-
 drivers/gpu/drm/i915/Kconfig     | 1 -
 drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +++++
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd77061364..41341905d3734 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,7 +126,4 @@ config KVM_XEN

	  If in doubt, say "N".

-config KVM_EXTERNAL_WRITE_TRACKING
-	bool
-
 endif # VIRTUALIZATION
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2c4edae4b026d..23f895d439cf5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5727,7 +5727,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
	kvm_page_track_register_notifier(kvm, node);

-	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+	if (!tdp_enabled)
		mmu_enable_write_tracking(kvm);

	return 0;
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 98c5450b8eacc..7d8346f4bae11 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -130,7 +130,6 @@ config DRM_I915_GVT_KVMGT
	depends on DRM_I915_GVT
	depends on KVM
	depends on VFIO_MDEV
-	select KVM_EXTERNAL_WRITE_TRACKING
	default n
	help
	  Choose this option if you want to enable KVMGT support for
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 057ec44901045..4c62ab3ef245d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1933,6 +1933,7 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
	struct intel_vgpu *vgpu;
	struct kvmgt_vdev *vdev;
	struct kvm *kvm;
+	int ret;

	vgpu = mdev_get_drvdata(mdev);
	if (handle_valid(vgpu->handle))
@@ -1948,6 +1949,10 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
	if (__kvmgt_vgpu_exist(vgpu, kvm))
		return -EEXIST;

+	ret = kvm_page_track_write_tracking_enable(kvm);
+	if (ret)
+		return ret;
+
	info = vzalloc(sizeof(struct kvmgt_guest_info));
	if (!info)
		return -ENOMEM;
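[Editor's note] Worth noting when reading the kvmgt.c hunk: the runtime API replaces the build-time Kconfig select, and, at least within this series, enabling is one-way (no kvm_page_track_write_tracking_disable() is added), so once a KVMGT guest has been attached the tracking overhead persists for that VM's lifetime. A hedged sketch of an external consumer's obligations:

```c
/* Sketch of a KVMGT-style caller (attach_guest is a made-up name).
 * Error handling matters: enabling can fail, e.g. with -ENOMEM while
 * allocating per-memslot tracking metadata. */
int attach_guest(struct kvm *kvm)
{
	int ret = kvm_page_track_write_tracking_enable(kvm);

	if (ret)
		return ret;	/* propagate; nothing to roll back yet */

	/* ... register page-track notifiers, set up the vGPU ... */
	return 0;
}
```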
From patchwork Thu Apr 21 05:12:37 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821082
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 03/10] KVM: x86: mmu: add gfn_in_memslot helper
Date: Thu, 21 Apr 2022 08:12:37 +0300
Message-Id: <20220421051244.187733-4-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This is a tiny refactoring that makes checking whether a GPA/GFN is
within a memslot a bit cleaner.

Signed-off-by: Maxim Levitsky
---
 include/linux/kvm_host.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58b..12e261559070b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1580,6 +1580,13 @@ int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
 bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);

+
+static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
+}
+
+
 /*
  * Returns a pointer to the memslot if it contains gfn.
  * Otherwise returns NULL.
@@ -1590,12 +1597,13 @@ try_get_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
	if (!slot)
		return NULL;

-	if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
+	if (gfn_in_memslot(slot, gfn))
		return slot;
	else
		return NULL;
 }

+
 /*
  * Returns a pointer to the memslot that contains gfn. Otherwise returns NULL.
  *
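[Editor's note] The helper's range check is half-open: a slot covers GFNs [base_gfn, base_gfn + npages). A tiny usage sketch (the surrounding caller is illustrative, not from the patch):

```c
/* A slot with base_gfn = 0x100 and npages = 0x10 covers 0x100..0x10f. */
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

if (slot && gfn_in_memslot(slot, gfn)) {
	/* gfn is backed by this slot; gfn == 0x110 would fall outside. */
}
```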
From patchwork Thu Apr 21 05:12:38 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821083
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 04/10] KVM: x86: mmu: tweak fast path for emulation of
 access to nested NPT pages
Date: Thu, 21 Apr 2022 08:12:38 +0300
Message-Id: <20220421051244.187733-5-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 23f895d439cf5..b63398dfdac3b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5315,8 +5315,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
	 */
	if (vcpu->arch.mmu->root_role.direct &&
	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
-		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
-		return 1;
+		if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+			return 1;
	}

	/*
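[Editor's note] The patch was posted without a commit message, but the behavioral change is visible in the hunk: kvm_mmu_unprotect_page() reports whether it actually zapped a write-protected shadow page, so the fast path now retries the faulting access only in that case and otherwise falls through to emulation. A paraphrase of the new control flow (not code from the patch; the predicate name is made up):

```c
/* Sketch of the fast path in kvm_mmu_page_fault(). */
if (fault_is_nested_guest_page_write(error_code)) {
	if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
		return 1;	/* a page was zapped: re-execute the access */
	/* nothing was unprotected: fall through and emulate */
}
```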
From patchwork Thu Apr 21 05:12:39 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821084
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 05/10] KVM: x86: lapic: don't allow changing the APIC
 ID when APIC acceleration is enabled
Date: Thu, 21 Apr 2022 08:12:39 +0300
Message-Id: <20220421051244.187733-6-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

No normal guest has any reason to change its physical APIC ID, and
allowing it introduces bugs into the APIC acceleration code.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/lapic.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94e..56996aeca9881 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2046,10 +2046,20 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
	switch (reg) {
	case APIC_ID:		/* Local APIC ID */
-		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_xapic_id(apic, val >> 24);
-		else
+		if (apic_x2apic_mode(apic)) {
			ret = 1;
+			break;
+		}
+		/*
+		 * Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && ((val >> 24) != apic->vcpu->vcpu_id)) {
+			kvm_vm_bugged(apic->vcpu->kvm);
+			break;
+		}
+
+		kvm_apic_set_xapic_id(apic, val >> 24);
		break;

	case APIC_TASKPRI:
@@ -2617,8 +2627,16 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
				struct kvm_lapic_state *s, bool set)
 {
-	if (apic_x2apic_mode(vcpu->arch.apic)) {
-		u32 *id = (u32 *)(s->regs + APIC_ID);
+	u32 *id = (u32 *)(s->regs + APIC_ID);
+
+	if (!apic_x2apic_mode(vcpu->arch.apic)) {
+		/* Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && (*id >> 24) != vcpu->vcpu_id)
+			return -EINVAL;
+	} else {
+		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
		u64 icr;
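[Editor's note] For readers less familiar with the register layout: in xAPIC mode the APIC ID occupies bits 31:24 of the APIC_ID register, which is why the check above compares val >> 24 against vcpu_id. For example:

```c
/* For vcpu_id == 3, the only accepted write keeps the ID unchanged. */
u32 ok  = 3u << 24;	/* APIC ID field = 3 -> allowed            */
u32 bad = 5u << 24;	/* APIC ID field = 5 -> kvm_vm_bugged()    */
```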
From patchwork Thu Apr 21 05:12:40 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821085
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 06/10] KVM: x86: SVM: remove AVIC's broken code that
 updated the APIC ID
Date: Thu, 21 Apr 2022 08:12:40 +0300
Message-Id: <20220421051244.187733-7-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

Now that KVM no longer allows changing the APIC ID while AVIC is
enabled, remove the buggy AVIC code that tried to handle such changes.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 35 -----------------------------------
 1 file changed, 35 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 9b859218af59c..f375ca1d6518e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -442,35 +442,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
	return ret;
 }

-static int avic_handle_apic_id_update(struct kvm_vcpu *vcpu)
-{
-	u64 *old, *new;
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 id = kvm_xapic_id(vcpu->arch.apic);
-
-	if (vcpu->vcpu_id == id)
-		return 0;
-
-	old = avic_get_physical_id_entry(vcpu, vcpu->vcpu_id);
-	new = avic_get_physical_id_entry(vcpu, id);
-	if (!new || !old)
-		return 1;
-
-	/* We need to move physical_id_entry to new offset */
-	*new = *old;
-	*old = 0ULL;
-	to_svm(vcpu)->avic_physical_id_cache = new;
-
-	/*
-	 * Also update the guest physical APIC ID in the logical
-	 * APIC ID table entry if already setup the LDR.
-	 */
-	if (svm->ldr_reg)
-		avic_handle_ldr_update(vcpu);
-
-	return 0;
-}
-
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
 {
	struct vcpu_svm *svm = to_svm(vcpu);
@@ -489,10 +460,6 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
					AVIC_UNACCEL_ACCESS_OFFSET_MASK;

	switch (offset) {
-	case APIC_ID:
-		if (avic_handle_apic_id_update(vcpu))
-			return 0;
-		break;
	case APIC_LDR:
		if (avic_handle_ldr_update(vcpu))
			return 0;
@@ -584,8 +551,6 @@ int avic_init_vcpu(struct vcpu_svm *svm)

 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
-	if (avic_handle_apic_id_update(vcpu) != 0)
-		return;
	avic_handle_dfr_update(vcpu);
	avic_handle_ldr_update(vcpu);
 }
From patchwork Thu Apr 21 05:12:41 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821086
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 07/10] KVM: x86: SVM: move avic state to separate
 struct
Date: Thu, 21 Apr 2022 08:12:41 +0300
Message-Id: <20220421051244.187733-8-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This will make the code a bit easier to read when nested AVIC support is
added.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 49 +++++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h  | 14 +++++++-----
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f375ca1d6518e..87756237c646d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -69,6 +69,8 @@ int avic_ga_log_notifier(u32 ga_tag)
	unsigned long flags;
	struct kvm_svm *kvm_svm;
	struct kvm_vcpu *vcpu = NULL;
+	struct kvm_svm_avic *avic;
+
	u32 vm_id = AVIC_GATAG_TO_VMID(ga_tag);
	u32 vcpu_id = AVIC_GATAG_TO_VCPUID(ga_tag);

@@ -76,9 +78,13 @@ int avic_ga_log_notifier(u32 ga_tag)
	trace_kvm_avic_ga_log(vm_id, vcpu_id);

	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-	hash_for_each_possible(svm_vm_data_hash, kvm_svm, hnode, vm_id) {
-		if (kvm_svm->avic_vm_id != vm_id)
+	hash_for_each_possible(svm_vm_data_hash, avic, hnode, vm_id) {
+
+
+		if (avic->vm_id != vm_id)
			continue;
+
+		kvm_svm = container_of(avic, struct kvm_svm, avic);
		vcpu = kvm_get_vcpu_by_id(&kvm_svm->kvm, vcpu_id);
		break;
	}
@@ -98,18 +104,18 @@ int avic_ga_log_notifier(u32 ga_tag)
 void avic_vm_destroy(struct kvm *kvm)
 {
	unsigned long flags;
-	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;

	if (!enable_apicv)
		return;

-	if (kvm_svm->avic_logical_id_table_page)
-		__free_page(kvm_svm->avic_logical_id_table_page);
-	if (kvm_svm->avic_physical_id_table_page)
-		__free_page(kvm_svm->avic_physical_id_table_page);
+	if (avic->logical_id_table_page)
+		__free_page(avic->logical_id_table_page);
+	if (avic->physical_id_table_page)
+		__free_page(avic->physical_id_table_page);

	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-	hash_del(&kvm_svm->hnode);
+	hash_del(&avic->hnode);
	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
 }

@@ -117,10 +123,9 @@ int avic_vm_init(struct kvm *kvm)
 {
	unsigned long flags;
	int err = -ENOMEM;
-	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
-	struct kvm_svm *k2;
	struct page *p_page;
	struct page *l_page;
+	struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;
	u32 vm_id;

	if (!enable_apicv)
@@ -131,14 +136,14 @@ int avic_vm_init(struct kvm *kvm)
	if (!p_page)
		goto free_avic;

-	kvm_svm->avic_physical_id_table_page = p_page;
+	avic->physical_id_table_page = p_page;

	/* Allocating logical APIC ID table (4KB) */
	l_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
	if (!l_page)
		goto free_avic;

-	kvm_svm->avic_logical_id_table_page = l_page;
+	avic->logical_id_table_page = l_page;

	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
 again:
@@ -149,13 +154,15 @@ int avic_vm_init(struct kvm *kvm)
	}

	/* Is it still in use? Only possible if wrapped at least once */
	if (next_vm_id_wrapped) {
-		hash_for_each_possible(svm_vm_data_hash, k2, hnode, vm_id) {
-			if (k2->avic_vm_id == vm_id)
+		struct kvm_svm_avic *avic2;
+
+		hash_for_each_possible(svm_vm_data_hash, avic2, hnode, vm_id) {
+			if (avic2->vm_id == vm_id)
				goto again;
		}
	}
-	kvm_svm->avic_vm_id = vm_id;
-	hash_add(svm_vm_data_hash, &kvm_svm->hnode, kvm_svm->avic_vm_id);
+	avic->vm_id = vm_id;
+	hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id);
	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);

	return 0;
@@ -169,8 +176,8 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
 {
	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
-	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
-	phys_addr_t ppa = __sme_set(page_to_phys(kvm_svm->avic_physical_id_table_page));
+	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic.logical_id_table_page));
+	phys_addr_t ppa = __sme_set(page_to_phys(kvm_svm->avic.physical_id_table_page));

	vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
	vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
@@ -193,7 +200,7 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
	if (index >= AVIC_MAX_PHYSICAL_ID_COUNT)
		return NULL;

-	avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
+	avic_physical_id_table = page_address(kvm_svm->avic.physical_id_table_page);

	return &avic_physical_id_table[index];
 }
@@ -387,7 +394,7 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
		index = (cluster << 2) + apic;
	}

-	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
+	logical_apic_id_table = (u32 *) page_address(kvm_svm->avic.logical_id_table_page);

	return &logical_apic_id_table[index];
 }
@@ -737,7 +744,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
			/* Try to enable guest_mode in IRTE */
			pi.base = __sme_set(page_to_phys(svm->avic_backing_page) &
					    AVIC_HPA_MASK);
-			pi.ga_tag = AVIC_GATAG(to_kvm_svm(kvm)->avic_vm_id,
+			pi.ga_tag = AVIC_GATAG(to_kvm_svm(kvm)->avic.vm_id,
					       svm->vcpu.vcpu_id);
			pi.is_guest_mode = true;
			pi.vcpu_data = &vcpu_info;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e246793cbeaeb..96390fa5e3917 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -88,15 +88,17 @@ struct kvm_sev_info {
	atomic_t migration_in_progress;
 };

-struct kvm_svm {
-	struct kvm kvm;
-
-	/* Struct members for AVIC */
-	u32 avic_vm_id;
-	struct page *avic_logical_id_table_page;
-	struct page *avic_physical_id_table_page;
+struct kvm_svm_avic {
+	u32 vm_id;
+	struct page *logical_id_table_page;
+	struct page *physical_id_table_page;
	struct hlist_node hnode;
+};

+struct kvm_svm {
+	struct kvm kvm;
+	struct kvm_svm_avic avic;
	struct kvm_sev_info sev_info;
 };
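[Editor's note] The hash table now links struct kvm_svm_avic nodes instead of struct kvm_svm, so lookups recover the enclosing VM with container_of(), as the ga-log hunk above does. The pattern in isolation:

```c
/* Given a kvm_svm_avic found by hash lookup on vm_id, recover the VM. */
struct kvm_svm_avic *avic = /* ... hash_for_each_possible() match ... */;
struct kvm_svm *kvm_svm = container_of(avic, struct kvm_svm, avic);
struct kvm *kvm = &kvm_svm->kvm;
```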
From patchwork Thu Apr 21 05:12:42 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821103
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 08/10] KVM: x86: rename .set_apic_access_page_addr to
 .reload_apic_pages
Date: Thu, 21 Apr 2022 08:12:42 +0300
Message-Id: <20220421051244.187733-9-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This will be used on SVM to reload the shadow page of the AVIC physid
table.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 3 +--
 arch/x86/kvm/vmx/vmx.c             | 8 ++++----
 arch/x86/kvm/x86.c                 | 6 +++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 96e4e9842dfc6..997edb7453ac2 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -82,7 +82,7 @@ KVM_X86_OP_OPTIONAL(hwapic_isr_update)
 KVM_X86_OP_OPTIONAL_RET0(guest_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
 KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
-KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
+KVM_X86_OP_OPTIONAL(reload_apic_pages)
 KVM_X86_OP(deliver_interrupt)
 KVM_X86_OP_OPTIONAL(sync_pir_to_irr)
 KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae41d2df69fe9..f83cfcd7dd74c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1415,7 +1415,7 @@ struct kvm_x86_ops {
	bool (*guest_apic_has_interrupt)(struct kvm_vcpu *vcpu);
	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
-	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
+	void (*reload_apic_pages)(struct kvm_vcpu *vcpu);
	void (*deliver_interrupt)(struct kvm_lapic *apic, int delivery_mode,
				  int trig_mode, int vector);
	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
@@ -1888,7 +1888,6 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
-
 int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
		    unsigned long ipi_bitmap_high, u32 min,
		    unsigned long icr, int op_64_bit);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cf8581978bce3..7defd31703c61 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6339,7 +6339,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
	vmx_update_msr_bitmap_x2apic(vcpu);
 }

-static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
+static void vmx_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
	struct page *page;

@@ -7777,7 +7777,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
	.enable_irq_window = vmx_enable_irq_window,
	.update_cr8_intercept = vmx_update_cr8_intercept,
	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
-	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
+	.reload_apic_pages = vmx_reload_apic_access_page,
	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
	.load_eoi_exitmap = vmx_load_eoi_exitmap,
	.apicv_post_state_restore = vmx_apicv_post_state_restore,
@@ -7940,12 +7940,12 @@ static __init int hardware_setup(void)
		enable_vnmi = 0;

	/*
-	 * set_apic_access_page_addr() is used to reload apic access
+	 * kvm_vcpu_reload_apic_pages() is used to reload apic access
	 * page upon invalidation.  No need to do anything if not
	 * using the APIC_ACCESS_ADDR VMCS field.
	 */
	if (!flexpriority_enabled)
-		vmx_x86_ops.set_apic_access_page_addr = NULL;
+		vmx_x86_ops.reload_apic_pages = NULL;

	if (!cpu_has_vmx_tpr_shadow())
		vmx_x86_ops.update_cr8_intercept = NULL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab336f7c82e4b..3ac2d0134271b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9949,12 +9949,12 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
	kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 }

-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+static void kvm_vcpu_reload_apic_pages(struct kvm_vcpu *vcpu)
 {
	if (!lapic_in_kernel(vcpu))
		return;

-	static_call_cond(kvm_x86_set_apic_access_page_addr)(vcpu);
+	static_call_cond(kvm_x86_reload_apic_pages)(vcpu);
 }

 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
@@ -10071,7 +10071,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
		if (kvm_check_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu))
			vcpu_load_eoi_exitmap(vcpu);
		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
-			kvm_vcpu_reload_apic_access_page(vcpu);
+			kvm_vcpu_reload_apic_pages(vcpu);
		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
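[Editor's note] The renamed callback stays optional: KVM_X86_OP_OPTIONAL ops may be NULL (VMX clears it when flexpriority is disabled), and static_call_cond() turns a NULL op into a no-op instead of a crash. Conceptually, the call site behaves like this plain-function-pointer sketch (the real static-call machinery patches the call inline):

```c
/* Rough equivalent of static_call_cond(kvm_x86_reload_apic_pages)(vcpu). */
if (kvm_x86_ops.reload_apic_pages)
	kvm_x86_ops.reload_apic_pages(vcpu);
```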
From patchwork Thu Apr 21 05:12:43 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12821104
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 09/10] KVM: nSVM: implement support for nested AVIC
Date: Thu, 21 Apr 2022 08:12:43 +0300
Message-Id: <20220421051244.187733-10-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This implements initial support for using AVIC in a nested guest.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c   | 850 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/nested.c | 131 +++++-
 arch/x86/kvm/svm/svm.c    |  18 +
 arch/x86/kvm/svm/svm.h    | 150 +++++++
 arch/x86/kvm/trace.h      | 140 ++++++-
 arch/x86/kvm/x86.c        |  11 +
 6 files changed, 1282 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 87756237c646d..9176c35662ada 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -51,6 +51,526 @@ static u32 next_vm_id = 0;
 static bool next_vm_id_wrapped = 0;
 static DEFINE_SPINLOCK(svm_vm_data_hash_lock);

+static u32 nested_avic_get_reg(struct kvm_vcpu *vcpu, int reg_off)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	void *nested_apic_regs = svm->nested.l2_apic_access_page.hva;
+
+	if (WARN_ON_ONCE(!nested_apic_regs))
+		return 0;
+
+	return *((u32 *) (nested_apic_regs + reg_off));
+}
+
+static inline struct kvm_vcpu *avic_vcpu_by_l1_apicid(struct kvm *kvm,
+						      int l1_apicid)
+{
+	WARN_ON(l1_apicid == -1);
+	return kvm_get_vcpu_by_id(kvm, l1_apicid);
+}
+
+static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm,
+					      struct avic_physid_table *t,
+					      int n,
+					      int new_l1_apicid)
+{
+	struct avic_physid_entry_descr *e = &t->entries[n];
+	u64 sentry = READ_ONCE(*e->sentry);
+	u64 old_sentry = sentry;
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	struct kvm_vcpu *new_vcpu = NULL;
+	int l0_apicid = -1;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags);
+
+	WARN_ON(!test_bit(n, t->valid_entires));
+
+	if (!list_empty(&e->link))
+		list_del_init(&e->link);
+
+	if (new_l1_apicid != -1)
+		new_vcpu = avic_vcpu_by_l1_apicid(kvm, new_l1_apicid);
+
+	if (new_vcpu)
+		list_add_tail(&e->link, &to_svm(new_vcpu)->nested.physid_ref_entries);
+
+	if (new_vcpu && to_svm(new_vcpu)->nested_avic_active)
+		l0_apicid = kvm_cpu_get_apicid(new_vcpu->cpu);
+
+	physid_entry_set_apicid(&sentry, l0_apicid);
+
+	if (sentry != old_sentry)
+		WRITE_ONCE(*e->sentry, sentry);
+
+	raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags);
+}
+
+static void avic_physid_shadow_entry_create(struct kvm *kvm,
+					    struct avic_physid_table *t,
+					    int n,
+					    u64 gentry)
+{
+	struct avic_physid_entry_descr *e = &t->entries[n];
+	struct page *backing_page;
+	u64 backing_page_gpa = physid_entry_get_backing_table(gentry);
+	int l1_apic_id = physid_entry_get_apicid(gentry);
+	hpa_t backing_page_hpa;
+	u64 sentry = 0;
+
+
+	if (backing_page_gpa == INVALID_BACKING_PAGE)
+		return;
+
+	/* Pin the APIC backing page */
+	backing_page = gfn_to_page(kvm, gpa_to_gfn(backing_page_gpa));
+
+	if (is_error_page(backing_page))
+		/* Invalid GPA in the guest entry - point to a dummy entry */
+		backing_page_hpa = t->dummy_page_hpa;
+	else
+		backing_page_hpa = page_to_phys(backing_page);
+
+	physid_entry_set_backing_table(&sentry, backing_page_hpa);
+
+	e->gentry = gentry;
+	*e->sentry = sentry;
+
+	if (test_and_set_bit(n, t->valid_entires))
+		WARN_ON(1);
+
+	if (backing_page_hpa != t->dummy_page_hpa)
+		avic_physid_shadow_entry_set_vcpu(kvm, t, n, l1_apic_id);
+}
+
+static void avic_physid_shadow_entry_remove(struct kvm *kvm,
+					    struct avic_physid_table *t,
+					    int n)
+{
+	struct avic_physid_entry_descr *e = &t->entries[n];
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	hpa_t backing_page_hpa;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags);
+
+	if (!test_and_clear_bit(n, t->valid_entires))
+		WARN_ON(1);
+
+	/* Release the APIC backing page */
+	backing_page_hpa = physid_entry_get_backing_table(*e->sentry);
+
+	if (backing_page_hpa != t->dummy_page_hpa)
+		kvm_release_pfn_dirty(backing_page_hpa >> PAGE_SHIFT);
+
+	if (!list_empty(&e->link))
+		list_del_init(&e->link);
+
+	e->gentry = 0;
+	*e->sentry = 0;
+
+	raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags);
+}
+
+static void avic_update_peer_physid_entries(struct kvm_vcpu *vcpu, int cpu)
+{
+	/*
+	 * Update all shadow physid tables which contain entries
+	 * which reference this vCPU with its new physical location
+	 */
+	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
+	struct vcpu_svm *vcpu_svm = to_svm(vcpu);
+	struct avic_physid_entry_descr *e;
+	int nentries = 0;
+	int l0_apicid = -1;
+	unsigned long flags;
+	bool new_active = cpu != -1;
+
+	if (vcpu_svm->nested_avic_active == new_active)
+		return;
+
+	if (cpu != -1)
+		l0_apicid = kvm_cpu_get_apicid(cpu);
+
+	raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags);
+
+	list_for_each_entry(e, &vcpu_svm->nested.physid_ref_entries, link) {
+		u64 sentry = READ_ONCE(*e->sentry);
+		u64 old_sentry = sentry;
+
+		physid_entry_set_apicid(&sentry, l0_apicid);
+
+		if (sentry != old_sentry)
+			WRITE_ONCE(*e->sentry, sentry);
+
+		nentries++;
+	}
+
+	if (nentries)
+		trace_kvm_avic_physid_update_vcpu(vcpu->vcpu_id, cpu, nentries);
+
+	vcpu_svm->nested_avic_active = new_active;
+
+	raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags);
+}
+
+static bool
+avic_physid_shadow_table_setup_write_tracking(struct kvm *kvm,
+					      struct avic_physid_table *t,
+					      bool enable)
+{
+	struct kvm_memory_slot *slot;
+
+	write_lock(&kvm->mmu_lock);
+	slot = gfn_to_memslot(kvm, t->gfn);
+	if (!slot) {
+		write_unlock(&kvm->mmu_lock);
+		return false;
+	}
+
+	if (enable)
+		kvm_slot_page_track_add_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE);
+	else
+		kvm_slot_page_track_remove_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+	return true;
+}
+
+static void
+avic_physid_shadow_table_erase(struct kvm *kvm, struct avic_physid_table *t)
+{
+	int i;
+
+	if (!t->nentries)
+		return;
+
+	avic_physid_shadow_table_setup_write_tracking(kvm, t, false);
+
+	for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT)
+		avic_physid_shadow_entry_remove(kvm, t, i);
+
+	t->nentries = 0;
+	t->flood_count = 0;
+}
+
+static struct avic_physid_table *
+avic_physid_shadow_table_alloc(struct kvm *kvm, gfn_t gfn)
+{
+	struct avic_physid_entry_descr *e;
+	struct avic_physid_table *t;
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+	u64 *shadow_table_address;
+	int i;
+
+	if (kvm_page_track_write_tracking_enable(kvm))
+		return NULL;
+
+	lockdep_assert_held(&kvm_svm->avic.tables_lock);
+
+	t = kzalloc(sizeof(*t), GFP_KERNEL_ACCOUNT);
+	if (!t)
+		return NULL;
+
+	t->shadow_table = alloc_page(GFP_KERNEL_ACCOUNT|__GFP_ZERO);
+	if (!t->shadow_table)
+		goto err_free_table;
+
+	shadow_table_address = page_address(t->shadow_table);
+	t->shadow_table_hpa = __sme_set(page_to_phys(t->shadow_table));
+
+	for (i = 0; i < ARRAY_SIZE(t->entries); i++) {
+		e = &t->entries[i];
+		e->sentry = &shadow_table_address[i];
+		e->gentry = 0;
+		INIT_LIST_HEAD(&e->link);
+	}
+
+	t->gfn = gfn;
+	t->refcount = 1;
+	list_add_tail(&t->link, &kvm_svm->avic.physid_tables);
+
+	t->dummy_page_hpa = page_to_phys(kvm_svm->avic.invalid_physid_page);
+
+	trace_kvm_avic_physid_table_alloc(gfn_to_gpa(gfn));
+	return t;
+
+err_free_table:
+	kfree(t);
+	return NULL;
+}
+
+static void
+avic_physid_shadow_table_free(struct kvm *kvm, struct avic_physid_table *t)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+
+	lockdep_assert_held(&kvm_svm->avic.tables_lock);
+
+	WARN_ON(t->refcount);
+
+	avic_physid_shadow_table_erase(kvm, t);
+
+	trace_kvm_avic_physid_table_free(gfn_to_gpa(t->gfn));
+
+	hlist_del(&t->hash_link);
+	list_del(&t->link);
+	__free_page(t->shadow_table);
+	kfree(t);
+}
+
+static struct avic_physid_table *
+__avic_physid_shadow_table_get(struct hlist_head *head, gfn_t gfn)
+{
+	struct avic_physid_table *t;
+
+	hlist_for_each_entry(t, head, hash_link)
+		if (t->gfn == gfn) {
+			t->refcount++;
+			return t;
+		}
+	return NULL;
+}
+
+struct avic_physid_table *
+avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
+	struct hlist_head *hlist;
+	struct avic_physid_table *t;
+
+	mutex_lock(&kvm_svm->avic.tables_lock);
+
+	hlist = &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)];
+	t = __avic_physid_shadow_table_get(hlist, gfn);
+	if (!t) {
+		t = avic_physid_shadow_table_alloc(vcpu->kvm, gfn);
+		if (!t)
+			goto out_unlock;
+		hlist_add_head(&t->hash_link, hlist);
+	}
+	t->flood_count = 0;
+out_unlock:
+	mutex_unlock(&kvm_svm->avic.tables_lock);
+	return t;
+}
+
+static void
+__avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t)
+{
+	WARN_ON(t->refcount <= 0);
+	if (--t->refcount == 0)
+		avic_physid_shadow_table_free(kvm, t);
+}
+
+void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+
+	mutex_lock(&kvm_svm->avic.tables_lock);
+	__avic_physid_shadow_table_put(kvm, t);
+	mutex_unlock(&kvm_svm->avic.tables_lock);
+}
+
+static void avic_physid_shadow_table_invalidate(struct kvm *kvm,
+						struct avic_physid_table *t)
+{
+	avic_physid_shadow_table_erase(kvm, t);
+	kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
+int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu,
+				  struct avic_physid_table *t, int nentries)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
+	struct kvm_host_map map;
+	u64 *gentries;
+	int i;
+	int ret = 0;
+
+	mutex_lock(&kvm_svm->avic.tables_lock);
+
+	if (t->nentries >= nentries)
+		goto out_unlock;
+
+	trace_kvm_avic_physid_table_reload(gfn_to_gpa(t->gfn), t->nentries, nentries);
+
+	if (t->nentries == 0) {
+		if (!avic_physid_shadow_table_setup_write_tracking(vcpu->kvm, t,
true)) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+	}
+
+	if (kvm_vcpu_map(vcpu, t->gfn, &map)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
+
+	gentries = (u64 *)map.hva;
+
+	for (i = t->nentries; i < nentries; i++)
+		avic_physid_shadow_entry_create(vcpu->kvm, t, i, gentries[i]);
+
+	/* Publish the shadow entries before setting nentries */
+	wmb();
+	WRITE_ONCE(t->nentries, nentries);
+
+	kvm_vcpu_unmap(vcpu, &map, false);
+out_unlock:
+	mutex_unlock(&kvm_svm->avic.tables_lock);
+	return ret;
+}
+
+static void avic_physid_shadow_table_track_write(struct kvm_vcpu *vcpu,
+						 gpa_t gpa,
+						 const u8 *new,
+						 int bytes,
+						 struct kvm_page_track_notifier_node *node)
+{
+	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
+	struct hlist_head *hlist;
+	struct avic_physid_table *t;
+	gfn_t gfn = gpa_to_gfn(gpa);
+	unsigned int page_offset = offset_in_page(gpa);
+	unsigned int entry_offset = page_offset & 0x7;
+	int first = page_offset / sizeof(u64);
+	int last = (page_offset + bytes - 1) / sizeof(u64);
+	u64 new_entry, old_entry;
+	int l1_apic_id;
+
+	if (WARN_ON_ONCE(bytes == 0))
+		return;
+
+	mutex_lock(&kvm_svm->avic.tables_lock);
+
+	hlist = &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)];
+	t = __avic_physid_shadow_table_get(hlist, gfn);
+
+	if (!t)
+		goto out_unlock;
+
+	trace_kvm_avic_physid_table_write(gpa, bytes);
+
+	/*
+	 * Update policy:
+	 *
+	 * Only a write to a single entry is allowed, and only if that
+	 * entry had a valid backing page on the last VM entry with this
+	 * page, and only if the write touches just the is_running and/or
+	 * apic_id parts of the entry.
+	 *
+	 * Writes beyond the known number of entries are ignored, to
+	 * support a guest that adds entries at the end of the page
+	 * while hotplugging CPUs.
+	 *
+	 * All other writes, which are not supposed to happen while the
+	 * page is in use, cause the page to be invalidated and re-read
+	 * as a whole the next time a vCPU uses it for VM entry.
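+	 *
+	 * In addition, a page that takes more than AVIC_PHYSID_FLOOD_COUNT
+	 * writes per shadowed entry between two uses is invalidated as
+	 * well (see flood_count): once a page is written that often, it is
+	 * cheaper to re-read it on its next use than to shadow every
+	 * single write.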
+ */ + + if (first >= t->nentries) + goto out_table_put; + + if (first != last || !test_bit(first, t->valid_entires)) + goto invalidate; + + /* update the entry with written bytes */ + old_entry = t->entries[first].gentry; + new_entry = old_entry; + memcpy(((u8 *)&new_entry) + entry_offset, new, bytes); + + /* if backing page changed, invalidate the whole page*/ + if (physid_entry_get_backing_table(old_entry) != + physid_entry_get_backing_table(new_entry)) + goto invalidate; + + if (++t->flood_count > t->nentries * AVIC_PHYSID_FLOOD_COUNT) + goto invalidate; + + /* Update the backing cpu */ + l1_apic_id = physid_entry_get_apicid(new_entry); + avic_physid_shadow_entry_set_vcpu(vcpu->kvm, t, first, l1_apic_id); + t->entries[first].gentry = new_entry; + goto out_table_put; +invalidate: + avic_physid_shadow_table_invalidate(vcpu->kvm, t); +out_table_put: + __avic_physid_shadow_table_put(vcpu->kvm, t); +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +static void avic_physid_shadow_table_flush_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot, + struct kvm_page_track_notifier_node *node) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + struct avic_physid_table *t, *n; + int i; + + mutex_lock(&kvm_svm->avic.tables_lock); + + list_for_each_entry_safe(t, n, &kvm_svm->avic.physid_tables, link) { + + if (gfn_in_memslot(slot, t->gfn)) { + avic_physid_shadow_table_invalidate(kvm, t); + continue; + } + + for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) { + u64 gentry = t->entries[i].gentry; + gpa_t gpa = physid_entry_get_backing_table(gentry); + + if (gfn_in_memslot(slot, gpa_to_gfn(gpa))) { + avic_physid_shadow_table_invalidate(kvm, t); + break; + } + } + } + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu) +{ + int off; + + if (!nested_avic_in_use(vcpu)) + return false; + + for (off = 0x10; off < 0x80; off += 0x10) + if (nested_avic_get_reg(vcpu, APIC_IRR + off)) + return true; + return false; +} + +void avic_reload_apic_pages(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *vcpu_svm = to_svm(vcpu); + struct avic_physid_table *t = vcpu_svm->nested.l2_physical_id_table; + + int nentries = vcpu_svm->nested.ctl.avic_physical_id & + AVIC_PHYSICAL_ID_TABLE_SIZE_MASK; + + if (t && is_guest_mode(vcpu) && nested_avic_in_use(vcpu)) + avic_physid_shadow_table_sync(vcpu, t, nentries); +} + +void avic_free_nested(struct kvm_vcpu *vcpu) +{ + struct avic_physid_table *t; + struct vcpu_svm *svm = to_svm(vcpu); + + t = svm->nested.l2_physical_id_table; + if (t) { + avic_physid_shadow_table_put(vcpu->kvm, t); + svm->nested.l2_physical_id_table = NULL; + } + + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); +} + /* * This is a wrapper of struct amd_iommu_ir_data. 
*/ @@ -105,26 +625,38 @@ void avic_vm_destroy(struct kvm *kvm) { unsigned long flags; struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic; + unsigned long i; + struct kvm_vcpu *vcpu; if (!enable_apicv) return; + kvm_for_each_vcpu(i, vcpu, kvm) { + vcpu_load(vcpu); + avic_free_nested(vcpu); + vcpu_put(vcpu); + } + if (avic->logical_id_table_page) __free_page(avic->logical_id_table_page); if (avic->physical_id_table_page) __free_page(avic->physical_id_table_page); + if (avic->invalid_physid_page) + __free_page(avic->invalid_physid_page); spin_lock_irqsave(&svm_vm_data_hash_lock, flags); hash_del(&avic->hnode); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); + + + kvm_page_track_unregister_notifier(kvm, &avic->write_tracker); } int avic_vm_init(struct kvm *kvm) { unsigned long flags; int err = -ENOMEM; - struct page *p_page; - struct page *l_page; + struct page *page; struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic; u32 vm_id; @@ -132,18 +664,26 @@ int avic_vm_init(struct kvm *kvm) return 0; /* Allocating physical APIC ID table (4KB) */ - p_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); - if (!p_page) + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) goto free_avic; - avic->physical_id_table_page = p_page; + avic->physical_id_table_page = page; /* Allocating logical APIC ID table (4KB) */ - l_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); - if (!l_page) + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) + goto free_avic; + + avic->logical_id_table_page = page; + + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) goto free_avic; - avic->logical_id_table_page = l_page; + /* Allocating dummy page for invalid nested avic physid entries */ + avic->invalid_physid_page = page; + spin_lock_irqsave(&svm_vm_data_hash_lock, flags); again: @@ -165,6 +705,14 @@ int avic_vm_init(struct kvm *kvm) hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); + raw_spin_lock_init(&avic->table_entries_lock); + mutex_init(&avic->tables_lock); + INIT_LIST_HEAD(&avic->physid_tables); + + avic->write_tracker.track_write = avic_physid_shadow_table_track_write; + avic->write_tracker.track_flush_slot = avic_physid_shadow_table_flush_memslot; + + kvm_page_track_register_notifier(kvm, &avic->write_tracker); return 0; free_avic: @@ -316,6 +864,161 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source, } } +static void +avic_kick_target_vcpu_nested_physical(struct vcpu_svm *svm, + int target_l2_apic_id, + int *index, + bool *invalid_page) +{ + u64 gentry, sentry; + int target_l1_apicid; + struct avic_physid_table *t = svm->nested.l2_physical_id_table; + + if (WARN_ON_ONCE(!t)) + return; + + /* + * This shouldn't normally happen as such condition + * should cause AVIC_IPI_FAILURE_INVALID_TARGET vmexit, + * however guest can change the page under us. + */ + if (target_l2_apic_id >= t->nentries) + return; + + gentry = t->entries[target_l2_apic_id].gentry; + sentry = *t->entries[target_l2_apic_id].sentry; + + /* Same reasoning as above */ + if (!(gentry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)) + return; + + /* + * This races against the guest updating is_running bit. + * Race itself happens on real hardware as well, and the guest + * should use correct means to avoid it. + * + * AVIC hardware already set IRR and should have done memory + * barrier, and then found that shadowed is_running is false. 
+	 * We are doing another is_running check here, completing the one
+	 * the hardware started, thus no additional memory barriers are
+	 * needed.
+	 */
+
+	target_l1_apicid = physid_entry_get_apicid(gentry);
+
+	if (target_l1_apicid == -1) {
+
+		/* is_running is false, need to vmexit to the guest */
+		if (*index == -1) {
+			u64 backing_page_phys = physid_entry_get_backing_table(sentry);
+
+			*index = target_l2_apic_id;
+			if (backing_page_phys == t->dummy_page_hpa)
+				*invalid_page = true;
+		}
+	} else {
+		/* Wake up the target vCPU and hide the VM exit from the guest */
+		struct kvm_vcpu *target = avic_vcpu_by_l1_apicid(svm->vcpu.kvm, target_l1_apicid);
+
+		if (target && target != &svm->vcpu)
+			kvm_vcpu_wake_up(target);
+	}
+
+	trace_kvm_avic_nested_kick_vcpu(svm->vcpu.vcpu_id,
+					target_l2_apic_id,
+					target_l1_apicid);
+}
+
+static void
+avic_kick_target_vcpus_nested_logical(struct vcpu_svm *svm, unsigned long dest,
+				      int *index, bool *invalid_page)
+{
+	int logical_id;
+	u8 cluster = 0;
+	u64 *logical_id_table = (u64 *)svm->nested.l2_logical_id_table.hva;
+	int physical_index = -1;
+
+	if (WARN_ON_ONCE(!logical_id_table))
+		return;
+
+	if (nested_avic_get_reg(&svm->vcpu, APIC_DFR) == APIC_DFR_CLUSTER) {
+		if (dest >= 0x40)
+			return;
+		cluster = dest & 0x3C;
+		dest &= 0x3;
+	}
+
+	for_each_set_bit(logical_id, &dest, 8) {
+		int logical_index = cluster | logical_id;
+		u64 log_gentry = logical_id_table[logical_index];
+		int l2_apicid = logid_get_physid(log_gentry);
+
+		/*
+		 * This should not happen, as in this case AVIC should
+		 * VM exit with 'invalid target'. However, the guest can
+		 * change the entry behind KVM's back, thus ignore this
+		 * case.
+		 */
+		if (l2_apicid == -1)
+			continue;
+
+		avic_kick_target_vcpu_nested_physical(svm, l2_apicid,
+						      &physical_index,
+						      invalid_page);
+
+		/* Reported index is the index of the logical entry in this case */
+		if (physical_index != -1)
+			*index = logical_index;
+	}
+}
+
+static void
+avic_kick_target_vcpus_nested_broadcast(struct vcpu_svm *svm,
+					int *index, bool *invalid_page)
+{
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+	int l2_apicid;
+
+	/*
+	 * This races against the guest changing a valid bit in the table
+	 * and/or increasing nentries of the table.
+	 * In both cases the race would happen on real hardware as well,
+	 * thus there is no need to take locks.
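+	 * At worst, a concurrently added or removed entry is missed or
+	 * seen stale, which a guest must tolerate on bare metal too.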
+ */ + for_each_set_bit(l2_apicid, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) + avic_kick_target_vcpu_nested_physical(svm, l2_apicid, + index, invalid_page); +} + + +static void avic_kick_target_vcpus_nested(struct kvm_vcpu *vcpu, + struct kvm_lapic *source, + u32 icrl, u32 icrh, + int *index, bool *invalid_page) +{ + struct vcpu_svm *svm = to_svm(vcpu); + int dest = GET_APIC_DEST_FIELD(icrh); + + switch (icrl & APIC_SHORT_MASK) { + case APIC_DEST_NOSHORT: + if (dest == 0xFF) + avic_kick_target_vcpus_nested_broadcast(svm, + index, invalid_page); + else if (icrl & APIC_DEST_MASK) + avic_kick_target_vcpus_nested_logical(svm, dest, + index, invalid_page); + else + avic_kick_target_vcpu_nested_physical(svm, dest, + index, invalid_page); + break; + case APIC_DEST_ALLINC: + case APIC_DEST_ALLBUT: + avic_kick_target_vcpus_nested_broadcast(svm, index, invalid_page); + break; + case APIC_DEST_SELF: + break; + } +} + int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -323,10 +1026,20 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu) u32 icrl = svm->vmcb->control.exit_info_1; u32 id = svm->vmcb->control.exit_info_2 >> 32; u32 index = svm->vmcb->control.exit_info_2 & 0xFF; + int nindex = -1; + bool invalid_page = false; + struct kvm_lapic *apic = vcpu->arch.apic; trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index); + if (is_guest_mode(&svm->vcpu)) { + if (WARN_ON_ONCE(!nested_avic_in_use(vcpu))) + return 1; + if (WARN_ON_ONCE(!svm->nested.l2_physical_id_table)) + return 1; + } + switch (id) { case AVIC_IPI_FAILURE_INVALID_INT_TYPE: /* @@ -338,23 +1051,49 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu) * which case KVM needs to emulate the ICR write as well in * order to clear the BUSY flag. */ + if (is_guest_mode(&svm->vcpu)) { + nested_svm_vmexit(svm); + break; + } + if (icrl & APIC_ICR_BUSY) kvm_apic_write_nodecode(vcpu, APIC_ICR); else kvm_apic_send_ipi(apic, icrl, icrh); + break; case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING: /* * At this point, we expect that the AVIC HW has already * set the appropriate IRR bits on the valid target * vcpus. So, we just need to kick the appropriate vcpu. + * + * If nested we might also need to reflect the VM exit to + * the guest */ - avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh); + if (!is_guest_mode(&svm->vcpu)) { + avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh); + break; + } + + avic_kick_target_vcpus_nested(vcpu, apic, icrl, icrh, + &nindex, &invalid_page); + if (nindex != -1) { + if (invalid_page) + id = AVIC_IPI_FAILURE_INVALID_BACKING_PAGE; + + svm->vmcb->control.exit_info_2 = ((u64)id << 32) | nindex; + nested_svm_vmexit(svm); + } break; case AVIC_IPI_FAILURE_INVALID_TARGET: + if (is_guest_mode(&svm->vcpu)) + nested_svm_vmexit(svm); + else + WARN_ON_ONCE(1); break; case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE: - WARN_ONCE(1, "Invalid backing page\n"); + WARN_ON_ONCE(1); break; default: pr_err("Unknown IPI interception\n"); @@ -370,6 +1109,48 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu) return 0; } +int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data) +{ + int source_l1_apicid = vcpu->vcpu_id; + int target_l1_apicid = data & AVIC_DOORBELL_PHYSICAL_ID_MASK; + bool target_running, target_nested; + struct kvm_vcpu *target; + + if (data & ~AVIC_DOORBELL_PHYSICAL_ID_MASK) + return 1; + + target = avic_vcpu_by_l1_apicid(vcpu->kvm, target_l1_apicid); + if (!target) + /* Guest bug: targeting invalid APIC ID. 
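The write is simply ignored; there is no vCPU to notify.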
*/
+		return 0;
+
+	target_running = READ_ONCE(target->mode) == IN_GUEST_MODE;
+	target_nested = is_guest_mode(target);
+
+	trace_kvm_avic_nested_doorbell(source_l1_apicid, target_l1_apicid,
+				       target_nested, target_running);
+
+	/*
+	 * If the target is not in nested mode, the doorbell doesn't affect
+	 * it; if it has just now entered nested mode, it means that it
+	 * processed the doorbell on VM entry.
+	 */
+	if (!target_nested)
+		return 0;
+
+	/*
+	 * If the target vCPU is in guest mode, kick the real doorbell.
+	 * Otherwise we need to wake it up in case it is not scheduled to run.
+	 */
+	if (target_running)
+		wrmsr(MSR_AMD64_SVM_AVIC_DOORBELL,
+		      kvm_cpu_get_apicid(READ_ONCE(target->cpu)), 0);
+	else
+		kvm_vcpu_wake_up(target);
+
+	return 0;
+}
+
 static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
 {
 	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
@@ -463,9 +1244,13 @@ static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
 
 static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 offset = to_svm(vcpu)->vmcb->control.exit_info_1 &
 		     AVIC_UNACCEL_ACCESS_OFFSET_MASK;
 
+	if (WARN_ON_ONCE(is_guest_mode(&svm->vcpu)))
+		return 0;
+
 	switch (offset) {
 	case APIC_LDR:
 		if (avic_handle_ldr_update(vcpu))
@@ -523,6 +1308,8 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu)
 		 AVIC_UNACCEL_ACCESS_WRITE_MASK;
 	bool trap = is_avic_unaccelerated_access_trap(offset);
 
+	WARN_ON_ONCE(is_guest_mode(&svm->vcpu));
+
 	trace_kvm_avic_unaccelerated_access(vcpu->vcpu_id, offset,
 					    trap, write, vector);
 	if (trap) {
@@ -908,18 +1695,51 @@ static void avic_vcpu_load(struct kvm_vcpu *vcpu)
 	int cpu = get_cpu();
 
 	WARN_ON(cpu != vcpu->cpu);
-
 	__avic_vcpu_load(vcpu, cpu);
-
 	put_cpu();
 }
 
 static void avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-
 	__avic_vcpu_put(vcpu);
+	preempt_enable();
+}
+
+
+void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	lockdep_assert_preemption_disabled();
+
+	if (svm->nested.initialized && svm->avic_enabled)
+		avic_update_peer_physid_entries(vcpu, cpu);
+}
+
+void __nested_avic_put(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	lockdep_assert_preemption_disabled();
+
+	if (svm->nested.initialized && svm->avic_enabled)
+		avic_update_peer_physid_entries(vcpu, -1);
+}
+
+void nested_avic_load(struct kvm_vcpu *vcpu)
+{
+	int cpu = get_cpu();
+
+	WARN_ON(cpu != vcpu->cpu);
+	__nested_avic_load(vcpu, cpu);
+	put_cpu();
+}
+
+void nested_avic_put(struct kvm_vcpu *vcpu)
+{
+	preempt_disable();
+	__nested_avic_put(vcpu);
 	preempt_enable();
 }
 
@@ -983,3 +1803,7 @@ void avic_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 	avic_vcpu_load(vcpu);
 }
+
+/*
+ * TODO: Deal with the AVIC errata with regard to flushing the TLB on vCPU change
+ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bed5e1692cef0..811fa79c51801 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -387,6 +387,14 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 		memcpy(to->reserved_sw, from->reserved_sw,
 		       sizeof(struct hv_enlightenments));
 	}
+
+	/* Copy AVIC related settings only when AVIC is enabled */
+	if (from->int_ctl & AVIC_ENABLE_MASK) {
+		to->avic_vapic_bar = from->avic_vapic_bar;
+		to->avic_backing_page = from->avic_backing_page;
+		to->avic_logical_id = from->avic_logical_id;
+		to->avic_physical_id = from->avic_physical_id;
+	}
 }
 
 void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
@@ -539,6 +547,71 @@ void nested_vmcb02_compute_g_pat(struct
vcpu_svm *svm) svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat; } + +static bool nested_vmcb02_prepare_avic(struct vcpu_svm *svm) +{ + struct vmcb *vmcb02 = svm->nested.vmcb02.ptr; + struct avic_physid_table *t = svm->nested.l2_physical_id_table; + gfn_t physid_gfn; + int physid_nentries; + + if (!nested_avic_in_use(&svm->vcpu)) + return true; + + if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_backing_page & AVIC_HPA_MASK), + &svm->nested.l2_apic_access_page)) + goto error; + + if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_logical_id & AVIC_HPA_MASK), + &svm->nested.l2_logical_id_table)) + goto error_unmap_backing_page; + + physid_gfn = gpa_to_gfn(svm->nested.ctl.avic_physical_id & + AVIC_HPA_MASK); + physid_nentries = svm->nested.ctl.avic_physical_id & + AVIC_PHYSICAL_ID_TABLE_SIZE_MASK; + + if (t && t->gfn != physid_gfn) { + avic_physid_shadow_table_put(svm->vcpu.kvm, t); + svm->nested.l2_physical_id_table = NULL; + } + + if (!svm->nested.l2_physical_id_table) { + t = avic_physid_shadow_table_get(&svm->vcpu, physid_gfn); + if (!t) + goto error_unmap_logical_id_table; + svm->nested.l2_physical_id_table = t; + } + + if (t->nentries < physid_nentries) + if (avic_physid_shadow_table_sync(&svm->vcpu, t, physid_nentries) < 0) + goto error_put_table; + + /* Everything is setup, we can enable AVIC */ + vmcb02->control.avic_vapic_bar = + svm->nested.ctl.avic_vapic_bar & VMCB_AVIC_APIC_BAR_MASK; + vmcb02->control.avic_backing_page = + pfn_to_hpa(svm->nested.l2_apic_access_page.pfn); + vmcb02->control.avic_logical_id = + pfn_to_hpa(svm->nested.l2_logical_id_table.pfn); + vmcb02->control.avic_physical_id = + (svm->nested.l2_physical_id_table->shadow_table_hpa) | physid_nentries; + + vmcb02->control.int_ctl |= AVIC_ENABLE_MASK; + vmcb_mark_dirty(vmcb02, VMCB_AVIC); + return true; + +error_put_table: + avic_physid_shadow_table_put(svm->vcpu.kvm, t); + svm->nested.l2_physical_id_table = NULL; +error_unmap_logical_id_table: + kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_logical_id_table, false); +error_unmap_backing_page: + kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_apic_access_page, false); +error: + return false; +} + static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12) { bool new_vmcb12 = false; @@ -627,6 +700,17 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm) else int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK); + if (nested_avic_in_use(vcpu)) { + + /* + * Enabling AVIC implicitly disables the + * V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR + * fields in the VMCB Control Word" + */ + int_ctl_vmcb12_bits &= ~V_IRQ_INJECTION_BITS_MASK; + } + + /* Copied from vmcb01. msrpm_base can be overwritten later. 
*/ vmcb02->control.nested_ctl = vmcb01->control.nested_ctl; vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa; @@ -829,7 +913,10 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true)) goto out_exit_err; - if (nested_svm_vmrun_msrpm(svm)) + if (!nested_svm_vmrun_msrpm(svm)) + goto out_exit_err; + + if (nested_vmcb02_prepare_avic(svm)) goto out; out_exit_err: @@ -844,7 +931,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) out: kvm_vcpu_unmap(vcpu, &map, true); - return ret; } @@ -956,6 +1042,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm) nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr); + if (nested_avic_in_use(vcpu)) { + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); + } + svm_switch_vmcb(svm, &svm->vmcb01); if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) { @@ -1069,6 +1160,7 @@ int svm_allocate_nested(struct vcpu_svm *svm) svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm); svm->nested.initialized = true; + nested_avic_load(&svm->vcpu); return 0; err_free_vmcb02: @@ -1078,6 +1170,8 @@ int svm_allocate_nested(struct vcpu_svm *svm) void svm_free_nested(struct vcpu_svm *svm) { + struct kvm_vcpu *vcpu = &svm->vcpu; + if (!svm->nested.initialized) return; @@ -1096,6 +1190,11 @@ void svm_free_nested(struct vcpu_svm *svm) */ svm->nested.last_vmcb12_gpa = INVALID_GPA; + if (svm->avic_enabled) { + nested_avic_put(vcpu); + avic_free_nested(vcpu); + } + svm->nested.initialized = false; } @@ -1116,8 +1215,10 @@ void svm_leave_nested(struct kvm_vcpu *vcpu) nested_svm_uninit_mmu_context(vcpu); vmcb_mark_all_dirty(svm->vmcb); - } + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); + } kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu); } @@ -1206,6 +1307,20 @@ static int nested_svm_intercept(struct vcpu_svm *svm) vmexit = NESTED_EXIT_DONE; break; } + case SVM_EXIT_AVIC_UNACCELERATED_ACCESS: { + /* + * Unaccelerated AVIC access is always reflected + * and there is no intercept bit for it + */ + vmexit = NESTED_EXIT_DONE; + break; + } + case SVM_EXIT_AVIC_INCOMPLETE_IPI: + /* + * Doesn't have an intercept bit, host needs to intercept + * and in some cases reflect to the guest + */ + break; default: { if (vmcb12_is_intercept(&svm->nested.ctl, exit_code)) vmexit = NESTED_EXIT_DONE; @@ -1296,6 +1411,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu) kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending; struct kvm_lapic *apic = vcpu->arch.apic; + if (lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &apic->pending_events)) { if (block_nested_events) @@ -1423,6 +1539,13 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst, dst->pause_filter_count = from->pause_filter_count; dst->pause_filter_thresh = from->pause_filter_thresh; /* 'clean' and 'reserved_sw' are not changed by KVM */ + + if (from->int_ctl & AVIC_ENABLE_MASK) { + dst->avic_vapic_bar = from->avic_vapic_bar; + dst->avic_backing_page = from->avic_backing_page; + dst->avic_logical_id = from->avic_logical_id; + dst->avic_physical_id = from->avic_physical_id; + } } static int svm_get_nested_state(struct kvm_vcpu *vcpu, @@ -1644,7 +1767,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) return false; - if (!nested_svm_vmrun_msrpm(svm)) { + if 
(!nested_svm_vmrun_msrpm(svm) || !nested_vmcb02_prepare_avic(svm)) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index fc1725b7d05f6..3d9ab1e7b2b52 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1301,6 +1301,8 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu) svm->guest_state_loaded = false; + INIT_LIST_HEAD(&svm->nested.physid_ref_entries); + return 0; error_free_vmsa_page: @@ -1390,8 +1392,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) sd->current_vmcb = svm->vmcb; indirect_branch_prediction_barrier(); } + if (kvm_vcpu_apicv_active(vcpu)) __avic_vcpu_load(vcpu, cpu); + + __nested_avic_load(vcpu, cpu); } static void svm_vcpu_put(struct kvm_vcpu *vcpu) @@ -1399,6 +1404,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu) if (kvm_vcpu_apicv_active(vcpu)) __avic_vcpu_put(vcpu); + __nested_avic_put(vcpu); + svm_prepare_host_switch(vcpu); ++vcpu->stat.host_state_reload; @@ -2764,6 +2771,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) u32 ecx = msr->index; u64 data = msr->data; switch (ecx) { + case MSR_AMD64_SVM_AVIC_DOORBELL: + return avic_emulate_doorbell_write(vcpu, data); case MSR_AMD64_TSC_RATIO: if (!svm->tsc_scaling_enabled) { @@ -4060,6 +4069,9 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC)) kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_X2APIC); } + + svm->avic_enabled = enable_apicv && guest_cpuid_has(vcpu, X86_FEATURE_AVIC); + init_vmcb_after_set_cpuid(vcpu); } @@ -4669,9 +4681,11 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .enable_nmi_window = svm_enable_nmi_window, .enable_irq_window = svm_enable_irq_window, .update_cr8_intercept = svm_update_cr8_intercept, + .reload_apic_pages = avic_reload_apic_pages, .refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl, .check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons, .apicv_post_state_restore = avic_apicv_post_state_restore, + .guest_apic_has_interrupt = avic_nested_has_interrupt, .get_mt_mask = svm_get_mt_mask, .get_exit_info = svm_get_exit_info, @@ -4798,6 +4812,9 @@ static __init void svm_set_cpu_caps(void) if (vgif) kvm_cpu_cap_set(X86_FEATURE_VGIF); + if (enable_apicv) + kvm_cpu_cap_set(X86_FEATURE_AVIC); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); } @@ -4923,6 +4940,7 @@ static __init int svm_hardware_setup(void) svm_x86_ops.vcpu_blocking = NULL; svm_x86_ops.vcpu_unblocking = NULL; svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL; + svm_x86_ops.guest_apic_has_interrupt = NULL; } if (vls) { diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 96390fa5e3917..7d1a5028750e6 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -89,13 +90,36 @@ struct kvm_sev_info { }; +#define AVIC_PHYSID_HASH_SHIFT 8 +#define AVIC_PHYSID_HASH_SIZE (1 << AVIC_PHYSID_HASH_SHIFT) + struct kvm_svm_avic { u32 vm_id; struct page *logical_id_table_page; struct page *physical_id_table_page; struct hlist_node hnode; + + raw_spinlock_t table_entries_lock; + struct mutex tables_lock; + + /* List of all shadow tables */ + struct list_head physid_tables; + + /* GPA hash table to find a shadow table via its GPA */ + struct hlist_head physid_gpa_hash[AVIC_PHYSID_HASH_SIZE]; + + struct kvm_page_track_notifier_node 
write_tracker; + + struct page *invalid_physid_page; }; + +static __always_inline unsigned int avic_physid_hash(gfn_t gfn) +{ + return hash_64(gfn, AVIC_PHYSID_HASH_SHIFT); +} + + struct kvm_svm { struct kvm kvm; struct kvm_svm_avic avic; @@ -145,6 +169,51 @@ struct vmcb_ctrl_area_cached { u64 virt_ext; u32 clean; u8 reserved_sw[32]; + + u64 avic_vapic_bar; + u64 avic_backing_page; + u64 avic_logical_id; + u64 avic_physical_id; +}; + +struct avic_physid_entry_descr { + struct list_head link; + + /* cached value of guest entry */ + u64 gentry; + + /* shadow table entry pointer*/ + u64 *sentry; +}; + +#define AVIC_PHYSID_FLOOD_COUNT 5 + +struct avic_physid_table { + /* List of all tables member */ + struct list_head link; + + /* GPA hash of all tables member */ + struct hlist_node hash_link; + + /* GPA of the table in guest memory*/ + gfn_t gfn; + + /* Number of entries that we shadow and which are valid*/ + int nentries; + DECLARE_BITMAP(valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT); + + struct avic_physid_entry_descr entries[AVIC_MAX_PHYSICAL_ID_COUNT]; + + /* Guest visible shadow table */ + struct page *shadow_table; + hpa_t shadow_table_hpa; + hpa_t dummy_page_hpa; + + /* Number of vCPUs which are in nested mode and use this table */ + int refcount; + + /* Number of writes to this page between uses of it*/ + int flood_count; }; struct svm_nested_state { @@ -180,6 +249,13 @@ struct svm_nested_state { * on its side. */ bool force_msr_bitmap_recalc; + + /* All AVIC shadow PID table entry descriptors that reference this vCPU */ + struct list_head physid_ref_entries; + + struct kvm_host_map l2_apic_access_page; + struct kvm_host_map l2_logical_id_table; + struct avic_physid_table *l2_physical_id_table; }; struct vcpu_sev_es_state { @@ -242,11 +318,13 @@ struct vcpu_svm { bool pause_filter_enabled : 1; bool pause_threshold_enabled : 1; bool vgif_enabled : 1; + bool avic_enabled : 1; u32 ldr_reg; u32 dfr_reg; struct page *avic_backing_page; u64 *avic_physical_id_cache; + bool nested_avic_active; /* * Per-vcpu list of struct amd_svm_iommu_ir: @@ -614,6 +692,11 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu); int avic_init_vcpu(struct vcpu_svm *svm); void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void __avic_vcpu_put(struct kvm_vcpu *vcpu); +void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu); +void __nested_avic_put(struct kvm_vcpu *vcpu); +void nested_avic_load(struct kvm_vcpu *vcpu); +void nested_avic_put(struct kvm_vcpu *vcpu); + void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu); void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu); void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu); @@ -627,6 +710,73 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu); void avic_vcpu_unblocking(struct kvm_vcpu *vcpu); void avic_ring_doorbell(struct kvm_vcpu *vcpu); unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu); +int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data); +void avic_reload_apic_pages(struct kvm_vcpu *vcpu); +void avic_free_nested(struct kvm_vcpu *vcpu); +bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu); + +struct avic_physid_table * +avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn); +void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t); +int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu, + struct avic_physid_table *t, int nentries); + +static inline bool nested_avic_in_use(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *vcpu_svm = to_svm(vcpu); + + if 
(!vcpu_svm->avic_enabled) + return false; + + if (!nested_npt_enabled(vcpu_svm)) + return false; + + return vcpu_svm->nested.ctl.int_ctl & AVIC_ENABLE_MASK; +} + +#define INVALID_BACKING_PAGE (~(u64)0) + +static inline u64 physid_entry_get_backing_table(u64 entry) +{ + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)) + return INVALID_BACKING_PAGE; + return entry & AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK; +} + +static inline int physid_entry_get_apicid(u64 entry) +{ + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)) + return -1; + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)) + return -1; + + return entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; +} + +static inline int logid_get_physid(u64 entry) +{ + if (!(entry & AVIC_LOGICAL_ID_ENTRY_VALID_BIT)) + return -1; + return entry & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK; +} + +static inline void physid_entry_set_backing_table(u64 *entry, u64 value) +{ + *entry &= ~AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK; + *entry |= (AVIC_PHYSICAL_ID_ENTRY_VALID_MASK | value); +} + +static inline void physid_entry_set_apicid(u64 *entry, int value) +{ + WARN_ON(!(*entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)); + + *entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; + + if (value == -1) + *entry &= ~(AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK); + else + *entry |= (AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK | value); +} /* sev.c */ diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index e3a24b8f04be8..e063580559e9f 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -1385,7 +1385,7 @@ TRACE_EVENT(kvm_apicv_accept_irq, ); /* - * Tracepoint for AMD AVIC + * Tracepoints for AMD AVIC */ TRACE_EVENT(kvm_avic_incomplete_ipi, TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index), @@ -1459,6 +1459,144 @@ TRACE_EVENT(kvm_avic_ga_log, __entry->vmid, __entry->vcpuid) ); + +TRACE_EVENT(kvm_avic_physid_table_alloc, + TP_PROTO(u64 gpa), + TP_ARGS(gpa), + + TP_STRUCT__entry( + __field(u64, gpa) + ), + + TP_fast_assign( + __entry->gpa = gpa; + ), + + TP_printk("table at gpa 0x%llx", + __entry->gpa) +); + + +TRACE_EVENT(kvm_avic_physid_table_free, + TP_PROTO(u64 gpa), + TP_ARGS(gpa), + + TP_STRUCT__entry( + __field(u64, gpa) + ), + + TP_fast_assign( + __entry->gpa = gpa; + ), + + TP_printk("table at gpa 0x%llx", + __entry->gpa) +); + +TRACE_EVENT(kvm_avic_physid_table_reload, + TP_PROTO(u64 gpa, int nentries, int new_nentires), + TP_ARGS(gpa, nentries, new_nentires), + + TP_STRUCT__entry( + __field(u64, gpa) + __field(int, nentries) + __field(int, new_nentires) + ), + + TP_fast_assign( + __entry->gpa = gpa; + __entry->nentries = nentries; + __entry->new_nentires = new_nentires; + ), + + TP_printk("table at gpa 0x%llx, nentires %d -> %d", + __entry->gpa, __entry->nentries, __entry->new_nentires) +); + +TRACE_EVENT(kvm_avic_physid_table_write, + TP_PROTO(u64 gpa, int bytes), + TP_ARGS(gpa, bytes), + + TP_STRUCT__entry( + __field(u64, gpa) + __field(int, bytes) + ), + + TP_fast_assign( + __entry->gpa = gpa; + __entry->bytes = bytes; + ), + + TP_printk("gpa 0x%llx, write of %d bytes", + __entry->gpa, __entry->bytes) +); + +TRACE_EVENT(kvm_avic_physid_update_vcpu, + TP_PROTO(int vcpu_id, int cpu_id, int n), + TP_ARGS(vcpu_id, cpu_id, n), + + TP_STRUCT__entry( + __field(int, vcpu_id) + __field(int, cpu_id) + __field(int, n) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + __entry->cpu_id = cpu_id; + __entry->n = n; + ), + + TP_printk("vcpu %d cpu %d (%d entries)", + __entry->vcpu_id, __entry->cpu_id, __entry->n) +); + 
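+/*
+ * The tracepoints below cover the nested AVIC IPI fast path: doorbell
+ * MSR writes emulated on behalf of the L1 hypervisor, and kicks of
+ * target vCPUs on incomplete-IPI VM exits.
+ */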
+TRACE_EVENT(kvm_avic_nested_doorbell, + TP_PROTO(int source_l1_apicid, int target_l1_apicid, bool target_nested, + bool target_running), + TP_ARGS(source_l1_apicid, target_l1_apicid, target_nested, + target_running), + + TP_STRUCT__entry( + __field(int, source_l1_apicid) + __field(int, target_l1_apicid) + __field(bool, target_nested) + __field(bool, target_running) + ), + + TP_fast_assign( + __entry->source_l1_apicid = source_l1_apicid; + __entry->target_l1_apicid = target_l1_apicid; + __entry->target_nested = target_nested; + __entry->target_running = target_running; + ), + + TP_printk("source %d target %d (nested: %d, running %d)", + __entry->source_l1_apicid, __entry->target_l1_apicid, + __entry->target_nested, __entry->target_running) +); + +TRACE_EVENT(kvm_avic_nested_kick_vcpu, + TP_PROTO(int source_l1_apic_id, int target_l2_apic_id, int target_l1_apic_id), + TP_ARGS(source_l1_apic_id, target_l2_apic_id, target_l1_apic_id), + + TP_STRUCT__entry( + __field(int, source_l1_apic_id) + __field(int, target_l2_apic_id) + __field(int, target_l1_apic_id) + ), + + TP_fast_assign( + __entry->source_l1_apic_id = source_l1_apic_id; + __entry->target_l2_apic_id = target_l2_apic_id; + __entry->target_l1_apic_id = target_l1_apic_id; + ), + + TP_printk("source l1 apic id: %d target l2 apic id: %d target l1 apic_id: %d", + __entry->source_l1_apic_id, __entry->target_l2_apic_id, + __entry->target_l1_apic_id) +); + TRACE_EVENT(kvm_hv_timer_state, TP_PROTO(unsigned int vcpu_id, unsigned int hv_timer_in_use), TP_ARGS(vcpu_id, hv_timer_in_use), diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3ac2d0134271b..94c663a555a0c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13063,9 +13063,20 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window_update); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update); + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log); + +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_alloc); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_free); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_reload); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_write); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_update_vcpu); + +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_doorbell); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_kick_vcpu); + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit); From patchwork Thu Apr 21 05:12:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12821105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D28C3C433F5 for ; Thu, 21 Apr 2022 05:13:58 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B6C0C10F37F; Thu, 21 Apr 2022 05:13:57 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by gabe.freedesktop.org (Postfix) 
with ESMTPS id 5C0EA10F387 for ; Thu, 21 Apr 2022 05:13:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1650518035; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TE0/IyhbX+PJJtmqWsYqEi/mVebhKn0u1Uz64oTynUA=; b=ga47WoLeZCY8SIQ6L9SL4GA1sItR4l8Tr6jhduHspe/tifhPuoXSDfavwet5h1yeW63FW0 NQpsB5Psc4GdqcAMNzJyQmJiRMq6kzmKGOr2PMRFdMHPxI0wI9conLFHPjP69ff+TebMZx 8Zz36HknXcgJgh5T6tc2Yctt7raquUo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-212-BwterifsOBidoQRuIu9gkQ-1; Thu, 21 Apr 2022 01:13:50 -0400 X-MC-Unique: BwterifsOBidoQRuIu9gkQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5DC1A80005D; Thu, 21 Apr 2022 05:13:49 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.194.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0CEB1145BA5A; Thu, 21 Apr 2022 05:13:43 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Subject: [RFC PATCH v2 10/10] KVM: SVM: allow avoiding unneeded updates to is_running Date: Thu, 21 Apr 2022 08:12:44 +0300 Message-Id: <20220421051244.187733-11-mlevitsk@redhat.com> In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com> References: <20220421051244.187733-1-mlevitsk@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wanpeng Li , David Airlie , dri-devel@lists.freedesktop.org, "H. Peter Anvin" , Joerg Roedel , x86@kernel.org, Maxim Levitsky , Ingo Molnar , Zhi Wang , Dave Hansen , intel-gfx@lists.freedesktop.org, Borislav Petkov , Rodrigo Vivi , Thomas Gleixner , intel-gvt-dev@lists.freedesktop.org, Jim Mattson , Tvrtko Ursulin , Sean Christopherson , linux-kernel@vger.kernel.org, Paolo Bonzini , Vitaly Kuznetsov Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel"

Optionally allow KVM to skip updates of is_running when they are not functionally needed; they are needed only when a vCPU halts or is in guest mode. Security-wise this means that if a vCPU is scheduled out, other vCPUs can still send doorbell messages to the last physical CPU where this vCPU last ran. A malicious guest that exploits this can slow down the victim CPU by about 40% in my testing, so this mode should only be enabled when physical CPUs are not shared among guests. The option is avic_doorbell_strict and is true by default; setting it to false selects the relaxed, non-strict mode.
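For example, assuming the usual module parameter handling of kvm-amd (the parameter is declared with mode 0444 and thus is read-only at runtime), the relaxed mode would be selected at boot with kvm-amd.avic_doorbell_strict=0 on the kernel command line, or with the equivalent modprobe option.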
Signed-off-by: Maxim Levitsky 
---
 arch/x86/kvm/svm/avic.c | 19 ++++++++++++-------
 arch/x86/kvm/svm/svm.c  | 19 ++++++++++++++-----
 arch/x86/kvm/svm/svm.h  |  1 +
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 9176c35662ada..1bfe58ee961b2 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1641,7 +1641,7 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 
 void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	u64 entry;
+	u64 old_entry, new_entry;
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -1660,14 +1660,16 @@ void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_vcpu_is_blocking(vcpu))
 		return;
 
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+	old_entry = READ_ONCE(*(svm->avic_physical_id_cache));
+	new_entry = old_entry;
 
-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
-	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-	entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+	new_entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+	new_entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+	new_entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+
+	if (old_entry != new_entry)
+		WRITE_ONCE(*(svm->avic_physical_id_cache), new_entry);
 
-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
 }
 
@@ -1777,6 +1779,9 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 
 void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
+	if (!avic_doorbell_strict)
+		__nested_avic_put(vcpu);
+
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3d9ab1e7b2b52..7e79fefc81650 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -190,6 +190,10 @@ module_param(avic, bool, 0444);
 static bool force_avic;
 module_param_unsafe(force_avic, bool, 0444);
 
+bool avic_doorbell_strict = true;
+module_param(avic_doorbell_strict, bool, 0444);
+
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -1395,16 +1399,21 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_load(vcpu, cpu);
-
 	__nested_avic_load(vcpu, cpu);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_apicv_active(vcpu))
-		__avic_vcpu_put(vcpu);
-
-	__nested_avic_put(vcpu);
+	/*
+	 * Forbid AVIC's peers from sending interrupts to this CPU
+	 * unless we are in non-strict mode, in which case this is
+	 * done only when this vCPU blocks.
+	 */
+	if (avic_doorbell_strict) {
+		if (kvm_vcpu_apicv_active(vcpu))
+			__avic_vcpu_put(vcpu);
+		__nested_avic_put(vcpu);
+	}
 
 	svm_prepare_host_switch(vcpu);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7d1a5028750e6..7139bbb534f9e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -36,6 +36,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool avic_doorbell_strict;
 
 /*
  * Clean bits in VMCB.