From patchwork Wed Apr 27 20:02:56 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829508
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 01/19] KVM: x86: document AVIC/APICv inhibit reasons
Date: Wed, 27 Apr 2022 23:02:56 +0300
Message-Id: <20220427200314.276673-2-mlevitsk@redhat.com>

There are quite a few AVIC/APICv inhibit reasons these days, and it
doesn't hurt to have some documentation for them.
Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f164c6c1514a4..63eae00625bda 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,14 +1046,29 @@ struct kvm_x86_msr_filter {
 };
 
 enum kvm_apicv_inhibit {
+        /* APICv/AVIC is disabled by module param and/or not supported in hardware */
         APICV_INHIBIT_REASON_DISABLE,
+        /* APICv/AVIC is inhibited because the AutoEOI feature is in use by a Hyper-V guest */
         APICV_INHIBIT_REASON_HYPERV,
+        /* AVIC is inhibited on a vCPU because it runs a nested guest */
         APICV_INHIBIT_REASON_NESTED,
+        /* AVIC is inhibited while waiting for an irq window (AVIC doesn't support this) */
         APICV_INHIBIT_REASON_IRQWIN,
+        /*
+         * AVIC is inhibited because i8254 're-inject' mode is used,
+         * which needs EOI intercept, which AVIC doesn't support
+         */
         APICV_INHIBIT_REASON_PIT_REINJ,
+        /* AVIC is inhibited because the guest has x2apic in its CPUID */
         APICV_INHIBIT_REASON_X2APIC,
+        /* AVIC/APICv is inhibited because KVM_GUESTDBG_BLOCKIRQ was enabled */
         APICV_INHIBIT_REASON_BLOCKIRQ,
+        /*
+         * AVIC/APICv is inhibited because the guest didn't yet
+         * enable a kernel/split irqchip
+         */
         APICV_INHIBIT_REASON_ABSENT,
+        /* AVIC is disabled because SEV doesn't support it */
         APICV_INHIBIT_REASON_SEV,
 };
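
[Editorial note: not part of the patch.] Each of these reasons is one bit
in a per-VM inhibit mask; APICv/AVIC stays deactivated while any bit is
set. A minimal sketch of how one reason is typically toggled follows.
kvm_set_apicv_inhibit() is used later in this series; kvm_clear_apicv_inhibit()
is assumed here to be its symmetric counterpart:

    /* Sketch: track the KVM_GUESTDBG_BLOCKIRQ inhibit as guest debugging
     * is turned on and off. Illustrative helper, not kernel code. */
    static void example_update_blockirq_inhibit(struct kvm *kvm, bool blockirq)
    {
            if (blockirq)
                    kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_BLOCKIRQ);
            else
                    kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_BLOCKIRQ);
    }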
From patchwork Wed Apr 27 20:02:57 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829509
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 02/19] KVM: x86: inhibit APICv/AVIC when the guest and/or host changes apic id/base from the defaults
Date: Wed, 27 Apr 2022 23:02:57 +0300
Message-Id: <20220427200314.276673-3-mlevitsk@redhat.com>

Neither of these settings should be changed by the guest, and supporting
such changes in the acceleration code is a burden, so just inhibit
APICv/AVIC instead.

Also add a boolean 'apic_id_changed' that indicates whether the APIC ID
was ever changed.

Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/lapic.c            | 25 ++++++++++++++++++++++---
 arch/x86/kvm/lapic.h            |  8 ++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63eae00625bda..636df87542555 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1070,6 +1070,8 @@ enum kvm_apicv_inhibit {
         APICV_INHIBIT_REASON_ABSENT,
         /* AVIC is disabled because SEV doesn't support it */
         APICV_INHIBIT_REASON_SEV,
+        /* APIC ID and/or APIC base was changed by the guest */
+        APICV_INHIBIT_REASON_RO_SETTINGS,
 };
 
 struct kvm_arch {
@@ -1258,6 +1260,7 @@ struct kvm_arch {
         hpa_t hv_root_tdp;
         spinlock_t hv_root_tdp_lock;
 #endif
+        bool apic_id_changed;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94e..8996675b3ef4c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2038,6 +2038,19 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
         }
 }
 
+static void kvm_lapic_check_initial_apic_id(struct kvm_lapic *apic)
+{
+        if (kvm_apic_has_initial_apic_id(apic))
+                return;
+
+        pr_warn_once("APIC ID change is unsupported by KVM");
+
+        kvm_set_apicv_inhibit(apic->vcpu->kvm,
+                              APICV_INHIBIT_REASON_RO_SETTINGS);
+
+        apic->vcpu->kvm->arch.apic_id_changed = true;
+}
+
 static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 {
         int ret = 0;
@@ -2046,9 +2059,11 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
         switch (reg) {
         case APIC_ID:           /* Local APIC ID */
-                if (!apic_x2apic_mode(apic))
+                if (!apic_x2apic_mode(apic)) {
+
                         kvm_apic_set_xapic_id(apic, val >> 24);
-                else
+                        kvm_lapic_check_initial_apic_id(apic);
+                } else
                         ret = 1;
                 break;
@@ -2335,8 +2350,11 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
                                 MSR_IA32_APICBASE_BASE;
 
         if ((value & MSR_IA32_APICBASE_ENABLE) &&
-             apic->base_address != APIC_DEFAULT_PHYS_BASE)
+             apic->base_address != APIC_DEFAULT_PHYS_BASE) {
+                kvm_set_apicv_inhibit(apic->vcpu->kvm,
+                                      APICV_INHIBIT_REASON_RO_SETTINGS);
                 pr_warn_once("APIC base relocation is unsupported by KVM");
+        }
 }
 
 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
@@ -2649,6 +2667,7 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
                 }
         }
 
+        kvm_lapic_check_initial_apic_id(vcpu->arch.apic);
         return 0;
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4e4f8a22754f9..b9c406d383080 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -252,4 +252,12 @@ static inline u8 kvm_xapic_id(struct kvm_lapic *apic)
         return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
 }
 
+static inline bool kvm_apic_has_initial_apic_id(struct kvm_lapic *apic)
+{
+        if (apic_x2apic_mode(apic))
+                return true;
+
+        return kvm_xapic_id(apic) == apic->vcpu->vcpu_id;
+}
+
 #endif
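
[Editorial note: not part of the patch.] To make the new check concrete,
here is a hedged walk-through of what happens when an xAPIC guest writes a
non-default APIC ID; the value 5 is purely illustrative:

    /* Guest writes 0x05000000 to the APIC_ID register (xAPIC mode):
     * the new APIC ID is the top byte, i.e. 5. */
    u32 val = 5u << 24;

    kvm_apic_set_xapic_id(apic, val >> 24);
    kvm_lapic_check_initial_apic_id(apic);
    /* If this vCPU's vcpu_id != 5, kvm_apic_has_initial_apic_id() is now
     * false, so the helper warns once, sets the
     * APICV_INHIBIT_REASON_RO_SETTINGS inhibit, and records
     * arch.apic_id_changed = true. */

In x2APIC mode the APIC ID is read-only, which is why
kvm_apic_has_initial_apic_id() unconditionally returns true there.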
From patchwork Wed Apr 27 20:02:58 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829510
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 03/19] KVM: x86: SVM: remove avic's broken code that updated APIC ID
Date: Wed, 27 Apr 2022 23:02:58 +0300
Message-Id: <20220427200314.276673-4-mlevitsk@redhat.com>

AVIC is now inhibited if the guest changes the APIC ID, so remove the
broken code that tried to update the AVIC physical ID table on an APIC ID
change.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 35 -----------------------------------
 1 file changed, 35 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 54fe03714f8a6..1102421668a11 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -508,35 +508,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
         return ret;
 }
 
-static int avic_handle_apic_id_update(struct kvm_vcpu *vcpu)
-{
-        u64 *old, *new;
-        struct vcpu_svm *svm = to_svm(vcpu);
-        u32 id = kvm_xapic_id(vcpu->arch.apic);
-
-        if (vcpu->vcpu_id == id)
-                return 0;
-
-        old = avic_get_physical_id_entry(vcpu, vcpu->vcpu_id);
-        new = avic_get_physical_id_entry(vcpu, id);
-        if (!new || !old)
-                return 1;
-
-        /* We need to move physical_id_entry to new offset */
-        *new = *old;
-        *old = 0ULL;
-        to_svm(vcpu)->avic_physical_id_cache = new;
-
-        /*
-         * Also update the guest physical APIC ID in the logical
-         * APIC ID table entry if already setup the LDR.
-         */
-        if (svm->ldr_reg)
-                avic_handle_ldr_update(vcpu);
-
-        return 0;
-}
-
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
 {
         struct vcpu_svm *svm = to_svm(vcpu);
@@ -555,10 +526,6 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
                 AVIC_UNACCEL_ACCESS_OFFSET_MASK;
 
         switch (offset) {
-        case APIC_ID:
-                if (avic_handle_apic_id_update(vcpu))
-                        return 0;
-                break;
         case APIC_LDR:
                 if (avic_handle_ldr_update(vcpu))
                         return 0;
@@ -650,8 +617,6 @@ int avic_init_vcpu(struct vcpu_svm *svm)
 
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
-        if (avic_handle_apic_id_update(vcpu) != 0)
-                return;
         avic_handle_dfr_update(vcpu);
         avic_handle_ldr_update(vcpu);
 }
From patchwork Wed Apr 27 20:02:59 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829512
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 04/19] KVM: x86: mmu: allow to enable write tracking externally
Date: Wed, 27 Apr 2022 23:02:59 +0300
Message-Id: <20220427200314.276673-5-mlevitsk@redhat.com>

This will be used to enable write tracking from the nested AVIC code, and
it also lets the GVT-g module enable write tracking only when it actually
needs it, as opposed to always enabling it whenever the module is compiled
into the kernel.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h       |  2 +-
 arch/x86/include/asm/kvm_page_track.h |  1 +
 arch/x86/kvm/mmu.h                    |  8 +++++---
 arch/x86/kvm/mmu/mmu.c                | 17 ++++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++++++--
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 636df87542555..fc7df778a3d71 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1254,7 +1254,7 @@ struct kvm_arch {
          * is used as one input when determining whether certain memslot
          * related allocations are necessary.
          */
-        bool shadow_root_allocated;
+        bool mmu_page_tracking_enabled;
 
 #if IS_ENABLED(CONFIG_HYPERV)
         hpa_t hv_root_tdp;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a9..955a5ae07b10e 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -50,6 +50,7 @@ int kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_enable(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 671cfeccf04e9..44d15551f7156 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -269,7 +269,7 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
 
-static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
+static inline bool mmu_page_tracking_enabled(struct kvm *kvm)
 {
         /*
          * Read shadow_root_allocated before related pointers. Hence, threads
@@ -277,9 +277,11 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
          * see the pointers. Pairs with smp_store_release in
          * mmu_first_shadow_root_alloc.
          */
-        return smp_load_acquire(&kvm->arch.shadow_root_allocated);
+        return smp_load_acquire(&kvm->arch.mmu_page_tracking_enabled);
 }
 
+int mmu_enable_write_tracking(struct kvm *kvm);
+
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 #else
@@ -288,7 +290,7 @@ static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-        return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+        return !is_tdp_mmu_enabled(kvm) || mmu_page_tracking_enabled(kvm);
 }
 
 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 904f0faff2186..fb744616bf7df 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3389,7 +3389,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
         return r;
 }
 
-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+int mmu_enable_write_tracking(struct kvm *kvm)
 {
         struct kvm_memslots *slots;
         struct kvm_memory_slot *slot;
@@ -3399,21 +3399,20 @@
          * Check if this is the first shadow root being allocated before
          * taking the lock.
          */
-        if (kvm_shadow_root_allocated(kvm))
+        if (mmu_page_tracking_enabled(kvm))
                 return 0;
 
         mutex_lock(&kvm->slots_arch_lock);
 
         /* Recheck, under the lock, whether this is the first shadow root. */
-        if (kvm_shadow_root_allocated(kvm))
+        if (mmu_page_tracking_enabled(kvm))
                 goto out_unlock;
 
         /*
          * Check if anything actually needs to be allocated, e.g. all metadata
          * will be allocated upfront if TDP is disabled.
          */
-        if (kvm_memslots_have_rmaps(kvm) &&
-            kvm_page_track_write_tracking_enabled(kvm))
+        if (kvm_memslots_have_rmaps(kvm) && mmu_page_tracking_enabled(kvm))
                 goto out_success;
 
         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
@@ -3443,7 +3442,7 @@
          * all the related pointers are set.
          */
 out_success:
-        smp_store_release(&kvm->arch.shadow_root_allocated, true);
+        smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);
 
 out_unlock:
         mutex_unlock(&kvm->slots_arch_lock);
@@ -3480,7 +3479,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                 }
         }
 
-        r = mmu_first_shadow_root_alloc(vcpu->kvm);
+        r = mmu_enable_write_tracking(vcpu->kvm);
         if (r)
                 return r;
 
@@ -5753,6 +5752,10 @@ int kvm_mmu_init_vm(struct kvm *kvm)
         node->track_write = kvm_mmu_pte_write;
         node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
         kvm_page_track_register_notifier(kvm, node);
+
+        if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+                mmu_enable_write_tracking(kvm);
+
         return 0;
 }
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f3..8857d629036d7 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -21,10 +21,16 @@
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
-        return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
-               !tdp_enabled || kvm_shadow_root_allocated(kvm);
+        return mmu_page_tracking_enabled(kvm);
 }
 
+int kvm_page_track_write_tracking_enable(struct kvm *kvm)
+{
+        return mmu_enable_write_tracking(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_write_tracking_enable);
+
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
         int i;
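
[Editorial note: not part of the patch.] The renamed
mmu_enable_write_tracking() keeps the double-checked locking shape of
mmu_first_shadow_root_alloc(): an smp_load_acquire() fast path, a recheck
under slots_arch_lock, and an smp_store_release() that publishes the flag
only after all per-memslot metadata has been allocated. A condensed sketch
of that pattern, with the allocations elided:

    int enable_tracking_once(struct kvm *kvm)
    {
            if (mmu_page_tracking_enabled(kvm))     /* lock-free fast path */
                    return 0;

            mutex_lock(&kvm->slots_arch_lock);
            if (mmu_page_tracking_enabled(kvm)) {   /* raced with another enabler */
                    mutex_unlock(&kvm->slots_arch_lock);
                    return 0;
            }

            /* ... allocate rmaps and gfn-track arrays for every memslot ... */

            /* Publish the flag only after the allocations are visible. */
            smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);
            mutex_unlock(&kvm->slots_arch_lock);
            return 0;
    }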
From patchwork Wed Apr 27 20:03:00 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829511
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 05/19] x86: KVMGT: use kvm_page_track_write_tracking_enable
Date: Wed, 27 Apr 2022 23:03:00 +0300
Message-Id: <20220427200314.276673-6-mlevitsk@redhat.com>

This enables write tracking only when KVMGT is actually used, so it
carries no penalty otherwise.

Tested by booting a VM with a kvmgt mdev device.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/Kconfig             | 3 ---
 arch/x86/kvm/mmu/mmu.c           | 2 +-
 drivers/gpu/drm/i915/Kconfig     | 1 -
 drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +++++
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd77061364..41341905d3734 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,7 +126,4 @@ config KVM_XEN
 
           If in doubt, say "N".
 
-config KVM_EXTERNAL_WRITE_TRACKING
-        bool
-
 endif # VIRTUALIZATION
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fb744616bf7df..633a3138d68e1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5753,7 +5753,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
         node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
         kvm_page_track_register_notifier(kvm, node);
 
-        if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+        if (!tdp_enabled)
                 mmu_enable_write_tracking(kvm);
 
         return 0;
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 98c5450b8eacc..7d8346f4bae11 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -130,7 +130,6 @@ config DRM_I915_GVT_KVMGT
         depends on DRM_I915_GVT
         depends on KVM
         depends on VFIO_MDEV
-        select KVM_EXTERNAL_WRITE_TRACKING
         default n
         help
           Choose this option if you want to enable KVMGT support for
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 057ec44901045..4c62ab3ef245d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1933,6 +1933,7 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
         struct intel_vgpu *vgpu;
         struct kvmgt_vdev *vdev;
         struct kvm *kvm;
+        int ret;
 
         vgpu = mdev_get_drvdata(mdev);
         if (handle_valid(vgpu->handle))
@@ -1948,6 +1949,10 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
         if (__kvmgt_vgpu_exist(vgpu, kvm))
                 return -EEXIST;
 
+        ret = kvm_page_track_write_tracking_enable(kvm);
+        if (ret)
+                return ret;
+
         info = vzalloc(sizeof(struct kvmgt_guest_info));
         if (!info)
                 return -ENOMEM;
From patchwork Wed Apr 27 20:03:01 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829513
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 06/19] KVM: x86: mmu: add gfn_in_memslot helper
Date: Wed, 27 Apr 2022 23:03:01 +0300
Message-Id: <20220427200314.276673-7-mlevitsk@redhat.com>

This is a tiny refactoring that makes it possible to check whether a
GPA/GFN is within a memslot a bit more cleanly.

Signed-off-by: Maxim Levitsky
---
 include/linux/kvm_host.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58b..12e261559070b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1580,6 +1580,13 @@ int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
 bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
 
+
+static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+        return (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
+}
+
+
 /*
  * Returns a pointer to the memslot if it contains gfn.
  * Otherwise returns NULL.
@@ -1590,12 +1597,13 @@ try_get_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
         if (!slot)
                 return NULL;
 
-        if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
+        if (gfn_in_memslot(slot, gfn))
                 return slot;
         else
                 return NULL;
 }
 
+
 /*
  * Returns a pointer to the memslot that contains gfn. Otherwise returns NULL.
  *
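
[Editorial note: not part of the patch.] A short usage sketch of the new
helper; the wrapper function below is hypothetical and only illustrates
the intended calling pattern:

    /* Sketch: page offset of gfn inside slot, or -1 if it lies outside. */
    static inline s64 gfn_offset_in_memslot(struct kvm_memory_slot *slot,
                                            gfn_t gfn)
    {
            if (!slot || !gfn_in_memslot(slot, gfn))
                    return -1;
            return gfn - slot->base_gfn;
    }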
From patchwork Wed Apr 27 20:03:02 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829514
Peter Anvin" , Brijesh Singh , Joerg Roedel , x86@kernel.org, Maxim Levitsky , Ingo Molnar , Zhi Wang , Tom Lendacky , intel-gfx@lists.freedesktop.org, Borislav Petkov , Rodrigo Vivi , Thomas Gleixner , intel-gvt-dev@lists.freedesktop.org, Jim Mattson , Tvrtko Ursulin , Sean Christopherson , linux-kernel@vger.kernel.org, Paolo Bonzini , Vitaly Kuznetsov Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" If a non leaf mmu page is write tracked externally for some reason, which can in theory happen if it was used for nested avic physid page before, then this code will enter an endless loop of page faults because unprotecting the mmu page will not remove write tracking, nor will the write tracker callback be called, because there is no mmu page at this address. Fix this by only invoking the fast path if we succeeded in zapping the mmu page. Fixes: 147277540bbc5 ("kvm: svm: Add support for additional SVM NPF error codes") Signed-off-by: Maxim Levitsky --- arch/x86/kvm/mmu/mmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 633a3138d68e1..8f77d41e7fd80 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5341,8 +5341,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, */ if (vcpu->arch.mmu->root_role.direct && (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) { - kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); - return 1; + if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa))) + return 1; } /* From patchwork Wed Apr 27 20:03:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829515 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0EA16C433EF for ; Wed, 27 Apr 2022 20:04:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4064410E3F3; Wed, 27 Apr 2022 20:04:22 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 94CB210E40A for ; Wed, 27 Apr 2022 20:04:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1651089854; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JTNl1nz6wNzLfnJ32LPzyYG5FM9y1gbwoESEN5pVrpg=; b=F2gew+mEsPX7YKQd/NIrWlZXm3TkH4rgbfTdFZdYks7lzFhteIn5m/f5F4EGFQDIwdX43s cGGv8pmAbYU1DHiHsunAGiciTSVZEHSHOlH4AwwGbuBLDw5xYSTfC/DEJ4v/HggXGoRdHg S4JaWiULf7iNN/2EJHNfP2zzgX3lgh0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-259-YXJwPXWiMoq_b70twTFq2w-1; Wed, 27 Apr 2022 16:04:11 -0400 X-MC-Unique: YXJwPXWiMoq_b70twTFq2w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) 
From patchwork Wed Apr 27 20:03:03 2022
X-Patchwork-Submitter: Maxim Levitsky
X-Patchwork-Id: 12829515
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 08/19] KVM: x86: SVM: move avic state to separate struct
Date: Wed, 27 Apr 2022 23:03:03 +0300
Message-Id: <20220427200314.276673-9-mlevitsk@redhat.com>

This will make the code a bit easier to read when nested AVIC support is
added.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 51 +++++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h  | 14 ++++++-----
 2 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 1102421668a11..e5cbbb97fbab6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -69,6 +69,8 @@ int avic_ga_log_notifier(u32 ga_tag)
         unsigned long flags;
         struct kvm_svm *kvm_svm;
         struct kvm_vcpu *vcpu = NULL;
+        struct kvm_svm_avic *avic;
+
         u32 vm_id = AVIC_GATAG_TO_VMID(ga_tag);
         u32 vcpu_id = AVIC_GATAG_TO_VCPUID(ga_tag);
 
@@ -76,9 +78,13 @@ int avic_ga_log_notifier(u32 ga_tag)
         trace_kvm_avic_ga_log(vm_id, vcpu_id);
 
         spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-        hash_for_each_possible(svm_vm_data_hash, kvm_svm, hnode, vm_id) {
-                if (kvm_svm->avic_vm_id != vm_id)
+        hash_for_each_possible(svm_vm_data_hash, avic, hnode, vm_id) {
+
+
+                if (avic->vm_id != vm_id)
                         continue;
+
+                kvm_svm = container_of(avic, struct kvm_svm, avic);
                 vcpu = kvm_get_vcpu_by_id(&kvm_svm->kvm, vcpu_id);
                 break;
         }
@@ -98,18 +104,18 @@ int avic_ga_log_notifier(u32 ga_tag)
 void avic_vm_destroy(struct kvm *kvm)
 {
         unsigned long flags;
-        struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+        struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;
 
         if (!enable_apicv)
                 return;
 
-        if (kvm_svm->avic_logical_id_table_page)
-                __free_page(kvm_svm->avic_logical_id_table_page);
-        if (kvm_svm->avic_physical_id_table_page)
-                __free_page(kvm_svm->avic_physical_id_table_page);
+        if (avic->logical_id_table_page)
+                __free_page(avic->logical_id_table_page);
+        if (avic->physical_id_table_page)
+                __free_page(avic->physical_id_table_page);
 
         spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-        hash_del(&kvm_svm->hnode);
+        hash_del(&avic->hnode);
         spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
 }
@@ -117,10 +123,9 @@ int avic_vm_init(struct kvm *kvm)
 {
         unsigned long flags;
         int err = -ENOMEM;
-        struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
-        struct kvm_svm *k2;
         struct page *p_page;
         struct page *l_page;
+        struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;
         u32 vm_id;
 
         if (!enable_apicv)
@@ -131,14 +136,14 @@ int avic_vm_init(struct kvm *kvm)
         if (!p_page)
                 goto free_avic;
 
-        kvm_svm->avic_physical_id_table_page = p_page;
+        avic->physical_id_table_page = p_page;
 
         /* Allocating logical APIC ID table (4KB) */
         l_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
         if (!l_page)
                 goto free_avic;
 
-        kvm_svm->avic_logical_id_table_page = l_page;
+        avic->logical_id_table_page = l_page;
 
         spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
 again:
@@ -149,13 +154,15 @@ int avic_vm_init(struct kvm *kvm)
         }
         /* Is it still in use? Only possible if wrapped at least once */
         if (next_vm_id_wrapped) {
-                hash_for_each_possible(svm_vm_data_hash, k2, hnode, vm_id) {
-                        if (k2->avic_vm_id == vm_id)
+                struct kvm_svm_avic *avic2;
+
+                hash_for_each_possible(svm_vm_data_hash, avic2, hnode, vm_id) {
+                        if (avic2->vm_id == vm_id)
                                 goto again;
                 }
         }
-        kvm_svm->avic_vm_id = vm_id;
-        hash_add(svm_vm_data_hash, &kvm_svm->hnode, kvm_svm->avic_vm_id);
+        avic->vm_id = vm_id;
+        hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id);
         spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
 
         return 0;
@@ -169,8 +176,8 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
 {
         struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
         phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
-        phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
-        phys_addr_t ppa = __sme_set(page_to_phys(kvm_svm->avic_physical_id_table_page));
+        phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic.logical_id_table_page));
+        phys_addr_t ppa = __sme_set(page_to_phys(kvm_svm->avic.physical_id_table_page));
 
         vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
         vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
@@ -193,7 +200,7 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
         if (index >= AVIC_MAX_PHYSICAL_ID_COUNT)
                 return NULL;
 
-        avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
+        avic_physical_id_table = page_address(kvm_svm->avic.physical_id_table_page);
 
         return &avic_physical_id_table[index];
 }
@@ -296,7 +303,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
         int dest_mode = icrl & APIC_DEST_MASK;
         int shorthand = icrl & APIC_SHORT_MASK;
         struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
-        u32 *avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
+        u32 *avic_logical_id_table = page_address(kvm_svm->avic.logical_id_table_page);
 
         if (shorthand != APIC_DEST_NOSHORT)
                 return -EINVAL;
@@ -453,7 +460,7 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
                 index = (cluster << 2) + apic;
         }
 
-        logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
+        logical_apic_id_table = (u32 *) page_address(kvm_svm->avic.logical_id_table_page);
 
         return &logical_apic_id_table[index];
 }
@@ -803,7 +810,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
                         /* Try to enable guest_mode in IRTE */
                         pi.base = __sme_set(page_to_phys(svm->avic_backing_page) &
                                             AVIC_HPA_MASK);
-                        pi.ga_tag = AVIC_GATAG(to_kvm_svm(kvm)->avic_vm_id,
+                        pi.ga_tag = AVIC_GATAG(to_kvm_svm(kvm)->avic.vm_id,
                                                svm->vcpu.vcpu_id);
                         pi.is_guest_mode = true;
                         pi.vcpu_data = &vcpu_info;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 32220a1b0ea20..6fcb164a6ee4a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -88,15 +88,17 @@ struct kvm_sev_info {
         atomic_t migration_in_progress;
 };
 
-struct kvm_svm {
-        struct kvm kvm;
-        /* Struct members for AVIC */
-        u32 avic_vm_id;
-        struct page *avic_logical_id_table_page;
-        struct page *avic_physical_id_table_page;
+struct kvm_svm_avic {
+        u32 vm_id;
+        struct page *logical_id_table_page;
+        struct page *physical_id_table_page;
         struct hlist_node hnode;
+};
 
+struct kvm_svm {
+        struct kvm kvm;
+        struct kvm_svm_avic avic;
         struct kvm_sev_info sev_info;
 };
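
[Editorial note: not part of the patch.] The design choice worth noting
in avic_ga_log_notifier() above: the VM hash table now links struct
kvm_svm_avic nodes rather than struct kvm_svm, so the enclosing VM is
recovered with container_of(). A minimal restatement of that idiom:

    struct kvm_svm_avic *avic;
    struct kvm_svm *kvm_svm;

    hash_for_each_possible(svm_vm_data_hash, avic, hnode, vm_id) {
            if (avic->vm_id != vm_id)
                    continue;
            /* hnode is embedded in kvm_svm.avic, so step back to the VM: */
            kvm_svm = container_of(avic, struct kvm_svm, avic);
            break;
    }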
Peter Anvin" , Brijesh Singh , Joerg Roedel , x86@kernel.org, Maxim Levitsky , Ingo Molnar , Zhi Wang , Tom Lendacky , intel-gfx@lists.freedesktop.org, Borislav Petkov , Rodrigo Vivi , Thomas Gleixner , intel-gvt-dev@lists.freedesktop.org, Jim Mattson , Tvrtko Ursulin , Sean Christopherson , linux-kernel@vger.kernel.org, Paolo Bonzini , Vitaly Kuznetsov Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This patch adds few tracepoints that will be used to debug/profile the nested AVIC. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/trace.h | 157 ++++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 13 ++++ 2 files changed, 169 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index de47625175692..f7ddba5ae06a5 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -1385,7 +1385,7 @@ TRACE_EVENT(kvm_apicv_accept_irq, ); /* - * Tracepoint for AMD AVIC + * Tracepoints for AMD AVIC */ TRACE_EVENT(kvm_avic_incomplete_ipi, TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index), @@ -1479,6 +1479,161 @@ TRACE_EVENT(kvm_avic_kick_vcpu_slowpath, __entry->icrh, __entry->icrl, __entry->index) ); +TRACE_EVENT(kvm_avic_physid_table_alloc, + TP_PROTO(u64 gpa), + TP_ARGS(gpa), + + TP_STRUCT__entry( + __field(u64, gpa) + ), + + TP_fast_assign( + __entry->gpa = gpa; + ), + + TP_printk("table at gpa 0x%llx", + __entry->gpa) +); + + +TRACE_EVENT(kvm_avic_physid_table_free, + TP_PROTO(u64 gpa), + TP_ARGS(gpa), + + TP_STRUCT__entry( + __field(u64, gpa) + ), + + TP_fast_assign( + __entry->gpa = gpa; + ), + + TP_printk("table at gpa 0x%llx", + __entry->gpa) +); + +TRACE_EVENT(kvm_avic_physid_table_reload, + TP_PROTO(u64 gpa, int nentries, int new_nentires), + TP_ARGS(gpa, nentries, new_nentires), + + TP_STRUCT__entry( + __field(u64, gpa) + __field(int, nentries) + __field(int, new_nentires) + ), + + TP_fast_assign( + __entry->gpa = gpa; + __entry->nentries = nentries; + __entry->new_nentires = new_nentires; + ), + + TP_printk("table at gpa 0x%llx, nentires %d -> %d", + __entry->gpa, __entry->nentries, __entry->new_nentires) +); + +TRACE_EVENT(kvm_avic_physid_table_write, + TP_PROTO(u64 gpa, int bytes), + TP_ARGS(gpa, bytes), + + TP_STRUCT__entry( + __field(u64, gpa) + __field(int, bytes) + ), + + TP_fast_assign( + __entry->gpa = gpa; + __entry->bytes = bytes; + ), + + TP_printk("gpa 0x%llx, write of %d bytes", + __entry->gpa, __entry->bytes) +); + +TRACE_EVENT(kvm_avic_physid_update_vcpu_host, + TP_PROTO(int vcpu_id, int cpu_id, int n), + TP_ARGS(vcpu_id, cpu_id, n), + + TP_STRUCT__entry( + __field(int, vcpu_id) + __field(int, cpu_id) + __field(int, n) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + __entry->cpu_id = cpu_id; + __entry->n = n; + ), + + TP_printk("l1 vcpu %d -> l0 cpu %d (%d entries)", + __entry->vcpu_id, __entry->cpu_id, __entry->n) +); + +TRACE_EVENT(kvm_avic_physid_update_vcpu_guest, + TP_PROTO(int vcpu_id, int cpu_id), + TP_ARGS(vcpu_id, cpu_id), + + TP_STRUCT__entry( + __field(int, vcpu_id) + __field(int, cpu_id) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + __entry->cpu_id = cpu_id; + ), + + TP_printk("l1 vcpu %d -> l0 cpu %d", + __entry->vcpu_id, __entry->cpu_id) +); + +TRACE_EVENT(kvm_avic_nested_doorbell, + TP_PROTO(int source_l1_apicid, int target_l1_apicid, bool target_nested, + bool target_running), + TP_ARGS(source_l1_apicid, target_l1_apicid, target_nested, + target_running), + + TP_STRUCT__entry( + __field(int, source_l1_apicid) + __field(int, target_l1_apicid) + __field(bool, 
+TRACE_EVENT(kvm_avic_nested_doorbell,
+        TP_PROTO(int source_l1_apicid, int target_l1_apicid, bool target_nested,
+                 bool target_running),
+        TP_ARGS(source_l1_apicid, target_l1_apicid, target_nested,
+                target_running),
+
+        TP_STRUCT__entry(
+                __field(int, source_l1_apicid)
+                __field(int, target_l1_apicid)
+                __field(bool, target_nested)
+                __field(bool, target_running)
+        ),
+
+        TP_fast_assign(
+                __entry->source_l1_apicid = source_l1_apicid;
+                __entry->target_l1_apicid = target_l1_apicid;
+                __entry->target_nested = target_nested;
+                __entry->target_running = target_running;
+        ),
+
+        TP_printk("source %d target %d (nested: %d, running %d)",
+                  __entry->source_l1_apicid, __entry->target_l1_apicid,
+                  __entry->target_nested, __entry->target_running)
+);
+
+TRACE_EVENT(kvm_avic_nested_kick_vcpu,
+        TP_PROTO(int source_l1_apic_id, int target_l2_apic_id, int target_l1_apic_id),
+        TP_ARGS(source_l1_apic_id, target_l2_apic_id, target_l1_apic_id),
+
+        TP_STRUCT__entry(
+                __field(int, source_l1_apic_id)
+                __field(int, target_l2_apic_id)
+                __field(int, target_l1_apic_id)
+        ),
+
+        TP_fast_assign(
+                __entry->source_l1_apic_id = source_l1_apic_id;
+                __entry->target_l2_apic_id = target_l2_apic_id;
+                __entry->target_l1_apic_id = target_l1_apic_id;
+        ),
+
+        TP_printk("source l1 apic id: %d target l2 apic id: %d target l1 apic_id: %d",
+                  __entry->source_l1_apic_id, __entry->target_l2_apic_id,
+                  __entry->target_l1_apic_id)
+);
+
 TRACE_EVENT(kvm_hv_timer_state,
         TP_PROTO(unsigned int vcpu_id, unsigned int hv_timer_in_use),
         TP_ARGS(vcpu_id, hv_timer_in_use),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 951d0a78ccdae..d2f73ce87a1e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13063,10 +13063,23 @@
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window_update);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_kick_vcpu_slowpath);
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_alloc);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_free);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_reload);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_write);
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_update_vcpu_host);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_update_vcpu_guest);
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_doorbell);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_kick_vcpu);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
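
[Editorial note: not part of the patch.] TRACE_EVENT() generates a
trace_<name>() function for each of these, which the later nested-AVIC
patches would call at the corresponding events. The call sites and values
below are illustrative, not taken from this series:

    /* A shadow physical-ID table was allocated for this gpa: */
    trace_kvm_avic_physid_table_alloc(gpa);

    /* L1's vCPU 3 moved to host CPU 7, touching 2 table entries: */
    trace_kvm_avic_physid_update_vcpu_host(3, 7, 2);

Once built in, the events should be visible under tracefs, e.g.:

    echo 1 > /sys/kernel/tracing/events/kvm/kvm_avic_physid_table_alloc/enable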
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 10/19] KVM: x86: nSVM: implement AVIC's physid/logid table access helpers
Date: Wed, 27 Apr 2022 23:03:05 +0300
Message-Id: <20220427200314.276673-11-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

This implements a few helpers for manipulating the AVIC's physical and logical id table entries.
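To illustrate how these accessors compose, here is a minimal usage sketch (the wrapper function itself is hypothetical; only the physid_entry_* helpers in the diff below are part of the patch):

	/*
	 * Hypothetical caller: mark the vCPU owning a valid shadow physid
	 * entry as running on the given host (L0) APIC id.
	 */
	static void example_entry_set_running(u64 *entry, int l0_apicid)
	{
		/* Returns -1 when the entry is invalid or not running */
		if (physid_entry_get_apicid(*entry) == l0_apicid)
			return;

		/*
		 * Sets the host physical APIC id and the is_running bit;
		 * passing -1 instead clears is_running. The entry must
		 * already be valid (see the WARN_ON in the helper).
		 */
		physid_entry_set_apicid(entry, l0_apicid);
	}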
Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/svm.h | 45 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 6fcb164a6ee4a..dfca4c06e2071 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -628,6 +628,51 @@ void avic_vcpu_unblocking(struct kvm_vcpu *vcpu); void avic_ring_doorbell(struct kvm_vcpu *vcpu); unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu); +#define INVALID_BACKING_PAGE (~(u64)0) + +static inline u64 physid_entry_get_backing_table(u64 entry) +{ + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)) + return INVALID_BACKING_PAGE; + return entry & AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK; +} + +static inline int physid_entry_get_apicid(u64 entry) +{ + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)) + return -1; + if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)) + return -1; + + return entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; +} + +static inline int logid_get_physid(u64 entry) +{ + if (!(entry & AVIC_LOGICAL_ID_ENTRY_VALID_BIT)) + return -1; + return entry & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK; +} + +static inline void physid_entry_set_backing_table(u64 *entry, u64 value) +{ + *entry &= ~AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK; + *entry |= (AVIC_PHYSICAL_ID_ENTRY_VALID_MASK | value); +} + +static inline void physid_entry_set_apicid(u64 *entry, int value) +{ + WARN_ON(!(*entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK)); + + *entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; + + if (value == -1) + *entry &= ~(AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK); + else + *entry |= (AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK | value); +} + + /* sev.c */ #define GHCB_VERSION_MAX 1ULL From patchwork Wed Apr 27 20:03:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829518 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D1C9EC433F5 for ; Wed, 27 Apr 2022 20:04:47 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3E04610E516; Wed, 27 Apr 2022 20:04:47 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id B73E110E4AD for ; Wed, 27 Apr 2022 20:04:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1651089880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5rfBwYJ26N8Es/7hSENLfHEbnCP+ZI4tpFxE+x8/wzk=; b=PfaZMbqUVjPHFLr4iJfm4z+0mk2ngFkaLXPGrkUFWqxQWlcFUMg4vDJDyWe723XWQ7GkKf pQnzEz5xs6Exe2Kcx0YXP1SYXdTR8R1DfnpyBW+bp6l/22tCnFLrJuL3wm8KVZSj7OkNGe GzMkEEXpTX1QrfHQ5w9OLPpLprYLa20= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-615-eEGmp6jkPTC81hMEBShfiA-1; Wed, 27 Apr 2022 16:04:37 -0400 X-MC-Unique: 
eEGmp6jkPTC81hMEBShfiA-1
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 11/19] KVM: x86: nSVM: implement shadowing of AVIC's physical id table
Date: Wed, 27 Apr 2022 23:03:06 +0300
Message-Id: <20220427200314.276673-12-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

Implement the shadow physical id table and its write tracking code, which will soon be used by the nested AVIC.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 461 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h | 71 +++++++
 2 files changed, 524 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e5cbbb97fbab6..f462b7e48e3ca 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -51,6 +51,433 @@ static u32 next_vm_id = 0;
 static bool next_vm_id_wrapped = 0;
 static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
+
+static inline struct kvm_vcpu *avic_vcpu_by_l1_apicid(struct kvm *kvm,
+						      int l1_apicid)
+{
+	WARN_ON(l1_apicid == -1);
+	return kvm_get_vcpu_by_id(kvm, l1_apicid);
+}
+
+static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm,
+					      struct avic_physid_table *t,
+					      int n,
+					      int new_l1_apicid)
+{
+	struct avic_physid_entry_descr *e = &t->entries[n];
+	u64 sentry = READ_ONCE(*e->sentry);
+	u64 old_sentry = sentry;
+	struct kvm_vcpu *new_vcpu = NULL;
+	int l0_apicid = -1;
+
+	WARN_ON(!test_bit(n, t->valid_entires));
+
+	if (!list_empty(&e->link))
+		list_del_init(&e->link);
+
+	if (new_l1_apicid != -1)
+		new_vcpu = avic_vcpu_by_l1_apicid(kvm, new_l1_apicid);
+
+	if (new_vcpu)
+		l0_apicid = kvm_cpu_get_apicid(new_vcpu->cpu);
+
+	physid_entry_set_apicid(&sentry, l0_apicid);
+
+	trace_kvm_avic_physid_update_vcpu_guest(new_l1_apicid, l0_apicid);
+
+	if (sentry != old_sentry)
+		WRITE_ONCE(*e->sentry, sentry);
+}
+
+static void avic_physid_shadow_entry_create(struct kvm *kvm,
+					    struct avic_physid_table *t,
+					    int n,
+					    u64 gentry)
+{
+	struct avic_physid_entry_descr *e = &t->entries[n];
+	struct page *backing_page;
+	u64 backing_page_gpa = physid_entry_get_backing_table(gentry);
+	int l1_apic_id = physid_entry_get_apicid(gentry);
+	hpa_t backing_page_hpa;
+	u64 sentry = 0;
+
+
+	if (backing_page_gpa == INVALID_BACKING_PAGE)
+
return; + + /* Pin the APIC backing page */ + backing_page = gfn_to_page(kvm, gpa_to_gfn(backing_page_gpa)); + + if (is_error_page(backing_page)) + /* Invalid GPA in the guest entry - point to a dummy entry */ + backing_page_hpa = t->dummy_page_hpa; + else + backing_page_hpa = page_to_phys(backing_page); + + physid_entry_set_backing_table(&sentry, backing_page_hpa); + + e->gentry = gentry; + *e->sentry = sentry; + + if (test_and_set_bit(n, t->valid_entires)) + WARN_ON(1); + + if (backing_page_hpa != t->dummy_page_hpa) + avic_physid_shadow_entry_set_vcpu(kvm, t, n, l1_apic_id); +} + +static void avic_physid_shadow_entry_remove(struct kvm *kvm, + struct avic_physid_table *t, + int n) +{ + struct avic_physid_entry_descr *e = &t->entries[n]; + hpa_t backing_page_hpa; + + if (!test_and_clear_bit(n, t->valid_entires)) + WARN_ON(1); + + /* Release the APIC backing page */ + backing_page_hpa = physid_entry_get_backing_table(*e->sentry); + + if (backing_page_hpa != t->dummy_page_hpa) + kvm_release_pfn_dirty(backing_page_hpa >> PAGE_SHIFT); + + if (!list_empty(&e->link)) + list_del_init(&e->link); + + e->gentry = 0; + *e->sentry = 0; +} + + +static bool +avic_physid_shadow_table_setup_write_tracking(struct kvm *kvm, + struct avic_physid_table *t, + bool enable) +{ + struct kvm_memory_slot *slot; + + write_lock(&kvm->mmu_lock); + slot = gfn_to_memslot(kvm, t->gfn); + if (!slot) { + write_unlock(&kvm->mmu_lock); + return false; + } + + if (enable) + kvm_slot_page_track_add_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE); + else + kvm_slot_page_track_remove_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE); + write_unlock(&kvm->mmu_lock); + return true; +} + +static void +avic_physid_shadow_table_erase(struct kvm *kvm, struct avic_physid_table *t) +{ + int i; + + if (!t->nentries) + return; + + avic_physid_shadow_table_setup_write_tracking(kvm, t, false); + + for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) + avic_physid_shadow_entry_remove(kvm, t, i); + + t->nentries = 0; + t->flood_count = 0; +} + +static struct avic_physid_table * +avic_physid_shadow_table_alloc(struct kvm *kvm, gfn_t gfn) +{ + struct avic_physid_entry_descr *e; + struct avic_physid_table *t; + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + u64 *shadow_table_address; + int i; + + if (kvm_page_track_write_tracking_enable(kvm)) + return NULL; + + lockdep_assert_held(&kvm_svm->avic.tables_lock); + + t = kzalloc(sizeof(*t), GFP_KERNEL_ACCOUNT); + if (!t) + return NULL; + + t->shadow_table = alloc_page(GFP_KERNEL_ACCOUNT|__GFP_ZERO); + if (!t->shadow_table) + goto err_free_table; + + shadow_table_address = page_address(t->shadow_table); + t->shadow_table_hpa = __sme_set(page_to_phys(t->shadow_table)); + + for (i = 0; i < ARRAY_SIZE(t->entries); i++) { + e = &t->entries[i]; + e->sentry = &shadow_table_address[i]; + e->gentry = 0; + INIT_LIST_HEAD(&e->link); + } + + t->gfn = gfn; + t->refcount = 1; + + list_add_tail(&t->link, &kvm_svm->avic.physid_tables); + + t->dummy_page_hpa = page_to_phys(kvm_svm->avic.invalid_physid_page); + + trace_kvm_avic_physid_table_alloc(gfn_to_gpa(gfn)); + return t; + +err_free_table: + kfree(t); + return NULL; +} + +static void +avic_physid_shadow_table_free(struct kvm *kvm, struct avic_physid_table *t) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + + lockdep_assert_held(&kvm_svm->avic.tables_lock); + + WARN_ON(t->refcount); + + avic_physid_shadow_table_erase(kvm, t); + + trace_kvm_avic_physid_table_free(gfn_to_gpa(t->gfn)); + + hlist_del(&t->hash_link); + list_del(&t->link); + 
__free_page(t->shadow_table); + kfree(t); +} + +static struct avic_physid_table * +__avic_physid_shadow_table_get(struct hlist_head *head, gfn_t gfn) +{ + struct avic_physid_table *t; + + hlist_for_each_entry(t, head, hash_link) + if (t->gfn == gfn) { + t->refcount++; + return t; + } + return NULL; +} + +struct avic_physid_table * +avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm); + struct hlist_head *hlist; + struct avic_physid_table *t; + + mutex_lock(&kvm_svm->avic.tables_lock); + + hlist = &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)]; + t = __avic_physid_shadow_table_get(hlist, gfn); + if (!t) { + t = avic_physid_shadow_table_alloc(vcpu->kvm, gfn); + if (!t) + goto out_unlock; + hlist_add_head(&t->hash_link, hlist); + } +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); + return t; +} + +static void +__avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t) +{ + WARN_ON(t->refcount <= 0); + if (--t->refcount == 0) + avic_physid_shadow_table_free(kvm, t); +} + +void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + + mutex_lock(&kvm_svm->avic.tables_lock); + __avic_physid_shadow_table_put(kvm, t); + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +static void avic_physid_shadow_table_invalidate(struct kvm *kvm, + struct avic_physid_table *t) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + + lockdep_assert_held(&kvm_svm->avic.tables_lock); + avic_physid_shadow_table_erase(kvm, t); +} + +int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu, + struct avic_physid_table *t, int nentries) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm); + struct kvm_host_map map; + u64 *gentries; + int i; + int ret = 0; + + mutex_lock(&kvm_svm->avic.tables_lock); + + if (t->nentries >= nentries) + goto out_unlock; + + + trace_kvm_avic_physid_table_reload(gfn_to_gpa(t->gfn), t->nentries, nentries); + + if (t->nentries == 0) { + if (!avic_physid_shadow_table_setup_write_tracking(vcpu->kvm, t, true)) { + ret = -EFAULT; + goto out_unlock; + } + } + + if (kvm_vcpu_map(vcpu, t->gfn, &map)) { + ret = -EFAULT; + goto out_unlock; + } + + gentries = (u64 *)map.hva; + + for (i = t->nentries ; i < nentries ; i++) + avic_physid_shadow_entry_create(vcpu->kvm, t, i, gentries[i]); + + /* publish the table before setting nentries */ + wmb(); + WRITE_ONCE(t->nentries, nentries); + + kvm_vcpu_unmap(vcpu, &map, false); +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); + return ret; +} + +static void avic_physid_shadow_table_track_write(struct kvm_vcpu *vcpu, + gpa_t gpa, + const u8 *new, + int bytes, + struct kvm_page_track_notifier_node *node) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm); + struct hlist_head *hlist; + struct avic_physid_table *t; + gfn_t gfn = gpa_to_gfn(gpa); + unsigned int page_offset = offset_in_page(gpa); + unsigned int entry_offset = page_offset & 0x7; + int first = page_offset / sizeof(u64); + int last = (page_offset + bytes - 1) / sizeof(u64); + u64 new_entry, old_entry; + int l1_apic_id; + + if (WARN_ON_ONCE(bytes == 0)) + return; + + mutex_lock(&kvm_svm->avic.tables_lock); + + hlist = &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)]; + t = __avic_physid_shadow_table_get(hlist, gfn); + + if (!t) + goto out_unlock; + + trace_kvm_avic_physid_table_write(gpa, bytes); + + /* + * Update policy: + * + * Only a write to a single entry, entry that had a valid backing page + * on the last VM entry with this 
page, and only if the + * write touches only the is_running and/or apic_id part of this entry + * is allowed. + * + * Writes outside of known number of entries are ignored to support + * case when the guest is adding entries to end of the page + * in the process of a cpu hotplug. + * + * All other writes, which are not supposed to happen during + * use of the page, cause the page to be invalidated, + * and read as a whole, next time it is used by a vCPU for VM entry. + */ + + if (first >= t->nentries) + goto out_table_put; + + if (first != last || !test_bit(first, t->valid_entires)) + goto invalidate; + + /* update the entry with written bytes */ + old_entry = t->entries[first].gentry; + new_entry = old_entry; + memcpy(((u8 *)&new_entry) + entry_offset, new, bytes); + + /* if backing page changed, invalidate the whole page*/ + if (physid_entry_get_backing_table(old_entry) != + physid_entry_get_backing_table(new_entry)) + goto invalidate; + + /* + * Detect write flooding to physid pages that might not be used + * for the purpose anymore + */ + if (!atomic_read(&t->usecount)) { + if (++t->flood_count > t->nentries * AVIC_PHYSID_FLOOD_COUNT) + goto invalidate; + } else { + t->flood_count = 0; + } + + /* Update the backing cpu */ + l1_apic_id = physid_entry_get_apicid(new_entry); + avic_physid_shadow_entry_set_vcpu(vcpu->kvm, t, first, l1_apic_id); + t->entries[first].gentry = new_entry; + goto out_table_put; +invalidate: + avic_physid_shadow_table_invalidate(vcpu->kvm, t); +out_table_put: + __avic_physid_shadow_table_put(vcpu->kvm, t); +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +static void avic_physid_shadow_table_flush_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot, + struct kvm_page_track_notifier_node *node) +{ + struct kvm_svm *kvm_svm = to_kvm_svm(kvm); + struct avic_physid_table *t, *n; + int i; + + mutex_lock(&kvm_svm->avic.tables_lock); + + list_for_each_entry_safe(t, n, &kvm_svm->avic.physid_tables, link) { + + if (gfn_in_memslot(slot, t->gfn)) { + avic_physid_shadow_table_invalidate(kvm, t); + continue; + } + + for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) { + u64 gentry = t->entries[i].gentry; + gpa_t gpa = physid_entry_get_backing_table(gentry); + + if (gfn_in_memslot(slot, gpa_to_gfn(gpa))) { + avic_physid_shadow_table_invalidate(kvm, t); + break; + } + } + } + mutex_unlock(&kvm_svm->avic.tables_lock); +} + + /* * This is a wrapper of struct amd_iommu_ir_data. 
*/ @@ -113,18 +540,22 @@ void avic_vm_destroy(struct kvm *kvm) __free_page(avic->logical_id_table_page); if (avic->physical_id_table_page) __free_page(avic->physical_id_table_page); + if (avic->invalid_physid_page) + __free_page(avic->invalid_physid_page); spin_lock_irqsave(&svm_vm_data_hash_lock, flags); hash_del(&avic->hnode); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); + + + kvm_page_track_unregister_notifier(kvm, &avic->write_tracker); } int avic_vm_init(struct kvm *kvm) { unsigned long flags; int err = -ENOMEM; - struct page *p_page; - struct page *l_page; + struct page *page; struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic; u32 vm_id; @@ -132,18 +563,25 @@ int avic_vm_init(struct kvm *kvm) return 0; /* Allocating physical APIC ID table (4KB) */ - p_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); - if (!p_page) + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) goto free_avic; - avic->physical_id_table_page = p_page; + avic->physical_id_table_page = page; /* Allocating logical APIC ID table (4KB) */ - l_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); - if (!l_page) + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) goto free_avic; - avic->logical_id_table_page = l_page; + avic->logical_id_table_page = page; + + /* Allocating a dummy page for invalid nested avic physid entries */ + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) + goto free_avic; + + avic->invalid_physid_page = page; spin_lock_irqsave(&svm_vm_data_hash_lock, flags); again: @@ -165,6 +603,13 @@ int avic_vm_init(struct kvm *kvm) hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); + mutex_init(&avic->tables_lock); + INIT_LIST_HEAD(&avic->physid_tables); + + avic->write_tracker.track_write = avic_physid_shadow_table_track_write; + avic->write_tracker.track_flush_slot = avic_physid_shadow_table_flush_memslot; + + kvm_page_track_register_notifier(kvm, &avic->write_tracker); return 0; free_avic: diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index dfca4c06e2071..fc15e1f938793 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -89,13 +90,33 @@ struct kvm_sev_info { }; +#define AVIC_PHYSID_HASH_SHIFT 8 +#define AVIC_PHYSID_HASH_SIZE (1 << AVIC_PHYSID_HASH_SHIFT) + struct kvm_svm_avic { u32 vm_id; struct page *logical_id_table_page; struct page *physical_id_table_page; struct hlist_node hnode; + + struct mutex tables_lock; + + /* List of all shadow tables */ + struct list_head physid_tables; + + /* GPA hash table to find a shadow table via its GPA */ + struct hlist_head physid_gpa_hash[AVIC_PHYSID_HASH_SIZE]; + + struct kvm_page_track_notifier_node write_tracker; + + struct page *invalid_physid_page; }; +static __always_inline unsigned int avic_physid_hash(gfn_t gfn) +{ + return hash_64(gfn, AVIC_PHYSID_HASH_SHIFT); +} + struct kvm_svm { struct kvm kvm; struct kvm_svm_avic avic; @@ -147,6 +168,49 @@ struct vmcb_ctrl_area_cached { u8 reserved_sw[32]; }; +struct avic_physid_entry_descr { + struct list_head link; + + /* cached value of guest entry */ + u64 gentry; + + /* shadow table entry pointer*/ + u64 *sentry; +}; + +#define AVIC_PHYSID_FLOOD_COUNT 1000 + +struct avic_physid_table { + /* List of all tables member */ + struct list_head link; + + /* GPA hash of all tables member */ + struct hlist_node hash_link; + + /* GPA of the table in guest memory*/ + gfn_t gfn; + + /* Number of entries that we 
shadow and which are valid*/ + int nentries; + DECLARE_BITMAP(valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT); + + struct avic_physid_entry_descr entries[AVIC_MAX_PHYSICAL_ID_COUNT]; + + /* Guest visible shadow table */ + struct page *shadow_table; + hpa_t shadow_table_hpa; + hpa_t dummy_page_hpa; + + /* Number of vCPUs which have reference to this table */ + int refcount; + + /* number of vCPUs that are in guest mode and use this table */ + atomic_t usecount; + + /* Number of writes to this page between uses of it*/ + int flood_count; +}; + struct svm_nested_state { struct kvm_vmcb_info vmcb02; u64 hsave_msr; @@ -628,6 +692,13 @@ void avic_vcpu_unblocking(struct kvm_vcpu *vcpu); void avic_ring_doorbell(struct kvm_vcpu *vcpu); unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu); +struct avic_physid_table * +avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn); +void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t); +int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu, + struct avic_physid_table *t, int nentries); + + #define INVALID_BACKING_PAGE (~(u64)0) static inline u64 physid_entry_get_backing_table(u64 entry) From patchwork Wed Apr 27 20:03:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829519 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 928BFC433EF for ; Wed, 27 Apr 2022 20:04:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D8BFD10E542; Wed, 27 Apr 2022 20:04:53 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 512E310E521 for ; Wed, 27 Apr 2022 20:04:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1651089888; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/AE6j2Z/VdUQtMtfyLiPPrZMPCCO8jp5Oj+E1IX+QYM=; b=ZTV4+ARp3mX2Qkn/Q4Q7PWXE1j1jWJKaWitcoUysDwddJT9WlN9AvCeTPYhBUnrqGKxp+2 uc6fK5aPcrW5gTnBvBTEaqVc9OSP0n2hPVSK3+0O8hnxaCQNzST/xmvbjM2ZJlCfIQpoFr kcnz+IeiIhwQ3Kkdh3DwPMFwxVNKjOo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-36-BxQAc8sbPyqddXdWg2ibGg-1; Wed, 27 Apr 2022 16:04:45 -0400 X-MC-Unique: BxQAc8sbPyqddXdWg2ibGg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F1E0F381078B; Wed, 27 Apr 2022 20:04:43 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.192.41]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5DBDF9E82; Wed, 27 Apr 2022 20:04:36 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Subject: [RFC PATCH v3 12/19] KVM: x86: nSVM: make nested AVIC physid 
write tracking be aware of the host scheduling
Date: Wed, 27 Apr 2022 23:03:07 +0300
Message-Id: <20220427200314.276673-13-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

For each vCPU:
- Store a linked list of all shadow physical id entries which address it.
- Update those entries when this vCPU is scheduled in/out.
- Update this list when physid tables are modified by other means (guest write and/or table sync).

To avoid races vs. vCPU scheduling, use a spinlock.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 113 +++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/svm/svm.c | 7 +++
 arch/x86/kvm/svm/svm.h | 10 ++++
 3 files changed, 122 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f462b7e48e3ca..34da9fabd5194 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -67,8 +67,12 @@ static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm,
 	struct avic_physid_entry_descr *e = &t->entries[n];
 	u64 sentry = READ_ONCE(*e->sentry);
 	u64 old_sentry = sentry;
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
 	struct kvm_vcpu *new_vcpu = NULL;
 	int l0_apicid = -1;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags);

 	WARN_ON(!test_bit(n, t->valid_entires));

@@ -79,6 +83,9 @@ static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm,
 		new_vcpu = avic_vcpu_by_l1_apicid(kvm, new_l1_apicid);

 	if (new_vcpu)
+		list_add_tail(&e->link, &to_svm(new_vcpu)->nested.physid_ref_entries);
+
+	if (new_vcpu && to_svm(new_vcpu)->nested_avic_active)
 		l0_apicid = kvm_cpu_get_apicid(new_vcpu->cpu);

 	physid_entry_set_apicid(&sentry, l0_apicid);
@@ -87,6 +94,8 @@ static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm,

 	if (sentry != old_sentry)
 		WRITE_ONCE(*e->sentry, sentry);
+
+	raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags);
 }

 static void avic_physid_shadow_entry_create(struct kvm *kvm,
@@ -131,7 +140,11 @@ static void avic_physid_shadow_entry_remove(struct kvm *kvm,
 	int n)
 {
 	struct avic_physid_entry_descr *e = &t->entries[n];
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
 	hpa_t backing_page_hpa;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags);

 	if (!test_and_clear_bit(n, t->valid_entires))
 		WARN_ON(1);
@@ -147,8 +160,49 @@ static void avic_physid_shadow_entry_remove(struct kvm *kvm,

 	e->gentry = 0;
 	*e->sentry = 0;
+
+	raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags);
 }

+static void avic_update_peer_physid_entries(struct kvm_vcpu *vcpu, int cpu)
+{
+	/*
+	 * Update all shadow physid tables which contain entries
+	 * which reference this vCPU with its new physical
location + */ + struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm); + struct vcpu_svm *vcpu_svm = to_svm(vcpu); + struct avic_physid_entry_descr *e; + int updated_nentries = 0; + int l0_apicid = -1; + unsigned long flags; + bool new_active = cpu != -1; + + if (cpu != -1) + l0_apicid = kvm_cpu_get_apicid(cpu); + + raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags); + + list_for_each_entry(e, &vcpu_svm->nested.physid_ref_entries, link) { + u64 sentry = READ_ONCE(*e->sentry); + u64 old_sentry = sentry; + + physid_entry_set_apicid(&sentry, l0_apicid); + + if (sentry != old_sentry) { + updated_nentries++; + WRITE_ONCE(*e->sentry, sentry); + } + } + + if (updated_nentries) + trace_kvm_avic_physid_update_vcpu_host(vcpu->vcpu_id, + l0_apicid, updated_nentries); + + vcpu_svm->nested_avic_active = new_active; + + raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags); +} static bool avic_physid_shadow_table_setup_write_tracking(struct kvm *kvm, @@ -603,6 +657,7 @@ int avic_vm_init(struct kvm *kvm) hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); + raw_spin_lock_init(&avic->table_entries_lock); mutex_init(&avic->tables_lock); INIT_LIST_HEAD(&avic->physid_tables); @@ -1428,9 +1483,51 @@ static void avic_vcpu_load(struct kvm_vcpu *vcpu) static void avic_vcpu_put(struct kvm_vcpu *vcpu) { preempt_disable(); - __avic_vcpu_put(vcpu); + preempt_enable(); +} + +void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + lockdep_assert_preemption_disabled(); + + /* + * For the same reason as in __avic_vcpu_load there is no + * need to load nested AVIC when this vCPU is blocking + */ + if (kvm_vcpu_is_blocking(vcpu)) + return; + + if (svm->nested.initialized) + avic_update_peer_physid_entries(vcpu, cpu); +} + +void __nested_avic_put(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + lockdep_assert_preemption_disabled(); + + if (svm->nested.initialized) + avic_update_peer_physid_entries(vcpu, -1); +} + +void nested_avic_load(struct kvm_vcpu *vcpu) +{ + int cpu = get_cpu(); + + WARN_ON(cpu != vcpu->cpu); + __nested_avic_load(vcpu, cpu); + put_cpu(); +} + +void nested_avic_put(struct kvm_vcpu *vcpu) +{ + preempt_disable(); + __nested_avic_put(vcpu); preempt_enable(); } @@ -1468,9 +1565,6 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) void avic_vcpu_blocking(struct kvm_vcpu *vcpu) { - if (!kvm_vcpu_apicv_active(vcpu)) - return; - /* * Unload the AVIC when the vCPU is about to block, _before_ * the vCPU actually blocks. @@ -1484,13 +1578,16 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu) * IRR and reading IsRunning; the lack of this barrier might be * the cause of errata #1235). 
*/ - avic_vcpu_put(vcpu); + if (kvm_vcpu_apicv_active(vcpu)) + avic_vcpu_put(vcpu); + + nested_avic_put(vcpu); } void avic_vcpu_unblocking(struct kvm_vcpu *vcpu) { - if (!kvm_vcpu_apicv_active(vcpu)) - return; + if (kvm_vcpu_apicv_active(vcpu)) + avic_vcpu_load(vcpu); - avic_vcpu_load(vcpu); + nested_avic_load(vcpu); } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 75b4f3ac8b1a0..76fbee2c8c5d7 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1302,6 +1302,8 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu) svm->guest_state_loaded = false; + INIT_LIST_HEAD(&svm->nested.physid_ref_entries); + return 0; error_free_vmsa_page: @@ -1391,8 +1393,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) sd->current_vmcb = svm->vmcb; indirect_branch_prediction_barrier(); } + if (kvm_vcpu_apicv_active(vcpu)) __avic_vcpu_load(vcpu, cpu); + + __nested_avic_load(vcpu, cpu); } static void svm_vcpu_put(struct kvm_vcpu *vcpu) @@ -1400,6 +1405,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu) if (kvm_vcpu_apicv_active(vcpu)) __avic_vcpu_put(vcpu); + __nested_avic_put(vcpu); + svm_prepare_host_switch(vcpu); ++vcpu->stat.host_state_reload; diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index fc15e1f938793..401449dbce65d 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -99,6 +99,7 @@ struct kvm_svm_avic { struct page *physical_id_table_page; struct hlist_node hnode; + raw_spinlock_t table_entries_lock; struct mutex tables_lock; /* List of all shadow tables */ @@ -244,6 +245,9 @@ struct svm_nested_state { * on its side. */ bool force_msr_bitmap_recalc; + + /* All AVIC shadow PID table entry descriptors that reference this vCPU */ + struct list_head physid_ref_entries; }; struct vcpu_sev_es_state { @@ -311,6 +315,7 @@ struct vcpu_svm { u32 dfr_reg; struct page *avic_backing_page; u64 *avic_physical_id_cache; + bool nested_avic_active; /* * Per-vcpu list of struct amd_svm_iommu_ir: @@ -678,6 +683,11 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu); int avic_init_vcpu(struct vcpu_svm *svm); void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void __avic_vcpu_put(struct kvm_vcpu *vcpu); +void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu); +void __nested_avic_put(struct kvm_vcpu *vcpu); +void nested_avic_load(struct kvm_vcpu *vcpu); +void nested_avic_put(struct kvm_vcpu *vcpu); + void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu); void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu); void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu); From patchwork Wed Apr 27 20:03:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829520 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5085BC433FE for ; Wed, 27 Apr 2022 20:05:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 06A1210E56E; Wed, 27 Apr 2022 20:05:04 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 520E110E545 for ; Wed, 27 Apr 2022 20:04:59 +0000 (UTC) 
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 13/19] KVM: x86: nSVM: wire nested AVIC to nested guest entry/exit
Date: Wed, 27 Apr 2022 23:03:08 +0300
Message-Id: <20220427200314.276673-14-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

* Pass through the guest's AVIC pages that can be passed through:
  - the logical id table
  - the AVIC backing pages
* Pass through AVIC's MMIO range - the nested guest is responsible for marking it RW in its NPT tables.
* Write-track the physical id page - all peers' AVIC backing pages are pinned as long as the shadow table is not invalidated/freed.
* Cache the guest's AVIC settings.
* Add the SDM-mandated changes to emulated VM enter/exit.

Note that nested AVIC still can't be enabled, thus this code has no effect yet.
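In outline, the VM-entry side added below does the following (a simplified sketch with error handling dropped; the *_gfn variables are shorthand for addresses taken from the cached vmcb12 control fields, everything else appears in the patch itself):

	/* Nested VM entry with AVIC enabled in vmcb12 (nested_vmcb02_prepare_avic) */
	kvm_vcpu_map(vcpu, backing_page_gfn, &svm->nested.l2_apic_access_page);
	kvm_vcpu_map(vcpu, logical_id_table_gfn, &svm->nested.l2_logical_id_table);

	/* Get (or create) the write-tracked shadow of the physical id table */
	t = avic_physid_shadow_table_get(vcpu, physid_table_gfn);
	avic_physid_shadow_table_sync(vcpu, t, nentries);

	/* Point vmcb02 at the host-side pages and turn AVIC on for L2 */
	vmcb02->control.avic_physical_id = t->shadow_table_hpa | nentries;
	vmcb02->control.int_ctl |= AVIC_ENABLE_MASK;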
Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/avic.c | 51 ++++++++++++++- arch/x86/kvm/svm/nested.c | 127 +++++++++++++++++++++++++++++++++++++- arch/x86/kvm/svm/svm.c | 2 + arch/x86/kvm/svm/svm.h | 24 +++++++ 4 files changed, 199 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 34da9fabd5194..e6ec525a88625 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -59,6 +59,18 @@ static inline struct kvm_vcpu *avic_vcpu_by_l1_apicid(struct kvm *kvm, return kvm_get_vcpu_by_id(kvm, l1_apicid); } +static u32 nested_avic_get_reg(struct kvm_vcpu *vcpu, int reg_off) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + void *nested_apic_regs = svm->nested.l2_apic_access_page.hva; + + if (WARN_ON_ONCE(!nested_apic_regs)) + return 0; + + return *((u32 *) (nested_apic_regs + reg_off)); +} + static void avic_physid_shadow_entry_set_vcpu(struct kvm *kvm, struct avic_physid_table *t, int n, @@ -531,6 +543,20 @@ static void avic_physid_shadow_table_flush_memslot(struct kvm *kvm, mutex_unlock(&kvm_svm->avic.tables_lock); } +void avic_free_nested(struct kvm_vcpu *vcpu) +{ + struct avic_physid_table *t; + struct vcpu_svm *svm = to_svm(vcpu); + + t = svm->nested.l2_physical_id_table; + if (t) { + avic_physid_shadow_table_put(vcpu->kvm, t); + svm->nested.l2_physical_id_table = NULL; + } + + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); +} /* * This is a wrapper of struct amd_iommu_ir_data. @@ -586,10 +612,18 @@ void avic_vm_destroy(struct kvm *kvm) { unsigned long flags; struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic; + unsigned long i; + struct kvm_vcpu *vcpu; if (!enable_apicv) return; + kvm_for_each_vcpu(i, vcpu, kvm) { + vcpu_load(vcpu); + avic_free_nested(vcpu); + vcpu_put(vcpu); + } + if (avic->logical_id_table_page) __free_page(avic->logical_id_table_page); if (avic->physical_id_table_page) @@ -1501,7 +1535,7 @@ void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu) if (kvm_vcpu_is_blocking(vcpu)) return; - if (svm->nested.initialized) + if (svm->nested.initialized && svm->avic_enabled) avic_update_peer_physid_entries(vcpu, cpu); } @@ -1511,7 +1545,7 @@ void __nested_avic_put(struct kvm_vcpu *vcpu) lockdep_assert_preemption_disabled(); - if (svm->nested.initialized) + if (svm->nested.initialized && svm->avic_enabled) avic_update_peer_physid_entries(vcpu, -1); } @@ -1591,3 +1625,16 @@ void avic_vcpu_unblocking(struct kvm_vcpu *vcpu) nested_avic_load(vcpu); } + +bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu) +{ + int off; + + if (!nested_avic_in_use(vcpu)) + return false; + + for (off = 0x10; off < 0x80; off += 0x10) + if (nested_avic_get_reg(vcpu, APIC_IRR + off)) + return true; + return false; +} diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index bed5e1692cef0..eb5e9b600e052 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -387,6 +387,14 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu, memcpy(to->reserved_sw, from->reserved_sw, sizeof(struct hv_enlightenments)); } + + /* copy avic related settings only when it is enabled */ + if (from->int_ctl & AVIC_ENABLE_MASK) { + to->avic_vapic_bar = from->avic_vapic_bar; + to->avic_backing_page = from->avic_backing_page; + to->avic_logical_id = from->avic_logical_id; + to->avic_physical_id = from->avic_physical_id; + } } void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm, @@ -539,6 +547,79 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm 
*svm) svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat; } + +static bool nested_vmcb02_prepare_avic(struct vcpu_svm *svm) +{ + struct vmcb *vmcb02 = svm->nested.vmcb02.ptr; + struct avic_physid_table *t = svm->nested.l2_physical_id_table; + gfn_t physid_gfn; + int physid_nentries; + + if (!nested_avic_in_use(&svm->vcpu)) + return true; + + if (svm->vcpu.kvm->arch.apic_id_changed) { + /* if the guest played with apic id, it will keep both pieces */ + kvm_vm_bugged(svm->vcpu.kvm); + return false; + } + + if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_backing_page & AVIC_HPA_MASK), + &svm->nested.l2_apic_access_page)) + goto error; + + if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_logical_id & AVIC_HPA_MASK), + &svm->nested.l2_logical_id_table)) + goto error_unmap_backing_page; + + physid_gfn = gpa_to_gfn(svm->nested.ctl.avic_physical_id & + AVIC_HPA_MASK); + physid_nentries = svm->nested.ctl.avic_physical_id & + AVIC_PHYSICAL_ID_TABLE_SIZE_MASK; + + if (t && t->gfn != physid_gfn) { + avic_physid_shadow_table_put(svm->vcpu.kvm, t); + svm->nested.l2_physical_id_table = NULL; + } + + if (!svm->nested.l2_physical_id_table) { + t = avic_physid_shadow_table_get(&svm->vcpu, physid_gfn); + if (!t) + goto error_unmap_logical_id_table; + svm->nested.l2_physical_id_table = t; + } + + atomic_inc(&t->usecount); + + if (t->nentries < physid_nentries) + if (avic_physid_shadow_table_sync(&svm->vcpu, t, physid_nentries) < 0) + goto error_put_table; + + /* Everything is setup, we can enable AVIC */ + vmcb02->control.avic_vapic_bar = + svm->nested.ctl.avic_vapic_bar & VMCB_AVIC_APIC_BAR_MASK; + vmcb02->control.avic_backing_page = + pfn_to_hpa(svm->nested.l2_apic_access_page.pfn); + vmcb02->control.avic_logical_id = + pfn_to_hpa(svm->nested.l2_logical_id_table.pfn); + vmcb02->control.avic_physical_id = + (svm->nested.l2_physical_id_table->shadow_table_hpa) | physid_nentries; + + vmcb02->control.int_ctl |= AVIC_ENABLE_MASK; + vmcb_mark_dirty(vmcb02, VMCB_AVIC); + return true; + +error_put_table: + avic_physid_shadow_table_put(svm->vcpu.kvm, t); + svm->nested.l2_physical_id_table = NULL; +error_unmap_logical_id_table: + kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_logical_id_table, false); +error_unmap_backing_page: + kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_apic_access_page, false); +error: + return false; +} + static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12) { bool new_vmcb12 = false; @@ -627,6 +708,17 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm) else int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK); + if (nested_avic_in_use(vcpu)) { + + /* + * Enabling AVIC implicitly disables the + * V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR + * fields in the VMCB Control Word + */ + int_ctl_vmcb12_bits &= ~V_IRQ_INJECTION_BITS_MASK; + } + + /* Copied from vmcb01. msrpm_base can be overwritten later. 
*/ vmcb02->control.nested_ctl = vmcb01->control.nested_ctl; vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa; @@ -829,7 +921,10 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true)) goto out_exit_err; - if (nested_svm_vmrun_msrpm(svm)) + if (!nested_svm_vmrun_msrpm(svm)) + goto out_exit_err; + + if (nested_vmcb02_prepare_avic(svm)) goto out; out_exit_err: @@ -956,6 +1051,15 @@ int nested_svm_vmexit(struct vcpu_svm *svm) nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr); + if (nested_avic_in_use(vcpu)) { + struct avic_physid_table *t = svm->nested.l2_physical_id_table; + + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); + + atomic_dec(&t->usecount); + } + svm_switch_vmcb(svm, &svm->vmcb01); if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) { @@ -1069,6 +1173,7 @@ int svm_allocate_nested(struct vcpu_svm *svm) svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm); svm->nested.initialized = true; + nested_avic_load(&svm->vcpu); return 0; err_free_vmcb02: @@ -1078,6 +1183,8 @@ int svm_allocate_nested(struct vcpu_svm *svm) void svm_free_nested(struct vcpu_svm *svm) { + struct kvm_vcpu *vcpu = &svm->vcpu; + if (!svm->nested.initialized) return; @@ -1096,6 +1203,11 @@ void svm_free_nested(struct vcpu_svm *svm) */ svm->nested.last_vmcb12_gpa = INVALID_GPA; + if (svm->avic_enabled) { + nested_avic_put(vcpu); + avic_free_nested(vcpu); + } + svm->nested.initialized = false; } @@ -1116,8 +1228,10 @@ void svm_leave_nested(struct kvm_vcpu *vcpu) nested_svm_uninit_mmu_context(vcpu); vmcb_mark_all_dirty(svm->vmcb); - } + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); + } kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu); } @@ -1423,6 +1537,13 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst, dst->pause_filter_count = from->pause_filter_count; dst->pause_filter_thresh = from->pause_filter_thresh; /* 'clean' and 'reserved_sw' are not changed by KVM */ + + if (from->int_ctl & AVIC_ENABLE_MASK) { + dst->avic_vapic_bar = from->avic_vapic_bar; + dst->avic_backing_page = from->avic_backing_page; + dst->avic_logical_id = from->avic_logical_id; + dst->avic_physical_id = from->avic_physical_id; + } } static int svm_get_nested_state(struct kvm_vcpu *vcpu, @@ -1644,7 +1765,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) return false; - if (!nested_svm_vmrun_msrpm(svm)) { + if (!nested_svm_vmrun_msrpm(svm) || !nested_vmcb02_prepare_avic(svm)) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 76fbee2c8c5d7..a39bb0b27a51d 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4680,6 +4680,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl, .check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons, .apicv_post_state_restore = avic_apicv_post_state_restore, + .guest_apic_has_interrupt = avic_nested_has_interrupt, .get_mt_mask = svm_get_mt_mask, .get_exit_info = svm_get_exit_info, @@ -4931,6 +4932,7 @@ static __init int svm_hardware_setup(void) svm_x86_ops.vcpu_blocking = NULL; svm_x86_ops.vcpu_unblocking = NULL; 
svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL; + svm_x86_ops.guest_apic_has_interrupt = NULL; } if (vls) { diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 401449dbce65d..17fcc09cf4be1 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -167,6 +167,11 @@ struct vmcb_ctrl_area_cached { u64 virt_ext; u32 clean; u8 reserved_sw[32]; + + u64 avic_vapic_bar; + u64 avic_backing_page; + u64 avic_logical_id; + u64 avic_physical_id; }; struct avic_physid_entry_descr { @@ -248,6 +253,10 @@ struct svm_nested_state { /* All AVIC shadow PID table entry descriptors that reference this vCPU */ struct list_head physid_ref_entries; + + struct kvm_host_map l2_apic_access_page; + struct kvm_host_map l2_logical_id_table; + struct avic_physid_table *l2_physical_id_table; }; struct vcpu_sev_es_state { @@ -310,6 +319,7 @@ struct vcpu_svm { bool pause_filter_enabled : 1; bool pause_threshold_enabled : 1; bool vgif_enabled : 1; + bool avic_enabled : 1; u32 ldr_reg; u32 dfr_reg; @@ -701,6 +711,8 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu); void avic_vcpu_unblocking(struct kvm_vcpu *vcpu); void avic_ring_doorbell(struct kvm_vcpu *vcpu); unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu); +void avic_free_nested(struct kvm_vcpu *vcpu); +bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu); struct avic_physid_table * avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn); @@ -708,6 +720,18 @@ void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t); int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu, struct avic_physid_table *t, int nentries); +static inline bool nested_avic_in_use(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *vcpu_svm = to_svm(vcpu); + + if (!vcpu_svm->avic_enabled) + return false; + + if (!nested_npt_enabled(vcpu_svm)) + return false; + + return vcpu_svm->nested.ctl.int_ctl & AVIC_ENABLE_MASK; +} #define INVALID_BACKING_PAGE (~(u64)0) From patchwork Wed Apr 27 20:03:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829521 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5A548C4332F for ; Wed, 27 Apr 2022 20:05:07 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9E94110E57B; Wed, 27 Apr 2022 20:05:04 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id DD00D10E56E for ; Wed, 27 Apr 2022 20:05:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1651089901; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RUs4DpgdxHTyD0OQTTODRh43DT2WHmGc1ZDpPP7qap8=; b=jTa+hOeDAvyJLGYUrGrjFfOgAEp1qOGnDkHEtjc2A5TQggTYYe1+vdqWmiAZkhcKJSVgNy R51/9M7Q4KPYApDqE8zrr0WOA6tIKG2871RYhuy31cXFgmlgVXD3QJVxPxZ/gtCVim9hom SNkOYz2yLLg2mYxVj6f8ymZZXsOZRIM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by 
relay.mimecast.com
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 14/19] KVM: x86: rename .set_apic_access_page_addr to reload_apic_access_page
Date: Wed, 27 Apr 2022 23:03:09 +0300
Message-Id: <20220427200314.276673-15-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

This will be used on SVM to reload the shadow page of the AVIC physid table. No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h | 3 +--
 arch/x86/kvm/vmx/vmx.c | 8 ++++----
 arch/x86/kvm/x86.c | 6 +++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 96e4e9842dfc6..997edb7453ac2 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -82,7 +82,7 @@ KVM_X86_OP_OPTIONAL(hwapic_isr_update)
 KVM_X86_OP_OPTIONAL_RET0(guest_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
 KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
-KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
+KVM_X86_OP_OPTIONAL(reload_apic_pages)
 KVM_X86_OP(deliver_interrupt)
 KVM_X86_OP_OPTIONAL(sync_pir_to_irr)
 KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fc7df778a3d71..52fa04c3108b1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1436,7 +1436,7 @@ struct kvm_x86_ops {
 	bool (*guest_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
-	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
+	void (*reload_apic_pages)(struct kvm_vcpu *vcpu);
 	void (*deliver_interrupt)(struct kvm_lapic *apic, int delivery_mode,
 				  int trig_mode, int vector);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
@@ -1909,7 +1909,6 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int
kvm_cpu_get_interrupt(struct kvm_vcpu *v); void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); - int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low, unsigned long ipi_bitmap_high, u32 min, unsigned long icr, int op_64_bit); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cf8581978bce3..7defd31703c61 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6339,7 +6339,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) vmx_update_msr_bitmap_x2apic(vcpu); } -static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +static void vmx_reload_apic_access_page(struct kvm_vcpu *vcpu) { struct page *page; @@ -7777,7 +7777,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .enable_irq_window = vmx_enable_irq_window, .update_cr8_intercept = vmx_update_cr8_intercept, .set_virtual_apic_mode = vmx_set_virtual_apic_mode, - .set_apic_access_page_addr = vmx_set_apic_access_page_addr, + .reload_apic_pages = vmx_reload_apic_access_page, .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl, .load_eoi_exitmap = vmx_load_eoi_exitmap, .apicv_post_state_restore = vmx_apicv_post_state_restore, @@ -7940,12 +7940,12 @@ static __init int hardware_setup(void) enable_vnmi = 0; /* - * set_apic_access_page_addr() is used to reload apic access + * kvm_vcpu_reload_apic_pages() is used to reload apic access * page upon invalidation. No need to do anything if not * using the APIC_ACCESS_ADDR VMCS field. */ if (!flexpriority_enabled) - vmx_x86_ops.set_apic_access_page_addr = NULL; + vmx_x86_ops.reload_apic_pages = NULL; if (!cpu_has_vmx_tpr_shadow()) vmx_x86_ops.update_cr8_intercept = NULL; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d2f73ce87a1e3..ad744ab99734c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9949,12 +9949,12 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); } -static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) +static void kvm_vcpu_reload_apic_pages(struct kvm_vcpu *vcpu) { if (!lapic_in_kernel(vcpu)) return; - static_call_cond(kvm_x86_set_apic_access_page_addr)(vcpu); + static_call_cond(kvm_x86_reload_apic_pages)(vcpu); } void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) @@ -10071,7 +10071,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu)) vcpu_load_eoi_exitmap(vcpu); if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)) - kvm_vcpu_reload_apic_access_page(vcpu); + kvm_vcpu_reload_apic_pages(vcpu); if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) { vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT; vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH; From patchwork Wed Apr 27 20:03:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829522 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D08DBC433F5 for ; Wed, 27 Apr 2022 20:05:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F231410E587; Wed, 27 Apr 2022 20:05:16 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com 
From patchwork Wed Apr 27 20:03:10 2022
From: Maxim Levitsky <mlevitsk@redhat.com>
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 15/19] KVM: x86: nSVM: add code to reload AVIC physid table when it is invalidated
Date: Wed, 27 Apr 2022 23:03:10 +0300
Message-Id: <20220427200314.276673-16-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

An AVIC table invalidation is not supposed to happen often, and can only
happen when the guest does something suspicious, such as:

- It places a physid page in a memslot that is enabled/disabled, and
  memslot flushing happens.

- It tries to update the APIC backing page addresses: the guest has no
  reason to touch these, and doing so on real hardware would likely lead
  to unpredictable results.

- It writes to reserved bits of a tracked page.

- It write-floods a physid table while no vCPU is using it (the page is
  likely being reused at that point to contain something else).

All of the above raises a KVM_REQ_APIC_PAGE_RELOAD request on all vCPUs,
which kicks them out of guest mode; the first vCPU to reach the handler
re-creates the entries of the physid page, and the others notice this
and do nothing, as sketched below.
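To make that sequencing concrete, here is a minimal standalone C model of the protocol; all names (reload_req, table_gen, valid_gen) are invented for illustration, and the real code re-creates a shadow table rather than bumping a counter.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_VCPUS 4

static atomic_int table_gen;   /* bumped on every invalidation        */
static atomic_int valid_gen;   /* generation the table was rebuilt at */
static atomic_bool reload_req[NR_VCPUS];

static void invalidate_table(void)
{
	atomic_fetch_add(&table_gen, 1);
	for (int i = 0; i < NR_VCPUS; i++)          /* KVM_REQ_APIC_PAGE_RELOAD */
		atomic_store(&reload_req[i], true); /* ...raised on all vCPUs   */
}

static void vcpu_enter_guest(int vcpu)
{
	if (!atomic_exchange(&reload_req[vcpu], false))
		return;

	int gen = atomic_load(&table_gen);

	/* Only the first vCPU to get here actually rebuilds the entries. */
	if (atomic_exchange(&valid_gen, gen) != gen)
		printf("vCPU%d: re-created physid entries (gen %d)\n", vcpu, gen);
	else
		printf("vCPU%d: table already current, nothing to do\n", vcpu);
}

int main(void)
{
	invalidate_table();
	for (int i = 0; i < NR_VCPUS; i++)
		vcpu_enter_guest(i);
	return 0;
}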
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 13 +++++++++++++
 arch/x86/kvm/svm/svm.c  |  1 +
 arch/x86/kvm/svm/svm.h  |  1 +
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e6ec525a88625..f13ca1e7b2845 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -379,6 +379,7 @@ static void avic_physid_shadow_table_invalidate(struct kvm *kvm,
 	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);

 	lockdep_assert_held(&kvm_svm->avic.tables_lock);
+	kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 	avic_physid_shadow_table_erase(kvm, t);
 }

@@ -1638,3 +1639,15 @@ bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu)
 			return true;
 	return false;
 }
+
+void avic_reload_apic_pages(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *vcpu_svm = to_svm(vcpu);
+	struct avic_physid_table *t = vcpu_svm->nested.l2_physical_id_table;
+
+	int nentries = vcpu_svm->nested.ctl.avic_physical_id &
+		AVIC_PHYSICAL_ID_TABLE_SIZE_MASK;
+
+	if (t && is_guest_mode(vcpu) && nested_avic_in_use(vcpu))
+		avic_physid_shadow_table_sync(vcpu, t, nentries);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a39bb0b27a51d..d96a73931d1e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4677,6 +4677,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.enable_nmi_window = svm_enable_nmi_window,
 	.enable_irq_window = svm_enable_irq_window,
 	.update_cr8_intercept = svm_update_cr8_intercept,
+	.reload_apic_pages = avic_reload_apic_pages,
 	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
 	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 17fcc09cf4be1..93fd9d6f5fd85 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -711,6 +711,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void avic_ring_doorbell(struct kvm_vcpu *vcpu);
 unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
+void avic_reload_apic_pages(struct kvm_vcpu *vcpu);
 void avic_free_nested(struct kvm_vcpu *vcpu);
 bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu);
From patchwork Wed Apr 27 20:03:11 2022
From: Maxim Levitsky <mlevitsk@redhat.com>
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 16/19] KVM: x86: nSVM: implement support for nested AVIC vmexits
Date: Wed, 27 Apr 2022 23:03:11 +0300
Message-Id: <20220427200314.276673-17-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

* SVM_EXIT_AVIC_UNACCELERATED_ACCESS is always forwarded to L1.

* SVM_EXIT_AVIC_INCOMPLETE_IPI is hidden from the guest if:

  - is_running was false in the shadow physid page because L1's vCPU was
    scheduled out; in this case the vCPU is woken up, and it will process
    the nested AVIC state on the next VM entry.

  - an invalid physical address of an AVIC backing page was present in
    the guest's physid page, which KVM translates to the valid physical
    address of a dummy page with is_running=false. If this condition
    happens, an AVIC_IPI_FAILURE_INVALID_BACKING_PAGE VM exit is injected
    into the nested hypervisor.

* Note that a SVM_EXIT_AVIC_INCOMPLETE_IPI VM exit can happen for both
  host- and guest-related reasons at the same time: for example, if a
  broadcast IPI was attempted and some shadow physid entries had
  is_running=false set by the guest, while others had it set to false due
  to scheduled-out L1 vCPUs. To support this case, all relevant entries
  of the guest's physical and logical ID tables are checked, and both the
  host-related actions (e.g. wakeup) and the guest VM exit reflection are
  done, as modeled below.
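The per-target decision can be summarized with a short standalone C model; the names below are invented and the logic is deliberately simplified from the real code: host-caused failures are handled by waking the target and hiding the exit, while the first guest-caused failure is recorded so the exit can be reflected to L1 with the right index and failure reason.

#include <stdbool.h>
#include <stdio.h>

struct guest_physid_entry {
	bool is_running;        /* is_running bit as written by L1 */
	bool bad_backing_page;  /* backing page did not translate  */
};

static void wake_l1_vcpu(int id)
{
	printf("host reason: waking L1 vCPU behind entry %d\n", id);
}

/* Returns the index to report to L1, or -1 if the exit is fully hidden. */
static int handle_incomplete_ipi(const struct guest_physid_entry *e, int n,
				 bool *invalid_page)
{
	int reflect_index = -1;

	for (int i = 0; i < n; i++) {
		if (e[i].is_running) {
			wake_l1_vcpu(i);       /* host-only reason: hide it */
		} else if (reflect_index == -1) {
			reflect_index = i;     /* guest made it not-running */
			*invalid_page = e[i].bad_backing_page;
		}
	}
	return reflect_index;
}

int main(void)
{
	/* A broadcast IPI hitting host- and guest-caused failures at once. */
	const struct guest_physid_entry table[3] = {
		{ .is_running = true  },                    /* host reason  */
		{ .is_running = false },                    /* guest reason */
		{ .is_running = false, .bad_backing_page = true },
	};
	bool invalid_page = false;
	int idx = handle_incomplete_ipi(table, 3, &invalid_page);

	if (idx != -1)
		printf("reflect exit, index %d, %s\n", idx,
		       invalid_page ? "INVALID_BACKING_PAGE" : "TARGET_NOT_RUNNING");
	return 0;
}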
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c   | 204 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/nested.c |  14 +++
 2 files changed, 216 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f13ca1e7b2845..e8c53fd77f0b1 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -917,6 +917,164 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	}
 }

+static void
+avic_kick_target_vcpu_nested_physical(struct vcpu_svm *svm,
+				      int target_l2_apic_id,
+				      int *index,
+				      bool *invalid_page)
+{
+	u64 gentry, sentry;
+	int target_l1_apicid;
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+
+	if (WARN_ON_ONCE(!t))
+		return;
+
+	/*
+	 * This shouldn't normally happen, because this condition should
+	 * cause an AVIC_IPI_FAILURE_INVALID_TARGET vmexit; however the
+	 * guest can change the page and trigger this.
+	 */
+	if (target_l2_apic_id >= t->nentries)
+		return;
+
+	gentry = t->entries[target_l2_apic_id].gentry;
+	sentry = *t->entries[target_l2_apic_id].sentry;
+
+	/* Same reasoning as above */
+	if (!(gentry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK))
+		return;
+
+	/*
+	 * This races against the guest updating the is_running bit.
+	 *
+	 * The race itself happens on real hardware as well, and the guest
+	 * must use the correct means to avoid it.
+	 *
+	 * The AVIC hardware has already set IRR and should have done a
+	 * memory barrier, and then found out that is_running is false in
+	 * the shadow physid table.
+	 *
+	 * We do another is_running check (in the guest physid table)
+	 * after that, thus no additional memory barrier is needed.
+	 */
+
+	target_l1_apicid = physid_entry_get_apicid(gentry);
+
+	if (target_l1_apicid == -1) {
+
+		/* is_running is false, need to vmexit to the guest */
+		if (*index == -1) {
+			u64 backing_page_phys = physid_entry_get_backing_table(sentry);
+
+			*index = target_l2_apic_id;
+			if (backing_page_phys == t->dummy_page_hpa)
+				*invalid_page = true;
+		}
+	} else {
+		/* Wake up the target vCPU and hide the VM exit from the guest */
+		struct kvm_vcpu *target = avic_vcpu_by_l1_apicid(svm->vcpu.kvm, target_l1_apicid);
+
+		if (target && target != &svm->vcpu)
+			kvm_vcpu_wake_up(target);
+	}
+
+	trace_kvm_avic_nested_kick_vcpu(svm->vcpu.vcpu_id,
+					target_l2_apic_id,
+					target_l1_apicid);
+}
+
+static void
+avic_kick_target_vcpus_nested_logical(struct vcpu_svm *svm, unsigned long dest,
+				      int *index, bool *invalid_page)
+{
+	int logical_id;
+	u8 cluster = 0;
+	u64 *logical_id_table = (u64 *)svm->nested.l2_logical_id_table.hva;
+	int physical_index = -1;
+
+	if (WARN_ON_ONCE(!logical_id_table))
+		return;
+
+	if (nested_avic_get_reg(&svm->vcpu, APIC_DFR) == APIC_DFR_CLUSTER) {
+		if (dest >= 0x40)
+			return;
+		cluster = dest & 0x3C;
+		dest &= 0x3;
+	}
+
+	for_each_set_bit(logical_id, &dest, 8) {
+		int logical_index = cluster | logical_id;
+		u64 log_gentry = logical_id_table[logical_index];
+		int l2_apicid = logid_get_physid(log_gentry);
+
+		/*
+		 * Should not happen, as in this case AVIC should VM exit
+		 * with 'invalid target'.
+		 *
+		 * However, the guest can change the entry behind KVM's
+		 * back, thus ignore this case.
+		 */
+		if (l2_apicid == -1)
+			continue;
+
+		avic_kick_target_vcpu_nested_physical(svm, l2_apicid,
+						      &physical_index,
+						      invalid_page);
+
+		/* Reported index is the index of the logical entry in this case */
+		if (physical_index != -1)
+			*index = logical_index;
+	}
+}
+
+static void
+avic_kick_target_vcpus_nested_broadcast(struct vcpu_svm *svm,
+					int *index, bool *invalid_page)
+{
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+	int l2_apicid;
+
+	/*
+	 * This races against the guest changing the valid bit in the
+	 * physid table and/or increasing the number of entries of the
+	 * table.
+	 *
+	 * In both cases the race would happen on real hardware as well,
+	 * thus this code can avoid synchronization with write tracking.
+	 */
+	for_each_set_bit(l2_apicid, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT)
+		avic_kick_target_vcpu_nested_physical(svm, l2_apicid,
+						      index, invalid_page);
+}
+
+static void avic_kick_target_vcpus_nested(struct kvm_vcpu *vcpu,
+					  struct kvm_lapic *source,
+					  u32 icrl, u32 icrh,
+					  int *index, bool *invalid_page)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int dest = GET_APIC_DEST_FIELD(icrh);
+
+	switch (icrl & APIC_SHORT_MASK) {
+	case APIC_DEST_NOSHORT:
+		if (dest == 0xFF)
+			avic_kick_target_vcpus_nested_broadcast(svm,
+								index, invalid_page);
+		else if (icrl & APIC_DEST_MASK)
+			avic_kick_target_vcpus_nested_logical(svm, dest,
+							      index, invalid_page);
+		else
+			avic_kick_target_vcpu_nested_physical(svm, dest,
+							      index, invalid_page);
+		break;
+	case APIC_DEST_ALLINC:
+	case APIC_DEST_ALLBUT:
+		avic_kick_target_vcpus_nested_broadcast(svm, index, invalid_page);
+		break;
+	case APIC_DEST_SELF:
+		break;
+	}
+}
+
 int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -924,10 +1082,20 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 	u32 icrl = svm->vmcb->control.exit_info_1;
 	u32 id = svm->vmcb->control.exit_info_2 >> 32;
 	u32 index = svm->vmcb->control.exit_info_2 & 0x1FF;
+	int nindex = -1;
+	bool invalid_page = false;
+
 	struct kvm_lapic *apic = vcpu->arch.apic;

 	trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);

+	if (is_guest_mode(&svm->vcpu)) {
+		if (WARN_ON_ONCE(!nested_avic_in_use(vcpu)))
+			return 1;
+		if (WARN_ON_ONCE(!svm->nested.l2_physical_id_table))
+			return 1;
+	}
+
 	switch (id) {
 	case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
 		/*
@@ -939,23 +1107,49 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 		 * which case KVM needs to emulate the ICR write as well in
 		 * order to clear the BUSY flag.
 		 */
+		if (is_guest_mode(&svm->vcpu)) {
+			nested_svm_vmexit(svm);
+			break;
+		}
+
 		if (icrl & APIC_ICR_BUSY)
 			kvm_apic_write_nodecode(vcpu, APIC_ICR);
 		else
 			kvm_apic_send_ipi(apic, icrl, icrh);
+
 		break;
 	case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING:
 		/*
 		 * At this point, we expect that the AVIC HW has already
 		 * set the appropriate IRR bits on the valid target
 		 * vcpus. So, we just need to kick the appropriate vcpu.
+		 *
+		 * If nested, KVM might also need to reflect the VM exit to
+		 * the guest.
 		 */
-		avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
+		if (!is_guest_mode(&svm->vcpu)) {
+			avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
+			break;
+		}
+
+		avic_kick_target_vcpus_nested(vcpu, apic, icrl, icrh,
+					      &nindex, &invalid_page);
+		if (nindex != -1) {
+			if (invalid_page)
+				id = AVIC_IPI_FAILURE_INVALID_BACKING_PAGE;
+
+			svm->vmcb->control.exit_info_2 = ((u64)id << 32) | nindex;
+			nested_svm_vmexit(svm);
+		}
 		break;
 	case AVIC_IPI_FAILURE_INVALID_TARGET:
+		if (is_guest_mode(&svm->vcpu))
+			nested_svm_vmexit(svm);
+		else
+			WARN_ON_ONCE(1);
 		break;
 	case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
-		WARN_ONCE(1, "Invalid backing page\n");
+		WARN_ON_ONCE(1);
 		break;
 	default:
 		pr_err("Unknown IPI interception\n");
@@ -1064,9 +1258,13 @@ static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)

 static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 offset = to_svm(vcpu)->vmcb->control.exit_info_1 &
 				AVIC_UNACCEL_ACCESS_OFFSET_MASK;

+	if (WARN_ON_ONCE(is_guest_mode(&svm->vcpu)))
+		return 0;
+
 	switch (offset) {
 	case APIC_LDR:
 		if (avic_handle_ldr_update(vcpu))
@@ -1124,6 +1322,8 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu)
 			AVIC_UNACCEL_ACCESS_WRITE_MASK;
 	bool trap = is_avic_unaccelerated_access_trap(offset);

+	WARN_ON_ONCE(is_guest_mode(&svm->vcpu));
+
 	trace_kvm_avic_unaccelerated_access(vcpu->vcpu_id, offset,
 					    trap, write, vector);
 	if (trap) {
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index eb5e9b600e052..decc665d7cc69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1320,6 +1320,20 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		vmexit = NESTED_EXIT_DONE;
 		break;
 	}
+	case SVM_EXIT_AVIC_UNACCELERATED_ACCESS: {
+		/*
+		 * Unaccelerated AVIC access is always reflected.
+		 * Also there is no intercept bit for it.
+		 */
+		vmexit = NESTED_EXIT_DONE;
+		break;
+	}
+	case SVM_EXIT_AVIC_INCOMPLETE_IPI:
+		/*
+		 * Doesn't have an intercept bit, so the host needs to check
+		 * whether to reflect it to the guest or handle it by itself.
+		 */
+		break;
 	default: {
 		if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
Peter Anvin" , Brijesh Singh , Joerg Roedel , x86@kernel.org, Maxim Levitsky , Ingo Molnar , Zhi Wang , Tom Lendacky , intel-gfx@lists.freedesktop.org, Borislav Petkov , Rodrigo Vivi , Thomas Gleixner , intel-gvt-dev@lists.freedesktop.org, Jim Mattson , Tvrtko Ursulin , Sean Christopherson , linux-kernel@vger.kernel.org, Paolo Bonzini , Vitaly Kuznetsov Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This patch implements the doorbell msr emulation for nested AVIC. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/avic.c | 49 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/svm/svm.c | 2 ++ arch/x86/kvm/svm/svm.h | 1 + 3 files changed, 52 insertions(+) diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index e8c53fd77f0b1..149df26e17462 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -1165,6 +1165,55 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu) return 0; } +int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data) +{ + int source_l1_apicid = vcpu->vcpu_id; + int target_l1_apicid = data & AVIC_DOORBELL_PHYSICAL_ID_MASK; + bool target_running, target_nested; + struct kvm_vcpu *target; + struct vcpu_svm *svm = to_svm(vcpu); + + if (!svm->avic_enabled || (data & ~AVIC_DOORBELL_PHYSICAL_ID_MASK)) + return 1; + + target = avic_vcpu_by_l1_apicid(vcpu->kvm, target_l1_apicid); + if (!target) + /* Guest bug: targeting invalid APIC ID. */ + return 0; + + target_running = READ_ONCE(target->mode) == IN_GUEST_MODE; + target_nested = is_guest_mode(target); + + trace_kvm_avic_nested_doorbell(source_l1_apicid, target_l1_apicid, + target_nested, target_running); + + /* + * Target is not in the nested mode, thus the doorbell doesn't affect it. + * If it just became nested after is_guest_mode was checked, + * it means that it just processed AVIC state and KVM doesn't need + * to send it another doorbell. + */ + if (!target_nested) + return 0; + + /* + * If the target vCPU is in guest mode, kick the real doorbell. + * Otherwise KVM needs to try to wake it up if it was sleeping. + * + * If the target is not longer in guest mode (just exited it), + * it will either halt and before that it will notice pending IRR + * bits, and cancel halting, or it will enter the guest mode again, + * and notice the IRR bits as well. 
+ */ + if (target_running) + wrmsr(MSR_AMD64_SVM_AVIC_DOORBELL, + kvm_cpu_get_apicid(READ_ONCE(target->cpu)), 0); + else + kvm_vcpu_wake_up(target); + + return 0; +} + static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat) { struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index d96a73931d1e5..b31bab832360e 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2772,6 +2772,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) u32 ecx = msr->index; u64 data = msr->data; switch (ecx) { + case MSR_AMD64_SVM_AVIC_DOORBELL: + return avic_emulate_doorbell_write(vcpu, data); case MSR_AMD64_TSC_RATIO: if (!svm->tsc_scaling_enabled) { diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 93fd9d6f5fd85..14e2c5c451cad 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -714,6 +714,7 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu); void avic_reload_apic_pages(struct kvm_vcpu *vcpu); void avic_free_nested(struct kvm_vcpu *vcpu); bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu); +int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data); struct avic_physid_table * avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn); From patchwork Wed Apr 27 20:03:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 12829525 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 781CCC433FE for ; Wed, 27 Apr 2022 20:06:00 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8E35E10E5C4; Wed, 27 Apr 2022 20:05:59 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id E2AC910E5C3 for ; Wed, 27 Apr 2022 20:05:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1651089956; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GV3d6wfnLZ9DPMhIuJkRiJ9c+aPcmjflnu/p8xzABDY=; b=Ektl3BjUqKhFySJe2iQ9pWl40uZ/9vmkPd/PWwwGuFwner5J/HcKniEQULrP+tvglpsR2Q qyflSiWUgeydknydNPrLlNJ/SwMiW/F7odhQZZyXRaKs8wQX9hNXl39uflFqhV+VG+gEmC D55K8G8ydmWNxbGJaL2URXblARh6EQs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-553-YDhqTfXTMQaHEF-ulFcYlQ-1; Wed, 27 Apr 2022 16:05:52 -0400 X-MC-Unique: YDhqTfXTMQaHEF-ulFcYlQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6EC88802812; Wed, 27 Apr 2022 20:05:51 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.192.41]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3B0DB9E74; Wed, 27 Apr 
From patchwork Wed Apr 27 20:03:13 2022
From: Maxim Levitsky <mlevitsk@redhat.com>
To: kvm@vger.kernel.org
Subject: [RFC PATCH v3 18/19] KVM: x86: SVM/nSVM: add optional non strict AVIC doorbell mode
Date: Wed, 27 Apr 2022 23:03:13 +0300
Message-Id: <20220427200314.276673-19-mlevitsk@redhat.com>
In-Reply-To: <20220427200314.276673-1-mlevitsk@redhat.com>
References: <20220427200314.276673-1-mlevitsk@redhat.com>

By default, the peers of a vCPU can send it doorbell messages only while
that vCPU is assigned (loaded on) a physical CPU. When doorbell messages
are not allowed, all of the vCPU's peers get VM exits instead, which is
suboptimal when the vCPU is not halted and is only temporarily out of
guest mode due to being scheduled out and/or having a userspace VM exit.
In this case the peers can't make the vCPU enter guest mode any faster,
so the VM exits they take do no good.

Therefore this patch introduces a new non-strict mode (disabled by
default, enabled by setting the avic_doorbell_strict kvm_amd module
parameter to 0) in which, when a vCPU is scheduled out but not halted,
its peers may keep sending doorbell messages to the physical CPU where
the vCPU last ran.

Security-wise, a malicious guest with a compromised guest kernel can, in
this mode, sometimes slow down whatever is running on the physical CPU
where one of its vCPUs last ran, by spamming that CPU with doorbell
messages (hammering on the ICR) from another of its vCPUs. Thus this
mode is disabled by default.

However, if the admin policy is a 1:1 vCPU/pCPU mapping, this mode can
be useful to avoid VM exits when, for example, a vCPU takes a userspace
VM exit. The sketch below models the difference between the two modes.
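The following standalone C model illustrates the policy difference; all names are invented stand-ins for the real state (the is_running bit lives in the AVIC physical ID entry, and the parameter models avic_doorbell_strict).

#include <stdbool.h>
#include <stdio.h>

static bool strict = true;  /* models the avic_doorbell_strict param */

struct vcpu {
	bool is_running; /* models AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK */
	bool blocking;   /* vCPU executed HLT and waits for an interrupt  */
};

static void vcpu_put(struct vcpu *v)
{
	/* Scheduled out: strict mode always revokes doorbells;
	 * non-strict mode revokes them only for a blocking vCPU. */
	if (strict || v->blocking)
		v->is_running = false;
}

static void peer_send_ipi(const struct vcpu *target)
{
	if (target->is_running)
		printf("doorbell sent straight to the last pCPU\n");
	else
		printf("VM exit: host must kick/wake the target\n");
}

int main(void)
{
	struct vcpu v = { .is_running = true, .blocking = false };

	vcpu_put(&v);       /* preempted, not halted */
	peer_send_ipi(&v);  /* strict: peers take a VM exit */

	strict = false;
	v.is_running = true;
	vcpu_put(&v);
	peer_send_ipi(&v);  /* non-strict: doorbell goes through */
	return 0;
}

With the real module, since the parameter is read-only (0444), the non-strict behaviour would be selected at load time, e.g. with "modprobe kvm_amd avic_doorbell_strict=0".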
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 16 +++++++++-------
 arch/x86/kvm/svm/svm.c  | 25 +++++++++++++++++++++----
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 149df26e17462..4bf0f00f13c12 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1704,7 +1704,7 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)

 void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	u64 entry;
+	u64 old_entry, new_entry;
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);

@@ -1723,14 +1723,16 @@ void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_vcpu_is_blocking(vcpu))
 		return;

-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+	old_entry = READ_ONCE(*(svm->avic_physical_id_cache));
+	new_entry = old_entry;

-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
-	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-	entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+	new_entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+	new_entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+	new_entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+
+	if (old_entry != new_entry)
+		WRITE_ONCE(*(svm->avic_physical_id_cache), new_entry);

-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
 }

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b31bab832360e..099329711ad13 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -191,6 +191,10 @@ module_param(avic, bool, 0444);
 static bool force_avic;
 module_param_unsafe(force_avic, bool, 0444);

+static bool avic_doorbell_strict = true;
+module_param(avic_doorbell_strict, bool, 0444);
+
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);

@@ -1402,10 +1406,23 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_apicv_active(vcpu))
-		__avic_vcpu_put(vcpu);
-
-	__nested_avic_put(vcpu);
+	/*
+	 * Forbid this vCPU's peers from sending it doorbell messages,
+	 * unless the non-strict doorbell mode is used.
+	 *
+	 * In that mode, doorbell messages are forbidden only when a vCPU
+	 * blocks, since only in this case does correctness require
+	 * intercepting an IPI in order to wake the vCPU up.
+	 *
+	 * However, this reduces the isolation of the guest, since a flood
+	 * of spurious doorbell messages can slow down a CPU running
+	 * another task while this vCPU is scheduled out.
+	 */
+	if (avic_doorbell_strict) {
+		if (kvm_vcpu_apicv_active(vcpu))
+			__avic_vcpu_put(vcpu);
+		__nested_avic_put(vcpu);
+	}

 	svm_prepare_host_switch(vcpu);
Peter Anvin" , Brijesh Singh , Joerg Roedel , x86@kernel.org, Maxim Levitsky , Ingo Molnar , Zhi Wang , Tom Lendacky , intel-gfx@lists.freedesktop.org, Borislav Petkov , Rodrigo Vivi , Thomas Gleixner , intel-gvt-dev@lists.freedesktop.org, Jim Mattson , Tvrtko Ursulin , Sean Christopherson , linux-kernel@vger.kernel.org, Paolo Bonzini , Vitaly Kuznetsov Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This patch enables and exposes to the nested guest the support for the nested AVIC. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/svm.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 099329711ad13..431281ccc40ef 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4087,6 +4087,9 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC)) kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_X2APIC); } + + svm->avic_enabled = enable_apicv && guest_cpuid_has(vcpu, X86_FEATURE_AVIC); + init_vmcb_after_set_cpuid(vcpu); } @@ -4827,6 +4830,9 @@ static __init void svm_set_cpu_caps(void) if (vgif) kvm_cpu_cap_set(X86_FEATURE_VGIF); + if (enable_apicv) + kvm_cpu_cap_set(X86_FEATURE_AVIC); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); }