From patchwork Fri Jul 12 17:00:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732004
Date: Fri, 12 Jul 2024 17:00:39 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Mime-Version: 1.0
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
X-Mailer: b4 0.14-dev
Message-ID: <20240712-asi-rfc-24-v1-21-144b319a40d8@google.com>
Subject: [PATCH 21/26] KVM: x86: asi: Restricted address space for VM execution
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Paolo Bonzini , Alexandre Chartre , Liran Alon , Jan Setje-Eilers , Catalin Marinas , Will Deacon , Mark Rutland , Andrew Morton , Mel Gorman , Lorenzo Stoakes , David Hildenbrand , Vlastimil Babka , Michal Hocko , Khalid Aziz , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Valentin Schneider , Paul Turner , Reiji Watanabe , Junaid Shahid , Ofir Weisse , Yosry Ahmed , Patrick Bellasi , KP Singh , Alexandra Sandulescu , Matteo Rizzo , Jann Horn Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: 149D3A0027 X-Stat-Signature: a39yoypjdhz3iihu4bzrsipry9sppj1x X-HE-Tag: 1720803713-947317 X-HE-Meta: U2FsdGVkX18RcjSIQxBdn/ei2hiYvuktrHsN6M0B/iAM3UjgTgM8ESmKwYD+LQXHjyEkk+RlG+EQrVq/f5NpfjK9u0eWd0dPhANCXmzw//y524kF53hJYfalu7vN2y8XPTD4rAiNaxXFETXdttmr4cvXsBvbtPSkwkIIc0Fdq85VTaITclm6KW9iBtLtDVhQW8wOFD6SbMo+w44uPYpmkBLDoT5btv+N0IS6DHhGlqSCH4n0gyLim/xghGsy7GTyOGGTpUGV35NhR8F+iy4E/aP7OnYO0jeqGpG3wWmzEM2BjJQ77nc6dyNS1yTp55yy2Dacj6ocdKGPYpoG0dIbEedOz+m33DUbtbnhQjPDGwlVQ2s0HjvDcnTutBDAtTB7T0X0xnNCoPB4NPFGIwabTvL1Gpz8f4LsvjLtegNyx8ZJ/M+1O6vb/alBtmm+vlo+8xIkg827083nPWfEsWvvU96vHLjf7x4ZqB/VruLBoyzgHUPK/lnPrSQkIOVl+ammyEMLMOKh2PwtvOj6+o5IJH4bN/jy+zHxdGPdzoF12dz9siE2PRhtMMmGgAfyeyT1+YRztYlh0B19oN9Vnx/FWM629VqKSKhqDslOtCrAZsttamHdNN7g7heYoQ9voR8c1ELEgju+hEMPlrVExaX9dD60gAZK7Hwv8wn77jsqjqlxToTgCKzdqUeMHYrx/3raqvu/2rMVb7jVxCKrv7kCqwqIk7RsnalgzPhINS/YUi4zmhIMBATTwzhEWGpSTcXfjpbsj8+c+dTvcjRtcDyVBTh/+vhJ4/aJYw0SQeCOa4lrVdblmVqDknCTQQrn51aqvZz2JpEmkutPjDegY4AGx+CWHxy2OfeE9UpGIyvjzZXbPqNcrvGpH/FxKhdXPdyltS2S9KGUX4lRPDyTHhmS7wORawxL8Xz3QUb7MhvBWWPGHvpzVsAgzahhJE166x53+4sdR/h57O+zJ+6j7qT 32YTbDwz wMilQB47zPP6cV0HRU7lXcotscZOLUIQ+gYMI6tuj81/FxgUhzx93iADrSkrqHhhsLo/3Onmk5BeeE8eb6UXOgnExUyR2ZvkMILfYuRCydx4sue3A1KCNtYm4XVHSTIco28Al7XfvBK9Nwl1O8+mIPiGMG4IETc/AYzSJ1FUQ8ljLhXX7KTQfYZNChylqxS2P1DMaj6EgKCg/ppVNgBVSsZ1uxqY77O4YJQRgqs+xtYn7ILAlZXaOwXkRSHNch73wYwGjsMpGOYAHJ0M6S3YvWXslFwNCbo1TkgxVzRhfdFFd3/7PEWCzFRC2/lWH1hsGXYIEEc2u4QtrPyA+J/VIj4gsbqUPgPfbKbHq1FK4uCgl5Xm9MXQzbxnCzH7F07I7MrRCl3nnkfonVnJ4KYvdNGL0n5qKVQkbsWGaJaPvGkIUeCcL3hoE4DB3Ii9NITBxXRXcVFrVjlYU0zgy5ZbD6SoXhMwIb/uc+GLAnDH8ublwICepzp8vXCDIXWNhaYgeR/KKqFApsdGDTWZC5AgDRV4eiw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: An ASI restricted address space is added for KVM. It is currently only enabled for Intel CPUs. This change incorporates an extra asi_exit at the end of vcpu_run. We expect later iterations of ASI to drop that call as we gain the ablity to context switch within the ASI domain. 
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 36 ++++++++++++++++++++++--------------
 arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++--
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6efd1497b0263..6c3326cb8273c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -1514,6 +1515,8 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	struct asi *asi;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9aaf83c8d57df..6f9a279c12dc7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4108,6 +4108,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
 	guest_state_enter_irqoff();
 
 	amd_clear_divider();
+	asi_enter(vcpu->kvm->arch.asi);
 
 	if (sev_es_guest(vcpu->kvm))
 		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted,
@@ -4115,6 +4116,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
 	else
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);
 
+	asi_relax();
 	guest_state_exit_irqoff();
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22411f4aff530..1105d666a8ade 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -7255,14 +7256,32 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long cr3;
 
 	guest_state_enter_irqoff();
 
+	asi_enter(vcpu->kvm->arch.asi);
+
+	/*
+	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
+	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
+	 * it switches back to the current->mm, which can occur in KVM context
+	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
+	 * toggles a static key while handling a VM-Exit.
+	 * Also, this must be done after asi_enter(), as it changes CR3
+	 * when switching address spaces.
+	 */
+	cr3 = __get_current_cr3_fast();
+	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
+		vmcs_writel(HOST_CR3, cr3);
+		vmx->loaded_vmcs->host_state.cr3 = cr3;
+	}
 
 	/*
 	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
 	 * mitigation for MDS is done late in VMentry and is still
 	 * executed in spite of L1D Flush. This is because an extra VERW
 	 * should not matter much after the big hammer L1D Flush.
+	 * This is only after asi_enter() for performance reasons.
 	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -7283,6 +7302,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vmx->idt_vectoring_info = 0;
 
+	asi_relax();
+
 	vmx_enable_fb_clear(vmx);
 
 	if (unlikely(vmx->fail)) {
@@ -7311,7 +7332,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long cr3, cr4;
+	unsigned long cr4;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -7354,19 +7375,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
 	vcpu->arch.regs_dirty = 0;
 
-	/*
-	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
-	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
-	 * it switches back to the current->mm, which can occur in KVM context
-	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
-	 * toggles a static key while handling a VM-Exit.
-	 */
-	cr3 = __get_current_cr3_fast();
-	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
-		vmcs_writel(HOST_CR3, cr3);
-		vmx->loaded_vmcs->host_state.cr3 = cr3;
-	}
-
 	cr4 = cr4_read_shadow();
 	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
 		vmcs_writel(HOST_CR4, cr4);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af08..b9947e88d4ac6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -85,6 +85,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -318,6 +319,8 @@ u64 __read_mostly host_xcr0;
 
 static struct kmem_cache *x86_emulator_cache;
 
+static int __read_mostly kvm_asi_index = -1;
+
 /*
  * When called, it means the previous get/set msr reached an invalid msr.
  * Return true if we want to ignore/silent this failed msr access.
@@ -9750,6 +9753,11 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r)
 		goto out_free_percpu;
 
+	r = asi_register_class("KVM", NULL);
+	if (r < 0)
+		goto out_mmu_exit;
+	kvm_asi_index = r;
+
 	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -9767,7 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	r = ops->hardware_setup();
 	if (r != 0)
-		goto out_mmu_exit;
+		goto out_asi_unregister;
 
 	kvm_ops_update(ops);
 
@@ -9820,6 +9828,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_unwind_ops:
 	kvm_x86_ops.hardware_enable = NULL;
 	static_call(kvm_x86_hardware_unsetup)();
+out_asi_unregister:
+	asi_unregister_class(kvm_asi_index);
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
@@ -9851,6 +9861,7 @@ void kvm_x86_vendor_exit(void)
 	cancel_work_sync(&pvclock_gtod_work);
 #endif
 	static_call(kvm_x86_hardware_unsetup)();
+	asi_unregister_class(kvm_asi_index);
 	kvm_mmu_vendor_module_exit();
 	free_percpu(user_return_msrs);
 	kmem_cache_destroy(x86_emulator_cache);
@@ -11436,6 +11447,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 	r = vcpu_run(vcpu);
 
+	/*
+	 * At present ASI doesn't have the capability to transition directly
+	 * from the restricted address space to the user address space. So we
+	 * just return to the unrestricted address space in between.
+	 */
+	asi_exit();
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	if (kvm_run->kvm_valid_regs)
@@ -12539,10 +12557,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_mmu_init_vm(kvm);
 
-	ret = static_call(kvm_x86_vm_init)(kvm);
+	ret = asi_init(kvm->mm, kvm_asi_index, &kvm->arch.asi);
 	if (ret)
 		goto out_uninit_mmu;
 
+	ret = static_call(kvm_x86_vm_init)(kvm);
+	if (ret)
+		goto out_asi_destroy;
+
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
@@ -12579,6 +12601,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	return 0;
 
+out_asi_destroy:
+	asi_destroy(kvm->arch.asi);
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
@@ -12720,6 +12744,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);
 	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
+	asi_destroy(kvm->arch.asi);
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);
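
The pairing of the ASI calls introduced above, reduced to a sketch (not
compilable code; the asi_*() calls and kvm_asi_index are taken from the
patch, and the comments name the KVM functions each pair lands in):

	/* Module init/exit: kvm_x86_vendor_init() / kvm_x86_vendor_exit() */
	kvm_asi_index = asi_register_class("KVM", NULL);
	...
	asi_unregister_class(kvm_asi_index);

	/* VM creation/teardown: kvm_arch_init_vm() / kvm_arch_destroy_vm() */
	ret = asi_init(kvm->mm, kvm_asi_index, &kvm->arch.asi);
	...
	asi_destroy(kvm->arch.asi);

	/* Each VM-Enter: svm_vcpu_enter_exit() / vmx_vcpu_enter_exit() */
	asi_enter(vcpu->kvm->arch.asi);
	...	/* guest runs */
	asi_relax();

	/* End of vcpu_run(): kvm_arch_vcpu_ioctl_run() */
	asi_exit();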