From patchwork Wed Dec 11 01:32:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13902733
Reply-To: Sean Christopherson
Date: Tue, 10 
Dec 2024 17:32:58 -0800
In-Reply-To: <20241211013302.1347853-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20241211013302.1347853-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog
Message-ID: <20241211013302.1347853-2-seanjc@google.com>
Subject: [PATCH 1/5] KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson

Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initialization to
avoid the overhead of CPUID during runtime.  The offset, size, and metadata
for CPUID.0xD.[1..n] sub-leaves do not depend on XCR0 or XSS values, i.e.
are constant for a given CPU, and thus can be cached during module load.

On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point where
recomputing XSAVE offsets and sizes results in a 4x increase in latency of
nested VM-Enter and VM-Exit (nested transitions can trigger
xstate_required_size() multiple times per transition), relative to using
cached values.  The issue is easily visible by running `perf top` while
triggering nested transitions: kvm_update_cpuid_runtime() shows up at a
whopping 50%.

As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test
and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested
roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald
Rapids (EMR) takes (in cycles):

  SKX 11650
  ICX 22350
  EMR 28850

Using cached values, the latency drops to:

  SKX 6850
  ICX 9000
  EMR 7900

The underlying issue is that CPUID itself is slow on ICX, and comically
slow on EMR.
The problem is exacerbated on CPUs which support XSAVES and/or
XSAVEC, as KVM invokes xstate_required_size() twice on each runtime CPUID
update, and because there are more supported XSAVE features (CPUID for
supported XSAVE feature sub-leaves is significantly slower).

 SKX:
  CPUID.0xD.2  = 348 cycles
  CPUID.0xD.3  = 400 cycles
  CPUID.0xD.4  = 276 cycles
  CPUID.0xD.5  = 236 cycles

 EMR:
  CPUID.0xD.2  = 1138 cycles
  CPUID.0xD.3  = 1362 cycles
  CPUID.0xD.4  = 1068 cycles
  CPUID.0xD.5  = 910 cycles
  CPUID.0xD.6  = 914 cycles
  CPUID.0xD.7  = 1350 cycles
  CPUID.0xD.8  = 734 cycles
  CPUID.0xD.9  = 766 cycles
  CPUID.0xD.10 = 732 cycles
  CPUID.0xD.11 = 718 cycles
  CPUID.0xD.12 = 734 cycles
  CPUID.0xD.13 = 1700 cycles
  CPUID.0xD.14 = 1126 cycles
  CPUID.0xD.15 = 898 cycles
  CPUID.0xD.16 = 716 cycles
  CPUID.0xD.17 = 748 cycles
  CPUID.0xD.18 = 776 cycles

Note, updating runtime CPUID information multiple times per nested
transition is itself a flaw, especially since CPUID is a mandatory
intercept on both Intel and AMD.  E.g. KVM doesn't need to ensure emulated
CPUID state is up-to-date while running L2.  That flaw will be fixed in a
future patch, as deferring runtime CPUID updates is more subtle than it
appears at first glance, the benefits aren't super critical to have once
the XSAVE issue is resolved, and caching CPUID output is desirable even if
KVM's updates are deferred.
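
For readers who want to poke at the idea outside the kernel, the caching
scheme described above can be modeled in plain userspace C.  This is only a
sketch under stated assumptions: slow_query() is a hypothetical stand-in
for CPUID.0xD.i, the table contents are illustrative values rather than
measurements, and names such as xstate_cache[] and required_size() are
made up for the example and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_XFEATURES       8u   /* hypothetical table size for this sketch */
#define FIRST_EXT_XFEATURE 2u   /* components 0 and 1 live in the legacy area */
#define LEGACY_PLUS_HDR    576u /* 512B legacy region + 64B XSAVE header */
#define ALIGN64(x)         (((x) + 63u) & ~63u)

struct xstate_entry {
	uint32_t size;   /* models CPUID.0xD.i EAX */
	uint32_t offset; /* models CPUID.0xD.i EBX (standard format) */
	uint32_t flags;  /* models CPUID.0xD.i ECX */
};

static struct xstate_entry xstate_cache[NR_XFEATURES];
static unsigned int slow_queries; /* counts calls to the CPUID stand-in */

/* Hypothetical stand-in for CPUID.0xD.i; values are illustrative only. */
static void slow_query(unsigned int i, struct xstate_entry *e)
{
	static const struct xstate_entry hw[NR_XFEATURES] = {
		[2] = { 256,  576,  0 },
		[5] = { 64,   1088, 0 },
		[6] = { 512,  1152, 0 },
		[7] = { 1024, 1664, 0 },
	};

	assert(i < NR_XFEATURES);
	slow_queries++;
	*e = hw[i];
}

/* One-time snapshot, analogous to doing the work at module init. */
static void init_xstate_cache(void)
{
	for (unsigned int i = FIRST_EXT_XFEATURE; i < NR_XFEATURES; i++)
		slow_query(i, &xstate_cache[i]);
}

/* Hot path: consults only the cached table, never the slow query. */
static uint32_t required_size(uint64_t bv, bool compacted)
{
	uint32_t ret = LEGACY_PLUS_HDR;

	for (unsigned int i = FIRST_EXT_XFEATURE; i < NR_XFEATURES; i++) {
		const struct xstate_entry *e = &xstate_cache[i];
		uint32_t offset;

		if (!(bv & (1ull << i)))
			continue;

		/* flags bit 1: 64B alignment in compacted form */
		if (compacted)
			offset = (e->flags & 0x2) ? ALIGN64(ret) : ret;
		else
			offset = e->offset;

		if (offset + e->size > ret)
			ret = offset + e->size;
	}
	return ret;
}
```

The point of the split is that slow_queries stops growing after
init_xstate_cache(), no matter how many times required_size() runs.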

Cc: Jim Mattson
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/cpuid.c | 31 ++++++++++++++++++++++++++-----
 arch/x86/kvm/cpuid.h |  1 +
 arch/x86/kvm/x86.c   |  2 ++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 572dfa7e206e..edef30359c19 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -36,6 +36,26 @@
 u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);
 
+struct cpuid_xstate_sizes {
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+};
+
+static struct cpuid_xstate_sizes xstate_sizes[XFEATURE_MAX] __ro_after_init;
+
+void __init kvm_init_xstate_sizes(void)
+{
+	u32 ign;
+	int i;
+
+	for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes); i++) {
+		struct cpuid_xstate_sizes *xs = &xstate_sizes[i];
+
+		cpuid_count(0xD, i, &xs->eax, &xs->ebx, &xs->ecx, &ign);
+	}
+}
+
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
 	int feature_bit = 0;
@@ -44,14 +64,15 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	xstate_bv &= XFEATURE_MASK_EXTEND;
 	while (xstate_bv) {
 		if (xstate_bv & 0x1) {
-			u32 eax, ebx, ecx, edx, offset;
-			cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
+			struct cpuid_xstate_sizes *xs = &xstate_sizes[feature_bit];
+			u32 offset;
+
 			/* ECX[1]: 64B alignment in compacted form */
 			if (compacted)
-				offset = (ecx & 0x2) ? ALIGN(ret, 64) : ret;
+				offset = (xs->ecx & 0x2) ? ALIGN(ret, 64) : ret;
 			else
-				offset = ebx;
-			ret = max(ret, offset + eax);
+				offset = xs->ebx;
+			ret = max(ret, offset + xs->eax);
 		}
 
 		xstate_bv >>= 1;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 3d69a0ef8268..67d80aa72d50 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -31,6 +31,7 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 	       u32 *ecx, u32 *edx, bool exact_only);
 
+void __init kvm_init_xstate_sizes(void);
 u32 xstate_required_size(u64 xstate_bv, bool compacted);
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc4563fb07d1..320764e5f798 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13982,6 +13982,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
 
 static int __init kvm_x86_init(void)
 {
+	kvm_init_xstate_sizes();
+
 	kvm_mmu_x86_module_init();
 	mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible();
 	return 0;

From patchwork Wed Dec 11 01:32:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13902734
Reply-To: Sean Christopherson
Date: Tue, 10 Dec 2024 17:32:59 -0800
In-Reply-To: <20241211013302.1347853-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20241211013302.1347853-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog
Message-ID: <20241211013302.1347853-3-seanjc@google.com>
Subject: [PATCH 2/5] KVM: x86: Use for-loop to iterate over XSTATE size entries
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson

Rework xstate_required_size() to use a for-loop and continue, to make it
more obvious that the xstate_sizes[] lookups are indeed correctly bounded,
and to make it (hopefully) easier to understand that the loop is iterating
over supported XSAVE features.
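
To illustrate the bounding argument in isolation, here is a small
userspace sketch that contrasts the two loop shapes.  Everything in it is
an assumption for the sake of the example: sizes[], TABLE_LEN, FIRST_IDX
and the two helper functions are hypothetical and merely mimic the
structure of the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 8 /* hypothetical table length */
#define FIRST_IDX 2 /* first index the table describes */

static const uint32_t sizes[TABLE_LEN] = { 0, 0, 256, 0, 0, 64, 512, 1024 };

/*
 * Shift-based form: the index stays in range only because the bitmap is
 * masked down to the table's range before the loop starts.
 */
static uint32_t total_while(uint64_t bv)
{
	uint32_t total = 0;
	int bit = 0;

	bv &= (1ull << TABLE_LEN) - 1;  /* drop bits beyond the table */
	bv &= ~((1ull << FIRST_IDX) - 1); /* drop bits below FIRST_IDX */

	while (bv) {
		if (bv & 1) {
			assert(bit < TABLE_LEN);
			total += sizes[bit];
		}
		bv >>= 1;
		bit++;
	}
	return total;
}

/*
 * for-loop form: the array bound is part of the loop condition, so the
 * lookup is in range regardless of which bits the caller passes in.
 */
static uint32_t total_for(uint64_t bv)
{
	uint32_t total = 0;

	for (size_t i = FIRST_IDX; i < TABLE_LEN && bv; i++) {
		if (!(bv & (1ull << i)))
			continue;

		total += sizes[i];
		bv &= ~(1ull << i); /* lets the loop exit early once empty */
	}
	return total;
}
```

Both helpers return the same totals for in-range bitmaps; the difference
is that total_for() needs no pre-masking to stay within sizes[].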

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/cpuid.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index edef30359c19..f73af4a98c35 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -58,25 +58,24 @@ void __init kvm_init_xstate_sizes(void)
 
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
-	int feature_bit = 0;
 	u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
+	int i;
 
 	xstate_bv &= XFEATURE_MASK_EXTEND;
-	while (xstate_bv) {
-		if (xstate_bv & 0x1) {
-			struct cpuid_xstate_sizes *xs = &xstate_sizes[feature_bit];
-			u32 offset;
+	for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes) && xstate_bv; i++) {
+		struct cpuid_xstate_sizes *xs = &xstate_sizes[i];
+		u32 offset;
 
-			/* ECX[1]: 64B alignment in compacted form */
-			if (compacted)
-				offset = (xs->ecx & 0x2) ? ALIGN(ret, 64) : ret;
-			else
-				offset = xs->ebx;
-			ret = max(ret, offset + xs->eax);
-		}
+		if (!(xstate_bv & BIT_ULL(i)))
+			continue;
 
-		xstate_bv >>= 1;
-		feature_bit++;
+		/* ECX[1]: 64B alignment in compacted form */
+		if (compacted)
+			offset = (xs->ecx & 0x2) ? ALIGN(ret, 64) : ret;
+		else
+			offset = xs->ebx;
+		ret = max(ret, offset + xs->eax);
+		xstate_bv &= ~BIT_ULL(i);
 	}
 
 	return ret;

From patchwork Wed Dec 11 01:33:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13902735
Reply-To: Sean Christopherson
Date: Tue, 10 Dec 2024 17:33:00 -0800
In-Reply-To: <20241211013302.1347853-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20241211013302.1347853-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog
Message-ID: <20241211013302.1347853-4-seanjc@google.com>
Subject: [PATCH 3/5] KVM: x86: Apply TSX_CTRL_CPUID_CLEAR if and only if the vCPU has RTM or HLE
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson

When emulating CPUID, retrieve MSR_IA32_TSX_CTRL.TSX_CTRL_CPUID_CLEAR if
and only if RTM and/or HLE feature bits need to be cleared.  Getting the
MSR value is unnecessary if neither bit is set, and avoiding the lookup
saves ~80 cycles for vCPUs without RTM or HLE.

Cc: Jim Mattson
Signed-off-by: Sean Christopherson
Reviewed-by: Jim Mattson
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f73af4a98c35..7f5fa6665969 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1998,7 +1998,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 		*edx = entry->edx;
 		if (function == 7 && index == 0) {
 			u64 data;
-			if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+			if ((*ebx & (feature_bit(RTM) | feature_bit(HLE))) &&
+			    !__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
 			    (data & TSX_CTRL_CPUID_CLEAR))
 				*ebx &= ~(feature_bit(RTM) | feature_bit(HLE));
 		} else if (function == 0x80000007) {

From patchwork Wed Dec 11 01:33:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13902736
Reply-To: Sean Christopherson
Date: Tue, 10 Dec 2024 17:33:01 -0800
In-Reply-To: <20241211013302.1347853-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20241211013302.1347853-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog
Message-ID: <20241211013302.1347853-5-seanjc@google.com>
Subject: [PATCH 4/5] KVM: x86: Query X86_FEATURE_MWAIT iff userspace owns the CPUID feature bit
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson

Rework MONITOR/MWAIT emulation to query X86_FEATURE_MWAIT if and only if
the MISC_ENABLE_NO_MWAIT quirk is enabled, in which case MWAIT is not a
dynamic, KVM-controlled CPUID feature.  KVM's funky ABI for that quirk is
to emulate MONITOR/MWAIT as nops if userspace sets MWAIT in guest CPUID.

For the case where KVM owns the MWAIT feature bit, check MISC_ENABLES
itself, i.e. check the actual control, not its reflection in guest CPUID.

Avoiding consumption of dynamic CPUID features will allow KVM to defer
runtime CPUID updates until kvm_emulate_cpuid(), i.e. until the updates
become visible to the guest.  Alternatively, KVM could play other games
with runtime CPUID updates, e.g. by precisely specifying which feature bits
to update, but doing so adds non-trivial complexity and doesn't solve the
underlying issue of unnecessary updates causing meaningful overhead for
nested virtualization roundtrips.
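
The decision described above can be written as a small pure function,
which may make the three cases easier to compare.  This is a userspace
model, not the kernel code: struct vcpu_model, its field names and enum
mwait_result are hypothetical, and MISC_ENABLE_MWAIT is defined locally on
the assumption that the enable bit is bit 18 of IA32_MISC_ENABLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the MONITOR/MWAIT enable bit in IA32_MISC_ENABLE. */
#define MISC_ENABLE_MWAIT (1ull << 18)

enum mwait_result { MWAIT_NOP, MWAIT_UD };

/* Hypothetical model of the vCPU state the decision depends on. */
struct vcpu_model {
	bool quirk_never_ud;  /* models MWAIT_NEVER_UD_FAULTS being enabled */
	bool quirk_no_mwait;  /* models MISC_ENABLE_NO_MWAIT being enabled */
	bool cpuid_mwait;     /* MWAIT bit as set by userspace in guest CPUID */
	uint64_t misc_enable; /* guest IA32_MISC_ENABLE value */
};

static enum mwait_result emulate_monitor_mwait(const struct vcpu_model *v)
{
	bool enabled;

	/* Quirk 1: never fault, always emulate as a NOP. */
	if (v->quirk_never_ud)
		return MWAIT_NOP;

	/*
	 * Quirk 2 enabled: userspace owns the CPUID bit, so it is static
	 * and safe to consult.  Otherwise read the actual control in
	 * MISC_ENABLE rather than its reflection in guest CPUID.
	 */
	if (v->quirk_no_mwait)
		enabled = v->cpuid_mwait;
	else
		enabled = (v->misc_enable & MISC_ENABLE_MWAIT) != 0;

	return enabled ? MWAIT_NOP : MWAIT_UD;
}
```

Note that with quirk_no_mwait clear, the cpuid_mwait field is never read,
which is the property that lets the CPUID bit go stale safely.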

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 320764e5f798..dc8829712edd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2080,10 +2080,20 @@ EXPORT_SYMBOL_GPL(kvm_handle_invalid_op);
 
 static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
 {
-	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
-	    !guest_cpu_cap_has(vcpu, X86_FEATURE_MWAIT))
+	bool enabled;
+
+	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS))
+		goto emulate_as_nop;
+
+	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
+		enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_MWAIT);
+	else
+		enabled = vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT;
+
+	if (!enabled)
 		return kvm_handle_invalid_op(vcpu);
 
+emulate_as_nop:
 	pr_warn_once("%s instruction emulated as NOP!\n", insn);
 	return kvm_emulate_as_nop(vcpu);
 }

From patchwork Wed Dec 11 01:33:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13902737
Reply-To: Sean Christopherson
Date: Tue, 10 Dec 2024 17:33:02 -0800
In-Reply-To: <20241211013302.1347853-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20241211013302.1347853-1-seanjc@google.com>
X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog
Message-ID: <20241211013302.1347853-6-seanjc@google.com>
Subject: [PATCH 5/5] KVM: x86: Defer runtime updates of dynamic CPUID bits until CPUID emulation
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson

Defer runtime CPUID updates until the next non-faulting CPUID emulation or
KVM_GET_CPUID2, which are the only paths in KVM that consume the dynamic
entries.
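
As a rough illustration of the deferral pattern (not of KVM's actual
implementation), the following userspace sketch marks state dirty on every
change and recomputes only when a consumer reads the value.  All names are
hypothetical, and expensive_recompute() uses a placeholder cost model
purely so the example has something to compute.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a vCPU with one dynamic, derived CPUID value. */
struct vcpu_model {
	uint64_t xcr0;
	uint64_t xss;
	uint32_t cached_size;    /* models a dynamic CPUID output */
	bool dirty;              /* set on state change, cleared on recompute */
	unsigned int recomputes; /* instrumentation for the example */
};

/* Placeholder cost model: a fixed base plus 64 bytes per enabled bit. */
static uint32_t expensive_recompute(const struct vcpu_model *v)
{
	uint64_t bits = v->xcr0 | v->xss;
	uint32_t n = 0;

	for (; bits; bits &= bits - 1)
		n++;
	return 576 + 64 * n;
}

/* State changes only flag the derived value as stale. */
static void set_xcr0(struct vcpu_model *v, uint64_t val)
{
	v->xcr0 = val;
	v->dirty = true;
}

static void set_xss(struct vcpu_model *v, uint64_t val)
{
	v->xss = val;
	v->dirty = true;
}

/* The consumer pays for the recompute, and only if something changed. */
static uint32_t emulate_cpuid_size(struct vcpu_model *v)
{
	if (v->dirty) {
		v->cached_size = expensive_recompute(v);
		v->recomputes++;
		v->dirty = false;
	}
	return v->cached_size;
}
```

Any number of set_xcr0()/set_xss() calls between two reads collapse into
a single recompute, which is the effect the deferral is after.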
Deferring the updates is especially beneficial to nested VM-Enter/VM-Exit,
as KVM will almost always detect multiple state changes, not to mention
the updates don't need to be realized while L2 is active, as CPUID is a
mandatory intercept on both Intel and AMD. Deferring CPUID updates shaves
several hundred cycles from nested VMX roundtrips, as measured from L2
executing CPUID in a tight loop:

  SKX 6850 => 6450
  ICX 9000 => 8800
  EMR 7900 => 7700

Alternatively, KVM could update only the CPUID leaves that are affected
by the state change, e.g. update XSAVE info only if XCR0 or XSS changes,
but that adds non-trivial complexity and doesn't solve the underlying
problem of nested transitions potentially changing both XCR0 and XSS, on
both nested VM-Enter and VM-Exit.

KVM could also skip updates entirely while L2 is active, because, again,
CPUID is a mandatory intercept. However, simply skipping updates if L2 is
active is *very* subtly dangerous and complex. Most KVM updates are
triggered by changes to the current vCPU state, which may be L2 state,
whereas performing updates only for L1 would require detecting changes to
L1 state. KVM would need to either track relevant L1 state, or defer
runtime CPUID updates until the next nested VM-Exit. The former is ugly
and complex, while the latter comes with similar dangers to deferring all
CPUID updates, and would only address the nested VM-Enter path.

To guard against using stale data, disallow querying dynamic CPUID feature
bits, i.e. features that KVM updates at runtime, via a compile-time
assertion in guest_cpu_cap_has(). Exempt MWAIT from the rule, as the
MISC_ENABLE_NO_MWAIT quirk means that MWAIT is _conditionally_ a dynamic
CPUID feature.

Note, the rule could be enforced for MWAIT as well, e.g. by querying guest
CPUID in kvm_emulate_monitor_mwait(), but there's no obvious advantage to
doing so, and allowing MWAIT for guest_cpuid_has() opens up a different
can of worms.
MONITOR/MWAIT can't be virtualized (for a reasonable definition), and the
nature of the MWAIT_NEVER_UD_FAULTS and MISC_ENABLE_NO_MWAIT quirks means
checking X86_FEATURE_MWAIT outside of kvm_emulate_monitor_mwait() is wrong
for other reasons.

Beyond the aforementioned feature bits, the only other dynamic CPUID
(sub)leaves are the XSAVE sizes, and similar to MWAIT, consuming those
CPUID entries in KVM is all but guaranteed to be a bug. The layout for an
actual XSAVE buffer depends on the format (compacted or not) and
potentially the features that are actually enabled. E.g. see the logic in
fpstate_clear_xstate_component() needed to poke into the guest's effective
XSAVE state to clear MPX state on INIT. KVM does consume
CPUID.0xD.0.{EAX,EDX} in kvm_check_cpuid() and cpuid_get_supported_xcr0(),
but not EBX, which is the only dynamic output register in the leaf.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 12 ++++++++++--
 arch/x86/kvm/cpuid.h            |  9 ++++++++-
 arch/x86/kvm/lapic.c            |  2 +-
 arch/x86/kvm/smm.c              |  2 +-
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              |  6 +++---
 9 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 81ce8cd5814a..23cc5c10060e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -871,6 +871,7 @@ struct kvm_vcpu_arch {

 	int cpuid_nent;
 	struct kvm_cpuid_entry2 *cpuid_entries;
+	bool cpuid_dynamic_bits_dirty;
 	bool is_amd_compatible;

 	/*
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f5fa6665969..54ba1a75b779 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -195,6 +195,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 }

 static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu);
+static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);

 /* Check whether the supplied CPUID data is equal to what is already set for the vCPU. */
 static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
@@ -299,10 +300,12 @@ static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
 	guest_cpu_cap_change(vcpu, x86_feature, has_feature);
 }

-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
+static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;

+	vcpu->arch.cpuid_dynamic_bits_dirty = false;
+
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best) {
 		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSXSAVE,
@@ -332,7 +335,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 }
-EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
 {
@@ -645,6 +647,9 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	if (cpuid->nent < vcpu->arch.cpuid_nent)
 		return -E2BIG;

+	if (vcpu->arch.cpuid_dynamic_bits_dirty)
+		kvm_update_cpuid_runtime(vcpu);
+
 	if (copy_to_user(entries, vcpu->arch.cpuid_entries,
 			 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
 		return -EFAULT;
@@ -1983,6 +1988,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 	struct kvm_cpuid_entry2 *entry;
 	bool exact, used_max_basic = false;

+	if (vcpu->arch.cpuid_dynamic_bits_dirty)
+		kvm_update_cpuid_runtime(vcpu);
+
 	entry = kvm_find_cpuid_entry_index(vcpu, function, index);
 	exact = !!entry;

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 67d80aa72d50..d2884162a46a 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -11,7 +11,6 @@ extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);

 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
 						    u32 function, u32 index);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
@@ -232,6 +231,14 @@ static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);

+	/*
+	 * Except for MWAIT, querying dynamic feature bits is disallowed, so
+	 * that KVM can defer runtime updates until the next CPUID emulation.
+	 */
+	BUILD_BUG_ON(x86_feature == X86_FEATURE_APIC ||
+		     x86_feature == X86_FEATURE_OSXSAVE ||
+		     x86_feature == X86_FEATURE_OSPKE);
+
 	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
 }

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ae81ae27d534..cf74c87b8b3f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2585,7 +2585,7 @@ static void __kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value)
 	vcpu->arch.apic_base = value;

 	if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE)
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;

 	if (!apic)
 		return;
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index e0ab7df27b66..699e551ec93b 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -358,7 +358,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
 			goto error;
 #endif

-	kvm_update_cpuid_runtime(vcpu);
+	vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	kvm_mmu_reset_context(vcpu);
 	return;
 error:
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 09be12a44288..5e4581ed0ef1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3274,7 +3274,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)

 	if (kvm_ghcb_xcr0_is_valid(svm)) {
 		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	}

 	/* Copy the GHCB exit information into the VMCB fields */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 07911ddf1efe..6a350cee2f6c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1936,7 +1936,7 @@ void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	vmcb_mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);

 	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 }

 static void svm_set_segment(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cf872d8691b5..b5f3c5628bfd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3516,7 +3516,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	vmcs_writel(GUEST_CR4, hw_cr4);

 	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 }

 void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc8829712edd..10b7d8c01e4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1264,7 +1264,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 	vcpu->arch.xcr0 = xcr0;

 	if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	return 0;
 }

@@ -3899,7 +3899,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
 				return 1;
 			vcpu->arch.ia32_misc_enable_msr = data;
-			kvm_update_cpuid_runtime(vcpu);
+			vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		} else {
 			vcpu->arch.ia32_misc_enable_msr = data;
 		}
@@ -3934,7 +3934,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data & ~kvm_caps.supported_xss)
 			return 1;
 		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
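
For readers who want to experiment with the pattern outside the kernel, the
deferral in this patch can be modeled as a plain dirty flag: writers only
mark derived state stale, and the single reader path recomputes it on
demand. The following is a minimal userspace sketch, not KVM code. Every
name in it (toy_vcpu, toy_set_xcr0(), toy_read_size(), the size formula) is
hypothetical and chosen only for illustration. The dirty field plays the
role of cpuid_dynamic_bits_dirty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state touched by the patch. */
struct toy_vcpu {
	uint64_t xcr0;              /* source state that writers change */
	uint32_t cached_size;       /* derived value, like a dynamic CPUID output */
	bool dirty;                 /* models cpuid_dynamic_bits_dirty */
	unsigned int nr_recomputes; /* counts how often the derivation ran */
};

/* Recompute the derived value; the formula is made up for the example. */
static void toy_update_runtime(struct toy_vcpu *v)
{
	uint64_t bits = v->xcr0;
	uint32_t nr_set = 0;

	/* Clear the flag first, mirroring the order used in the patch. */
	v->dirty = false;

	for (; bits; bits >>= 1)
		nr_set += bits & 1;

	v->cached_size = 512 + 64 * nr_set;
	v->nr_recomputes++;
}

/* Writer path: only mark the derived state stale, do not recompute. */
static void toy_set_xcr0(struct toy_vcpu *v, uint64_t xcr0)
{
	if (v->xcr0 != xcr0) {
		v->xcr0 = xcr0;
		v->dirty = true;
	}
}

/* Reader path: the only consumer, so the only place that recomputes. */
static uint32_t toy_read_size(struct toy_vcpu *v)
{
	if (v->dirty)
		toy_update_runtime(v);
	return v->cached_size;
}
```

In this model, three back-to-back toy_set_xcr0() calls followed by one
toy_read_size() call run the derivation once instead of three times. That
is the same effect the patch aims for when a nested transition changes
several pieces of state before the guest next executes CPUID.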