From patchwork Sun May 1 22:07:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Maciej S. Szmigiero"
X-Patchwork-Id: 12833786
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90C00C433EF for ; Sun, 1 May 2022 22:08:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355596AbiEAWL0 (ORCPT ); Sun, 1 May 2022 18:11:26 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43734 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355565AbiEAWLY (ORCPT ); Sun, 1 May 2022 18:11:24 -0400
Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6582A1CB37; Sun, 1 May 2022 15:07:57 -0700 (PDT)
Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjH-0008MX-9t; Mon, 02 May 2022 00:07:47 +0200
From: "Maciej S. Szmigiero"
To: Paolo Bonzini , Sean Christopherson
Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 01/12] KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02
Date: Mon, 2 May 2022 00:07:25 +0200
Message-Id:
X-Mailer: git-send-email 2.35.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

From: "Maciej S. Szmigiero"

The next_rip field of a VMCB is *not* an output-only field for a VMRUN. This field value (instead of the saved guest RIP) is used by the CPU for the return address pushed on the stack when injecting a software interrupt or an INT3 or INTO exception.

Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or loading a nested state and NRIPS is exposed to L1. If NRIPS is supported in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which saves RIP on the stack as-is).

Reviewed-by: Maxim Levitsky
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Maciej S.
Szmigiero --- arch/x86/kvm/svm/nested.c | 22 +++++++++++++++++++--- arch/x86/kvm/svm/svm.h | 1 + 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index bed5e1692cef..91a1f9ff2b86 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -371,6 +371,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu, to->nested_ctl = from->nested_ctl; to->event_inj = from->event_inj; to->event_inj_err = from->event_inj_err; + to->next_rip = from->next_rip; to->nested_cr3 = from->nested_cr3; to->virt_ext = from->virt_ext; to->pause_filter_count = from->pause_filter_count; @@ -608,7 +609,8 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12 } } -static void nested_vmcb02_prepare_control(struct vcpu_svm *svm) +static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, + unsigned long vmcb12_rip) { u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK; u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK; @@ -662,6 +664,19 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm) vmcb02->control.event_inj = svm->nested.ctl.event_inj; vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err; + /* + * next_rip is consumed on VMRUN as the return address pushed on the + * stack for injected soft exceptions/interrupts. If nrips is exposed + * to L1, take it verbatim from vmcb12. If nrips is supported in + * hardware but not exposed to L1, stuff the actual L2 RIP to emulate + * what a nrips=0 CPU would do (L1 is responsible for advancing RIP + * prior to injecting the event). + */ + if (svm->nrips_enabled) + vmcb02->control.next_rip = svm->nested.ctl.next_rip; + else if (boot_cpu_has(X86_FEATURE_NRIPS)) + vmcb02->control.next_rip = vmcb12_rip; + vmcb02->control.virt_ext = vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK; if (svm->lbrv_enabled) @@ -745,7 +760,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr); svm_switch_vmcb(svm, &svm->nested.vmcb02); - nested_vmcb02_prepare_control(svm); + nested_vmcb02_prepare_control(svm, vmcb12->save.rip); nested_vmcb02_prepare_save(svm, vmcb12); ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3, @@ -1418,6 +1433,7 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst, dst->nested_ctl = from->nested_ctl; dst->event_inj = from->event_inj; dst->event_inj_err = from->event_inj_err; + dst->next_rip = from->next_rip; dst->nested_cr3 = from->nested_cr3; dst->virt_ext = from->virt_ext; dst->pause_filter_count = from->pause_filter_count; @@ -1602,7 +1618,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, nested_copy_vmcb_control_to_cache(svm, ctl); svm_switch_vmcb(svm, &svm->nested.vmcb02); - nested_vmcb02_prepare_control(svm); + nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip); /* * While the nested guest CR3 is already checked and set by diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 32220a1b0ea2..7d97e4d18c8b 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -139,6 +139,7 @@ struct vmcb_ctrl_area_cached { u64 nested_ctl; u32 event_inj; u32 event_inj_err; + u64 next_rip; u64 nested_cr3; u64 virt_ext; u32 clean; From patchwork Sun May 1 22:07:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 12833785 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2057C4332F for ; Sun, 1 May 2022 22:08:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1359529AbiEAWLc (ORCPT ); Sun, 1 May 2022 18:11:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355597AbiEAWL1 (ORCPT ); Sun, 1 May 2022 18:11:27 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1EE301C93F; Sun, 1 May 2022 15:08:00 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjM-0008Mj-L7; Mon, 02 May 2022 00:07:52 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 02/12] KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0 Date: Mon, 2 May 2022 00:07:26 +0200 Message-Id: <35426af6e123cbe91ec7ce5132ce72521f02b1b5.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: "Maciej S. Szmigiero" Don't BUG/WARN on interrupt injection due to GIF being cleared, since it's trivial for userspace to force the situation via KVM_SET_VCPU_EVENTS (even if having at least a WARN there would be correct for KVM internally generated injections). kernel BUG at arch/x86/kvm/svm/svm.c:3386! invalid opcode: 0000 [#1] SMP CPU: 15 PID: 926 Comm: smm_test Not tainted 5.17.0-rc3+ #264 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:svm_inject_irq+0xab/0xb0 [kvm_amd] Code: <0f> 0b 0f 1f 00 0f 1f 44 00 00 80 3d ac b3 01 00 00 55 48 89 f5 53 RSP: 0018:ffffc90000b37d88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810a234ac0 RCX: 0000000000000006 RDX: 0000000000000000 RSI: ffffc90000b37df7 RDI: ffff88810a234ac0 RBP: ffffc90000b37df7 R08: ffff88810a1fa410 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff888109571000 R14: ffff88810a234ac0 R15: 0000000000000000 FS: 0000000001821380(0000) GS:ffff88846fdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f74fc550008 CR3: 000000010a6fe000 CR4: 0000000000350ea0 Call Trace: inject_pending_event+0x2f7/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x791/0x17a0 [kvm] kvm_vcpu_ioctl+0x26d/0x650 [kvm] __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 219b65dcf6c0 ("KVM: SVM: Improve nested interrupt injection") Cc: stable@vger.kernel.org Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. 
Szmigiero --- arch/x86/kvm/svm/svm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 75b4f3ac8b1a..1cec671fc668 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3384,8 +3384,6 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); - BUG_ON(!(gif_set(svm))); - trace_kvm_inj_virq(vcpu->arch.interrupt.nr); ++vcpu->stat.irq_injections; From patchwork Sun May 1 22:07:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C783C433FE for ; Sun, 1 May 2022 22:08:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376452AbiEAWLk (ORCPT ); Sun, 1 May 2022 18:11:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376843AbiEAWLg (ORCPT ); Sun, 1 May 2022 18:11:36 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9EF4B1DA73; Sun, 1 May 2022 15:08:07 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjS-0008Mw-12; Mon, 02 May 2022 00:07:58 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 03/12] KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails" Date: Mon, 2 May 2022 00:07:27 +0200 Message-Id: <450133cf0a026cb9825a2ff55d02cb136a1cb111.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Unwind the RIP advancement done by svm_queue_exception() when injecting an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while vectoring the injected event, even if the exception reported by the CPU isn't the same event that was injected. If vectoring INT3 encounters an exception, e.g. #NP, and vectoring the #NP encounters an intercepted exception, e.g. #PF when KVM is using shadow paging, then the #NP will be reported as the event that was in-progress. Note, this is still imperfect, as it will get a false positive if the INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3 handler in the guest, the instruction following the INT3 generates an exception (directly or indirectly), _and_ vectoring that exception encounters an exception that is intercepted by KVM. The false positives could theoretically be solved by further analyzing the vectoring event, e.g. by comparing the error code against the expected error code were an exception to occur when vectoring the original injected exception, but SVM without NRIPS is a complete disaster, trying to make it 100% correct is a waste of time. Reviewed-by: Maxim Levitsky Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3") Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. 
Szmigiero --- arch/x86/kvm/svm/svm.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 1cec671fc668..1a719f47a964 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3698,6 +3698,18 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK; type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK; + /* + * If NextRIP isn't enabled, KVM must manually advance RIP prior to + * injecting the soft exception/interrupt. That advancement needs to + * be unwound if vectoring didn't complete. Note, the _new_ event may + * not be the injected event, e.g. if KVM injected an INTn, the INTn + * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will + * be the reported vectored event, but RIP still needs to be unwound. + */ + if (int3_injected && type == SVM_EXITINTINFO_TYPE_EXEPT && + kvm_is_linear_rip(vcpu, svm->int3_rip)) + kvm_rip_write(vcpu, kvm_rip_read(vcpu) - int3_injected); + switch (type) { case SVM_EXITINTINFO_TYPE_NMI: vcpu->arch.nmi_injected = true; @@ -3711,16 +3723,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) /* * In case of software exceptions, do not reinject the vector, - * but re-execute the instruction instead. Rewind RIP first - * if we emulated INT3 before. + * but re-execute the instruction instead. */ - if (kvm_exception_is_soft(vector)) { - if (vector == BP_VECTOR && int3_injected && - kvm_is_linear_rip(vcpu, svm->int3_rip)) - kvm_rip_write(vcpu, - kvm_rip_read(vcpu) - int3_injected); + if (kvm_exception_is_soft(vector)) break; - } + if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) { u32 err = svm->vmcb->control.exit_int_info_err; kvm_requeue_exception_e(vcpu, vector, err); From patchwork Sun May 1 22:07:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833796 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E375AC433EF for ; Sun, 1 May 2022 22:10:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377013AbiEAWLl (ORCPT ); Sun, 1 May 2022 18:11:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355597AbiEAWLi (ORCPT ); Sun, 1 May 2022 18:11:38 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC04B26118; Sun, 1 May 2022 15:08:09 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjX-0008NO-BV; Mon, 02 May 2022 00:08:03 +0200 From: "Maciej S. 
Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 04/12] KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported Date: Mon, 2 May 2022 00:07:28 +0200 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson If NRIPS is supported in hardware but disabled in KVM, set next_rip to the next RIP when advancing RIP as part of emulating INT3 injection. There is no flag to tell the CPU that KVM isn't using next_rip, and so leaving next_rip is left as is will result in the CPU pushing garbage onto the stack when vectoring the injected event. Reviewed-by: Maxim Levitsky Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3") Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. Szmigiero --- arch/x86/kvm/svm/svm.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 1a719f47a964..a05a4e521400 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -392,6 +392,10 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu) */ (void)svm_skip_emulated_instruction(vcpu); rip = kvm_rip_read(vcpu); + + if (boot_cpu_has(X86_FEATURE_NRIPS)) + svm->vmcb->control.next_rip = rip; + svm->int3_rip = rip + svm->vmcb->save.cs.base; svm->int3_injected = rip - old_rip; } @@ -3701,7 +3705,7 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) /* * If NextRIP isn't enabled, KVM must manually advance RIP prior to * injecting the soft exception/interrupt. That advancement needs to - * be unwound if vectoring didn't complete. Note, the _new_ event may + * be unwound if vectoring didn't complete. Note, the new event may * not be the injected event, e.g. if KVM injected an INTn, the INTn * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will * be the reported vectored event, but RIP still needs to be unwound. From patchwork Sun May 1 22:07:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833788 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 530C1C433EF for ; Sun, 1 May 2022 22:08:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377166AbiEAWLx (ORCPT ); Sun, 1 May 2022 18:11:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377243AbiEAWLo (ORCPT ); Sun, 1 May 2022 18:11:44 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1105226118; Sun, 1 May 2022 15:08:16 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjc-0008Nc-M5; Mon, 02 May 2022 00:08:08 +0200 From: "Maciej S. 
Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 05/12] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction Date: Mon, 2 May 2022 00:07:29 +0200 Message-Id: <65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Re-inject INT3/INTO instead of retrying the instruction if the CPU encountered an intercepted exception while vectoring the software exception, e.g. if vectoring INT3 encounters a #PF and KVM is using shadow paging. Retrying the instruction is architecturally wrong, e.g. will result in a spurious #DB if there's a code breakpoint on the INT3/O, and lack of re-injection also breaks nested virtualization, e.g. if L1 injects a software exception and vectoring the injected exception encounters an exception that is intercepted by L0 but not L1. Due to, ahem, deficiencies in the SVM architecture, acquiring the next RIP may require flowing through the emulator even if NRIPS is supported, as the CPU clears next_rip if the VM-Exit is due to an exception other than "exceptions caused by the INT3, INTO, and BOUND instructions". To deal with this, "skip" the instruction to calculate next_rip (if it's not already known), and then unwind the RIP write and any side effects (RFLAGS updates). Save the computed next_rip and use it to re-stuff next_rip if injection doesn't complete. This allows KVM to do the right thing if next_rip was known prior to injection, e.g. if L1 injects a soft event into L2, and there is no backing INTn instruction, e.g. if L1 is injecting an arbitrary event. Note, it's impossible to guarantee architectural correctness given SVM's architectural flaws. E.g. if the guest executes INTn (no KVM injection), an exit occurs while vectoring the INTn, and the guest modifies the code stream while the exit is being handled, KVM will compute the incorrect next_rip due to "skipping" the wrong instruction. A future enhancement to make this less awful would be for KVM to detect that the decoded instruction is not the correct INTn and drop the to-be-injected soft event (retrying is a lesser evil compared to shoving the wrong RIP on the exception stack). Reported-by: Maciej S. Szmigiero Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. Szmigiero --- arch/x86/kvm/svm/nested.c | 26 +++++++ arch/x86/kvm/svm/svm.c | 140 +++++++++++++++++++++++++++----------- arch/x86/kvm/svm/svm.h | 6 +- 3 files changed, 129 insertions(+), 43 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 91a1f9ff2b86..0163238aa198 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -609,6 +609,21 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12 } } +static inline bool is_evtinj_soft(u32 evtinj) +{ + u32 type = evtinj & SVM_EVTINJ_TYPE_MASK; + u8 vector = evtinj & SVM_EVTINJ_VEC_MASK; + + if (!(evtinj & SVM_EVTINJ_VALID)) + return false; + + /* + * Intentionally return false for SOFT events, SVM doesn't yet support + * re-injecting soft interrupts. 
+ */ + return type == SVM_EVTINJ_TYPE_EXEPT && kvm_exception_is_soft(vector); +} + static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, unsigned long vmcb12_rip) { @@ -677,6 +692,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, else if (boot_cpu_has(X86_FEATURE_NRIPS)) vmcb02->control.next_rip = vmcb12_rip; + if (is_evtinj_soft(vmcb02->control.event_inj)) { + svm->soft_int_injected = true; + svm->soft_int_csbase = svm->vmcb->save.cs.base; + svm->soft_int_old_rip = vmcb12_rip; + if (svm->nrips_enabled) + svm->soft_int_next_rip = svm->nested.ctl.next_rip; + else + svm->soft_int_next_rip = vmcb12_rip; + } + vmcb02->control.virt_ext = vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK; if (svm->lbrv_enabled) @@ -849,6 +874,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) out_exit_err: svm->nested.nested_run_pending = 0; + svm->soft_int_injected = false; svm->vmcb->control.exit_code = SVM_EXIT_ERR; svm->vmcb->control.exit_code_hi = 0; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index a05a4e521400..94d111ddec1c 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -342,9 +342,11 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) } -static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu) +static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu, + bool commit_side_effects) { struct vcpu_svm *svm = to_svm(vcpu); + unsigned long old_rflags; /* * SEV-ES does not expose the next RIP. The RIP update is controlled by @@ -359,18 +361,75 @@ static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu) } if (!svm->next_rip) { + if (unlikely(!commit_side_effects)) + old_rflags = svm->vmcb->save.rflags; + if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP)) return 0; + + if (unlikely(!commit_side_effects)) + svm->vmcb->save.rflags = old_rflags; } else { kvm_rip_write(vcpu, svm->next_rip); } done: - svm_set_interrupt_shadow(vcpu, 0); + if (likely(commit_side_effects)) + svm_set_interrupt_shadow(vcpu, 0); return 1; } +static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu) +{ + return __svm_skip_emulated_instruction(vcpu, true); +} + +static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu) +{ + unsigned long rip, old_rip = kvm_rip_read(vcpu); + struct vcpu_svm *svm = to_svm(vcpu); + + /* + * Due to architectural shortcomings, the CPU doesn't always provide + * NextRIP, e.g. if KVM intercepted an exception that occurred while + * the CPU was vectoring an INTO/INT3 in the guest. Temporarily skip + * the instruction even if NextRIP is supported to acquire the next + * RIP so that it can be shoved into the NextRIP field, otherwise + * hardware will fail to advance guest RIP during event injection. + * Drop the exception/interrupt if emulation fails and effectively + * retry the instruction, it's the least awful option. If NRIPS is + * in use, the skip must not commit any side effects such as clearing + * the interrupt shadow or RFLAGS.RF. + */ + if (!__svm_skip_emulated_instruction(vcpu, !nrips)) + return -EIO; + + rip = kvm_rip_read(vcpu); + + /* + * Save the injection information, even when using next_rip, as the + * VMCB's next_rip will be lost (cleared on VM-Exit) if the injection + * doesn't complete due to a VM-Exit occurring while the CPU is + * vectoring the event. Decoding the instruction isn't guaranteed to + * work as there may be no backing instruction, e.g. if the event is + * being injected by L1 for L2, or if the guest is patching INT3 into + * a different instruction. 
+ */ + svm->soft_int_injected = true; + svm->soft_int_csbase = svm->vmcb->save.cs.base; + svm->soft_int_old_rip = old_rip; + svm->soft_int_next_rip = rip; + + if (nrips) + kvm_rip_write(vcpu, old_rip); + + if (static_cpu_has(X86_FEATURE_NRIPS)) + svm->vmcb->control.next_rip = rip; + + return 0; +} + static void svm_queue_exception(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -380,25 +439,9 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu) kvm_deliver_exception_payload(vcpu); - if (nr == BP_VECTOR && !nrips) { - unsigned long rip, old_rip = kvm_rip_read(vcpu); - - /* - * For guest debugging where we have to reinject #BP if some - * INT3 is guest-owned: - * Emulate nRIP by moving RIP forward. Will fail if injection - * raises a fault that is not intercepted. Still better than - * failing in all cases. - */ - (void)svm_skip_emulated_instruction(vcpu); - rip = kvm_rip_read(vcpu); - - if (boot_cpu_has(X86_FEATURE_NRIPS)) - svm->vmcb->control.next_rip = rip; - - svm->int3_rip = rip + svm->vmcb->save.cs.base; - svm->int3_injected = rip - old_rip; - } + if (kvm_exception_is_soft(nr) && + svm_update_soft_interrupt_rip(vcpu)) + return; svm->vmcb->control.event_inj = nr | SVM_EVTINJ_VALID @@ -3669,15 +3712,46 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu) svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK; } +static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector, + int type) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + /* + * If NRIPS is enabled, KVM must snapshot the pre-VMRUN next_rip that's + * associated with the original soft exception/interrupt. next_rip is + * cleared on all exits that can occur while vectoring an event, so KVM + * needs to manually set next_rip for re-injection. Unlike the !nrips + * case below, this needs to be done if and only if KVM is re-injecting + * the same event, i.e. if the event is a soft exception/interrupt, + * otherwise next_rip is unused on VMRUN. + */ + if (nrips && type == SVM_EXITINTINFO_TYPE_EXEPT && + kvm_exception_is_soft(vector) && + kvm_is_linear_rip(vcpu, svm->soft_int_old_rip + svm->soft_int_csbase)) + svm->vmcb->control.next_rip = svm->soft_int_next_rip; + /* + * If NRIPS isn't enabled, KVM must manually advance RIP prior to + * injecting the soft exception/interrupt. That advancement needs to + * be unwound if vectoring didn't complete. Note, the new event may + * not be the injected event, e.g. if KVM injected an INTn, the INTn + * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will + * be the reported vectored event, but RIP still needs to be unwound. + */ + else if (!nrips && type == SVM_EXITINTINFO_TYPE_EXEPT && + kvm_is_linear_rip(vcpu, svm->soft_int_next_rip + svm->soft_int_csbase)) + kvm_rip_write(vcpu, svm->soft_int_old_rip); +} + static void svm_complete_interrupts(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); u8 vector; int type; u32 exitintinfo = svm->vmcb->control.exit_int_info; - unsigned int3_injected = svm->int3_injected; + bool soft_int_injected = svm->soft_int_injected; - svm->int3_injected = 0; + svm->soft_int_injected = false; /* * If we've made progress since setting HF_IRET_MASK, we've @@ -3702,17 +3776,8 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK; type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK; - /* - * If NextRIP isn't enabled, KVM must manually advance RIP prior to - * injecting the soft exception/interrupt. 
That advancement needs to - * be unwound if vectoring didn't complete. Note, the new event may - * not be the injected event, e.g. if KVM injected an INTn, the INTn - * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will - * be the reported vectored event, but RIP still needs to be unwound. - */ - if (int3_injected && type == SVM_EXITINTINFO_TYPE_EXEPT && - kvm_is_linear_rip(vcpu, svm->int3_rip)) - kvm_rip_write(vcpu, kvm_rip_read(vcpu) - int3_injected); + if (soft_int_injected) + svm_complete_soft_interrupt(vcpu, vector, type); switch (type) { case SVM_EXITINTINFO_TYPE_NMI: @@ -3725,13 +3790,6 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) if (vector == X86_TRAP_VC) break; - /* - * In case of software exceptions, do not reinject the vector, - * but re-execute the instruction instead. - */ - if (kvm_exception_is_soft(vector)) - break; - if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) { u32 err = svm->vmcb->control.exit_int_info_err; kvm_requeue_exception_e(vcpu, vector, err); diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 7d97e4d18c8b..6acb494e3598 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -230,8 +230,10 @@ struct vcpu_svm { bool nmi_singlestep; u64 nmi_singlestep_guest_rflags; - unsigned int3_injected; - unsigned long int3_rip; + unsigned long soft_int_csbase; + unsigned long soft_int_old_rip; + unsigned long soft_int_next_rip; + bool soft_int_injected; /* optional nested SVM features that are enabled for this guest */ bool nrips_enabled : 1; From patchwork Sun May 1 22:07:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833789 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15BC4C433F5 for ; Sun, 1 May 2022 22:08:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377915AbiEAWMC (ORCPT ); Sun, 1 May 2022 18:12:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377148AbiEAWL4 (ORCPT ); Sun, 1 May 2022 18:11:56 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C6415E75D; Sun, 1 May 2022 15:08:22 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjh-0008O1-W1; Mon, 02 May 2022 00:08:14 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 06/12] KVM: SVM: Re-inject INTn instead of retrying the insn on "failure" Date: Mon, 2 May 2022 00:07:30 +0200 Message-Id: <1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Re-inject INTn software interrupts instead of retrying the instruction if the CPU encountered an intercepted exception while vectoring the INTn, e.g. if KVM intercepted a #PF when utilizing shadow paging. 
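To make the injection encoding concrete, here is a small self-contained sketch. It is not taken from the patch; the constants are written out locally to match the EVENTINJ field layout documented in the AMD APM (vector in bits 7:0, type in bits 10:8, valid in bit 31), and the helper name is purely illustrative. It shows how the event_inj value would differ between a hard IRQ and an INTn software interrupt, roughly mirroring the shape of the svm_inject_irq() change in the diff below.

```c
/*
 * Standalone illustration only: models how a VMCB EVENTINJ value is built
 * for a hard IRQ vs. an INTn software interrupt.  Constants follow the
 * EVENTINJ layout in the AMD APM; they are defined locally here rather
 * than taken from the kernel headers.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SVM_EVTINJ_VEC_MASK   0xffu        /* bits 7:0  - vector          */
#define SVM_EVTINJ_TYPE_INTR  (0u << 8)    /* bits 10:8 - external IRQ    */
#define SVM_EVTINJ_TYPE_NMI   (2u << 8)    /*            NMI              */
#define SVM_EVTINJ_TYPE_EXEPT (3u << 8)    /*            exception        */
#define SVM_EVTINJ_TYPE_SOFT  (4u << 8)    /*            INTn (soft)      */
#define SVM_EVTINJ_VALID      (1u << 31)   /* bit 31    - injection valid */

/* Hypothetical helper mirroring the soft-vs-hard split described above. */
static uint32_t build_event_inj(uint8_t vector, bool soft)
{
	uint32_t type = soft ? SVM_EVTINJ_TYPE_SOFT : SVM_EVTINJ_TYPE_INTR;

	return (vector & SVM_EVTINJ_VEC_MASK) | SVM_EVTINJ_VALID | type;
}

int main(void)
{
	/* A hard IRQ on vector 0x30 vs. a guest "INT 0x20". */
	printf("hard IRQ  : 0x%08x\n", (unsigned)build_event_inj(0x30, false));
	printf("INTn 0x20 : 0x%08x\n", (unsigned)build_event_inj(0x20, true));
	return 0;
}
```

The only difference between the two encodings is the type field; the SOFT type is what tells hardware to treat next_rip as the return address for the injected INTn, which is why re-injection (rather than retrying the instruction) also has to keep next_rip coherent.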
Retrying the instruction is architecturally wrong e.g. will result in a spurious #DB if there's a code breakpoint on the INT3/O, and lack of re-injection also breaks nested virtualization, e.g. if L1 injects a software interrupt and vectoring the injected interrupt encounters an exception that is intercepted by L0 but not L1. Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. Szmigiero --- arch/x86/kvm/svm/nested.c | 7 +++---- arch/x86/kvm/svm/svm.c | 23 +++++++++++++++++++---- 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 0163238aa198..a83e367ade54 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -617,10 +617,9 @@ static inline bool is_evtinj_soft(u32 evtinj) if (!(evtinj & SVM_EVTINJ_VALID)) return false; - /* - * Intentionally return false for SOFT events, SVM doesn't yet support - * re-injecting soft interrupts. - */ + if (type == SVM_EVTINJ_TYPE_SOFT) + return true; + return type == SVM_EVTINJ_TYPE_EXEPT && kvm_exception_is_soft(vector); } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 94d111ddec1c..a1158fc25c4c 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3430,12 +3430,22 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu) static void svm_inject_irq(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); + u32 type; + + if (vcpu->arch.interrupt.soft) { + if (svm_update_soft_interrupt_rip(vcpu)) + return; + + type = SVM_EVTINJ_TYPE_SOFT; + } else { + type = SVM_EVTINJ_TYPE_INTR; + } trace_kvm_inj_virq(vcpu->arch.interrupt.nr); ++vcpu->stat.irq_injections; svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | - SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; + SVM_EVTINJ_VALID | type; } void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode, @@ -3715,6 +3725,8 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu) static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector, int type) { + bool is_exception = (type == SVM_EXITINTINFO_TYPE_EXEPT); + bool is_soft = (type == SVM_EXITINTINFO_TYPE_SOFT); struct vcpu_svm *svm = to_svm(vcpu); /* @@ -3726,8 +3738,7 @@ static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector, * the same event, i.e. if the event is a soft exception/interrupt, * otherwise next_rip is unused on VMRUN. */ - if (nrips && type == SVM_EXITINTINFO_TYPE_EXEPT && - kvm_exception_is_soft(vector) && + if (nrips && (is_soft || (is_exception && kvm_exception_is_soft(vector))) && kvm_is_linear_rip(vcpu, svm->soft_int_old_rip + svm->soft_int_csbase)) svm->vmcb->control.next_rip = svm->soft_int_next_rip; /* @@ -3738,7 +3749,7 @@ static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector, * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will * be the reported vectored event, but RIP still needs to be unwound. 
*/ - else if (!nrips && type == SVM_EXITINTINFO_TYPE_EXEPT && + else if (!nrips && (is_soft || is_exception) && kvm_is_linear_rip(vcpu, svm->soft_int_next_rip + svm->soft_int_csbase)) kvm_rip_write(vcpu, svm->soft_int_old_rip); } @@ -3800,9 +3811,13 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) case SVM_EXITINTINFO_TYPE_INTR: kvm_queue_interrupt(vcpu, vector, false); break; + case SVM_EXITINTINFO_TYPE_SOFT: + kvm_queue_interrupt(vcpu, vector, true); + break; default: break; } + } static void svm_cancel_injection(struct kvm_vcpu *vcpu) From patchwork Sun May 1 22:07:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833790 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 205AEC433EF for ; Sun, 1 May 2022 22:08:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376764AbiEAWML (ORCPT ); Sun, 1 May 2022 18:12:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377791AbiEAWMC (ORCPT ); Sun, 1 May 2022 18:12:02 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAACA5E775; Sun, 1 May 2022 15:08:25 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjn-0008OL-9h; Mon, 02 May 2022 00:08:19 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 07/12] KVM: x86: Trace re-injected exceptions Date: Mon, 2 May 2022 00:07:31 +0200 Message-Id: <25470690a38b4d2b32b6204875dd35676c65c9f2.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Trace exceptions that are re-injected, not just those that KVM is injecting for the first time. Debugging re-injection bugs is painful enough as is, not having visibility into what KVM is doing only makes things worse. Delay propagating pending=>injected in the non-reinjection path so that the tracing can properly identify reinjected exceptions. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky Signed-off-by: Maciej S. 
Szmigiero --- arch/x86/kvm/trace.h | 12 ++++++++---- arch/x86/kvm/x86.c | 16 +++++++++------- 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index de4762517569..d07428e660e3 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -358,25 +358,29 @@ TRACE_EVENT(kvm_inj_virq, * Tracepoint for kvm interrupt injection: */ TRACE_EVENT(kvm_inj_exception, - TP_PROTO(unsigned exception, bool has_error, unsigned error_code), - TP_ARGS(exception, has_error, error_code), + TP_PROTO(unsigned exception, bool has_error, unsigned error_code, + bool reinjected), + TP_ARGS(exception, has_error, error_code, reinjected), TP_STRUCT__entry( __field( u8, exception ) __field( u8, has_error ) __field( u32, error_code ) + __field( bool, reinjected ) ), TP_fast_assign( __entry->exception = exception; __entry->has_error = has_error; __entry->error_code = error_code; + __entry->reinjected = reinjected; ), - TP_printk("%s (0x%x)", + TP_printk("%s (0x%x)%s", __print_symbolic(__entry->exception, kvm_trace_sym_exc), /* FIXME: don't print error_code if not present */ - __entry->has_error ? __entry->error_code : 0) + __entry->has_error ? __entry->error_code : 0, + __entry->reinjected ? " [reinjected]" : "") ); /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1abce22b14d7..a41d616ad943 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9393,6 +9393,11 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu) static void kvm_inject_exception(struct kvm_vcpu *vcpu) { + trace_kvm_inj_exception(vcpu->arch.exception.nr, + vcpu->arch.exception.has_error_code, + vcpu->arch.exception.error_code, + vcpu->arch.exception.injected); + if (vcpu->arch.exception.error_code && !is_protmode(vcpu)) vcpu->arch.exception.error_code = false; static_call(kvm_x86_queue_exception)(vcpu); @@ -9450,13 +9455,6 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) /* try to inject new event if pending */ if (vcpu->arch.exception.pending) { - trace_kvm_inj_exception(vcpu->arch.exception.nr, - vcpu->arch.exception.has_error_code, - vcpu->arch.exception.error_code); - - vcpu->arch.exception.pending = false; - vcpu->arch.exception.injected = true; - if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT) __kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) | X86_EFLAGS_RF); @@ -9470,6 +9468,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) } kvm_inject_exception(vcpu); + + vcpu->arch.exception.pending = false; + vcpu->arch.exception.injected = true; + can_inject = false; } From patchwork Sun May 1 22:07:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 12833791 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9ECC3C433F5 for ; Sun, 1 May 2022 22:09:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378144AbiEAWM0 (ORCPT ); Sun, 1 May 2022 18:12:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1378529AbiEAWMM (ORCPT ); Sun, 1 May 2022 18:12:12 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 227265E756; Sun, 1 May 2022 15:08:33 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjs-0008OZ-Jp; Mon, 02 May 2022 00:08:24 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 08/12] KVM: x86: Print error code in exception injection tracepoint iff valid Date: Mon, 2 May 2022 00:07:32 +0200 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Print the error code in the exception injection tracepoint if and only if the exception has an error code. Define the entire error code sequence as a set of formatted strings, print empty strings if there's no error code, and abuse __print_symbolic() by passing it an empty array to coerce it into printing the error code as a hex string. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky Signed-off-by: Maciej S. Szmigiero --- arch/x86/kvm/trace.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index d07428e660e3..385436d12024 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -376,10 +376,11 @@ TRACE_EVENT(kvm_inj_exception, __entry->reinjected = reinjected; ), - TP_printk("%s (0x%x)%s", + TP_printk("%s%s%s%s%s", __print_symbolic(__entry->exception, kvm_trace_sym_exc), - /* FIXME: don't print error_code if not present */ - __entry->has_error ? __entry->error_code : 0, + !__entry->has_error ? "" : " (", + !__entry->has_error ? "" : __print_symbolic(__entry->error_code, { }), + !__entry->has_error ? "" : ")", __entry->reinjected ? " [reinjected]" : "") ); From patchwork Sun May 1 22:07:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 12833792 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2929CC433F5 for ; Sun, 1 May 2022 22:09:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378386AbiEAWM3 (ORCPT ); Sun, 1 May 2022 18:12:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1378175AbiEAWMR (ORCPT ); Sun, 1 May 2022 18:12:17 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B725D252B0; Sun, 1 May 2022 15:08:36 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHjx-0008P0-VB; Mon, 02 May 2022 00:08:30 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 09/12] KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint Date: Mon, 2 May 2022 00:07:33 +0200 Message-Id: <9664d49b3bd21e227caa501cff77b0569bebffe2.1651440202.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson In the IRQ injection tracepoint, differentiate between Hard IRQs and Soft "IRQs", i.e. interrupts that are reinjected after incomplete delivery of a software interrupt from an INTn instruction. Tag reinjected interrupts as such, even though the information is usually redundant since soft interrupts are only ever reinjected by KVM. Though rare in practice, a hard IRQ can be reinjected. Signed-off-by: Sean Christopherson [MSS: change "kvm_inj_virq" event "reinjected" field type to bool] Signed-off-by: Maciej S. 
Szmigiero --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/svm/svm.c | 5 +++-- arch/x86/kvm/trace.h | 16 +++++++++++----- arch/x86/kvm/vmx/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 4 ++-- 5 files changed, 19 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f164c6c1514a..ae088c6fb287 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1400,7 +1400,7 @@ struct kvm_x86_ops { u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu); void (*patch_hypercall)(struct kvm_vcpu *vcpu, unsigned char *hypercall_addr); - void (*inject_irq)(struct kvm_vcpu *vcpu); + void (*inject_irq)(struct kvm_vcpu *vcpu, bool reinjected); void (*inject_nmi)(struct kvm_vcpu *vcpu); void (*queue_exception)(struct kvm_vcpu *vcpu); void (*cancel_injection)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index a1158fc25c4c..f946161520f6 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3427,7 +3427,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu) ++vcpu->stat.nmi_injections; } -static void svm_inject_irq(struct kvm_vcpu *vcpu) +static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) { struct vcpu_svm *svm = to_svm(vcpu); u32 type; @@ -3441,7 +3441,8 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu) type = SVM_EVTINJ_TYPE_INTR; } - trace_kvm_inj_virq(vcpu->arch.interrupt.nr); + trace_kvm_inj_virq(vcpu->arch.interrupt.nr, + vcpu->arch.interrupt.soft, reinjected); ++vcpu->stat.irq_injections; svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 385436d12024..fd28dd40b813 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -333,18 +333,24 @@ TRACE_EVENT_KVM_EXIT(kvm_exit); * Tracepoint for kvm interrupt injection: */ TRACE_EVENT(kvm_inj_virq, - TP_PROTO(unsigned int irq), - TP_ARGS(irq), + TP_PROTO(unsigned int vector, bool soft, bool reinjected), + TP_ARGS(vector, soft, reinjected), TP_STRUCT__entry( - __field( unsigned int, irq ) + __field( unsigned int, vector ) + __field( bool, soft ) + __field( bool, reinjected ) ), TP_fast_assign( - __entry->irq = irq; + __entry->vector = vector; + __entry->soft = soft; + __entry->reinjected = reinjected; ), - TP_printk("irq %u", __entry->irq) + TP_printk("%s 0x%x%s", + __entry->soft ? "Soft/INTn" : "IRQ", __entry->vector, + __entry->reinjected ? 
" [reinjected]" : "") ); #define EXS(x) { x##_VECTOR, "#" #x } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cf8581978bce..a0083464682d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4566,13 +4566,13 @@ static void vmx_enable_nmi_window(struct kvm_vcpu *vcpu) exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING); } -static void vmx_inject_irq(struct kvm_vcpu *vcpu) +static void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) { struct vcpu_vmx *vmx = to_vmx(vcpu); uint32_t intr; int irq = vcpu->arch.interrupt.nr; - trace_kvm_inj_virq(irq); + trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, reinjected); ++vcpu->stat.irq_injections; if (vmx->rmode.vm86_active) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a41d616ad943..d2d118498437 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9433,7 +9433,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) static_call(kvm_x86_inject_nmi)(vcpu); can_inject = false; } else if (vcpu->arch.interrupt.injected) { - static_call(kvm_x86_inject_irq)(vcpu); + static_call(kvm_x86_inject_irq)(vcpu, true); can_inject = false; } } @@ -9524,7 +9524,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) goto out; if (r) { kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false); - static_call(kvm_x86_inject_irq)(vcpu); + static_call(kvm_x86_inject_irq)(vcpu, false); WARN_ON(static_call(kvm_x86_interrupt_allowed)(vcpu, true) < 0); } if (kvm_cpu_has_injectable_intr(vcpu)) From patchwork Sun May 1 22:07:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833794 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F26BEC433EF for ; Sun, 1 May 2022 22:09:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377826AbiEAWMo (ORCPT ); Sun, 1 May 2022 18:12:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377791AbiEAWMX (ORCPT ); Sun, 1 May 2022 18:12:23 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07A1E5E76F; Sun, 1 May 2022 15:08:40 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHk3-0008PB-8f; Mon, 02 May 2022 00:08:35 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 10/12] KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection Date: Mon, 2 May 2022 00:07:34 +0200 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: "Maciej S. Szmigiero" A NMI that L1 wants to inject into its L2 should be directly re-injected, without causing L0 side effects like engaging NMI blocking for L1. It's also worth noting that in this case it is L1 responsibility to track the NMI window status for its L2 guest. Signed-off-by: Maciej S. 
Szmigiero --- arch/x86/kvm/svm/nested.c | 12 ++++++++++++ arch/x86/kvm/svm/svm.c | 7 +++++++ arch/x86/kvm/svm/svm.h | 1 + 3 files changed, 20 insertions(+) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index a83e367ade54..2cf92c12706a 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -623,6 +623,16 @@ static inline bool is_evtinj_soft(u32 evtinj) return type == SVM_EVTINJ_TYPE_EXEPT && kvm_exception_is_soft(vector); } +static bool is_evtinj_nmi(u32 evtinj) +{ + u32 type = evtinj & SVM_EVTINJ_TYPE_MASK; + + if (!(evtinj & SVM_EVTINJ_VALID)) + return false; + + return type == SVM_EVTINJ_TYPE_NMI; +} + static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, unsigned long vmcb12_rip) { @@ -691,6 +701,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, else if (boot_cpu_has(X86_FEATURE_NRIPS)) vmcb02->control.next_rip = vmcb12_rip; + svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj); if (is_evtinj_soft(vmcb02->control.event_inj)) { svm->soft_int_injected = true; svm->soft_int_csbase = svm->vmcb->save.cs.base; @@ -873,6 +884,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) out_exit_err: svm->nested.nested_run_pending = 0; + svm->nmi_l1_to_l2 = false; svm->soft_int_injected = false; svm->vmcb->control.exit_code = SVM_EXIT_ERR; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index f946161520f6..88eacbbe9348 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3421,6 +3421,10 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu) struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI; + + if (svm->nmi_l1_to_l2) + return; + vcpu->arch.hflags |= HF_NMI_MASK; if (!sev_es_guest(vcpu->kvm)) svm_set_intercept(svm, INTERCEPT_IRET); @@ -3761,8 +3765,10 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) u8 vector; int type; u32 exitintinfo = svm->vmcb->control.exit_int_info; + bool nmi_l1_to_l2 = svm->nmi_l1_to_l2; bool soft_int_injected = svm->soft_int_injected; + svm->nmi_l1_to_l2 = false; svm->soft_int_injected = false; /* @@ -3794,6 +3800,7 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu) switch (type) { case SVM_EXITINTINFO_TYPE_NMI: vcpu->arch.nmi_injected = true; + svm->nmi_l1_to_l2 = nmi_l1_to_l2; break; case SVM_EXITINTINFO_TYPE_EXEPT: /* diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 6acb494e3598..e1117d92789d 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -229,6 +229,7 @@ struct vcpu_svm { bool nmi_singlestep; u64 nmi_singlestep_guest_rflags; + bool nmi_l1_to_l2; unsigned long soft_int_csbase; unsigned long soft_int_old_rip; From patchwork Sun May 1 22:07:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 12833793 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D09D0C433F5 for ; Sun, 1 May 2022 22:09:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356804AbiEAWMi (ORCPT ); Sun, 1 May 2022 18:12:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1378579AbiEAWM0 (ORCPT ); Sun, 1 May 2022 18:12:26 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEE75483B8; Sun, 1 May 2022 15:08:47 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHk8-0008Pd-K5; Mon, 02 May 2022 00:08:40 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 11/12] KVM: selftests: nSVM: Add svm_nested_soft_inject_test Date: Mon, 2 May 2022 00:07:35 +0200 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: "Maciej S. Szmigiero" Add a KVM self-test that checks whether a nSVM L1 is able to successfully inject a software interrupt, a soft exception and a NMI into its L2 guest. In practice, this tests both the next_rip field consistency and L1-injected event with intervening L0 VMEXIT during its delivery: the first nested VMRUN (that's also trying to inject a software interrupt) will immediately trigger a L0 NPF. This L0 NPF will have zero in its CPU-returned next_rip field, which if incorrectly reused by KVM will trigger a #PF when trying to return to such address 0 from the interrupt handler. For NMI injection this tests whether the L1 NMI state isn't getting incorrectly mixed with the L2 NMI state if a L1 -> L2 NMI needs to be re-injected. Reviewed-by: Maxim Levitsky [sean: check exact L2 RIP on first soft interrupt] Signed-off-by: Sean Christopherson Signed-off-by: Maciej S. 
Szmigiero --- tools/testing/selftests/kvm/.gitignore | 3 +- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 17 ++ .../selftests/kvm/include/x86_64/svm_util.h | 12 + .../kvm/x86_64/svm_nested_soft_inject_test.c | 217 ++++++++++++++++++ 5 files changed, 249 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_nested_soft_inject_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 56140068b763..74f3099f0ad3 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -35,9 +35,10 @@ /x86_64/state_test /x86_64/svm_vmcall_test /x86_64/svm_int_ctl_test -/x86_64/tsc_scaling_sync +/x86_64/svm_nested_soft_inject_test /x86_64/sync_regs_test /x86_64/tsc_msrs_test +/x86_64/tsc_scaling_sync /x86_64/userspace_io_test /x86_64/userspace_msr_exit_test /x86_64/vmx_apic_access_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index af582d168621..eedf6bf713d3 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_soft_inject_test TEST_GEN_PROGS_x86_64 += x86_64/tsc_scaling_sync TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test TEST_GEN_PROGS_x86_64 += x86_64/userspace_io_test diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h index 37db341d4cc5..deff64605442 100644 --- a/tools/testing/selftests/kvm/include/x86_64/processor.h +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h @@ -17,6 +17,8 @@ #include "../kvm_util.h" +#define NMI_VECTOR 0x02 + #define X86_EFLAGS_FIXED (1u << 1) #define X86_CR4_VME (1ul << 0) @@ -368,6 +370,21 @@ static inline void cpu_relax(void) asm volatile("rep; nop" ::: "memory"); } +#define vmmcall() \ + __asm__ __volatile__( \ + "vmmcall\n" \ + ) + +#define ud2() \ + __asm__ __volatile__( \ + "ud2\n" \ + ) + +#define hlt() \ + __asm__ __volatile__( \ + "hlt\n" \ + ) + bool is_intel_cpu(void); bool is_amd_cpu(void); diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h index a25aabd8f5e7..136ba6a5d027 100644 --- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h +++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h @@ -16,6 +16,8 @@ #define CPUID_SVM_BIT 2 #define CPUID_SVM BIT_ULL(CPUID_SVM_BIT) +#define SVM_EXIT_EXCP_BASE 0x040 +#define SVM_EXIT_HLT 0x078 #define SVM_EXIT_MSR 0x07c #define SVM_EXIT_VMMCALL 0x081 @@ -36,6 +38,16 @@ struct svm_test_data { uint64_t msr_gpa; }; +#define stgi() \ + __asm__ __volatile__( \ + "stgi\n" \ + ) + +#define clgi() \ + __asm__ __volatile__( \ + "clgi\n" \ + ) + struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva); void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp); void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa); diff --git a/tools/testing/selftests/kvm/x86_64/svm_nested_soft_inject_test.c b/tools/testing/selftests/kvm/x86_64/svm_nested_soft_inject_test.c new file mode 100644 index 000000000000..f94f1b449aef --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_nested_soft_inject_test.c @@ -0,0 +1,217 @@ +// 
SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2022 Oracle and/or its affiliates. + * + * Based on: + * svm_int_ctl_test + * + * Copyright (C) 2021, Red Hat, Inc. + * + */ + +#include +#include +#include +#include "apic.h" +#include "kvm_util.h" +#include "processor.h" +#include "svm_util.h" +#include "test_util.h" +#include "../lib/kvm_util_internal.h" + +#define VCPU_ID 0 +#define INT_NR 0x20 +#define X86_FEATURE_NRIPS BIT(3) + +static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic int is not lockless"); + +static unsigned int bp_fired; +static void guest_bp_handler(struct ex_regs *regs) +{ + bp_fired++; +} + +static unsigned int int_fired; +static void l2_guest_code_int(void); + +static void guest_int_handler(struct ex_regs *regs) +{ + int_fired++; + GUEST_ASSERT_2(regs->rip == (unsigned long)l2_guest_code_int, + regs->rip, (unsigned long)l2_guest_code_int); +} + +static void l2_guest_code_int(void) +{ + GUEST_ASSERT_1(int_fired == 1, int_fired); + vmmcall(); + ud2(); + + GUEST_ASSERT_1(bp_fired == 1, bp_fired); + hlt(); +} + +static atomic_int nmi_stage; +#define nmi_stage_get() atomic_load_explicit(&nmi_stage, memory_order_acquire) +#define nmi_stage_inc() atomic_fetch_add_explicit(&nmi_stage, 1, memory_order_acq_rel) +static void guest_nmi_handler(struct ex_regs *regs) +{ + nmi_stage_inc(); + + if (nmi_stage_get() == 1) { + vmmcall(); + GUEST_ASSERT(false); + } else { + GUEST_ASSERT_1(nmi_stage_get() == 3, nmi_stage_get()); + GUEST_DONE(); + } +} + +static void l2_guest_code_nmi(void) +{ + ud2(); +} + +static void l1_guest_code(struct svm_test_data *svm, uint64_t is_nmi, uint64_t idt_alt) +{ + #define L2_GUEST_STACK_SIZE 64 + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; + struct vmcb *vmcb = svm->vmcb; + + if (is_nmi) + x2apic_enable(); + + /* Prepare for L2 execution. */ + generic_svm_setup(svm, + is_nmi ? 
l2_guest_code_nmi : l2_guest_code_int, + &l2_guest_stack[L2_GUEST_STACK_SIZE]); + + vmcb->control.intercept_exceptions |= BIT(PF_VECTOR) | BIT(UD_VECTOR); + vmcb->control.intercept |= BIT(INTERCEPT_NMI) | BIT(INTERCEPT_HLT); + + if (is_nmi) { + vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI; + } else { + vmcb->control.event_inj = INT_NR | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_SOFT; + /* The return address pushed on stack */ + vmcb->control.next_rip = vmcb->save.rip; + } + + run_guest(vmcb, svm->vmcb_gpa); + GUEST_ASSERT_3(vmcb->control.exit_code == SVM_EXIT_VMMCALL, + vmcb->control.exit_code, + vmcb->control.exit_info_1, vmcb->control.exit_info_2); + + if (is_nmi) { + clgi(); + x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_NMI); + + GUEST_ASSERT_1(nmi_stage_get() == 1, nmi_stage_get()); + nmi_stage_inc(); + + stgi(); + /* self-NMI happens here */ + while (true) + cpu_relax(); + } + + /* Skip over VMMCALL */ + vmcb->save.rip += 3; + + /* Switch to alternate IDT to cause intervening NPF again */ + vmcb->save.idtr.base = idt_alt; + vmcb->control.clean = 0; /* &= ~BIT(VMCB_DT) would be enough */ + + vmcb->control.event_inj = BP_VECTOR | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT; + /* The return address pushed on stack, skip over UD2 */ + vmcb->control.next_rip = vmcb->save.rip + 2; + + run_guest(vmcb, svm->vmcb_gpa); + GUEST_ASSERT_3(vmcb->control.exit_code == SVM_EXIT_HLT, + vmcb->control.exit_code, + vmcb->control.exit_info_1, vmcb->control.exit_info_2); + + GUEST_DONE(); +} + +static void run_test(bool is_nmi) +{ + struct kvm_vm *vm; + vm_vaddr_t svm_gva; + vm_vaddr_t idt_alt_vm; + struct kvm_guest_debug debug; + + pr_info("Running %s test\n", is_nmi ? "NMI" : "soft int"); + + vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code); + + vm_init_descriptor_tables(vm); + vcpu_init_descriptor_tables(vm, VCPU_ID); + + vm_install_exception_handler(vm, NMI_VECTOR, guest_nmi_handler); + vm_install_exception_handler(vm, BP_VECTOR, guest_bp_handler); + vm_install_exception_handler(vm, INT_NR, guest_int_handler); + + vcpu_alloc_svm(vm, &svm_gva); + + if (!is_nmi) { + void *idt, *idt_alt; + + idt_alt_vm = vm_vaddr_alloc_page(vm); + idt_alt = addr_gva2hva(vm, idt_alt_vm); + idt = addr_gva2hva(vm, vm->idt); + memcpy(idt_alt, idt, getpagesize()); + } else { + idt_alt_vm = 0; + } + vcpu_args_set(vm, VCPU_ID, 3, svm_gva, (uint64_t)is_nmi, (uint64_t)idt_alt_vm); + + memset(&debug, 0, sizeof(debug)); + vcpu_set_guest_debug(vm, VCPU_ID, &debug); + + struct kvm_run *run = vcpu_state(vm, VCPU_ID); + struct ucall uc; + + alarm(2); + vcpu_run(vm, VCPU_ID); + alarm(0); + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n", + run->exit_reason, + exit_reason_str(run->exit_reason)); + + switch (get_ucall(vm, VCPU_ID, &uc)) { + case UCALL_ABORT: + TEST_FAIL("%s at %s:%ld, vals = 0x%lx 0x%lx 0x%lx", (const char *)uc.args[0], + __FILE__, uc.args[1], uc.args[2], uc.args[3], uc.args[4]); + break; + /* NOT REACHED */ + case UCALL_DONE: + goto done; + default: + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd); + } +done: + kvm_vm_free(vm); +} + +int main(int argc, char *argv[]) +{ + struct kvm_cpuid_entry2 *cpuid; + + /* Tell stdout not to buffer its content */ + setbuf(stdout, NULL); + + nested_svm_check_supported(); + + cpuid = kvm_get_supported_cpuid_entry(0x8000000a); + TEST_ASSERT(cpuid->edx & X86_FEATURE_NRIPS, + "KVM with nSVM is supposed to unconditionally advertise nRIP Save\n"); + + atomic_init(&nmi_stage, 0); + + run_test(false); + 
run_test(true); + + return 0; +} From patchwork Sun May 1 22:07:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 12833795 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6239CC433F5 for ; Sun, 1 May 2022 22:09:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377951AbiEAWMq (ORCPT ); Sun, 1 May 2022 18:12:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377835AbiEAWMj (ORCPT ); Sun, 1 May 2022 18:12:39 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CD625EBED; Sun, 1 May 2022 15:08:53 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1nlHkE-0008Pw-09; Mon, 02 May 2022 00:08:46 +0200 From: "Maciej S. Szmigiero" To: Paolo Bonzini , Sean Christopherson Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Maxim Levitsky , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 12/12] KVM: nSVM: Drop support for CPUs without NRIPS (NextRIP Save) support Date: Mon, 2 May 2022 00:07:36 +0200 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Drop nested support for CPUs without NRIPS, as requiring NRIPS simplifies a handful of paths in KVM. NRIPS was introduced in 2009, i.e. every AMD-based CPU released in the last decade should support NRIPS. Suggested-by: Paolo Bonzini Not-signed-off-by: Sean Christopherson [MSS: Just drop nested support for these CPUs instead of SVM support in general] Signed-off-by: Maciej S. Szmigiero --- arch/x86/kvm/svm/nested.c | 14 +++++--------- arch/x86/kvm/svm/svm.c | 19 +++++++++++-------- 2 files changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 2cf92c12706a..e5c05e3427ae 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -691,14 +691,13 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, /* * next_rip is consumed on VMRUN as the return address pushed on the * stack for injected soft exceptions/interrupts. If nrips is exposed - * to L1, take it verbatim from vmcb12. If nrips is supported in - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP - * prior to injecting the event). + * to L1, take it verbatim from vmcb12. If nrips is not exposed to L1, + * stuff the actual L2 RIP to emulate what an nrips=0 CPU would do (L1 + * is responsible for advancing RIP prior to injecting the event). 
*/ if (svm->nrips_enabled) vmcb02->control.next_rip = svm->nested.ctl.next_rip; - else if (boot_cpu_has(X86_FEATURE_NRIPS)) + else vmcb02->control.next_rip = vmcb12_rip; svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj); @@ -706,10 +705,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, svm->soft_int_injected = true; svm->soft_int_csbase = svm->vmcb->save.cs.base; svm->soft_int_old_rip = vmcb12_rip; - if (svm->nrips_enabled) - svm->soft_int_next_rip = svm->nested.ctl.next_rip; - else - svm->soft_int_next_rip = vmcb12_rip; + svm->soft_int_next_rip = vmcb02->control.next_rip; } vmcb02->control.virt_ext = vmcb01->control.virt_ext & diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 88eacbbe9348..dc980fa0daa8 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4330,9 +4330,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, break; } - /* TODO: Advertise NRIPS to guest hypervisor unconditionally */ - if (static_cpu_has(X86_FEATURE_NRIPS)) - vmcb->control.next_rip = info->next_rip; + vmcb->control.next_rip = info->next_rip; vmcb->control.exit_code = icpt_info.exit_code; vmexit = nested_svm_exit_handled(svm); @@ -4961,6 +4959,16 @@ static __init int svm_hardware_setup(void) pause_filter_thresh = 0; } + if (nrips) { + if (!boot_cpu_has(X86_FEATURE_NRIPS)) + nrips = false; + } + + if (!nrips && nested) { + pr_notice("kvm: Nested Virtualization requires NRIPS (NextRIP Save)\n"); + nested = false; + } + if (nested) { printk(KERN_INFO "kvm: Nested Virtualization enabled\n"); kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE); @@ -4995,11 +5003,6 @@ static __init int svm_hardware_setup(void) goto err; } - if (nrips) { - if (!boot_cpu_has(X86_FEATURE_NRIPS)) - nrips = false; - } - enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic); if (enable_apicv) {
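A note for readers: the mechanism exercised by the selftest above is L1 writing the VMCB's event_inj field before VMRUN, with next_rip supplying the return address that the CPU pushes for an injected soft interrupt or exception when NRIPS is in use. Below is a minimal, illustrative C sketch of that setup. The cut-down structure and helper name are assumptions for illustration, and the SVM_EVTINJ_* encodings mirror how the selftest uses them; this is not code from the series itself.

/*
 * Illustrative sketch only, not part of this patch series.
 * Assumes the SVM_EVTINJ_* encodings used by the selftest above and a
 * cut-down VMCB control structure with just the two fields of interest.
 */
#include <stdint.h>

#define SVM_EVTINJ_VALID	(1u << 31)
#define SVM_EVTINJ_TYPE_SOFT	(4u << 8)	/* software interrupt */

struct vmcb_control_sketch {
	uint32_t event_inj;	/* event to inject on the next VMRUN */
	uint64_t next_rip;	/* return address for injected soft events (NRIPS) */
};

/*
 * Queue software interrupt 'vector' for delivery to L2 on the next VMRUN.
 * 'return_rip' is the address the CPU pushes on the L2 stack, i.e. where
 * L2 resumes after IRET; with NRIPS it comes from next_rip rather than
 * the saved RIP, which is why next_rip has to be set up alongside
 * event_inj.
 */
static inline void queue_soft_intr(struct vmcb_control_sketch *ctrl,
				   uint8_t vector, uint64_t return_rip)
{
	ctrl->event_inj = vector | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_SOFT;
	ctrl->next_rip = return_rip;
}

This mirrors the selftest's l1_guest_code(), which sets event_inj = INT_NR | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_SOFT and next_rip = vmcb->save.rip before the first run_guest() call.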
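The selftest's main() asserts that KVM's supported CPUID advertises NRIPS (leaf 0x8000000A, EDX bit 3), which is what the hardware-setup change above now requires for nested SVM. As a hypothetical companion, a bare-metal check of the same feature bit could look like the sketch below; cpu_has_nrips() is an illustrative helper name, and the code relies only on the compiler-provided cpuid.h.

#include <cpuid.h>
#include <stdbool.h>

/* CPUID leaf 0x8000000A, EDX bit 3 advertises NextRIP Save (NRIPS). */
static inline bool cpu_has_nrips(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the extended leaf is out of range. */
	if (!__get_cpuid(0x8000000a, &eax, &ebx, &ecx, &edx))
		return false;

	return edx & (1u << 3);
}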