From patchwork Tue Dec 5 08:16:25 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Liran Alon X-Patchwork-Id: 10092433 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1987C60348 for ; Tue, 5 Dec 2017 08:17:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 09D012950F for ; Tue, 5 Dec 2017 08:17:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2C0B29536; Tue, 5 Dec 2017 08:17:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0137C29518 for ; Tue, 5 Dec 2017 08:17:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752656AbdLEIRR (ORCPT ); Tue, 5 Dec 2017 03:17:17 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:42002 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752562AbdLEIRQ (ORCPT ); Tue, 5 Dec 2017 03:17:16 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.21/8.16.0.21) with SMTP id vB58HDOj182723; Tue, 5 Dec 2017 08:17:13 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2017-10-26; bh=thcF8WUwWVeZqoQ4dsD3aEOKkR/WHNh2xAfXk8dnaqQ=; b=G5fmKSy8JbD9F0vRf8M6j+PfRR6x5g57dyHPmT6zZ9eH7hWvT9ReW8AyjsulTGJw6pwq G5LfXxVGrCl50BbCgYseIHZrTNyUJvw+OegIhtiyOKf2UuL4up2ePz+5lmYi36kc4tZp btSZkhk9ZGmcRE4Y3IlFZZhLioyuzGcAG87SkDXfTrqDtLWAYWs3JtW07RLNQoVDdCRV 0/5/zkJ/Zq9UHk3wpPRzSt8Y8MXLD6bDuoGyzYB8z4DtZ6SECA5uTHAOIzRbhnPLnHrJ csmsXm2jQuDs1MCpkPhMFrNXNrMLTaTQEvMRfki0kUXce4+xtvGv/Z7k9I97L67MWDOY Fw== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2120.oracle.com with ESMTP id 2enc2g1hy4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 Dec 2017 08:17:12 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id vB58H8ZE008999 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 5 Dec 2017 08:17:09 GMT Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id vB58H8YT028201; Tue, 5 Dec 2017 08:17:08 GMT Received: from localhost.localdomain (/172.58.43.69) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 05 Dec 2017 00:17:08 -0800 From: Liran Alon To: pbonzini@redhat.com, rkrcmar@redhat.com, kvm@vger.kernel.org Cc: jmattson@google.com, wanpeng.li@hotmail.com, idan.brown@oracle.com, Liran Alon , Krish Sadhukhan , Konrad Rzeszutek Wilk Subject: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Date: Tue, 5 Dec 2017 10:16:25 +0200 Message-Id: <1512461786-6465-5-git-send-email-liran.alon@oracle.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1512461786-6465-1-git-send-email-liran.alon@oracle.com> References: <1512461786-6465-1-git-send-email-liran.alon@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8735 signatures=668637 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1712050123 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When L1 wants to send a posted-interrupt to another L1 CPU running L2, it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in vmx->nested.pi_desc->control. Then it attempts to send a notification-vector IPI to dest L1 CPU. This attempt to send IPI will exit to L0 which will reach vmx_deliver_nested_posted_interrupt() which does the following: 1. If dest L0 CPU is currently running guest (vcpu->mode == IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector. 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest L0 CPU exits from guest to host and therefore doesn't recognize physical IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next vmentry which will call vmx_check_nested_events() which should call (in theory) vmx_complete_nested_posted_interrupt(). That function should see vmx->nested.pi_desc->control ON bit is set and therefore "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1 virtual-apic-page & update vmcs02 RVI). The above logic regarding nested-posted-interrupts contains multiple issues: A) Race-condition: On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE but before sending physical IPI, the dest CPU will exit to host. Therefore, physical IPI could be received at host which it's hanlder does nothing. In addition, assume that dest CPU passes the checks for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore, dest CPU will resume L2 without evaluating nested-posted-interrupts. B) vmx_complete_nested_posted_interrupt() is not always called when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug that vmx_check_nested_events() could exit from L2 to L1 before calling vmx_complete_nested_posted_interrupt(). Therefore, on next resume of L1 into L2, nested-posted-interrupts won't be evaluated even though L0 resume L2 (We may resume L2 without having KVM_REQ_EVENT set). This commit removes nested-posted-interrupts processing from check_nested_events() and instead makes sure to process nested-posted-interrupts on vmentry after interrupts disabled. Processing of nested-posted-interrupts is delegated to hardware by issueing a self-IPI of relevant notification-vector which will be delivered to CPU when CPU is in guest. * Bug (A) is solved by the fact that processing of nested-posted-interrupts is not dependent on KVM_REQ_EVENT and happens before every vmentry to L2. * Bug (B) is now trivially solved by processing nested-posted-interrupts before each vmentry to L2 guest. An alternative could have been to just call vmx_complete_nested_posted_interrupt() at this call-site. However, we have decided to go with this approach because: 1. It would require modifying vmx_complete_nested_posted_interrupt() to be able to work with interrupts disabled (not-trivial). 2. We preffer to avoid software-emulation of hardware behavior if it is possible. Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Signed-off-by: Liran Alon Co-authored-by: Nikita Leshenko Reviewed-by: Krish Sadhukhan Signed-off-by: Krish Sadhukhan Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx.c | 57 ++++++++++------------------------------- arch/x86/kvm/x86.c | 14 +++++++--- 3 files changed, 25 insertions(+), 47 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1bfb99770c34..dc0affe69903 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -980,6 +980,7 @@ struct kvm_x86_ops { void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set); void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa); void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector); + void (*complete_nested_posted_interrupt)(struct kvm_vcpu *vcpu); int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index bbc023fb9ef1..517822f94158 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -534,12 +534,6 @@ static bool pi_test_and_set_on(struct pi_desc *pi_desc) (unsigned long *)&pi_desc->control); } -static bool pi_test_and_clear_on(struct pi_desc *pi_desc) -{ - return test_and_clear_bit(POSTED_INTR_ON, - (unsigned long *)&pi_desc->control); -} - static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc) { return test_and_set_bit(vector, (unsigned long *)pi_desc->pir); @@ -5041,37 +5035,6 @@ static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu) } } - -static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - int max_irr; - void *vapic_page; - u16 status; - - if (!vmx->nested.pi_desc) - return; - - if (!pi_test_and_clear_on(vmx->nested.pi_desc)) - return; - - max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256); - if (max_irr != 256) { - vapic_page = kmap(vmx->nested.virtual_apic_page); - __kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page); - kunmap(vmx->nested.virtual_apic_page); - - status = vmcs_read16(GUEST_INTR_STATUS); - if ((u8)max_irr > ((u8)status & 0xff)) { - status &= ~0xff; - status |= (u8)max_irr; - vmcs_write16(GUEST_INTR_STATUS, status); - } - } - - nested_mark_vmcs12_pages_dirty(vcpu); -} - static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, bool nested) { @@ -5120,11 +5083,6 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, vector == vmx->nested.posted_intr_nv) { /* the PIR and ON have been set by L1. */ kvm_vcpu_trigger_posted_interrupt(vcpu, true); - /* - * If a posted intr is not recognized by hardware, - * we will accomplish it in the next vmentry. - */ - kvm_make_request(KVM_REQ_EVENT, vcpu); return 0; } return -1; @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) kvm_vcpu_kick(vcpu); } +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + WARN_ON(!vcpu->arch.apicv_active); + + if (is_guest_mode(vcpu) && vmx->nested.pi_desc && + pi_test_on(vmx->nested.pi_desc)) + kvm_vcpu_trigger_posted_interrupt(vcpu, true); +} + /* * Set up the vmcs's constant host-state fields, i.e., host-state fields that * will not change in the lifetime of the guest. @@ -6823,6 +6792,7 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_apicv()) { enable_apicv = 0; kvm_x86_ops->sync_pir_to_irr = NULL; + kvm_x86_ops->complete_nested_posted_interrupt = NULL; } if (cpu_has_vmx_tsc_scaling()) { @@ -11144,7 +11114,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) return 0; } - vmx_complete_nested_posted_interrupt(vcpu); return 0; } @@ -12136,6 +12105,8 @@ static int enable_smi_window(struct kvm_vcpu *vcpu) .hwapic_isr_update = vmx_hwapic_isr_update, .sync_pir_to_irr = vmx_sync_pir_to_irr, .deliver_posted_interrupt = vmx_deliver_posted_interrupt, + .complete_nested_posted_interrupt = + vmx_complete_nested_posted_interrupt, .set_tss_addr = vmx_set_tss_addr, .get_tdp_level = get_ept_level, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 34c85aa2e2d1..6fb58ab9a85c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) smp_mb__after_srcu_read_unlock(); /* - * This handles the case where a posted interrupt was - * notified with kvm_vcpu_kick. + * In case guest got the posted-interrupt notification + * vector while running in host, we need to make sure + * it arrives to guest. + * For L1 posted-interrupts, we manually sync PIR to IRR. + * For L2 posted-interrupts, we send notification-vector + * again by self IPI such that it will now be received in guest. */ - if (kvm_lapic_enabled(vcpu)) { - if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active) + if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) { + if (kvm_x86_ops->sync_pir_to_irr) kvm_x86_ops->sync_pir_to_irr(vcpu); + if (kvm_x86_ops->complete_nested_posted_interrupt) + kvm_x86_ops->complete_nested_posted_interrupt(vcpu); } if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)