From patchwork Fri Feb 16 14:23:49 2018
X-Patchwork-Submitter: KarimAllah Ahmed
X-Patchwork-Id: 10224809
From: KarimAllah Ahmed
To: kvm@vger.kernel.org, x86@kernel.org
Cc: KarimAllah Ahmed
Subject: [RFC 1/2] KVM/nVMX: Cleanly exit from L2 to L1 on user-space exit
Date: Fri, 16 Feb 2018 15:23:49 +0100
Message-Id: <1518791030-31765-2-git-send-email-karahmed@amazon.de>
In-Reply-To: <1518791030-31765-1-git-send-email-karahmed@amazon.de>
References: <1518791030-31765-1-git-send-email-karahmed@amazon.de>
X-Mailing-List: kvm@vger.kernel.org

On exit to L0 user space, always exit from L2 to L1 and synchronize the
state properly for L1.
This ensures that user space only ever sees L1 state. It also allows L1
to be saved and resumed properly. Obviously horrible things will still
happen to the L2 guest; this will be handled in a separate patch.

There is only a single case that requires a bit of extra care: when the
decision to switch to user space happens while handling an L1
VMRESUME/VMLAUNCH (i.e. nested_run_pending is set). In order to handle
this as cleanly as possible without major restructuring, we simply do
not exit to user space in this case and give L2 another chance to
actually run. We also request an immediate exit to ensure that an exit
to user space still happens for the L2 afterwards.

The only reason I can see for an exit to user space occurring while L2
is running is a pending signal; this is how user space preempts KVM_RUN
in order to save the state. L2 exits are either handled in the L0
kernel or reflected to L1, and are not handled in L0 user space.

Signed-off-by: KarimAllah Ahmed
---
(An illustrative user-space sketch of the signal-based preemption flow
is appended after the diff for reference.)

 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c              | 39 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              | 33 ++++++++++++++++++++++++++++-----
 3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 318a414..2c8be56 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -961,6 +961,8 @@ struct kvm_x86_ops {
 				  struct msr_bitmap_range *whitelist);
 
 	void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
+	void (*prepare_exit_user)(struct kvm_vcpu *vcpu);
+	bool (*allow_exit_user)(struct kvm_vcpu *vcpu);
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 52539be..22eb0dc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2130,6 +2130,42 @@ static unsigned long segment_base(u16 selector)
 }
 #endif
 
+static bool vmx_allow_exit_user(struct kvm_vcpu *vcpu)
+{
+	return !to_vmx(vcpu)->nested.nested_run_pending;
+}
+
+static void vmx_prepare_exit_user(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vmx->nested.current_vmptr == -1ull)
+		return;
+
+	/*
+	 * If L2 is running there is no need to update vmcs12 from the
+	 * shadow VMCS. Just force an exit from L2 to L1.
+	 */
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * Pretend that an external interrupt occurred while L2 is
+		 * running to cleanly exit into L1.
+		 */
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+
+		/* Switch from L2 MMU to L1 MMU */
+		kvm_mmu_reset_context(vcpu);
+	} else if (enable_shadow_vmcs) {
+		copy_shadow_to_vmcs12(vmx);
+	}
+
+	/* Flush VMCS12 to guest memory */
+	kvm_write_guest(vcpu->kvm, vmx->nested.current_vmptr,
+			get_vmcs12(vcpu), sizeof(*vmx->nested.cached_vmcs12));
+
+	return;
+}
+
 static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -12440,6 +12476,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 	.whitelist_msrs = vmx_whitelist_msrs,
 
+	.prepare_exit_user = vmx_prepare_exit_user,
+	.allow_exit_user = vmx_allow_exit_user,
+
 	.prepare_guest_switch = vmx_save_host_state,
 	.vcpu_load = vmx_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2cfbf39..8256a2d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -996,6 +996,12 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_rdpmc);
 
+static __always_inline bool should_exit_user(struct kvm_vcpu *vcpu)
+{
+	return signal_pending(current) && (kvm_x86_ops->allow_exit_user ?
+					   kvm_x86_ops->allow_exit_user(vcpu) : true);
+}
+
 /*
  * List of msr numbers which we expose to userspace through KVM_GET_MSRS
  * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
@@ -7187,8 +7193,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
-	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
-	    || need_resched() || signal_pending(current)) {
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) || need_resched()) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -7198,6 +7203,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		goto cancel_injection;
 	}
 
+	if (signal_pending(current)) {
+		if (kvm_x86_ops->allow_exit_user &&
+		    kvm_x86_ops->allow_exit_user(vcpu)) {
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			smp_wmb();
+			local_irq_enable();
+			preempt_enable();
+			vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+			r = 1;
+			goto cancel_injection;
+		} else
+			req_immediate_exit = true;
+	}
+
 	kvm_load_guest_xcr0(vcpu);
 
 	if (req_immediate_exit) {
@@ -7364,7 +7383,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
 		kvm_check_async_pf_completion(vcpu);
 
-		if (signal_pending(current)) {
+		if (should_exit_user(vcpu)) {
 			r = -EINTR;
 			vcpu->run->exit_reason = KVM_EXIT_INTR;
 			++vcpu->stat.signal_exits;
@@ -7506,11 +7525,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	} else
 		WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
 
-	if (kvm_run->immediate_exit)
+	if (kvm_run->immediate_exit) {
 		r = -EINTR;
-	else
+	} else {
 		r = vcpu_run(vcpu);
 
+		if (kvm_x86_ops->prepare_exit_user)
+			kvm_x86_ops->prepare_exit_user(vcpu);
+	}
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	post_kvm_run_save(vcpu);
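
For reference only, not part of the patch to be applied: a minimal,
hypothetical sketch of the user-space flow the commit message assumes.
A VMM thread sends a signal to the vCPU thread to preempt KVM_RUN; with
this patch the vCPU has already been forced from L2 back to L1 when
KVM_RUN returns, so the state read afterwards is saveable L1 state. The
names vcpu_fd, run and run_vcpu_until_preempted and the use of SIGUSR1
are illustrative; VM/vCPU creation and the mmap of the kvm_run
structure are assumed to have happened elsewhere.

/* Illustrative sketch only -- assumes vcpu_fd and run were set up elsewhere. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void sigusr1_handler(int sig)
{
	(void)sig;	/* empty: its only job is to make KVM_RUN return -EINTR */
}

/* Runs on the vCPU thread. */
static int run_vcpu_until_preempted(int vcpu_fd, struct kvm_run *run)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr1_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* notably no SA_RESTART, so KVM_RUN is interrupted */
	sigaction(SIGUSR1, &sa, NULL);

	for (;;) {
		int ret = ioctl(vcpu_fd, KVM_RUN, 0);

		if (ret < 0 && errno == EINTR) {
			/*
			 * Another thread sent SIGUSR1 (e.g. via pthread_kill)
			 * to stop the guest; run->exit_reason is KVM_EXIT_INTR.
			 * With this patch the vCPU is back in L1 here, so the
			 * KVM_GET_REGS/KVM_GET_SREGS state below can be saved
			 * and later resumed.
			 */
			struct kvm_regs regs;
			struct kvm_sregs sregs;

			if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0 ||
			    ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
				return -1;

			printf("preempted at rip=0x%llx\n",
			       (unsigned long long)regs.rip);
			return 0;
		}

		if (ret < 0)
			return -1;	/* unexpected KVM_RUN failure */

		switch (run->exit_reason) {
		case KVM_EXIT_HLT:
			return 0;
		default:
			/* handle MMIO, PIO, etc. as usual */
			break;
		}
	}
}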