From patchwork Wed Oct 31 13:26:31 2018
X-Patchwork-Submitter: Marc Orr
X-Patchwork-Id: 10662673
Date: Wed, 31 Oct 2018 06:26:31 -0700
Message-Id: <20181031132634.50440-2-marcorr@google.com>
In-Reply-To: <20181031132634.50440-1-marcorr@google.com>
Subject: [kvm PATCH v5 1/4] kvm: x86: Use task struct's fpu field for user
From: Marc Orr
To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com,
    konrad.wilk@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, willy@infradead.org,
    sean.j.christopherson@intel.com, dave.hansen@linux.intel.com,
    kernellwp@gmail.com
Cc: Marc Orr
Previously, x86's instantiation of 'struct kvm_vcpu_arch' included a
user_fpu field, used to save and restore the userspace FPU state, which
differs from KVM's guest FPU state. However, this field is redundant with
the 'struct fpu' named fpu that is already embedded in the task struct via
its thread field. Thus, this patch removes the user_fpu field from the
kvm_vcpu_arch struct and uses the task struct's fpu field in its place.

This change is significant because the fpu struct is quite large. For
example, on the system used to develop this patch, it reduces the size of
the vcpu_vmx struct from 23680 bytes down to 19520 bytes when building the
kernel with kvmconfig. This reduction moves us closer to being able to
allocate the struct at order 2, rather than order 3.

Suggested-by: Dave Hansen
Signed-off-by: Marc Orr
---
 arch/x86/include/asm/kvm_host.h | 7 +++----
 arch/x86/kvm/x86.c              | 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55e51ff7e421..ebb1d7a755d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -601,16 +601,15 @@ struct kvm_vcpu_arch {

 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
-	 * In vcpu_run, we switch between the user and guest FPU contexts.
-	 * While running a VCPU, the VCPU thread will have the guest FPU
-	 * context.
+	 * In vcpu_run, we switch between the user, maintained in the
+	 * task_struct struct, and guest FPU contexts. While running a VCPU,
+	 * the VCPU thread will have the guest FPU context.
 	 *
 	 * Note that while the PKRU state lives inside the fpu registers,
 	 * it is switched out separately at VMENTER and VMEXIT time. The
 	 * "guest_fpu" state here contains the guest FPU context, with the
 	 * host PRKU bits.
 	 */
-	struct fpu user_fpu;
 	struct fpu guest_fpu;

 	u64 xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bdcb5babfb68..ff77514f7367 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7999,7 +7999,7 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
+	copy_fpregs_to_fpstate(&current->thread.fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run. */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
 				~XFEATURE_MASK_PKRU);
@@ -8012,7 +8012,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
-	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+	copy_kernel_to_fpregs(&current->thread.fpu.state);
 	preempt_enable();
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
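One subtlety this patch relies on but does not spell out: the save of the
user FPU state and the load of the guest state must both happen under
preempt_disable(). This is a comment-only sketch reasoning from the
kernel's general FPU-switching rules, not text from the patch:

	/*
	 * Why the preempt_disable()/preempt_enable() pair matters: the
	 * scheduler's context-switch path also saves the running task's
	 * FPU registers into current->thread.fpu. If the vCPU thread
	 * were preempted between
	 *
	 *     copy_fpregs_to_fpstate(&current->thread.fpu);   // save user
	 *     __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
	 *                             ~XFEATURE_MASK_PKRU);   // load guest
	 *
	 * the context switch could clobber the state that was just saved,
	 * or rewrite the registers underneath the load. Disabling
	 * preemption makes the save-then-load pair atomic with respect to
	 * the scheduler.
	 */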

From patchwork Wed Oct 31 13:26:32 2018
X-Patchwork-Submitter: Marc Orr
X-Patchwork-Id: 10662677
Date: Wed, 31 Oct 2018 06:26:32 -0700
Message-Id: <20181031132634.50440-3-marcorr@google.com>
In-Reply-To: <20181031132634.50440-1-marcorr@google.com>
Subject: [kvm PATCH v5 2/4] kvm: x86: Dynamically allocate guest_fpu
From: Marc Orr
To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com,
    konrad.wilk@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, willy@infradead.org,
    sean.j.christopherson@intel.com, dave.hansen@linux.intel.com,
    kernellwp@gmail.com
Cc: Marc Orr

Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct.
Unfortunately, the field is quite large (e.g., 4352 bytes on my current
setup). This bloats the x86 vcpu structs that contain it into an order 3
memory allocation, which can become a problem on overcommitted machines.
Thus, this patch moves the fpu state out of the kvm_vcpu_arch struct and
allocates it dynamically. With this patch applied, the vcpu_vmx struct is
reduced to 15168 bytes on my setup when building the kernel with
kvmconfig.

Suggested-by: Dave Hansen
Signed-off-by: Marc Orr
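To make the order arithmetic concrete, here is a back-of-the-envelope
check using only the sizes quoted in these two commit messages and 4 KiB
pages (an editorial illustration, not a measurement of any other
configuration):

	/*
	 * Page-allocation orders with 4 KiB pages:
	 *   order 2 = 4 pages = 16384 bytes
	 *   order 3 = 8 pages = 32768 bytes
	 *
	 * sizeof(struct vcpu_vmx) after patch 1/4:   19520 bytes -> order 3
	 * minus the ~4352-byte embedded guest_fpu:   15168 bytes -> order 2
	 *
	 * i.e., get_order(19520) == 3 but get_order(15168) == 2, so pulling
	 * the fpu state out of the struct is what lets the vcpu allocation
	 * drop an order.
	 */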
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/svm.c              | 10 ++++++++
 arch/x86/kvm/vmx.c              | 10 ++++++++
 arch/x86/kvm/x86.c              | 45 +++++++++++++++++++++++----------
 4 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ebb1d7a755d4..c8a2a263f91f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -610,7 +610,7 @@ struct kvm_vcpu_arch {
 	 * "guest_fpu" state here contains the guest FPU context, with the
 	 * host PRKU bits.
 	 */
-	struct fpu guest_fpu;
+	struct fpu *guest_fpu;

 	u64 xcr0;
 	u64 guest_supported_xcr0;
@@ -1194,6 +1194,7 @@ struct kvm_arch_async_pf {
 };

 extern struct kvm_x86_ops *kvm_x86_ops;
+extern struct kmem_cache *x86_fpu_cache;

 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f416f5c7f2ae..ac0c52ca22c6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2121,6 +2121,13 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto out;
 	}

+	svm->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache, GFP_KERNEL);
+	if (!svm->vcpu.arch.guest_fpu) {
+		printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
+		err = -ENOMEM;
+		goto free_partial_svm;
+	}
+
 	err = kvm_vcpu_init(&svm->vcpu, kvm, id);
 	if (err)
 		goto free_svm;
@@ -2180,6 +2187,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 uninit:
 	kvm_vcpu_uninit(&svm->vcpu);
 free_svm:
+	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
+free_partial_svm:
 	kmem_cache_free(kvm_vcpu_cache, svm);
 out:
 	return ERR_PTR(err);
@@ -2194,6 +2203,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
+	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 	/*
 	 * The vmcb page can be recycled, causing a false negative in
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index abeeb45d1c33..4078cf15a4b0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11476,6 +11476,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
+	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 }

@@ -11489,6 +11490,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx)
 		return ERR_PTR(-ENOMEM);

+	vmx->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache, GFP_KERNEL);
+	if (!vmx->vcpu.arch.guest_fpu) {
+		printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
+		err = -ENOMEM;
+		goto free_partial_vcpu;
+	}
+
 	vmx->vpid = allocate_vpid();

 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
@@ -11576,6 +11584,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	kvm_vcpu_uninit(&vmx->vcpu);
 free_vcpu:
 	free_vpid(vmx->vpid);
+	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
+free_partial_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 	return ERR_PTR(err);
 }
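Both create-path hunks above follow the standard kernel unwind idiom: each
allocation gets a label placed so that a failure at step N frees exactly
the N-1 earlier allocations, in reverse order. A generic sketch of the
pattern (all names here are hypothetical, not from the patch):

	#include <linux/slab.h>
	#include <linux/err.h>

	/* Hypothetical illustration types and caches. */
	struct bar { int dummy; };
	struct foo { struct bar *bar; };
	static struct kmem_cache *foo_cache, *bar_cache;
	static int foo_init(struct foo *foo) { return 0; }

	static struct foo *foo_create(void)
	{
		int err;
		struct foo *foo = kmem_cache_zalloc(foo_cache, GFP_KERNEL);

		if (!foo)
			return ERR_PTR(-ENOMEM);

		foo->bar = kmem_cache_zalloc(bar_cache, GFP_KERNEL);
		if (!foo->bar) {
			err = -ENOMEM;
			goto free_partial_foo; /* only foo exists so far */
		}

		err = foo_init(foo);
		if (err)
			goto free_bar; /* unwind in reverse allocation order */

		return foo;

	free_bar:
		kmem_cache_free(bar_cache, foo->bar);
	free_partial_foo:
		kmem_cache_free(foo_cache, foo);
		return ERR_PTR(err);
	}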
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff77514f7367..420516f0749a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -213,6 +213,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {

 u64 __read_mostly host_xcr0;

+struct kmem_cache *x86_fpu_cache;
+EXPORT_SYMBOL_GPL(x86_fpu_cache);
+
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);

 static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
@@ -3635,7 +3638,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,

 static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu.state.xsave;
+	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
 	u64 xstate_bv = xsave->header.xfeatures;
 	u64 valid;

@@ -3677,7 +3680,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)

 static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 {
-	struct xregs_state *xsave = &vcpu->arch.guest_fpu.state.xsave;
+	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
 	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
 	u64 valid;

@@ -3725,7 +3728,7 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 		fill_xsave((u8 *) guest_xsave->region, vcpu);
 	} else {
 		memcpy(guest_xsave->region,
-			&vcpu->arch.guest_fpu.state.fxsave,
+			&vcpu->arch.guest_fpu->state.fxsave,
 			sizeof(struct fxregs_state));
 		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
 			XFEATURE_MASK_FPSSE;
@@ -3755,7 +3758,7 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 		if (xstate_bv & ~XFEATURE_MASK_FPSSE ||
 			mxcsr & ~mxcsr_feature_mask)
 			return -EINVAL;
-		memcpy(&vcpu->arch.guest_fpu.state.fxsave,
+		memcpy(&vcpu->arch.guest_fpu->state.fxsave,
 			guest_xsave->region, sizeof(struct fxregs_state));
 	}
 	return 0;
@@ -6819,10 +6822,23 @@ int kvm_arch_init(void *opaque)
 	}

 	r = -ENOMEM;
+	x86_fpu_cache = kmem_cache_create_usercopy(
+				"x86_fpu",
+				sizeof(struct fpu),
+				__alignof__(struct fpu),
+				SLAB_ACCOUNT,
+				offsetof(struct fpu, state),
+				sizeof_field(struct fpu, state),
+				NULL);
+	if (!x86_fpu_cache) {
+		printk(KERN_ERR "kvm: failed to allocate cache for x86 fpu\n");
+		goto out;
+	}
+
 	shared_msrs = alloc_percpu(struct kvm_shared_msrs);
 	if (!shared_msrs) {
 		printk(KERN_ERR "kvm: failed to allocate percpu kvm_shared_msrs\n");
-		goto out;
+		goto out_free_x86_fpu_cache;
 	}

 	r = kvm_mmu_module_init();
@@ -6855,6 +6871,8 @@ int kvm_arch_init(void *opaque)

 out_free_percpu:
 	free_percpu(shared_msrs);
+out_free_x86_fpu_cache:
+	kmem_cache_destroy(x86_fpu_cache);
 out:
 	return r;
 }
@@ -6878,6 +6896,7 @@ void kvm_arch_exit(void)
 	kvm_x86_ops = NULL;
 	kvm_mmu_module_exit();
 	free_percpu(shared_msrs);
+	kmem_cache_destroy(x86_fpu_cache);
 }

 int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
@@ -8001,7 +8020,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 	preempt_disable();
 	copy_fpregs_to_fpstate(&current->thread.fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run. */
-	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
+	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
 				~XFEATURE_MASK_PKRU);
 	preempt_enable();
 	trace_kvm_fpu(1);
@@ -8011,7 +8030,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
+	copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
 	copy_kernel_to_fpregs(&current->thread.fpu.state);
 	preempt_enable();
 	++vcpu->stat.fpu_reload;
@@ -8506,7 +8525,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)

 	vcpu_load(vcpu);

-	fxsave = &vcpu->arch.guest_fpu.state.fxsave;
+	fxsave = &vcpu->arch.guest_fpu->state.fxsave;
 	memcpy(fpu->fpr, fxsave->st_space, 128);
 	fpu->fcw = fxsave->cwd;
 	fpu->fsw = fxsave->swd;
@@ -8526,7 +8545,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)

 	vcpu_load(vcpu);

-	fxsave = &vcpu->arch.guest_fpu.state.fxsave;
+	fxsave = &vcpu->arch.guest_fpu->state.fxsave;

 	memcpy(fxsave->st_space, fpu->fpr, 128);
 	fxsave->cwd = fpu->fcw;
@@ -8582,9 +8601,9 @@ static int sync_regs(struct kvm_vcpu *vcpu)

 static void fx_init(struct kvm_vcpu *vcpu)
 {
-	fpstate_init(&vcpu->arch.guest_fpu.state);
+	fpstate_init(&vcpu->arch.guest_fpu->state);
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		vcpu->arch.guest_fpu.state.xsave.header.xcomp_bv =
+		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
 			host_xcr0 | XSTATE_COMPACTION_ENABLED;

 	/*
@@ -8708,11 +8727,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	 */
 	if (init_event)
 		kvm_put_guest_fpu(vcpu);
-	mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
+	mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
 					XFEATURE_MASK_BNDREGS);
 	if (mpx_state_buffer)
 		memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
-	mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
+	mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu->state.xsave,
 					XFEATURE_MASK_BNDCSR);
 	if (mpx_state_buffer)
 		memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
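The guest_fpu cache above is created with kmem_cache_create_usercopy()
rather than plain kmem_cache_create() for a reason the diff leaves
implicit: with CONFIG_HARDENED_USERCOPY, copy_to_user()/copy_from_user()
against slab memory is only permitted within an explicitly whitelisted
window of the object, and the KVM_GET_XSAVE/KVM_SET_XSAVE paths copy the
register image in fpu.state to and from userspace. The same call from the
diff, re-annotated argument by argument:

	x86_fpu_cache = kmem_cache_create_usercopy(
			"x86_fpu",                       /* name in /proc/slabinfo  */
			sizeof(struct fpu),              /* object size             */
			__alignof__(struct fpu),         /* object alignment        */
			SLAB_ACCOUNT,                    /* charge objects to memcg */
			offsetof(struct fpu, state),     /* usercopy window: offset */
			sizeof_field(struct fpu, state), /* usercopy window: size   */
			NULL);                           /* no constructor          */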

From patchwork Wed Oct 31 13:26:33 2018
X-Patchwork-Submitter: Marc Orr
X-Patchwork-Id: 10662679
Date: Wed, 31 Oct 2018 06:26:33 -0700
Message-Id: <20181031132634.50440-4-marcorr@google.com>
In-Reply-To: <20181031132634.50440-1-marcorr@google.com>
Subject: [kvm PATCH v5 3/4] kvm: vmx: refactor vmx_msrs struct for vmalloc
From: Marc Orr
To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com,
    konrad.wilk@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, willy@infradead.org,
    sean.j.christopherson@intel.com, dave.hansen@linux.intel.com,
    kernellwp@gmail.com
Cc: Marc Orr

Previously, the vmx_msrs struct relied on being embedded in a struct
backed by the direct map (e.g., memory allocated with kmalloc()).
Specifically, that is what allowed the virtual addresses associated with
the struct to be translated to physical addresses. However, we'd like to
refactor the host struct, vcpu_vmx, to be allocated with vmalloc(), so
that allocation will succeed even when contiguous physical memory is
scarce. Thus, this patch refactors how vmx_msrs is declared and allocated,
to ensure that it can still be mapped to the physical address space even
when vmx_msrs resides within a vmalloc()'d struct.

Signed-off-by: Marc Orr
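The page-boundary requirement the following diff enforces rests on a small
alignment fact: if an object's size s is a power of two with
s <= PAGE_SIZE, and its address is s-aligned, the object cannot straddle a
page. A sketch of the argument, with an illustrative helper that is not
part of the patch:

	#include <linux/mm.h>	/* PAGE_SIZE */

	/*
	 * If addr is s-aligned and s is a power of two <= PAGE_SIZE, then
	 * addr % PAGE_SIZE is itself a multiple of s, hence at most
	 * PAGE_SIZE - s, so [addr, addr + s) stays within one page. kmem
	 * caches whose alignment equals their power-of-two object size
	 * satisfy the precondition, which is why the patch sizes and
	 * aligns the vmx_msr_entry cache the way it does.
	 */
	static inline bool fits_in_one_page(unsigned long addr, size_t s)
	{
		return (addr & (PAGE_SIZE - 1)) + s <= PAGE_SIZE;
	}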
---
 arch/x86/kvm/vmx.c | 57 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4078cf15a4b0..315cf4b5f262 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -970,8 +970,25 @@ static inline int pi_test_sn(struct pi_desc *pi_desc)

 struct vmx_msrs {
 	unsigned int		nr;
-	struct vmx_msr_entry	val[NR_AUTOLOAD_MSRS];
+	struct vmx_msr_entry	*val;
 };
+struct kmem_cache *vmx_msr_entry_cache;
+
+/*
+ * To prevent a vmx_msr_entry array from crossing a page boundary, require
+ * sizeof(*vmx_msrs.vmx_msr_entry.val) to be a power of two. This is
+ * guaranteed through compile-time asserts that:
+ *   - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) is a power of two
+ *   - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) <= PAGE_SIZE
+ *   - The allocation of vmx_msrs.vmx_msr_entry.val is aligned to its size.
+ */
+#define CHECK_POWER_OF_TWO(val) \
+	BUILD_BUG_ON_MSG(!((val) && !((val) & ((val) - 1))), \
+			 #val " is not a power of two.")
+#define CHECK_INTRA_PAGE(val) do { \
+		CHECK_POWER_OF_TWO(val); \
+		BUILD_BUG_ON(!(val <= PAGE_SIZE)); \
+	} while (0)

 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
@@ -11497,6 +11514,19 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto free_partial_vcpu;
 	}

+	vmx->msr_autoload.guest.val =
+		kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL);
+	if (!vmx->msr_autoload.guest.val) {
+		err = -ENOMEM;
+		goto free_fpu;
+	}
+	vmx->msr_autoload.host.val =
+		kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL);
+	if (!vmx->msr_autoload.host.val) {
+		err = -ENOMEM;
+		goto free_msr_autoload_guest;
+	}
+
 	vmx->vpid = allocate_vpid();

 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
@@ -11584,6 +11614,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	kvm_vcpu_uninit(&vmx->vcpu);
 free_vcpu:
 	free_vpid(vmx->vpid);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val);
+free_msr_autoload_guest:
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val);
+free_fpu:
 	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
 free_partial_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vmx);
@@ -15163,6 +15197,10 @@ module_exit(vmx_exit);
 static int __init vmx_init(void)
 {
 	int r;
+	size_t vmx_msr_entry_size =
+		sizeof(struct vmx_msr_entry) * NR_AUTOLOAD_MSRS;
+
+	CHECK_INTRA_PAGE(vmx_msr_entry_size);

 #if IS_ENABLED(CONFIG_HYPERV)
 	/*
@@ -15194,9 +15232,21 @@ static int __init vmx_init(void)
 #endif

 	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
-		__alignof__(struct vcpu_vmx), THIS_MODULE);
+		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
 		return r;
+	/*
+	 * A vmx_msr_entry array resides exclusively within the kernel. Thus,
+	 * use kmem_cache_create_usercopy(), with the usersize argument set to
+	 * ZERO, to blacklist copying vmx_msr_entry to/from user space.
+	 */
+	vmx_msr_entry_cache =
+		kmem_cache_create_usercopy("vmx_msr_entry", vmx_msr_entry_size,
+					   vmx_msr_entry_size, SLAB_ACCOUNT,
+					   0, 0, NULL);
+	if (!vmx_msr_entry_cache) {
+		r = -ENOMEM;
+		goto out;
+	}

 	/*
 	 * Must be called after kvm_init() so enable_ept is properly set
@@ -15220,5 +15270,8 @@ static int __init vmx_init(void)
 	vmx_check_vmcs12_offsets();

 	return 0;
+out:
+	kvm_exit();
+	return r;
 }
 module_init(vmx_init);
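A small point about the CHECK_INTRA_PAGE() machinery above: BUILD_BUG_ON()
expands to a statement, so it can only appear inside a function, which is
why the check runs from vmx_init() rather than next to the struct
definition. Reduced to its essentials (the size constant here is a
hypothetical stand-in for vmx_msr_entry_size):

	#include <linux/build_bug.h>
	#include <linux/mm.h>

	#define ENTRY_BYTES 512	/* hypothetical */

	static int __init example_init(void)
	{
		/* Violations fail the build, not the boot. */
		BUILD_BUG_ON(ENTRY_BYTES & (ENTRY_BYTES - 1)); /* power of 2 */
		BUILD_BUG_ON(ENTRY_BYTES > PAGE_SIZE);	/* fits in a page */
		return 0;
	}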

From patchwork Wed Oct 31 13:26:34 2018
X-Patchwork-Submitter: Marc Orr
X-Patchwork-Id: 10662683
Date: Wed, 31 Oct 2018 06:26:34 -0700
Message-Id: <20181031132634.50440-5-marcorr@google.com>
In-Reply-To: <20181031132634.50440-1-marcorr@google.com>
Subject: [kvm PATCH v5 4/4] kvm: vmx: use vmalloc() to allocate vcpus
From: Marc Orr
To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com,
    konrad.wilk@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, willy@infradead.org,
    sean.j.christopherson@intel.com, dave.hansen@linux.intel.com,
    kernellwp@gmail.com
Cc: Marc Orr

Previously, vcpus were allocated through the kmem_cache_zalloc() API,
which requires the underlying physical memory to be contiguous. Because
the x86 vcpu struct, struct vcpu_vmx, is relatively large (e.g., currently
47680 bytes on my setup), it can become hard to find contiguous memory. At
the same time, the comments in the code indicate that the primary reason
for using the kmem_cache_zalloc() API is to align the memory, rather than
to provide physical contiguity. Thus, this patch updates the vcpu
allocation logic for vmx to use the vmalloc() API.

Signed-off-by: Marc Orr
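The cost of switching to vmalloc(), handled in the first hunks below, is
that the allocation is only virtually contiguous, so any field whose
physical address is handed to hardware (here, the posted-interrupt
descriptor) must be translated through the page tables rather than with
__pa(). A minimal sketch of that translation, valid only when the object
does not cross a page boundary (true for the 64-byte, 64-byte-aligned
pi_desc):

	#include <linux/vmalloc.h>
	#include <linux/mm.h>

	/*
	 * __pa() only works for direct-map addresses. For a vmalloc()'d
	 * pointer, look up the backing page and add the offset within it.
	 */
	static phys_addr_t vmalloc_obj_to_phys(void *p)
	{
		return page_to_phys(vmalloc_to_page(p)) + offset_in_page(p);
	}

offset_in_page() is equivalent to the "% PAGE_SIZE" arithmetic the patch
open-codes in vmx_vcpu_setup().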
---
 arch/x86/kvm/vmx.c  | 37 ++++++++++++++++++++++++++++++-------
 virt/kvm/kvm_main.c | 28 ++++++++++++++++------------
 2 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 315cf4b5f262..af651540ee45 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -898,7 +898,14 @@ struct nested_vmx {
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1

-/* Posted-Interrupt Descriptor */
+/*
+ * Posted-Interrupt Descriptor
+ *
+ * Note, the physical address of this structure is used by VMX. Furthermore, the
+ * translation code assumes that the entire pi_desc struct resides within a
+ * single page, which will be true because the struct is 64 bytes and 64-byte
+ * aligned.
+ */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
 	union {
@@ -6633,6 +6640,14 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	}

 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
+		/*
+		 * Note, pi_desc is contained within a single
+		 * page because the struct is 64 bytes and 64-byte aligned.
+		 */
+		phys_addr_t pi_desc_phys =
+			page_to_phys(vmalloc_to_page(&vmx->pi_desc)) +
+			(u64)&vmx->pi_desc % PAGE_SIZE;
+
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
 		vmcs_write64(EOI_EXIT_BITMAP1, 0);
 		vmcs_write64(EOI_EXIT_BITMAP2, 0);
@@ -6641,7 +6656,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)

 		vmcs_write16(GUEST_INTR_STATUS, 0);

 		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
-		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
+		vmcs_write64(POSTED_INTR_DESC_ADDR, pi_desc_phys);
 	}

 	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
@@ -11494,13 +11509,18 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
-	kmem_cache_free(kvm_vcpu_cache, vmx);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val);
+	vfree(vmx);
 }

 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
 	int err;
-	struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	struct vcpu_vmx *vmx =
+		__vmalloc(sizeof(struct vcpu_vmx),
+			  GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT,
+			  PAGE_KERNEL);
 	unsigned long *msr_bitmap;
 	int cpu;

@@ -11620,7 +11640,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 free_fpu:
 	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
 free_partial_vcpu:
-	kmem_cache_free(kvm_vcpu_cache, vmx);
+	vfree(vmx);
 	return ERR_PTR(err);
 }

@@ -15231,8 +15251,11 @@ static int __init vmx_init(void)
 	}
 #endif

-	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
-		     __alignof__(struct vcpu_vmx), THIS_MODULE);
+	/*
+	 * Disable kmem cache; vmalloc will be used instead
+	 * to avoid OOM'ing when memory is available but not contiguous.
+	 */
+	r = kvm_init(&vmx_x86_ops, 0, 0, THIS_MODULE);
 	if (r)
 		return r;

 	/*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 786ade1843a2..8b979e7c3ecd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4038,18 +4038,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		goto out_free_2;
 	register_reboot_notifier(&kvm_reboot_notifier);

-	/* A kmem cache lets us meet the alignment requirements of fx_save. */
-	if (!vcpu_align)
-		vcpu_align = __alignof__(struct kvm_vcpu);
-	kvm_vcpu_cache =
-		kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align,
-					   SLAB_ACCOUNT,
-					   offsetof(struct kvm_vcpu, arch),
-					   sizeof_field(struct kvm_vcpu, arch),
-					   NULL);
-	if (!kvm_vcpu_cache) {
-		r = -ENOMEM;
-		goto out_free_3;
+	/*
+	 * When vcpu_size is zero,
+	 * architecture-specific code manages its own vcpu allocation.
+	 */
+	kvm_vcpu_cache = NULL;
+	if (vcpu_size) {
+		if (!vcpu_align)
+			vcpu_align = __alignof__(struct kvm_vcpu);
+		kvm_vcpu_cache = kmem_cache_create_usercopy(
+			"kvm_vcpu", vcpu_size, vcpu_align, SLAB_ACCOUNT,
+			offsetof(struct kvm_vcpu, arch),
+			sizeof_field(struct kvm_vcpu, arch), NULL);
+		if (!kvm_vcpu_cache) {
+			r = -ENOMEM;
+			goto out_free_3;
+		}
 	}

 	r = kvm_async_pf_init();