From patchwork Sat Oct 20 21:11:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marc Orr X-Patchwork-Id: 10650599 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0896109C for ; Sat, 20 Oct 2018 21:13:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 168BD1FEBA for ; Sat, 20 Oct 2018 21:13:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 068B9286C7; Sat, 20 Oct 2018 21:13:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B3CA7286BF for ; Sat, 20 Oct 2018 21:13:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726884AbeJUFZB (ORCPT ); Sun, 21 Oct 2018 01:25:01 -0400 Received: from mail-ua1-f74.google.com ([209.85.222.74]:56520 "EHLO mail-ua1-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726833AbeJUFZA (ORCPT ); Sun, 21 Oct 2018 01:25:00 -0400 Received: by mail-ua1-f74.google.com with SMTP id 32so3507376uaf.23 for ; Sat, 20 Oct 2018 14:13:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=841g6O24C2FMjldMr9VomvMEIv7DibxQljAQnpLM+ds=; b=OHCDMWSu3W2Uuf7KfqwQjSynoV04p1vu1hE3nPUusiOXv8QgSaxwSN+5FO8ETdQBg3 yPlP1lCt1rEjj2rv/KE7CA851W92lUYAoZt41AtGqXZW6HRd0m553RoEU/kJSFi+xqII TLuoFYGKhVYNOVNjTacumTec10LHbI5M8S+m8egekZWD09pAWEe8sa7hV50zE2hUY/r/ KEM4DiI5IfMcxTjHXNhLnkJpbYvA4iNQAXwEelbrWf7oyrzhkQ9MnJBdOIXzzDCrKurg F38GhanRMaXnE5AmQLgaT2s3ElcXQw1vx5jYlFkYkTmjoLXI81EctB/OaJJY+jlBah+3 stnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=841g6O24C2FMjldMr9VomvMEIv7DibxQljAQnpLM+ds=; b=Jsz3mJxTX6/hcWqISGXVeV6IQJ2i9N+5PgVCesqlVQ6hElZVV2dhSgObRPsLzJRmer yWyq1ssaewVNMsXuJOXQoZ+HNEtB4+97BsAVaudoslamb5wckTHBtlRmI5GuWPyNpXxq i0mboHCSBAeyuRU1xd6Kv/m+xcNo8Efc6EUB9j9kmCqVWtwyMi2VDrWSGYXo/ICTg8bf t/UkFSy+NrgWn7rgASNm+W51gXs/D3naIQhPLGURkdK5zUdD7sH9erZhmiEjs1dhqz9g yZw4khsrVih9ink84/qV6RbfnuxgKV4CtOS7Nlq23VUAamEYKYzaRZaGbOJPqnwOz00x 3pnQ== X-Gm-Message-State: ABuFfojpxxQB4McJcewDHYycEwHskvqKi6cRFF8fpp8vCLaBwEx0Jz6W UlPZi3Pp39scivYuTIqG5OK1z5jLxwOR2t9+yLnuKLJdlp4Cz0/uPd5+HGWZjLXgEmEpDgcnjl5 WNOlnOqsyAPBxdI4GsLsKlOLKhnyEMYen0uA6bZhrnx9WzMc5GLyzrAD+W9uR X-Google-Smtp-Source: ACcGV61+rXdkPFzZbj7Zm//L/ZS/13+CHcjAbRoAoRjKrbxQstX0F7o0FsUZh23ZzUqYU6JWK5ZcDV1IrXNf X-Received: by 2002:ab0:2151:: with SMTP id t17mr36806847ual.8.1540069994171; Sat, 20 Oct 2018 14:13:14 -0700 (PDT) Date: Sat, 20 Oct 2018 14:11:59 -0700 In-Reply-To: <20181020211200.255171-1-marcorr@google.com> Message-Id: <20181020211200.255171-2-marcorr@google.com> Mime-Version: 1.0 References: <20181020211200.255171-1-marcorr@google.com> X-Mailer: git-send-email 2.19.1.568.g152ad8e336-goog Subject: [kvm PATCH 1/2] mm: export __vmalloc_node_range() From: Marc Orr To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com Cc: Marc Orr Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The __vmalloc_node_range() is in the include/linux/vmalloc.h file, but it's not exported so it can't be used. This patch exports the API. The motivation to export it is so that we can do aligned vmalloc's of KVM vcpus. Signed-off-by: Marc Orr --- mm/vmalloc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a728fc492557..9e7974ab1da4 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1763,6 +1763,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, "vmalloc: allocation failure: %lu bytes", real_size); return NULL; } +EXPORT_SYMBOL_GPL(__vmalloc_node_range); /** * __vmalloc_node - allocate virtually contiguous memory From patchwork Sat Oct 20 21:12:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marc Orr X-Patchwork-Id: 10650601 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED097109C for ; Sat, 20 Oct 2018 21:13:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCF1128173 for ; Sat, 20 Oct 2018 21:13:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D00E228382; Sat, 20 Oct 2018 21:13:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D578F28173 for ; Sat, 20 Oct 2018 21:13:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727350AbeJUFZG (ORCPT ); Sun, 21 Oct 2018 01:25:06 -0400 Received: from mail-oi1-f202.google.com ([209.85.167.202]:37731 "EHLO mail-oi1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726993AbeJUFZF (ORCPT ); Sun, 21 Oct 2018 01:25:05 -0400 Received: by mail-oi1-f202.google.com with SMTP id t68-v6so25687491oih.4 for ; Sat, 20 Oct 2018 14:13:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ICvQjTKXxUpjh+eQ+miezgtXZKuBEjN04F8ycubEuZc=; b=J1tSwu/eiXGiApdk9PQS9qKAnScVFVHkR4Q4fMTrmRN/fnXhmuIpKY+gJbp8lHY4y9 /8YeTe5EbgI4kH4p5v7K4AQimH6WBl5+mTMrGKcq50NhcJbBO0KJdgMRup3d9xgCC3wp UbtPsfwdDVlBiNJVbybjBFwMgU+65V9iNGM0rFPBxp229de7KwB0otyBL/TGX2lD1Shg /HegQ/HtWTVykhSEkxZdNOQUzM4fZsC5MZAXqMjF9lnl38sxvbJyv9rBWaRc9o/yVBRO /L2TyJt3Q+vcnTWu8/K3GfeDCChDg2oAUDW7OB5ggjcKesthCPaIMeAIWBFTJDxwnlzp AvZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ICvQjTKXxUpjh+eQ+miezgtXZKuBEjN04F8ycubEuZc=; b=X7MqWs97IVe0WXE7BLV5zoED5c/7jwxhNl01bYL5+kPEUM7AIled2nvQyaahSYLXyE anUNxuDLOeBBTJw6Ta51c046AsN+v7X93Kwmhrnl5FeaUTLk7OD5+h/MwPgZXh2WfveS r7mr7BYl4fO/j2u7+qId8xtEkxub/cKB8c3zCkzWyPUJtEp4sWKnNGvXDgrXZpfBxQCn psdW8cqkDc3DOrJv7kBdpJmOXSeEXt0EwZwW20YIUsCKGFJ1Wl+Bgc3/q2k0IvJ0h6dg 4ntnnZ4FVet/qKSnIBIKUL3of1bShlh/iptuIy7DeJf3Z1noWEMChUFnrzwtRYvMrvh/ zQOg== X-Gm-Message-State: ABuFfogn8qti88sxBRSZqWcItEIrZqokfdgx+1iAHPIF83KYDVGJm/zV N8I+aAQXyb2PubPKmUWCDzsp7uxWp/JfBi2sccPk543RV9sfko04PWZcdq0ZlpASvrqk5Bs6xAY UU86M/2ekfyKdP+9ywYExPvZN3Dxhq2zxgnCtIF9394G71tghqNYCjS8HVrPM X-Google-Smtp-Source: ACcGV63vnpclD+ckzamcmW1RkPyU7SfuRJGCYMSuj86O9PZMezF7tL/1Eecj6aUGxTeo+Eh+6TODF4iRmzs2 X-Received: by 2002:a9d:42d3:: with SMTP id c19mr32626545otj.33.1540069999357; Sat, 20 Oct 2018 14:13:19 -0700 (PDT) Date: Sat, 20 Oct 2018 14:12:00 -0700 In-Reply-To: <20181020211200.255171-1-marcorr@google.com> Message-Id: <20181020211200.255171-3-marcorr@google.com> Mime-Version: 1.0 References: <20181020211200.255171-1-marcorr@google.com> X-Mailer: git-send-email 2.19.1.568.g152ad8e336-goog Subject: [kvm PATCH 2/2] kvm: vmx: use vmalloc() to allocate vcpus From: Marc Orr To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com Cc: Marc Orr Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Previously, vcpus were allocated through the kmem_cache_zalloc() API, which requires the underlying physical memory to be contiguous. Because the x86 vcpu struct, struct vcpu_vmx, is relatively large (e.g., currently 47680 bytes on my setup), it can become hard to find contiguous memory. At the same time, the comments in the code indicate that the primary reason for using the kmem_cache_zalloc() API is to align the memory rather than to provide physical contiguity. Thus, this patch updates the vcpu allocation logic for vmx to use the vmalloc() API. Signed-off-by: Marc Orr --- arch/x86/kvm/vmx.c | 98 +++++++++++++++++++++++++++++++++++++++++---- virt/kvm/kvm_main.c | 28 +++++++------ 2 files changed, 106 insertions(+), 20 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index abeeb45d1c33..d480a2cc0667 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -898,7 +898,14 @@ struct nested_vmx { #define POSTED_INTR_ON 0 #define POSTED_INTR_SN 1 -/* Posted-Interrupt Descriptor */ +/* + * Posted-Interrupt Descriptor + * + * Note, the physical address of this structure is used by VMX. Furthermore, the + * translation code assumes that the entire pi_desc struct resides within a + * single page, which will be true because the struct is 64 bytes and 64-byte + * aligned. + */ struct pi_desc { u32 pir[8]; /* Posted interrupt requested */ union { @@ -970,8 +977,25 @@ static inline int pi_test_sn(struct pi_desc *pi_desc) struct vmx_msrs { unsigned int nr; - struct vmx_msr_entry val[NR_AUTOLOAD_MSRS]; + struct vmx_msr_entry *val; }; +struct kmem_cache *vmx_msr_entry_cache; + +/* + * To prevent vmx_msr_entry array from crossing a page boundary, require: + * sizeof(*vmx_msrs.vmx_msr_entry.val) to be a power of two. This is guaranteed + * through compile-time asserts that: + * - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) is a power of two + * - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) <= PAGE_SIZE + * - The allocation of vmx_msrs.vmx_msr_entry.val is aligned to its size. + */ +#define CHECK_POWER_OF_TWO(val) \ + BUILD_BUG_ON_MSG(!((val) && !((val) & ((val) - 1))), \ + #val " is not a power of two.") +#define CHECK_INTRA_PAGE(val) do { \ + CHECK_POWER_OF_TWO(val); \ + BUILD_BUG_ON(!(val <= PAGE_SIZE)); \ + } while (0) struct vcpu_vmx { struct kvm_vcpu vcpu; @@ -6616,6 +6640,14 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) } if (kvm_vcpu_apicv_active(&vmx->vcpu)) { + /* + * Note, pi_desc is contained within a single + * page because the struct is 64 bytes and 64-byte aligned. + */ + phys_addr_t pi_desc_phys = + page_to_phys(vmalloc_to_page(&vmx->pi_desc)) + + (u64)&vmx->pi_desc % PAGE_SIZE; + vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); vmcs_write64(EOI_EXIT_BITMAP2, 0); @@ -6624,7 +6656,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write16(GUEST_INTR_STATUS, 0); vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); - vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); + vmcs_write64(POSTED_INTR_DESC_ADDR, pi_desc_phys); } if (!kvm_pause_in_guest(vmx->vcpu.kvm)) { @@ -11476,19 +11508,43 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu) free_loaded_vmcs(vmx->loaded_vmcs); kfree(vmx->guest_msrs); kvm_vcpu_uninit(vcpu); - kmem_cache_free(kvm_vcpu_cache, vmx); + kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val); + kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val); + vfree(vmx); } static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) { int err; - struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); + struct vcpu_vmx *vmx = __vmalloc_node_range( + sizeof(struct vcpu_vmx), + __alignof__(struct vcpu_vmx), + VMALLOC_START, + VMALLOC_END, + GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | __GFP_ACCOUNT, + PAGE_KERNEL, + 0, + NUMA_NO_NODE, + __builtin_return_address(0)); unsigned long *msr_bitmap; int cpu; if (!vmx) return ERR_PTR(-ENOMEM); + vmx->msr_autoload.guest.val = + kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL); + if (!vmx->msr_autoload.guest.val) { + err = -ENOMEM; + goto free_vmx; + } + vmx->msr_autoload.host.val = + kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL); + if (!vmx->msr_autoload.host.val) { + err = -ENOMEM; + goto free_msr_autoload_guest; + } + vmx->vpid = allocate_vpid(); err = kvm_vcpu_init(&vmx->vcpu, kvm, id); @@ -11576,7 +11632,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) kvm_vcpu_uninit(&vmx->vcpu); free_vcpu: free_vpid(vmx->vpid); - kmem_cache_free(kvm_vcpu_cache, vmx); + kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val); +free_msr_autoload_guest: + kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val); +free_vmx: + vfree(vmx); return ERR_PTR(err); } @@ -15153,6 +15213,10 @@ module_exit(vmx_exit); static int __init vmx_init(void) { int r; + size_t vmx_msr_entry_size = + sizeof(struct vmx_msr_entry) * NR_AUTOLOAD_MSRS; + + CHECK_INTRA_PAGE(vmx_msr_entry_size); #if IS_ENABLED(CONFIG_HYPERV) /* @@ -15183,10 +15247,25 @@ static int __init vmx_init(void) } #endif - r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); + /* + * Disable kmem cache; vmalloc will be used instead + * to avoid OOM'ing when memory is available but not contiguous. + */ + r = kvm_init(&vmx_x86_ops, 0, 0, THIS_MODULE); if (r) return r; + /* + * A vmx_msr_entry array resides exclusively within the kernel. Thus, + * use kmem_cache_create_usercopy(), with the usersize argument set to + * ZERO, to blacklist copying vmx_msr_entry to/from user space. + */ + vmx_msr_entry_cache = + kmem_cache_create_usercopy("vmx_msr_entry", vmx_msr_entry_size, + vmx_msr_entry_size, SLAB_ACCOUNT, 0, 0, NULL); + if (!vmx_msr_entry_cache) { + r = -ENOMEM; + goto out; + } /* * Must be called after kvm_init() so enable_ept is properly set @@ -15210,5 +15289,8 @@ static int __init vmx_init(void) vmx_check_vmcs12_offsets(); return 0; +out: + kvm_exit(); + return r; } module_init(vmx_init); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 786ade1843a2..8b979e7c3ecd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4038,18 +4038,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, goto out_free_2; register_reboot_notifier(&kvm_reboot_notifier); - /* A kmem cache lets us meet the alignment requirements of fx_save. */ - if (!vcpu_align) - vcpu_align = __alignof__(struct kvm_vcpu); - kvm_vcpu_cache = - kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align, - SLAB_ACCOUNT, - offsetof(struct kvm_vcpu, arch), - sizeof_field(struct kvm_vcpu, arch), - NULL); - if (!kvm_vcpu_cache) { - r = -ENOMEM; - goto out_free_3; + /* + * When vcpu_size is zero, + * architecture-specific code manages its own vcpu allocation. + */ + kvm_vcpu_cache = NULL; + if (vcpu_size) { + if (!vcpu_align) + vcpu_align = __alignof__(struct kvm_vcpu); + kvm_vcpu_cache = kmem_cache_create_usercopy( + "kvm_vcpu", vcpu_size, vcpu_align, SLAB_ACCOUNT, + offsetof(struct kvm_vcpu, arch), + sizeof_field(struct kvm_vcpu, arch), NULL); + if (!kvm_vcpu_cache) { + r = -ENOMEM; + goto out_free_3; + } } r = kvm_async_pf_init();