From patchwork Wed Oct 24 19:39:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marc Orr
X-Patchwork-Id: 10654917
Date: Wed, 24 Oct 2018 12:39:12 -0700
Message-Id: <20181024193912.37318-1-marcorr@google.com>
X-Mailer: git-send-email 2.19.1.568.g152ad8e336-goog
Subject: [kvm PATCH v3 1/1] kvm: vmx: use vmalloc() to allocate vcpus
From: Marc Orr
To: kvm@vger.kernel.org, jmattson@google.com, rientjes@google.com,
    konrad.wilk@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, willy@infradead.org
Cc: Marc Orr

Previously, vcpus were allocated through the kmem_cache_zalloc() API,
which requires the underlying physical memory to be contiguous.
Because the x86 vcpu struct, struct vcpu_vmx, is relatively large (e.g.,
currently 47680 bytes on my setup), it can become hard to find contiguous
memory. At the same time, the comments in the code indicate that the
primary reason for using the kmem_cache_zalloc() API is to align the
memory rather than to provide physical contiguity. Thus, this patch
updates the vcpu allocation logic for vmx to use the vmalloc() API.

Note, this patch adds a vzalloc_account() helper, declared in the
include/linux/vmalloc.h file and defined in mm/vmalloc.c, and exports it
so that the vmx module can use it.

Signed-off-by: Marc Orr
Reviewed-by: Matthew Wilcox
---
 arch/x86/kvm/vmx.c      | 89 +++++++++++++++++++++++++++++++++++++----
 include/linux/vmalloc.h |  1 +
 mm/vmalloc.c            |  7 ++++
 virt/kvm/kvm_main.c     | 28 +++++++------
 4 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index abeeb45d1c33..62fcc0d63585 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -898,7 +898,14 @@ struct nested_vmx {
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1
 
-/* Posted-Interrupt Descriptor */
+/*
+ * Posted-Interrupt Descriptor
+ *
+ * Note, the physical address of this structure is used by VMX. Furthermore, the
+ * translation code assumes that the entire pi_desc struct resides within a
+ * single page, which will be true because the struct is 64 bytes and 64-byte
+ * aligned.
+ */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
 	union {
@@ -970,8 +977,25 @@ static inline int pi_test_sn(struct pi_desc *pi_desc)
 
 struct vmx_msrs {
 	unsigned int		nr;
-	struct vmx_msr_entry	val[NR_AUTOLOAD_MSRS];
+	struct vmx_msr_entry	*val;
 };
+struct kmem_cache *vmx_msr_entry_cache;
+
+/*
+ * To prevent vmx_msr_entry array from crossing a page boundary, require:
+ * sizeof(*vmx_msrs.vmx_msr_entry.val) to be a power of two. This is guaranteed
+ * through compile-time asserts that:
+ * - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) is a power of two
+ * - NR_AUTOLOAD_MSRS * sizeof(struct vmx_msr_entry) <= PAGE_SIZE
+ * - The allocation of vmx_msrs.vmx_msr_entry.val is aligned to its size.
+ */
+#define CHECK_POWER_OF_TWO(val) \
+	BUILD_BUG_ON_MSG(!((val) && !((val) & ((val) - 1))), \
+			 #val " is not a power of two.")
+#define CHECK_INTRA_PAGE(val) do { \
+		CHECK_POWER_OF_TWO(val); \
+		BUILD_BUG_ON(!(val <= PAGE_SIZE)); \
+	} while (0)
 
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
@@ -6616,6 +6640,14 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	}
 
 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
+		/*
+		 * Note, pi_desc is contained within a single
+		 * page because the struct is 64 bytes and 64-byte aligned.
+		 */
+		phys_addr_t pi_desc_phys =
+			page_to_phys(vmalloc_to_page(&vmx->pi_desc)) +
+			(u64)&vmx->pi_desc % PAGE_SIZE;
+
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
 		vmcs_write64(EOI_EXIT_BITMAP1, 0);
 		vmcs_write64(EOI_EXIT_BITMAP2, 0);
@@ -6624,7 +6656,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		vmcs_write16(GUEST_INTR_STATUS, 0);
 
 		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
-		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
+		vmcs_write64(POSTED_INTR_DESC_ADDR, pi_desc_phys);
 	}
 
 	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
@@ -11476,19 +11508,34 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
-	kmem_cache_free(kvm_vcpu_cache, vmx);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val);
+	vfree(vmx);
 }
 
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
 	int err;
-	struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	struct vcpu_vmx *vmx = vzalloc_account(sizeof(struct vcpu_vmx));
 	unsigned long *msr_bitmap;
 	int cpu;
 
 	if (!vmx)
 		return ERR_PTR(-ENOMEM);
 
+	vmx->msr_autoload.guest.val =
+		kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL);
+	if (!vmx->msr_autoload.guest.val) {
+		err = -ENOMEM;
+		goto free_vmx;
+	}
+	vmx->msr_autoload.host.val =
+		kmem_cache_zalloc(vmx_msr_entry_cache, GFP_KERNEL);
+	if (!vmx->msr_autoload.host.val) {
+		err = -ENOMEM;
+		goto free_msr_autoload_guest;
+	}
+
 	vmx->vpid = allocate_vpid();
 
 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
@@ -11576,7 +11623,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	kvm_vcpu_uninit(&vmx->vcpu);
 free_vcpu:
 	free_vpid(vmx->vpid);
-	kmem_cache_free(kvm_vcpu_cache, vmx);
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.host.val);
+free_msr_autoload_guest:
+	kmem_cache_free(vmx_msr_entry_cache, vmx->msr_autoload.guest.val);
+free_vmx:
+	vfree(vmx);
 	return ERR_PTR(err);
 }
 
@@ -15153,6 +15204,10 @@ module_exit(vmx_exit);
 static int __init vmx_init(void)
 {
 	int r;
+	size_t vmx_msr_entry_size =
+		sizeof(struct vmx_msr_entry) * NR_AUTOLOAD_MSRS;
+
+	CHECK_INTRA_PAGE(vmx_msr_entry_size);
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	/*
@@ -15183,10 +15238,25 @@ static int __init vmx_init(void)
 	}
 #endif
 
-	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
-		     __alignof__(struct vcpu_vmx), THIS_MODULE);
+	/*
+	 * Disable kmem cache; vmalloc will be used instead
+	 * to avoid OOM'ing when memory is available but not contiguous.
+	 */
+	r = kvm_init(&vmx_x86_ops, 0, 0, THIS_MODULE);
 	if (r)
 		return r;
 
+	/*
+	 * A vmx_msr_entry array resides exclusively within the kernel. Thus,
+	 * use kmem_cache_create_usercopy(), with the usersize argument set to
+	 * ZERO, to blacklist copying vmx_msr_entry to/from user space.
+	 */
+	vmx_msr_entry_cache =
+		kmem_cache_create_usercopy("vmx_msr_entry", vmx_msr_entry_size,
+					   vmx_msr_entry_size, SLAB_ACCOUNT, 0, 0, NULL);
+	if (!vmx_msr_entry_cache) {
+		r = -ENOMEM;
+		goto out;
+	}
 
 	/*
 	 * Must be called after kvm_init() so enable_ept is properly set
@@ -15210,5 +15280,8 @@ static int __init vmx_init(void)
 	vmx_check_vmcs12_offsets();
 
 	return 0;
+out:
+	kvm_exit();
+	return r;
 }
 module_init(vmx_init);
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 398e9c95cd61..47ae6e19ea72 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -71,6 +71,7 @@ static inline void vmalloc_init(void)
 
 extern void *vmalloc(unsigned long size);
 extern void *vzalloc(unsigned long size);
+extern void *vzalloc_account(unsigned long size);
 extern void *vmalloc_user(unsigned long size);
 extern void *vmalloc_node(unsigned long size, int node);
 extern void *vzalloc_node(unsigned long size, int node);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a728fc492557..20adc04d9558 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1846,6 +1846,13 @@ void *vzalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vzalloc);
 
+void *vzalloc_account(unsigned long size)
+{
+	return __vmalloc_node_flags(size, NUMA_NO_NODE,
+				GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT);
+}
+EXPORT_SYMBOL(vzalloc_account);
+
 /**
  * vmalloc_user - allocate zeroed virtually contiguous memory for userspace
  * @size: allocation size
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 786ade1843a2..8b979e7c3ecd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4038,18 +4038,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		goto out_free_2;
 	register_reboot_notifier(&kvm_reboot_notifier);
 
-	/* A kmem cache lets us meet the alignment requirements of fx_save. */
-	if (!vcpu_align)
-		vcpu_align = __alignof__(struct kvm_vcpu);
-	kvm_vcpu_cache =
-		kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align,
-					   SLAB_ACCOUNT,
-					   offsetof(struct kvm_vcpu, arch),
-					   sizeof_field(struct kvm_vcpu, arch),
-					   NULL);
-	if (!kvm_vcpu_cache) {
-		r = -ENOMEM;
-		goto out_free_3;
+	/*
+	 * When vcpu_size is zero,
+	 * architecture-specific code manages its own vcpu allocation.
+	 */
+	kvm_vcpu_cache = NULL;
+	if (vcpu_size) {
+		if (!vcpu_align)
+			vcpu_align = __alignof__(struct kvm_vcpu);
+		kvm_vcpu_cache = kmem_cache_create_usercopy(
+			"kvm_vcpu", vcpu_size, vcpu_align, SLAB_ACCOUNT,
+			offsetof(struct kvm_vcpu, arch),
+			sizeof_field(struct kvm_vcpu, arch), NULL);
+		if (!kvm_vcpu_cache) {
+			r = -ENOMEM;
+			goto out_free_3;
+		}
 	}
 
 	r = kvm_async_pf_init();
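
For reference, the address translation relied on by the POSTED_INTR_DESC_ADDR
change above can be exercised in isolation: a vzalloc()'d vcpu is only
virtually contiguous, so __pa() no longer applies to fields like pi_desc, and
the physical address has to be resolved through vmalloc_to_page() plus the
offset within the page. The following is a minimal, hypothetical module
sketch of that pattern (the vmalloc_phys_demo name and demo_desc struct are
illustrative only, not part of this patch):

#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/io.h>

/*
 * 64-byte, 64-byte-aligned dummy descriptor, mirroring the pi_desc
 * assumption: such an object can never straddle a page boundary.
 */
struct demo_desc {
	u32 words[16];
} __aligned(64);

static struct demo_desc *desc;

static int __init vmalloc_phys_demo_init(void)
{
	phys_addr_t phys;

	desc = vzalloc(sizeof(*desc));
	if (!desc)
		return -ENOMEM;

	/*
	 * __pa() is only valid for linear-map addresses, so translate the
	 * vmalloc address by looking up its backing page and re-adding the
	 * offset within that page -- the same pattern the patch uses for
	 * vmx->pi_desc.
	 */
	phys = page_to_phys(vmalloc_to_page(desc)) + offset_in_page(desc);

	pr_info("demo_desc virt %px -> phys %pa\n", desc, &phys);
	return 0;
}

static void __exit vmalloc_phys_demo_exit(void)
{
	vfree(desc);
}

module_init(vmalloc_phys_demo_init);
module_exit(vmalloc_phys_demo_exit);
MODULE_LICENSE("GPL");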