From patchwork Mon Jul 11 20:54:05 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 9224207 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 80CE860890 for ; Mon, 11 Jul 2016 20:58:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 72D3D27DCE for ; Mon, 11 Jul 2016 20:58:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6742E27D76; Mon, 11 Jul 2016 20:58:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 98B2727DCE for ; Mon, 11 Jul 2016 20:58:36 +0000 (UTC) Received: (qmail 28307 invoked by uid 550); 11 Jul 2016 20:55:11 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: kernel-hardening@lists.openwall.com Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 28212 invoked from network); 11 Jul 2016 20:55:10 -0000 From: Andy Lutomirski To: x86@kernel.org, linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org, Borislav Petkov , Nadav Amit , Kees Cook , Brian Gerst , "kernel-hardening@lists.openwall.com" , Linus Torvalds , Josh Poimboeuf , Jann Horn , Heiko Carstens , Andy Lutomirski Date: Mon, 11 Jul 2016 13:54:05 -0700 Message-Id: X-Mailer: git-send-email 2.7.4 In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 X-Virus-Scanned: ClamAV using ClamSMTP Subject: [kernel-hardening] [PATCH v5 32/32] fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set X-Virus-Scanned: ClamAV using ClamSMTP vmalloc is a bit slow, and pounding vmalloc/vfree will eventually force a global TLB flush. To reduce pressure on them, if CONFIG_VMAP_STACK, cache two thread stacks per cpu. This will let us quickly allocate a hopefully cache-hot, TLB-hot stack under heavy forking workloads (shell script style). On my silly pthread_create benchmark, it saves about 2 µs per pthread_create+join with CONFIG_VMAP_STACK=y. Signed-off-by: Andy Lutomirski --- kernel/fork.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 53 insertions(+), 9 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 3003edae90fc..e8e21134ba23 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -159,15 +159,41 @@ void __weak arch_release_thread_stack(unsigned long *stack) * kmemcache based allocator. */ # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK) + +#ifdef CONFIG_VMAP_STACK +/* + * vmalloc is a bit slow, and calling vfree enough times will force a TLB + * flush. Try to minimize the number of calls by caching stacks. + */ +#define NR_CACHED_STACKS 2 +static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]); +#endif + static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) { #ifdef CONFIG_VMAP_STACK - void *stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE, - VMALLOC_START, VMALLOC_END, - THREADINFO_GFP | __GFP_HIGHMEM, - PAGE_KERNEL, - 0, node, - __builtin_return_address(0)); + void *stack; + int i; + + local_irq_disable(); + for (i = 0; i < NR_CACHED_STACKS; i++) { + struct vm_struct *s = this_cpu_read(cached_stacks[i]); + + if (!s) + continue; + this_cpu_write(cached_stacks[i], NULL); + + tsk->stack_vm_area = s; + local_irq_enable(); + return s->addr; + } + local_irq_enable(); + + stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE, + VMALLOC_START, VMALLOC_END, + THREADINFO_GFP | __GFP_HIGHMEM, + PAGE_KERNEL, + 0, node, __builtin_return_address(0)); /* * We can't call find_vm_area() in interrupt context, and @@ -187,10 +213,28 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) static inline void free_thread_stack(struct task_struct *tsk) { - if (task_stack_vm_area(tsk)) +#ifdef CONFIG_VMAP_STACK + if (task_stack_vm_area(tsk)) { + unsigned long flags; + int i; + + local_irq_save(flags); + for (i = 0; i < NR_CACHED_STACKS; i++) { + if (this_cpu_read(cached_stacks[i])) + continue; + + this_cpu_write(cached_stacks[i], tsk->stack_vm_area); + local_irq_restore(flags); + return; + } + local_irq_restore(flags); + vfree(tsk->stack); - else - free_kmem_pages((unsigned long)tsk->stack, THREAD_SIZE_ORDER); + return; + } +#endif + + free_kmem_pages((unsigned long)tsk->stack, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_stack_cache;