From patchwork Fri Oct 19 17:35:37 2018
X-Patchwork-Submitter: Uladzislau Rezki
X-Patchwork-Id: 10649917
From: "Uladzislau Rezki (Sony)"
To: Matthew Wilcox , Andrew Morton , linux-mm@kvack.org
Cc: LKML , Michal Hocko , Thomas Garnier , Oleksiy Avramchenko ,
 Steven Rostedt , Joel Fernandes , Thomas Gleixner , Ingo Molnar ,
 Tejun Heo , "Uladzislau Rezki (Sony)"
Subject: [RFC PATCH 1/2] mm/vmalloc: keep track of free blocks for allocation
Date: Fri, 19 Oct 2018 19:35:37 +0200
Message-Id: <20181019173538.590-2-urezki@gmail.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20181019173538.590-1-urezki@gmail.com>
References: <20181019173538.590-1-urezki@gmail.com>

Currently, allocation of a new VA area is done by iterating over the busy
list until a suitable hole is found between two busy areas. Each new
allocation therefore makes that list longer, and because of the long list
and different permissive parameters an allocation can take a long time on
embedded devices (milliseconds).

This patch organizes the vmalloc memory layout into free areas of the
VMALLOC_START-VMALLOC_END range. It uses a red-black tree that keeps blocks
sorted by their offsets, paired with a linked list that keeps the free space
in order of increasing addresses.

Allocation: to allocate a new block, the free list is searched until a block
large enough to hold the requested size is found. If the block is bigger
than the requested size, it is split.

De-allocation: the red-black tree allows a spot in the tree to be found
efficiently, whereas the linked list allows fast merging of a de-allocated
memory chunk with the existing free blocks, creating large coalesced areas.

model name: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz

test_1:
    for (n = 0; n < 1000000; n++) {
        void *ptr_1 = vmalloc(3 * PAGE_SIZE);

        *((__u8 *)ptr_1) = 0; /* Pretend we used the mem */
        vfree(ptr_1);
    }

    1218459(us) vs 1146597(us) 5%
    1219721(us) vs 1145212(us) 6%
    1226255(us) vs 1142134(us) 6%
    1239828(us) vs 1144809(us) 7%
    1232131(us) vs 1144775(us) 7%

test_2:
    for (n = 0; n < 15000; n++)
        ptr[n] = vmalloc(1 * PAGE_SIZE);

    for (n = 0; n < 1000000; n++) {
        void *ptr_1 = vmalloc(100 * PAGE_SIZE);
        void *ptr_2 = vmalloc(1 * PAGE_SIZE);

        *((__u8 *)ptr_1) = 0; /* Pretend we used the mem */
        *((__u8 *)ptr_2) = 1; /* Pretend we used the mem */

        vfree(ptr_1);
        vfree(ptr_2);
    }

    55866315(us) vs 15037680(us) 73%
    57601435(us) vs 14809454(us) 74%
    52612371(us) vs 14550292(us) 72%
    48894648(us) vs 14769538(us) 69%
    55718063(us) vs 14727350(us) 73%
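The allocation step described above can be illustrated with a simplified
user-space sketch. The names free_block and alloc_from_free_list are invented
for this illustration; none of the kernel details (struct vmap_area,
vmap_area_lock, the rb-tree, the alignment padding) are modelled here. The
complementary merge step performed on free is sketched after the patch.

#include <stdlib.h>

struct free_block {
	unsigned long start;
	unsigned long end;		/* exclusive */
	struct free_block *next;	/* address-sorted singly linked list */
};

/*
 * Carve "size" bytes, aligned to "align" (a power of two), out of the
 * first block that is large enough. Returns the start address of the
 * allocation, or 0 if no suitable hole exists.
 */
static unsigned long alloc_from_free_list(struct free_block **head,
					  unsigned long size,
					  unsigned long align)
{
	struct free_block **link = head;
	struct free_block *b, *rem;

	for (b = *head; b; link = &b->next, b = b->next) {
		unsigned long addr = (b->start + align - 1) & ~(align - 1);

		if (addr + size > b->end)
			continue;		/* hole too small, keep walking */

		if (addr == b->start && addr + size == b->end) {
			/* Full fit: the whole block is consumed. */
			*link = b->next;
			free(b);
		} else if (addr == b->start) {
			/* Left edge fit: shrink the block from the left. */
			b->start += size;
		} else if (addr + size == b->end) {
			/* Right edge fit: shrink the block from the right. */
			b->end = addr;
		} else {
			/* No edge fit: split off a remainder on the right. */
			rem = malloc(sizeof(*rem));
			if (!rem)
				return 0;
			rem->start = addr + size;
			rem->end = b->end;
			rem->next = b->next;
			b->end = addr;
			b->next = rem;
		}
		return addr;
	}

	return 0;
}

The patch classifies exactly these four outcomes as FL_FIT_TYPE, LE_FIT_TYPE,
RE_FIT_TYPE and NE_FIT_TYPE in alloc_vmalloc_area() below.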
Signed-off-by: Uladzislau Rezki (Sony)
---
 include/linux/vmalloc.h |   2 +-
 mm/vmalloc.c            | 836 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 668 insertions(+), 170 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 398e9c95cd61..01a73d4795f4 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -46,11 +46,11 @@ struct vmap_area {
 	unsigned long va_start;
 	unsigned long va_end;
 	unsigned long flags;
+	unsigned long align;
 	struct rb_node rb_node;		/* address sorted rbtree */
 	struct list_head list;		/* address sorted list */
 	struct llist_node purge_list;	/* "lazy purge" list */
 	struct vm_struct *vm;
-	struct rcu_head rcu_head;
 };
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cfea25be7754..a7f257540a05 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -326,12 +326,57 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREE	0x02
 #define VM_VM_AREA	0x04
 
+/*
+ * Define default padding for alignment purposes.
+ */
+#define VMALLOC_MAX_PADDING	THREAD_ALIGN
+
 static DEFINE_SPINLOCK(vmap_area_lock);
 /* Export for kexec only */
 LIST_HEAD(vmap_area_list);
 static LLIST_HEAD(vmap_purge_list);
 static struct rb_root vmap_area_root = RB_ROOT;
 
+/*
+ * This linked list is used in pair with free_vmap_area_root.
+ * It makes it possible of fast accessing to next/prev nodes
+ * to perform coalescing.
+ */
+static LIST_HEAD(free_vmap_area_list);
+
+/*
+ * This red-black tree is used for storing address-sorted
+ * vmap areas during free operation. Sorting is done using
+ * va_start address. We make use of it to merge a VA with
+ * its prev/next neighbors.
+ */
+static struct rb_root free_vmap_area_root = RB_ROOT;
+
+/*
+ * This cache list is used for keeping free vmap_area objects.
+ * Basically, when free VA area is split, a remaining space has + * to be placed back to free list/tree structures. Instead of + * allocating from slab we reuse vmap_area objects from this + * cache. + */ +static LIST_HEAD(free_va_cache); + +/* + * This is a cache size counter. A maximum cache size depends on + * lazy_max_pages() and is not higher than lazy_max_pages() / 2. + * A "purge layer" drains free areas feeding the cache back when + * the threshold is crossed. + */ +static unsigned long free_va_cache_size; + +/* + * For vmalloc specific area allocation. + */ +static struct vmap_area *last_free_area; +static unsigned long last_alloc_vstart; +static unsigned long last_alloc_align; +static unsigned long free_va_max_size; + /* The vmap cache globals are protected by vmap_area_lock */ static struct rb_node *free_vmap_cache; static unsigned long cached_hole_size; @@ -340,6 +385,10 @@ static unsigned long cached_align; static unsigned long vmap_area_pcpu_hole; +static void purge_vmap_area_lazy(void); +static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); +static unsigned long lazy_max_pages(void); + static struct vmap_area *__find_vmap_area(unsigned long addr) { struct rb_node *n = vmap_area_root.rb_node; @@ -359,41 +408,411 @@ static struct vmap_area *__find_vmap_area(unsigned long addr) return NULL; } -static void __insert_vmap_area(struct vmap_area *va) +static inline bool +__put_free_va_to_cache(struct vmap_area *va) +{ + if (free_va_cache_size < (lazy_max_pages() >> 1)) { + list_add(&va->list, &free_va_cache); + free_va_cache_size++; + return true; + } + + return false; +} + +static inline void * +__get_free_va_from_cache(void) { - struct rb_node **p = &vmap_area_root.rb_node; - struct rb_node *parent = NULL; - struct rb_node *tmp; + struct vmap_area *va; + + va = list_first_entry_or_null(&free_va_cache, + struct vmap_area, list); + + if (va) { + list_del(&va->list); + free_va_cache_size--; + } - while (*p) { - struct vmap_area *tmp_va; + return va; +} + +static inline void +__find_va_slot(struct vmap_area *va, + struct rb_root *root, struct rb_node *from, + struct rb_node **parent, struct rb_node ***link) +{ + struct vmap_area *tmp_va; + + if (root) { + *link = &root->rb_node; + if (unlikely(!**link)) { + *parent = NULL; + return; + } + } else { + *link = &from; + } - parent = *p; - tmp_va = rb_entry(parent, struct vmap_area, rb_node); + do { + tmp_va = rb_entry(**link, struct vmap_area, rb_node); if (va->va_start < tmp_va->va_end) - p = &(*p)->rb_left; + *link = &(**link)->rb_left; else if (va->va_end > tmp_va->va_start) - p = &(*p)->rb_right; + *link = &(**link)->rb_right; else BUG(); + } while (**link); + + /* + * Return back addresses of parent node of VA and + * parent's left/right link for further inserting. + */ + *parent = &tmp_va->rb_node; +} + +static inline void +__find_va_free_siblings(struct rb_node *parent, struct rb_node **link, + struct list_head **prev, struct list_head **next) +{ + struct list_head *list; + + if (likely(parent)) { + list = &rb_entry(parent, struct vmap_area, rb_node)->list; + if (&parent->rb_right == link) { + *next = list->next; + *prev = list; + } else { + *prev = list->prev; + *next = list; + } + } else { + /* + * The red-black tree where we try to find VA neighbors + * before merging or inserting is empty, i.e. it means + * there is no free vmalloc space. Normally it does not + * happen but we handle this case anyway. 
+ */ + *next = *prev = &free_vmap_area_list; + } +} + +static inline void +__link_va(struct vmap_area *va, struct rb_root *root, + struct rb_node *parent, struct rb_node **link, struct list_head *head) +{ + /* + * VA is still not in the list, but we can + * identify its future previous list_head node. + */ + if (likely(parent)) { + head = &rb_entry(parent, struct vmap_area, rb_node)->list; + if (&parent->rb_right != link) + head = head->prev; } - rb_link_node(&va->rb_node, parent, p); - rb_insert_color(&va->rb_node, &vmap_area_root); + /* Insert to the rb-tree */ + rb_link_node(&va->rb_node, parent, link); + rb_insert_color(&va->rb_node, root); - /* address-sort this list */ - tmp = rb_prev(&va->rb_node); - if (tmp) { - struct vmap_area *prev; - prev = rb_entry(tmp, struct vmap_area, rb_node); - list_add_rcu(&va->list, &prev->list); - } else - list_add_rcu(&va->list, &vmap_area_list); + /* Address-sort this list */ + list_add(&va->list, head); } -static void purge_vmap_area_lazy(void); +static inline void +__unlink_va(struct vmap_area *va, struct rb_root *root) +{ + /* + * During merging a VA node can be empty, therefore + * not linked with the tree nor list. Just check it. + */ + if (!RB_EMPTY_NODE(&va->rb_node)) { + rb_erase(&va->rb_node, root); + list_del(&va->list); + } +} -static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); +static void +__insert_vmap_area(struct vmap_area *va, + struct rb_root *root, struct list_head *head) +{ + struct rb_node **link; + struct rb_node *parent; + + __find_va_slot(va, root, NULL, &parent, &link); + __link_va(va, root, parent, link, head); +} + +static inline void +__remove_vmap_area(struct vmap_area *va, struct rb_root *root) +{ + __unlink_va(va, root); + + /* + * Fill the cache. If it is full, we just free VA. + */ + if (!__put_free_va_to_cache(va)) + kfree(va); +} + +/* + * Merge de-allocated chunk of VA memory with previous + * and next free blocks. Either a pointer to the new + * merged area is returned if coalesce is done or VA + * area if inserting is done. + */ +static inline struct vmap_area * +__merge_add_free_va_area(struct vmap_area *va, + struct rb_root *root, struct list_head *head) +{ + struct vmap_area *sibling; + struct list_head *next, *prev; + struct rb_node **link; + struct rb_node *parent; + bool merged = false; + + /* + * To perform merging we have to restore an area which belongs + * to this VA if the allocation has been done with specific align + * value. In case of PCPU allocations nothing is changed. + */ + if (va->align <= VMALLOC_MAX_PADDING) + va->va_end = ALIGN(va->va_end, va->align); + + /* + * Find a place in the tree where VA potentially will be + * inserted, unless it is merged with its sibling/siblings. + */ + __find_va_slot(va, root, NULL, &parent, &link); + + /* + * Get next/prev nodes of VA to check if merging can be done. + */ + __find_va_free_siblings(parent, link, &prev, &next); + + /* + * start end + * | | + * |<------VA------>|<-----Next----->| + * | | + * start end + */ + if (next != head) { + sibling = list_entry(next, struct vmap_area, list); + if (sibling->va_start == va->va_end) { + sibling->va_start = va->va_start; + __remove_vmap_area(va, root); + + /* Point to the new merged area. 
*/ + va = sibling; + merged = true; + } + } + + /* + * start end + * | | + * |<-----Prev----->|<------VA------>| + * | | + * start end + */ + if (prev != head) { + sibling = list_entry(prev, struct vmap_area, list); + if (sibling->va_end == va->va_start) { + sibling->va_end = va->va_end; + __remove_vmap_area(va, root); + + /* Point to the new merged area. */ + va = sibling; + merged = true; + } + } + + if (!merged) + __link_va(va, root, parent, link, head); + + return va; +} + +enum alloc_fit_type { + NOTHING_FIT = 0, + FL_FIT_TYPE = 1, /* full fit */ + LE_FIT_TYPE = 2, /* left edge fit */ + RE_FIT_TYPE = 3, /* right edge fit */ + NE_FIT_TYPE = 4 /* no edge fit */ +}; + +static inline unsigned long +alloc_vmalloc_area(struct vmap_area **fl_fit_va, unsigned long size, + unsigned long align, unsigned long vstart, unsigned long vend) +{ + unsigned long nva_start_addr; + struct vmap_area *va, *lva; + struct rb_node *parent; + struct rb_node **link; + u8 fit_type; + + va = last_free_area; + fit_type = NOTHING_FIT; + *fl_fit_va = NULL; + + /* + * Use aligned size if the align value is within + * allowed padding range. This is done to reduce + * external fragmentation. + */ + if (align <= VMALLOC_MAX_PADDING) + size = ALIGN(size, align); + + if (!last_free_area || size < free_va_max_size || + vstart < last_alloc_vstart || + align < last_alloc_align) { + va = list_first_entry_or_null(&free_vmap_area_list, + struct vmap_area, list); + + if (unlikely(!va)) + return vend; + + free_va_max_size = 0; + last_free_area = NULL; + } + + nva_start_addr = ALIGN(vstart, align); + list_for_each_entry_from(va, &free_vmap_area_list, list) { + if (va->va_start > vstart) + nva_start_addr = ALIGN(va->va_start, align); + + /* + * Sanity test for following scenarios: + * - overflow, due to big size; + * - vend restriction check; + * - vstart check, due to big align. + */ + if (nva_start_addr + size < nva_start_addr || + nva_start_addr + size > vend || + nva_start_addr < vstart) + break; + + /* + * VA does not fit to requested parameters. In this case we + * calculate max available aligned size if nva_start_addr is + * within this VA. + */ + if (nva_start_addr + size > va->va_end) { + if (nva_start_addr < va->va_end) + free_va_max_size = max(free_va_max_size, + va->va_end - nva_start_addr); + continue; + } + + /* Classify what we have found. */ + if (va->va_start == nva_start_addr) { + if (va->va_end == nva_start_addr + size) + fit_type = FL_FIT_TYPE; + else + fit_type = LE_FIT_TYPE; + } else if (va->va_end == nva_start_addr + size) { + fit_type = RE_FIT_TYPE; + } else { + fit_type = NE_FIT_TYPE; + } + + last_free_area = va; + last_alloc_vstart = vstart; + last_alloc_align = align; + break; + } + + if (fit_type == FL_FIT_TYPE) { + /* + * No need to split VA, it fully fits. + * + * | | + * V NVA V + * |---------------| + */ + if (va->list.prev != &free_vmap_area_list) + last_free_area = list_prev_entry(va, list); + else + last_free_area = NULL; + + __unlink_va(va, &free_vmap_area_root); + *fl_fit_va = va; + } else if (fit_type == LE_FIT_TYPE) { + /* + * Split left edge fit VA. + * + * | | + * V NVA V R + * |-------|-------| + */ + va->va_start += size; + } else if (fit_type == RE_FIT_TYPE) { + /* + * Split right edge fit VA. + * + * | | + * L V NVA V + * |-------|-------| + */ + va->va_end = nva_start_addr; + } else if (fit_type == NE_FIT_TYPE) { + /* + * Split no edge fit VA. 
+ * + * | | + * L V NVA V R + * |---|-------|---| + */ + lva = __get_free_va_from_cache(); + if (!lva) { + lva = kmalloc(sizeof(*lva), GFP_NOWAIT); + if (unlikely(!lva)) + return vend; + } + + /* + * Build the remainder. + */ + lva->va_start = va->va_start; + lva->va_end = nva_start_addr; + + /* + * Shrink this VA to remaining size. + */ + va->va_start = nva_start_addr + size; + + /* + * Add the remainder to the address sorted free list/tree. + */ + __find_va_slot(lva, NULL, &va->rb_node, &parent, &link); + __link_va(lva, &free_vmap_area_root, + parent, link, &free_vmap_area_list); + } else { + /* Not found. */ + nva_start_addr = vend; + } + + return nva_start_addr; +} + +static inline struct vmap_area * +kmalloc_va_node_leak_scan(gfp_t mask, int node) +{ + struct vmap_area *va; + + mask &= GFP_RECLAIM_MASK; + + va = kmalloc_node(sizeof(*va), mask, node); + if (unlikely(!va)) + return NULL; + + /* + * Only scan the relevant parts containing pointers + * to other objects to avoid false negatives. + */ + kmemleak_scan_area(&va->rb_node, SIZE_MAX, mask); + return va; +} /* * Allocate a region of KVA of the specified size and alignment, within the @@ -404,11 +823,12 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, unsigned long vstart, unsigned long vend, int node, gfp_t gfp_mask) { - struct vmap_area *va; + struct vmap_area *va = NULL; struct rb_node *n; unsigned long addr; int purged = 0; struct vmap_area *first; + bool is_vmalloc_alloc; BUG_ON(!size); BUG_ON(offset_in_page(size)); @@ -416,19 +836,38 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, might_sleep(); - va = kmalloc_node(sizeof(struct vmap_area), - gfp_mask & GFP_RECLAIM_MASK, node); - if (unlikely(!va)) - return ERR_PTR(-ENOMEM); - - /* - * Only scan the relevant parts containing pointers to other objects - * to avoid false negatives. - */ - kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK); + is_vmalloc_alloc = is_vmalloc_addr((void *) vstart); + if (!is_vmalloc_alloc) { + va = kmalloc_va_node_leak_scan(gfp_mask, node); + if (unlikely(!va)) + return ERR_PTR(-ENOMEM); + } retry: spin_lock(&vmap_area_lock); + if (is_vmalloc_alloc) { + addr = alloc_vmalloc_area(&va, size, align, vstart, vend); + + /* + * If an allocation fails, the "vend" address is + * returned. Therefore trigger an overflow path. + */ + if (unlikely(addr == vend)) + goto overflow; + + if (!va) { + spin_unlock(&vmap_area_lock); + + va = kmalloc_va_node_leak_scan(gfp_mask, node); + if (unlikely(!va)) + return ERR_PTR(-ENOMEM); + + spin_lock(&vmap_area_lock); + } + + goto insert_vmap_area; + } + /* * Invalidate cache if we have more permissive parameters. 
* cached_hole_size notes the largest hole noticed _below_ @@ -501,11 +940,15 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, if (addr + size > vend) goto overflow; + free_vmap_cache = &va->rb_node; + +insert_vmap_area: va->va_start = addr; va->va_end = addr + size; + va->align = align; va->flags = 0; - __insert_vmap_area(va); - free_vmap_cache = &va->rb_node; + __insert_vmap_area(va, &vmap_area_root, &vmap_area_list); + spin_unlock(&vmap_area_lock); BUG_ON(!IS_ALIGNED(va->va_start, align)); @@ -552,9 +995,13 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier); static void __free_vmap_area(struct vmap_area *va) { + unsigned long last_free_va_start; + bool is_vmalloc_area; + BUG_ON(RB_EMPTY_NODE(&va->rb_node)); + is_vmalloc_area = is_vmalloc_addr((void *) va->va_start); - if (free_vmap_cache) { + if (!is_vmalloc_area && free_vmap_cache) { if (va->va_end < cached_vstart) { free_vmap_cache = NULL; } else { @@ -571,18 +1018,38 @@ static void __free_vmap_area(struct vmap_area *va) } rb_erase(&va->rb_node, &vmap_area_root); RB_CLEAR_NODE(&va->rb_node); - list_del_rcu(&va->list); + list_del(&va->list); - /* - * Track the highest possible candidate for pcpu area - * allocation. Areas outside of vmalloc area can be returned - * here too, consider only end addresses which fall inside - * vmalloc area proper. - */ - if (va->va_end > VMALLOC_START && va->va_end <= VMALLOC_END) + if (is_vmalloc_area) { + /* + * Track the highest possible candidate for pcpu area + * allocation. Areas outside of vmalloc area can be returned + * here too, consider only end addresses which fall inside + * vmalloc area proper. + */ vmap_area_pcpu_hole = max(vmap_area_pcpu_hole, va->va_end); - kfree_rcu(va, rcu_head); + if (last_free_area) + last_free_va_start = last_free_area->va_start; + else + last_free_va_start = 0; + + /* + * Merge VA with its neighbors, otherwise just add it. + */ + va = __merge_add_free_va_area(va, + &free_vmap_area_root, &free_vmap_area_list); + + /* + * Update a search criteria if merging/inserting is done + * before the va_start address of last_free_area marker. + */ + if (last_free_area) + if (va->va_start < last_free_va_start) + last_free_area = va; + } else { + kfree(va); + } } /* @@ -1238,6 +1705,51 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align) vm_area_add_early(vm); } +static void vmalloc_init_free_space(void) +{ + unsigned long free_hole_start = VMALLOC_START; + const unsigned long vmalloc_end = VMALLOC_END; + struct vmap_area *busy_area, *free_area; + + /* + * B F B B B F + * -|-----|.....|-----|-----|-----|.....|- + * | vmalloc space | + * |<--------------------------------->| + */ + list_for_each_entry(busy_area, &vmap_area_list, list) { + if (!is_vmalloc_addr((void *) busy_area->va_start)) + continue; + + if (busy_area->va_start - free_hole_start > 0) { + free_area = kzalloc(sizeof(*free_area), GFP_NOWAIT); + free_area->va_start = free_hole_start; + free_area->va_end = busy_area->va_start; + + __insert_vmap_area(free_area, + &free_vmap_area_root, &free_vmap_area_list); + } + + free_hole_start = busy_area->va_end; + } + + if (vmalloc_end - free_hole_start > 0) { + free_area = kzalloc(sizeof(*free_area), GFP_NOWAIT); + free_area->va_start = free_hole_start; + free_area->va_end = vmalloc_end; + + __insert_vmap_area(free_area, + &free_vmap_area_root, &free_vmap_area_list); + } + + /* + * Assume if busy VA overlaps two areas it is wrong. + * I.e. a start address is vmalloc address whereas an + * end address is not. Warn if so. 
+ */ + WARN_ON(free_hole_start > vmalloc_end); +} + void __init vmalloc_init(void) { struct vmap_area *va; @@ -1263,9 +1775,14 @@ void __init vmalloc_init(void) va->va_start = (unsigned long)tmp->addr; va->va_end = va->va_start + tmp->size; va->vm = tmp; - __insert_vmap_area(va); + __insert_vmap_area(va, &vmap_area_root, &vmap_area_list); } + /* + * Now we can initialize a free vmalloc space. + */ + vmalloc_init_free_space(); + vmap_area_pcpu_hole = VMALLOC_END; vmap_initialized = true; @@ -2365,82 +2882,23 @@ static struct vmap_area *node_to_va(struct rb_node *n) return rb_entry_safe(n, struct vmap_area, rb_node); } -/** - * pvm_find_next_prev - find the next and prev vmap_area surrounding @end - * @end: target address - * @pnext: out arg for the next vmap_area - * @pprev: out arg for the previous vmap_area - * - * Returns: %true if either or both of next and prev are found, - * %false if no vmap_area exists - * - * Find vmap_areas end addresses of which enclose @end. ie. if not - * NULL, *pnext->va_end > @end and *pprev->va_end <= @end. - */ -static bool pvm_find_next_prev(unsigned long end, - struct vmap_area **pnext, - struct vmap_area **pprev) +static struct vmap_area * +addr_to_free_va(unsigned long addr) { - struct rb_node *n = vmap_area_root.rb_node; + struct rb_node *n = free_vmap_area_root.rb_node; struct vmap_area *va = NULL; while (n) { va = rb_entry(n, struct vmap_area, rb_node); - if (end < va->va_end) + if (addr < va->va_start) n = n->rb_left; - else if (end > va->va_end) + else if (addr > va->va_end) n = n->rb_right; else - break; - } - - if (!va) - return false; - - if (va->va_end > end) { - *pnext = va; - *pprev = node_to_va(rb_prev(&(*pnext)->rb_node)); - } else { - *pprev = va; - *pnext = node_to_va(rb_next(&(*pprev)->rb_node)); - } - return true; -} - -/** - * pvm_determine_end - find the highest aligned address between two vmap_areas - * @pnext: in/out arg for the next vmap_area - * @pprev: in/out arg for the previous vmap_area - * @align: alignment - * - * Returns: determined end address - * - * Find the highest aligned address between *@pnext and *@pprev below - * VMALLOC_END. *@pnext and *@pprev are adjusted so that the aligned - * down address is between the end addresses of the two vmap_areas. - * - * Please note that the address returned by this function may fall - * inside *@pnext vmap_area. The caller is responsible for checking - * that. 
- */ -static unsigned long pvm_determine_end(struct vmap_area **pnext, - struct vmap_area **pprev, - unsigned long align) -{ - const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1); - unsigned long addr; - - if (*pnext) - addr = min((*pnext)->va_start & ~(align - 1), vmalloc_end); - else - addr = vmalloc_end; - - while (*pprev && (*pprev)->va_end > addr) { - *pnext = *pprev; - *pprev = node_to_va(rb_prev(&(*pnext)->rb_node)); + return va; } - return addr; + return NULL; } /** @@ -2473,11 +2931,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, { const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align); const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1); - struct vmap_area **vas, *prev, *next; + struct vmap_area **vas, *va; + struct vmap_area **off = NULL; struct vm_struct **vms; - int area, area2, last_area, term_area; - unsigned long base, start, end, last_end; + int area, area2, last_area; + unsigned long start, end, last_end; + unsigned long base; bool purged = false; + u8 fit_type = NOTHING_FIT; /* verify parameters and allocate data structures */ BUG_ON(offset_in_page(align) || !is_power_of_2(align)); @@ -2512,90 +2973,122 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, if (!vas || !vms) goto err_free2; + if (nr_vms > 1) { + off = kcalloc(nr_vms, sizeof(off[0]), GFP_KERNEL); + if (!off) + goto err_free2; + } + for (area = 0; area < nr_vms; area++) { vas[area] = kzalloc(sizeof(struct vmap_area), GFP_KERNEL); vms[area] = kzalloc(sizeof(struct vm_struct), GFP_KERNEL); if (!vas[area] || !vms[area]) goto err_free; + + if (nr_vms > 1) { + off[area] = kzalloc(sizeof(off[0]), GFP_KERNEL); + if (!off[area]) + goto err_free; + } } + retry: spin_lock(&vmap_area_lock); - /* start scanning - we scan from the top, begin with the last area */ - area = term_area = last_area; - start = offsets[area]; - end = start + sizes[area]; + /* + * Initialize va here, since we can retry the search. + */ + va = NULL; - if (!pvm_find_next_prev(vmap_area_pcpu_hole, &next, &prev)) { - base = vmalloc_end - last_end; - goto found; + if (unlikely(list_empty(&free_vmap_area_list))) { + spin_unlock(&vmap_area_lock); + goto err_free; } - base = pvm_determine_end(&next, &prev, align) - end; - while (true) { - BUG_ON(next && next->va_end <= base + end); - BUG_ON(prev && prev->va_end > base + end); + if (off) + va = addr_to_free_va(vmap_area_pcpu_hole); + + if (!va) + va = list_last_entry(&free_vmap_area_list, + struct vmap_area, list); + + list_for_each_entry_from_reverse(va, &free_vmap_area_list, list) { + base = (va->va_end & ~(align - 1)) - last_end; /* * base might have underflowed, add last_end before * comparing. */ - if (base + last_end < vmalloc_start + last_end) { - spin_unlock(&vmap_area_lock); - if (!purged) { - purge_vmap_area_lazy(); - purged = true; - goto retry; - } - goto err_free; - } + if (base + last_end < vmalloc_start + last_end) + break; - /* - * If next overlaps, move base downwards so that it's - * right below next and then recheck. 
- */ - if (next && next->va_start < base + end) { - base = pvm_determine_end(&next, &prev, align) - end; - term_area = area; + if (base < va->va_start) continue; - } + if (base > va->va_start) + fit_type = RE_FIT_TYPE; + else + /* base == va->va_start */ + fit_type = FL_FIT_TYPE; + + break; + } + + if (fit_type == RE_FIT_TYPE) { + va->va_end = base; + } else if (fit_type == FL_FIT_TYPE) { /* - * If prev overlaps, shift down next and prev and move - * base so that it's right below new next and then - * recheck. + * Check if there is an interaction with regular + * vmalloc allocations when a free block fully fits. + * If so just shift back last_free_area marker. */ - if (prev && prev->va_end > base + start) { - next = prev; - prev = node_to_va(rb_prev(&next->rb_node)); - base = pvm_determine_end(&next, &prev, align) - end; - term_area = area; - continue; + if (last_free_area == va) + last_free_area = node_to_va(rb_prev(&va->rb_node)); + + __remove_vmap_area(va, &free_vmap_area_root); + } else { + spin_unlock(&vmap_area_lock); + if (!purged) { + purge_vmap_area_lazy(); + purged = true; + goto retry; } - /* - * This area fits, move on to the previous one. If - * the previous one is the terminal one, we're done. - */ - area = (area + nr_vms - 1) % nr_vms; - if (area == term_area) - break; - start = offsets[area]; - end = start + sizes[area]; - pvm_find_next_prev(base + end, &next, &prev); + goto err_free; } -found: - /* we've found a fitting base, insert all va's */ - for (area = 0; area < nr_vms; area++) { - struct vmap_area *va = vas[area]; + /* we've found a fitting base, insert all va's */ + for (area = 0, start = base; area < nr_vms; area++) { + va = vas[area]; va->va_start = base + offsets[area]; va->va_end = va->va_start + sizes[area]; - __insert_vmap_area(va); - } - vmap_area_pcpu_hole = base + offsets[last_area]; + /* + * If there are several areas to allocate, we should insert + * back a free space that is organized by area size and offset. + */ + if (off) { + off[area]->va_start = start; + off[area]->va_end = start + offsets[area]; + /* Shift to next free start. */ + start = va->va_end; + + /* + * Some initialization before adding/merging. 
+			 */
+			RB_CLEAR_NODE(&off[area]->rb_node);
+			off[area]->align = 1;
+
+			(void) __merge_add_free_va_area(off[area],
+				&free_vmap_area_root, &free_vmap_area_list);
+		}
+
+		__insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+		va->align = 1;
+	}
+
+	vmap_area_pcpu_hole = base;
 	spin_unlock(&vmap_area_lock);
 
 	/* insert all vm's */
@@ -2604,16 +3097,21 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 					 pcpu_get_vm_areas);
 
 	kfree(vas);
+	kfree(off);
 	return vms;
 
 err_free:
 	for (area = 0; area < nr_vms; area++) {
+		if (nr_vms > 1)
+			kfree(off[area]);
+
 		kfree(vas[area]);
 		kfree(vms[area]);
 	}
 err_free2:
 	kfree(vas);
 	kfree(vms);
+	kfree(off);
 
 	return NULL;
 }
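As a companion to the allocation sketch shown with the commit message above,
here is the de-allocation side in the same simplified user-space model: a
freed range is merged with its address-sorted neighbours where they touch,
otherwise inserted as a new free block. The names are again invented for
illustration; the patch itself does this in __merge_add_free_va_area() on the
rb-tree/list pair, so the neighbours are found via the tree rather than a
linear walk.

#include <stdlib.h>

struct free_block {
	unsigned long start;
	unsigned long end;		/* exclusive */
	struct free_block *next;	/* sorted by start address */
};

/* Return a freed range [start, end) to the free list, coalescing it. */
static int insert_and_merge(struct free_block **head,
			    unsigned long start, unsigned long end)
{
	struct free_block **link = head;
	struct free_block *prev = NULL, *next, *nb;

	/* Find the first block that starts at or after the freed range. */
	while (*link && (*link)->start < start) {
		prev = *link;
		link = &(*link)->next;
	}
	next = *link;

	if (prev && prev->end == start) {
		/* Merge with the previous block... */
		prev->end = end;

		/* ...and possibly with the next one as well. */
		if (next && next->start == prev->end) {
			prev->end = next->end;
			prev->next = next->next;
			free(next);
		}
		return 0;
	}

	if (next && next->start == end) {
		/* Merge with the next block only. */
		next->start = start;
		return 0;
	}

	/* No neighbour touches the range: insert a new free block. */
	nb = malloc(sizeof(*nb));
	if (!nb)
		return -1;
	nb->start = start;
	nb->end = end;
	nb->next = next;
	*link = nb;

	return 0;
}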
From patchwork Fri Oct 19 17:35:38 2018
X-Patchwork-Submitter: Uladzislau Rezki
X-Patchwork-Id: 10649915
From: "Uladzislau Rezki (Sony)"
To: Matthew Wilcox , Andrew Morton , linux-mm@kvack.org
Cc: LKML , Michal Hocko , Thomas Garnier , Oleksiy Avramchenko ,
 Steven Rostedt , Joel Fernandes , Thomas
 Gleixner , Ingo Molnar , Tejun Heo , "Uladzislau Rezki (Sony)"
Subject: [RFC PATCH 2/2] mm: add priority threshold to __purge_vmap_area_lazy()
Date: Fri, 19 Oct 2018 19:35:38 +0200
Message-Id: <20181019173538.590-3-urezki@gmail.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20181019173538.590-1-urezki@gmail.com>
References: <20181019173538.590-1-urezki@gmail.com>

commit 763b218ddfaf ("mm: add preempt points into __purge_vmap_area_lazy()")
introduced some preempt points, one of which gives priority to an allocation
over the freeing path. Prioritizing an allocation over freeing does not work
well all the time; it should rather be a compromise:

1) The number of lazy pages directly influences the busy list length and
   therefore the cost of operations such as allocation, lookup, unmap and
   remove.

2) Under heavy simultaneous allocations/releases, memory usage may grow too
   fast, hitting out_of_memory -> panic.

Establish a threshold past which the freeing path is prioritized over
allocation, creating a balance between both.

Signed-off-by: Uladzislau Rezki (Sony)
---
 mm/vmalloc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a7f257540a05..bbafcff6632b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1124,23 +1124,23 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	struct llist_node *valist;
 	struct vmap_area *va;
 	struct vmap_area *n_va;
-	bool do_free = false;
+	int resched_threshold;
 
 	lockdep_assert_held(&vmap_purge_lock);
 
 	valist = llist_del_all(&vmap_purge_list);
+	if (unlikely(valist == NULL))
+		return false;
+
 	llist_for_each_entry(va, valist, purge_list) {
 		if (va->va_start < start)
 			start = va->va_start;
 		if (va->va_end > end)
 			end = va->va_end;
-		do_free = true;
 	}
 
-	if (!do_free)
-		return false;
-
 	flush_tlb_kernel_range(start, end);
+	resched_threshold = (int) lazy_max_pages() << 1;
 
 	spin_lock(&vmap_area_lock);
 	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
@@ -1148,7 +1148,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 		__free_vmap_area(va);
 
 		atomic_sub(nr, &vmap_lazy_nr);
-		cond_resched_lock(&vmap_area_lock);
+
+		if (atomic_read(&vmap_lazy_nr) < resched_threshold)
+			cond_resched_lock(&vmap_area_lock);
 	}
 	spin_unlock(&vmap_area_lock);
 	return true;
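The effect of the change can be modelled outside the kernel as follows.
lazy_area, lazy_nr, lazy_max_pages_model and purge_lazy_list are invented
stand-ins for the kernel objects, and sched_yield() plays the role of
cond_resched_lock(); the point is only that the purge loop keeps freeing
without yielding while the backlog is above roughly twice lazy_max_pages(),
so the amount of outstanding lazy memory stays bounded even under heavy
allocation pressure.

#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

struct lazy_area {
	long nr_pages;
	struct lazy_area *next;
};

static atomic_long lazy_nr;		/* pages queued for lazy freeing */

static long lazy_max_pages_model(void)
{
	return 32 * 1024;		/* stand-in for lazy_max_pages() */
}

static void purge_lazy_list(struct lazy_area *head)
{
	long resched_threshold = lazy_max_pages_model() << 1;
	struct lazy_area *next;

	while (head) {
		next = head->next;
		atomic_fetch_sub(&lazy_nr, head->nr_pages);
		free(head);		/* the actual freeing work */

		/*
		 * While the backlog is above the threshold, keep freeing;
		 * only once it has dropped below it do we offer to yield,
		 * mirroring the conditional cond_resched_lock() above.
		 */
		if (atomic_load(&lazy_nr) < resched_threshold)
			sched_yield();

		head = next;
	}
}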