From patchwork Sun Sep 19 16:42:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hyeonggon Yoo <42.hyeyoo@gmail.com>
X-Patchwork-Id: 12504371
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: 42.hyeyoo@gmail.com
Cc: Christoph Lameter, David Rientjes, Joonsoo Kim, Andrew Morton,
 Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Jens Axboe
Subject: [RFC PATCH] Introducing lockless cache built on top of slab allocator
Date: Sun, 19 Sep 2021 16:42:39 +0000
Message-Id: <20210919164239.49905-1-42.hyeyoo@gmail.com>
X-Mailer: git-send-email 2.27.0

This is just a simple proof of concept and is not ready for submission
yet. The code may contain mistakes (wrong gfp flags, wrong error
handling, etc.). I would like your comments.

Recently the block layer implemented a percpu, lockless cache on top of
the slab allocator. It can be used for IO polling, because IO polling
disables interrupts.

Link: https://lwn.net/Articles/868070/
Link: https://www.spinics.net/lists/linux-block/msg71964.html

The block layer gained some IOPS after this change. (Note that Jens
used SLUB for the performance measurement.)

This patch is a generalization of what was done in the block layer,
built on top of the slab allocator.

The lockless cache uses simple queuing to be more cache friendly.
When the percpu freelist gets too big, it returns some objects to the
slab allocator.

It seems the lockless cache could be used for the network layer's IO
polling (NAPI) too.

Any ideas/opinions on this?
---
 include/linux/lockless_cache.h |  31 ++++++++
 init/main.c                    |   2 +
 mm/Makefile                    |   2 +-
 mm/lockless_cache.c            | 132 +++++++++++++++++++++++++++++++++
 4 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/lockless_cache.h
 create mode 100644 mm/lockless_cache.c

diff --git a/include/linux/lockless_cache.h b/include/linux/lockless_cache.h
new file mode 100644
index 000000000000..e64b85e869f3
--- /dev/null
+++ b/include/linux/lockless_cache.h
@@ -0,0 +1,31 @@
+#include
+
+struct object_list {
+	void *object;
+	struct list_head list;
+};
+
+struct freelist {
+	struct object_list *head;
+	int size;
+};
+
+struct lockless_cache {
+	struct kmem_cache *cache;
+	struct freelist __percpu *freelist;
+
+	int total_size;
+	unsigned int max; /* maximum size for each percpu freelist */
+	unsigned int slack; /* number of objects returning to slab when freelist is too big (> max) */
+};
+
+void lockless_cache_init(void);
+struct lockless_cache
+*lockless_cache_create(const char *name, unsigned int size, unsigned int align,
+		slab_flags_t flags, void (*ctor)(void *), unsigned int max,
+		unsigned int slack);
+
+void lockless_cache_destroy(struct lockless_cache *cache);
+void *lockless_cache_alloc(struct lockless_cache *cache, gfp_t flags);
+void lockless_cache_free(struct lockless_cache *cache, void *object);
+
diff --git a/init/main.c b/init/main.c
index 3f7216934441..c18d6421cb65 100644
--- a/init/main.c
+++ b/init/main.c
@@ -79,6 +79,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -848,6 +849,7 @@ static void __init mm_init(void)
 	/* page_owner must be initialized after buddy is ready */
 	page_ext_init_flatmem_late();
 	kmem_cache_init();
+	lockless_cache_init();
 	kmemleak_init();
 	pgtable_init();
 	debug_objects_mem_init();
diff --git a/mm/Makefile b/mm/Makefile
index fc60a40ce954..d6c3a89ed548 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -52,7 +52,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   mm_init.o percpu.o slab_common.o \
 			   compaction.o vmacache.o \
 			   interval_tree.o list_lru.o workingset.o \
-			   debug.o gup.o mmap_lock.o $(mmu-y)
+			   debug.o gup.o mmap_lock.o lockless_cache.o $(mmu-y)
 
 # Give 'page_alloc' its own module-parameter namespace
 page-alloc-y := page_alloc.o
diff --git a/mm/lockless_cache.c b/mm/lockless_cache.c
new file mode 100644
index 000000000000..05b8cdb672ff
--- /dev/null
+++ b/mm/lockless_cache.c
@@ -0,0 +1,132 @@
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef CONFIG_SLUB
+#include
+#elif CONFIG_SLAB
+#include
+#else
+#include
+#endif
+
+static struct kmem_cache *global_lockless_cache;
+static struct kmem_cache *global_list_cache;
+
+/*
+ * What should we do if initialization fails?
+ */
+void lockless_cache_init(void)
+{
+	global_lockless_cache = kmem_cache_create("global_lockless_cache", sizeof(struct lockless_cache),
+			sizeof(struct lockless_cache), 0, NULL);
+
+	global_list_cache = kmem_cache_create("global_list_cache", sizeof(struct object_list),
+			sizeof(struct object_list), 0, NULL);
+
+}
+EXPORT_SYMBOL(lockless_cache_init);
+
+struct lockless_cache
+*lockless_cache_create(const char *name, unsigned int size, unsigned int align,
+		slab_flags_t flags, void (*ctor)(void *), unsigned int max, unsigned int slack)
+{
+	int cpu;
+	struct lockless_cache *cache;
+
+	cache = kmem_cache_alloc(global_lockless_cache, GFP_KERNEL | __GFP_ZERO);
+	if (!cache)
+		return NULL;
+
+	cache->cache = kmem_cache_create(name, size, align, 0, ctor);
+	if (!cache->cache)
+		goto destroy_cache;
+
+	cache->freelist = alloc_percpu(struct freelist);
+	if (!cache->freelist)
+		goto destroy_cache;
+
+	cache->max = max;
+	cache->slack = slack;
+	cache->total_size = 0;
+
+	for_each_possible_cpu(cpu) {
+		struct freelist *freelist;
+		freelist = per_cpu_ptr(cache->freelist, cpu);
+		INIT_LIST_HEAD(&freelist->head->list);
+		freelist->size = 0;
+	}
+
+	return cache;
+
+destroy_cache:
+
+	lockless_cache_destroy(cache);
+	return NULL;
+}
+EXPORT_SYMBOL(lockless_cache_create);
+
+void lockless_cache_destroy(struct lockless_cache *cache)
+{
+	int cpu;
+	struct object_list *elem;
+
+	for_each_possible_cpu(cpu) {
+		free_percpu(cache->freelist);
+		list_for_each_entry(elem, &cache->freelist->head->list, list) {
+			lockless_cache_free(cache, elem->object);
+			kmem_cache_free(global_list_cache, elem);
+		}
+	}
+
+	kmem_cache_destroy(cache->cache);
+}
+EXPORT_SYMBOL(lockless_cache_destroy);
+
+void *lockless_cache_alloc(struct lockless_cache *cache, gfp_t flags)
+{
+	struct freelist *freelist;
+	struct object_list *elem;
+
+	freelist = this_cpu_ptr(cache->freelist);
+
+	if (!list_empty(&freelist->head->list)) {
+		elem = freelist->head;
+		list_del(&freelist->head->list);
+		cache->total_size--;
+		freelist->size--;
+		cache->cache->ctor(elem->object);
+	} else {
+		elem = kmem_cache_alloc(global_list_cache, flags);
+	}
+
+	return elem->object;
+}
+EXPORT_SYMBOL(lockless_cache_alloc);
+
+void lockless_cache_free(struct lockless_cache *cache, void *object)
+{
+	struct freelist *freelist;
+	struct object_list *elem;
+
+	elem = container_of(&object, struct object_list, object);
+	freelist = this_cpu_ptr(cache->freelist);
+	list_add(&freelist->head->list, &elem->list);
+	cache->total_size++;
+	freelist->size++;
+
+	/* return back to slab allocator */
+	if (freelist->size > cache->max) {
+		elem = list_last_entry(&freelist->head->list, struct object_list, list);
+		list_del(&elem->list);
+
+		kmem_cache_free(cache->cache, elem->object);
+		kmem_cache_free(global_list_cache, elem);
+		cache->total_size--;
+		freelist->size--;
+	}
+}
+EXPORT_SYMBOL(lockless_cache_free);
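
For discussion, the behaviour described in the cover letter (allocate from a local freelist first, and give `slack` objects back to the backing allocator once the freelist grows past `max`) can be sketched in plain userspace C. This is only an illustration under loud assumptions, not the patch itself: the `pcache_*` names are hypothetical, `malloc()`/`free()` stand in for `kmem_cache_alloc()`/`kmem_cache_free()`, a single `struct pcache` stands in for one CPU's freelist, and there is no percpu or interrupt handling at all. Unlike the patch, the sketch stores the list link inside the freed object and trims from the head of the list rather than the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A freed object doubles as its own list node (hypothetical layout). */
struct pcache_node {
	struct pcache_node *next;
};

/* Hypothetical stand-in for ONE CPU's freelist plus its limits. */
struct pcache {
	struct pcache_node *head;	/* most recently freed object */
	size_t obj_size;		/* size handed to malloc() */
	unsigned int size;		/* objects currently cached */
	unsigned int max;		/* trim once size exceeds this */
	unsigned int slack;		/* objects released per trim */
};

void pcache_init(struct pcache *pc, size_t obj_size,
		 unsigned int max, unsigned int slack)
{
	pc->head = NULL;
	/* An object must be able to hold the list link while cached. */
	pc->obj_size = obj_size < sizeof(struct pcache_node) ?
		       sizeof(struct pcache_node) : obj_size;
	pc->size = 0;
	pc->max = max;
	pc->slack = slack;
}

/* Fast path: pop a cached object; slow path: ask the backing allocator. */
void *pcache_alloc(struct pcache *pc)
{
	struct pcache_node *node = pc->head;

	if (node) {
		pc->head = node->next;
		pc->size--;
		return node;
	}
	return malloc(pc->obj_size);
}

/* Push the object; if the list is now too big, release `slack` objects. */
void pcache_free(struct pcache *pc, void *object)
{
	struct pcache_node *node = object;
	unsigned int i;

	node->next = pc->head;
	pc->head = node;
	pc->size++;

	if (pc->size <= pc->max)
		return;

	for (i = 0; i < pc->slack && pc->head; i++) {
		node = pc->head;
		pc->head = node->next;
		pc->size--;
		free(node);
	}
}

/* Release everything still cached, e.g. before destroying the cache. */
void pcache_drain(struct pcache *pc)
{
	while (pc->head) {
		struct pcache_node *node = pc->head;

		pc->head = node->next;
		pc->size--;
		free(node);
	}
}
```

In a real kernel version the fast paths would additionally need to run with preemption (and possibly interrupts) disabled on the local CPU, which this sketch does not model.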