From patchwork Fri Jun 21 07:15:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chengming Zhou <chengming.zhou@linux.dev>
X-Patchwork-Id: 13706934
From: Chengming Zhou <chengming.zhou@linux.dev>
Date: Fri, 21 Jun 2024 15:15:09 +0800
Subject: [PATCH v2 1/2] mm/zsmalloc: change back to per-size_class lock
Message-Id: <20240621-zsmalloc-lock-mm-everything-v2-1-d30e9cd2b793@linux.dev>
References: <20240621-zsmalloc-lock-mm-everything-v2-0-d30e9cd2b793@linux.dev>
In-Reply-To: <20240621-zsmalloc-lock-mm-everything-v2-0-d30e9cd2b793@linux.dev>
To: Minchan Kim <minchan@kernel.org>, Sergey Senozhatsky <senozhatsky@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>, Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosryahmed@google.com>, Nhat Pham <nphamcs@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>, Takero Funaki <flintglass@gmail.com>,
	Chengming Zhou <zhouchengming@bytedance.com>, Dan Carpenter <dan.carpenter@linaro.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Chengming Zhou <chengming.zhou@linux.dev>

This patch is almost a revert of commit c0547d0b6a4b ("zsmalloc: consolidate
zs_pool's migrate_lock and size_class's locks"), which replaced the
per-size_class locks and pool->migrate_lock with a single global pool->lock
as preparation for supporting reclaim in zsmalloc. Reclaim in zsmalloc has
since been dropped in favor of LRU reclaim in zswap, so that preparation is
no longer needed.

In theory, the per-size_class lock is more fine-grained than pool->lock,
since a pool can have many size_classes. As for the additional
pool->migrate_lock, only free() and map() need to grab it to access a stable
handle and get the zspage, and only in read lock mode.
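For illustration only (not part of the patch): below is a minimal user-space
sketch of the lock nesting this change restores, with a pthread rwlock
standing in for pool->migrate_lock and a plain mutex standing in for one
size_class's lock. The fake_* names and the pthread primitives are
assumptions made up for the example, not zsmalloc code.

/* Illustrative user-space analogue of the restored lock nesting. */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t migrate_lock = PTHREAD_RWLOCK_INITIALIZER; /* ~ pool->migrate_lock */
static pthread_mutex_t  class_lock   = PTHREAD_MUTEX_INITIALIZER;  /* ~ one size_class's lock */

static void fake_free(void)
{
	/* read side: keep the handle -> zspage lookup stable vs. migration */
	pthread_rwlock_rdlock(&migrate_lock);
	pthread_mutex_lock(&class_lock);        /* serialize alloc/free within the class */
	pthread_rwlock_unlock(&migrate_lock);   /* lookup done, drop the read lock early */

	/* ... free the object under class_lock ... */

	pthread_mutex_unlock(&class_lock);
}

static void fake_migrate(void)
{
	/* write side: exclude all handle lookups, then take the class lock */
	pthread_rwlock_wrlock(&migrate_lock);
	pthread_mutex_lock(&class_lock);

	/* ... move the zspage's backing page ... */

	pthread_rwlock_unlock(&migrate_lock);
	pthread_mutex_unlock(&class_lock);
}

int main(void)
{
	fake_free();
	fake_migrate();
	puts("ordering: pool->migrate_lock (rw) before class->lock");
	return 0;
}

The point is that readers (map/free) hold the rwlock only long enough to
resolve the handle, while the heavier write side is confined to migration
and compaction.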
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/zsmalloc.c | 85 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 35 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 44e0171d6003..fec1a39e5bbe 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,7 +34,8 @@
 /*
  * lock ordering:
  *	page_lock
- *	pool->lock
+ *	pool->migrate_lock
+ *	class->lock
  *	zspage->lock
  */
 
@@ -183,6 +184,7 @@ static struct dentry *zs_stat_root;
 static size_t huge_class_size;
 
 struct size_class {
+	spinlock_t lock;
 	struct list_head fullness_list[NR_FULLNESS_GROUPS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
@@ -237,7 +239,8 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct work_struct free_work;
 #endif
-	spinlock_t lock;
+	/* protect page/zspage migration */
+	rwlock_t migrate_lock;
 	atomic_t compaction_in_progress;
 };
 
@@ -336,7 +339,7 @@ static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
 	kmem_cache_free(pool->zspage_cachep, zspage);
 }
 
-/* pool->lock(which owns the handle) synchronizes races */
+/* class->lock(which owns the handle) synchronizes races */
 static void record_obj(unsigned long handle, unsigned long obj)
 {
 	*(unsigned long *)handle = obj;
@@ -431,7 +434,7 @@ static __maybe_unused int is_first_page(struct page *page)
 	return PagePrivate(page);
 }
 
-/* Protected by pool->lock */
+/* Protected by class->lock */
 static inline int get_zspage_inuse(struct zspage *zspage)
 {
 	return zspage->inuse;
@@ -569,7 +572,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		if (class->index != i)
 			continue;
 
-		spin_lock(&pool->lock);
+		spin_lock(&class->lock);
 
 		seq_printf(s, " %5u %5u ", i, class->size);
 		for (fg = ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) {
@@ -580,7 +583,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		obj_allocated = zs_stat_get(class, ZS_OBJS_ALLOCATED);
 		obj_used = zs_stat_get(class, ZS_OBJS_INUSE);
 		freeable = zs_can_compact(class);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);
 
 		objs_per_zspage = class->objs_per_zspage;
 		pages_used = obj_allocated / objs_per_zspage *
@@ -837,7 +840,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 {
 	struct page *page, *next;
 
-	assert_spin_locked(&pool->lock);
+	assert_spin_locked(&class->lock);
 
 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0);
@@ -1196,19 +1199,19 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	BUG_ON(in_interrupt());
 
 	/* It guarantees it can get zspage from handle safely */
-	spin_lock(&pool->lock);
+	read_lock(&pool->migrate_lock);
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &page, &obj_idx);
 	zspage = get_zspage(page);
 
 	/*
-	 * migration cannot move any zpages in this zspage. Here, pool->lock
+	 * migration cannot move any zpages in this zspage. Here, class->lock
 	 * is too heavy since callers would take some time until they calls
 	 * zs_unmap_object API so delegate the locking from class to zspage
 	 * which is smaller granularity.
 	 */
 	migrate_read_lock(zspage);
-	spin_unlock(&pool->lock);
+	read_unlock(&pool->migrate_lock);
 
 	class = zspage_class(pool, zspage);
 	off = offset_in_page(class->size * obj_idx);
@@ -1364,8 +1367,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];
 
-	/* pool->lock effectively protects the zpage migration */
-	spin_lock(&pool->lock);
+	/* class->lock effectively protects the zpage migration */
+	spin_lock(&class->lock);
 	zspage = find_get_zspage(class);
 	if (likely(zspage)) {
 		obj = obj_malloc(pool, zspage, handle);
@@ -1377,7 +1380,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		goto out;
 	}
 
-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
 
 	zspage = alloc_zspage(pool, class, gfp);
 	if (!zspage) {
@@ -1385,7 +1388,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		return (unsigned long)ERR_PTR(-ENOMEM);
 	}
 
-	spin_lock(&pool->lock);
+	spin_lock(&class->lock);
 	obj = obj_malloc(pool, zspage, handle);
 	newfg = get_fullness_group(class, zspage);
 	insert_zspage(class, zspage, newfg);
@@ -1397,7 +1400,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
 out:
-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
 
 	return handle;
 }
@@ -1442,14 +1445,16 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 		return;
 
 	/*
-	 * The pool->lock protects the race with zpage's migration
+	 * The pool->migrate_lock protects the race with zpage's migration
 	 * so it's safe to get the page from handle.
 	 */
-	spin_lock(&pool->lock);
+	read_lock(&pool->migrate_lock);
 	obj = handle_to_obj(handle);
 	obj_to_page(obj, &f_page);
 	zspage = get_zspage(f_page);
 	class = zspage_class(pool, zspage);
+	spin_lock(&class->lock);
+	read_unlock(&pool->migrate_lock);
 
 	class_stat_dec(class, ZS_OBJS_INUSE, 1);
 	obj_free(class->size, obj);
@@ -1458,7 +1463,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	if (fullness == ZS_INUSE_RATIO_0)
 		free_zspage(pool, class, zspage);
 
-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
 	cache_free_handle(pool, handle);
 }
 EXPORT_SYMBOL_GPL(zs_free);
@@ -1780,12 +1785,16 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	pool = zspage->pool;
 
 	/*
-	 * The pool's lock protects the race between zpage migration
+	 * The pool migrate_lock protects the race between zpage migration
 	 * and zs_free.
 	 */
-	spin_lock(&pool->lock);
+	write_lock(&pool->migrate_lock);
 	class = zspage_class(pool, zspage);
 
+	/*
+	 * the class lock protects zpage alloc/free in the zspage.
+	 */
+	spin_lock(&class->lock);
 	/* the migrate_write_lock protects zpage access via zs_map_object */
 	migrate_write_lock(zspage);
 
@@ -1815,9 +1824,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	replace_sub_page(class, zspage, newpage, page);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-	 * it's okay to release the pool's lock.
+	 * it's okay to release migration_lock.
 	 */
-	spin_unlock(&pool->lock);
+	write_unlock(&pool->migrate_lock);
+	spin_unlock(&class->lock);
 	migrate_write_unlock(zspage);
 
 	get_page(newpage);
@@ -1861,20 +1871,20 @@ static void async_free_zspage(struct work_struct *work)
 		if (class->index != i)
 			continue;
 
-		spin_lock(&pool->lock);
+		spin_lock(&class->lock);
 		list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0],
 				 &free_pages);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);
 	}
 
 	list_for_each_entry_safe(zspage, tmp, &free_pages, list) {
 		list_del(&zspage->list);
 		lock_zspage(zspage);
 
-		spin_lock(&pool->lock);
 		class = zspage_class(pool, zspage);
+		spin_lock(&class->lock);
 		__free_zspage(pool, class, zspage);
-		spin_unlock(&pool->lock);
+		spin_unlock(&class->lock);
 	}
 };
 
@@ -1938,7 +1948,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * protect the race between zpage migration and zs_free
 	 * as well as zpage allocation/free
 	 */
-	spin_lock(&pool->lock);
+	write_lock(&pool->migrate_lock);
+	spin_lock(&class->lock);
 	while (zs_can_compact(class)) {
 		int fg;
 
@@ -1964,13 +1975,15 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			src_zspage = NULL;
 
 		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
-		    || spin_is_contended(&pool->lock)) {
+		    || rwlock_is_contended(&pool->migrate_lock)) {
 			putback_zspage(class, dst_zspage);
 			dst_zspage = NULL;
 
-			spin_unlock(&pool->lock);
+			spin_unlock(&class->lock);
+			write_unlock(&pool->migrate_lock);
 			cond_resched();
-			spin_lock(&pool->lock);
+			write_lock(&pool->migrate_lock);
+			spin_lock(&class->lock);
 		}
 	}
 
@@ -1980,7 +1993,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	if (dst_zspage)
 		putback_zspage(class, dst_zspage);
 
-	spin_unlock(&pool->lock);
+	spin_unlock(&class->lock);
+	write_unlock(&pool->migrate_lock);
 
 	return pages_freed;
 }
@@ -1992,10 +2006,10 @@ unsigned long zs_compact(struct zs_pool *pool)
 	unsigned long pages_freed = 0;
 
 	/*
-	 * Pool compaction is performed under pool->lock so it is basically
+	 * Pool compaction is performed under pool->migrate_lock so it is basically
 	 * single-threaded. Having more than one thread in __zs_compact()
-	 * will increase pool->lock contention, which will impact other
-	 * zsmalloc operations that need pool->lock.
+	 * will increase pool->migrate_lock contention, which will impact other
+	 * zsmalloc operations that need pool->migrate_lock.
 	 */
 	if (atomic_xchg(&pool->compaction_in_progress, 1))
 		return 0;
@@ -2117,7 +2131,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		return NULL;
 
 	init_deferred_free(pool);
-	spin_lock_init(&pool->lock);
+	rwlock_init(&pool->migrate_lock);
 	atomic_set(&pool->compaction_in_progress, 0);
 
 	pool->name = kstrdup(name, GFP_KERNEL);
@@ -2189,6 +2203,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
+		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;
 
 		fullness = ZS_INUSE_RATIO_0;