From patchwork Tue Aug 8 09:53:43 2023
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 13345908
From: Vlastimil Babka
To: "Liam R. Howlett", Matthew Wilcox, Christoph Lameter, David Rientjes,
 Pekka Enberg, Joonsoo Kim
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev, Vlastimil Babka
Subject: [RFC v1 0/5] SLUB percpu array caches and maple tree nodes
Date: Tue, 8 Aug 2023 11:53:43 +0200
Message-ID: <20230808095342.12637-7-vbabka@suse.cz>

Also available in git, based on v6.5-rc5:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-percpu-caches-v1

At LSF/MM I've mentioned that I see several use cases for introducing
opt-in percpu arrays for caching alloc/free objects in SLUB. This is my
first exploration of this idea, specifically for the use case of maple
tree nodes.
We brainstormed this use case on IRC last week with Liam and Matthew and
this is how I understood the requirements:

- percpu arrays will be faster than bulk alloc/free, which needs
  relatively long freelists to work well. Especially in the freeing case
  we need the nodes to come from the same slab (or a small set of those)

- preallocation for the worst case of needed nodes for a tree operation
  that can't reclaim due to locks is wasteful. We could instead expect
  that most of the time percpu arrays would satisfy the constrained
  allocations, and in the rare cases they do not we can dip into
  GFP_ATOMIC reserves temporarily. Instead of preallocation just prefill
  the arrays.

- NUMA locality is not a concern as the nodes of a process's VMA tree
  end up all over the place anyway.

So this RFC patchset adds such a percpu array in Patch 2. Locking is
stolen from Mel's recent page allocator's pcplists implementation so it
can avoid disabling IRQs and just disable preemption, but the trylocks
can fail in rare situations.

Then maple tree is modified in patches 3-5 to benefit from this. This is
done in a very crude way as I'm not so familiar with the code.

I've briefly tested this with a virtme VM boot and checked the stats from
CONFIG_SLUB_STATS in sysfs.

Patch 2:

slub changes implemented including new counters alloc_cpu_cache
and free_cpu_cache but maple tree doesn't use them yet

(none):/sys/kernel/slab/maple_node # grep . alloc_cpu_cache alloc_*path free_cpu_cache free_*path | cut -d' ' -f1
alloc_cpu_cache:0
alloc_fastpath:56604
alloc_slowpath:7279
free_cpu_cache:0
free_fastpath:35087
free_slowpath:22403

Patch 3:

maple node cache creates percpu array with 32 entries,
nothing else changed

-> some allocs/frees satisfied by the array

alloc_cpu_cache:11950
alloc_fastpath:39955
alloc_slowpath:7989
free_cpu_cache:12076
free_fastpath:22878
free_slowpath:18677

Patch 4:

maple tree nodes bulk alloc/free converted to a loop of normal allocs
to use the percpu array more, because bulk alloc bypasses it

-> majority of allocs/frees now satisfied by the percpu array

alloc_cpu_cache:54178
alloc_fastpath:4959
alloc_slowpath:727
free_cpu_cache:54244
free_fastpath:354
free_slowpath:5159

Patch 5:

mas_preallocate() just prefills the percpu array, actually preallocates
only a single node
mas_store_prealloc() gains a retry loop with
mas_nomem(mas, GFP_ATOMIC | __GFP_NOFAIL)

-> major drop in actual allocs/frees

alloc_cpu_cache:17031
alloc_fastpath:5324
alloc_slowpath:631
free_cpu_cache:17099
free_fastpath:277
free_slowpath:5503

It would be interesting to see how this affects the workloads that saw
regressions from the maple tree introduction, as the slab operations were
suspected to be a major factor.

Vlastimil Babka (5):
  mm, slub: fix bulk alloc and free stats
  mm, slub: add opt-in slub_percpu_array
  maple_tree: use slub percpu array
  maple_tree: avoid bulk alloc/free to use percpu array more
  maple_tree: replace preallocation with slub percpu array prefill

 include/linux/slab.h     |   4 +
 include/linux/slub_def.h |  10 ++
 lib/maple_tree.c         |  30 +++++-
 mm/slub.c                | 221 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 258 insertions(+), 7 deletions(-)
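
For a concrete picture of the idea described above, here is a minimal
user-space sketch of an opt-in per-CPU array sitting in front of a slower
allocator. This is not the code from patch 2, only an illustration under
stated assumptions: every name in it (cache_sim, sim_pca, cache_alloc,
cache_free, cache_prefill, cache_destroy, SIM_NR_CPUS, SIM_ARRAY_CAP) is
hypothetical, malloc()/free() stand in for the regular SLUB paths, the
"CPU" is just an index passed by the caller, and the locking (trylock with
preemption disabled) is left out entirely. Only the two counter names
alloc_cpu_cache and free_cpu_cache are borrowed from the stats above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SIM_NR_CPUS   4   /* hypothetical number of simulated CPUs */
#define SIM_ARRAY_CAP 32  /* 32 entries, as in the patch 3 description */

/* One simulated per-CPU array: a small LIFO stack of cached objects. */
struct sim_pca {
	unsigned int count;
	void *objs[SIM_ARRAY_CAP];
};

/* Hypothetical stand-in for a cache with per-CPU arrays and counters. */
struct cache_sim {
	size_t obj_size;
	struct sim_pca cpu[SIM_NR_CPUS];
	unsigned long alloc_cpu_cache, alloc_fallback;
	unsigned long free_cpu_cache, free_fallback;
};

static void cache_init(struct cache_sim *c, size_t obj_size)
{
	assert(obj_size > 0);
	memset(c, 0, sizeof(*c));
	c->obj_size = obj_size;
}

/* Allocate: pop from this CPU's array if possible, else fall back. */
static void *cache_alloc(struct cache_sim *c, int cpu)
{
	struct sim_pca *a = &c->cpu[cpu];

	if (a->count > 0) {
		c->alloc_cpu_cache++;
		return a->objs[--a->count];
	}
	c->alloc_fallback++;          /* stands in for fastpath/slowpath */
	return malloc(c->obj_size);
}

/* Free: push onto this CPU's array if there is room, else fall back. */
static void cache_free(struct cache_sim *c, int cpu, void *obj)
{
	struct sim_pca *a = &c->cpu[cpu];

	if (a->count < SIM_ARRAY_CAP) {
		a->objs[a->count++] = obj;
		c->free_cpu_cache++;
		return;
	}
	c->free_fallback++;
	free(obj);
}

/* Prefill: top the array up to at least `want` objects (capped at the
 * array size) and return how many objects are now cached. */
static unsigned int cache_prefill(struct cache_sim *c, int cpu,
				  unsigned int want)
{
	struct sim_pca *a = &c->cpu[cpu];

	if (want > SIM_ARRAY_CAP)
		want = SIM_ARRAY_CAP;
	while (a->count < want) {
		void *obj = malloc(c->obj_size);

		if (!obj)
			break;
		a->objs[a->count++] = obj;
	}
	return a->count;
}

/* Release everything still cached in the arrays. */
static void cache_destroy(struct cache_sim *c)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		struct sim_pca *a = &c->cpu[cpu];

		while (a->count > 0)
			free(a->objs[--a->count]);
	}
}
```

In this sketch, a caller that would otherwise preallocate for the worst
case calls cache_prefill() before entering the constrained section and
then uses cache_alloc() inside it. An allocation that finds the array
empty still succeeds through the fallback, which is the role the cover
letter assigns to the GFP_ATOMIC reserves.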