From patchwork Wed Nov 29 09:53:25 2023
From: Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH RFC v3 0/9] SLUB percpu array caches and maple tree nodes
Date: Wed, 29 Nov 2023 10:53:25 +0100
Message-Id: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz>
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Matthew Wilcox, "Liam R. Howlett"
Howlett" Cc: Andrew Morton , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko , Marco Elver , Dmitry Vyukov , linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, kasan-dev@googlegroups.com, Vlastimil Babka X-Mailer: b4 0.12.4 X-Rspamd-Queue-Id: E32CDC0006 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: jgbwbiw9bwnz4mwyhoznajn1tfk5dc4m X-HE-Tag: 1701251619-595371 X-HE-Meta: U2FsdGVkX1/UzKYiWREy3Nm4y8GZDB+AkwpSwS7xSXWuTwRaRwSl+Ln8I3vQ0NwD4O0KAmczSU2Ot3Et8ZMdG8IPIeZjpoFT3tMBxoGkrskuYMN774kODdiMgB8CnK6o81bA1vDbzr44/y6URszejbOqdtPNg+xx4KNff2+xMhDCqDlTDzG4IzktsM/+R7tL9jvVcJdNwuRuqNPFF67dhIvMAIlbCxTr1btbNT4oHxJ6OWQN/yagZihTsa/8+cY3QYr4p4BRRL055xsYwY6RQB4HOseIS6MwxdKUgV0pHAwlHe0FMa2tmPcYscehAK7NPHk8GV21g8i1ATvQNqborNYwbITLJUd3Io15wvDhVtK1WIPadjrHAxM9t4Ft/JsAAEI6ECWh+1UD/F48IaPo7ylM+N32GJW9OY+k7opAdRHnV3IB6y3/F//ZZEAfLP9t0W9W3JN4xUqVZxwrDZWkDXHcmwVf0JySfjNQ2ofwzB+AipyUyxijQ0Z2rK2D5VVp9yBlEzZuQxZo/z+i6grAvx2RCNdN43iB4GkAqHok37jzr6HLhx4o7qq1unWjRhZkZxTTolaD1Ss2F88xC8Ms+KYUIeF7e6XN+0CxUXrmB1DiJSVQIR6r/vr7IU3KPqbbpa0AKFflnp/G6uIiCAybAhxUOKPYUoOi17pEPqQspgTAwtX3Zls2L3vRMKp/BRwjQoN0/uZQshBwWNHhzU4lLQ7TUKPXvLq0Eiecg2RktrIVxqj7AU7iwmKZ2ny7qadfPjzk9BNHr+z1Fn0Vw8V3KLfSxL0XaYt3nMDaTnrFWTw6lvwbvhqUucHEvbW989/VgGF+jaW8iAFYy6a425pkpn4ZhTfyDDhSvGrjvTfSiMIvzvu8qpOpV5hl7uTFqswrtp5/yG1AQaS616V/6A5ZkzTwjhhds1fmiUc5uHyCqkcJJyKj/TcDQVd/C/PR+qaQlPq+LKESEK32/+P7+h/ NSpeU4OH aisV7LlvsW6my486y3ZNS1R0RUtxL5HbDU0WKcfz+sRI7DX9xlohjSm2S5+lGufAmZud4855/7r5j+faNUDrawX874K8S1AWrGHlKbCeeFhMgYJvEokxCiIv1Usd9BaOJmAhI4HUiUJOwHDM2T1m2ZwbbZB/p6aI+6ZX2XYi2vTbO8hnDR4hMW8qkV9W804JbxXAoERV+/P7usLrpgBBRNK1oUXTIr6VrXO/gkEZjodJEGaHMkx99C9QU+VIzA3f9WtxVBOR+IdSPAIi5uAc1uM1XvCn8lvNMeaHLnDUKkaKX/jlF9UNTFkP0JgffQGs5WRb9KwvnAPlJipqsapLdc7PXsnx+9qKuomuo X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Also in git [1]. Changes since v2 [2]: - empty cache refill/full cache flush using internal bulk operations - bulk alloc/free operations also use the cache - memcg, KASAN etc hooks processed when the cache is used for the operation - now fully transparent - NUMA node-specific allocations now explicitly bypass the cache [1] https://git.kernel.org/vbabka/l/slub-percpu-caches-v3r2 [2] https://lore.kernel.org/all/20230810163627.6206-9-vbabka@suse.cz/ ---- At LSF/MM I've mentioned that I see several use cases for introducing opt-in percpu arrays for caching alloc/free objects in SLUB. This is my first exploration of this idea, speficially for the use case of maple tree nodes. The assumptions are: - percpu arrays will be faster thank bulk alloc/free which needs relatively long freelists to work well. Especially in the freeing case we need the nodes to come from the same slab (or small set of those) - preallocation for the worst case of needed nodes for a tree operation that can't reclaim due to locks is wasteful. We could instead expect that most of the time percpu arrays would satisfy the constained allocations, and in the rare cases it does not we can dip into GFP_ATOMIC reserves temporarily. So instead of preallocation just prefill the arrays. - NUMA locality of the nodes is not a concern as the nodes of a process's VMA tree end up all over the place anyway. 
Patches 1-4 are preparatory, but should also work as standalone fixes
and cleanups, so I would like to add them for 6.8 after review,
probably after rebasing on top of the current series in slab/for-next
(mainly the SLAB removal), as that should be easier to follow than the
necessary conflict resolutions.

Patch 5 adds the per-cpu array caches support. The locking is stolen
from Mel's recent page allocator pcplists implementation, so it can
avoid disabling IRQs and only disables preemption, but the trylocks can
fail in rare situations - in most cases the locks are uncontended, so
the locking should be cheap.

The maple tree is then modified in patches 6-9 to benefit from this.
Of those, only Liam's patches make sense as they are; the rest are my
crude hacks, and Liam is already working on a better solution for the
maple tree side. I'm including them only so the bots have something to
test that uses the new code. The stats below thus likely don't reflect
the full benefits that can be achieved from cache prefill vs.
preallocation.

I've briefly tested this with a virtme VM boot, checking the stats from
CONFIG_SLUB_STATS in sysfs.

Patch 5: SLUB per-cpu array caches are implemented, including the new
counters, but the maple tree doesn't use them yet:

/sys/kernel/slab/maple_node # grep . alloc_cpu_cache alloc_*path \
	free_cpu_cache free_*path cpu_cache* | cut -d' ' -f1
alloc_cpu_cache:0
alloc_fastpath:20213
alloc_slowpath:1741
free_cpu_cache:0
free_fastpath:10754
free_slowpath:9232
cpu_cache_flush:0
cpu_cache_refill:0

Patch 7: the maple node cache creates a percpu array with 32 entries;
nothing else is changed. The majority of alloc/free operations are now
satisfied by the array. The number of flushed/refilled objects is about
1/3 of the cached operations, so the hit ratio is roughly 2/3. Note
that the flush/refill operations also increase the fastpath/slowpath
counters, so the majority of those now indeed come from the flushes and
refills.

alloc_cpu_cache:11880
alloc_fastpath:4131
alloc_slowpath:587
free_cpu_cache:13075
free_fastpath:437
free_slowpath:2216
cpu_cache_flush:4336
cpu_cache_refill:3216

Patch 9: this tries to replace the maple tree's preallocation with the
cache prefill, which should reduce all of the counters, as many of the
preallocations for the worst-case scenarios turn out not to be needed
in the end. But according to Liam it's not the full solution, which
probably explains why the reduction is only modest.

alloc_cpu_cache:11540
alloc_fastpath:3756
alloc_slowpath:512
free_cpu_cache:12775
free_fastpath:388
free_slowpath:1944
cpu_cache_flush:3904
cpu_cache_refill:2742

---
Liam R. Howlett (2):
      tools: Add SLUB percpu array functions for testing
      maple_tree: Remove MA_STATE_PREALLOC

Vlastimil Babka (7):
      mm/slub: fix bulk alloc and free stats
      mm/slub: introduce __kmem_cache_free_bulk() without free hooks
      mm/slub: handle bulk and single object freeing separately
      mm/slub: free KFENCE objects in slab_free_hook()
      mm/slub: add opt-in percpu array cache of objects
      maple_tree: use slub percpu array
      maple_tree: replace preallocation with slub percpu array prefill

 include/linux/slab.h                    |   4 +
 include/linux/slub_def.h                |  12 +
 lib/maple_tree.c                        |  46 ++-
 mm/Kconfig                              |   1 +
 mm/slub.c                               | 561 +++++++++++++++++++++++++++++---
 tools/include/linux/slab.h              |   4 +
 tools/testing/radix-tree/linux.c        |  14 +
 tools/testing/radix-tree/linux/kernel.h |   1 +
 8 files changed, 578 insertions(+), 65 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231128-slub-percpu-caches-9441892011d7

Best regards,