Message ID: <20231102032330.1036151-1-chengming.zhou@linux.dev> (mailing list archive)
From: chengming.zhou@linux.dev
To: vbabka@suse.cz, cl@linux.com, penberg@kernel.org
Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou <zhouchengming@bytedance.com>
Subject: [PATCH v5 0/9] slub: Delay freezing of CPU partial slabs
Date: Thu, 2 Nov 2023 03:23:21 +0000
Message-Id: <20231102032330.1036151-1-chengming.zhou@linux.dev>
On 11/2/23 04:23, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Changes in v5:
> - Drop "RFC".
> - Retest to update performance numbers (little difference with RFC v1).
> - Add Reviewed-by and Tested-by tags. Many thanks!
> - Change to better function name: __put_partials().
> - Some minor improvements of comments and changelog.
> - RFC v4: https://lore.kernel.org/all/20231031140741.79387-1-chengming.zhou@linux.dev/

Thanks! Pushed to
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.8/partial-freezing

Vlastimil
From: Chengming Zhou <zhouchengming@bytedance.com>

Changes in v5:
- Drop "RFC".
- Retest to update performance numbers (little difference with RFC v1).
- Add Reviewed-by and Tested-by tags. Many thanks!
- Change to a better function name: __put_partials().
- Some minor improvements to comments and changelog.
- RFC v4: https://lore.kernel.org/all/20231031140741.79387-1-chengming.zhou@linux.dev/

Changes in RFC v4:
- Reorder patches to put the two cleanup patches at the front.
- Move the slab_node_partial flag functions to mm/slub.c.
- Fix freeze_slab() by using slab_update_freelist().
- Fix a build error when !CONFIG_SLUB_CPU_PARTIAL.
- Add a patch to rename all *unfreeze_partials* functions.
- Add a patch to update inconsistent documentation in the source.
- Some comment and changelog improvements.
- Add Reviewed-by and Suggested-by tags. Many thanks!
- RFC v3: https://lore.kernel.org/all/20231024093345.3676493-1-chengming.zhou@linux.dev/

Changes in RFC v3:
- Directly use __set_bit() and __clear_bit() for the slab_node_partial flag operations, to avoid exporting non-atomic "workingset" interfaces.
- Change the get_partial() related functions to return a slab instead of returning the freelist or a single object.
- Don't freeze any slab under the node list_lock, to further reduce list_lock hold times, as suggested by Vlastimil Babka.
- Introduce freeze_slab() to do the delayed freezing and return the freelist.
- Reorder patches.
- RFC v2: https://lore.kernel.org/all/20231021144317.3400916-1-chengming.zhou@linux.dev/

Changes in RFC v2:
- Reuse the PG_workingset bit to keep track of whether a slab is on the per-node partial list, as suggested by Matthew Wilcox.
- Fix an OOM problem on kernels without CONFIG_SLUB_CPU_PARTIAL, caused by a leak of partial slabs in get_partial_node().
- Add a patch to simplify acquire_slab().
- Reorder patches a little.
- RFC v1: https://lore.kernel.org/all/20231017154439.3036608-1-chengming.zhou@linux.dev/
1. Problem
==========

Currently we have to freeze a slab when taking it from the node partial
list, and unfreeze it when putting it back on the node partial list,
because we rely on the node list_lock to synchronize changes to the
"frozen" bit. This implementation has some drawbacks:

- Alloc path: two cmpxchg_double operations. When the allocator has used
  up its CPU partial slabs, it has to get some partial slabs from the
  node: it freezes each slab (one cmpxchg_double) with the node
  list_lock held and puts the frozen slabs on its CPU partial list.
  Later, ___slab_alloc() runs another cmpxchg_double try-loop when one
  of those slabs is picked for use.

- Alloc path: amplified contention on the node list_lock. Since changes
  to the "frozen" bit must be synchronized under the node list_lock,
  contention on a slab (struct page) can be transferred to the node
  list_lock. On a machine with many CPUs in one node, list_lock
  contention is amplified by every CPU's alloc path. The current code
  works around this by avoiding the cmpxchg_double try-loop: it just
  breaks out and returns when page contention is encountered and the
  first cmpxchg_double fails. But this workaround has its own problem.
  For more context, see 9b1ea29bc0d7 ("Revert "mm, slub: consider rest
  of partial list if acquire_slab() fails"").

- Free path: redundant unfreezing. __slab_free() freezes and caches some
  slabs on its partial list and flushes them to the node partial list
  when the list grows too large, which requires unfreezing those slabs
  again under the node list_lock. We actually don't need to freeze slabs
  on the CPU partial list, in which case we can save the unfreeze
  cmpxchg_double operations in the flush path.

2. Solution
===========

We solve these problems by leaving slabs unfrozen when they are moved
off the node partial list and onto a CPU partial list, so the "frozen"
bit stays 0. These partial slabs are not manipulated concurrently by the
alloc path; the only racer is the free path, which may manipulate a
slab's list membership when the slab becomes empty (!inuse).
So we need another way to synchronize this. We reuse PG_workingset to
keep track of whether a slab is on a node partial list, and only in that
case may we manipulate the slab's list membership.

A slab is frozen, with a delay, when it is picked for active use by a
CPU and becomes full at the same time; in that case we still rely on the
"frozen" bit to avoid manipulating its list. So a slab is frozen only
when it is activated for use and unfrozen only when it is deactivated.

The updated scheme (which this series implements) is:

- node partial slabs: PG_workingset && !frozen
- cpu partial slabs: !PG_workingset && !frozen
- cpu slabs: !PG_workingset && frozen
- full slabs: !PG_workingset && !frozen

The most important change is that the "frozen" bit is no longer set for
cpu partial slabs: __slab_free() grabs the node list_lock and then
checks, via !PG_workingset, that the slab is not on a node partial list.
The "frozen" bit is still kept for cpu slabs for performance, since
__slab_free() doesn't need to grab the node list_lock to check
PG_workingset when the "frozen" bit is set.

3. Testing
==========

We did some simple testing on a server with 128 CPUs (2 nodes) to
compare performance.

- perf bench sched messaging -g 5 -t -l 100000

  baseline  v5
  7.042s    6.934s
  7.022s    6.865s
  7.054s    7.009s

- stress-ng --rawpkt 128 --rawpkt-ops 100000000

  baseline  v5
  2.42s     2.18s
  2.45s     2.16s
  2.44s     2.17s

The results above show about a 10% improvement on the stress-ng rawpkt
test case, though not much improvement on the perf sched bench test
case.

Thanks for any comments and code review!
Chengming Zhou (9):
  slub: Reflow ___slab_alloc()
  slub: Change get_partial() interfaces to return slab
  slub: Keep track of whether slub is on the per-node partial list
  slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
  slub: Introduce freeze_slab()
  slub: Delay freezing of partial slabs
  slub: Optimize deactivate_slab()
  slub: Rename all *unfreeze_partials* functions to *put_partials*
  slub: Update frozen slabs documentations in the source

 mm/slub.c | 384 +++++++++++++++++++++++++----------------------------
 1 file changed, 180 insertions(+), 204 deletions(-)