From patchwork Fri Feb 21 16:36:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sidhartha Kumar X-Patchwork-Id: 13985954 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF050C021B3 for ; Fri, 21 Feb 2025 16:36:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 37E76280019; Fri, 21 Feb 2025 11:36:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 32E6B280016; Fri, 21 Feb 2025 11:36:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1D0CB280019; Fri, 21 Feb 2025 11:36:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id EED95280016 for ; Fri, 21 Feb 2025 11:36:19 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 9C01B16185B for ; Fri, 21 Feb 2025 16:36:19 +0000 (UTC) X-FDA: 83144504478.02.1825437 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf20.hostedemail.com (Postfix) with ESMTP id 70D421C001D for ; Fri, 21 Feb 2025 16:36:17 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2023-11-20 header.b=IQLwrJzQ; spf=pass (imf20.hostedemail.com: domain of sidhartha.kumar@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=sidhartha.kumar@oracle.com; dmarc=pass (policy=reject) header.from=oracle.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740155777; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=649omYDxNDHzRQG2ilV3tHLUDJcolgsRVCUcJB/s4q8=; b=inbxyqgSfLyDuomiZFf6X9Me57uj5I2C1pX94FcwGLykJ2Bd4A/LBRiC2A9yg3T+6Nc/E1 mShtjd7s5Zin1JeBWNixx4K24i2vaLz9xBfTQGNuPk8ITFPXo+yrM2UoRJIf+Tjod/dcMA vfMWywbz9ogL3q14eLd5hrZLVpYQvOA= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2023-11-20 header.b=IQLwrJzQ; spf=pass (imf20.hostedemail.com: domain of sidhartha.kumar@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=sidhartha.kumar@oracle.com; dmarc=pass (policy=reject) header.from=oracle.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740155777; a=rsa-sha256; cv=none; b=yL9vw4bGmCZDnERH0+k0goFH/isInc7IKhP0ySptYG5jlG4Fe0wD9/Rrt+59s9JaOMPUG4 nqloVFHQQiM0nVQkeIAnyBpWntZmAD8r5Neb7sx7XbzVxHkKsBzCXZycw9fNCqkv3nPzqo Q57fKeJ5uHHspOJKBeH5lyXMuSTQoyM= Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 51L8fcY9018599; Fri, 21 Feb 2025 16:36:15 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:message-id:mime-version :subject:to; s=corp-2023-11-20; bh=649omYDxNDHzRQG2ilV3tHLUDJcol gsRVCUcJB/s4q8=; b=IQLwrJzQSbzL+WTOsja4vo9vto1YsuCPSnYUTfxeR41ys IeMIMHJGyBlnCmdzdP2Y11XdC4a+mmQXkY4O5ZJEVIARTwGEDwIJBdiZ3ZL1D2IJ UJu9FHpcd0qHgC/VJbuIToXDEQxjcO+thbCDZLwav95UzIAzIUnt222SdAAmnymb tX4W7InBPXQulBl6MJEPQ8Ycoyv2rQl2CsOa0dTnMtoTDlcZsXY1q6q5N8moPT9w vuhtNulg0ZP+RWl2OCB1KkoIxsIoxdWO2LOK4Ayd2AdIACKJ3ePtfkOHaqonuvJc krgyLIhAYxayZ1dExzKTY/nMXcXbdT46DRlwGK3Bw== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44w00npq56-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 21 Feb 2025 16:36:15 +0000 (GMT) Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 51LG6DSI010612; Fri, 21 Feb 2025 16:36:13 GMT Received: from pps.reinject (localhost [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 44w07gxm6v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 21 Feb 2025 16:36:13 +0000 Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 51LGUIXB005786; Fri, 21 Feb 2025 16:36:13 GMT Received: from sidhakum-ubuntu.osdevelopmeniad.oraclevcn.com (sidhakum-ubuntu.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.250.108]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 44w07gxm69-1; Fri, 21 Feb 2025 16:36:12 +0000 From: Sidhartha Kumar To: linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org Cc: linux-mm@kvack.org, akpm@linux-foundation.org, liam.howlett@oracle.com, richard.weiyang@gmail.com, Sidhartha Kumar Subject: [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Date: Fri, 21 Feb 2025 16:36:04 +0000 Message-ID: <20250221163610.578409-1-sidhartha.kumar@oracle.com> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-02-21_05,2025-02-20_02,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 suspectscore=0 spamscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2502100000 definitions=main-2502210118 X-Proofpoint-GUID: x-qf010Q5Zo79Xi2hL3fMVdnP0oSjw9c X-Proofpoint-ORIG-GUID: x-qf010Q5Zo79Xi2hL3fMVdnP0oSjw9c X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 70D421C001D X-Stat-Signature: hwxmft68esaeg3kbukqhkf7parx431nw X-HE-Tag: 1740155777-287495 X-HE-Meta: U2FsdGVkX187sa6XvB7/7ZWI/Czp7y1y0SRVDTEdnyKyjqnerlDa5X7+xjzt2mO6djOTKJ7Vw9J8f6TUQPWDyoOnYGDVD8GqTzHmGHQzPmKKKhXj13JWVRI/8h5x6ycpHjPqqw5ytjIHURll75pukAKgOTfjsSvK/ybPUliGALjr6siJfdysvZn3WabIcqJdItxwm3D5UOV/HsDl9htk6tz0nz0K3u8T8VsnGnxHfv72z36bORMb5SdCU9JdXR/E4MFaBQzJS1xNwsU8pOWdWsAg5DaGkId/Uiibt0u6EPfX0ROok2JcKlzOY7VfWCbPYK/DZktzoKPRXutZm8Bsq7059KrFb4/dyanlBGqS1jNt8CRgC10tsAvUx0DXtxFRJFpNqv5hUig/jlej/6jFdB4cnJoof47ZEsiPFTxxDLTywGCUyYMqi0SkP9uhO76i+O7EpIiAa7CGby3tsyNgHad/GZ62qaZTCy4Ps4/zvv6hg/OL7EtL5TbnomfVGldBk0MDh9DUaCsDe5OSpxZEypK7wxL6nctgVzkYyAqpcSULrfropysgJjAn1GcAbmkcqoQ8iA4K+K/uYSq0+X++i5Tt/sDjTvpDBVnSNkPO/TS1nmCX0hNy/RgwD3f5Rcr1whMqdNNyjnF/InPnJinuO3klERR+yEwAQHd/9CU+IB9cy0L4WO7meagRd1b2XBwbxYF6lUY66cjSUaFD+/ez/4f543xiQdi3h2euUWcC402/kOPvWD0Rw7l9Wn9g+hdKNkrClCwU2zgsV3wbVYAgEo183aPv3JINFimJo28ZgTySpTbwb2cy2JAYP/VtZMWW6uSPOAn58LWflYUw84+kI8HtSlxwvV94SAwO2vixjWfew350RDNcdG2EJtYXkYfah46DhAcxzhhT8FNMnyiACAlj8WJjFjGp98khm9gMNF2V5d4iNPm6rMcmWhAssR1w5nl4ORlmSrsFhG9D4bM hburWvXs E2yX62Xb8ZMuLuQDxE+3E6VA38gJ5wDRo/svesMTJouQVoJgStdsn/1rcqQBVBAbiHspIQjLXqOIDZUC5wFkouG6xJEsUtgnZykFjK+TGVo1BhaB4eNrVMUzlyAYI08hPLgWT4Un6YxFxH+8I5tWgA1uhBxr5u/Sx3dweMasGSDrMHoBXTuJawyb7G4wNb6Svh+ZZFoWUyTYwJP2yMkysl+A0PHrxckryml2+otKlNDdvSfN3RhPUT6VzElc+sibrOqSa9EwV2neWKB4Mh0/PAB11RNCAOBFauofEOi8aq4sGdqrHSH5SGVwH0idKjUpngIAMEvnnTiS/gf4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000128, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: v1[1] -> v2: - fix comment for vacant_height which refers to depth per Wei - add a patch to reorder switch case statements in mas_prealloc_calc and mas_wr_store_entry - use sufficient height in spanning stores - modify patch 2 to use a counter to track ascending the tree rather than overloading mas->depth to have this function. ================ overview ======================== Currently, the maple tree preallocates the worst case number of nodes for given store type by taking into account the whole height of the tree. This comes from a worst case scenario of every node in the tree being full and having to propagate node allocation upwards until we reach the root of the tree. This can be optimized if there are vacancies in nodes that are at a lower depth than the root node. This series implements tracking the level at which there is a vacant node so we only need to allocate until this level is reached, rather than always using the full height of the tree. The ma_wr_state struct is modified to add a field which keeps track of the vacant height and is updated during walks of the tree. This value is then read in mas_prealloc_calc() when we decide how many nodes to allocate. For rebalancing and spanning stores, we also need to track the lowest height at which a node has 1 more entry than the minimum sufficient number of entries. This is because rebalancing can cause a parent node to become insufficient which results in further node allocations. In this case, we need to use the sufficient height as the worst case rather than the vacant height. patch 1-2: preparatory patches patch 3: implement vacant height tracking + update the tests patch 4: support vacant height tracking for rebalacning writes patch 5: implement sufficient height tracking patch 6: reorder switch case statements ================ results ========================= Bpftrace was used to profile the allocation path for requesting new maple nodes while running stress-ng mmap 120s. The histograms below represent requests to kmem_cache_alloc_bulk() and show the count argument. This represnts how many maple nodes the caller is requesting in kmem_cache_alloc_bulk() command: stress-ng --mmap 4 --timeout 120 mm-unstable @bulk_alloc_req: [3, 4) 4 | | [4, 5) 54170 |@ | [5, 6) 0 | | [6, 7) 893057 |@@@@@@@@@@@@@@@@@@@@ | [7, 8) 4 | | [8, 9) 2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [9, 10) 55811 |@ | [10, 11) 77834 |@ | [11, 12) 0 | | [12, 13) 1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [13, 14) 0 | | [14, 15) 0 | | [15, 16) 367197 |@@@@@@@@ | @maple_node_total: 46,630,160 @total_vmas: 46184591 mm-unstable + this series @bulk_alloc_req: [2, 3) 198 | | [3, 4) 4 | | [4, 5) 43 | | [5, 6) 0 | | [6, 7) 1069503 |@@@@@@@@@@@@@@@@@@@@@ | [7, 8) 4 | | [8, 9) 2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [9, 10) 472191 |@@@@@@@@@ | [10, 11) 191904 |@@@ | [11, 12) 0 | | [12, 13) 247316 |@@@@ | [13, 14) 0 | | [14, 15) 0 | | [15, 16) 98769 |@ | @maple_node_total: 37,813,856 @total_vmas: 43493287 This represents a ~19% reduction in the number of bulk maple nodes allocated. For more reproducible results, a historgram of the return value of mas_prealloc_calc() is displayed while running the maple_tree_tests whcih have a deterministic store pattern mas_prealloc_calc() return value mm-unstable 1 : (12068) 3 : (11836) 5 : ***** (271192) 7 : ************************************************** (2329329) 9 : *********** (534186) 10 : (435) 11 : *************** (704306) 13 : ******** (409781) mas_prealloc_calc() return value mm-unstable + this series 1 : (12070) 3 : ************************************************** (3548777) 5 : ******** (633458) 7 : (65081) 9 : (11224) 10 : (341) 11 : (2973) 13 : (68) do_mmap latency was also measured for regressions: command: stress-ng --mmap 4 --timeout 120 mm-unstable: avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034 mm-unstable + this series: avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726 [1]: https://lore.kernel.org/lkml/20241114170524.64391-1-sidhartha.kumar@oracle.com/T/ Sidhartha Kumar (6): maple_tree: convert mas_prealloc_calc() to take in a maple write state maple_tree: use height and depth consistently maple_tree: use vacant nodes to reduce worst case allocations maple_tree: break on convergence in mas_spanning_rebalance() maple_tree: add sufficient height maple_tree: reorder mas->store_type case statements include/linux/maple_tree.h | 4 + lib/maple_tree.c | 193 ++++++++++++++++++------------- tools/testing/radix-tree/maple.c | 130 +++++++++++++++++++-- 3 files changed, 240 insertions(+), 87 deletions(-)