From patchwork Thu Feb 27 20:48:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sidhartha Kumar X-Patchwork-Id: 13995210 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCD59C19F32 for ; Thu, 27 Feb 2025 20:48:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 855EA6B008C; Thu, 27 Feb 2025 15:48:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7E3F66B0092; Thu, 27 Feb 2025 15:48:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 635336B0093; Thu, 27 Feb 2025 15:48:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3BAB56B008C for ; Thu, 27 Feb 2025 15:48:40 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id AE541121CDE for ; Thu, 27 Feb 2025 20:48:39 +0000 (UTC) X-FDA: 83166913158.12.4875E04 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf21.hostedemail.com (Postfix) with ESMTP id 95D251C0007 for ; Thu, 27 Feb 2025 20:48:37 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2023-11-20 header.b="B/MWfCW/"; dmarc=pass (policy=reject) header.from=oracle.com; spf=pass (imf21.hostedemail.com: domain of sidhartha.kumar@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=sidhartha.kumar@oracle.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740689317; a=rsa-sha256; cv=none; b=yf6g3qXqg0T+11xdnNRpjk0f4c6BQjDYrfaIOZ1iIGq/aBulaz8BsYPAPehOekdWEozg/q AYlQ9u/TifI6n4NiNMbMTRBaSfL0Dfu650HTu8faCzUiKb7g+UwGb3rS4+vjYNrZQWigQ6 GixTnCoap3lpYs3WklrMzZNxsKiKEfY= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2023-11-20 header.b="B/MWfCW/"; dmarc=pass (policy=reject) header.from=oracle.com; spf=pass (imf21.hostedemail.com: domain of sidhartha.kumar@oracle.com designates 205.220.165.32 as permitted sender) smtp.mailfrom=sidhartha.kumar@oracle.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740689317; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JuEm0THBBAVfyITvD6DAdLmtYPg2v5ViaCoOAvJ6+pg=; b=laWwo9F/yXFF9ExpESXKNijljnPpZZ58a2K8GcBeeqo4P+93+po4iryWrPAokEflp2OL8+ C1gI8dEgGcKNeYpJZrUi2jejtDnU6ESibVqr+816aHiYpJ+5xWm6AWW7bOBf7uvN+ngCC/ ueGhfLS5cCz4/CwcORc4ui/hG+vPXBE= Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 51RJiXZv016373; Thu, 27 Feb 2025 20:48:36 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=JuEm0 THBBAVfyITvD6DAdLmtYPg2v5ViaCoOAvJ6+pg=; b=B/MWfCW/6nVzxODMI4aAh tGJYjO/x3zbgvlkDiywZwNdVfmtMgTS4rVDja7zV/B/2PQ0tuZI9Vh17GAiPz3qn tnt2776lS0aDWlgQWZE3eGq/HcwnX46ybhLlvbvcgCpctNyCPFK7r6Oi7ec9APVm 8gXgV8Um9MwSaxx0OaMa/UlKnIHiPLdcyLSSbJfNcN79i47cJsxWr9IaMWU+Z+nQ 6EXVs7mgEaZalrLDuLEOA1DNXoppK5AOGn/pWmgrhtEx3oEOlz3MUQr97xU3odtz LhNF/cSD1rF5Ll1kLEPSZKPMHGkHqXvPhJElfx7hkcsiVuL+XXHje2aqj6wCLke/ A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 451psecbja-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Feb 2025 20:48:35 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 51RJB2Du012641; Thu, 27 Feb 2025 20:48:35 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44y51dws6r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 27 Feb 2025 20:48:35 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 51RKhrvm030883; Thu, 27 Feb 2025 20:48:34 GMT Received: from sidhakum-ubuntu.osdevelopmeniad.oraclevcn.com (sidhakum-ubuntu.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.250.108]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44y51dwrvc-6; Thu, 27 Feb 2025 20:48:34 +0000 From: Sidhartha Kumar To: linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org Cc: linux-mm@kvack.org, akpm@linux-foundation.org, liam.howlett@oracle.com, richard.weiyang@gmail.com, Sidhartha Kumar Subject: [PATCH v3 5/6] maple_tree: add sufficient height Date: Thu, 27 Feb 2025 20:48:22 +0000 Message-ID: <20250227204823.758784-6-sidhartha.kumar@oracle.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250227204823.758784-1-sidhartha.kumar@oracle.com> References: <20250227204823.758784-1-sidhartha.kumar@oracle.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-02-27_07,2025-02-27_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 spamscore=0 mlxscore=0 adultscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2502100000 definitions=main-2502270154 X-Proofpoint-GUID: -0BP64E3TDaVjZabqOuZvlV2I6UXJsU1 X-Proofpoint-ORIG-GUID: -0BP64E3TDaVjZabqOuZvlV2I6UXJsU1 X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 95D251C0007 X-Stat-Signature: po9bd5h3t3i5njeuk9w1o6t5dert84xp X-Rspam-User: X-HE-Tag: 1740689317-317662 X-HE-Meta: U2FsdGVkX1/RMCRvEs0wiNc/Oz8O7uE6/n4IB0Nbpy+MH2VxW5M5sCSv4npClejjU41b/xG3o/6a7uASAfmllIKMTib2ulum/VMJU63BkN/RoPVFCshGnGJ1i6OhA2wO89u7ZEuZKk2B4h+5bBPiX4WBw2PGLgBkt/HP2tW95q2M6zRac74QgFy3NEDcAM0Iw3h3Wkc0Y48fhsBf/T+/RqXAUvF4pyHEdJ+pn7vuRON/DkpYwy5XJ0rF57wyYmNqK1vyzXa2BfhYLi/HIjupIWMDV3qtm5U0OTbsM7lVWJVN4pn1wIfKtPJKgJlMGK0xSxtuc2knpo1/xZtpE9IdSTCVzPPftkPY4bulTyuQBofvn2h1CRkDBS6cEq0Tkl46fSwsHw2hmyQ2YLgvDX/0Bie3wRq1mGa18YEtzZnaT1Qx4McVHKYdGo/C+V6i6eAy6vXppLovxrwHRTaEl6gJp3iWsD8DkCLDIN2qzCBFnu/dSS3rvNnAsZYG0QVeqCf/G6W4ozXEoVLwBQA/eptwstnTc/uhSIM7Sex+siU/uPiB8lYH72Obx0dZniYjkhh3uY3jOu0ig114xIbhCI4Mxn1AVS5QZ/GzkTEI4G6+bzIIDrv3fbGTq2PJi2ER5qdjDw9fbwQwUn0Zbp77jaohh4P0psJEpcZgHsPqrNIG7EqC2RLKUl/Xy4qGiLaA8zuI3enedmCkUm8mQdjGnLMn3zkgG7s4gv8IXrHCYhqt7Ux35M0djFH23Nk5W13UN7Rdvp+/u4gaAZTHCu+TNgY6QXAVzOQpK72MigdmcuIfJyOgux9lm3cHPLwo8XXNDGDTsyZgioHQkftvbQ4mJcOYwu6d799TpATwQM2wzJUiVrzGMp5n5CmtbS9yFWqz9ImCim9ILMaVUwKBbVHZQnLoVZUVQ4owCfL5V9jBOt+FRxFW9cyXZtznNEpIcE5acwzBc+SYoBnxKy8h4y+5COF onW5hvPd vrkdHAz6Cr8oiY26jVqsJw4gRJkAtgX6f9abtsa/KOHRvCTxX5DYt5bL4b6A3cxba/WIcUPtutqWNvmRTky+dFmfPm2ZcSpn1KgGgLpsPMi15UvHvppX1bbH+jQQydrCvBwlISYLc0WIeI7GUiqEnrhbcgJsp29yXoTEWunAXGRQl9aYSSj8vWrCHlpaiGJ0KK64mQjBbX/XJRsWuf4Hj4megxrcg9iaHhO1rm4AL2SisDFvF5LswudE7LKDK8mNBTaYUaDbBrM/SY306hlAa6f16uksez3hp5G6a3r2XfNLY1protoQ1x3CzifFTvRgYbi4plJc/d7W8yzirJrD+KaohGazNfJxfbbTb X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In order to support rebalancing and spanning stores using less than the worst case number of nodes, we need to track more than just the vacant height. Using only vacant height to reduce the worst case maple node allocation count can lead to a shortcoming of nodes in the following scenarios. For rebalancing writes, when a leaf node becomes insufficient, it may be combined with a sibling into a single node. This means that the parent node which has entries for this children will lose one entry. If this parent node was just meeting the minimum entries, losing one entry will now cause this parent node to be insufficient. This leads to a cascading operation of rebalancing at different levels and can lead to more node allocations than simply using vacant height can return. For spanning writes, a similar situation occurs. At the location at which a spanning write is detected, the number of ancestor nodes may similarly need to rebalanced into a smaller number of nodes and the same cascading situation could occur. To use less than the full height of the tree for the number of allocations, we also need to track the height at which a non-leaf node cannot become insufficient. This means even if a rebalance occurs to a child of this node, it currently has enough entries that it can lose one without any further action. This field is stored in the maple write state as sufficient height. In mas_prealloc_calc() when figuring out how many nodes to allocate, we check if the the vacant node is lower in the tree than a sufficient node (has a larger value). If it is, we cannot use the vacant height and must use the difference in the height and sufficient height as the basis for the number of nodes needed. Signed-off-by: Sidhartha Kumar --- include/linux/maple_tree.h | 4 +++- lib/maple_tree.c | 17 +++++++++++++++-- tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 46 insertions(+), 3 deletions(-) diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 7d777aa2d9ed..37dc9525dff6 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -464,6 +464,7 @@ struct ma_wr_state { void *entry; /* The entry to write */ void *content; /* The existing entry that is being overwritten */ unsigned char vacant_height; /* Depth of lowest node with free space */ + unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */ }; #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock)) @@ -499,7 +500,8 @@ struct ma_wr_state { .mas = ma_state, \ .content = NULL, \ .entry = wr_entry, \ - .vacant_height = 0 \ + .vacant_height = 0, \ + .sufficient_height = 0 \ } #define MA_TOPIARY(name, tree) \ diff --git a/lib/maple_tree.c b/lib/maple_tree.c index c859c7253d69..d3aa5241166b 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -3560,6 +3560,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas) if (mas->end < mt_slots[wr_mas->type] - 1) wr_mas->vacant_height = mas->depth + 1; + if (ma_is_root(mas_mn(mas))) { + /* root needs more than 2 entries to be sufficient + 1 */ + if (mas->end > 2) + wr_mas->sufficient_height = 1; + } else if (mas->end > mt_min_slots[wr_mas->type] + 1) + wr_mas->sufficient_height = mas->depth + 1; + mas_wr_walk_traverse(wr_mas); } @@ -4195,13 +4202,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry) ret = 0; break; case wr_spanning_store: - WARN_ON_ONCE(ret != height * 3 + 1); + if (wr_mas->sufficient_height < wr_mas->vacant_height) + ret = (height - wr_mas->sufficient_height) * 3 + 1; + else + ret = delta * 3 + 1; break; case wr_split_store: ret = delta * 2 + 1; break; case wr_rebalance: - ret = height * 2 + 1; + if (wr_mas->sufficient_height < wr_mas->vacant_height) + ret = (height - wr_mas->sufficient_height) * 2 + 1; + else + ret = delta * 2 + 1; break; case wr_node_store: ret = mt_in_rcu(mas->tree) ? 1 : 0; diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c index 5950a7c9b27f..dc8e40b94c3f 100644 --- a/tools/testing/radix-tree/maple.c +++ b/tools/testing/radix-tree/maple.c @@ -36311,6 +36311,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt) extern void test_kmem_cache_bulk(void); +/* + * Test to check the path of a spanning rebalance which results in + * a collapse where the rebalancing of the child node leads to + * insufficieny in the parent node. + */ +static void check_collapsing_rebalance(struct maple_tree *mt) +{ + int i = 0; + MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX); + + /* create a height 4 tree */ + while (mt_height(mt) < 4) { + mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL); + i += 9; + } + + /* delete all entries one at a time, starting from the right */ + do { + mas_erase(&mas); + } while (mas_prev(&mas, 0) != NULL); + + mtree_unlock(mt); +} + /* callback function used for check_nomem_writer_race() */ static void writer2(void *maple_tree) { @@ -36477,6 +36501,10 @@ void farmer_tests(void) check_spanning_write(&tree); mtree_destroy(&tree); + mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE); + check_collapsing_rebalance(&tree); + mtree_destroy(&tree); + mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE); check_null_expand(&tree); mtree_destroy(&tree);