From: Peng Zhang
To: Liam.Howlett@oracle.com, corbet@lwn.net, akpm@linux-foundation.org,
    willy@infradead.org, brauner@kernel.org, surenb@google.com,
    michael.christie@oracle.com, mjguzik@gmail.com,
    mathieu.desnoyers@efficios.com, npiggin@gmail.com,
    peterz@infradead.org, oliver.sang@intel.com, mst@redhat.com
Cc: zhangpeng.00@bytedance.com, maple-tree@lists.infradead.org,
    linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v5 00/10] Introduce __mt_dup() to improve the performance of fork()
Date: Mon, 16 Oct 2023 11:22:16 +0800
Message-Id: <20231016032226.59199-1-zhangpeng.00@bytedance.com>

Hi all,

This series introduces __mt_dup() to improve the performance of fork().
During the duplication process of mmap, all VMAs are traversed and
inserted one by one into the new maple tree, causing the maple tree to
be rebalanced multiple times. Balancing the maple tree is a costly
operation. To duplicate VMAs more efficiently, mtree_dup() and
__mt_dup() are introduced for the maple tree.
They can efficiently duplicate a maple tree.

Here are some algorithmic details about {mtree,__mt}_dup(). We perform a
DFS pre-order traversal of all nodes in the source maple tree. During
this process, we fully copy the nodes from the source tree to the new
tree. This involves memory allocation: when a newly copied node is a
non-leaf node, all of its child nodes are allocated at once.

Some previous discussions can be found in [1]. For a more detailed
analysis of the algorithm, please refer to the logs for patch [3/10] and
patch [10/10].

There is a "spawn" test in byte-unixbench [2], which can be used to test
the performance of fork(). I modified it slightly to make it work with
different numbers of VMAs.

Below are the test results. The first row shows the number of VMAs. The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and this patchset, respectively. The
fourth row shows the improvement of this patchset over next-20231006.
The test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results. There are still some
fluctuations in the test results, but at least they are better than the
original performance.

21     121    221    421    821    1621   3221   6421   12821  25621  51221
112100 76261  54227  34035  20195  11112  6017   3161   1606   802    393
114558 83067  65008  45824  28751  16072  8922   4747   2436   1233   599
2.19%  8.92%  19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42%

Thanks to Liam for the review.

Changes since v4:
 - Change how a failure of dup_mmap() is handled: it is now handled in
   exit_mmap().
 - Update check_forking() and bench_forking().
 - Add the corresponding copyright statement.

Peng Zhang (10):
  maple_tree: Add mt_free_one() and mt_attr() helpers
  maple_tree: Introduce {mtree,mas}_lock_nested()
  maple_tree: Introduce interfaces __mt_dup() and mtree_dup()
  radix tree test suite: Align kmem_cache_alloc_bulk() with kernel
    behavior.
  maple_tree: Add test for mtree_dup()
  maple_tree: Update the documentation of maple tree
  maple_tree: Skip other tests when BENCH is enabled
  maple_tree: Update check_forking() and bench_forking()
  maple_tree: Preserve the tree attributes when destroying maple tree
  fork: Use __mt_dup() to duplicate maple tree in dup_mmap()

 Documentation/core-api/maple_tree.rst |   4 +
 include/linux/maple_tree.h            |   7 +
 kernel/fork.c                         |  39 ++-
 lib/maple_tree.c                      | 304 ++++++++++++++++++++-
 lib/test_maple_tree.c                 | 123 +++++----
 mm/memory.c                           |   7 +-
 mm/mmap.c                             |   9 +-
 tools/include/linux/rwsem.h           |   4 +
 tools/include/linux/spinlock.h        |   1 +
 tools/testing/radix-tree/linux.c      |  45 +++-
 tools/testing/radix-tree/maple.c      | 363 ++++++++++++++++++++++++++
 11 files changed, 815 insertions(+), 91 deletions(-)
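
For illustration only, the pre-order copy described in the algorithmic
details of this letter (copy a node, then allocate all of its children
at once, then descend) can be sketched in user space roughly as below.
This is not the code from patch [3/10]: struct toy_node, FANOUT and
every toy_*() name are hypothetical and exist only for this sketch, and
locking, RCU and the real maple node layout are ignored.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 4	/* hypothetical fan-out, chosen only for this sketch */

/* Toy node for illustration; this is NOT the kernel's struct maple_node. */
struct toy_node {
	struct toy_node *child[FANOUT];
	int nr_children;	/* 0 means this node is a leaf */
	int value;
};

/* Free all descendants; each parent's children live in one block. */
static void toy_free_children(struct toy_node *n)
{
	int i;

	for (i = 0; i < n->nr_children; i++)
		toy_free_children(n->child[i]);
	if (n->nr_children)
		free(n->child[0]);	/* child[0] is the start of the block */
}

static void toy_free(struct toy_node *root)
{
	if (!root)
		return;
	toy_free_children(root);
	free(root);
}

/*
 * Pre-order step: copy this node first, then allocate all of its
 * children in a single allocation, then descend into each child.
 * dst must be zero-initialised by the caller.
 */
static int toy_dup_node(const struct toy_node *src, struct toy_node *dst)
{
	struct toy_node *block;
	int i;

	dst->value = src->value;
	if (!src->nr_children)
		return 0;

	block = calloc(src->nr_children, sizeof(*block));
	if (!block)
		return -1;
	for (i = 0; i < src->nr_children; i++)
		dst->child[i] = &block[i];
	dst->nr_children = src->nr_children;

	for (i = 0; i < src->nr_children; i++)
		if (toy_dup_node(src->child[i], dst->child[i]))
			return -1;
	return 0;
}

/* Duplicate a whole tree; returns NULL if any allocation fails. */
static struct toy_node *toy_dup(const struct toy_node *src)
{
	struct toy_node *dst;

	assert(src);
	dst = calloc(1, sizeof(*dst));
	if (dst && toy_dup_node(src, dst)) {
		toy_free(dst);	/* partial copy is still a valid tree */
		dst = NULL;
	}
	return dst;
}
```

toy_dup() returns an independent copy that toy_free() releases. The
sketch recurses for brevity; how the kernel implementation is actually
structured should be checked against patch [3/10].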