From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com,
    catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz,
    mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com,
    will@kernel.org, baohua@kernel.org, jack@suse.cz,
    srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com,
    aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
    peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
    ziy@nvidia.com, jglisse@google.com, surenb@google.com,
    vishal.moola@gmail.com, zokeefe@google.com,
    zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
    willy@infradead.org, kirill.shutemov@linux.intel.com,
    david@redhat.com, aarcange@redhat.com, raquini@redhat.com,
    dev.jain@arm.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
    audra@redhat.com, akpm@linux-foundation.org
Subject: [RFC 00/11] khugepaged: mTHP support
Date: Wed, 8 Jan 2025 16:31:16 -0700
Message-ID: <20250108233128.14484-1-npache@redhat.com>
X-Patchwork-Id: 13931711

The following series provides khugepaged and madvise collapse with the
capability to collapse regions to mTHPs.

To achieve this we generalize the khugepaged functions to no longer depend
on PMD_ORDER. Then during the PMD scan, we keep track of chunks of pages
(defined by MTHP_MIN_ORDER) that are fully utilized.
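
As a rough userspace illustration of the scan-time bookkeeping described
above (this is not the kernel code from the series), the sketch below sets
one bit per chunk only when every PTE in that chunk is populated. The
512-entry PMD, MTHP_MIN_ORDER = 3, and the scan_pmd() helper are all
assumptions made for the example.

```python
# Userspace model only; names and values below are illustrative assumptions.
PTES_PER_PMD = 512          # assumed: 512 PTEs per PMD (e.g. x86-64, 4K pages)
MTHP_MIN_ORDER = 3          # assumed: 8-page chunks, giving a 64-bit bitmap
CHUNK_PAGES = 1 << MTHP_MIN_ORDER


def scan_pmd(present):
    """Return an int bitmap with bit i set if chunk i is fully populated.

    `present` is a sequence of PTES_PER_PMD booleans, one per PTE.
    """
    if len(present) != PTES_PER_PMD:
        raise ValueError("expected one flag per PTE in the PMD range")
    bitmap = 0
    for chunk in range(PTES_PER_PMD // CHUNK_PAGES):
        start = chunk * CHUNK_PAGES
        # A chunk counts only if every PTE in it is populated.
        if all(present[start:start + CHUNK_PAGES]):
            bitmap |= 1 << chunk
    return bitmap


# Example: the first 16 PTEs are populated, plus 7 of the 8 in the third chunk.
ptes = [False] * PTES_PER_PMD
ptes[0:16] = [True] * 16
ptes[16:23] = [True] * 7
print(bin(scan_pmd(ptes)))  # 0b11: chunks 0 and 1 are full, chunk 2 is not
```

With these assumed values the whole PMD range fits in a single 64-bit
word, so later passes over it stay cheap.
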
This info is tracked using a bitmap. After the PMD scan is done, we do
binary recursion on the bitmap to find the optimal mTHP sizes for the PMD
range. The restriction on max_ptes_none is removed during the scan, to make
sure we account for the whole PMD range. max_ptes_none is mapped to a 0-100
range to determine how full an mTHP order needs to be before collapsing it.

Some design choices to note:
- Bitmap structures are allocated dynamically because on some
  architectures (like PowerPC) the value of MTHP_BITMAP_SIZE cannot be
  computed at compile time, leading to warnings.
- The recursion is masked through a stack structure.
- MTHP_MIN_ORDER was added to compress the bitmap and ensure it is 64 bits
  on x86. This provides some optimization on the bitmap operations. If
  other arches/configs that have more than 512 PTEs per PMD want to
  compress their bitmap further, we can change this value per arch.

Patch 1-2:  Some refactoring to combine madvise_collapse and khugepaged
Patch 3:    A minor "fix"/optimization
Patch 4:    Refactor/rename hpage_collapse
Patch 5-7:  Generalize khugepaged functions for arbitrary orders
Patch 8-11: The mTHP patches

This series acts as an alternative to Dev Jain's approach [1]. The two
series differ in a few ways:
- My approach uses a bitmap to store the state of the linear scan_pmd to
  then determine potential mTHP batches. Dev incorporates his directly
  into the scan, and will try each available order.
- Dev is attempting to optimize the locking, while my approach keeps the
  locking changes to a minimum. I believe his changes are not safe for
  uffd.
- Dev's changes only work for khugepaged, not madvise_collapse (although I
  think that was by choice and it could easily support madvise).
- Dev scales all khugepaged sysfs tunables by order, while I'm removing the
  restriction of max_ptes_none and converting it to a scale to determine a
  (m)THP threshold.
- Dev turns on khugepaged if any order is available, while mine still only
  runs if PMDs are enabled. I like Dev's approach and will most likely do
  the same in my PATCH posting.
- mTHPs need their ref count updated to 1<