From patchwork Wed Jan  8 23:31:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13931720
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
 baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
 haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
 surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
 zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
 willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com,
 sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com,
 akpm@linux-foundation.org
Subject: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP
 support
Date: Wed, 8 Jan 2025 16:31:24 -0700
Message-ID: <20250108233128.14484-9-npache@redhat.com>
In-Reply-To: <20250108233128.14484-1-npache@redhat.com>
References: <20250108233128.14484-1-npache@redhat.com>

khugepaged scans PMD ranges for potential collapse to a hugepage. To add
mTHP support, we use this scan to instead record chunks of fully utilized
sections of the PMD.

Create a bitmap to represent a PMD in order MIN_MTHP_ORDER chunks. By
default we will set this to order 3.
The reasoning is that with 4K pages and 512 PTEs per PMD, this results in
a 64-bit bitmap, which fits in a single unsigned long on 64-bit
architectures and so allows some optimizations. For other arches like
ARM64 with 64K pages, we can set a larger order if needed.

khugepaged_scan_bitmap() uses an explicit stack of structs to scan, in
recursive fashion, a bitmap that represents chunks of fully utilized
regions. We can then determine which mTHP size fits best. In the
following patch, we set this bitmap while scanning the PMD.

max_ptes_none is used as a scale to determine how "full" an order must
be before being considered for collapse.

Signed-off-by: Nico Pache
---
 include/linux/khugepaged.h |   4 +-
 mm/khugepaged.c            | 129 +++++++++++++++++++++++++++++++++++--
 2 files changed, 126 insertions(+), 7 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 1f46046080f5..31cff8aeec4a 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -1,7 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
-
+#define MIN_MTHP_ORDER 3
+#define MIN_MTHP_NR (1<mthp_bitmap_stack[++top] = (struct scan_bit_state)
+		{ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
+
+	while (top >= 0) {
+		state = cc->mthp_bitmap_stack[top--];
+		order = state.order;
+		offset = state.offset;
+		num_chunks = 1 << order;
+		// Skip mTHP orders that are not enabled
+		if (!((enabled_orders >> (order + MIN_MTHP_ORDER)) & 1))
+			goto next;
+
+		// copy the relevant section to a new bitmap
+		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
+				  MTHP_BITMAP_SIZE);
+
+		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
+
+		// Check if the region is "almost full" based on the threshold
+		max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
+						/ (HPAGE_PMD_NR - 1);
+		threshold_bits = (max_percent * num_chunks) / 100;
+
+		if (bits_set >= threshold_bits) {
+			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
+					mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
+			if (ret == SCAN_SUCCEED)
+				collapsed += (1 << (order + MIN_MTHP_ORDER));
+			continue;
+		}
+
+next:
+		if (order > 0) {
+			next_order = order - 1;
+			mid_offset = offset + (num_chunks / 2);
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, mid_offset };
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, offset };
+		}
+	}
+	return collapsed;
+}
+
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 				   struct vm_area_struct *vma,
 				   unsigned long address, bool *mmap_locked,
@@ -1430,7 +1528,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc);
+					    unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
 	}
@@ -2767,6 +2865,21 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return -ENOMEM;
 	cc->is_khugepaged = false;
 
+	cc->mthp_bitmap = kmalloc_array(
+		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
+	if (!cc->mthp_bitmap)
+		return -ENOMEM;
+
+	cc->mthp_bitmap_temp = kmalloc_array(
+		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
+	if (!cc->mthp_bitmap_temp)
+		return -ENOMEM;
+
+	cc->mthp_bitmap_stack = kmalloc_array(
+		MTHP_BITMAP_SIZE, sizeof(struct scan_bit_state), GFP_KERNEL);
+	if (!cc->mthp_bitmap_stack)
+		return -ENOMEM;
+
 	mmgrab(mm);
 	lru_add_drain_all();
 
@@ -2831,8 +2944,12 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 out_nolock:
 	mmap_assert_locked(mm);
 	mmdrop(mm);
+	kfree(cc->mthp_bitmap);
+	kfree(cc->mthp_bitmap_temp);
+	kfree(cc->mthp_bitmap_stack);
 	kfree(cc);
 
+
 	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
 			: madvise_collapse_errno(last_fail);
 }
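
The stack-based traversal in khugepaged_scan_bitmap() above can be tried
out in user space. What follows is a minimal sketch under stated
assumptions, not the kernel implementation: every sketch_* name and
SKETCH_* constant is hypothetical, the bitmap is assumed to fit in one
uint64_t (4K pages, 512 PTEs per PMD, order-3 chunks), and
collapse_huge_page() is replaced by recording the chosen region, which
amounts to assuming that every collapse succeeds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: 4K pages, 512 PTEs per PMD, order-3 chunks. */
#define SKETCH_PMD_ORDER	9
#define SKETCH_PMD_NR		(1 << SKETCH_PMD_ORDER)
#define SKETCH_MIN_ORDER	3
#define SKETCH_MIN_NR		(1 << SKETCH_MIN_ORDER)
#define SKETCH_BITMAP_BITS	(1 << (SKETCH_PMD_ORDER - SKETCH_MIN_ORDER))

/* Stack entry: order and offset are in units of order-3 chunks. */
struct sketch_state { int order; int offset; };

/* Result entry: order and offset are in units of base pages. */
struct sketch_region { int order; int offset; };

/* Count the set bits in [offset, offset + nbits) of the bitmap. */
static int sketch_weight(uint64_t bitmap, int offset, int nbits)
{
	int i, count = 0;

	for (i = 0; i < nbits; i++)
		count += (int)((bitmap >> (offset + i)) & 1);
	return count;
}

/* Same integer arithmetic as the patch: scale max_ptes_none to chunks. */
static int sketch_threshold(int max_ptes_none, int num_chunks)
{
	int max_percent = ((SKETCH_PMD_NR - max_ptes_none - 1) * 100)
					/ (SKETCH_PMD_NR - 1);

	return (max_percent * num_chunks) / 100;
}

/*
 * Walk the bitmap from the PMD order downwards. A region that is enabled
 * and full enough is recorded and not split further; otherwise it is
 * split into two halves, with the lower half visited first.
 * Returns the number of regions written to out[].
 */
static int sketch_scan(uint64_t bitmap, int max_ptes_none,
		       unsigned long enabled_orders,
		       struct sketch_region *out, int out_max)
{
	struct sketch_state stack[SKETCH_BITMAP_BITS];
	int top = -1, found = 0;

	stack[++top] = (struct sketch_state)
		{ SKETCH_PMD_ORDER - SKETCH_MIN_ORDER, 0 };

	while (top >= 0) {
		struct sketch_state s = stack[top--];
		int num_chunks = 1 << s.order;
		int page_order = s.order + SKETCH_MIN_ORDER;

		if ((enabled_orders >> page_order) & 1) {
			int bits_set = sketch_weight(bitmap, s.offset, num_chunks);

			if (bits_set >= sketch_threshold(max_ptes_none, num_chunks)) {
				if (found < out_max)
					out[found++] = (struct sketch_region)
						{ page_order, s.offset * SKETCH_MIN_NR };
				continue;
			}
		}

		if (s.order > 0) {
			assert(top + 2 < SKETCH_BITMAP_BITS);
			stack[++top] = (struct sketch_state)
				{ s.order - 1, s.offset + num_chunks / 2 };
			stack[++top] = (struct sketch_state)
				{ s.order - 1, s.offset };
		}
	}
	return found;
}
```

With max_ptes_none = 0 and all orders enabled, a bitmap of 0xFFFFFFFF
(only the lower 32 chunks set) yields a single order-8 region at page
offset 0, and a bitmap with only bit 63 set yields a single order-3
region at page offset 504. With max_ptes_none = 511 the threshold is 0
chunks, so the first enabled order always qualifies.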