From patchwork Fri Jan 24 21:37:51 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luiz Capitulino
X-Patchwork-Id: 13949945
From: Luiz Capitulino
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
 yuzhao@google.com
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, muchun.song@linux.dev,
 lcapitulino@gmail.com, luizcap@redhat.com
Subject: [RFC 1/4] mm: page_ext: add an iteration API for page extensions
Date: Fri, 24 Jan 2025 16:37:51 -0500
Message-ID: <70bc5513e599d3386533fcc25dfe33685d2ca1bb.1737754625.git.luizcap@redhat.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

The page extension implementation assumes that all page extensions of a
given page order are stored in the same memory section. The function
page_ext_next() relies on this assumption by adding an offset to the
current object to return the next adjacent page extension. This behavior
works as expected for flatmem but fails for sparsemem when using 1G pages.

Commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP") exposes this
issue, making a crash possible when the page_owner or page_table_check
page extensions are in use. The problem is that for 1G pages, the page
extensions may span memory section boundaries and be stored in different
memory sections.

This issue was not visible before commit e98337d11bbd ("mm/contig_alloc:
support __GFP_COMP") because alloc_contig_pages() never passed more than
MAX_PAGE_ORDER to post_alloc_hook(). The mentioned commit changed this
behavior, allowing the full 1G page order to be passed.

Reproducer:

 1. Build the kernel with CONFIG_SPARSEMEM=y and the page extensions
    enabled (CONFIG_PAGE_OWNER and/or CONFIG_PAGE_TABLE_CHECK)
 2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command line
 3. Reserve one 1G page at run time; this should crash (backtrace below)

To address this issue, this commit introduces a new API for iterating
through page extensions. In page_ext_iter_next(), we always go through
page_ext_get() to guarantee that we do a new memory section lookup for
the next page extension. In the future, this API could be used as a basis
for implementing a for_each_page_ext() type of macro.

Thanks to David Hildenbrand for helping identify the root cause and
providing suggestions on how to fix it (final implementation and bugs are
all mine though).

Here's the backtrace. Without KASAN, you can get random crashes:

[   76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[   76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[   76.066714]
[   76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[   76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[   76.088972] Call trace:
[   76.091411]  show_stack+0x20/0x38 (C)
[   76.095073]  dump_stack_lvl+0x80/0xf8
[   76.098733]  print_address_description.constprop.0+0x88/0x398
[   76.104476]  print_report+0xa8/0x278
[   76.108041]  kasan_report+0xa8/0xf8
[   76.111520]  __asan_report_store4_noabort+0x20/0x30
[   76.116391]  __update_page_owner_handle+0x238/0x298
[   76.121259]  __set_page_owner+0xdc/0x140
[   76.125173]  post_alloc_hook+0x190/0x1d8
[   76.129090]  alloc_contig_range_noprof+0x54c/0x890
[   76.133874]  alloc_contig_pages_noprof+0x35c/0x4a8
[   76.138656]  alloc_gigantic_folio.isra.0+0x2c0/0x368
[   76.143616]  only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[   76.149353]  alloc_pool_huge_folio+0x11c/0x1f8
[   76.153787]  set_max_huge_pages+0x364/0xca8
[   76.157961]  __nr_hugepages_store_common+0xb0/0x1a0
[   76.162829]  nr_hugepages_store+0x108/0x118
[   76.167003]  kobj_attr_store+0x3c/0x70
[   76.170745]  sysfs_kf_write+0xfc/0x188
[   76.174492]  kernfs_fop_write_iter+0x274/0x3e0
[   76.178927]  vfs_write+0x64c/0x8e0
[   76.182323]  ksys_write+0xf8/0x1f0
[   76.185716]  __arm64_sys_write+0x74/0xb0
[   76.189630]  invoke_syscall.constprop.0+0xd8/0x1e0
[   76.194412]  do_el0_svc+0x164/0x1e0
[   76.197891]  el0_svc+0x40/0xe0
[   76.200939]  el0t_64_sync_handler+0x144/0x168
[   76.205287]  el0t_64_sync+0x1ac/0x1b0

Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Luiz Capitulino
---
 include/linux/page_ext.h | 10 ++++++++
 mm/page_ext.c            | 55 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e4b48a0dda244..df904544d3fac 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -93,6 +93,16 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
 	return next;
 }
 
+struct page_ext_iter {
+	unsigned long pfn;
+	struct page_ext *page_ext;
+};
+
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
+struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
+void page_ext_iter_end(struct page_ext_iter *iter);
+
 #else /* !CONFIG_PAGE_EXTENSION */
 struct page_ext;
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c1..0b6eb5524cb2c 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -549,3 +549,58 @@ void page_ext_put(struct page_ext *page_ext)
 
 	rcu_read_unlock();
 }
+
+/**
+ * page_ext_iter_begin() - Prepare for iterating through page extensions.
+ * @iter: page extension iterator.
+ * @page: The page we're interested in.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
+{
+	iter->pfn = page_to_pfn(page);
+	iter->page_ext = page_ext_get(page);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_get() - Get current page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no page_ext exists for this iterator.
+ */
+struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
+{
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
+{
+	if (!iter->page_ext)
+		return NULL;
+
+	page_ext_put(iter->page_ext);
+
+	iter->pfn++;
+	iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_end() - End iteration through page extensions.
+ * @iter: page extension iterator.
+ */
+void page_ext_iter_end(struct page_ext_iter *iter)
+{
+	page_ext_put(iter->page_ext);
+	iter->page_ext = NULL;
+}