From patchwork Mon Apr 7 12:47:06 2025
X-Patchwork-Submitter: Jinjiang Tu
X-Patchwork-Id: 14040461
From: Jinjiang Tu <tujinjiang@huawei.com>
Subject: [PATCH v3] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Mon, 7 Apr 2025 20:47:06 +0800
Message-ID: <20250407124706.2688092-1-tujinjiang@huawei.com>
X-Mailer: git-send-email 2.43.0

In set_max_huge_pages(), min_count is supposed to mean the acquired
persistent huge pages, but it also counts surplus huge pages. This leads
to a failure to free the free huge pages of a node.

Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the
   5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

The result:
           Node0    Node1
  Total        5        5
  Free         0        5
  Surp         5        5

We can't simply subtract surplus_huge_pages from min_count, since free
hugetlb folios may themselves be surplus due to HVO. In
__update_and_free_hugetlb_folio(), hugetlb_vmemmap_restore_folio() may
fail; in that case the folio is added back to the pool and treated as
surplus. If we directly subtracted surplus_huge_pages from min_count,
such free folios would be subtracted twice.

To fix it, check whether count is less than the number of free huge
pages that could be destroyed (i.e., available_huge_pages(h)), and
remove hugetlb folios if so. Since there may exist free surplus hugetlb
folios, we should remove surplus folios first to keep the surplus count
correct.
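To make the arithmetic concrete, these are the global counters as they
stand at step 5 above (assuming the Node0 pages are fully faulted in, so
resv_huge_pages is 0; persistent_huge_pages() is nr_huge_pages minus
surplus_huge_pages):

  nr_huge_pages = 10, free_huge_pages = 5, surplus_huge_pages = 5,
  resv_huge_pages = 0, persistent_huge_pages() = 10 - 5 = 5

  before: min_count = resv_huge_pages + nr_huge_pages - free_huge_pages
                    = 0 + 10 - 5 = 5
          so the 5 free persistent folios on Node1 cannot be removed and
          end up accounted as surplus, as the table above shows.

  after:  free_huge_pages (5) is not greater than persistent_huge_pages()
          (5), so persistent_free_count = 5, and
          min_count = resv_huge_pages + persistent_huge_pages()
                        - persistent_free_count
                    = 0 + 5 - 5 = 0
          so all 5 free folios on Node1 can be removed.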
The result with this patch:
           Node0    Node1
  Total        5        0
  Free         0        0
  Surp         5        0

Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu
Acked-by: Oscar Salvador
---
Changelog since v2:
 * Fix this issue by calculating free surplus count, and add comments,
   suggested by Oscar Salvador

 mm/hugetlb.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1..e4aed3557339 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3825,6 +3825,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
                               nodemask_t *nodes_allowed)
 {
+        unsigned long persistent_free_count;
         unsigned long min_count;
         unsigned long allocated;
         struct folio *folio;
@@ -3959,8 +3960,24 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
          * though, we'll note that we're not allowed to exceed surplus
          * and won't grow the pool anywhere else. Not until one of the
          * sysctls are changed, or the surplus pages go out of use.
+         *
+         * min_count is the expected number of persistent pages, we
+         * shouldn't calculate min_count by using
+         * resv_huge_pages + persistent_huge_pages() - free_huge_pages,
+         * because there may exist free surplus huge pages, and this will
+         * lead to subtracting twice. Free surplus huge pages come from HVO
+         * failing to restore vmemmap, see comments in the callers of
+         * hugetlb_vmemmap_restore_folio(). Thus, we should calculate
+         * persistent free count first.
          */
-        min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
+        persistent_free_count = h->free_huge_pages;
+        if (h->free_huge_pages > persistent_huge_pages(h)) {
+                if (h->free_huge_pages > h->surplus_huge_pages)
+                        persistent_free_count -= h->surplus_huge_pages;
+                else
+                        persistent_free_count = 0;
+        }
+        min_count = h->resv_huge_pages + persistent_huge_pages(h) - persistent_free_count;
         min_count = max(count, min_count);
         try_to_free_low(h, min_count, nodes_allowed);
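For reference, the reproduction steps in the commit message above roughly
map to the commands below. This is a sketch only: it assumes a two-node
machine, 2 MB hugetlb pages, the standard per-node sysfs layout, and a
hypothetical helper ./hold-hugepages that mmap()s the five Node0 pages
with MAP_HUGETLB (e.g. run under numactl --membind=0) and keeps them
mapped.

  # step 1: create 5 hugetlb folios in Node0
  echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  # step 2: keep all of them in use (hypothetical helper, see above)
  ./hold-hugepages &
  # step 3: try to free them; being in use, they become surplus instead
  echo 0 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  # step 4: create 5 hugetlb folios in Node1
  echo 5 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  # step 5: try to free Node1's folios
  echo 0 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  # inspect the per-node counters
  grep -i huge /sys/devices/system/node/node*/meminfo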