From patchwork Wed Apr 9 05:59:57 2025
X-Patchwork-Submitter: Jinjiang Tu
X-Patchwork-Id: 14044096
From: Jinjiang Tu <tujinjiang@huawei.com>
Subject: [PATCH v4] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Wed, 9 Apr 2025 13:59:57 +0800
Message-ID: <20250409055957.3774471-1-tujinjiang@huawei.com>

In set_max_huge_pages(), min_count is computed taking surplus huge pages
into account, which in some cases prevents huge pages from being freed
and leaves them accounted as surplus instead.

One way to solve this is to subtract surplus_huge_pages directly, but we
cannot do that blindly: there might be surplus pages that are also free
pages, which can happen when we fail to restore the vmemmap for
HVO-optimized pages, and we would then be subtracting the same page
twice. To work around this, first compute the number of free persistent
pages, and use that along with the surplus pages to compute min_count.
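
To make the arithmetic concrete, here is a small, self-contained
userspace sketch (an illustration only, not part of the patch) that
plugs the counter values from the reproduction steps further down, 5
in-use surplus pages on Node0 and 5 free persistent pages on Node1 with
no reservations, into the old and the new min_count formulas. The
hstate_counts struct and its persistent_huge_pages() helper are made-up
stand-ins that merely mirror the corresponding hstate fields and helper.

#include <stdio.h>

/* Made-up stand-in for the hstate counters used by set_max_huge_pages(). */
struct hstate_counts {
	unsigned long nr_huge_pages;	/* persistent + surplus pages */
	unsigned long free_huge_pages;	/* free pages, persistent or surplus */
	unsigned long surplus_huge_pages;
	unsigned long resv_huge_pages;
};

static unsigned long persistent_huge_pages(const struct hstate_counts *h)
{
	return h->nr_huge_pages - h->surplus_huge_pages;
}

/* Old computation: every free page is subtracted, surplus ones included. */
static unsigned long min_count_old(const struct hstate_counts *h)
{
	return h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
}

/* New computation: only free persistent pages are subtracted. */
static unsigned long min_count_new(const struct hstate_counts *h)
{
	unsigned long persistent_free_count = h->free_huge_pages;

	if (h->free_huge_pages > persistent_huge_pages(h)) {
		if (h->free_huge_pages > h->surplus_huge_pages)
			persistent_free_count -= h->surplus_huge_pages;
		else
			persistent_free_count = 0;
	}

	return h->resv_huge_pages + persistent_huge_pages(h) -
	       persistent_free_count;
}

int main(void)
{
	/*
	 * Counters right before step 5 of the reproduction: 5 in-use
	 * surplus pages on Node0 plus 5 free persistent pages on Node1.
	 */
	struct hstate_counts h = {
		.nr_huge_pages		= 10,
		.free_huge_pages	= 5,
		.surplus_huge_pages	= 5,
		.resv_huge_pages	= 0,
	};

	printf("old min_count = %lu\n", min_count_old(&h));	/* 5 */
	printf("new min_count = %lu\n", min_count_new(&h));	/* 0 */
	return 0;
}

With these values the old formula yields min_count = 5, so the five free
persistent pages on Node1 cannot leave the pool and end up re-accounted
as surplus, while the new formula yields min_count = 0 and lets them be
freed.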
Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the
   5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

The result:
           Node0    Node1
Total          5        5
Free           0        5
Surp           5        5

The result with this patch:
           Node0    Node1
Total          5        0
Free           0        0
Surp           5        0

Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Acked-by: Oscar Salvador
Signed-off-by: Jinjiang Tu
---
Changelog since v3:
 * update changelog, suggested by Oscar Salvador
 * collect ack from Oscar Salvador

 mm/hugetlb.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1..e4aed3557339 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3825,6 +3825,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
+	unsigned long persistent_free_count;
 	unsigned long min_count;
 	unsigned long allocated;
 	struct folio *folio;
@@ -3959,8 +3960,24 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * though, we'll note that we're not allowed to exceed surplus
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
+	 *
+	 * min_count is the expected number of persistent pages, we
+	 * shouldn't calculate min_count by using
+	 * resv_huge_pages + persistent_huge_pages() - free_huge_pages,
+	 * because there may exist free surplus huge pages, and this will
+	 * lead to subtracting twice. Free surplus huge pages come from HVO
+	 * failing to restore vmemmap, see comments in the callers of
+	 * hugetlb_vmemmap_restore_folio(). Thus, we should calculate
+	 * persistent free count first.
 	 */
-	min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
+	persistent_free_count = h->free_huge_pages;
+	if (h->free_huge_pages > persistent_huge_pages(h)) {
+		if (h->free_huge_pages > h->surplus_huge_pages)
+			persistent_free_count -= h->surplus_huge_pages;
+		else
+			persistent_free_count = 0;
+	}
+	min_count = h->resv_huge_pages + persistent_huge_pages(h) - persistent_free_count;
 	min_count = max(count, min_count);
 	try_to_free_low(h, min_count, nodes_allowed);