From patchwork Thu Apr 10 06:26:33 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wupeng Ma
X-Patchwork-Id: 14045908
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
Date: Thu, 10 Apr 2025 14:26:33 +0800
Message-ID: <20250410062633.3102457-1-mawupeng1@huawei.com>
X-Mailer: git-send-email 2.43.0
During our testing with hugetlb subpool enabled, we observed that
hstate->resv_huge_pages may underflow into negative values. Root cause
analysis reveals a race condition in the subpool reservation fallback
handling, as follows:

hugetlb_reserve_pages()
    /* Attempt subpool reservation */
    gbl_reserve = hugepage_subpool_get_pages(spool, chg);

    /* Global reservation may fail after subpool allocation */
    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

out_put_pages:
    /* This incorrectly restores the reservation to the subpool */
    hugepage_subpool_put_pages(spool, chg);

When hugetlb_acct_memory() fails after the subpool allocation, the
current implementation over-commits subpool reservations by returning
the full 'chg' value instead of the actually allocated 'gbl_reserve'
amount. This discrepancy propagates to the global reservations during
subsequent releases, eventually causing the resv_huge_pages underflow.

This problem can be triggered easily with the following steps:
1. reserve hugepages for hugetlb allocation
2. mount hugetlbfs with min_size to enable the hugetlb subpool
3. allocate hugepages with two tasks (make sure the second one fails
   due to an insufficient number of hugepages)
4. wait for a few seconds and repeat step 3, which will make
   hstate->resv_huge_pages go below zero

To fix this problem, return the correct number of pages to the subpool
during the fallback after hugepage_subpool_get_pages() is called.
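
[Editorial aside, not part of the submitted patch: the arithmetic behind
the fix's 'spool_resv = chg - gbl_reserve' can be sketched in a small
userspace model. This is only an illustration under assumed numbers;
struct subpool_model, model_get_pages() and model_put_pages() are
invented stand-ins that mirror the rsv_hpages bookkeeping of
hugepage_subpool_get_pages()/hugepage_subpool_put_pages(), not the
kernel's actual code.]

#include <stdio.h>

/* Userspace sketch only -- NOT kernel code. */
struct subpool_model {
	long min_hpages;  /* minimum size of the subpool */
	long rsv_hpages;  /* pages currently reserved for that minimum */
};

/*
 * Take 'delta' pages from the subpool; return how many still need a
 * fresh charge against the global reserve, i.e. the portion that the
 * subpool's pre-reserved minimum cannot cover.
 */
static long model_get_pages(struct subpool_model *sp, long delta)
{
	long gbl;

	if (delta > sp->rsv_hpages) {
		gbl = delta - sp->rsv_hpages;
		sp->rsv_hpages = 0;
	} else {
		gbl = 0;
		sp->rsv_hpages -= delta;
	}
	return gbl;
}

/*
 * Hand 'delta' pages back: the subpool re-absorbs them up to its
 * minimum; the return value is how many must instead be released from
 * the global reserve. Discarding this return value loses reservations.
 */
static long model_put_pages(struct subpool_model *sp, long delta)
{
	long gbl = 0;

	if (sp->rsv_hpages + delta > sp->min_hpages)
		gbl = sp->rsv_hpages + delta - sp->min_hpages;
	sp->rsv_hpages += delta;
	if (sp->rsv_hpages > sp->min_hpages)
		sp->rsv_hpages = sp->min_hpages;
	return gbl;
}

int main(void)
{
	struct subpool_model sp = { .min_hpages = 8, .rsv_hpages = 8 };
	long chg = 10;

	/* 8 pages come from the subpool minimum, 2 need global backing */
	long gbl_reserve = model_get_pages(&sp, chg);

	/* assume hugetlb_acct_memory(h, gbl_reserve) fails here, while a
	 * racing thread has meanwhile replenished the subpool minimum */
	sp.rsv_hpages = sp.min_hpages;

	/*
	 * Old fallback: put_pages(spool, chg) with the result discarded.
	 * Fixed fallback: put back only what the subpool really provided
	 * (spool_resv) and release the re-absorb overflow (gbl_resv)
	 * from the global reserve via hugetlb_acct_memory(h, -gbl_resv).
	 */
	long spool_resv = chg - gbl_reserve;
	long gbl_resv = model_put_pages(&sp, spool_resv);

	printf("chg=%ld gbl_reserve=%ld spool_resv=%ld gbl_resv=%ld\n",
	       chg, gbl_reserve, spool_resv, gbl_resv);
	return 0;
}

[With these assumed numbers the model prints chg=10 gbl_reserve=2
spool_resv=8 gbl_resv=8: only the 8 pages actually drawn from the
subpool are handed back, and the 8 the replenished subpool cannot
re-absorb are released from the global reserve rather than dropped.]
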
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Tested-by: Joshua Hahn
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1e..50bd1fe3ab400 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long retval, gbl_chg;
+	long retval, gbl_chg, gbl_reserve;
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
@@ -3163,8 +3163,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg)
-		hugepage_subpool_put_pages(spool, 1);
+	/*
+	 * put page to subpool iff the quota of subpool's rsv_hpages is used
+	 * during hugepage_subpool_get_pages.
+	 */
+	if (map_chg && !gbl_chg) {
+		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+		hugetlb_acct_memory(h, -gbl_reserve);
+	}
+
+
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
@@ -7216,7 +7224,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 					struct vm_area_struct *vma,
 					vm_flags_t vm_flags)
 {
-	long chg = -1, add = -1;
+	long chg = -1, add = -1, spool_resv, gbl_resv;
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
@@ -7351,8 +7359,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
 	return true;
 
 out_put_pages:
-	/* put back original number of pages, chg */
-	(void)hugepage_subpool_put_pages(spool, chg);
+	spool_resv = chg - gbl_reserve;
+	if (spool_resv) {
+		/* put sub pool's reservation back, chg - gbl_reserve */
+		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+		/*
+		 * subpool's reserved pages can not be put back due to race,
+		 * return to hstate.
+		 */
+		hugetlb_acct_memory(h, -gbl_resv);
+	}
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);