From patchwork Tue Jan 7 20:39:56 2025
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929591
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi,
 Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com,
 Oscar Salvador, linux-stable
Subject: [PATCH v2 1/7] mm/hugetlb: Fix avoid_reserve to allow taking folio from subpool
Date: Tue, 7 Jan 2025 15:39:56 -0500
Message-ID: <20250107204002.2683356-2-peterx@redhat.com>
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>

Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was
introduced for a special case of CoW on hugetlb private mappings, and only
if the owner VMA is trying to allocate yet another hugetlb folio that is
not reserved within the private vma reserved map.

Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas
hole punched by fallocate"), alloc_huge_page() was changed to not consume
any global reservation as long as avoid_reserve=true. This operation
doesn't look correct: even though it forces the allocation to not use the
global reservation at all, it will still try to take one reservation from
the spool (if the subpool exists). Then, since the spool's reserved pages
are taken from the global reservation, it'll also take one reservation
globally.

Logically, this can cause the global reservation count to go wrong.

I wrote a reproducer below that triggers this special path. Every run of
the program will cause the global reservation count to increment by one,
until it hits the number of free pages:

  #define _GNU_SOURCE             /* See feature_test_macros(7) */
  #include <stdio.h>      /* printf(), perror() */
  #include <stdlib.h>     /* exit() */
  #include <fcntl.h>      /* open(), fallocate() */
  #include <unistd.h>     /* fork(), close(), unlink() */
  #include <sys/types.h>  /* pid_t */
  #include <sys/mman.h>   /* mmap(), munmap() */

  #define MSIZE (2UL << 20)

  int main(int argc, char *argv[])
  {
      const char *path;
      int *buf;
      int fd, ret;
      pid_t child;

      if (argc < 2) {
          printf("usage: %s <path>\n", argv[0]);
          return -1;
      }

      path = argv[1];

      fd = open(path, O_RDWR | O_CREAT, 0666);
      if (fd < 0) {
          perror("open failed");
          return -1;
      }

      ret = fallocate(fd, 0, 0, MSIZE);
      if (ret != 0) {
          perror("fallocate");
          return -1;
      }

      buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
                 MAP_PRIVATE, fd, 0);

      if (buf == MAP_FAILED) {
          perror("mmap() failed");
          return -1;
      }

      /* Allocate a page */
      *buf = 1;

      child = fork();
      if (child == 0) {
          /* child doesn't need to do anything */
          exit(0);
      }

      /* Trigger CoW from owner */
      *buf = 2;

      munmap(buf, MSIZE);
      close(fd);
      unlink(path);

      return 0;
  }

It only reproduces with a sub-mount that has reserved pages on the spool,
like:

  # sysctl vm.nr_hugepages=128
  # mkdir ./hugetlb-pool
  # mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool

Then run the reproducer
on the mountpoint:

  # ./reproducer ./hugetlb-pool/test

Fix it by taking the reservation from the spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not the spool's.

I copied stable; however, I have no intention of backporting if it's not a
clean cherry-pick, because a private hugetlb mapping with fork() on top is
too rare to hit.

Cc: linux-stable
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Reviewed-by: Ackerley Tng
Tested-by: Ackerley Tng
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 354eec6f7e84..2bf971f77553 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1394,8 +1394,7 @@ static unsigned long available_huge_pages(struct hstate *h)
 
 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, int avoid_reserve,
-				long chg)
+				unsigned long address, long chg)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1411,10 +1410,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
 		goto err;
 
-	/* If reserves cannot be used, ensure enough pages are in the pool */
-	if (avoid_reserve && !available_huge_pages(h))
-		goto err;
-
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
 
@@ -1430,7 +1425,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+	if (folio && vma_has_reserves(vma, chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3047,17 +3042,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
-
-		/*
-		 * Even though there was no reservation in the region/reserve
-		 * map, there could be reservations associated with the
-		 * subpool that can be used. This would be indicated if the
-		 * return value of hugepage_subpool_get_pages() is zero.
-		 * However, if avoid_reserve is specified we still avoid even
-		 * the subpool reservations.
-		 */
-		if (avoid_reserve)
-			gbl_chg = 1;
 	}
 
 	/* If this allocation is not consuming a reservation, charge it now.
@@ -3080,7 +3064,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * from the global free pool (global change). gbl_chg == 0 indicates
 	 * a reservation exists for the allocation.
 	 */
-	folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+	folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);

From patchwork Tue Jan 7 20:39:57 2025
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929592
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi,
 Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com,
 Oscar Salvador
Subject: [PATCH v2 2/7] mm/hugetlb: Stop using avoid_reserve flag in fork()
Date: Tue, 7 Jan 2025 15:39:57 -0500
Message-ID: <20250107204002.2683356-3-peterx@redhat.com>
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>

When fork() stumbles on top of a dma-pinned hugetlb private page, CoW must
happen during fork() to guarantee dma coherency.

In this specific path, hugetlb pages need to be allocated for the child
process.
Stop using the avoid_reserve=1 flag here: it's not required, as dst_vma
(which is destined to be a MAP_PRIVATE hugetlb vma) will have no private
vma resv map, and that makes sure it won't be able to use a vma
reservation later.

No functional change is intended. That said, this is still worth doing, so
as to reduce the usage of avoid_reserve to its single remaining user, which
is also the reason this flag was introduced initially in commit
04f2cbe35699 ("hugetlb: guarantee that COW faults for a process that called
mmap(MAP_PRIVATE) on hugetlbfs will succeed"). I don't see why anyone else
should set it at all.

A later patch will clean up resv accounting based on this.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2bf971f77553..7be8c35d2a83 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5369,7 +5369,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);

From patchwork Tue Jan 7 20:39:58 2025
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929593
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi,
 Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com,
 Oscar Salvador
Subject: [PATCH v2 3/7] mm/hugetlb: Rename avoid_reserve to cow_from_owner
Date: Tue, 7 Jan 2025 15:39:58 -0500
Message-ID: <20250107204002.2683356-4-peterx@redhat.com>
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>

The old name "avoid_reserve" is too generic and can be misused by new call
sites that want to allocate a hugetlb folio.

It's confusing on two points: (1) whether one can opt in to avoid the
global reservation, and (2) whether it should take more than one count.
In reality, this flag is only used in an extremely hacky way, in the
hugetlb CoW path only, and is always passed as 1, meaning "skip global
reservation".

Rename the flag to avoid future abuse, and make it a boolean to reflect
that it is not a counter. To make it even harder to abuse, add a comment
above the function to explain it.

Signed-off-by: Peter Xu
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 33 ++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 62fb0cbc93ab..0fc179a59830 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -814,7 +814,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 10faf42ca96a..49ec2362ce92 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -683,7 +683,7 @@ struct huge_bootmem_page {
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
@@ -1068,7 +1068,7 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
-					   int avoid_reserve)
+					   bool cow_from_owner)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7be8c35d2a83..cdbc8914a9f7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3008,8 +3008,15 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
+/*
+ * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
+ * faults of hugetlb private mappings on top of a non-page-cache folio (in
+ * which case even if there's a private vma resv map it won't cover such
+ * allocation). New call sites should (probably) never set it to true!!
+ * When it's set, the allocation will bypass all vma level reservations.
+ */
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				    unsigned long addr, int avoid_reserve)
+				    unsigned long addr, bool cow_from_owner)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -3038,7 +3045,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * Allocations for MAP_NORESERVE mappings also need to be
 	 * checked against any subpool limit.
 	 */
-	if (map_chg || avoid_reserve) {
+	if (map_chg || cow_from_owner) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
@@ -3046,7 +3053,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	/* If this allocation is not consuming a reservation, charge it now.
 	 */
-	deferred_reserve = map_chg || avoid_reserve;
+	deferred_reserve = map_chg || cow_from_owner;
 	if (deferred_reserve) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
@@ -3071,7 +3078,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+		if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
@@ -3138,7 +3145,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg || avoid_reserve)
+	if (map_chg || cow_from_owner)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
 	vma_end_reservation(h, vma, addr);
@@ -5369,7 +5376,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5823,7 +5830,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
 	struct folio *new_folio;
-	int outside_reserve = 0;
+	bool cow_from_owner = 0;
 	vm_fault_t ret = 0;
 	struct mmu_notifier_range range;
 
@@ -5886,7 +5893,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
 			old_folio != pagecache_folio)
-		outside_reserve = 1;
+		cow_from_owner = true;
 
 	folio_get(old_folio);
 
@@ -5895,7 +5902,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
-	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5905,7 +5912,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		 * reliability, unmap the page from child processes. The child
 		 * may get SIGKILLed if it later faults.
 		 */
-		if (outside_reserve) {
+		if (cow_from_owner) {
 			struct address_space *mapping = vma->vm_file->f_mapping;
 			pgoff_t idx;
 			u32 hash;
@@ -6156,7 +6163,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 				goto out;
 		}
 
-		folio = alloc_hugetlb_folio(vma, vmf->address, 0);
+		folio = alloc_hugetlb_folio(vma, vmf->address, false);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6622,7 +6629,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6664,7 +6671,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-	folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+	folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 	if (IS_ERR(folio)) {
 		folio_put(*foliop);
 		ret = -ENOMEM;

From patchwork Tue Jan 7 20:39:59 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929594
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi, Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com, Oscar Salvador
Subject: [PATCH v2 4/7] mm/hugetlb: Clean up map/global resv accounting when allocate
Date: Tue, 7 Jan 2025 15:39:59 -0500
Message-ID: <20250107204002.2683356-5-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>
MIME-Version: 1.0
content-type: text/plain; charset="US-ASCII"; x-default=true

alloc_hugetlb_folio() isn't an easy function to read, especially its
reservation accounting, both per-VMA and global (mostly, spool only).

The 1st complexity lies in the special private CoW path, i.e. the
cow_from_owner=true case.

The 2nd complexity is the confusing updates of gbl_chg after it is first
set, which make it look like it can change at any time.

Logically, cow_from_owner is only about the vma reservation. We can
decouple the flag and consolidate it into the map charge flag very early,
so that we don't need to keep checking the CoW special flag every time.

This patch does so by making map_chg a tri-state flag. Needing a tri-state
is unfortunate; it is because vma_needs_reservation() currently has an
internal side effect: it must be followed by either an end() or a commit().

We keep the same semantics as before on one thing: "if (map_chg)" means we
need a separate per-vma resv count. With the new enum, this keeps most of
the old code untouched.

After this patch, we take these steps to decide these variables, hopefully
slightly easier to follow:

  - First, decide map_chg. This takes cow_from_owner into account, once
    and for all. It's about whether we can take a resv count from the
    vma, no matter whether it's shared, private, etc.

  - Then, decide gbl_chg. The only difference here, compared to map_chg,
    is spool.

Each flag is now updated only once, instead of flipping back and forth,
which can be very hard to follow.

With cow_from_owner merged into map_chg, we can remove quite a few such
checks all over. A side benefit is that we can get rid of one more
confusing flag, deferred_reserve.

Clean up the comments a bit too. E.g., MAP_NORESERVE may not need to check
against the spool limit, AFAIU, if it's on a shared mapping and the page
cache folio has its inode's resv map available (in which case map_chg
would have been set to zero, hence the code should be correct, not the
comment).

There's one trivial detail touched by this patch that needs attention,
which is this check right after vma_commit_reservation():

  if (map_chg > map_commit)

It changes to:

  if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0))

It should behave the same as before, because previously the only way to
make "map_chg > map_commit" happen was map_chg=1 && map_commit=0. That's
exactly the rewritten line.

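The two-step decision and the rewritten race check described above can be
sketched in standalone userspace C. This is only an illustration, not kernel
code: decide_map_chg(), decide_gbl_chg(), old_race_check(), new_race_check()
and count_race_check_mismatches() are hypothetical names, spool_chg stands in
for the return value of hugepage_subpool_get_pages(), and the sketch assumes
that vma_needs_reservation() and vma_commit_reservation() only return 0 or 1
on success (error returns are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the tri-state enum introduced by this patch. */
typedef enum {
	MAP_CHG_REUSE = 0,
	MAP_CHG_NEEDED = 1,
	MAP_CHG_ENFORCED = 2,
} map_chg_state;

/*
 * Step 1 (hypothetical helper): decide map_chg once. needs_resv models a
 * successful vma_needs_reservation() result, assumed to be 0 or 1.
 */
static map_chg_state decide_map_chg(bool cow_from_owner, long needs_resv)
{
	if (cow_from_owner)
		return MAP_CHG_ENFORCED;
	return needs_resv ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
}

/*
 * Step 2 (hypothetical helper): decide gbl_chg once. spool_chg models what
 * hugepage_subpool_get_pages() would return; it only matters if map_chg is
 * non-zero.
 */
static long decide_gbl_chg(map_chg_state map_chg, long spool_chg)
{
	return map_chg ? spool_chg : 0;
}

/* The check as written before this patch. */
static bool old_race_check(long map_chg, long map_commit)
{
	return map_chg > map_commit;
}

/* The check as written after this patch. */
static bool new_race_check(map_chg_state map_chg, long retval)
{
	return map_chg == MAP_CHG_NEEDED && retval == 0;
}

/*
 * Compare both checks over every (needs, commit) pair in {0,1} x {0,1} on
 * the non-CoW path, which is the only path where commit() is still called.
 */
static int count_race_check_mismatches(void)
{
	int mismatches = 0;

	for (long needs = 0; needs <= 1; needs++) {
		for (long commit = 0; commit <= 1; commit++) {
			map_chg_state st = decide_map_chg(false, needs);

			if (old_race_check(needs, commit) !=
			    new_race_check(st, commit))
				mismatches++;
		}
	}
	return mismatches;
}
```

Under these assumptions the two checks agree on all four combinations, which
matches the argument that map_chg=1 && map_commit=0 is the only case where
the old comparison was true.
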
Meanwhile, commit() or end() needs to be skipped if map_chg is
MAP_CHG_ENFORCED, to keep the old behavior.

Even though a lot looks changed, no functional change is expected.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 110 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 33 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cdbc8914a9f7..b8a849fe1531 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2970,6 +2970,25 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+typedef enum {
+	/*
+	 * For either 0/1: we checked the per-vma resv map, and one resv
+	 * count either can be reused (0), or an extra one is needed (1).
+	 */
+	MAP_CHG_REUSE = 0,
+	MAP_CHG_NEEDED = 1,
+	/*
+	 * The per-vma resv count cannot be used, hence a new resv
+	 * count is enforced.
+	 *
+	 * NOTE: This is mostly identical to MAP_CHG_NEEDED, except
+	 * that currently vma_needs_reservation() has an unwanted side
+	 * effect to either use end() or commit() to complete the
+	 * transaction. Hence it needs to differentiate from NEEDED.
+	 */
+	MAP_CHG_ENFORCED = 2,
+} map_chg_state;
+
 /*
  * replace_free_hugepage_folios - Replace free hugepage folios in a given pfn
  * range with new folios.
@@ -3021,40 +3040,59 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long map_chg, map_commit;
-	long gbl_chg;
+	long retval, gbl_chg;
+	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
-	bool deferred_reserve;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
 	idx = hstate_index(h);
-	/*
-	 * Examine the region/reserve map to determine if the process
-	 * has a reservation for the page to be allocated. A return
-	 * code of zero indicates a reservation exists (no change).
-	 */
-	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
-	if (map_chg < 0)
-		return ERR_PTR(-ENOMEM);
+
+	/* Whether we need a separate per-vma reservation? */
+	if (cow_from_owner) {
+		/*
+		 * Special case! Since it's a CoW on top of a reserved
+		 * page, the private resv map doesn't count. So it cannot
+		 * consume the per-vma resv map even if it's reserved.
+		 */
+		map_chg = MAP_CHG_ENFORCED;
+	} else {
+		/*
+		 * Examine the region/reserve map to determine if the process
+		 * has a reservation for the page to be allocated. A return
+		 * code of zero indicates a reservation exists (no change).
+		 */
+		retval = vma_needs_reservation(h, vma, addr);
+		if (retval < 0)
+			return ERR_PTR(-ENOMEM);
+		map_chg = retval ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
+	}
 
 	/*
+	 * Whether we need a separate global reservation?
+	 *
 	 * Processes that did not create the mapping will have no
 	 * reserves as indicated by the region/reserve map. Check
 	 * that the allocation will not exceed the subpool limit.
-	 * Allocations for MAP_NORESERVE mappings also need to be
-	 * checked against any subpool limit.
+	 * Or if it can get one from the pool reservation directly.
 	 */
-	if (map_chg || cow_from_owner) {
+	if (map_chg) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
+	} else {
+		/*
+		 * If we have the vma reservation ready, no need for extra
+		 * global reservation.
+		 */
+		gbl_chg = 0;
 	}
 
-	/* If this allocation is not consuming a reservation, charge it now.
+	/*
+	 * If this allocation is not consuming a per-vma reservation,
+	 * charge the hugetlb cgroup now.
 	 */
-	deferred_reserve = map_chg || cow_from_owner;
-	if (deferred_reserve) {
+	if (map_chg) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
 		if (ret)
@@ -3078,7 +3116,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
+		if (vma_has_reserves(vma, gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
@@ -3091,7 +3129,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	/* If allocation is not consuming a reservation, also store the
 	 * hugetlb_cgroup pointer on the page.
 	 */
-	if (deferred_reserve) {
+	if (map_chg) {
 		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
 						  h_cg, folio);
 	}
@@ -3100,26 +3138,31 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	hugetlb_set_folio_subpool(folio, spool);
 
-	map_commit = vma_commit_reservation(h, vma, addr);
-	if (unlikely(map_chg > map_commit)) {
+	if (map_chg != MAP_CHG_ENFORCED) {
+		/* commit() is only needed if the map_chg is not enforced */
+		retval = vma_commit_reservation(h, vma, addr);
 		/*
+		 * Check for possible race conditions. When it happens..
 		 * The page was added to the reservation map between
 		 * vma_needs_reservation and vma_commit_reservation.
 		 * This indicates a race with hugetlb_reserve_pages.
 		 * Adjust for the subpool count incremented above AND
-		 * in hugetlb_reserve_pages for the same page. Also,
+		 * in hugetlb_reserve_pages for the same page. Also,
 		 * the reservation count added in hugetlb_reserve_pages
 		 * no longer applies.
 		 */
-		long rsv_adjust;
+		if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) {
+			long rsv_adjust;
 
-		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
-		hugetlb_acct_memory(h, -rsv_adjust);
-		if (deferred_reserve) {
-			spin_lock_irq(&hugetlb_lock);
-			hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
-					pages_per_huge_page(h), folio);
-			spin_unlock_irq(&hugetlb_lock);
+			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
+			hugetlb_acct_memory(h, -rsv_adjust);
+			if (map_chg) {
+				spin_lock_irq(&hugetlb_lock);
+				hugetlb_cgroup_uncharge_folio_rsvd(
+					hstate_index(h), pages_per_huge_page(h),
+					folio);
+				spin_unlock_irq(&hugetlb_lock);
+			}
 		}
 	}
 
@@ -3141,14 +3184,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
 out_uncharge_cgroup_reservation:
-	if (deferred_reserve)
+	if (map_chg)
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg || cow_from_owner)
+	if (map_chg)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
-	vma_end_reservation(h, vma, addr);
+	if (map_chg != MAP_CHG_ENFORCED)
+		vma_end_reservation(h, vma, addr);
 	return ERR_PTR(-ENOSPC);
 }
 

From patchwork Tue Jan 7 20:40:00 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929597
Received: by kanga.kvack.org (Postfix, from userid 63042)
id 1F1746B00AC; Tue, 7 Jan 2025 15:40:21 -0500 (EST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi, Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com, Oscar Salvador
Subject: [PATCH v2 5/7] mm/hugetlb: Simplify vma_has_reserves()
Date: Tue, 7 Jan 2025 15:40:00 -0500
Message-ID: <20250107204002.2683356-6-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>
MIME-Version: 1.0
content-type: text/plain; charset="US-ASCII"; x-default=true

vma_has_reserves() is a helper "trying" to know whether the vma should
consume one reservation when allocating the hugetlb folio.

However, it's not clear why we need such complexity, as this information
is already represented in the "chg" variable.

From alloc_hugetlb_folio() context, "chg" (or in the function's context,
"gbl_chg") is defined as:

  - If gbl_chg=1, the allocation cannot reuse an existing reservation
  - If gbl_chg=0, the allocation should reuse an existing reservation

Firstly, map_chg is defined as follows, to cover all cases of hugetlb
reservation scenarios (mostly via vma_needs_reservation(), but
cow_from_owner is an outlier):

  CONDITION                                         HAS RESERVATION?
  =========                                         ================
  - SHARED: always check against per-inode resv_map
    (ignore VM_NORESERVE)
    - If resv exists                                ==> YES [1]
    - If not                                        ==> NO  [2]
  - PRIVATE: complicated...
    - Request came from a CoW from owner resv map   ==> NO  [3]
      (when cow_from_owner==true)
    - If does not own a resv_map at all..           ==> NO  [4]
      (examples: VM_NORESERVE, private fork())
    - If owns a resv_map, but resv doesn't exist    ==> NO  [5]
    - If owns a resv_map, and resv exists           ==> YES [6]

Further on, gbl_chg also takes the subpool (spool) setup into account, so
it is a decision based on the full context.

If we look at vma_has_reserves(), it mostly repeats checks that have
already been handled by the map_chg accounting (I marked each return value
with the matching case above):

static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
{
        if (vma->vm_flags & VM_NORESERVE) {
                if (vma->vm_flags & VM_MAYSHARE && chg == 0)
                        return true;  ==> [1]
                else
                        return false; ==> [2] or [4]
        }

        if (vma->vm_flags & VM_MAYSHARE) {
                if (chg)
                        return false; ==> [2]
                else
                        return true;  ==> [1]
        }

        if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
                if (chg)
                        return false; ==> [5]
                else
                        return true;  ==> [6]
        }

        return false; ==> [4]
}

It didn't check [3], but case [3] is now already covered by the "chg" /
"gbl_chg" / "map_chg" calculations.

In short, vma_has_reserves() doesn't provide anything more than returning
"!chg", so simplify it accordingly.

There are a lot of comments describing truncation races; IIUC there should
be no race as long as map_chg is computed properly.

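As a rough illustration (not part of the patch), the equivalence argued
above can be checked in a small userspace model. Everything here is a
hypothetical stand-in: the toy_ types, the flag values, and toy_map_chg(),
which simply transcribes the six-case table. The model ignores the subpool
adjustment that turns map_chg into gbl_chg, and it assumes that a private
VM_NORESERVE mapping never owns a resv_map, as case [4] states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are arbitrary. */
#define VM_MAYSHARE  0x1u
#define VM_NORESERVE 0x2u

/* Toy VMA holding only the inputs the old helper looked at. */
struct toy_vma {
	unsigned int vm_flags;
	bool resv_owner;	/* models is_vma_resv_set(vma, HPAGE_RESV_OWNER) */
};

/* Transcription of the table: 0 = reuse a reservation, 1 = none to reuse. */
static long toy_map_chg(const struct toy_vma *vma, bool resv_exists,
			bool cow_from_owner)
{
	if (vma->vm_flags & VM_MAYSHARE)
		return resv_exists ? 0 : 1;	/* [1] / [2] */
	if (cow_from_owner)
		return 1;			/* [3] */
	if (!vma->resv_owner)
		return 1;			/* [4] */
	return resv_exists ? 0 : 1;		/* [6] / [5] */
}

/* The pre-patch decision tree, as quoted in the commit message. */
static bool old_vma_has_reserves(const struct toy_vma *vma, long chg)
{
	if (vma->vm_flags & VM_NORESERVE)
		return (vma->vm_flags & VM_MAYSHARE) && chg == 0;
	if (vma->vm_flags & VM_MAYSHARE)
		return chg == 0;
	if (vma->resv_owner)
		return chg == 0;
	return false;
}

/* The post-patch helper. */
static bool new_vma_has_reserves(long chg)
{
	return chg == 0;
}

/* Enumerate every state of the model and count disagreements. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int flags = 0; flags < 4; flags++)
		for (int owner = 0; owner < 2; owner++)
			for (int resv = 0; resv < 2; resv++)
				for (int cow = 0; cow < 2; cow++) {
					struct toy_vma vma = {
						.vm_flags = flags,
						.resv_owner = owner,
					};
					long chg;

					/* Assumed impossible: private NORESERVE owner. */
					if ((flags & VM_NORESERVE) &&
					    !(flags & VM_MAYSHARE) && owner)
						continue;

					chg = toy_map_chg(&vma, resv, cow);
					if (old_vma_has_reserves(&vma, chg) !=
					    new_vma_has_reserves(chg))
						mismatches++;
				}
	return mismatches;
}
```

In this model count_mismatches() finds no disagreement between the two
helpers. That only restates the argument under the listed assumptions; it
says nothing about the truncation races mentioned above.
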
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 67 ++++++----------------------------------------------
 1 file changed, 7 insertions(+), 60 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b8a849fe1531..5ec079f32f44 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1247,66 +1247,13 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 }
 
 /* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
+static bool vma_has_reserves(long chg)
 {
-	if (vma->vm_flags & VM_NORESERVE) {
-		/*
-		 * This address is already reserved by other process(chg == 0),
-		 * so, we should decrement reserved count. Without decrementing,
-		 * reserve count remains after releasing inode, because this
-		 * allocated page will go into page cache and is regarded as
-		 * coming from reserved pool in releasing step. Currently, we
-		 * don't have any other solution to deal with this situation
-		 * properly, so add work-around here.
-		 */
-		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
-			return true;
-		else
-			return false;
-	}
-
-	/* Shared mappings always use reserves */
-	if (vma->vm_flags & VM_MAYSHARE) {
-		/*
-		 * We know VM_NORESERVE is not set. Therefore, there SHOULD
-		 * be a region map for all pages. The only situation where
-		 * there is no region map is if a hole was punched via
-		 * fallocate. In this case, there really are no reserves to
-		 * use. This situation is indicated if chg != 0.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
 	/*
-	 * Only the process that called mmap() has reserves for
-	 * private mappings.
+	 * Now "chg" has all the conditions considered for whether we
+	 * should use an existing reservation.
 	 */
-	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-		/*
-		 * Like the shared case above, a hole punch or truncate
-		 * could have been performed on the private mapping.
-		 * Examine the value of chg to determine if reserves
-		 * actually exist or were previously consumed.
-		 * Very Subtle - The value of chg comes from a previous
-		 * call to vma_needs_reserves(). The reserve map for
-		 * private mappings has different (opposite) semantics
-		 * than that of shared mappings. vma_needs_reserves()
-		 * has already taken this difference in semantics into
-		 * account. Therefore, the meaning of chg is the same
-		 * as in the shared case above. Code could easily be
-		 * combined, but keeping it separate draws attention to
-		 * subtle differences.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
-	return false;
+	return chg == 0;
 }
 
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
@@ -1407,7 +1354,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	 * have no page reserves. This check ensures that reservations are
 	 * not "stolen". The child may still get SIGKILLed
 	 */
-	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
+	if (!vma_has_reserves(chg) && !available_huge_pages(h))
 		goto err;
 
 	gfp_mask = htlb_alloc_mask(h);
@@ -1425,7 +1372,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && vma_has_reserves(vma, chg)) {
+	if (folio && vma_has_reserves(chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3116,7 +3063,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (vma_has_reserves(vma, gbl_chg)) {
+		if (vma_has_reserves(gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}

From patchwork Tue Jan 7 20:40:01 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929596
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org
(Postfix) with ESMTP id 451ACE77198 for ; Tue, 7 Jan 2025 20:40:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EC36C6B00AE; Tue, 7 Jan 2025 15:40:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E4CDF6B00AF; Tue, 7 Jan 2025 15:40:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B8B6E6B00B0; Tue, 7 Jan 2025 15:40:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 99EAF6B00AE for ; Tue, 7 Jan 2025 15:40:27 -0500 (EST) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 633A1A036D for ; Tue, 7 Jan 2025 20:40:27 +0000 (UTC) X-FDA: 82981823694.09.AE27566 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf07.hostedemail.com (Postfix) with ESMTP id 33E5640015 for ; Tue, 7 Jan 2025 20:40:25 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=C8Sw9ES3; spf=pass (imf07.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736282425; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Tqf5zDG9MEFYyrwcky7SImgBB1+VbhDKSE1RWjLi9H0=; b=jw5DiETREmsmFa/Kb1z/CA0ekcP790Sn3SSnffMpA7WY2S7xh3YSme77DJVyNoQ2h+Iig0 e5m+gQvREryuhqW3+iZvWvKmwwfSyLpdlK3fhSBVxi+M0+UFbmUc6NIGeE2uCtT6sJ5Gqw 3/cB38xSdEiUzcsho+o33bTPwmF3rqc= 
ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=C8Sw9ES3; spf=pass (imf07.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736282425; a=rsa-sha256; cv=none; b=OJASBNaldprZYLwBqTnT7DBc2C4BqNot95HX+EsjTiwRrnEuCGw4xG4bchRRoIjjovQ64/ AiQgEXnvGOIQff0+isfL5v+sowhaupE3uMfP5nbq01PWezkKFwd10MGXqia7tAr3ztODLJ d0Oi2siCHFsREXdy/1B3+uw6OHB1zyU= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736282424; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tqf5zDG9MEFYyrwcky7SImgBB1+VbhDKSE1RWjLi9H0=; b=C8Sw9ES3b2hTYmg7arr1/aXAIHS1VsEAiT24/2hYc8sXH0WsfVaHQrrqagV/rwCp3CmhGU BEIx3aw4vsue6ZWk6Z0+/WVMCR/nBQx9OSxFEZh0kZxQWY1KjkwPJZV+6ZD+6i34fEilus +GRNely3qrnDkvzA55+QRZwPqiloIvg= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-120-S3jTBcwMOy2iNZ5Q3F1kUA-1; Tue, 07 Jan 2025 15:40:19 -0500 X-MC-Unique: S3jTBcwMOy2iNZ5Q3F1kUA-1 X-Mimecast-MFC-AGG-ID: S3jTBcwMOy2iNZ5Q3F1kUA Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-7b6e7f07332so699901985a.1 for ; Tue, 07 Jan 2025 12:40:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736282419; x=1736887219; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Tqf5zDG9MEFYyrwcky7SImgBB1+VbhDKSE1RWjLi9H0=; 
b=luSTu2u7NeeloN29ynsYETe9TfJPZLpPLAKXcZpRAjCs96UY+R3o6vFj+4NH5z3TDm mQC6ZVQvrqrxV/sK0+jb4AHY2jFRsqluRFZ7FNxxhlzLzH9Hr4D/WcAAtmyp6xVlm2cb gRR3Ss2jjK7cWgpXSH/sJPgBrwPNervEs1hz6z921JFqUJnSKRnBfIw5by4DkgAVasEh a0x2Vad+imOZj4/OLc7Kow+V5ihIocbFSBQhgdLcDNFxAfi6JJzbyKkkiRvSOwX0DyR1 6RhrC8inpXMKLG6YWFrVb4x+nrKaJYEi6RrVQ4bWbLkQTj7S5XyEELsTsVPYEL2zV9bb nzkg== X-Gm-Message-State: AOJu0YzoMuIWkjrNcJR/8ZRaV5vEDZRr3sY3mVLpVFtiRAWVwnhS2Awq /3d3ZAHNh7VZNZ8o5xEWwAEIJ44bkG9OO8vOKxFN8zQ/2FwliIc4lVRzuGlfyvhRr6tLigC1ljd Y5wKEGfhIlTvPdOhNSQ208aqx9NRUj6AahgoNLX/ud/jB9+Vxf1TzC1F03hsMY1VpQoKG90g3AK Bq/JsmLwVscllwcHcuiuxsr2NyVzxiTA== X-Gm-Gg: ASbGnct3mHt5rfIQnuL4g7Y5bPbZI2XPoHKUXWiV9acnwWWSIasGiI3rKARNc3Lw6rf y+WA+2rd3NwEK9+4LEa3nCJVWOh3FiY80J6HRDmoW68WhLtMcu7MSIbVeMzXWigM5w4BYQG+H91 6+XZqRK625ydMxvc4cPRnhd44z37IJXM2pbGttRZA83Ui9rkXNUzq9Gm/WiRnUZd9HR0bi/hbGX KHmXyHLUICuQYYEE5m0kDHGGXx2ghzeyDV6lNU+ueBV0nYpk9t2BzmAtvBCXOjtN26F9LW2WkfL +9JJiPWWTfBnTDu60Mi5JdUHN/PjWk0P X-Received: by 2002:ad4:5b8d:0:b0:6d8:a091:4f5c with SMTP id 6a1803df08f44-6df9b2ddae0mr7742976d6.33.1736282417373; Tue, 07 Jan 2025 12:40:17 -0800 (PST) X-Google-Smtp-Source: AGHT+IEO5EEzKoo4b8+dAKo+7OX2D+ltSzCgzDPjxXEVdIBKDurWSenXTIIEcfAQvgTlo32tr/LGYA== X-Received: by 2002:ad4:5b8d:0:b0:6d8:a091:4f5c with SMTP id 6a1803df08f44-6df9b2ddae0mr7742556d6.33.1736282416972; Tue, 07 Jan 2025 12:40:16 -0800 (PST) Received: from x1n.redhat.com (pool-99-254-114-190.cpe.net.cable.rogers.com. 
[99.254.114.190]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6dd181373f6sm184478306d6.62.2025.01.07.12.40.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 12:40:16 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi, Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com, Oscar Salvador
Subject: [PATCH v2 6/7] mm/hugetlb: Drop vma_has_reserves()
Date: Tue, 7 Jan 2025 15:40:01 -0500
Message-ID: <20250107204002.2683356-7-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: 5xUNE_BsARsFpoTx4wfWAZNUut0JubJklhVfsHYGEJo_1736282419
X-Mimecast-Originator: redhat.com
content-type: text/plain; charset="US-ASCII"; x-default=true
X-Rspamd-Queue-Id: 33E5640015
X-Rspamd-Server: rspam12
X-Stat-Signature: nnozc6p93zkbnexf6kny833rf5i8bdyp
X-Rspam-User:
X-HE-Tag: 1736282425-70733
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

After the previous cleanup, vma_has_reserves() is mostly an empty helper.
The only thing it still conveys is that "use reserve count" is the inverse
of "needs a global reserve count", which remains true.

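The inversion mentioned above can be seen in a minimal userspace sketch of
the check made in dequeue_hugetlb_folio_vma(). vma_has_reserves() is the
one-line helper left by the previous patch; gate_old(), gate_new() and the
"available" parameter (standing in for available_huge_pages(h)) are
hypothetical names used only for this illustration.

```c
#include <stdbool.h>

/* Helper as left by the previous patch: "can we use a reservation?" */
static bool vma_has_reserves(long chg)
{
	return chg == 0;
}

/* Failure condition phrased through the helper (negative question). */
static bool gate_old(long chg, unsigned long available)
{
	return !vma_has_reserves(chg) && available == 0;
}

/* Same condition phrased directly on gbl_chg (positive question). */
static bool gate_new(long gbl_chg, unsigned long available)
{
	return gbl_chg && available == 0;
}
```

Both functions refuse the allocation in exactly the same situation: no
reservation to reuse and no unreserved free page left.
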
To avoid the confusion of having two inverted ways to ask the same
question, always use gbl_chg everywhere, and drop the function.

While at it, rename "chg" to "gbl_chg" in dequeue_hugetlb_folio_vma(). It
might help readers see that the "chg" here is the global reserve count,
not the vma resv count.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ec079f32f44..922d57e2413a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1246,16 +1246,6 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 	hugetlb_dup_vma_private(vma);
 }
 
-/* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(long chg)
-{
-	/*
-	 * Now "chg" has all the conditions considered for whether we
-	 * should use an existing reservation.
-	 */
-	return chg == 0;
-}
-
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	int nid = folio_nid(folio);
@@ -1341,7 +1331,7 @@ static unsigned long available_huge_pages(struct hstate *h)
 
 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, long chg)
+				unsigned long address, long gbl_chg)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1350,11 +1340,10 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	int nid;
 
 	/*
-	 * A child process with MAP_PRIVATE mappings created by their parent
-	 * have no page reserves. This check ensures that reservations are
-	 * not "stolen". The child may still get SIGKILLed
+	 * gbl_chg==1 means the allocation requires a new page that was not
+	 * reserved before. Making sure there's at least one free page.
 	 */
-	if (!vma_has_reserves(chg) && !available_huge_pages(h))
+	if (gbl_chg && !available_huge_pages(h))
 		goto err;
 
 	gfp_mask = htlb_alloc_mask(h);
@@ -1372,7 +1361,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && vma_has_reserves(chg)) {
+	if (folio && !gbl_chg) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3063,7 +3052,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (vma_has_reserves(gbl_chg)) {
+		if (!gbl_chg) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}

From patchwork Tue Jan 7 20:40:02 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929595
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9574AE77197 for ; Tue, 7 Jan 2025 20:40:26 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id CC77A6B00AC; Tue, 7 Jan 2025 15:40:25 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40) id C75216B00AD; Tue, 7 Jan 2025 15:40:25 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id ACAA56B00AE; Tue, 7 Jan 2025 15:40:25 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 883556B00AC for ; Tue, 7 Jan 2025 15:40:25 -0500 (EST)
Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 3837E140CCD for ; Tue, 7 Jan 2025 20:40:25 +0000 (UTC)
X-FDA: 82981823610.19.99C3CA6
Received: from
us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf10.hostedemail.com (Postfix) with ESMTP id 01A31C0006 for ; Tue, 7 Jan 2025 20:40:22 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=MBF+oQsU; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf10.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736282423; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/+nsOFPsTLzwi0UWOyyE5ZOk1n60MXP64Ncf3blKR+I=; b=lDcj4/jt4GJMAB7saQ0soFy3svFGlv2Aixl7MKV4fjTZ0THiOCUFQaJrsCY2qZjXhSCuqI 95o3Y31J0XUDJQxhmUhZGUF/y3TzIjln+XU9CgyuUMgZU7nFj9Y1JPI87/H25vAdcpflOK DAAWDSFlcrsASJxXBN3Zc9pARjdd078= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736282423; a=rsa-sha256; cv=none; b=KdHm7W3Jli7VZc13AOCTLNAC6A7zCjOI6/gyw8x4QtOxQeSsOCQq3kbOXx1eIUlKO+LrB6 Y4Ek1UF2M6L14G7TM3KLcx3ulrzW0JwYSZ+scAJkRIRQ0jHDp/QD+knif5gwVd18TwNa2m DPgsfQ3WidRyaN+NkimGwR+/iGhHaWU= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=MBF+oQsU; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf10.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1736282422; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=/+nsOFPsTLzwi0UWOyyE5ZOk1n60MXP64Ncf3blKR+I=; b=MBF+oQsUc2yNRjBDJQHh2ZMk7ChCkb9HcUaQBgLODU/nJ1NTr3JV+ZbY3fwJMNIz1BcTYQ arEntC4pAtgYnOZ0p2SWxyrKnyy3+RogcYc/efX5IAYkDwgLR8H8yQ7tPWaIfQGV8xVK1q eMC3pmFGtAA84yWZbRtHfbGsAV9/rPU= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-134-3d8ha9J4ML6-6dRm5gxtZw-1; Tue, 07 Jan 2025 15:40:20 -0500 X-MC-Unique: 3d8ha9J4ML6-6dRm5gxtZw-1 X-Mimecast-MFC-AGG-ID: 3d8ha9J4ML6-6dRm5gxtZw Received: by mail-qt1-f200.google.com with SMTP id d75a77b69052e-467b0b0aed4so341173251cf.2 for ; Tue, 07 Jan 2025 12:40:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736282419; x=1736887219; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/+nsOFPsTLzwi0UWOyyE5ZOk1n60MXP64Ncf3blKR+I=; b=xSyE2aHpa9brrgx3Nq80vgQDb3q2a74BsVFCaOegc1oSyN829gvXVkcK1hR1TTnDt1 rXki3Ez8TOV/nHzonOA8ngTlM1DmSTp2uTry3atjaIcQoT0wWZ/A+LQisdiPUwd6WphB LEOd9hEiMC4np24A10IsXyTsVm5uVBV53DGnELJxpgxIsUt1qNvnQ9wEYesGQiv/obIu M1B2vhHxHw4hahBM5V/ToIuNL+g80kVmB174haQSoCl6OEijvjbyU8Y2ShKhqxZ2fLUG oI8kMUnogmXGuKVK+UdPoOchxi/3CnYRJxF/8c2jE8ABB40zXTLdQWeNKy243iIZO3Sr qtTQ== X-Gm-Message-State: AOJu0YxWJqp1HDyr1z5FDu6IMIFwkUb/aQHcgyOmCVvStjiUewmhqvP9 PbVLBr4FYIG598Nnr7O+mPl19kdwpnqA/YTcukgjpapJ+igwMKyYUqUVXtMabEHRt3fxY9SUhuA rs1eOV8YsioaQ+xqkJoM61QlOmA51+FN5I1JbujQKVDZP228383OoAe2uk47IwUIYO+bv2vqOFn QofhmsE/hiqPFh5KyZ7jF+eUOBRelBkw== X-Gm-Gg: ASbGncvdLnxOuGqraga5WEBG9qXrUjeNifAAQx1w7o7haOSmW5SPD92x2JS0QaWPraR SmTXWqepuBdhlChM0AhTVUsnUUkCmgv/zssEjujgawcaNxidptIDUiK360NjKM8dk3V0vyfagPK mEH4Go22d0xBNwtp0FHVHG06aBfq/mJU0ZMfIySaqfQ5lGd+Nh9mTsfauWSMEpChjUvasm7LORc mI0tbgifTDkr+GfpPk1yTr1YY9naC2f1xvOxsV3iFXR/4eDvuXbecvKMiTiSiFsHojtZ1zKicOW 
W7N4Ibizq/6ce6h4ap+iPoG8UutwP6lm
X-Received: by 2002:a05:6214:31a0:b0:6d8:f7cf:a12 with SMTP id 6a1803df08f44-6df9b2ff012mr6628226d6.45.1736282418931; Tue, 07 Jan 2025 12:40:18 -0800 (PST)
X-Google-Smtp-Source: AGHT+IH3x2JnYKqYDGZceKkbdYysg3OVrQ5XZc6PkLXZvto1IqyPEH4R4CSwLbtkmwcRm2B/Q8ATbw==
X-Received: by 2002:a05:6214:31a0:b0:6d8:f7cf:a12 with SMTP id 6a1803df08f44-6df9b2ff012mr6627846d6.45.1736282418604; Tue, 07 Jan 2025 12:40:18 -0800 (PST)
Received: from x1n.redhat.com (pool-99-254-114-190.cpe.net.cable.rogers.com. [99.254.114.190]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6dd181373f6sm184478306d6.62.2025.01.07.12.40.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 12:40:17 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi, Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com, Oscar Salvador
Subject: [PATCH v2 7/7] mm/hugetlb: Unify restore reserve accounting for new allocations
Date: Tue, 7 Jan 2025 15:40:02 -0500
Message-ID: <20250107204002.2683356-8-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: 2OPJKxdTOUMBsRh1Xm14li8U6PovpB8keF8LZHqmFr8_1736282420
X-Mimecast-Originator: redhat.com
content-type: text/plain; charset="US-ASCII"; x-default=true
X-Stat-Signature: tajet51byuq7up5bdm4htpyor1i6hw5m
X-Rspamd-Queue-Id: 01A31C0006
X-Rspam-User:
X-Rspamd-Server: rspam01
X-HE-Tag: 1736282422-701029
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

Hugetlb folios need restore-reserve accounting whether they are dequeued
from the hstate or newly allocated from buddy. Merge the two paths so the
accounting is done in one place, and add a small comment explaining it.

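As a hedged sketch of the resulting control flow (a userspace model, not
the kernel code), the two allocation paths can share a single accounting
step. The toy_ types and functions are hypothetical; the buddy fallback is
modelled as always succeeding, and locking and cgroup charging are left
out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the fields involved in the accounting. */
struct toy_folio {
	bool restore_reserve;
};

struct toy_hstate {
	long free_huge_pages;
	long resv_huge_pages;
};

/* Models dequeue_hugetlb_folio_vma(): take a page from the free pool. */
static struct toy_folio *toy_dequeue(struct toy_hstate *h,
				     struct toy_folio *slot, long gbl_chg)
{
	/* Without a reservation, do not eat into pages reserved by others. */
	if (gbl_chg && h->free_huge_pages - h->resv_huge_pages <= 0)
		return NULL;
	if (h->free_huge_pages == 0)
		return NULL;

	h->free_huge_pages--;
	slot->restore_reserve = false;
	return slot;
}

/* Models the tail of alloc_hugetlb_folio() after this patch. */
static struct toy_folio *toy_alloc(struct toy_hstate *h,
				   struct toy_folio *slot, long gbl_chg)
{
	struct toy_folio *folio = toy_dequeue(h, slot, gbl_chg);

	if (!folio) {
		/* Stand-in for a fresh allocation from buddy. */
		slot->restore_reserve = false;
		folio = slot;
	}

	/* Single accounting point shared by both paths. */
	if (!gbl_chg) {
		folio->restore_reserve = true;
		h->resv_huge_pages--;
	}
	return folio;
}
```

Whichever path produced the folio, a gbl_chg of 0 marks it and drops the
reserve count exactly once, which is the property the merged code relies
on.
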
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 922d57e2413a..3b27840de36f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1361,11 +1361,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && !gbl_chg) {
-		folio_set_hugetlb_restore_reserve(folio);
-		h->resv_huge_pages--;
-	}
-
 	mpol_cond_put(mpol);
 	return folio;
 
@@ -3052,15 +3047,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!gbl_chg) {
-			folio_set_hugetlb_restore_reserve(folio);
-			h->resv_huge_pages--;
-		}
 		list_add(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		/* Fall through */
 	}
 
+	/*
+	 * Either dequeued or buddy-allocated folio needs to add special
+	 * mark to the folio when it consumes a global reservation.
+	 */
+	if (!gbl_chg) {
+		folio_set_hugetlb_restore_reserve(folio);
+		h->resv_huge_pages--;
+	}
+
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
 	/* If allocation is not consuming a reservation, also store the
 	 * hugetlb_cgroup pointer on the page.