From patchwork Sun Dec 1 21:22:34 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13889651
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng, linux-stable
Subject: [PATCH 1/7] mm/hugetlb: Fix avoid_reserve to allow taking folio from subpool
Date: Sun, 1 Dec 2024 16:22:34 -0500
Message-ID: <20241201212240.533824-2-peterx@redhat.com>
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>

Commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed") introduced avoid_reserve for a special case of CoW on hugetlb private mappings: it applies only when the owner VMA tries to allocate yet another hugetlb folio
that is not reserved within the private vma reserved map.

Later, commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate") made alloc_huge_page() never consume any global reservation whenever avoid_reserve=true. This doesn't look correct: although it prevents the allocation from using the global reservation directly, it still takes one reservation from the spool (if a subpool exists). Since the spool's reserved pages are themselves taken from the global reservation, this effectively takes one global reservation as well.

As a result, the global reservation count can go wrong.

I wrote the reproducer below to trigger this special path. Every run of it increments the global reservation count by one, until it reaches the number of free pages:

  #define _GNU_SOURCE             /* See feature_test_macros(7) */
  #include <stdio.h>
  #include <fcntl.h>
  #include <errno.h>
  #include <unistd.h>
  #include <stdlib.h>
  #include <sys/mman.h>

  #define MSIZE  (2UL << 20)

  int main(int argc, char *argv[])
  {
      const char *path;
      int *buf;
      int fd, ret;
      pid_t child;

      if (argc < 2) {
          printf("usage: %s <path>\n", argv[0]);
          return -1;
      }

      path = argv[1];

      fd = open(path, O_RDWR | O_CREAT, 0666);
      if (fd < 0) {
          perror("open failed");
          return -1;
      }

      ret = fallocate(fd, 0, 0, MSIZE);
      if (ret != 0) {
          perror("fallocate");
          return -1;
      }

      buf = mmap(NULL, MSIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
      if (buf == MAP_FAILED) {
          perror("mmap() failed");
          return -1;
      }

      /* Allocate a page */
      *buf = 1;

      child = fork();
      if (child == 0) {
          /* child doesn't need to do anything */
          exit(0);
      }

      /* Trigger CoW from owner */
      *buf = 2;

      munmap(buf, MSIZE);
      close(fd);
      unlink(path);

      return 0;
  }

It only reproduces on a sub-mount whose spool has reserved pages, for example:

  # sysctl vm.nr_hugepages=128
  # mkdir ./hugetlb-pool
  # mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool

Then run the reproducer on the mountpoint:

  # ./reproducer ./hugetlb-pool/test

Fix it by taking the reservation from the spool when one is available.
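
To observe the effect described above, one could compare the HugePages_Rsvd field of /proc/meminfo (the reserved-but-not-yet-allocated count for the default huge page size) before and after each run of the reproducer. The helper below is a hedged sketch and not part of this patch: both function names are hypothetical, and it assumes the default huge page size is the 2M size used in the mount example (for other sizes, the per-size resv_hugepages file under /sys/kernel/mm/hugepages/ would be the one to read).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the HugePages_Rsvd value from text in
 * /proc/meminfo format. Returns -1 if the field is not present.
 */
long parse_hugepages_rsvd(const char *meminfo)
{
	static const char key[] = "HugePages_Rsvd:";
	const char *p;

	assert(meminfo != NULL);
	p = strstr(meminfo, key);
	if (!p)
		return -1;
	/* strtol() skips the padding spaces that follow the colon */
	return strtol(p + strlen(key), NULL, 10);
}

/*
 * Hypothetical helper: read /proc/meminfo and return HugePages_Rsvd,
 * or -1 on any error.
 */
long read_hugepages_rsvd(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_hugepages_rsvd(buf);
}
```

A caller would record read_hugepages_rsvd() before running "./reproducer ./hugetlb-pool/test" and compare it afterwards; per the description above, an unfixed kernel is expected to show the value growing by one per run.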

In general, avoid_reserve is IMHO more about avoiding the vma resv map than the spool's reservations.

I copied stable; however, I don't intend to backport this if it isn't a clean cherry-pick, because a private hugetlb mapping followed by fork() is too rare to hit.

Cc: linux-stable
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cec4b121193f..9ce69fd22a01 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1394,8 +1394,7 @@ static unsigned long available_huge_pages(struct hstate *h)
 
 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, int avoid_reserve,
-				long chg)
+				unsigned long address, long chg)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1411,10 +1410,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
 		goto err;
 
-	/* If reserves cannot be used, ensure enough pages are in the pool */
-	if (avoid_reserve && !available_huge_pages(h))
-		goto err;
-
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
 
@@ -1430,7 +1425,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+	if (folio && vma_has_reserves(vma, chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3007,17 +3002,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
-
-		/*
-		 * Even though there was no reservation in the region/reserve
-		 * map, there could be reservations associated with the
-		 * subpool that can be used. This would be indicated if the
-		 * return value of hugepage_subpool_get_pages() is zero.
-		 * However, if avoid_reserve is specified we still avoid even
-		 * the subpool reservations.
-		 */
-		if (avoid_reserve)
-			gbl_chg = 1;
 	}
 
 	/* If this allocation is not consuming a reservation, charge it now.
@@ -3040,7 +3024,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * from the global free pool (global change). gbl_chg == 0 indicates
 	 * a reservation exists for the allocation.
 	 */
-	folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+	folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);

From patchwork Sun Dec 1 21:22:35 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13889652
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 2/7] mm/hugetlb: Stop using avoid_reserve flag in fork()
Date: Sun, 1 Dec 2024 16:22:35 -0500
Message-ID: <20241201212240.533824-3-peterx@redhat.com>
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>

When fork() stumbles on a DMA-pinned hugetlb private page, CoW must happen during fork() to guarantee DMA coherency. In this specific path, hugetlb pages need to be allocated for the child process.

Stop using the avoid_reserve=1 flag here: it is not needed, because dest_vma (which is always a MAP_PRIVATE hugetlb vma) has no private vma resv map, and that already guarantees it cannot use a vma reservation later.

No functional change intended. That said, this is still worth doing, because it reduces avoid_reserve to a single user: the CoW path for which the flag was originally introduced in commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"). I don't see any other caller that should set it.

A follow-up patch will clean up the resv accounting based on this.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9ce69fd22a01..8d4b4197d11b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5317,7 +5317,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);

From patchwork Sun Dec 1 21:22:36 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13889653
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 3/7] mm/hugetlb: Rename avoid_reserve to cow_from_owner
Date: Sun, 1 Dec 2024 16:22:36 -0500
Message-ID: <20241201212240.533824-4-peterx@redhat.com>
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>

The old name "avoid_reserve" is too generic and can easily be misused by new call sites that allocate hugetlb folios.

It is confusing in two ways: (1) whether a caller can opt in to avoiding the global reservation, and (2) whether it may take more than one count.
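
Point (2) comes down to a property of C itself: an int parameter carries whatever value the caller passes, while a bool (_Bool) parameter converts every nonzero argument to 1. The two functions below are hypothetical stand-ins for the old and new parameter types, not the real alloc_hugetlb_folio() prototypes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the old "int avoid_reserve" parameter. */
int old_flag_seen(int avoid_reserve)
{
	return avoid_reserve;	/* a "count" such as 2 survives unchanged */
}

/* Hypothetical stand-in for the new "bool cow_from_owner" parameter. */
int new_flag_seen(bool cow_from_owner)
{
	assert(cow_from_owner == 0 || cow_from_owner == 1);
	return cow_from_owner;	/* any nonzero argument arrives as 1 */
}
```

old_flag_seen(2) returns 2, whereas new_flag_seen(2) returns 1, so a boolean parameter cannot be mistaken for a counter.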

In reality, this flag is used in only one extremely hacky way, in the
hugetlb CoW path, and it is always passed as 1, meaning "skip global
reservation".  Rename the flag to avoid future abuse, and make it a
boolean to reflect that it is not a counter.  To make it even harder to
abuse, add a comment above the function to explain it.

Signed-off-by: Peter Xu
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 33 ++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a5ea006f403e..665c736bdb30 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -819,7 +819,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe8615bb6..6189d0383c7f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,7 +682,7 @@ struct huge_bootmem_page {
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
@@ -1061,7 +1061,7 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
-					   int avoid_reserve)
+					   bool cow_from_owner)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8d4b4197d11b..dfd479a857b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2956,8 +2956,15 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+/*
+ * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
+ * faults of hugetlb private mappings on top of a non-page-cache folio (in
+ * which case even if there's a private vma resv map it won't cover such
+ * allocation).  New call sites should (probably) never set it to true!!
+ * When it's set, the allocation will bypass all vma level reservations.
+ */
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				    unsigned long addr, int avoid_reserve)
+				    unsigned long addr, bool cow_from_owner)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -2998,7 +3005,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * Allocations for MAP_NORESERVE mappings also need to be
 	 * checked against any subpool limit.
 	 */
-	if (map_chg || avoid_reserve) {
+	if (map_chg || cow_from_owner) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
@@ -3006,7 +3013,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	/* If this allocation is not consuming a reservation, charge it now.
 	 */
-	deferred_reserve = map_chg || avoid_reserve;
+	deferred_reserve = map_chg || cow_from_owner;
 	if (deferred_reserve) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
@@ -3031,7 +3038,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+		if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
@@ -3090,7 +3097,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg || avoid_reserve)
+	if (map_chg || cow_from_owner)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
 	vma_end_reservation(h, vma, addr);
@@ -5317,7 +5324,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5771,7 +5778,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
 	struct folio *new_folio;
-	int outside_reserve = 0;
+	bool cow_from_owner = false;
 	vm_fault_t ret = 0;
 	struct mmu_notifier_range range;
 
@@ -5840,7 +5847,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
 			old_folio != pagecache_folio)
-		outside_reserve = 1;
+		cow_from_owner = true;
 
 	folio_get(old_folio);
 
@@ -5849,7 +5856,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
-	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5859,7 +5866,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		 * reliability, unmap the page from child processes. The child
 		 * may get SIGKILLed if it later faults.
 		 */
-		if (outside_reserve) {
+		if (cow_from_owner) {
 			struct address_space *mapping = vma->vm_file->f_mapping;
 			pgoff_t idx;
 			u32 hash;
@@ -6110,7 +6117,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(vma, vmf->address, 0);
+		folio = alloc_hugetlb_folio(vma, vmf->address, false);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6578,7 +6585,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6620,7 +6627,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			folio_put(*foliop);
 			ret = -ENOMEM;

From patchwork Sun Dec 1 21:22:37 2024
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 4/7] mm/hugetlb: Clean up map/global resv accounting when allocate
Date: Sun, 1 Dec 2024 16:22:37 -0500
Message-ID: <20241201212240.533824-5-peterx@redhat.com>
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>

alloc_hugetlb_folio() isn't an easy function to read, especially its
reservation accounting, whether per-VMA or global (mostly just the
spool).

The first complexity lies in the special private CoW path, i.e. the
cow_from_owner=true case.

The second complexity may be the confusing updates of gbl_chg after it
is first set, which make it look as if the value can change at any time.

Logically, cow_from_owner is only about the vma reservation.  We can
decouple the flag and fold it into the map charge flag very early, so we
don't need to keep checking the special CoW flag everywhere.

This patch does so by making map_chg a tri-state flag.  Needing a
tri-state is unfortunate; it is required because vma_needs_reservation()
currently has an internal side effect: it must be followed by either an
end() or a commit().

We keep one semantic the same as before: "if (map_chg)" means we need a
separate per-vma resv count.  This leaves most of the old code untouched
with the new enum.

After this patch, these variables are decided in the following steps,
which are hopefully slightly easier to follow:

  - First, decide map_chg.  This takes cow_from_owner into account, once
    and for all.  It is about whether we can take a resv count from the
    vma, no matter whether it is shared, private, etc.

  - Then, decide gbl_chg.  Compared to map_chg, the only difference here
    is the spool.

Each flag is now updated only once, instead of flipping back and forth,
which can be very hard to follow.

With cow_from_owner merged into map_chg, quite a few such checks can be
removed.  A side benefit is that we can get rid of one more confusing
flag, deferred_reserve.

Clean up the comments a bit too.  E.g., AFAIU, MAP_NORESERVE may not need
to be checked against the spool limit if it's on a shared mapping and the
page cache folio has its inode's resv map available (in which case
map_chg would have been set to zero, so it is the comment that is wrong,
not the code).

One trivial detail touched by this patch needs attention: the check right
after vma_commit_reservation():

  if (map_chg > map_commit)

It changes to:

  if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0))

It should behave the same as before, because previously the only way to
make "map_chg > map_commit" happen was map_chg=1 && map_commit=0.  That
is exactly the rewritten line.
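
The two-step decision and the rewritten race check can be modeled in a
small user-space C sketch.  It is not kernel code: decide_map_chg(),
decide_gbl_chg() and race_checks_equivalent() are hypothetical helpers,
needs_resv stands in for the 0/1 result of vma_needs_reservation(),
spool_chg for the result of hugepage_subpool_get_pages(spool, 1), and all
error returns are left out.  The last helper exhaustively checks that the
old and new race checks agree on the non-CoW paths, where
vma_commit_reservation() is still called.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the tri-state enum added by this patch. */
typedef enum {
	MAP_CHG_REUSE = 0,    /* per-vma resv count can be reused       */
	MAP_CHG_NEEDED = 1,   /* resv map checked: an extra is needed   */
	MAP_CHG_ENFORCED = 2, /* CoW from owner: resv map not consulted */
} map_chg_state;

/*
 * Step 1 (hypothetical model): decide map_chg once.  needs_resv stands
 * in for vma_needs_reservation(), which the patched code only calls
 * when cow_from_owner is false.
 */
map_chg_state decide_map_chg(bool cow_from_owner, long needs_resv)
{
	if (cow_from_owner)
		return MAP_CHG_ENFORCED;
	assert(needs_resv == 0 || needs_resv == 1);
	return needs_resv ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
}

/*
 * Step 2 (hypothetical model): decide gbl_chg once.  The subpool is
 * only consulted when no per-vma reservation can be reused.
 */
long decide_gbl_chg(map_chg_state map_chg, long spool_chg)
{
	return map_chg != MAP_CHG_REUSE ? spool_chg : 0;
}

/*
 * Compare "map_chg > map_commit" (old) with
 * "map_chg == MAP_CHG_NEEDED && retval == 0" (new) for every 0/1
 * combination on the non-CoW paths.  Returns true if they always agree.
 */
bool race_checks_equivalent(void)
{
	for (long needs = 0; needs <= 1; needs++) {
		for (long commit = 0; commit <= 1; commit++) {
			map_chg_state map_chg = decide_map_chg(false, needs);
			bool old_check = needs > commit;
			bool new_check = map_chg == MAP_CHG_NEEDED &&
					 commit == 0;

			if (old_check != new_check)
				return false;
		}
	}
	return true;
}
```

Because MAP_CHG_ENFORCED is nonzero, a plain "if (map_chg)" covers what
used to be "map_chg || cow_from_owner" and deferred_reserve; only the
commit()/end() calls still need to tell ENFORCED apart from NEEDED.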

Meanwhile, both commit() and end() need to be skipped for
MAP_CHG_ENFORCED, to keep the old behavior.

Even though a lot appears to have changed, no functional change is
expected.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 116 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 80 insertions(+), 36 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfd479a857b6..14cfe0bb01e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2956,6 +2956,25 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+typedef enum {
+	/*
+	 * For either 0/1: we checked the per-vma resv map, and either one
+	 * resv count can be reused (0), or an extra one is needed (1).
+	 */
+	MAP_CHG_REUSE = 0,
+	MAP_CHG_NEEDED = 1,
+	/*
+	 * The per-vma resv count cannot be used, hence a new resv
+	 * count is enforced.
+	 *
+	 * NOTE: This is mostly identical to MAP_CHG_NEEDED, except
+	 * that currently vma_needs_reservation() has an unwanted side
+	 * effect: it must be followed by either end() or commit() to
+	 * complete the transaction.  Hence it needs to differentiate from NEEDED.
+	 */
+	MAP_CHG_ENFORCED = 2,
+} map_chg_state;
+
 /*
  * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
  * faults of hugetlb private mappings on top of a non-page-cache folio (in
@@ -2969,12 +2988,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
-	long gbl_chg;
+	long retval, gbl_chg, nr_pages = pages_per_huge_page(h);
+	map_chg_state map_chg;
 	int memcg_charge_ret, ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
 	struct mem_cgroup *memcg;
-	bool deferred_reserve;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
 	memcg = get_mem_cgroup_from_current();
@@ -2985,36 +3003,56 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 
 	idx = hstate_index(h);
-	/*
-	 * Examine the region/reserve map to determine if the process
-	 * has a reservation for the page to be allocated.  A return
-	 * code of zero indicates a reservation exists (no change).
-	 */
-	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
-	if (map_chg < 0) {
-		if (!memcg_charge_ret)
-			mem_cgroup_cancel_charge(memcg, nr_pages);
-		mem_cgroup_put(memcg);
-		return ERR_PTR(-ENOMEM);
+
+	/* Whether we need a separate per-vma reservation? */
+	if (cow_from_owner) {
+		/*
+		 * Special case!  Since it's a CoW on top of a reserved
+		 * page, the private resv map doesn't count.  So it cannot
+		 * consume the per-vma resv map even if it's reserved.
+		 */
+		map_chg = MAP_CHG_ENFORCED;
+	} else {
+		/*
+		 * Examine the region/reserve map to determine if the process
+		 * has a reservation for the page to be allocated.  A return
+		 * code of zero indicates a reservation exists (no change).
+		 */
+		retval = vma_needs_reservation(h, vma, addr);
+		if (retval < 0) {
+			if (!memcg_charge_ret)
+				mem_cgroup_cancel_charge(memcg, nr_pages);
+			mem_cgroup_put(memcg);
+			return ERR_PTR(-ENOMEM);
+		}
+		map_chg = retval ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
 	}
 
 	/*
+	 * Whether we need a separate global reservation?
+	 *
 	 * Processes that did not create the mapping will have no
 	 * reserves as indicated by the region/reserve map. Check
 	 * that the allocation will not exceed the subpool limit.
-	 * Allocations for MAP_NORESERVE mappings also need to be
-	 * checked against any subpool limit.
+	 * Or if it can get one from the pool reservation directly.
 	 */
-	if (map_chg || cow_from_owner) {
+	if (map_chg) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
+	} else {
+		/*
+		 * If we have the vma reservation ready, no need for extra
+		 * global reservation.
+		 */
+		gbl_chg = 0;
 	}
 
-	/* If this allocation is not consuming a reservation, charge it now.
+	/*
+	 * If this allocation is not consuming a per-vma reservation,
+	 * charge the hugetlb cgroup now.
 	 */
-	deferred_reserve = map_chg || cow_from_owner;
-	if (deferred_reserve) {
+	if (map_chg) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
 		if (ret)
@@ -3038,7 +3076,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
+		if (vma_has_reserves(vma, gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
@@ -3051,7 +3089,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	/* If allocation is not consuming a reservation, also store the
 	 * hugetlb_cgroup pointer on the page.
 	 */
-	if (deferred_reserve) {
+	if (map_chg) {
 		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
 						  h_cg, folio);
 	}
@@ -3060,26 +3098,31 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	hugetlb_set_folio_subpool(folio, spool);
 
-	map_commit = vma_commit_reservation(h, vma, addr);
-	if (unlikely(map_chg > map_commit)) {
+	if (map_chg != MAP_CHG_ENFORCED) {
+		/* commit() is only needed if the map_chg is not enforced */
+		retval = vma_commit_reservation(h, vma, addr);
 		/*
+		 * Check for possible race conditions. When it happens..
 		 * The page was added to the reservation map between
 		 * vma_needs_reservation and vma_commit_reservation.
 		 * This indicates a race with hugetlb_reserve_pages.
 		 * Adjust for the subpool count incremented above AND
-		 * in hugetlb_reserve_pages for the same page.  Also,
+		 * in hugetlb_reserve_pages for the same page. Also,
 		 * the reservation count added in hugetlb_reserve_pages
 		 * no longer applies.
 		 */
-		long rsv_adjust;
+		if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) {
+			long rsv_adjust;
 
-		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
-		hugetlb_acct_memory(h, -rsv_adjust);
-		if (deferred_reserve) {
-			spin_lock_irq(&hugetlb_lock);
-			hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
-					pages_per_huge_page(h), folio);
-			spin_unlock_irq(&hugetlb_lock);
+			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
+			hugetlb_acct_memory(h, -rsv_adjust);
+			if (map_chg) {
+				spin_lock_irq(&hugetlb_lock);
+				hugetlb_cgroup_uncharge_folio_rsvd(
+				    hstate_index(h), pages_per_huge_page(h),
+				    folio);
+				spin_unlock_irq(&hugetlb_lock);
+			}
 		}
 	}
 
@@ -3093,14 +3136,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
 out_uncharge_cgroup_reservation:
-	if (deferred_reserve)
+	if (map_chg)
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg || cow_from_owner)
+	if (map_chg)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
-	vma_end_reservation(h, vma, addr);
+	if (map_chg != MAP_CHG_ENFORCED)
+		vma_end_reservation(h, vma, addr);
 	if (!memcg_charge_ret)
 		mem_cgroup_cancel_charge(memcg, nr_pages);
 	mem_cgroup_put(memcg);

From patchwork Sun Dec 1 21:22:38 2024
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 5/7] mm/hugetlb: Simplify vma_has_reserves()
Date: Sun, 1 Dec 2024 16:22:38 -0500
Message-ID: <20241201212240.533824-6-peterx@redhat.com>
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>

vma_has_reserves() is a helper that "tries" to determine whether the vma
should consume one reservation when allocating the hugetlb folio.
However, it's not clear why we need such complexity, as that information
is already represented in the "chg" variable.
From the alloc_hugetlb_folio() context, "chg" (called "gbl_chg" inside that
function) is defined as:

  - If gbl_chg=1, the allocation cannot reuse an existing reservation
  - If gbl_chg=0, the allocation should reuse an existing reservation

First, map_chg is defined as follows, to cover all hugetlb reservation
scenarios (mostly via vma_needs_reservation(); cow_from_owner is the
outlier):

  CONDITION                                            HAS RESERVATION?
  =========                                            ================
  - SHARED: always check against per-inode resv_map
    (ignores VM_NORESERVE)
    - If resv exists                               ==> YES  [1]
    - If not                                       ==> NO   [2]
  - PRIVATE: complicated...
    - Request came from a CoW from owner resv map  ==> NO   [3]
      (when cow_from_owner==true)
    - If it does not own a resv_map at all         ==> NO   [4]
      (examples: VM_NORESERVE, private fork())
    - If it owns a resv_map, but resv doesn't exist ==> NO  [5]
    - If it owns a resv_map, and resv exists       ==> YES  [6]

On top of that, gbl_chg also takes the subpool (spool) setup into account,
so it is a decision based on all of the context.

If we look at vma_has_reserves(), almost every check it does has already
been processed by the map_chg accounting (I marked each return value with
the matching case above):

static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
{
	if (vma->vm_flags & VM_NORESERVE) {
		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
			return true;  ==> [1]
		else
			return false; ==> [2] or [4]
	}
	if (vma->vm_flags & VM_MAYSHARE) {
		if (chg)
			return false; ==> [2]
		else
			return true;  ==> [1]
	}
	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
		if (chg)
			return false; ==> [5]
		else
			return true;  ==> [6]
	}
	return false;                 ==> [4]
}

It doesn't check [3], but case [3] is already covered by the "chg" /
"gbl_chg" / "map_chg" calculations.

In short, vma_has_reserves() provides nothing more than returning "!chg",
so just simplify it all.

There are many comments describing truncation races; IIUC there should be
no race as long as map_chg is computed properly.

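The case analysis above can be checked mechanically with a small userspace model. This is only a sketch under stated assumptions and is not kernel code. The MODEL_VM_* flag values and struct model_vma are hypothetical, and its resv_owner field stands in for is_vma_resv_set(vma, HPAGE_RESV_OWNER). The per-case chg values are copied from the table rather than computed by vma_needs_reservation(). The model compares the removed logic with a plain "chg == 0" for every row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits for this model only; not the kernel's values. */
#define MODEL_VM_NORESERVE 0x1UL
#define MODEL_VM_MAYSHARE  0x2UL

struct model_vma {
	unsigned long vm_flags;
	bool resv_owner; /* hypothetical stand-in for is_vma_resv_set(vma, HPAGE_RESV_OWNER) */
};

/* The removed vma_has_reserves() logic, condensed branch by branch. */
static bool old_has_reserves(const struct model_vma *vma, long chg)
{
	if (vma->vm_flags & MODEL_VM_NORESERVE)
		return (vma->vm_flags & MODEL_VM_MAYSHARE) && chg == 0; /* [1] vs [2]/[4] */
	if (vma->vm_flags & MODEL_VM_MAYSHARE)
		return chg == 0;                                        /* [1] vs [2] */
	if (vma->resv_owner)
		return chg == 0;                                        /* [6] vs [5] */
	return false;                                               /* [4] */
}

/* The simplified helper introduced by this patch. */
static bool new_has_reserves(long chg)
{
	return chg == 0;
}

struct model_case {
	const char *name;
	struct model_vma vma;
	long chg; /* assumed map_chg from the table: 0 means a reservation exists */
};

/* Returns the number of rows where the old and new helpers disagree. */
static int check_all_cases(void)
{
	static const struct model_case cases[] = {
		{ "[1] shared, resv exists", { MODEL_VM_MAYSHARE, false }, 0 },
		{ "[1] shared+NORESERVE, resv exists", { MODEL_VM_MAYSHARE | MODEL_VM_NORESERVE, false }, 0 },
		{ "[2] shared, no resv", { MODEL_VM_MAYSHARE, false }, 1 },
		{ "[2] shared+NORESERVE, no resv", { MODEL_VM_MAYSHARE | MODEL_VM_NORESERVE, false }, 1 },
		{ "[3] CoW from owner", { 0, true }, 1 },
		{ "[4] private, VM_NORESERVE", { MODEL_VM_NORESERVE, false }, 1 },
		{ "[4] private, forked child", { 0, false }, 1 },
		{ "[5] owner, no resv", { 0, true }, 1 },
		{ "[6] owner, resv exists", { 0, true }, 0 },
	};
	int mismatches = 0;

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		/* Sanity check on the model inputs: chg is used as a 0/1 flag. */
		assert(cases[i].chg == 0 || cases[i].chg == 1);
		if (old_has_reserves(&cases[i].vma, cases[i].chg) !=
		    new_has_reserves(cases[i].chg))
			mismatches++;
	}
	return mismatches;
}
```

Called from a test main(), check_all_cases() should report no mismatches. The equivalence only holds because chg is non-zero in case [4], and that is exactly what the map_chg accounting is relied on to guarantee.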
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 67 ++++++-----------------------------------------------------
 1 file changed, 7 insertions(+), 60 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 14cfe0bb01e4..b7e16b3c4e67 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1247,66 +1247,13 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 }
 
 /* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
+static bool vma_has_reserves(long chg)
 {
-	if (vma->vm_flags & VM_NORESERVE) {
-		/*
-		 * This address is already reserved by other process(chg == 0),
-		 * so, we should decrement reserved count. Without decrementing,
-		 * reserve count remains after releasing inode, because this
-		 * allocated page will go into page cache and is regarded as
-		 * coming from reserved pool in releasing step. Currently, we
-		 * don't have any other solution to deal with this situation
-		 * properly, so add work-around here.
-		 */
-		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
-			return true;
-		else
-			return false;
-	}
-
-	/* Shared mappings always use reserves */
-	if (vma->vm_flags & VM_MAYSHARE) {
-		/*
-		 * We know VM_NORESERVE is not set. Therefore, there SHOULD
-		 * be a region map for all pages. The only situation where
-		 * there is no region map is if a hole was punched via
-		 * fallocate. In this case, there really are no reserves to
-		 * use. This situation is indicated if chg != 0.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
 	/*
-	 * Only the process that called mmap() has reserves for
-	 * private mappings.
+	 * Now "chg" has all the conditions considered for whether we
+	 * should use an existing reservation.
 	 */
-	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-		/*
-		 * Like the shared case above, a hole punch or truncate
-		 * could have been performed on the private mapping.
-		 * Examine the value of chg to determine if reserves
-		 * actually exist or were previously consumed.
-		 * Very Subtle - The value of chg comes from a previous
-		 * call to vma_needs_reserves(). The reserve map for
-		 * private mappings has different (opposite) semantics
-		 * than that of shared mappings. vma_needs_reserves()
-		 * has already taken this difference in semantics into
-		 * account. Therefore, the meaning of chg is the same
-		 * as in the shared case above. Code could easily be
-		 * combined, but keeping it separate draws attention to
-		 * subtle differences.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
-	return false;
+	return chg == 0;
 }
 
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
@@ -1407,7 +1354,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	 * have no page reserves. This check ensures that reservations are
 	 * not "stolen". The child may still get SIGKILLed
 	 */
-	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
+	if (!vma_has_reserves(chg) && !available_huge_pages(h))
 		goto err;
 
 	gfp_mask = htlb_alloc_mask(h);
@@ -1425,7 +1372,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && vma_has_reserves(vma, chg)) {
+	if (folio && vma_has_reserves(chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3076,7 +3023,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (vma_has_reserves(vma, gbl_chg)) {
+		if (vma_has_reserves(gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}

From patchwork Sun Dec 1 21:22:39 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13889656
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 6/7] mm/hugetlb: Drop vma_has_reserves()
Date: Sun, 1 Dec 2024 16:22:39 -0500
Message-ID: <20241201212240.533824-7-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>

After the previous cleanup, vma_has_reserves() is a nearly empty helper.
The only thing it still expresses is that "use a reserve count" is the
inverse of "needs a global reserve count", which remains true.

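The inversion can be illustrated with a toy model of the pool counters. This sketch is hypothetical. struct model_hstate, model_available(), model_has_reserves() and model_dequeue() only loosely mirror h->free_huge_pages, h->resv_huge_pages, available_huge_pages() and the removed helper. NUMA node selection and locking in the real dequeue path are left out. It models the guard this patch writes as "gbl_chg && !available_huge_pages(h)": a caller without a reservation (gbl_chg != 0) cannot take a free page that is already promised to a reservation, while a caller with gbl_chg == 0 consumes its own reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct hstate counters involved. */
struct model_hstate {
	long free_huge_pages; /* free pages in the pool */
	long resv_huge_pages; /* pool pages promised to reservations */
};

/* Loosely mirrors available_huge_pages(): free pages nobody reserved. */
static long model_available(const struct model_hstate *h)
{
	return h->free_huge_pages - h->resv_huge_pages;
}

/* The old question, "may this allocation use a reserve?", is just !gbl_chg. */
static bool model_has_reserves(long gbl_chg)
{
	return gbl_chg == 0;
}

/*
 * Dequeue sketch that uses gbl_chg directly. gbl_chg != 0 means the caller
 * owns no reservation, so it may only take an unreserved page.
 */
static bool model_dequeue(struct model_hstate *h, long gbl_chg)
{
	assert(h->free_huge_pages >= h->resv_huge_pages); /* model invariant */

	/* Equivalent to "!model_has_reserves(gbl_chg) && ...". */
	if (gbl_chg && model_available(h) <= 0)
		return false; /* would steal someone else's reservation */
	if (h->free_huge_pages == 0)
		return false;
	h->free_huge_pages--;
	if (!gbl_chg)
		h->resv_huge_pages--; /* consume the caller's own reservation */
	return true;
}
```

With one free page that is also reserved, an allocation with gbl_chg == 1 is refused. An allocation with gbl_chg == 0 takes the page and drops both counters to zero.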
To avoid the confusion of having two inverted ways to ask the same
question, always use gbl_chg directly and drop the function.

While at it, rename "chg" to "gbl_chg" in dequeue_hugetlb_folio_vma().  It
may help readers see that the "chg" here is the global reserve count, not
the vma resv count.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b7e16b3c4e67..10251ef3289a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1246,16 +1246,6 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 	hugetlb_dup_vma_private(vma);
 }
 
-/* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(long chg)
-{
-	/*
-	 * Now "chg" has all the conditions considered for whether we
-	 * should use an existing reservation.
-	 */
-	return chg == 0;
-}
-
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	int nid = folio_nid(folio);
@@ -1341,7 +1331,7 @@ static unsigned long available_huge_pages(struct hstate *h)
 
 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, long chg)
+				unsigned long address, long gbl_chg)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1350,11 +1340,10 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	int nid;
 
 	/*
-	 * A child process with MAP_PRIVATE mappings created by their parent
-	 * have no page reserves. This check ensures that reservations are
-	 * not "stolen". The child may still get SIGKILLed
+	 * gbl_chg==1 means the allocation requires a new page that was not
+	 * reserved before. Making sure there's at least one free page.
 	 */
-	if (!vma_has_reserves(chg) && !available_huge_pages(h))
+	if (gbl_chg && !available_huge_pages(h))
 		goto err;
 
 	gfp_mask = htlb_alloc_mask(h);
@@ -1372,7 +1361,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && vma_has_reserves(chg)) {
+	if (folio && !gbl_chg) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3023,7 +3012,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (vma_has_reserves(gbl_chg)) {
+		if (!gbl_chg) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}

From patchwork Sun Dec 1 21:22:40 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13889657
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com, Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi, Ackerley Tng
Subject: [PATCH 7/7] mm/hugetlb: Unify restore reserve accounting for new allocations
Date: Sun, 1 Dec 2024 16:22:40 -0500
Message-ID: <20241201212240.533824-8-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>

Hugetlb folios, whether dequeued from the hstate or newly allocated from
buddy, require the restore-reserve accounting to be managed properly.
Merge the two paths for it, and add a small comment to make it slightly
nicer.

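The merged flow can be sketched as a userspace model. Everything in it is hypothetical:

- struct model_folio's restore_reserve flag stands in for folio_set_hugetlb_restore_reserve().
- model_dequeue() stands in for dequeue_hugetlb_folio_vma().
- The buddy branch simply pretends allocation always succeeds.
- Locking, cgroup charging and surplus-page handling are left out.

The point it illustrates is that the reservation bookkeeping runs once, after either source has produced the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's struct folio or struct hstate. */
enum model_src { MODEL_POOL, MODEL_BUDDY };

struct model_folio {
	enum model_src src;   /* where the folio came from */
	bool restore_reserve; /* stands in for the hugetlb restore_reserve flag */
};

struct model_hstate {
	long free_huge_pages; /* free pages in the pool */
	long resv_huge_pages; /* pool pages promised to reservations */
};

/* Path 1: take a page from the free pool; no reservation bookkeeping here. */
static bool model_dequeue(struct model_hstate *h, long gbl_chg)
{
	assert(h->free_huge_pages >= h->resv_huge_pages); /* model invariant */
	if (gbl_chg && h->free_huge_pages - h->resv_huge_pages <= 0)
		return false;
	if (h->free_huge_pages == 0)
		return false;
	h->free_huge_pages--;
	return true;
}

/*
 * Merged flow: get the folio from the pool or, failing that, from a
 * pretend buddy allocation that always succeeds, then do the
 * reservation bookkeeping exactly once for both sources.
 */
static void model_alloc(struct model_hstate *h, long gbl_chg,
			struct model_folio *folio)
{
	folio->restore_reserve = false;
	folio->src = model_dequeue(h, gbl_chg) ? MODEL_POOL : MODEL_BUDDY;

	/* Single place that marks the folio and consumes a global reservation. */
	if (!gbl_chg) {
		folio->restore_reserve = true;
		h->resv_huge_pages--;
	}
}
```

Before this patch, the same "if (!gbl_chg)" block lived both in dequeue_hugetlb_folio_vma() and in the buddy branch of alloc_hugetlb_folio(). Hoisting it out, as modeled here, leaves a single place to audit.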
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 10251ef3289a..64e690fe52bf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1361,11 +1361,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && !gbl_chg) {
-		folio_set_hugetlb_restore_reserve(folio);
-		h->resv_huge_pages--;
-	}
-
 	mpol_cond_put(mpol);
 	return folio;
 
@@ -3012,15 +3007,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!gbl_chg) {
-			folio_set_hugetlb_restore_reserve(folio);
-			h->resv_huge_pages--;
-		}
 		list_add(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		/* Fall through */
 	}
 
+	/*
+	 * Either dequeued or buddy-allocated folio needs to add special
+	 * mark to the folio when it consumes a global reservation.
+	 */
+	if (!gbl_chg) {
+		folio_set_hugetlb_restore_reserve(folio);
+		h->resv_huge_pages--;
+	}
+
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
 	/* If allocation is not consuming a reservation, also store the
 	 * hugetlb_cgroup pointer on the page.