From patchwork Sun Dec 1 21:22:36 2024
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 13889653
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Rik van Riel, Breno Leitao, Andrew Morton, peterx@redhat.com,
    Muchun Song, Oscar Salvador, Roman Gushchin, Naoya Horiguchi,
    Ackerley Tng
Subject: [PATCH 3/7] mm/hugetlb: Rename avoid_reserve to cow_from_owner
Date: Sun, 1 Dec 2024 16:22:36 -0500
Message-ID: <20241201212240.533824-4-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241201212240.533824-1-peterx@redhat.com>
References: <20241201212240.533824-1-peterx@redhat.com>
MIME-Version: 1.0

The old name "avoid_reserve" is too generic, and it can easily be used
wrongly in new call sites that want to allocate a hugetlb folio.  It is
confusing in two ways: (1) whether the flag lets a caller opt out of the
global reservation, and (2) whether the flag can take a count larger
than one.

In reality, the flag is only ever used in one extremely hacky way, in
the hugetlb CoW path only, and always with the value 1 to mean "skip the
global reservation".  Rename the flag to cow_from_owner to prevent
future abuse, and turn it into a boolean to reflect that it is a flag
rather than a counter.  To make it even harder to abuse, add a comment
above the function explaining it.
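To illustrate the rename at a call site (a condensed sketch, not a hunk
from this patch; vma/addr stand for whatever context the caller has):

	/* Before: "1" reads like a count, and what is avoided is unclear. */
	folio = alloc_hugetlb_folio(vma, addr, 1);

	/* After: the boolean names the one legitimate user, the CoW-from-owner path. */
	folio = alloc_hugetlb_folio(vma, addr, true);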
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 33 ++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a5ea006f403e..665c736bdb30 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -819,7 +819,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe8615bb6..6189d0383c7f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,7 +682,7 @@ struct huge_bootmem_page {
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
@@ -1061,7 +1061,7 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					unsigned long addr,
-					int avoid_reserve)
+					bool cow_from_owner)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8d4b4197d11b..dfd479a857b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2956,8 +2956,15 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+/*
+ * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
+ * faults of hugetlb private mappings on top of a non-page-cache folio (in
+ * which case even if there's a private vma resv map it won't cover such
+ * allocation).  New call sites should (probably) never set it to true!!
+ * When it's set, the allocation will bypass all vma level reservations.
+ */
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				  unsigned long addr, int avoid_reserve)
+				  unsigned long addr, bool cow_from_owner)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -2998,7 +3005,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * Allocations for MAP_NORESERVE mappings also need to be
 	 * checked against any subpool limit.
 	 */
-	if (map_chg || avoid_reserve) {
+	if (map_chg || cow_from_owner) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
@@ -3006,7 +3013,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	/* If this allocation is not consuming a reservation, charge it now.
 	 */
-	deferred_reserve = map_chg || avoid_reserve;
+	deferred_reserve = map_chg || cow_from_owner;
 	if (deferred_reserve) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
@@ -3031,7 +3038,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	if (!folio)
 		goto out_uncharge_cgroup;
 	spin_lock_irq(&hugetlb_lock);
-	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+	if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3090,7 +3097,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 					    h_cg);
 out_subpool_put:
-	if (map_chg || avoid_reserve)
+	if (map_chg || cow_from_owner)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
 	vma_end_reservation(h, vma, addr);
@@ -5317,7 +5324,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5771,7 +5778,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
 	struct folio *new_folio;
-	int outside_reserve = 0;
+	bool cow_from_owner = false;
 	vm_fault_t ret = 0;
 	struct mmu_notifier_range range;
 
@@ -5840,7 +5847,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
 	    old_folio != pagecache_folio)
-		outside_reserve = 1;
+		cow_from_owner = true;
 
 	folio_get(old_folio);
 
@@ -5849,7 +5856,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
-	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5859,7 +5866,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		 * reliability, unmap the page from child processes.  The child
 		 * may get SIGKILLed if it later faults.
 		 */
-		if (outside_reserve) {
+		if (cow_from_owner) {
 			struct address_space *mapping = vma->vm_file->f_mapping;
 			pgoff_t idx;
 			u32 hash;
@@ -6110,7 +6117,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(vma, vmf->address, 0);
+		folio = alloc_hugetlb_folio(vma, vmf->address, false);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6578,7 +6585,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6620,7 +6627,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 		goto out;
 	}
 
-	folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+	folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 	if (IS_ERR(folio)) {
 		folio_put(*foliop);
 		ret = -ENOMEM;
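
For reviewers, the rule for when cow_from_owner may be true reduces to
roughly the following condensed sketch of the hugetlb_wp() hunks above
(variable names as in the patch; locking and error handling omitted):

	bool cow_from_owner = false;

	/*
	 * Only the private-mapping CoW fault may bypass the vma level
	 * reservations: the faulting task owns the reserves
	 * (HPAGE_RESV_OWNER) and the old folio is not the page cache one.
	 */
	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
	    old_folio != pagecache_folio)
		cow_from_owner = true;

	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
	/* Every other caller touched by this patch passes false. */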