From patchwork Tue Jan 7 20:39:58 2025
X-Patchwork-Submitter: Peter Xu <peterx@redhat.com>
X-Patchwork-Id: 13929593
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi,
 Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com,
 Oscar Salvador
Subject: [PATCH v2 3/7] mm/hugetlb: Rename avoid_reserve to cow_from_owner
Date: Tue, 7 Jan 2025 15:39:58 -0500
Message-ID: <20250107204002.2683356-4-peterx@redhat.com>
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>

The old name "avoid_reserve" is too generic and can be used wrongly in new
call sites that want to allocate a hugetlb folio.  It is confusing on two
counts: (1) whether one can opt in to avoid the global reservation, and
(2) whether it should take more than one count.

In reality, this flag is only used in one extremely hacky way, in the hugetlb
CoW path only, and always with the value 1, meaning "skip the global
reservation".  Rename the flag to cow_from_owner to avoid future abuse, and
make it a boolean to reflect that it is a flag, not a counter.  To make it
even harder to abuse, add a comment above the function to explain it.
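To make the interface change concrete, here is a minimal sketch of the
prototype before and after the rename (the call site at the end is
illustrative only; the real callers are updated in the diff below):

	/* Before: a loosely-typed int that reads like it could be a counter. */
	struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
					  unsigned long addr, int avoid_reserve);

	/* After: a self-describing boolean; only hugetlb_wp() may pass true. */
	struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
					  unsigned long addr, bool cow_from_owner);

	/* Every other caller now spells out the common case explicitly: */
	folio = alloc_hugetlb_folio(vma, vmf->address, false);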
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 33 ++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 62fb0cbc93ab..0fc179a59830 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -814,7 +814,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 10faf42ca96a..49ec2362ce92 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -683,7 +683,7 @@ struct huge_bootmem_page {
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
@@ -1068,7 +1068,7 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					unsigned long addr,
-					int avoid_reserve)
+					bool cow_from_owner)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7be8c35d2a83..cdbc8914a9f7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3008,8 +3008,15 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
+/*
+ * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
+ * faults of hugetlb private mappings on top of a non-page-cache folio (in
+ * which case even if there's a private vma resv map it won't cover such
+ * allocation).  New call sites should (probably) never set it to true!!
+ * When it's set, the allocation will bypass all vma level reservations.
+ */
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				  unsigned long addr, int avoid_reserve)
+				  unsigned long addr, bool cow_from_owner)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -3038,7 +3045,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * Allocations for MAP_NORESERVE mappings also need to be
 	 * checked against any subpool limit.
 	 */
-	if (map_chg || avoid_reserve) {
+	if (map_chg || cow_from_owner) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
 		if (gbl_chg < 0)
 			goto out_end_reservation;
@@ -3046,7 +3053,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	/* If this allocation is not consuming a reservation, charge it now.
 	 */
-	deferred_reserve = map_chg || avoid_reserve;
+	deferred_reserve = map_chg || cow_from_owner;
 	if (deferred_reserve) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
@@ -3071,7 +3078,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	if (!folio)
 		goto out_uncharge_cgroup;
 	spin_lock_irq(&hugetlb_lock);
-	if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+	if (!cow_from_owner && vma_has_reserves(vma, gbl_chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3138,7 +3145,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 					    h_cg);
 out_subpool_put:
-	if (map_chg || avoid_reserve)
+	if (map_chg || cow_from_owner)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
 	vma_end_reservation(h, vma, addr);
@@ -5369,7 +5376,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 0);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5823,7 +5830,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
 	struct folio *new_folio;
-	int outside_reserve = 0;
+	bool cow_from_owner = false;
 	vm_fault_t ret = 0;
 	struct mmu_notifier_range range;
 
@@ -5886,7 +5893,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
 	    old_folio != pagecache_folio)
-		outside_reserve = 1;
+		cow_from_owner = true;
 
 	folio_get(old_folio);
 
@@ -5895,7 +5902,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
-	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5905,7 +5912,7 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		 * reliability, unmap the page from child processes. The child
 		 * may get SIGKILLed if it later faults.
 		 */
-		if (outside_reserve) {
+		if (cow_from_owner) {
 			struct address_space *mapping = vma->vm_file->f_mapping;
 			pgoff_t idx;
 			u32 hash;
@@ -6156,7 +6163,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(vma, vmf->address, 0);
+		folio = alloc_hugetlb_folio(vma, vmf->address, false);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6622,7 +6629,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6664,7 +6671,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
 		if (IS_ERR(folio)) {
 			folio_put(*foliop);
 			ret = -ENOMEM;