From patchwork Sat Feb 18 00:27:51 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13145384
Date: Sat, 18 Feb 2023 00:27:51 +0000
Subject: [PATCH v2 18/46] hugetlb: add HGM support to __unmap_hugepage_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
Message-ID: <20230218002819.1486479-19-jthoughton@google.com>
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog

David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Frank van der Linden , Jiaqi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: B652BC0004 X-Stat-Signature: mj4cwexbia5mab3sowk4csfttgzninim X-HE-Tag: 1676680141-67723 X-HE-Meta: U2FsdGVkX18cQNKKPwUYaWId15rk55IPSZdPh9NLpZ0AeG3Wkz6ypmsm9TdRGH4dBwtkMHXVog2vGxMugrUrEBHA9CPBJBGJEEle4IikhAOK4t4aXJ7oXnHWSwG8iiFlsFzOUytP8kp3iK07pi04hjwjetAoH5qA0WT6QVhqdbC8FFYv3pmaueuPRaZi5iAq+hC/XunxnKFz2MxsAGE3kiQZ8D7VXzm150YxyJAODjG5IyDOu2avncenMlQpkei7612HdA3aj9mKb12YVVGCD2g7kmwh2pp+wZ9mei+So1an/NAvFNmhr97ADa8LRIAuX8ma76tE5Ld9hGMZEitRNAjttx2xiXe1EWY0RsggMXD9de6RolkQf31FkOkOtzCtyFftCH9Uhzd6NrfiLHHrHrbnMv3pQS3FVb/v4NDp+zqubAsHcZsYUHD1DISfuUax51ARPxt9FDif0C79kQPdx9G4bHBdd/pDCuUjLl+Pwrm8z0Cyb2DOSn6YPEleLZA+ndBsV+lu1cLVEHfR3nr5YpYkdV7FLvdQasCGeS3IC6gjeCgkYLjMfN3GXW7lR5xJ0nBM7ZKW+a5sr2iF1Y0uywsfPgfvcbXGaWac+JfsxzNV3X2M2hJ1a/mxr7x+ZWGwc+U4DP5i6F8n3U8wIItPLVMDXZxVTCqECed1RRxj4V/BDKWdGX0d98Gd3AOofrDNIimcZvd+GWe+nel7glhWKUeX0KF7+gt5CrMvNUbvchXI6WZrEiTRs5ClUpCxyCuko3wEGIbUFpzYP73F9vjZntcP9po6AfMVbC/qck5JHbpCrddoImmZloSPL8r3j9GHBKlDQYNaHW51Cd2uNumFGs3in2dorGpWDTyCSJDZ/QNIaDSHIhKJ5mFKmOdfATNHfc7RI7EtdOuY/pI7R4gnieOWXRvZ4ajb+SwQO3dLU6U09YQ03SIAiLb2C6/PfqOzRxt6YEf1IyL72JTwPD1 ajIv2Wqc qsZSuh4ncAm3r99hBVIYyYbBDoj2rHCf7EtBBtSGzVlAKRan+/BGVYGaDZ8XiCamlNPQDfSl8H1gT/gaySo3gR6IIbZRwPI4bUi5JwPYKMiJwsaja17sHoGvrKyM1JtACdsgdw7TaRvjbP8pkn6rwGppit0YrZk7fOMq+7isNJ0S7TgkbO7GKLWHjhC+F/88KWihCKCHN5HmrF4sOxP2AtB3LgH7LID1clg12gCylnzAbR1s9PFNvitGErE+zj5p5Lw3SZjRpaK+j2PLaU6Xe4OM0omFTQr+Gi0d7Wb6O4+A9/Cxjr6qSb4NiMwbtUgQ+QuxcLcS2ZSaYFn9qjxt6vse4bFmHf5shL40Nkez6nMGuRU/ym8nrfe5NLMTNdKOeFcIt6/m8tCf/pESVLUW07hxwNcv9Ck7e2M7D6tGB2V4hCZQjmg6WchTzRGEMcea9hqtbzqr6t8b5o0ohYCJsLb5sEAxOVlRDNfnAlKL7WSE6bIvcBA5xEqVMyjvtlZfUqhio4exalBV4vF/W9rymakUvdC/QH64pfTpTFChTAJZDCLrjLNFfOIzvG8LDKitNxN1m49jLc8cBCcqR4IfxB6FOA3o+c8Icz7wcHGAhqYWDFsq7HwLjpk0dMXdRcGOXUbM4D7HI1ap+cPVcLZ4jCu5ms7qQG2Xi9aSaECQeE1rSI7ITFpk3S4z90cVGX2sSJocWaIc+z14xWiM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Enlighten __unmap_hugepage_range to deal with high-granularity mappings. This doesn't change its API; it still must be called with hugepage alignment, but it will correctly unmap hugepages that have been mapped at high granularity. Eventually, functionality here can be expanded to allow users to call MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is not done here. Introduce hugetlb_remove_rmap to properly decrement mapcount for high-granularity-mapped HugeTLB pages. 
Signed-off-by: James Houghton
---

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..31267471760e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -598,9 +598,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -609,7 +609,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b767b6889dea..1a1a71868dfd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,6 +160,9 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h,
 						long max_hpages, long min_hpages);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
+void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
+			 struct hstate *h, struct vm_area_struct *vma);
+
 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1d806020de..ecf1a28dbaaa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -120,6 +120,28 @@ enum hugetlb_level hpage_size_to_level(unsigned long sz)
 	return HUGETLB_LEVEL_PGD;
 }
 
+void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
+			 struct hstate *h, struct vm_area_struct *vma)
+{
+	struct page *hpage = compound_head(subpage);
+
+	if (shift == huge_page_shift(h)) {
+		VM_BUG_ON_PAGE(subpage != hpage, subpage);
+		page_remove_rmap(hpage, vma, true);
+	} else {
+		unsigned long nr_subpages = 1UL << (shift - PAGE_SHIFT);
+		struct page *final_page = &subpage[nr_subpages];
+
+		VM_BUG_ON_PAGE(HPageVmemmapOptimized(hpage), hpage);
+		/*
+		 * Decrement the mapcount on each page that is getting
+		 * unmapped.
+		 */
+		for (; subpage < final_page; ++subpage)
+			page_remove_rmap(subpage, vma, false);
+	}
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -5466,10 +5488,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long last_addr_mask;
@@ -5479,35 +5501,33 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = hugetlb_walk(vma, address, sz);
-		if (!ptep) {
-			address |= last_addr_mask;
+
+	while (address < end) {
+		if (hugetlb_full_walk(&hpte, vma, address)) {
+			address = (address | last_addr_mask) + sz;
 			continue;
 		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
+
 		if (huge_pte_none(pte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
 		/*
@@ -5523,24 +5543,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+			/*
+			 * We raced with someone splitting out from under us.
+			 * Retry the walk.
+			 */
 			spin_unlock(ptl);
 			continue;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		hpage = compound_head(subpage);
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5550,25 +5581,32 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+		hugetlb_remove_rmap(subpage, hpte.shift, h, vma);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
-		 * Bail out after unmapping reference page if supplied
+		 * Lower the reference count on the head page.
+		 */
+		tlb_remove_page_size(tlb, hpage, sz);
+		/*
+		 * Bail out after unmapping reference page if supplied,
+		 * and there's only one PTE mapping this page.
 		 */
-		if (ref_page)
+		if (ref_page && hugetlb_pte_size(&hpte) == sz)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 
 	tlb_end_vma(tlb, vma);
@@ -5846,7 +5884,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	/* Break COW or unshare */
 	huge_ptep_clear_flush(vma, haddr, ptep);
 	mmu_notifier_invalidate_range(mm, range.start, range.end);
-	page_remove_rmap(old_page, vma, true);
+	hugetlb_remove_rmap(old_page, huge_page_shift(h), h, vma);
 	hugepage_add_new_anon_rmap(new_folio, vma, haddr);
 	set_huge_pte_at(mm, haddr, ptep,
 			make_huge_pte(vma, &new_folio->page, !unshare));