From patchwork Thu Jan  5 10:18:15 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13089651
Date: Thu, 5 Jan 2023 10:18:15 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Mime-Version: 1.0
References: <20230105101844.1893104-1-jthoughton@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230105101844.1893104-18-jthoughton@google.com>
Subject: [PATCH 17/46] hugetlb: make unmapping compatible with
 high-granularity mappings
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 6DEDD40007 X-Rspam-User: X-Stat-Signature: c7befmc6eud8uafh9fypsfimhyaow6pw X-HE-Tag: 1672913961-106459 X-HE-Meta: U2FsdGVkX19sm8O/YwZkCaSfR0OOBNkeh804hIh+qM2eYVlj7MyIhcDYt+qnaLAYvXB8XKsQDkaLHEIAJx0TcjtVPsDd6q484ZUiCCydrjBYCxHGqez3UH6ruTmJaeauQCSzsnXh436KZ8ocGOgYLjvZxZZKEnI6YLWyM7FodRwOcNx98dkMo3N5jIRk4fZSKpDP8AUb3RKYAu2a70SROwsrrn2Hy91eEK+JOz2A/JolWM0FfZOVygBqpD7Mcj3BjsCvS6ZUjkdQTs2Bm23Z9RQJpK3M30ShMrmhZnhhLmT5ZpzOKAr/50seLk1bnNPlNGHDtph8G2XInV6wNua3xuGHTHJ1iNCL65OaWTjrDCCUzmT8qG9CjwLf7Ghp4WgXpGmSUFnl1yWgmDIscLxuFLsnYPuvdLGHi3C5SK3J+vCFmYlAU8EaYayYti0nEpWDHIwGzMbp4It8AWe2/24EL9XlBgYFRROhNQ5P7VJGZMq9XEuBc5ZltTf0tTNH6wPlgdmjmPGdMG+UgVT+9xoJoVYzVmbaF17Jhehowi69WFFSBnOnqkvKtM9m+OwfyAcaKIysyrzNYQJuFO3P7wRknQ3nyFPNTVsDFCgcHRKvLDGrz+vs1dzpSv2X3aYeyr4vFVrJYO/RIGGTNn4fUo5HC0VBBWW4qQ4zmJPikebjVW4u9tVMEXb2gygCesSbPfQA7cHc71QrAJmZGyzYZXQQjKY3M58/DDSilnTcgor2yXaFqxJXdqtzacLyMOyW9Mj4Ltk/Ehlinzw6m+FADdD+3ENya9qM6KMY+4p0/q57um8LjInSrd9RmfMAbsjtvpKMCsGsFHVNQwS08CoiVRA6EymyUZ9eqFiLTQcb0u63aniMy790GcUGq/ppOBe++K5gbeDhIQ9JOTKRCn3HcpjEa26n+9G4xYbyejmSAqC3KC9k4l3kRl5j+ABmPfqAr6T/QRy06qsoohDRFKTqvxF npee6HLf 9Y2AclIbR5HJQGLS0fx66BfmRB7tsefSsN4cCJE4BnCv0ut7Nt2lMNfJFtGl+RIQJOcevikgKr7bZ1MR0CGQlRRprayzSeyu7diGzrkvqJA8iOjHWqd/+EU35K4JALI7ql1qT6wnr2v7z521b/9334sS+oxbTfq1SjjsghG8FMv5mk38SPUu9wGy1SbToW3wjg7+FiLY1gc0uV9FYbhuMUe8xIqOFdm5WBCGG1kbdQMwMPg2GS9eVZd5FW6OqEeMPgbcffZj8SAuc55/2MQCJB3QnF5ag7V8LfbtKLWf1mp9YeoBcmH/v9P9Ql6qY8udSU13tzNl7zm8YagvkaTx2j7w/wYmyemI21mcYZ94rZN8RZrOf/pIQonsUVpxBQFiG7Df8sYfCZA2/I4I= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Enlighten __unmap_hugepage_range to deal with high-granularity mappings. This doesn't change its API; it still must be called with hugepage alignment, but it will correctly unmap hugepages that have been mapped at high granularity. The rules for mapcount and refcount here are: 1. Refcount and mapcount are tracked on the head page. 2. Each page table mapping into some of an hpage will increase that hpage's mapcount and refcount by 1. Eventually, functionality here can be expanded to allow users to call MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is not done here. 
Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/asm-generic/tlb.h |  6 ++--
 mm/hugetlb.c              | 74 ++++++++++++++++++++++++---------------
 2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..31267471760e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -598,9 +598,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)		\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -609,7 +609,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a75833d7aba..dfd6c1491ac3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5384,10 +5384,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long last_addr_mask;
@@ -5397,35 +5397,33 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = hugetlb_walk(vma, address, sz);
-		if (!ptep) {
-			address |= last_addr_mask;
+
+	while (address < end) {
+		if (hugetlb_full_walk(&hpte, vma, address)) {
+			address = (address | last_addr_mask) + sz;
 			continue;
 		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
+
 		if (huge_pte_none(pte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
 		/*
@@ -5441,24 +5439,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+			/*
+			 * We raced with someone splitting out from under us.
+			 * Retry the walk.
+			 */
 			spin_unlock(ptl);
 			continue;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		hpage = compound_head(subpage);
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5468,25 +5477,32 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+		page_remove_rmap(hpage, vma, true);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
-		 * Bail out after unmapping reference page if supplied
+		 * Lower the reference count on the head page.
+		 */
+		tlb_remove_page_size(tlb, hpage, sz);
+		/*
+		 * Bail out after unmapping reference page if supplied,
+		 * and there's only one PTE mapping this page.
 		 */
-		if (ref_page)
+		if (ref_page && hugetlb_pte_size(&hpte) == sz)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 
 	tlb_end_vma(tlb, vma);
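For readers skimming the diff: the essential control-flow change is
the step size of the walk. Below is a condensed sketch of the new loop
(simplified from the hunks above, not a drop-in replacement;
hugetlb_full_walk() and hugetlb_pte_size() are introduced earlier in
this series):

/*
 * Condensed skeleton of the reworked walk in __unmap_hugepage_range().
 * The old loop stepped by huge_page_size(h) unconditionally; the new
 * one steps by the size of the PTE it actually found, so a hugepage
 * mapped at high granularity is unmapped piece by piece.
 */
while (address < end) {
	if (hugetlb_full_walk(&hpte, vma, address)) {
		/* No page table here: skip ahead to the next hugepage. */
		address = (address | last_addr_mask) + sz;
		continue;
	}

	/* ... lock, clear the entry, do rmap/refcount accounting ... */

	address += hugetlb_pte_size(&hpte);	/* may be less than sz */
}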