From patchwork Tue Jun 13 21:53:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13279239
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Andrea Arcangeli, John Hubbard, Mike Rapoport,
 David Hildenbrand, Vlastimil Babka, peterx@redhat.com,
 "Kirill A . Shutemov", Andrew Morton, Mike Kravetz, James Houghton,
 Hugh Dickins
Subject: [PATCH 7/7] mm/gup: Retire follow_hugetlb_page()
Date: Tue, 13 Jun 2023 17:53:46 -0400
Message-Id: <20230613215346.1022773-8-peterx@redhat.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230613215346.1022773-1-peterx@redhat.com>
References: <20230613215346.1022773-1-peterx@redhat.com>

Now __get_user_pages() should be well prepared to handle thp completely,
as well as hugetlb gup requests, even without hugetlb's special path.

Time to retire follow_hugetlb_page().

Tweak the comments in follow_page_mask() to reflect reality, by dropping
the "follow_page()" description.

Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  12 ---
 mm/gup.c                |  19 ----
 mm/hugetlb.c            | 223 ----------------------------------------
 3 files changed, 254 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0d6f389d98de..44e5836eed15 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -133,9 +133,6 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				unsigned long address, unsigned int flags,
 				unsigned int *page_mask);
-long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
-			 struct page **, unsigned long *, unsigned long *,
-			 long, unsigned int, int *);
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
@@ -305,15 +302,6 @@ static inline struct page *hugetlb_follow_page_mask(
 	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
 }
 
-static inline long follow_hugetlb_page(struct mm_struct *mm,
-			struct vm_area_struct *vma, struct page **pages,
-			unsigned long *position, unsigned long *nr_pages,
-			long i, unsigned int flags, int *nonblocking)
-{
-	BUG();
-	return 0;
-}
-
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
 					  struct mm_struct *src,
 					  struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index cdabc8ea783b..a65b80953b7a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -789,9 +789,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
 	 * special hugetlb page table walking code. This eliminates the
 	 * need to check for hugetlb entries in the general walking code.
-	 *
-	 * hugetlb_follow_page_mask is only for follow_page() handling here.
-	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
 	if (is_vm_hugetlb_page(vma))
 		return hugetlb_follow_page_mask(vma, address, flags,
@@ -1149,22 +1146,6 @@ static long __get_user_pages(struct mm_struct *mm,
 			ret = check_vma_flags(vma, gup_flags);
 			if (ret)
 				goto out;
-
-			if (is_vm_hugetlb_page(vma)) {
-				i = follow_hugetlb_page(mm, vma, pages,
-							&start, &nr_pages, i,
-							gup_flags, locked);
-				if (!*locked) {
-					/*
-					 * We've got a VM_FAULT_RETRY
-					 * and we've lost mmap_lock.
-					 * We must stop here.
-					 */
-					BUG_ON(gup_flags & FOLL_NOWAIT);
-					goto out;
-				}
-				continue;
-			}
 		}
 retry:
 		/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31d8f18bc2e4..b7ff413ff68b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6425,37 +6425,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
-static void record_subpages(struct page *page, struct vm_area_struct *vma,
-			    int refs, struct page **pages)
-{
-	int nr;
-
-	for (nr = 0; nr < refs; nr++) {
-		if (likely(pages))
-			pages[nr] = nth_page(page, nr);
-	}
-}
-
-static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
-					       bool *unshare)
-{
-	pte_t pteval = huge_ptep_get(pte);
-
-	*unshare = false;
-	if (is_swap_pte(pteval))
-		return true;
-	if (huge_pte_write(pteval))
-		return false;
-	if (flags & FOLL_WRITE)
-		return true;
-	if (gup_must_unshare(vma, flags, pte_page(pteval))) {
-		*unshare = true;
-		return true;
-	}
-	return false;
-}
-
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				unsigned long address, unsigned int flags,
 				unsigned int *page_mask)
@@ -6518,198 +6487,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	return page;
 }
 
-long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
-			 struct page **pages, unsigned long *position,
-			 unsigned long *nr_pages, long i, unsigned int flags,
-			 int *locked)
-{
-	unsigned long pfn_offset;
-	unsigned long vaddr = *position;
-	unsigned long remainder = *nr_pages;
-	struct hstate *h = hstate_vma(vma);
-	int err = -EFAULT, refs;
-
-	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
-		spinlock_t *ptl = NULL;
-		bool unshare = false;
-		int absent;
-		struct page *page;
-
-		/*
-		 * If we have a pending SIGKILL, don't keep faulting pages and
-		 * potentially allocating memory.
-		 */
-		if (fatal_signal_pending(current)) {
-			remainder = 0;
-			break;
-		}
-
-		hugetlb_vma_lock_read(vma);
-		/*
-		 * Some archs (sparc64, sh*) have multiple pte_ts to
-		 * each hugepage. We have to make sure we get the
-		 * first, for the page indexing below to work.
-		 *
-		 * Note that page table lock is not held when pte is null.
-		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
-
-		/*
-		 * When coredumping, it suits get_dump_page if we just return
-		 * an error where there's an empty slot with no huge pagecache
-		 * to back it. This way, we avoid allocating a hugepage, and
-		 * the sparse dumpfile avoids allocating disk blocks, but its
-		 * huge holes still show up with zeroes where they need to be.
-		 */
-		if (absent && (flags & FOLL_DUMP) &&
-		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			remainder = 0;
-			break;
-		}
-
-		/*
-		 * We need call hugetlb_fault for both hugepages under migration
-		 * (in which case hugetlb_fault waits for the migration,) and
-		 * hwpoisoned hugepages (in which case we need to prevent the
-		 * caller from accessing to them.) In order to do this, we use
-		 * here is_swap_pte instead of is_hugetlb_entry_migration and
-		 * is_hugetlb_entry_hwpoisoned. This is because it simply covers
-		 * both cases, and because we can't follow correct pages
-		 * directly from any kind of swap entries.
-		 */
-		if (absent ||
-		    __follow_hugetlb_must_fault(vma, flags, pte, &unshare)) {
-			vm_fault_t ret;
-			unsigned int fault_flags = 0;
-
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-
-			if (flags & FOLL_WRITE)
-				fault_flags |= FAULT_FLAG_WRITE;
-			else if (unshare)
-				fault_flags |= FAULT_FLAG_UNSHARE;
-			if (locked) {
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_KILLABLE;
-				if (flags & FOLL_INTERRUPTIBLE)
-					fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
-			}
-			if (flags & FOLL_NOWAIT)
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_RETRY_NOWAIT;
-			if (flags & FOLL_TRIED) {
-				/*
-				 * Note: FAULT_FLAG_ALLOW_RETRY and
-				 * FAULT_FLAG_TRIED can co-exist
-				 */
-				fault_flags |= FAULT_FLAG_TRIED;
-			}
-			ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
-			if (ret & VM_FAULT_ERROR) {
-				err = vm_fault_to_errno(ret, flags);
-				remainder = 0;
-				break;
-			}
-			if (ret & VM_FAULT_RETRY) {
-				if (locked &&
-				    !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-					*locked = 0;
-				*nr_pages = 0;
-				/*
-				 * VM_FAULT_RETRY must not return an
-				 * error, it will return zero
-				 * instead.
-				 *
-				 * No need to update "position" as the
-				 * caller will not check it after
-				 * *nr_pages is set to 0.
-				 */
-				return i;
-			}
-			continue;
-		}
-
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
-
-		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
-			       !PageAnonExclusive(page), page);
-
-		/*
-		 * If subpage information not requested, update counters
-		 * and skip the same_page loop below.
-		 */
-		if (!pages && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
-			spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			continue;
-		}
-
-		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
-		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
-
-		if (pages)
-			record_subpages(nth_page(page, pfn_offset),
-					vma, refs,
-					likely(pages) ? pages + i : NULL);
-
-		if (pages) {
-			/*
-			 * try_grab_folio() should always succeed here,
-			 * because: a) we hold the ptl lock, and b) we've just
-			 * checked that the huge page is present in the page
-			 * tables. If the huge page is present, then the tail
-			 * pages must also be present. The ptl prevents the
-			 * head page and tail pages from being rearranged in
-			 * any way. As this is hugetlb, the pages will never
-			 * be p2pdma or not longterm pinable. So this page
-			 * must be available at this point, unless the page
-			 * refcount overflowed:
-			 */
-			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
-							 flags))) {
-				spin_unlock(ptl);
-				hugetlb_vma_unlock_read(vma);
-				remainder = 0;
-				err = -ENOMEM;
-				break;
-			}
-		}
-
-		vaddr += (refs << PAGE_SHIFT);
-		remainder -= refs;
-		i += refs;
-
-		spin_unlock(ptl);
-		hugetlb_vma_unlock_read(vma);
-	}
-	*nr_pages = remainder;
-	/*
-	 * setting position is actually required only if remainder is
-	 * not zero but it's faster not to add a "if (remainder)"
-	 * branch.
-	 */
-	*position = vaddr;
-
-	return i ? i : err;
-}
-
 long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)