From patchwork Wed Jan 3 09:14:23 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13509776
From: peterx@redhat.com
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: James Houghton, David Hildenbrand, "Kirill A . Shutemov", Yang Shi,
 peterx@redhat.com, linux-riscv@lists.infradead.org, Andrew Morton,
 "Aneesh Kumar K . V", Rik van Riel, Andrea Arcangeli, Axel Rasmussen,
 Mike Rapoport, John Hubbard, Vlastimil Babka, Michael Ellerman,
 Christophe Leroy, Andrew Jones, linuxppc-dev@lists.ozlabs.org,
 Mike Kravetz, Muchun Song, linux-arm-kernel@lists.infradead.org,
 Jason Gunthorpe, Christoph Hellwig, Lorenzo Stoakes, Matthew Wilcox
Subject: [PATCH v2 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
Date: Wed, 3 Jan 2024 17:14:23 +0800
Message-ID: <20240103091423.400294-14-peterx@redhat.com>
In-Reply-To: <20240103091423.400294-1-peterx@redhat.com>
References: <20240103091423.400294-1-peterx@redhat.com>
MIME-Version: 1.0
From: Peter Xu <peterx@redhat.com>

Now follow_page() is ready to handle hugetlb pages in whatever form, and on all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in commit 4849807114b8.
There may be a slight difference in how the loops run when processing slow GUP over a large hugetlb range on archs that support cont_pte/cont_pmd: with this patch applied, each loop of __get_user_pages() resolves one pgtable entry, rather than relying on the size of the hugetlb hstate, which may cover multiple entries per loop.

A quick performance test on an aarch64 VM on an M1 chip shows a 15% degradation over a tight loop of slow GUP after the switch. That shouldn't be a problem, because slow GUP is not a hot path for GUP in general: when a page is present, fast GUP will already succeed, while when the page is indeed missing and requires a follow-up page fault, the slow GUP degradation will probably be buried in the fault paths anyway. It also explains why slow GUP for THP used to be very slow before 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") landed; that commit was not part of a performance analysis, so the speedup was a side benefit.

If performance becomes a concern, we can consider handling CONT_PTE in follow_page(). Until that is justified as necessary, keep everything clean and simple.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  7 ----
 mm/gup.c                | 15 +++------
 mm/hugetlb.c            | 71 -----------------------------------------
 3 files changed, 5 insertions(+), 88 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e8eddd51fc17..cdbb53407722 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -332,13 +332,6 @@ static inline void hugetlb_zap_end(
 {
 }
 
-static inline struct page *hugetlb_follow_page_mask(
-    struct vm_area_struct *vma, unsigned long address, unsigned int flags,
-    unsigned int *page_mask)
-{
-	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
-}
-
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
 					  struct mm_struct *src,
 					  struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index 245214b64108..4f8a3dc287c9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -997,18 +997,11 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 {
 	pgd_t *pgd, pgdval;
 	struct mm_struct *mm = vma->vm_mm;
+	struct page *page;
 
-	ctx->page_mask = 0;
-
-	/*
-	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
-	 * special hugetlb page table walking code.  This eliminates the
-	 * need to check for hugetlb entries in the general walking code.
-	 */
-	if (is_vm_hugetlb_page(vma))
-		return hugetlb_follow_page_mask(vma, address, flags,
-						&ctx->page_mask);
+	vma_pgtable_walk_begin(vma);
 
+	ctx->page_mask = 0;
 	pgd = pgd_offset(mm, address);
 	pgdval = *pgd;
 
@@ -1020,6 +1013,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	else
 		page = follow_p4d_mask(vma, address, pgd, flags, ctx);
 
+	vma_pgtable_walk_end(vma);
+
 	return page;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bfb52bb8b943..e13b4e038c2c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6782,77 +6782,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
-struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				      unsigned long address, unsigned int flags,
-				      unsigned int *page_mask)
-{
-	struct hstate *h = hstate_vma(vma);
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long haddr = address & huge_page_mask(h);
-	struct page *page = NULL;
-	spinlock_t *ptl;
-	pte_t *pte, entry;
-	int ret;
-
-	hugetlb_vma_lock_read(vma);
-	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
-	if (!pte)
-		goto out_unlock;
-
-	ptl = huge_pte_lock(h, mm, pte);
-	entry = huge_ptep_get(pte);
-	if (pte_present(entry)) {
-		page = pte_page(entry);
-
-		if (!huge_pte_write(entry)) {
-			if (flags & FOLL_WRITE) {
-				page = NULL;
-				goto out;
-			}
-
-			if (gup_must_unshare(vma, flags, page)) {
-				/* Tell the caller to do unsharing */
-				page = ERR_PTR(-EMLINK);
-				goto out;
-			}
-		}
-
-		page = nth_page(page, ((address & ~huge_page_mask(h)) >> PAGE_SHIFT));
-
-		/*
-		 * Note that page may be a sub-page, and with vmemmap
-		 * optimizations the page struct may be read only.
-		 * try_grab_page() will increase the ref count on the
-		 * head page, so this will be OK.
-		 *
-		 * try_grab_page() should always be able to get the page here,
-		 * because we hold the ptl lock and have verified pte_present().
-		 */
-		ret = try_grab_page(page, flags);
-
-		if (WARN_ON_ONCE(ret)) {
-			page = ERR_PTR(ret);
-			goto out;
-		}
-
-		*page_mask = (1U << huge_page_order(h)) - 1;
-	}
-out:
-	spin_unlock(ptl);
-out_unlock:
-	hugetlb_vma_unlock_read(vma);
-
-	/*
-	 * Fixup retval for dump requests: if pagecache doesn't exist,
-	 * don't try to allocate a new page but just skip it.
-	 */
-	if (!page && (flags & FOLL_DUMP) &&
-	    !hugetlbfs_pagecache_present(h, vma, address))
-		page = ERR_PTR(-EFAULT);
-
-	return page;
-}
-
 long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)