From patchwork Tue Dec 19 07:55:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13497946
From: peterx@redhat.com
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Christophe Leroy, Lorenzo Stoakes, David Hildenbrand,
	Vlastimil Babka, Mike Kravetz, Mike Rapoport, Christoph Hellwig,
	John Hubbard, Andrew Jones, linux-arm-kernel@lists.infradead.org,
	Michael Ellerman, "Kirill A . Shutemov", linuxppc-dev@lists.ozlabs.org,
	Rik van Riel, linux-riscv@lists.infradead.org, Yang Shi,
	James Houghton, "Aneesh Kumar K . V", Andrew Morton,
	Jason Gunthorpe, Andrea Arcangeli, peterx@redhat.com, Axel Rasmussen
Subject: [PATCH 10/13] mm/gup: Handle huge pud for follow_pud_mask()
Date: Tue, 19 Dec 2023 15:55:35 +0800
Message-ID: <20231219075538.414708-11-peterx@redhat.com>
In-Reply-To: <20231219075538.414708-1-peterx@redhat.com>
References: <20231219075538.414708-1-peterx@redhat.com>
MIME-Version: 1.0

From: Peter Xu

Teach follow_pud_mask() to be able to handle normal PUD pages like
hugetlb.

Rename follow_devmap_pud() to follow_huge_pud() so that it can process
either huge devmap or hugetlb.  Move it out of TRANSPARENT_HUGEPAGE_PUD
and huge_memory.c (which relies on CONFIG_THP).
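For illustration, the unified helper ends up with roughly the following
shape (a simplified sketch of the diff below, with the grab/page_mask
tail elided as comments; see the real implementation in the mm/gup.c
hunk):

	static struct page *follow_huge_pud(struct vm_area_struct *vma,
					    unsigned long addr, pud_t *pudp,
					    int flags, struct follow_page_context *ctx)
	{
		pud_t pud = *pudp;	/* snapshot the entry once */
		unsigned long pfn = pud_pfn(pud);

		if ((flags & FOLL_WRITE) && !pud_write(pud))
			return NULL;
		if (!pud_present(pud))
			return NULL;
		pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
		if (pud_devmap(pud)) {
			/* devmap-only part: FOLL_GET|FOLL_PIN required,
			 * touch_pud(), get_dev_pagemap() into ctx->pgmap */
		}
		/* common tail shared with hugetlb: unshare check,
		 * try_grab_page(), ctx->page_mask = HPAGE_PUD_NR - 1 */
		return pfn_to_page(pfn);
	}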
In the new follow_huge_pud(), take care of possible CoR for hugetlb if
necessary.  touch_pud() needs to be moved out of huge_memory.c to be
accessible from gup.c even if !THP.

While at it, optimize the non-present check by adding a pud_present()
early check before taking the pgtable lock, failing the follow_page()
early if the PUD is not present: that is required by both devmap and
hugetlb.  Use pud_huge() to also cover the pud_devmap() case.

One more trivial thing to mention: introduce a local "pud_t pud" in the
code paths along the way, so the code doesn't dereference *pudp multiple
times.  Not only because that looks less straightforward, but also
because if the dereference really happened more than once, it's not
clear whether there could be a race observing different *pudp values
when the entry is being modified at the same time.

Set ctx->page_mask properly for a PUD entry.  As a side effect, this
patch should also be able to optimize devmap GUP on PUD to jump over
the whole PUD range, but that is not yet verified.  Hugetlb could
already do so prior to this patch.

Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h |  8 -----
 mm/gup.c                | 70 +++++++++++++++++++++++++++++++++++++++--
 mm/huge_memory.c        | 47 ++-------------------------
 mm/internal.h           |  2 ++
 4 files changed, 71 insertions(+), 56 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d335130e145f..80f181d76f94 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -346,8 +346,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
-struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud, int flags, struct dev_pagemap **pgmap);
 
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
@@ -503,12 +501,6 @@ static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
 	return NULL;
 }
 
-static inline struct page *follow_devmap_pud(struct vm_area_struct *vma,
-	unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap)
-{
-	return NULL;
-}
-
 static inline bool thp_migration_supported(void)
 {
 	return false;
diff --git a/mm/gup.c b/mm/gup.c
index 97e87b7a15c3..5b14f91d2f6b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -525,6 +525,70 @@ static struct page *no_page_table(struct vm_area_struct *vma,
 	return NULL;
 }
 
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+				    unsigned long addr, pud_t *pudp,
+				    int flags, struct follow_page_context *ctx)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct page *page;
+	pud_t pud = *pudp;
+	unsigned long pfn = pud_pfn(pud);
+	int ret;
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+
+	if ((flags & FOLL_WRITE) && !pud_write(pud))
+		return NULL;
+
+	if (!pud_present(pud))
+		return NULL;
+
+	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	if (pud_devmap(pud)) {
+		/*
+		 * device mapped pages can only be returned if the caller
+		 * will manage the page reference count.
+		 *
+		 * At least one of FOLL_GET | FOLL_PIN must be set, so
+		 * assert that here:
+		 */
+		if (!(flags & (FOLL_GET | FOLL_PIN)))
+			return ERR_PTR(-EEXIST);
+
+		if (flags & FOLL_TOUCH)
+			touch_pud(vma, addr, pudp, flags & FOLL_WRITE);
+
+		ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap);
+		if (!ctx->pgmap)
+			return ERR_PTR(-EFAULT);
+	}
+#endif	/* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+	page = pfn_to_page(pfn);
+
+	if (!pud_devmap(pud) && !pud_write(pud) &&
+	    gup_must_unshare(vma, flags, page))
+		return ERR_PTR(-EMLINK);
+
+	ret = try_grab_page(page, flags);
+	if (ret)
+		page = ERR_PTR(ret);
+	else
+		ctx->page_mask = HPAGE_PUD_NR - 1;
+
+	return page;
+}
+#else  /* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+				    unsigned long addr, pud_t *pudp,
+				    int flags, struct follow_page_context *ctx)
+{
+	return NULL;
+}
+#endif	/* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
+
 static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, unsigned int flags)
 {
@@ -760,11 +824,11 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 
 	pudp = pud_offset(p4dp, address);
 	pud = *pudp;
-	if (pud_none(pud))
+	if (pud_none(pud) || !pud_present(pud))
 		return no_page_table(vma, flags, address);
-	if (pud_devmap(pud)) {
+	if (pud_huge(pud)) {
 		ptl = pud_lock(mm, pudp);
-		page = follow_devmap_pud(vma, address, pudp, flags, &ctx->pgmap);
+		page = follow_huge_pud(vma, address, pudp, flags, ctx);
 		spin_unlock(ptl);
 		if (page)
 			return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6be1a380a298..def1dbe0d7e8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1371,8 +1371,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 }
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
-		      pud_t *pud, bool write)
+void touch_pud(struct vm_area_struct *vma, unsigned long addr,
+	       pud_t *pud, bool write)
 {
 	pud_t _pud;
 
@@ -1384,49 +1384,6 @@ static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
 	update_mmu_cache_pud(vma, addr, pud);
 }
 
-struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud, int flags, struct dev_pagemap **pgmap)
-{
-	unsigned long pfn = pud_pfn(*pud);
-	struct mm_struct *mm = vma->vm_mm;
-	struct page *page;
-	int ret;
-
-	assert_spin_locked(pud_lockptr(mm, pud));
-
-	if (flags & FOLL_WRITE && !pud_write(*pud))
-		return NULL;
-
-	if (pud_present(*pud) && pud_devmap(*pud))
-		/* pass */;
-	else
-		return NULL;
-
-	if (flags & FOLL_TOUCH)
-		touch_pud(vma, addr, pud, flags & FOLL_WRITE);
-
-	/*
-	 * device mapped pages can only be returned if the
-	 * caller will manage the page reference count.
-	 *
-	 * At least one of FOLL_GET | FOLL_PIN must be set, so assert that here:
-	 */
-	if (!(flags & (FOLL_GET | FOLL_PIN)))
-		return ERR_PTR(-EEXIST);
-
-	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
-	*pgmap = get_dev_pagemap(pfn, *pgmap);
-	if (!*pgmap)
-		return ERR_PTR(-EFAULT);
-	page = pfn_to_page(pfn);
-
-	ret = try_grab_page(page, flags);
-	if (ret)
-		page = ERR_PTR(ret);
-
-	return page;
-}
-
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
 		  struct vm_area_struct *vma)
diff --git a/mm/internal.h b/mm/internal.h
index 222e63b2dea4..2fca14553d0f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1007,6 +1007,8 @@ int __must_check try_grab_page(struct page *page, unsigned int flags);
 /*
  * mm/huge_memory.c
  */
+void touch_pud(struct vm_area_struct *vma, unsigned long addr,
+	       pud_t *pud, bool write);
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmd,
 				   unsigned int flags);
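
As a footnote on the "pud_t pud" local mentioned in the commit message:
reading the entry once means every later check operates on a single
snapshot.  A standalone userspace sketch of that pattern (all names
below are invented for the example, not kernel code):

	/* Read a shared entry once; test the snapshot, never *pudp again. */
	#include <stdatomic.h>
	#include <stdio.h>

	typedef _Atomic unsigned long demo_pud_t;  /* stand-in for a pud entry */

	static long demo_follow(demo_pud_t *pudp)
	{
		unsigned long pud = atomic_load(pudp);	/* one snapshot */

		if (!(pud & 1))		/* bit 0 plays the "present" bit */
			return -1;
		/*
		 * Re-reading *pudp here instead of using "pud" could observe
		 * a different value if another thread modifies the entry
		 * between the checks.
		 */
		return (long)(pud >> 1);	/* remaining bits play the pfn */
	}

	int main(void)
	{
		demo_pud_t entry = (123UL << 1) | 1;
		printf("pfn: %ld\n", demo_follow(&entry));
		return 0;
	}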