From patchwork Tue Dec 19 07:55:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13497980
From: peterx@redhat.com
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, Christophe Leroy, Lorenzo Stoakes,
 David Hildenbrand, Vlastimil Babka, Mike Kravetz, Mike Rapoport,
 Christoph Hellwig, John Hubbard, Andrew Jones,
 linux-arm-kernel@lists.infradead.org, Michael Ellerman,
 "Kirill A . Shutemov", linuxppc-dev@lists.ozlabs.org, Rik van Riel,
 linux-riscv@lists.infradead.org, Yang Shi, James Houghton,
 "Aneesh Kumar K . V", Andrew Morton, Jason Gunthorpe,
 Andrea Arcangeli, peterx@redhat.com, Axel Rasmussen
Subject: [PATCH 12/13] mm/gup: Handle hugepd for follow_page()
Date: Tue, 19 Dec 2023 15:55:37 +0800
Message-ID: <20231219075538.414708-13-peterx@redhat.com>
In-Reply-To: <20231219075538.414708-1-peterx@redhat.com>
References: <20231219075538.414708-1-peterx@redhat.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Peter Xu

Hugepd is so far only used on PowerPC, on 4K page size kernels where the
hash MMU is used.  follow_page_mask() used to leverage hugetlb APIs to
access hugepd entries.  Teach follow_page_mask() itself about hugepd.

With the previous refactors of the fast-gup gup_huge_pd(), most of the
code can be easily reused.  Some of it is not needed for follow page: for
example, gup_hugepte() tries to detect pgtable entry changes, which will
never happen with slow gup (which holds the pgtable lock), but the extra
check is harmless.

Since follow_page() only ever fetches one page, setting the end to
"address + PAGE_SIZE" should suffice.  We will still do the pgtable walk
once for each hugetlb page by setting ctx->page_mask properly.

One thing worth mentioning is that the _bad() helper at some pgtable
levels will report TRUE for is_hugepd() entries on Power8 hash MMUs.  I
think it applies at least to PUD on Power8 with 4K pgsize.  It means
feeding a hugepd entry to pud_bad() will report a false positive.  Let's
leave that for now because it can be arch-specific, which I am a bit
reluctant to touch.  In this patch it's not a problem as long as hugepd
is detected before any bad pgtable entries.

Signed-off-by: Peter Xu
---
 mm/gup.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 9 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 080dff79b650..14a7d13e7bd6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -30,6 +30,11 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
+static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+				  unsigned long addr, unsigned int pdshift,
+				  unsigned int flags,
+				  struct follow_page_context *ctx);
+
 static inline void sanity_check_pinned_pages(struct page **pages,
 					     unsigned long npages)
 {
@@ -871,6 +876,9 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags, address);
 	if (!pmd_present(pmdval))
 		return no_page_table(vma, flags, address);
+	if (unlikely(is_hugepd(__hugepd(pmd_val(pmdval)))))
+		return follow_hugepd(vma, __hugepd(pmd_val(pmdval)),
+				     address, PMD_SHIFT, flags, ctx);
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
 		page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
@@ -921,6 +929,9 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 	pud = *pudp;
 	if (pud_none(pud) || !pud_present(pud))
 		return no_page_table(vma, flags, address);
+	if (unlikely(is_hugepd(__hugepd(pud_val(pud)))))
+		return follow_hugepd(vma, __hugepd(pud_val(pud)),
+				     address, PUD_SHIFT, flags, ctx);
 	if (pud_huge(pud)) {
 		ptl = pud_lock(mm, pudp);
 		page = follow_huge_pud(vma, address, pudp, flags, ctx);
@@ -940,13 +951,17 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
 				    unsigned int flags,
 				    struct follow_page_context *ctx)
 {
-	p4d_t *p4d;
+	p4d_t *p4d, p4dval;
 
 	p4d = p4d_offset(pgdp, address);
-	if (p4d_none(*p4d))
-		return no_page_table(vma, flags, address);
-	BUILD_BUG_ON(p4d_huge(*p4d));
-	if (unlikely(p4d_bad(*p4d)))
+	p4dval = *p4d;
+	BUILD_BUG_ON(p4d_huge(p4dval));
+
+	if (unlikely(is_hugepd(__hugepd(p4d_val(p4dval)))))
+		return follow_hugepd(vma, __hugepd(p4d_val(p4dval)),
+				     address, P4D_SHIFT, flags, ctx);
+
+	if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
 		return no_page_table(vma, flags, address);
 
 	return follow_pud_mask(vma, address, p4d, flags, ctx);
@@ -980,7 +995,7 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int flags,
 			      struct follow_page_context *ctx)
 {
-	pgd_t *pgd;
+	pgd_t *pgd, pgdval;
 	struct mm_struct *mm = vma->vm_mm;
 
 	ctx->page_mask = 0;
@@ -995,11 +1010,17 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 					  &ctx->page_mask);
 
 	pgd = pgd_offset(mm, address);
+	pgdval = *pgd;
 
-	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
-		return no_page_table(vma, flags, address);
+	if (unlikely(is_hugepd(__hugepd(pgd_val(pgdval)))))
+		page = follow_hugepd(vma, __hugepd(pgd_val(pgdval)),
+				     address, PGDIR_SHIFT, flags, ctx);
+	else if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+		page = no_page_table(vma, flags, address);
+	else
+		page = follow_p4d_mask(vma, address, pgd, flags, ctx);
 
-	return follow_p4d_mask(vma, address, pgd, flags, ctx);
+	return page;
 }
 
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
@@ -3026,6 +3047,37 @@ static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
 
 	return 1;
 }
+
+static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+				  unsigned long addr, unsigned int pdshift,
+				  unsigned int flags,
+				  struct follow_page_context *ctx)
+{
+	struct page *page;
+	struct hstate *h;
+	spinlock_t *ptl;
+	int nr = 0, ret;
+	pte_t *ptep;
+
+	/* Only hugetlb supports hugepd */
+	if (WARN_ON_ONCE(!is_vm_hugetlb_page(vma)))
+		return ERR_PTR(-EFAULT);
+
+	h = hstate_vma(vma);
+	ptep = hugepte_offset(hugepd, addr, pdshift);
+	ptl = huge_pte_lock(h, vma->vm_mm, ptep);
+	ret = gup_huge_pd(hugepd, addr, pdshift, addr + PAGE_SIZE,
+			  flags, &page, &nr);
+	spin_unlock(ptl);
+
+	if (ret) {
+		WARN_ON_ONCE(nr != 1);
+		ctx->page_mask = (1U << huge_page_order(h)) - 1;
+		return page;
+	}
+
+	return NULL;
+}
 #else
 static inline int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
 		unsigned int pdshift, unsigned long end, unsigned int flags,
@@ -3033,6 +3085,14 @@ static inline int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
 {
 	return 0;
 }
+
+static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+				  unsigned long addr, unsigned int pdshift,
+				  unsigned int flags,
+				  struct follow_page_context *ctx)
+{
+	return NULL;
+}
 #endif /* CONFIG_ARCH_HAS_HUGEPD */
 
 static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
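
The commit message notes that the pgtable walk still happens once per hugetlb page because follow_hugepd() sets ctx->page_mask. The arithmetic behind that can be sketched in plain userspace C. This is only an illustration, not kernel code: the sketch_* names and SKETCH_PAGE_SHIFT are hypothetical, 4K base pages are assumed, and the step computation is modeled on how a slow-GUP caller is expected to consume page_mask rather than taken from this patch.

```c
#include <assert.h>

/* Assumption for illustration: 4K base pages, as on the kernels discussed above. */
#define SKETCH_PAGE_SHIFT 12

/* Mirrors "ctx->page_mask = (1U << huge_page_order(h)) - 1" from the patch. */
static unsigned int sketch_page_mask(unsigned int huge_page_order)
{
	return (1U << huge_page_order) - 1;
}

/*
 * Number of base pages from addr to the end of the huge page that
 * contains it.  A caller that advances by this many pages after one
 * lookup never walks the page tables twice for the same huge page.
 */
static unsigned long sketch_pages_left(unsigned long addr, unsigned int page_mask)
{
	/* A valid mask has the form 2^n - 1 (all low bits set). */
	assert((page_mask & (page_mask + 1)) == 0);

	return 1 + (~(addr >> SKETCH_PAGE_SHIFT) & page_mask);
}
```

For a 16M hugetlb page on a 4K kernel the order is 12, so the mask is 4095. An address at the start of the huge page has 4096 base pages left, and an address three base pages in has 4093. A mask of 0 (no huge page) always yields a step of one page.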