From patchwork Tue Apr 30 13:13:03 2024
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Muchun Song, John Hubbard, David Hildenbrand, Andrew Morton, peterx@redhat.com, Jason Gunthorpe, Lorenzo Stoakes, linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V, Christophe Leroy
Subject: [PATCH v2] mm/gup: Fix hugepd handling in hugetlb rework
Date: Tue, 30 Apr 2024 09:13:03 -0400
Message-ID: <20240430131303.264331-1-peterx@redhat.com>
Commit a12083d721d7 added hugepd handling for gup-slow, reusing the gup-fast
functions. follow_hugepd() correctly took the vma pointer in, but it was
overlooked that the pointer was never passed down into the lower functions.

The issue is that gup_fast_hugepte() uses the vma pointer to make the correct
decision on whether an unshare is needed for a FOLL_PIN|FOLL_LONGTERM request.
Without the vma pointer it will constantly return "true" (an unshare is
needed) for a page-cache page, even though in the SHARED case unsharing would
be wrong.

The other problem is that even when an unshare is needed, it currently
returns 0 rather than -EMLINK, which will not trigger a follow-up
FAULT_FLAG_UNSHARE fault. That also needs to be fixed when the unshare is
wanted.

The gup_longterm test did not expose this issue in the past because it did
not yet test R/O unshare in this case; a separate patch will enable that in
future tests.

Fix it by passing the vma correctly down to the bottom, renaming
gup_fast_hugepte() back to gup_hugepte() since it is shared between the
fast/slow paths, and also allowing -EMLINK to be returned properly by
gup_hugepte() even though gup-fast treats it the same as zero.

Reported-by: David Hildenbrand
Fixes: a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
---
v1: https://lore.kernel.org/r/20240428190151.201002-1-peterx@redhat.com

This is v2; it drops the second (test) patch, as a better one can come
later. This patch itself is kept untouched, with David's R-b added.

Should apply to both mm-stable and mm-unstable. The target commit to be
fixed has just been moved into mm-stable, so there is no need to cc stable.
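For reviewers less familiar with the unshare logic: the decision described
above can be sketched as a minimal userspace model. Everything below
(model_must_unshare, vma_model, the *_MODEL flag values) is an invented
stand-in for illustration only, not the kernel's gup_must_unshare():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names and values here are invented stand-ins, not kernel definitions. */
#define FOLL_PIN_MODEL      0x1
#define FOLL_LONGTERM_MODEL 0x2

struct vma_model {
	bool is_shared_file;	/* models a MAP_SHARED page-cache mapping */
};

/*
 * Model of the decision gup_must_unshare() has to make: a R/O long-term pin
 * may need a COW unshare, but a SHARED page-cache mapping never does.
 * Without the vma the code can only answer conservatively ("yes"), which is
 * the over-eager unshare behavior this patch fixes by plumbing the vma down.
 */
static bool model_must_unshare(const struct vma_model *vma, unsigned int flags)
{
	unsigned int longterm_pin = FOLL_PIN_MODEL | FOLL_LONGTERM_MODEL;

	if ((flags & longterm_pin) != longterm_pin)
		return false;
	if (!vma)
		return true;		/* no vma: conservative "yes" */
	return !vma->is_shared_file;	/* SHARED mapping: never unshare */
}
```

With a NULL vma the model always says "unshare", mirroring the wrong answer
the pre-fix code gave for shared page cache; with the vma available it can
say "no" for the SHARED case.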
---
 mm/gup.c | 64 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2f7baf96f655..ca0f5cedce9b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -525,9 +525,17 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 	return (__boundary - 1 < end - 1) ? __boundary : end;
 }
 
-static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
-		unsigned long end, unsigned int flags, struct page **pages,
-		int *nr)
+/*
+ * Returns 1 if succeeded, 0 if failed, -EMLINK if unshare needed.
+ *
+ * NOTE: for the same entry, gup-fast and gup-slow can return different
+ * results (0 v.s. -EMLINK) depending on whether vma is available.  This is
+ * the expected behavior, where we simply want gup-fast to fallback to
+ * gup-slow to take the vma reference first.
+ */
+static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz,
+		       unsigned long addr, unsigned long end, unsigned int flags,
+		       struct page **pages, int *nr)
 {
 	unsigned long pte_end;
 	struct page *page;
@@ -559,9 +567,9 @@ static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		return 0;
 	}
 
-	if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
+	if (!pte_write(pte) && gup_must_unshare(vma, flags, &folio->page)) {
 		gup_put_folio(folio, refs, flags);
-		return 0;
+		return -EMLINK;
 	}
 
 	*nr += refs;
@@ -577,19 +585,22 @@ static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
  * of the other folios. See writable_file_mapping_allowed() and
  * gup_fast_folio_allowed() for more information.
  */
-static int gup_fast_hugepd(hugepd_t hugepd, unsigned long addr,
-		unsigned int pdshift, unsigned long end, unsigned int flags,
-		struct page **pages, int *nr)
+static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+		      unsigned long addr, unsigned int pdshift,
+		      unsigned long end, unsigned int flags,
+		      struct page **pages, int *nr)
 {
 	pte_t *ptep;
 	unsigned long sz = 1UL << hugepd_shift(hugepd);
 	unsigned long next;
+	int ret;
 
 	ptep = hugepte_offset(hugepd, addr, pdshift);
 	do {
 		next = hugepte_addr_end(addr, end, sz);
-		if (!gup_fast_hugepte(ptep, sz, addr, end, flags, pages, nr))
-			return 0;
+		ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr);
+		if (ret != 1)
+			return ret;
 	} while (ptep++, addr = next, addr != end);
 
 	return 1;
@@ -613,22 +624,25 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
 	h = hstate_vma(vma);
 	ptep = hugepte_offset(hugepd, addr, pdshift);
 	ptl = huge_pte_lock(h, vma->vm_mm, ptep);
-	ret = gup_fast_hugepd(hugepd, addr, pdshift, addr + PAGE_SIZE,
-			      flags, &page, &nr);
+	ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE,
+			 flags, &page, &nr);
 	spin_unlock(ptl);
 
-	if (ret) {
+	if (ret == 1) {
+		/* GUP succeeded */
 		WARN_ON_ONCE(nr != 1);
 		ctx->page_mask = (1U << huge_page_order(h)) - 1;
 		return page;
 	}
 
-	return NULL;
+	/* ret can be either 0 (translates to NULL) or negative */
+	return ERR_PTR(ret);
 }
 #else /* CONFIG_ARCH_HAS_HUGEPD */
-static inline int gup_fast_hugepd(hugepd_t hugepd, unsigned long addr,
-		unsigned int pdshift, unsigned long end, unsigned int flags,
-		struct page **pages, int *nr)
+static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+			     unsigned long addr, unsigned int pdshift,
+			     unsigned long end, unsigned int flags,
+			     struct page **pages, int *nr)
 {
 	return 0;
 }
@@ -3261,8 +3275,8 @@ static int gup_fast_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
			 * architecture have different format for hugetlbfs
			 * pmd format and THP pmd format
			 */
-			if (!gup_fast_hugepd(__hugepd(pmd_val(pmd)), addr,
-					PMD_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr,
+				       PMD_SHIFT, next, flags, pages, nr) != 1)
				return 0;
		} else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags,
					       pages, nr))
@@ -3291,8 +3305,8 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr,
					  pages, nr))
				return 0;
		} else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) {
-			if (!gup_fast_hugepd(__hugepd(pud_val(pud)), addr,
-					PUD_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr,
+				       PUD_SHIFT, next, flags, pages, nr) != 1)
				return 0;
		} else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags,
					       pages, nr))
@@ -3318,8 +3332,8 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
			return 0;
		BUILD_BUG_ON(p4d_leaf(p4d));
		if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
-			if (!gup_fast_hugepd(__hugepd(p4d_val(p4d)), addr,
-					P4D_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr,
+				       P4D_SHIFT, next, flags, pages, nr) != 1)
				return 0;
		} else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags,
					       pages, nr))
@@ -3347,8 +3361,8 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end,
					pages, nr))
				return;
		} else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) {
-			if (!gup_fast_hugepd(__hugepd(pgd_val(pgd)), addr,
-					PGDIR_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr,
+				       PGDIR_SHIFT, next, flags, pages, nr) != 1)
				return;
		} else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags,
					       pages, nr))
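The tri-state return convention the patch introduces (1 = pinned, 0 = failed,
-EMLINK = unshare fault needed) and the way the two callers consume it can be
sketched as follows. All names here are hypothetical models, not the kernel
functions, and EMLINK_MODEL merely stands in for -EMLINK:

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's EMLINK errno value. */
#define EMLINK_MODEL 31

/*
 * Model of gup_hugepte()'s return convention after the patch:
 * 1 = page pinned, 0 = failed, -EMLINK = FAULT_FLAG_UNSHARE fault needed.
 */
static int model_gup_hugepte(int pte_writable, int must_unshare)
{
	if (!pte_writable && must_unshare)
		return -EMLINK_MODEL;
	return 1;
}

/*
 * gup-fast has no vma: any result other than 1 becomes a plain failure (0),
 * which makes its caller fall back to gup-slow.
 */
static int model_fast_caller(int ret)
{
	return ret == 1 ? 1 : 0;
}

/*
 * gup-slow propagates the negative error unchanged, so -EMLINK can trigger
 * the follow-up unshare fault.
 */
static int model_slow_caller(int ret)
{
	return ret;
}
```

This is why the same entry can legitimately yield 0 from gup-fast and
-EMLINK from gup-slow, as the comment added above gup_hugepte() notes.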