From patchwork Sun Apr 28 19:01:50 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13646093
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Christophe Leroy, peterx@redhat.com, David Hildenbrand, Andrew Morton, "Aneesh Kumar K. V", Lorenzo Stoakes, John Hubbard, linuxppc-dev@lists.ozlabs.org, Muchun Song, Jason Gunthorpe
Subject: [PATCH 1/2] mm/gup: Fix hugepd handling in hugetlb rework
Date: Sun, 28 Apr 2024 15:01:50 -0400
Message-ID: <20240428190151.201002-2-peterx@redhat.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240428190151.201002-1-peterx@redhat.com>
References: <20240428190151.201002-1-peterx@redhat.com>
Commit a12083d721d7 added hugepd handling for gup-slow, reusing the gup-fast functions. follow_hugepd() correctly took the vma pointer in; however, it was overlooked that the pointer was never passed down into the lower functions.

The issue is that gup_fast_hugepte() uses the vma pointer to make the correct decision on whether an unshare is needed for a FOLL_PIN|FOLL_LONGTERM request. Without the vma pointer it will constantly return "true" (unshare needed) for a page-cache page, even though in the MAP_SHARED case unsharing would be wrong.

The other problem is that even when an unshare is needed, it now returns 0 rather than -EMLINK, which will not trigger the follow-up FAULT_FLAG_UNSHARE fault. That will need to be fixed too for the cases where an unshare is wanted.

The gup_longterm test didn't expose this issue in the past because it didn't yet test R/O unshare in this case; a separate patch will enable that in future tests.

Fix it by passing the vma correctly down to the bottom, renaming gup_fast_hugepte() back to gup_hugepte() since it is shared between the fast and slow paths, and also allowing -EMLINK to be returned properly by gup_hugepte(), even though gup-fast will treat it the same as zero.
Reported-by: David Hildenbrand
Fixes: a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
Signed-off-by: Peter Xu
Reviewed-by: David Hildenbrand
---
Note: The target commit to be fixed should have just been moved into mm-stable, so no need to cc stable.
---
 mm/gup.c | 64 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2f7baf96f655..ca0f5cedce9b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -525,9 +525,17 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 	return (__boundary - 1 < end - 1) ? __boundary : end;
 }
 
-static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
-		unsigned long end, unsigned int flags, struct page **pages,
-		int *nr)
+/*
+ * Returns 1 if succeeded, 0 if failed, -EMLINK if unshare needed.
+ *
+ * NOTE: for the same entry, gup-fast and gup-slow can return different
+ * results (0 v.s. -EMLINK) depending on whether vma is available.  This is
+ * the expected behavior, where we simply want gup-fast to fallback to
+ * gup-slow to take the vma reference first.
+ */
+static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz,
+		       unsigned long addr, unsigned long end, unsigned int flags,
+		       struct page **pages, int *nr)
 {
 	unsigned long pte_end;
 	struct page *page;
@@ -559,9 +567,9 @@ static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		return 0;
 	}
 
-	if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
+	if (!pte_write(pte) && gup_must_unshare(vma, flags, &folio->page)) {
 		gup_put_folio(folio, refs, flags);
-		return 0;
+		return -EMLINK;
 	}
 
 	*nr += refs;
@@ -577,19 +585,22 @@ static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
  * of the other folios. See writable_file_mapping_allowed() and
  * gup_fast_folio_allowed() for more information.
  */
-static int gup_fast_hugepd(hugepd_t hugepd, unsigned long addr,
-		unsigned int pdshift, unsigned long end, unsigned int flags,
-		struct page **pages, int *nr)
+static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+		      unsigned long addr, unsigned int pdshift,
+		      unsigned long end, unsigned int flags,
+		      struct page **pages, int *nr)
 {
 	pte_t *ptep;
 	unsigned long sz = 1UL << hugepd_shift(hugepd);
 	unsigned long next;
+	int ret;
 
 	ptep = hugepte_offset(hugepd, addr, pdshift);
 	do {
 		next = hugepte_addr_end(addr, end, sz);
-		if (!gup_fast_hugepte(ptep, sz, addr, end, flags, pages, nr))
-			return 0;
+		ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr);
+		if (ret != 1)
+			return ret;
 	} while (ptep++, addr = next, addr != end);
 
 	return 1;
@@ -613,22 +624,25 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
 	h = hstate_vma(vma);
 	ptep = hugepte_offset(hugepd, addr, pdshift);
 	ptl = huge_pte_lock(h, vma->vm_mm, ptep);
-	ret = gup_fast_hugepd(hugepd, addr, pdshift, addr + PAGE_SIZE,
-			      flags, &page, &nr);
+	ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE,
+			 flags, &page, &nr);
 	spin_unlock(ptl);
 
-	if (ret) {
+	if (ret == 1) {
+		/* GUP succeeded */
 		WARN_ON_ONCE(nr != 1);
 		ctx->page_mask = (1U << huge_page_order(h)) - 1;
 		return page;
 	}
 
-	return NULL;
+	/* ret can be either 0 (translates to NULL) or negative */
+	return ERR_PTR(ret);
 }
 #else /* CONFIG_ARCH_HAS_HUGEPD */
-static inline int gup_fast_hugepd(hugepd_t hugepd, unsigned long addr,
-		unsigned int pdshift, unsigned long end, unsigned int flags,
-		struct page **pages, int *nr)
+static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
+			     unsigned long addr, unsigned int pdshift,
+			     unsigned long end, unsigned int flags,
+			     struct page **pages, int *nr)
 {
 	return 0;
 }
@@ -3261,8 +3275,8 @@ static int gup_fast_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
 			 * architecture have different format for hugetlbfs
 			 * pmd format and THP pmd format
 			 */
-			if (!gup_fast_hugepd(__hugepd(pmd_val(pmd)), addr,
-					     PMD_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr,
+				       PMD_SHIFT, next, flags, pages, nr) != 1)
 				return 0;
 		} else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags,
 					       pages, nr))
@@ -3291,8 +3305,8 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr,
 				       pages, nr))
 				return 0;
 		} else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) {
-			if (!gup_fast_hugepd(__hugepd(pud_val(pud)), addr,
-					     PUD_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr,
+				       PUD_SHIFT, next, flags, pages, nr) != 1)
 				return 0;
 		} else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags,
 					       pages, nr))
@@ -3318,8 +3332,8 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
 			return 0;
 		BUILD_BUG_ON(p4d_leaf(p4d));
 		if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
-			if (!gup_fast_hugepd(__hugepd(p4d_val(p4d)), addr,
-					     P4D_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr,
+				       P4D_SHIFT, next, flags, pages, nr) != 1)
 				return 0;
 		} else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags,
 					       pages, nr))
@@ -3347,8 +3361,8 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end,
 				      pages, nr))
 				return;
 		} else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) {
-			if (!gup_fast_hugepd(__hugepd(pgd_val(pgd)), addr,
-					     PGDIR_SHIFT, next, flags, pages, nr))
+			if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr,
+				       PGDIR_SHIFT, next, flags, pages, nr) != 1)
 				return;
 		} else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags,
 					       pages, nr))