From patchwork Fri Jun 9 01:43:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 13273094 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1C3DC7EE23 for ; Fri, 9 Jun 2023 01:43:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 62FB38E0002; Thu, 8 Jun 2023 21:43:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5E0318E0001; Thu, 8 Jun 2023 21:43:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4820A8E0002; Thu, 8 Jun 2023 21:43:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 35CCF8E0001 for ; Thu, 8 Jun 2023 21:43:46 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 06ABCA03AE for ; Fri, 9 Jun 2023 01:43:46 +0000 (UTC) X-FDA: 80881512852.20.BB437AF Received: from mail-yb1-f173.google.com (mail-yb1-f173.google.com [209.85.219.173]) by imf07.hostedemail.com (Postfix) with ESMTP id 384014001A for ; Fri, 9 Jun 2023 01:43:43 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b="h7/lhgAB"; spf=pass (imf07.hostedemail.com: domain of hughd@google.com designates 209.85.219.173 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686275024; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=FcGt7OZAbNXtVRIJUB1ML8yOaRkBZfBXzWnOrC9xLDkb2dctTqir610k+MW9D5mPc+eyiN KU4CZg2Hyv644OF+UpHE8vkUL/uInwHkONz2OKidH3jucJDbrT3QOVep8Xj4ZWxA4kZPUx Xr5LjGreSTn/OxCIOd7aoRFZ7Q9Oq0g= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686275024; a=rsa-sha256; cv=none; b=XKNSxbPJvGGSE71DCuzQ2rDfn20YXFJzDojrLrqlw6YcOs6Fs5w8JI4YKUWtiItU/Aq0fG xVbdGmB9vkFkJCESih7Dt7ClQKsaci5Jw2awpbwdbqcKh3D6jufh2+Jqg/OhcBNWkipxE1 d6OUiIlVaporKa7WdjVgE0hC+9Ith7Q= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b="h7/lhgAB"; spf=pass (imf07.hostedemail.com: domain of hughd@google.com designates 209.85.219.173 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-yb1-f173.google.com with SMTP id 3f1490d57ef6-bacfb7acdb7so1342516276.0 for ; Thu, 08 Jun 2023 18:43:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686275023; x=1688867023; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=h7/lhgABcsEuN80ZtY1CkQPEyiRM02tdU0Fj4Bqu3XEJ6+oMt1JttOuF1+QhBlyKlU nItRcb1lhVcg3eXONIukosC/ovw61FcNIrky9ZYjzzwajHv6HsTcvoIBjY3VhzMNyh1o MljGnp6iHoyALEuaRGEygREAs7NiPhq7Ts0bzgZaUDe8a2nZJcIQ2+Lj1wkKmKdUDbGy NH5AIRGaz2lGcsHiKp41MEO/swZqDgq35wqrQHtww6/nIP6oqDLpTQPv1VdUEAb8WQlv b+6qKzgTWVc0T+xle5LZMl0lKQUz8B9zZKcS4HbNa7M3nikEfj9E/klPVWnDJWzUyVm/ ISKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686275023; x=1688867023; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NBl0ppCRA8l25HwCsBAYjZw+ZZ23COv4vEuhAo6MgQs=; b=aBI9UylUDNUS9PmI6MosqfhscjzYV3XZh2/yCb7QtMJ0tBmusagG3VIjegsqZwWo5N HEW2nYHHvTTxGzGNmnmmMnF8cPU8qSQ0Y4rHPJCQHvZElhcM+1UOoEJpUlXAubUsfRaJ OfE+ajnyDvGWc7ShS4Is4JJjXLQKiqkUSwWIOd4MkVVtORZyiQ7tZRO50endhdqi8x6p VtomuRp5TYGWYfzJp23CkUBj0XOmrOKmf+8gnWSi+0FpIoi6BaI9c4+/H/0RY3hbQ/Nr vCmkYNb0y9noP2Q2Ef5YWzKqRROiqpraAG1TAlBDgGVmwObppHJ55SYqs/sXDv7Ea3wo nDNA== X-Gm-Message-State: AC+VfDx7RVDJkzWk931Wk54ExanNzcXxa5+b3xZa1StWHBy7VowWXb5w tViNrjEGHh+k9uIQNxDrMgK/Ig== X-Google-Smtp-Source: ACHHUZ7uNPgy7ZBXRocyBo3vsfUWGTN3VMWhF4yrbhs6dEiMkK2v8+kmP/oHR3LauFiKONeXt7BBYg== X-Received: by 2002:a0d:f6c4:0:b0:55a:40d3:4d6f with SMTP id g187-20020a0df6c4000000b0055a40d34d6fmr1156326ywf.26.1686275023013; Thu, 08 Jun 2023 18:43:43 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a17-20020a81bb51000000b00545a08184fdsm281040ywl.141.2023.06.08.18.43.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 18:43:41 -0700 (PDT) Date: Thu, 8 Jun 2023 18:43:38 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Ryan Roberts , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 28/32] mm/memory: allow pte_offset_map[_lock]() to fail In-Reply-To: Message-ID: References: MIME-Version: 1.0 X-Stat-Signature: 8c1ff3kjcfzzirf9apa5qipxz5s35tz1 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 384014001A X-Rspam-User: X-HE-Tag: 1686275023-664687 X-HE-Meta: U2FsdGVkX1/aHTGM3fyQkgibezir8cMmCiOxEMIGF2V4koWxVgD9Y5+FomZgyyjplyOaCjr9yWgRGTSsNj44J4fQuErwNjWB1zrnUAN/Y5aVaT3ZwMKeBdtZd+h6EsFQhc15BEW3+AnN7nevDoo1nCOVgxNQmnWWrGaQ+o2WuntYqTLq9fz4linU/oPzIzGUHIkLGSd0XnhvDJ7SQXrXjkJQPS83b1SVUN+OpcBQmNI+xEzQ/hHso+rpAng9aPvZqzDknUuCutO/43ahGmds+n6pGC76VCJ3788sasWl4CLZt/3fx3+CqK2klcanWRm19cRf8GktdN4oNOO2IpZa6smLHXCCoYXeo/9HwrOPZodCWc0AfznDEPZvWszBcrkA4L4Wm96Z9DhtOh0BujfYTLrouCwypz4ctwJeQeZK3vLTkQWM1VtoZamsfgO3YZX0wxtaQ+UM1TNwIbMzp7d+g8LbHk852K8Ag/Ho+tPpUXquQ38+HfbuXN1Lxxj+aHK7CGtxI1tWl4BZgB4Asnqw3d9KBQxtPgzHe0RjsSWbqkDy3Oph7pHaXmCV28TEjEfpSFxRinZk8nbXHVNMUV195gN7fSyjkXTx38+8B3HAa8nw8AYvugGFw56nYAf7ZyXBDelbY2xj2rtvuB5alOM24+60x9T3ewWlYRP+dRjUZbP2LHW8tKPXLQaXJLDZ+rajU9maaDbYsxAewPXQ5Cf/bNZdINButwoUwkGUjyHlH3pEapDXg3RWczqA3qMLqOoNImoXbymICNkcoj9ZfYVF3iavXXaTbywLDyv6XOGdHWGH5cHT4vh52qYpVpxprImQJFu/TmRPSjzUgtZYi7CkIvhM00tEcQGj507QrnbAPgWZmz72yT714CgpmAQswNB41WbMulvJcQBOQbocQkBI2bL36LtPHbqHwcpftVm57nHwm228oycBs7fkqvXQFHG6L/EBu1JgCaChN8BKnsW OibsuV46 AeY5nzJWwNwgp6RvCF5QcuwFoczvGtDa+H8EfQ//Gy7+DF7TCBMZ157WPjUcgNJxQow4oLX25WzVxK7gDAJAy81uWdBpR1QVn0D9GmcHHyrKpFKfjgZdfg+ntKaflSC6t2bTVCmoXJO5FdeL3dbB4J1T2WYP3TCiENWIWrjtQUF41prjnDhtmi/WMkYgqNPjFR8z+tgi3UPyKt88JTMyQVCbBSvqRveGGAfVyzob8yHMrzPXWopk5mAF/1AZRudw7fUEcQWSjlcNaemF2runZrdiXOJUG2iJCJd4VTvoNBx+5gXhLJScmWT9qRQ8s1Y5yHGuZ0bhT/oFZ2lIHSj4kXSeIK6EKYCtOkFPvhpVoEUQ7YF4ZsBjXO/COsF8E1WgXXHLjTbX8QCuR86JgFlSV3HUi6Y2nuOSPSXws/hmA6+bBybNF4zUeN4ARxQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail; but with a comment on some further assumptions that are being made there. zap_pte_range() and zap_pmd_range(): adjust their interaction so that a pte_offset_map_lock() failure in zap_pte_range() leads to a retry in zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad(). Allow pte_offset_map_lock() to fail in many functions. Update comment on calling pte_alloc() in do_anonymous_page(). Remove redundant calls to pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range() and copy_pmd_range(), those do simplify the next level down. Signed-off-by: Hugh Dickins --- mm/memory.c | 172 +++++++++++++++++++++++++--------------------------- 1 file changed, 82 insertions(+), 90 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 2eb54c0d5d3c..c7b920291a72 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1012,13 +1012,25 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, progress = 0; init_rss_vec(rss); + /* + * copy_pmd_range()'s prior pmd_none_or_clear_bad(src_pmd), and the + * error handling here, assume that exclusive mmap_lock on dst and src + * protects anon from unexpected THP transitions; with shmem and file + * protected by mmap_lock-less collapse skipping areas with anon_vma + * (whereas vma_needs_copy() skips areas without anon_vma). A rework + * can remove such assumptions later, but this is good enough for now. + */ dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl); if (!dst_pte) { ret = -ENOMEM; goto out; } - src_pte = pte_offset_map(src_pmd, addr); - src_ptl = pte_lockptr(src_mm, src_pmd); + src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); + if (!src_pte) { + pte_unmap_unlock(dst_pte, dst_ptl); + /* ret == 0 */ + goto out; + } spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); orig_src_pte = src_pte; orig_dst_pte = dst_pte; @@ -1083,8 +1095,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); - spin_unlock(src_ptl); - pte_unmap(orig_src_pte); + pte_unmap_unlock(orig_src_pte, src_ptl); add_mm_rss_vec(dst_mm, rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); cond_resched(); @@ -1388,10 +1399,11 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, swp_entry_t entry; tlb_change_page_size(tlb, PAGE_SIZE); -again: init_rss_vec(rss); - start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl); - pte = start_pte; + start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return addr; + flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { @@ -1507,17 +1519,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, * If we forced a TLB flush (either due to running out of * batch buffers or because we needed to flush dirty TLB * entries before releasing the ptl), free the batched - * memory too. Restart if we didn't do everything. + * memory too. Come back again if we didn't do everything. */ - if (force_flush) { - force_flush = 0; + if (force_flush) tlb_flush_mmu(tlb); - } - - if (addr != end) { - cond_resched(); - goto again; - } return addr; } @@ -1536,8 +1541,10 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb, if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { if (next - addr != HPAGE_PMD_SIZE) __split_huge_pmd(vma, pmd, addr, false, NULL); - else if (zap_huge_pmd(tlb, vma, pmd, addr)) - goto next; + else if (zap_huge_pmd(tlb, vma, pmd, addr)) { + addr = next; + continue; + } /* fall through */ } else if (details && details->single_folio && folio_test_pmd_mappable(details->single_folio) && @@ -1550,20 +1557,14 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb, */ spin_unlock(ptl); } - - /* - * Here there can be other concurrent MADV_DONTNEED or - * trans huge page faults running, and if the pmd is - * none or trans huge it can change under us. This is - * because MADV_DONTNEED holds the mmap_lock in read - * mode. - */ - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - goto next; - next = zap_pte_range(tlb, vma, pmd, addr, next, details); -next: - cond_resched(); - } while (pmd++, addr = next, addr != end); + if (pmd_none(*pmd)) { + addr = next; + continue; + } + addr = zap_pte_range(tlb, vma, pmd, addr, next, details); + if (addr != next) + pmd--; + } while (pmd++, cond_resched(), addr != end); return addr; } @@ -1905,6 +1906,10 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr, const int batch_size = min_t(int, pages_to_write_in_pmd, 8); start_pte = pte_offset_map_lock(mm, pmd, addr, &pte_lock); + if (!start_pte) { + ret = -EFAULT; + goto out; + } for (pte = start_pte; pte_idx < batch_size; ++pte, ++pte_idx) { int err = insert_page_in_batch_locked(vma, pte, addr, pages[curr_page_idx], prot); @@ -2572,10 +2577,10 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, mapped_pte = pte = (mm == &init_mm) ? pte_offset_kernel(pmd, addr) : pte_offset_map_lock(mm, pmd, addr, &ptl); + if (!pte) + return -EINVAL; } - BUG_ON(pmd_huge(*pmd)); - arch_enter_lazy_mmu_mode(); if (fn) { @@ -2804,7 +2809,6 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, int ret; void *kaddr; void __user *uaddr; - bool locked = false; struct vm_area_struct *vma = vmf->vma; struct mm_struct *mm = vma->vm_mm; unsigned long addr = vmf->address; @@ -2830,12 +2834,12 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, * On architectures with software "accessed" bits, we would * take a double page fault, so mark it accessed here. */ + vmf->pte = NULL; if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) { pte_t entry; vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked = true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* * Other thread has already handled the fault * and update local tlb only @@ -2857,13 +2861,12 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, * zeroes. */ if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) { - if (locked) + if (vmf->pte) goto warn; /* Re-validate under PTL if the page is still mapped */ vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - locked = true; - if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) { /* The PTE changed under us, update local tlb */ update_mmu_tlb(vma, addr, vmf->pte); ret = -EAGAIN; @@ -2888,7 +2891,7 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, ret = 0; pte_unlock: - if (locked) + if (vmf->pte) pte_unmap_unlock(vmf->pte, vmf->ptl); kunmap_atomic(kaddr); flush_dcache_page(dst); @@ -3110,7 +3113,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * Re-check the pte - we dropped the lock */ vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) { + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) { if (old_folio) { if (!folio_test_anon(old_folio)) { dec_mm_counter(mm, mm_counter_file(&old_folio->page)); @@ -3178,19 +3181,20 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) /* Free the old page.. */ new_folio = old_folio; page_copied = 1; - } else { + pte_unmap_unlock(vmf->pte, vmf->ptl); + } else if (vmf->pte) { update_mmu_tlb(vma, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); } - if (new_folio) - folio_put(new_folio); - - pte_unmap_unlock(vmf->pte, vmf->ptl); /* * No need to double call mmu_notifier->invalidate_range() callback as * the above ptep_clear_flush_notify() did already call it. */ mmu_notifier_invalidate_range_only_end(&range); + + if (new_folio) + folio_put(new_folio); if (old_folio) { if (page_copied) free_swap_cache(&old_folio->page); @@ -3230,6 +3234,8 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf) WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED)); vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; /* * We might have raced with another page fault while we released the * pte_offset_map_lock. @@ -3591,10 +3597,11 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte); - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); folio_unlock(folio); folio_put(folio); @@ -3625,6 +3632,8 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf) { vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return 0; /* * Be careful so that we will only recover a special uffd-wp pte into a * none pte. Otherwise it means the pte could have changed, so retry. @@ -3728,11 +3737,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) vmf->page = pfn_swap_entry_to_page(entry); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { - spin_unlock(vmf->ptl); - goto out; - } - + if (unlikely(!vmf->pte || + !pte_same(*vmf->pte, vmf->orig_pte))) + goto unlock; /* * Get a page reference while we know the page can't be * freed. @@ -3807,7 +3814,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) + if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) ret = VM_FAULT_OOM; goto unlock; } @@ -3877,7 +3884,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) + if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) goto out_nomap; if (unlikely(!folio_test_uptodate(folio))) { @@ -4003,13 +4010,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out: if (si) put_swap_device(si); return ret; out_nomap: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); out_page: folio_unlock(folio); out_release: @@ -4041,22 +4050,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) return VM_FAULT_SIGBUS; /* - * Use pte_alloc() instead of pte_alloc_map(). We can't run - * pte_offset_map() on pmds where a huge pmd might be created - * from a different thread. - * - * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when - * parallel threads are excluded by other means. - * - * Here we only have mmap_read_lock(mm). + * Use pte_alloc() instead of pte_alloc_map(), so that OOM can + * be distinguished from a transient failure of pte_offset_map(). */ if (pte_alloc(vma->vm_mm, vmf->pmd)) return VM_FAULT_OOM; - /* See comment in handle_pte_fault() */ - if (unlikely(pmd_trans_unstable(vmf->pmd))) - return 0; - /* Use the zero-page for reads */ if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm)) { @@ -4064,6 +4063,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) vma->vm_page_prot)); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto unlock; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto unlock; @@ -4104,6 +4105,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + goto release; if (vmf_pte_changed(vmf)) { update_mmu_tlb(vma, vmf->address, vmf->pte); goto release; @@ -4131,7 +4134,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); unlock: - pte_unmap_unlock(vmf->pte, vmf->ptl); + if (vmf->pte) + pte_unmap_unlock(vmf->pte, vmf->ptl); return ret; release: folio_put(folio); @@ -4380,15 +4384,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return VM_FAULT_OOM; } - /* - * See comment in handle_pte_fault() for how this scenario happens, we - * need to return NOPAGE so that we drop this page. - */ - if (pmd_devmap_trans_unstable(vmf->pmd)) - return VM_FAULT_NOPAGE; - vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + if (!vmf->pte) + return VM_FAULT_NOPAGE; /* Re-check under ptl */ if (likely(!vmf_pte_changed(vmf))) { @@ -4630,17 +4629,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf) * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ if (!vma->vm_ops->fault) { - /* - * If we find a migration pmd entry or a none pmd entry, which - * should never happen, return SIGBUS - */ - if (unlikely(!pmd_present(*vmf->pmd))) + vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) ret = VM_FAULT_SIGBUS; else { - vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, - vmf->pmd, - vmf->address, - &vmf->ptl); /* * Make sure this is not a temporary clearing of pte * by holding ptl and checking again. A R/M/W update @@ -5429,10 +5422,9 @@ int follow_pte(struct mm_struct *mm, unsigned long address, pmd = pmd_offset(pud, address); VM_BUG_ON(pmd_trans_huge(*pmd)); - if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) - goto out; - ptep = pte_offset_map_lock(mm, pmd, address, ptlp); + if (!ptep) + goto out; if (!pte_present(*ptep)) goto unlock; *ptepp = ptep;