From patchwork Mon Aug 5 12:55:05 2024
X-Patchwork-Submitter: Qi Zheng <zhengqi.arch@bytedance.com>
X-Patchwork-Id: 13753583
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
 muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org,
 zokeefe@google.com, rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return
 pmdval
Date: Mon, 5 Aug 2024 20:55:05 +0800
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Make pte_offset_map_nolock() return pmdval so that we can recheck the
*pmd once the lock is taken. This is a preparation for freeing empty
PTE pages; no functional changes are expected.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 Documentation/mm/split_page_table_lock.rst |  3 ++-
 arch/arm/mm/fault-armv.c                   |  2 +-
 arch/powerpc/mm/pgtable.c                  |  2 +-
 include/linux/mm.h                         |  4 ++--
 mm/filemap.c                               |  2 +-
 mm/khugepaged.c                            |  4 ++--
 mm/memory.c                                |  4 ++--
 mm/mremap.c                                |  2 +-
 mm/page_vma_mapped.c                       |  2 +-
 mm/pgtable-generic.c                       | 21 ++++++++++++---------
 mm/userfaultfd.c                           |  4 ++--
 mm/vmscan.c                                |  2 +-
 12 files changed, 28 insertions(+), 24 deletions(-)
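For review, a minimal sketch of the caller-side pattern this change is
preparing for: take the PTE lock, then use the returned pmdval to verify
that the page table was not freed or replaced in the meantime. The
function below is hypothetical and not part of this patch; it assumes
only the new pte_offset_map_nolock() signature introduced here plus the
existing pmd_same() and pmdp_get_lockless() helpers.

static bool example_walk(struct mm_struct *mm, pmd_t *pmd,
                         unsigned long addr)
{
        spinlock_t *ptl;
        pmd_t pmdval;
        pte_t *pte;

        pte = pte_offset_map_nolock(mm, pmd, &pmdval, addr, &ptl);
        if (!pte)
                return false;

        spin_lock(ptl);
        /* Recheck *pmd now that the lock is held. */
        if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
                /* The PTE page went away under us; have the caller retry. */
                pte_unmap_unlock(pte, ptl);
                return false;
        }

        /* ... operate on the PTE table while holding ptl ... */

        pte_unmap_unlock(pte, ptl);
        return true;
}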
diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index e4f6972eb6c04..e6a47d57531cd 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -18,7 +18,8 @@ There are helpers to lock/unlock a table and other accessor functions:
 	pointer to its PTE table lock, or returns NULL if no PTE table;
  - pte_offset_map_nolock()
 	maps PTE, returns pointer to PTE with pointer to its PTE table
-	lock (not taken), or returns NULL if no PTE table;
+	lock (not taken) and the value of its pmd entry, or returns NULL
+	if no PTE table;
  - pte_offset_map()
 	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
  - pte_unmap()
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 831793cd6ff94..db07e6a05eb6e 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,7 +117,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	 * must use the nested version. This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
+	pte = pte_offset_map_nolock(vma->vm_mm, pmd, NULL, address, &ptl);
 	if (!pte)
 		return 0;
 
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 7316396e452d8..9b67d2a1457ed 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -398,7 +398,7 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	 */
 	if (pmd_none(*pmd))
 		return;
-	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	pte = pte_offset_map_nolock(mm, pmd, NULL, addr, &ptl);
 	BUG_ON(!pte);
 	assert_spin_locked(ptl);
 	pte_unmap(pte);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43b40334e9b28..b1ef2afe620c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2937,8 +2937,8 @@ static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
-pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
-			     unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdvalp,
+			     unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/mm/filemap.c b/mm/filemap.c
index 67c3f5136db33..3285dffb64cf8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3231,7 +3231,7 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
 	if (!(vmf->flags & FAULT_FLAG_ORIG_PTE_VALID))
 		return 0;
 
-	ptep = pte_offset_map_nolock(vma->vm_mm, vmf->pmd, vmf->address,
+	ptep = pte_offset_map_nolock(vma->vm_mm, vmf->pmd, NULL, vmf->address,
 				     &vmf->ptl);
 	if (unlikely(!ptep))
 		return VM_FAULT_NOPAGE;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd1d8655a76b..91b93259ee214 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1009,7 +1009,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		};
 
 		if (!pte++) {
-			pte = pte_offset_map_nolock(mm, pmd, address, &ptl);
+			pte = pte_offset_map_nolock(mm, pmd, NULL, address, &ptl);
 			if (!pte) {
 				mmap_read_unlock(mm);
 				result = SCAN_PMD_NULL;
@@ -1598,7 +1598,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
 		pml = pmd_lock(mm, pmd);
 
-	start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
+	start_pte = pte_offset_map_nolock(mm, pmd, NULL, haddr, &ptl);
 	if (!start_pte)		/* mmap_lock + page lock should prevent this */
 		goto abort;
 	if (!pml)
diff --git a/mm/memory.c b/mm/memory.c
index d6a9dcddaca4a..afd8a967fb953 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1108,7 +1108,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
-	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
+	src_pte = pte_offset_map_nolock(src_mm, src_pmd, NULL, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
 		/* ret == 0 */
@@ -5671,7 +5671,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	 * it into a huge pmd: just retry later if so.
 	 */
 	vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
-					 vmf->address, &vmf->ptl);
+					 NULL, vmf->address, &vmf->ptl);
 	if (unlikely(!vmf->pte))
 		return 0;
 	vmf->orig_pte = ptep_get_lockless(vmf->pte);
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc6409..f672d0218a6fe 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,7 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
-	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
+	new_pte = pte_offset_map_nolock(mm, new_pmd, NULL, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
 		err = -EAGAIN;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae5cc42aa2087..507701b7bcc1e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -33,7 +33,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, spinlock_t **ptlp)
 	 * Though, in most cases, page lock already protects this.
 	 */
 	pvmw->pte = pte_offset_map_nolock(pvmw->vma->vm_mm, pvmw->pmd,
-					  pvmw->address, ptlp);
+					  NULL, pvmw->address, ptlp);
 	if (!pvmw->pte)
 		return false;
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index a78a4adf711ac..443e3b34434a5 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -305,7 +305,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	return NULL;
 }
 
-pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdvalp,
 			     unsigned long addr, spinlock_t **ptlp)
 {
 	pmd_t pmdval;
@@ -314,6 +314,8 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	pte = __pte_offset_map(pmd, addr, &pmdval);
 	if (likely(pte))
 		*ptlp = pte_lockptr(mm, &pmdval);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
 	return pte;
 }
 
@@ -347,14 +349,15 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
  * and disconnected table. Until pte_unmap(pte) unmaps and rcu_read_unlock()s
  * afterwards.
  *
- * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
- * but when successful, it also outputs a pointer to the spinlock in ptlp - as
- * pte_offset_map_lock() does, but in this case without locking it. This helps
- * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
- * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
- * pointer for the page table that it returns. In principle, the caller should
- * recheck *pmd once the lock is taken; in practice, no callsite needs that -
- * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ * pte_offset_map_nolock(mm, pmd, pmdvalp, addr, ptlp), above, is like
+ * pte_offset_map(); but when successful, it also outputs a pointer to the
+ * spinlock in ptlp - as pte_offset_map_lock() does, but in this case without
+ * locking it. This helps the caller to avoid a later pte_lockptr(mm, *pmd),
+ * which might by that time act on a changed *pmd: pte_offset_map_nolock()
+ * provides the correct spinlock pointer for the page table that it returns.
+ * In principle, the caller should recheck *pmd once the lock is taken; but in
+ * most cases, either the mmap_lock for write, or pte_same() check on contents,
+ * is enough.
  *
  * Note that free_pgtables(), used after unmapping detached vmas, or when
  * exiting the whole mm, does not take page table lock before freeing a page
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3b7715ecf292a..aa3c9cc51cc36 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1143,7 +1143,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 				src_addr, src_addr + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 retry:
-	dst_pte = pte_offset_map_nolock(mm, dst_pmd, dst_addr, &dst_ptl);
+	dst_pte = pte_offset_map_nolock(mm, dst_pmd, NULL, dst_addr, &dst_ptl);
 
 	/* Retry if a huge pmd materialized from under us */
 	if (unlikely(!dst_pte)) {
@@ -1151,7 +1151,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		goto out;
 	}
 
-	src_pte = pte_offset_map_nolock(mm, src_pmd, src_addr, &src_ptl);
+	src_pte = pte_offset_map_nolock(mm, src_pmd, NULL, src_addr, &src_ptl);
 
 	/*
 	 * We held the mmap_lock for reading so MADV_DONTNEED
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 31d13462571e6..b00cd560c0e43 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3378,7 +3378,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
 
-	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
+	pte = pte_offset_map_nolock(args->mm, pmd, NULL, start & PMD_MASK, &ptl);
 	if (!pte)
 		return false;
 	if (!spin_trylock(ptl)) {
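The comment updated in mm/pgtable-generic.c above names two ways a
caller can stay safe without rechecking *pmd: holding the mmap_lock for
write, or a pte_same() check on contents. A rough sketch of that second
pattern, for reference only (the function below is hypothetical and not
part of this patch; passing NULL for pmdvalp keeps the old behaviour):

static bool example_change_pte(struct mm_struct *mm, pmd_t *pmd,
                               unsigned long addr, pte_t orig_pte)
{
        spinlock_t *ptl;
        pte_t *pte;

        pte = pte_offset_map_nolock(mm, pmd, NULL, addr, &ptl);
        if (!pte)
                return false;

        spin_lock(ptl);
        /*
         * If the entry still matches the value sampled earlier, this
         * page table cannot have been freed and repopulated under us.
         */
        if (!pte_same(ptep_get(pte), orig_pte)) {
                pte_unmap_unlock(pte, ptl);
                return false;
        }

        /* ... modify the entry under ptl ... */

        pte_unmap_unlock(pte, ptl);
        return true;
}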