
[RFC,v2,1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval

Message ID d101b185eb55438b18faa2029e4303b7453bd5f5.1722861064.git.zhengqi.arch@bytedance.com (mailing list archive)
State New
Series synchronously scan and reclaim empty user PTE pages

Commit Message

Qi Zheng Aug. 5, 2024, 12:55 p.m. UTC
Make pte_offset_map_nolock() return pmdval so that we can recheck the
*pmd once the lock is taken. This is a preparation for freeing empty
PTE pages; no functional changes are expected.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 Documentation/mm/split_page_table_lock.rst |  3 ++-
 arch/arm/mm/fault-armv.c                   |  2 +-
 arch/powerpc/mm/pgtable.c                  |  2 +-
 include/linux/mm.h                         |  4 ++--
 mm/filemap.c                               |  2 +-
 mm/khugepaged.c                            |  4 ++--
 mm/memory.c                                |  4 ++--
 mm/mremap.c                                |  2 +-
 mm/page_vma_mapped.c                       |  2 +-
 mm/pgtable-generic.c                       | 21 ++++++++++++---------
 mm/userfaultfd.c                           |  4 ++--
 mm/vmscan.c                                |  2 +-
 12 files changed, 28 insertions(+), 24 deletions(-)

Comments

David Hildenbrand Aug. 5, 2024, 2:43 p.m. UTC | #1
On 05.08.24 14:55, Qi Zheng wrote:
> Make pte_offset_map_nolock() return pmdval so that we can recheck the
> *pmd once the lock is taken. This is a preparation for freeing empty
> PTE pages, no functional changes are expected.

Skimming the patches, only patch #4 updates one of the callsites 
(collapse_pte_mapped_thp).

Wouldn't we have to recheck if the PMD val changed in more cases after 
taking the PTL?

If not, would it make sense to have a separate function that returns the 
pmdval, so we won't have to update each and every callsite?
Qi Zheng Aug. 6, 2024, 2:40 a.m. UTC | #2
Hi David,

On 2024/8/5 22:43, David Hildenbrand wrote:
> On 05.08.24 14:55, Qi Zheng wrote:
>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>> *pmd once the lock is taken. This is a preparation for freeing empty
>> PTE pages, no functional changes are expected.
> 
> Skimming the patches, only patch #4 updates one of the callsites 
> (collapse_pte_mapped_thp).

In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
also use the pmdval returned by pte_offset_map_nolock().

> 
> Wouldn't we have to recheck if the PMD val changed in more cases after 
> taking the PTL?
> 
> If not, would it make sense to have a separate function that returns the 
> pmdval and we won't have to update each and every callsite?

pte_offset_map_nolock() already obtained the pmdval internally; it just
didn't return it. And updating those callsites is simple, so I think
there may not be a need to add a separate function.

Thanks,
Qi

>
David Hildenbrand Aug. 6, 2024, 2:16 p.m. UTC | #3
On 06.08.24 04:40, Qi Zheng wrote:
> Hi David,
> 
> On 2024/8/5 22:43, David Hildenbrand wrote:
>> On 05.08.24 14:55, Qi Zheng wrote:
>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>>> *pmd once the lock is taken. This is a preparation for freeing empty
>>> PTE pages, no functional changes are expected.
>>
>> Skimming the patches, only patch #4 updates one of the callsites
>> (collapse_pte_mapped_thp).
> 
> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
> also used the pmdval returned by pte_offset_map_nolock().

Right, and I am questioning whether only touching these two is sufficient, 
and how we can make it clearer when someone actually has to recheck the PMD.

> 
>>
>> Wouldn't we have to recheck if the PMD val changed in more cases after
>> taking the PTL?
>>
>> If not, would it make sense to have a separate function that returns the
>> pmdval and we won't have to update each and every callsite?
> 
> pte_offset_map_nolock() had already obtained the pmdval previously, just
> hadn't returned it. And updating those callsites is simple, so I think
> there may not be a need to add a separate function.

Let me ask this way: why are retract_page_tables() and 
reclaim_pgtables_pmd_entry() different from the other ones, and how would 
someone using pte_offset_map_nolock() know what to do here?

IIUC, we must check the PMDVAL after taking the PTL in case

(a) we want to modify the page table to turn pte_none() entries to
     !pte_none(). Because it could be that the page table was removed and
     now is all pte_none()

(b) we want to remove the page table ourselves and want to check if it
     has already been removed.

Is that it?

So my thinking is if another function variant can make that clearer.
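
The recheck being discussed, in kernel-style pseudocode (a sketch against the
pmdval-returning signature this patch proposes; not compilable code from the
series, and the surrounding retry logic is illustrative):

```c
	pmd_t pmdval;
	spinlock_t *ptl;
	pte_t *pte;

again:
	pte = pte_offset_map_nolock(mm, pmd, &pmdval, addr, &ptl);
	if (!pte)
		return;			/* no PTE table */
	spin_lock(ptl);
	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
		/* The PTE table was removed or replaced under us: retry. */
		pte_unmap_unlock(pte, ptl);
		goto again;
	}
	/*
	 * Cases (a)/(b): now it is safe to install !pte_none() entries,
	 * or to remove the table ourselves, knowing it was not freed.
	 */
	pte_unmap_unlock(pte, ptl);
```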
David Hildenbrand Aug. 9, 2024, 4:54 p.m. UTC | #4
On 07.08.24 05:08, Qi Zheng wrote:
> Hi David,
> 
> On 2024/8/6 22:16, David Hildenbrand wrote:
>> On 06.08.24 04:40, Qi Zheng wrote:
>>> Hi David,
>>>
>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>>>>> *pmd once the lock is taken. This is a preparation for freeing empty
>>>>> PTE pages, no functional changes are expected.
>>>>
>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>> (collapse_pte_mapped_thp).
>>>
>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>> also used the pmdval returned by pte_offset_map_nolock().
>>
>> Right, and I am questioning if only touching these two is sufficient,
>> and how we can make it clearer when someone actually has to recheck the
>> PMD.
>>
>>>
>>>>
>>>> Wouldn't we have to recheck if the PMD val changed in more cases after
>>>> taking the PTL?
>>>>
>>>> If not, would it make sense to have a separate function that returns the
>>>> pmdval and we won't have to update each and every callsite?
>>>
>>> pte_offset_map_nolock() had already obtained the pmdval previously, just
>>> hadn't returned it. And updating those callsites is simple, so I think
>>> there may not be a need to add a separate function.
>>
>> Let me ask this way: why is retract_page_tables() and
>> reclaim_pgtables_pmd_entry() different to the other ones, and how would
>> someone using pte_offset_map_nolock() know what's to do here?
> 
> If we acquire the PTL (PTE or PMD lock) after calling
> pte_offset_map_nolock(), it means we may be modifying the corresponding
> pte or pmd entry. In that case, we need to perform a pmd_same() check
> after holding the PTL, just like in pte_offset_map_lock(), to prevent
> the possibility of the PTE page being reclaimed at that time.

Okay, what I thought.

> 
> If we call pte_offset_map_nolock() and do not need to acquire the PTL
> afterwards, it means we are only reading the PTE page. In this case, the
> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page
> cannot be reclaimed.
> 
>>
>> IIUC, we must check the PMDVAL after taking the PTL in case
>>
>> (a) we want to modify the page table to turn pte_none() entries to
>>       !pte_none(). Because it could be that the page table was removed and
>>       now is all pte_none()
>>
>> (b) we want to remove the page table ourselves and want to check if it
>>       has already been removed.
>>
>> Is that it?
> 
> Yes.
> 
>>
>> So my thinking is if another function variant can make that clearer.
> 
> OK, how about naming it pte_offset_map_before_lock?

That's the issue with some of the code: for example in 
filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and 
conditionally take the PTL. But we won't be modifying the page tables.

Maybe something like:

pte_offset_map_readonly_nolock()

and

pte_offset_map_maywrite_nolock()

The latter would require you to pass the PMD pointer such that you have 
to really mess up to ignore what to do with it (check PMD same or not 
check PMD same if you really know what you are doing).

The first would not take a PMD pointer at all, because there is no need to.
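
Hypothetical prototypes for the two variants (the names come from the
discussion above; the exact signatures are an assumption, not code from any
tree):

```c
/* Read-only walk: no PMD output, caller never needs a pmd_same() recheck. */
pte_t *pte_offset_map_readonly_nolock(struct mm_struct *mm, pmd_t *pmd,
				      unsigned long addr, spinlock_t **ptlp);

/* May-write walk: the mandatory pmdvalp forces the caller to confront the
 * pmd_same() recheck after taking the PTL (or to skip it knowingly). */
pte_t *pte_offset_map_maywrite_nolock(struct mm_struct *mm, pmd_t *pmd,
				      pmd_t *pmdvalp, unsigned long addr,
				      spinlock_t **ptlp);
```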
Qi Zheng Aug. 12, 2024, 6:21 a.m. UTC | #5
Hi David,

On 2024/8/10 00:54, David Hildenbrand wrote:
> On 07.08.24 05:08, Qi Zheng wrote:
>> Hi David,
>>
>> On 2024/8/6 22:16, David Hildenbrand wrote:
>>> On 06.08.24 04:40, Qi Zheng wrote:
>>>> Hi David,
>>>>
>>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>>>>>> *pmd once the lock is taken. This is a preparation for freeing empty
>>>>>> PTE pages, no functional changes are expected.
>>>>>
>>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>>> (collapse_pte_mapped_thp).
>>>>
>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>>> also used the pmdval returned by pte_offset_map_nolock().
>>>
>>> Right, and I am questioning if only touching these two is sufficient,
>>> and how we can make it clearer when someone actually has to recheck the
>>> PMD.
>>>
>>>>
>>>>>
>>>>> Wouldn't we have to recheck if the PMD val changed in more cases after
>>>>> taking the PTL?
>>>>>
>>>>> If not, would it make sense to have a separate function that 
>>>>> returns the
>>>>> pmdval and we won't have to update each and every callsite?
>>>>
>>>> pte_offset_map_nolock() had already obtained the pmdval previously, 
>>>> just
>>>> hadn't returned it. And updating those callsites is simple, so I think
>>>> there may not be a need to add a separate function.
>>>
>>> Let me ask this way: why is retract_page_tables() and
>>> reclaim_pgtables_pmd_entry() different to the other ones, and how would
>>> someone using pte_offset_map_nolock() know what's to do here?
>>
>> If we acquire the PTL (PTE or PMD lock) after calling
>> pte_offset_map_nolock(), it means we may be modifying the corresponding
>> pte or pmd entry. In that case, we need to perform a pmd_same() check
>> after holding the PTL, just like in pte_offset_map_lock(), to prevent
>> the possibility of the PTE page being reclaimed at that time.
> 
> Okay, what I thought.
> 
>>
>> If we call pte_offset_map_nolock() and do not need to acquire the PTL
>> afterwards, it means we are only reading the PTE page. In this case, the
>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page
>> cannot be reclaimed.
>>
>>>
>>> IIUC, we must check the PMDVAL after taking the PTL in case
>>>
>>> (a) we want to modify the page table to turn pte_none() entries to
>>>       !pte_none(). Because it could be that the page table was 
>>> removed and
>>>       now is all pte_none()
>>>
>>> (b) we want to remove the page table ourselves and want to check if it
>>>       has already been removed.
>>>
>>> Is that it?
>>
>> Yes.
>>
>>>
>>> So my thinking is if another function variant can make that clearer.
>>
>> OK, how about naming it pte_offset_map_before_lock?
> 
> That's the issue with some of the code: for example in 
> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and 
> conditionally take the PTL. But we won't be modifying the page tables.
> 
> Maybe something like:
> 
> pte_offset_map_readonly_nolock()
> 
> and
> 
> pte_offset_map_maywrite_nolock()
> 
> The latter would require you to pass the PMD pointer such that you have 
> to really mess up to ignore what to do with it (check PMD same or not 
> check PMD same if you really know what you are doing).
> 
> The first would not take a PMD pointer at all, because there is no need to.

These two function names LGTM. Will do in the next version.

Thanks,
Qi

>
David Hildenbrand Aug. 16, 2024, 8:59 a.m. UTC | #6
On 12.08.24 08:21, Qi Zheng wrote:
> Hi David,
> 
> On 2024/8/10 00:54, David Hildenbrand wrote:
>> On 07.08.24 05:08, Qi Zheng wrote:
>>> Hi David,
>>>
>>> On 2024/8/6 22:16, David Hildenbrand wrote:
>>>> On 06.08.24 04:40, Qi Zheng wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>>>>>>> *pmd once the lock is taken. This is a preparation for freeing empty
>>>>>>> PTE pages, no functional changes are expected.
>>>>>>
>>>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>>>> (collapse_pte_mapped_thp).
>>>>>
>>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>>>> also used the pmdval returned by pte_offset_map_nolock().
>>>>
>>>> Right, and I am questioning if only touching these two is sufficient,
>>>> and how we can make it clearer when someone actually has to recheck the
>>>> PMD.
>>>>
>>>>>
>>>>>>
>>>>>> Wouldn't we have to recheck if the PMD val changed in more cases after
>>>>>> taking the PTL?
>>>>>>
>>>>>> If not, would it make sense to have a separate function that
>>>>>> returns the
>>>>>> pmdval and we won't have to update each and every callsite?
>>>>>
>>>>> pte_offset_map_nolock() had already obtained the pmdval previously,
>>>>> just
>>>>> hadn't returned it. And updating those callsites is simple, so I think
>>>>> there may not be a need to add a separate function.
>>>>
>>>> Let me ask this way: why is retract_page_tables() and
>>>> reclaim_pgtables_pmd_entry() different to the other ones, and how would
>>>> someone using pte_offset_map_nolock() know what's to do here?
>>>
>>> If we acquire the PTL (PTE or PMD lock) after calling
>>> pte_offset_map_nolock(), it means we may be modifying the corresponding
>>> pte or pmd entry. In that case, we need to perform a pmd_same() check
>>> after holding the PTL, just like in pte_offset_map_lock(), to prevent
>>> the possibility of the PTE page being reclaimed at that time.
>>
>> Okay, what I thought.
>>
>>>
>>> If we call pte_offset_map_nolock() and do not need to acquire the PTL
>>> afterwards, it means we are only reading the PTE page. In this case, the
>>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page
>>> cannot be reclaimed.
>>>
>>>>
>>>> IIUC, we must check the PMDVAL after taking the PTL in case
>>>>
>>>> (a) we want to modify the page table to turn pte_none() entries to
>>>>        !pte_none(). Because it could be that the page table was
>>>> removed and
>>>>        now is all pte_none()
>>>>
>>>> (b) we want to remove the page table ourselves and want to check if it
>>>>        has already been removed.
>>>>
>>>> Is that it?
>>>
>>> Yes.
>>>
>>>>
>>>> So my thinking is if another function variant can make that clearer.
>>>
>>> OK, how about naming it pte_offset_map_before_lock?
>>
>> That's the issue with some of the code: for example in
>> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and
>> conditionally take the PTL. But we won't be modifying the page tables.
>>
>> Maybe something like:
>>
>> pte_offset_map_readonly_nolock()
>>
>> and
>>
>> pte_offset_map_maywrite_nolock()
>>
>> The latter would require you to pass the PMD pointer such that you have
>> to really mess up to ignore what to do with it (check PMD same or not
>> check PMD same if you really know what you are doing).
>>
>> The first would not take a PMD pointer at all, because there is no need to.
> 
> These two function names LGTM. Will do in the next version.

That is probably something you can send as a separate patch independent 
of this full series.

Then we might also get more review+thoughts from other folks!
Qi Zheng Aug. 16, 2024, 9:21 a.m. UTC | #7
Hi David,

On 2024/8/16 16:59, David Hildenbrand wrote:
> On 12.08.24 08:21, Qi Zheng wrote:
>> Hi David,
>>
>> On 2024/8/10 00:54, David Hildenbrand wrote:
>>> On 07.08.24 05:08, Qi Zheng wrote:
>>>> Hi David,
>>>>
>>>> On 2024/8/6 22:16, David Hildenbrand wrote:
>>>>> On 06.08.24 04:40, Qi Zheng wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>>>>> Make pte_offset_map_nolock() return pmdval so that we can 
>>>>>>>> recheck the
>>>>>>>> *pmd once the lock is taken. This is a preparation for freeing 
>>>>>>>> empty
>>>>>>>> PTE pages, no functional changes are expected.
>>>>>>>
>>>>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>>>>> (collapse_pte_mapped_thp).
>>>>>>
>>>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>>>>> also used the pmdval returned by pte_offset_map_nolock().
>>>>>
>>>>> Right, and I am questioning if only touching these two is sufficient,
>>>>> and how we can make it clearer when someone actually has to recheck 
>>>>> the
>>>>> PMD.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Wouldn't we have to recheck if the PMD val changed in more cases 
>>>>>>> after
>>>>>>> taking the PTL?
>>>>>>>
>>>>>>> If not, would it make sense to have a separate function that
>>>>>>> returns the
>>>>>>> pmdval and we won't have to update each and every callsite?
>>>>>>
>>>>>> pte_offset_map_nolock() had already obtained the pmdval previously,
>>>>>> just
>>>>>> hadn't returned it. And updating those callsites is simple, so I think
>>>>>> there may not be a need to add a separate function.
>>>>>
>>>>> Let me ask this way: why is retract_page_tables() and
>>>>> reclaim_pgtables_pmd_entry() different to the other ones, and how 
>>>>> would
>>>>> someone using pte_offset_map_nolock() know what's to do here?
>>>>
>>>> If we acquire the PTL (PTE or PMD lock) after calling
>>>> pte_offset_map_nolock(), it means we may be modifying the corresponding
>>>> pte or pmd entry. In that case, we need to perform a pmd_same() check
>>>> after holding the PTL, just like in pte_offset_map_lock(), to prevent
>>>> the possibility of the PTE page being reclaimed at that time.
>>>
>>> Okay, what I thought.
>>>
>>>>
>>>> If we call pte_offset_map_nolock() and do not need to acquire the PTL
>>>> afterwards, it means we are only reading the PTE page. In this case, 
>>>> the
>>>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE 
>>>> page
>>>> cannot be reclaimed.
>>>>
>>>>>
>>>>> IIUC, we must check the PMDVAL after taking the PTL in case
>>>>>
>>>>> (a) we want to modify the page table to turn pte_none() entries to
>>>>>        !pte_none(). Because it could be that the page table was
>>>>> removed and
>>>>>        now is all pte_none()
>>>>>
>>>>> (b) we want to remove the page table ourselves and want to check if it
>>>>>        has already been removed.
>>>>>
>>>>> Is that it?
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> So my thinking is if another function variant can make that clearer.
>>>>
>>>> OK, how about naming it pte_offset_map_before_lock?
>>>
>>> That's the issue with some of the code: for example in
>>> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and
>>> conditionally take the PTL. But we won't be modifying the page tables.
>>>
>>> Maybe something like:
>>>
>>> pte_offset_map_readonly_nolock()
>>>
>>> and
>>>
>>> pte_offset_map_maywrite_nolock()
>>>
>>> The latter would require you to pass the PMD pointer such that you have
>>> to really mess up to ignore what to do with it (check PMD same or not
>>> check PMD same if you really know what you are doing).
>>>
>>> The first would not take a PMD pointer at all, because there is no 
>>> need to.
>>
>> These two function names LGTM. Will do in the next version.
> 
> That is probably something you can send as a separate patch independent 
> of this full series.

I think it makes sense. I am analyzing whether the existing paths that
call pte_offset_map_nolock() + spin_lock() can run concurrently with the
THP path that reclaims PTE pages. If so, that actually needs to be
fixed first.

> 
> Then we might also get more review+thoughts from other folks!

I hope so!

Thanks,
Qi

>

Patch

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index e4f6972eb6c04..e6a47d57531cd 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -18,7 +18,8 @@  There are helpers to lock/unlock a table and other accessor functions:
 	pointer to its PTE table lock, or returns NULL if no PTE table;
  - pte_offset_map_nolock()
 	maps PTE, returns pointer to PTE with pointer to its PTE table
-	lock (not taken), or returns NULL if no PTE table;
+	lock (not taken) and the value of its pmd entry, or returns NULL
+	if no PTE table;
  - pte_offset_map()
 	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
  - pte_unmap()
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 831793cd6ff94..db07e6a05eb6e 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,7 +117,7 @@  static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
+	pte = pte_offset_map_nolock(vma->vm_mm, pmd, NULL, address, &ptl);
 	if (!pte)
 		return 0;
 
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 7316396e452d8..9b67d2a1457ed 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -398,7 +398,7 @@  void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	 */
 	if (pmd_none(*pmd))
 		return;
-	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	pte = pte_offset_map_nolock(mm, pmd, NULL, addr, &ptl);
 	BUG_ON(!pte);
 	assert_spin_locked(ptl);
 	pte_unmap(pte);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43b40334e9b28..b1ef2afe620c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2937,8 +2937,8 @@  static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 	return pte;
 }
 
-pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
-			unsigned long addr, spinlock_t **ptlp);
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdvalp,
+			     unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/mm/filemap.c b/mm/filemap.c
index 67c3f5136db33..3285dffb64cf8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3231,7 +3231,7 @@  static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
 	if (!(vmf->flags & FAULT_FLAG_ORIG_PTE_VALID))
 		return 0;
 
-	ptep = pte_offset_map_nolock(vma->vm_mm, vmf->pmd, vmf->address,
+	ptep = pte_offset_map_nolock(vma->vm_mm, vmf->pmd, NULL, vmf->address,
 				     &vmf->ptl);
 	if (unlikely(!ptep))
 		return VM_FAULT_NOPAGE;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd1d8655a76b..91b93259ee214 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1009,7 +1009,7 @@  static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		};
 
 		if (!pte++) {
-			pte = pte_offset_map_nolock(mm, pmd, address, &ptl);
+			pte = pte_offset_map_nolock(mm, pmd, NULL, address, &ptl);
 			if (!pte) {
 				mmap_read_unlock(mm);
 				result = SCAN_PMD_NULL;
@@ -1598,7 +1598,7 @@  int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
 		pml = pmd_lock(mm, pmd);
 
-	start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
+	start_pte = pte_offset_map_nolock(mm, pmd, NULL, haddr, &ptl);
 	if (!start_pte)		/* mmap_lock + page lock should prevent this */
 		goto abort;
 	if (!pml)
diff --git a/mm/memory.c b/mm/memory.c
index d6a9dcddaca4a..afd8a967fb953 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1108,7 +1108,7 @@  copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
-	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
+	src_pte = pte_offset_map_nolock(src_mm, src_pmd, NULL, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
 		/* ret == 0 */
@@ -5671,7 +5671,7 @@  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 * it into a huge pmd: just retry later if so.
 		 */
 		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
-						 vmf->address, &vmf->ptl);
+						 NULL, vmf->address, &vmf->ptl);
 		if (unlikely(!vmf->pte))
 			return 0;
 		vmf->orig_pte = ptep_get_lockless(vmf->pte);
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc6409..f672d0218a6fe 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,7 +175,7 @@  static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
-	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
+	new_pte = pte_offset_map_nolock(mm, new_pmd, NULL, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
 		err = -EAGAIN;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae5cc42aa2087..507701b7bcc1e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -33,7 +33,7 @@  static bool map_pte(struct page_vma_mapped_walk *pvmw, spinlock_t **ptlp)
 	 * Though, in most cases, page lock already protects this.
 	 */
 	pvmw->pte = pte_offset_map_nolock(pvmw->vma->vm_mm, pvmw->pmd,
-					  pvmw->address, ptlp);
+					  NULL, pvmw->address, ptlp);
 	if (!pvmw->pte)
 		return false;
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index a78a4adf711ac..443e3b34434a5 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -305,7 +305,7 @@  pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	return NULL;
 }
 
-pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdvalp,
 			     unsigned long addr, spinlock_t **ptlp)
 {
 	pmd_t pmdval;
@@ -314,6 +314,8 @@  pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 	pte = __pte_offset_map(pmd, addr, &pmdval);
 	if (likely(pte))
 		*ptlp = pte_lockptr(mm, &pmdval);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
 	return pte;
 }
 
@@ -347,14 +349,15 @@  pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
  * and disconnected table.  Until pte_unmap(pte) unmaps and rcu_read_unlock()s
  * afterwards.
  *
- * pte_offset_map_nolock(mm, pmd, addr, ptlp), above, is like pte_offset_map();
- * but when successful, it also outputs a pointer to the spinlock in ptlp - as
- * pte_offset_map_lock() does, but in this case without locking it.  This helps
- * the caller to avoid a later pte_lockptr(mm, *pmd), which might by that time
- * act on a changed *pmd: pte_offset_map_nolock() provides the correct spinlock
- * pointer for the page table that it returns.  In principle, the caller should
- * recheck *pmd once the lock is taken; in practice, no callsite needs that -
- * either the mmap_lock for write, or pte_same() check on contents, is enough.
+ * pte_offset_map_nolock(mm, pmd, pmdvalp, addr, ptlp), above, is like
+ * pte_offset_map(); but when successful, it also outputs a pointer to the
+ * spinlock in ptlp - as pte_offset_map_lock() does, but in this case without
+ * locking it.  This helps the caller to avoid a later pte_lockptr(mm, *pmd),
+ * which might by that time act on a changed *pmd: pte_offset_map_nolock()
+ * provides the correct spinlock pointer for the page table that it returns.
+ * In principle, the caller should recheck *pmd once the lock is taken; but in
+ * most cases, either the mmap_lock for write, or pte_same() check on contents,
+ * is enough.
  *
  * Note that free_pgtables(), used after unmapping detached vmas, or when
  * exiting the whole mm, does not take page table lock before freeing a page
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3b7715ecf292a..aa3c9cc51cc36 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1143,7 +1143,7 @@  static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 				src_addr, src_addr + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 retry:
-	dst_pte = pte_offset_map_nolock(mm, dst_pmd, dst_addr, &dst_ptl);
+	dst_pte = pte_offset_map_nolock(mm, dst_pmd, NULL, dst_addr, &dst_ptl);
 
 	/* Retry if a huge pmd materialized from under us */
 	if (unlikely(!dst_pte)) {
@@ -1151,7 +1151,7 @@  static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		goto out;
 	}
 
-	src_pte = pte_offset_map_nolock(mm, src_pmd, src_addr, &src_ptl);
+	src_pte = pte_offset_map_nolock(mm, src_pmd, NULL, src_addr, &src_ptl);
 
 	/*
 	 * We held the mmap_lock for reading so MADV_DONTNEED
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 31d13462571e6..b00cd560c0e43 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3378,7 +3378,7 @@  static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
 
-	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
+	pte = pte_offset_map_nolock(args->mm, pmd, NULL, start & PMD_MASK, &ptl);
 	if (!pte)
 		return false;
 	if (!spin_trylock(ptl)) {