Message ID | 20240904084022.32728-9-zhengqi.arch@bytedance.com (mailing list archive)
---|---
State | New
Series | introduce pte_offset_map_{ro|rw}_nolock()
On 2024/9/4 16:40, Qi Zheng wrote:
> In copy_pte_range(), we may modify the src_pte entry after holding the
> src_ptl, so convert it to using pte_offset_map_rw_nolock(). Since we may
> free the PTE page in retract_page_tables() without holding the read lock
> of mmap_lock, we still need to get pmdval and do a pmd_same() check after
> the ptl is held.

See commit 3db82b9374ca92: copy_pte_range() and retract_page_tables()
use vma->anon_vma to exclude each other.

  retract_page_tables()                  copy_page_range()
    vma_interval_tree_foreach()            if (!vma_needs_copy())
      if (READ_ONCE(vma->anon_vma))            return 0;
        continue;                          copy_pte_range()

So I think the mmap write lock here is also used for keeping ->anon_vma
stable. And we do not need pmd_same().

Muchun,
Thanks.

>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
> Hi Muchun, since the code has changed, I dropped your Reviewed-by tag here.
>
>  mm/memory.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 06674f94b7a4e..47974cc4bd7f2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1082,6 +1082,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>          struct mm_struct *src_mm = src_vma->vm_mm;
>          pte_t *orig_src_pte, *orig_dst_pte;
>          pte_t *src_pte, *dst_pte;
> +        pmd_t pmdval;
>          pte_t ptent;
>          spinlock_t *src_ptl, *dst_ptl;
>          int progress, max_nr, ret = 0;
> @@ -1107,13 +1108,28 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>                  ret = -ENOMEM;
>                  goto out;
>          }
> -        src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
> +
> +        /*
> +         * Since we may free the PTE page in retract_page_tables() without
> +         * holding the read lock of mmap_lock, we still need to do a
> +         * pmd_same() check after holding the PTL.
> +         */
> +        src_pte = pte_offset_map_rw_nolock(src_mm, src_pmd, addr, &pmdval,
> +                                           &src_ptl);
>          if (!src_pte) {
>                  pte_unmap_unlock(dst_pte, dst_ptl);
>                  /* ret == 0 */
>                  goto out;
>          }
>          spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> +
> +        if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(src_pmd)))) {
> +                pte_unmap_unlock(src_pte, src_ptl);
> +                pte_unmap_unlock(dst_pte, dst_ptl);
> +                /* ret == 0 */
> +                goto out;
> +        }
> +
>          orig_src_pte = src_pte;
>          orig_dst_pte = dst_pte;
>          arch_enter_lazy_mmu_mode();
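Muchun's exclusion argument is easier to see next to the two call sites. Below is a condensed C sketch paraphrasing vma_needs_copy() in mm/memory.c and retract_page_tables() in mm/khugepaged.c; the bodies are abbreviated for illustration and are not the verbatim kernel code.

/* Condensed sketch of the two sides of the vma->anon_vma exclusion. */

/* mm/memory.c (abbreviated): fork only descends into the page tables
 * when this returns true, and a VMA with an anon_vma always qualifies. */
static bool vma_needs_copy(struct vm_area_struct *dst_vma,
                           struct vm_area_struct *src_vma)
{
        if (userfaultfd_wp(dst_vma))
                return true;
        if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                return true;
        if (src_vma->anon_vma)
                return true;
        /* Otherwise skip the copy: faults refill the ptes on demand. */
        return false;
}

/* mm/khugepaged.c (abbreviated): retraction skips any VMA with an
 * anon_vma, so a PTE table can only be freed under a VMA that
 * copy_page_range() never walks. */
static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
{
        struct vm_area_struct *vma;

        i_mmap_lock_read(mapping);
        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                if (READ_ONCE(vma->anon_vma))
                        continue;
                /* ... retract the PTE table under the pmd lock ... */
        }
        i_mmap_unlock_read(mapping);
}

Since fork holds the parent's mmap write lock for the whole of copy_page_range(), ->anon_vma cannot become set concurrently, which is why the pmd_same() recheck is argued to be unnecessary on this particular path.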
On 2024/9/5 16:57, Muchun Song wrote:
>
>
> On 2024/9/4 16:40, Qi Zheng wrote:
>> In copy_pte_range(), we may modify the src_pte entry after holding the
>> src_ptl, so convert it to using pte_offset_map_rw_nolock(). Since we may
>> free the PTE page in retract_page_tables() without holding the read lock
>> of mmap_lock, we still need to get pmdval and do a pmd_same() check
>> after the ptl is held.
>
> See commit 3db82b9374ca92: copy_pte_range() and retract_page_tables()
> use vma->anon_vma to exclude each other.
>
>   retract_page_tables()                  copy_page_range()
>     vma_interval_tree_foreach()            if (!vma_needs_copy())
>       if (READ_ONCE(vma->anon_vma))            return 0;
>         continue;                          copy_pte_range()
>
> So I think the mmap write lock here is also used for keeping ->anon_vma
> stable. And we do not need pmd_same().

Indeed, will change it in v4. Thanks!

>
> Muchun,
> Thanks.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>> Hi Muchun, since the code has changed, I dropped your Reviewed-by tag
>> here.
>>
>>  mm/memory.c | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 06674f94b7a4e..47974cc4bd7f2 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1082,6 +1082,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>>          struct mm_struct *src_mm = src_vma->vm_mm;
>>          pte_t *orig_src_pte, *orig_dst_pte;
>>          pte_t *src_pte, *dst_pte;
>> +        pmd_t pmdval;
>>          pte_t ptent;
>>          spinlock_t *src_ptl, *dst_ptl;
>>          int progress, max_nr, ret = 0;
>> @@ -1107,13 +1108,28 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>>                  ret = -ENOMEM;
>>                  goto out;
>>          }
>> -        src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
>> +
>> +        /*
>> +         * Since we may free the PTE page in retract_page_tables() without
>> +         * holding the read lock of mmap_lock, we still need to do a
>> +         * pmd_same() check after holding the PTL.
>> +         */
>> +        src_pte = pte_offset_map_rw_nolock(src_mm, src_pmd, addr, &pmdval,
>> +                                           &src_ptl);
>>          if (!src_pte) {
>>                  pte_unmap_unlock(dst_pte, dst_ptl);
>>                  /* ret == 0 */
>>                  goto out;
>>          }
>>          spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>> +
>> +        if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(src_pmd)))) {
>> +                pte_unmap_unlock(src_pte, src_ptl);
>> +                pte_unmap_unlock(dst_pte, dst_ptl);
>> +                /* ret == 0 */
>> +                goto out;
>> +        }
>> +
>>          orig_src_pte = src_pte;
>>          orig_dst_pte = dst_pte;
>>          arch_enter_lazy_mmu_mode();
diff --git a/mm/memory.c b/mm/memory.c
index 06674f94b7a4e..47974cc4bd7f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1082,6 +1082,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
         struct mm_struct *src_mm = src_vma->vm_mm;
         pte_t *orig_src_pte, *orig_dst_pte;
         pte_t *src_pte, *dst_pte;
+        pmd_t pmdval;
         pte_t ptent;
         spinlock_t *src_ptl, *dst_ptl;
         int progress, max_nr, ret = 0;
@@ -1107,13 +1108,28 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
                 ret = -ENOMEM;
                 goto out;
         }
-        src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
+
+        /*
+         * Since we may free the PTE page in retract_page_tables() without
+         * holding the read lock of mmap_lock, we still need to do a
+         * pmd_same() check after holding the PTL.
+         */
+        src_pte = pte_offset_map_rw_nolock(src_mm, src_pmd, addr, &pmdval,
+                                           &src_ptl);
         if (!src_pte) {
                 pte_unmap_unlock(dst_pte, dst_ptl);
                 /* ret == 0 */
                 goto out;
         }
         spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+
+        if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(src_pmd)))) {
+                pte_unmap_unlock(src_pte, src_ptl);
+                pte_unmap_unlock(dst_pte, dst_ptl);
+                /* ret == 0 */
+                goto out;
+        }
+
         orig_src_pte = src_pte;
         orig_dst_pte = dst_pte;
         arch_enter_lazy_mmu_mode();
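Read on its own, the hunk above also illustrates the calling convention pte_offset_map_rw_nolock() is meant to establish: the helper hands back the pte pointer plus a pmdval snapshot, and a would-be writer validates the snapshot after taking the ptl. A minimal sketch of that pattern follows; the helper name and signature come from this series, but the surrounding function (modify_ptes_example) is a hypothetical caller, and the recheck is shown for the general case even though the review above concluded copy_pte_range() itself does not need it.

/* Sketch only: modify_ptes_example() is illustrative, not kernel code. */
static void modify_ptes_example(struct mm_struct *mm, pmd_t *pmd,
                                unsigned long addr)
{
        spinlock_t *ptl;
        pmd_t pmdval;
        pte_t *pte;

        pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
        if (!pte)
                return;                 /* no PTE table mapped here */
        spin_lock(ptl);
        if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
                /* The PTE table was freed or replaced before we took
                 * the lock; back out and let the caller retry. */
                pte_unmap_unlock(pte, ptl);
                return;
        }
        /* ... safe to modify pte entries under ptl ... */
        pte_unmap_unlock(pte, ptl);
}

Callers that hold a lock already pinning the PTE table (here, the mmap write lock plus the anon_vma rule) can skip the pmd_same() recheck, which is exactly the simplification adopted for v4.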
In copy_pte_range(), we may modify the src_pte entry after holding the
src_ptl, so convert it to using pte_offset_map_rw_nolock(). Since we may
free the PTE page in retract_page_tables() without holding the read lock
of mmap_lock, we still need to get pmdval and do a pmd_same() check after
the ptl is held.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
Hi Muchun, since the code has changed, I dropped your Reviewed-by tag here.

 mm/memory.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)