Message ID | 5acabedfae7ded01b075960b4a91f2e15b4d76b5.1724310149.git.zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | introduce pte_offset_map_{ro|rw}_nolock() |
On 22.08.24 09:13, Qi Zheng wrote:
> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
> vmf->ptl, so convert it to using pte_offset_map_rw_nolock(). But since we
> will do the pte_same() check, so there is no need to get pmdval to do
> pmd_same() check, just pass a dummy variable to it.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  mm/memory.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 93c0c25433d02..7b6071a0e21e2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  		vmf->pte = NULL;
>  		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
>  	} else {
> +		pmd_t dummy_pmdval;
> +
>  		/*
>  		 * A regular pmd is established and it can't morph into a huge
>  		 * pmd by anon khugepaged, since that takes mmap_lock in write
>  		 * mode; but shmem or file collapse to THP could still morph
>  		 * it into a huge pmd: just retry later if so.
> +		 *
> +		 * Use the maywrite version to indicate that vmf->pte will be
> +		 * modified, but since we will use pte_same() to detect the
> +		 * change of the pte entry, there is no need to get pmdval, so
> +		 * just pass a dummy variable to it.
>  		 */
> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> -						 vmf->address, &vmf->ptl);
> +		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
> +						    vmf->address, &dummy_pmdval,
> +						    &vmf->ptl);
>  		if (unlikely(!vmf->pte))
>  			return 0;
>  		vmf->orig_pte = ptep_get_lockless(vmf->pte);

Now I understand why we don't need the PMD val in these cases ... the PTE
would also be pte_none() at the point the page table is freed, so we
would detect the change as well.

I do enjoy documenting why we use a dummy value, though. Likely without
that, new users will just pass NULL and call it a day.

Acked-by: David Hildenbrand <david@redhat.com>
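The argument above (a freed page table has only none entries, so the later pte_same() check notices the change) can be illustrated with a toy model. The sketch below is a hypothetical, single-threaded userspace model, not kernel code: every `model_*` name is invented for illustration. It assumes, as stated in the review, that every entry is none by the time the table is freed, and it deliberately ignores how the real kernel keeps the old table memory readable after teardown. It shows that the recheck catches the teardown when the lockless snapshot was a present entry, but cannot when the snapshot was itself none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model, NOT kernel code.
 * A "PTE" is a plain word and the value 0 models pte_none().
 */
typedef unsigned long model_pte_t;

#define MODEL_PTRS_PER_TABLE 8

struct model_pte_table {
	model_pte_t entries[MODEL_PTRS_PER_TABLE];
};

/* Analogue of pte_none(). */
static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

/* Analogue of pte_same(). */
static bool model_pte_same(model_pte_t a, model_pte_t b)
{
	return a == b;
}

/*
 * Model assumption taken from the review: by the time a page table is
 * freed, every entry in it has already been cleared to none.
 */
static void model_clear_before_free(struct model_pte_table *t)
{
	for (size_t i = 0; i < MODEL_PTRS_PER_TABLE; i++)
		t->entries[i] = 0;
}

/*
 * Would a pte_same()-style recheck of slot 'idx' against the earlier
 * lockless snapshot 'orig' notice that the table was torn down?
 * Only if 'orig' was not none: a none snapshot still compares equal.
 */
static bool model_recheck_notices(const struct model_pte_table *t,
				  size_t idx, model_pte_t orig)
{
	assert(idx < MODEL_PTRS_PER_TABLE);
	return !model_pte_same(t->entries[idx], orig);
}
```

In this model a present snapshot always differs from the cleared slot, which matches the review's reasoning. A none snapshot does not, which is why the claim only holds for !pte_none() entries.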
On 2024/8/26 23:36, David Hildenbrand wrote:
> On 22.08.24 09:13, Qi Zheng wrote:
>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>> vmf->ptl, so convert it to using pte_offset_map_rw_nolock(). But since we
>> will do the pte_same() check, so there is no need to get pmdval to do
>> pmd_same() check, just pass a dummy variable to it.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>  mm/memory.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 93c0c25433d02..7b6071a0e21e2 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>  		vmf->pte = NULL;
>>  		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
>>  	} else {
>> +		pmd_t dummy_pmdval;
>> +
>>  		/*
>>  		 * A regular pmd is established and it can't morph into a huge
>>  		 * pmd by anon khugepaged, since that takes mmap_lock in write
>>  		 * mode; but shmem or file collapse to THP could still morph
>>  		 * it into a huge pmd: just retry later if so.
>> +		 *
>> +		 * Use the maywrite version to indicate that vmf->pte will be
>> +		 * modified, but since we will use pte_same() to detect the
>> +		 * change of the pte entry, there is no need to get pmdval, so
>> +		 * just pass a dummy variable to it.
>>  		 */
>> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>> -						 vmf->address, &vmf->ptl);
>> +		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
>> +						    vmf->address, &dummy_pmdval,
>> +						    &vmf->ptl);
>>  		if (unlikely(!vmf->pte))
>>  			return 0;
>>  		vmf->orig_pte = ptep_get_lockless(vmf->pte);
>
> Now I understand why we don't need the PMD val in these cases ... the PTE
> would also be pte_none() at the point the page table is freed, so we
> would detect the change as well.

Yes.

>
> I do enjoy documenting why we use a dummy value, though. Likely without
> that, new users will just pass NULL and call it a day.

OK, how about the following:

Use the maywrite version to indicate that vmf->pte will be
modified, but since we will use pte_same() to detect the
change of the !pte_none() entry, there is no need to recheck
the pmdval. Here we choose to pass a dummy variable instead of
NULL, which helps new users think about why this place is
special.

>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

>
> On Aug 22, 2024, at 15:13, Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
> vmf->ptl, so convert it to using pte_offset_map_rw_nolock(). But since we
> will do the pte_same() check, so there is no need to get pmdval to do
> pmd_same() check, just pass a dummy variable to it.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

A nit below.

> ---
>  mm/memory.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 93c0c25433d02..7b6071a0e21e2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  		vmf->pte = NULL;
>  		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
>  	} else {
> +		pmd_t dummy_pmdval;
> +
>  		/*
>  		 * A regular pmd is established and it can't morph into a huge
>  		 * pmd by anon khugepaged, since that takes mmap_lock in write
>  		 * mode; but shmem or file collapse to THP could still morph
>  		 * it into a huge pmd: just retry later if so.
> +		 *
> +		 * Use the maywrite version to indicate that vmf->pte will be

Not "will be", should be "may be".

> +		 * modified, but since we will use pte_same() to detect the
> +		 * change of the pte entry, there is no need to get pmdval, so
> +		 * just pass a dummy variable to it.
>  		 */
> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> -						 vmf->address, &vmf->ptl);
> +		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
> +						    vmf->address, &dummy_pmdval,
> +						    &vmf->ptl);
>  		if (unlikely(!vmf->pte))
>  			return 0;
>  		vmf->orig_pte = ptep_get_lockless(vmf->pte);
> --
> 2.20.1
>
diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d02..7b6071a0e21e2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		vmf->pte = NULL;
 		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
 	} else {
+		pmd_t dummy_pmdval;
+
 		/*
 		 * A regular pmd is established and it can't morph into a huge
 		 * pmd by anon khugepaged, since that takes mmap_lock in write
 		 * mode; but shmem or file collapse to THP could still morph
 		 * it into a huge pmd: just retry later if so.
+		 *
+		 * Use the maywrite version to indicate that vmf->pte will be
+		 * modified, but since we will use pte_same() to detect the
+		 * change of the pte entry, there is no need to get pmdval, so
+		 * just pass a dummy variable to it.
 		 */
-		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
-						 vmf->address, &vmf->ptl);
+		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
+						    vmf->address, &dummy_pmdval,
+						    &vmf->ptl);
 		if (unlikely(!vmf->pte))
 			return 0;
 		vmf->orig_pte = ptep_get_lockless(vmf->pte);
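The patch relies on a "snapshot, lock, recheck" pattern. The PTE table is mapped without the lock, the entry is read locklessly into vmf->orig_pte, and, per the commit message, a pte_same() check is done later before anything is modified. The sketch below models that pattern in userspace under stated assumptions. It is a hypothetical illustration, not kernel code: `model_pte_table`, `model_get_lockless()` and `model_update_if_same()` are invented names standing in for the page table, ptep_get_lockless() and the lock-then-pte_same() step, and the PTL is modelled as a C11 `atomic_flag` spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the fault-path pattern; NOT kernel
 * code. A "PTE" is a plain word, and the per-table PTE lock (the role
 * of vmf->ptl) is a C11 atomic_flag spinlock.
 */
typedef unsigned long model_pte_t;

#define MODEL_PTRS_PER_TABLE 8

struct model_pte_table {
	atomic_flag ptl;				/* stands in for vmf->ptl */
	_Atomic model_pte_t entries[MODEL_PTRS_PER_TABLE];
};

static void model_lock(struct model_pte_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->ptl, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_pte_table *t)
{
	atomic_flag_clear_explicit(&t->ptl, memory_order_release);
}

/* Analogue of ptep_get_lockless(): snapshot an entry without the lock. */
static model_pte_t model_get_lockless(struct model_pte_table *t, size_t idx)
{
	assert(idx < MODEL_PTRS_PER_TABLE);
	return atomic_load_explicit(&t->entries[idx], memory_order_relaxed);
}

/* Analogue of pte_same(). */
static bool model_pte_same(model_pte_t a, model_pte_t b)
{
	return a == b;
}

/*
 * Take the lock, recheck the entry against the lockless snapshot 'orig',
 * and modify it only if it is unchanged. Returns false when the entry
 * changed in the meantime, i.e. the caller should back out and retry.
 */
static bool model_update_if_same(struct model_pte_table *t, size_t idx,
				 model_pte_t orig, model_pte_t newval)
{
	bool updated = false;

	assert(idx < MODEL_PTRS_PER_TABLE);
	model_lock(t);
	if (model_pte_same(atomic_load_explicit(&t->entries[idx],
						memory_order_relaxed), orig)) {
		atomic_store_explicit(&t->entries[idx], newval,
				      memory_order_relaxed);
		updated = true;
	}
	model_unlock(t);
	return updated;
}
```

In this model a stale snapshot simply makes `model_update_if_same()` return false, and the caller retries. Because the recheck of the entry itself already catches any intervening change, a separate pmd_same()-style recheck would add nothing here. That is the reasoning behind passing a dummy pmdval.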
In handle_pte_fault(), we may modify the vmf->pte after acquiring the
vmf->ptl, so convert it to using pte_offset_map_rw_nolock(). But since we
will do the pte_same() check, there is no need to get pmdval to do the
pmd_same() check; just pass a dummy variable to it.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/memory.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)