Message ID | 574bc9b646c87d878a5048edb63698a1f8483e10.1731566457.git.zhengqi.arch@bytedance.com
---|---
State | New
Series | synchronously scan and reclaim empty user PTE pages
> static unsigned long zap_pte_range(struct mmu_gather *tlb,
>                                 struct vm_area_struct *vma, pmd_t *pmd,
>                                 unsigned long addr, unsigned long end,
> @@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>                 pte_t ptent = ptep_get(pte);
>                 int max_nr;
>
> -               nr = 1;
> -               if (pte_none(ptent))
> -                       continue;
> -
>                 if (need_resched())
>                         break;
>
> +               nr = skip_none_ptes(pte, addr, end);
> +               if (nr) {
> +                       addr += PAGE_SIZE * nr;
> +                       if (addr == end)
> +                               break;
> +                       pte += nr;
> +               }
> +
>                 max_nr = (end - addr) / PAGE_SIZE;

I dislike calculating max_nr twice, once here and once in skip_none_ptes().

Further, you're missing an update of ptent here. If you inline it you can
avoid another ptep_get().

>                 if (pte_present(ptent)) {
>                         nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr,
On 2024/11/14 16:04, David Hildenbrand wrote:
>
>> static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                                 struct vm_area_struct *vma, pmd_t *pmd,
>>                                 unsigned long addr, unsigned long end,
>> @@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                 pte_t ptent = ptep_get(pte);
>>                 int max_nr;
>> -               nr = 1;
>> -               if (pte_none(ptent))
>> -                       continue;
>> -
>>                 if (need_resched())
>>                         break;
>> +               nr = skip_none_ptes(pte, addr, end);
>> +               if (nr) {
>> +                       addr += PAGE_SIZE * nr;
>> +                       if (addr == end)
>> +                               break;
>> +                       pte += nr;
>> +               }
>> +
>>                 max_nr = (end - addr) / PAGE_SIZE;
>
> I dislike calculating max_nr twice, once here and once in skip_none_ptes().
>
> Further, you're missing an update of ptent here.

Oh, my bad. However, with [PATCH v3 5/9] there will be no problem, but
there are still two ptep_get() calls and two max_nr calculations.

> If you inline it you can
> avoid another ptep_get().

Do you mean to inline skip_none_ptes() into do_zap_pte_range()?

This would cause these none ptes to be checked again in count_pte_none()
when PTE_MARKER_UFFD_WP is enabled. Of course, maybe we could count the
none ptes in do_zap_pte_range() and pass the count out through a function
parameter, like force_break.

>
>>                 if (pte_present(ptent)) {
>>                         nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr,
On 14.11.24 10:20, Qi Zheng wrote:
> On 2024/11/14 16:04, David Hildenbrand wrote:
>>
>>> static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>>                                 struct vm_area_struct *vma, pmd_t *pmd,
>>>                                 unsigned long addr, unsigned long end,
>>> @@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>>                 pte_t ptent = ptep_get(pte);
>>>                 int max_nr;
>>> -               nr = 1;
>>> -               if (pte_none(ptent))
>>> -                       continue;
>>> -
>>>                 if (need_resched())
>>>                         break;
>>> +               nr = skip_none_ptes(pte, addr, end);
>>> +               if (nr) {
>>> +                       addr += PAGE_SIZE * nr;
>>> +                       if (addr == end)
>>> +                               break;
>>> +                       pte += nr;
>>> +               }
>>> +
>>>                 max_nr = (end - addr) / PAGE_SIZE;
>>
>> I dislike calculating max_nr twice, once here and once in skip_none_ptes().
>>
>> Further, you're missing an update of ptent here.
>
> Oh, my bad. However, with [PATCH v3 5/9] there will be no problem, but
> there are still two ptep_get() calls and two max_nr calculations.
>
>> If you inline it you can
>> avoid another ptep_get().
>
> Do you mean to inline skip_none_ptes() into do_zap_pte_range()?

Effectively moving this patch after #5, and having it be something like:

diff --git a/mm/memory.c b/mm/memory.c
index 1949f5e0fece5..4f5d1e4c6688e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1667,8 +1667,21 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb,
        pte_t ptent = ptep_get(pte);
        int max_nr = (end - addr) / PAGE_SIZE;

-       if (pte_none(ptent))
-               return 1;
+       /* Skip all consecutive pte_none(). */
+       if (pte_none(ptent)) {
+               int nr;
+
+               for (nr = 1; nr < max_nr; nr++) {
+                       ptent = ptep_get(pte + nr);
+                       if (!pte_none(ptent))
+                               break;
+               }
+               max_nr -= nr;
+               if (!max_nr)
+                       return nr;
+               pte += nr;
+               addr += nr * PAGE_SIZE;
+       }

        if (pte_present(ptent))
                return zap_present_ptes(tlb, vma, pte, ptent, max_nr,

In the context of this patch this makes the most sense.

Regarding the "count_pte_none" comment, I assume you are talking about
patch #7.

Can't you simply return the number of pte_none() entries that you skipped
here using another input variable, if really required?
On 2024/11/14 20:32, David Hildenbrand wrote:
> On 14.11.24 10:20, Qi Zheng wrote:
>> On 2024/11/14 16:04, David Hildenbrand wrote:
>>>
>>>> static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>>>                                 struct vm_area_struct *vma, pmd_t *pmd,
>>>>                                 unsigned long addr, unsigned long end,
>>>> @@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>>>                 pte_t ptent = ptep_get(pte);
>>>>                 int max_nr;
>>>> -               nr = 1;
>>>> -               if (pte_none(ptent))
>>>> -                       continue;
>>>> -
>>>>                 if (need_resched())
>>>>                         break;
>>>> +               nr = skip_none_ptes(pte, addr, end);
>>>> +               if (nr) {
>>>> +                       addr += PAGE_SIZE * nr;
>>>> +                       if (addr == end)
>>>> +                               break;
>>>> +                       pte += nr;
>>>> +               }
>>>> +
>>>>                 max_nr = (end - addr) / PAGE_SIZE;
>>>
>>> I dislike calculating max_nr twice, once here and once in skip_none_ptes().
>>>
>>> Further, you're missing an update of ptent here.
>>
>> Oh, my bad. However, with [PATCH v3 5/9] there will be no problem, but
>> there are still two ptep_get() calls and two max_nr calculations.
>>
>>> If you inline it you can
>>> avoid another ptep_get().
>>
>> Do you mean to inline skip_none_ptes() into do_zap_pte_range()?
>
> Effectively moving this patch after #5, and having it be something like:
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1949f5e0fece5..4f5d1e4c6688e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1667,8 +1667,21 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb,
>         pte_t ptent = ptep_get(pte);
>         int max_nr = (end - addr) / PAGE_SIZE;
>
> -       if (pte_none(ptent))
> -               return 1;
> +       /* Skip all consecutive pte_none(). */
> +       if (pte_none(ptent)) {
> +               int nr;
> +
> +               for (nr = 1; nr < max_nr; nr++) {
> +                       ptent = ptep_get(pte + nr);
> +                       if (!pte_none(ptent))
> +                               break;
> +               }
> +               max_nr -= nr;
> +               if (!max_nr)
> +                       return nr;
> +               pte += nr;
> +               addr += nr * PAGE_SIZE;
> +       }
>
>         if (pte_present(ptent))
>                 return zap_present_ptes(tlb, vma, pte, ptent, max_nr,
>
> In the context of this patch this makes the most sense.
>
> Regarding the "count_pte_none" comment, I assume you are talking about
> patch #7.

Yes.

> Can't you simply return the number of pte_none() entries that you skipped
> here using another input variable, if really required?

Suppose we add an input variable nr_skip to do_zap_pte_range(); you mean
to return the above nr to zap_pte_range() through:

        *nr_skip = nr;

and then:

zap_pte_range
--> nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, &nr_skip,
                          rss, &force_flush, &force_break);
    if (can_reclaim_pt) {
            none_nr += count_pte_none(pte, nr);
            none_nr += nr_skip;
    }

Right?
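To make the plumbing in this exchange concrete, here is a tiny user-space model of the idea: the helper skips the leading "none" slots itself, reports how many it skipped through an extra out-parameter, and returns the total number of slots it consumed, so the caller can accumulate the none count without a second count_pte_none()-style scan. Everything below (do_zap_range(), the int-array "page table", the single-entry zap step) is a stand-in chosen for illustration, not the kernel code under discussion.

/* Toy model: 0 stands in for pte_none(); do_zap_range() plays the role of
 * do_zap_pte_range() with an nr_skip out-parameter. */
#include <stdio.h>

#define NONE 0

/* Returns the number of slots consumed; *nr_skip receives the count of
 * leading "none" slots that were stepped over without touching them. */
static int do_zap_range(int *slots, int idx, int max, int *nr_skip)
{
        int nr = 0;

        while (idx + nr < max && slots[idx + nr] == NONE)
                nr++;
        *nr_skip = nr;
        if (idx + nr == max)
                return nr;

        /* "Zap" a single present entry; the real code would hand a whole
         * batch to zap_present_ptes() / zap_nonpresent_ptes(). */
        slots[idx + nr] = NONE;
        return nr + 1;
}

int main(void)
{
        int slots[] = { 0, 0, 7, 0, 0, 0, 9, 4 };
        int max = sizeof(slots) / sizeof(slots[0]);
        int none_nr = 0, idx = 0;

        while (idx < max) {
                int nr_skip = 0;
                int nr = do_zap_range(slots, idx, max, &nr_skip);

                /* The caller learns the none count directly, instead of
                 * re-scanning the skipped range afterwards. */
                none_nr += nr_skip;
                idx += nr;
        }
        printf("none slots seen while zapping: %d\n", none_nr);
        return 0;
}

Whether the count is handed back this way or recomputed by count_pte_none() is exactly the trade-off being weighed for patch #7.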
diff --git a/mm/memory.c b/mm/memory.c
index bd9ebe0f4471f..24633d0e1445a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1657,6 +1657,28 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
        return nr;
 }

+static inline int skip_none_ptes(pte_t *pte, unsigned long addr,
+                                unsigned long end)
+{
+       pte_t ptent = ptep_get(pte);
+       int max_nr;
+       int nr;
+
+       if (!pte_none(ptent))
+               return 0;
+
+       max_nr = (end - addr) / PAGE_SIZE;
+       nr = 1;
+
+       for (; nr < max_nr; nr++) {
+               ptent = ptep_get(pte + nr);
+               if (!pte_none(ptent))
+                       break;
+       }
+
+       return nr;
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
                                struct vm_area_struct *vma, pmd_t *pmd,
                                unsigned long addr, unsigned long end,
@@ -1682,13 +1704,17 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                pte_t ptent = ptep_get(pte);
                int max_nr;

-               nr = 1;
-               if (pte_none(ptent))
-                       continue;
-
                if (need_resched())
                        break;

+               nr = skip_none_ptes(pte, addr, end);
+               if (nr) {
+                       addr += PAGE_SIZE * nr;
+                       if (addr == end)
+                               break;
+                       pte += nr;
+               }
+
                max_nr = (end - addr) / PAGE_SIZE;
                if (pte_present(ptent)) {
                        nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr,
This commit introduces skip_none_ptes() to skip over all consecutive none
ptes in zap_pte_range(), which helps optimize away the need_resched()
checks, force_break handling and incremental pte/addr increments for
those entries.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/memory.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)
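As a rough sketch of the batching this commit message describes, the following stand-alone user-space model mirrors the shape of skip_none_ptes() and its caller loop, with a plain int array standing in for the PTE page and 0 standing in for a none pte; only the skip logic follows the patch, while the names and the "zap" step are illustrative assumptions.

#include <stdio.h>

#define NONE 0

/* Mirrors the shape of skip_none_ptes(): count consecutive "none" slots
 * starting at *slot, up to max entries; returns 0 if the first slot is
 * not none. */
static int skip_none_slots(const int *slot, int max)
{
        int nr;

        if (*slot != NONE)
                return 0;

        for (nr = 1; nr < max; nr++) {
                if (slot[nr] != NONE)
                        break;
        }
        return nr;
}

int main(void)
{
        int table[] = { 0, 0, 0, 5, 0, 8, 0, 0 };
        int max = sizeof(table) / sizeof(table[0]);
        int i = 0;

        /* Caller loop in the style of zap_pte_range(): runs of none slots
         * are stepped over in one batch instead of one iteration each. */
        while (i < max) {
                int nr = skip_none_slots(&table[i], max - i);

                if (nr) {
                        i += nr;
                        if (i == max)
                                break;
                }
                printf("would zap slot %d (value %d)\n", i, table[i]);
                i++;    /* the real code advances by the zapped batch size */
        }
        return 0;
}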