Message ID | 20250106031711.82855-2-21cnbao@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm: batched unmap lazyfree large folios during reclamation |
On 2025/1/6 11:17, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The refcount may be temporarily or long-term increased, but this does
> not change the fundamental nature of the folio already being
> lazy-freed. Therefore, we only reset 'swapbacked' when we are certain
> the folio is dirty and not droppable.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

The changes look good to me. While we are at it, could you also change
the __discard_anon_folio_pmd_locked() to follow the same strategy for
lazy-freed PMD-sized folio?

On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> The changes look good to me. While we are at it, could you also change
> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> lazy-freed PMD-sized folio?

it seems you mean __discard_anon_folio_pmd_locked() is lacking
folio_set_swapbacked(folio) for dirty pmd-mapped folios?
and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
handled properly?

Thanks
barry

On 2025/1/6 17:03, Barry Song wrote:
> it seems you mean __discard_anon_folio_pmd_locked() is lacking
> folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> handled properly?

Right.

On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> On 2025/1/6 17:03, Barry Song wrote:
> > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > folio_set_swapbacked(folio) for dirty pmd-mapped folios?

Good catch!

Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
THPs in __discard_anon_folio_pmd_locked() - possibly to align with
previous behavior ;)

If a dirty PMD-mapped THP cannot be discarded, we just split it and
restart the page walk to process the PTE-mapped THP. After that, we
will only mark each folio within the THP as swap-backed individually.

It seems like we could cut the work by calling folio_set_swapbacked()
for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
the restart of the page walk after splitting the THP, IMHO ;)

Thanks,
Lance

> > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > handled properly?
>
> Right.

On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
then iterates over each PTE. I'm not sure why it's designed this way; could
there be a specific reason behind this approach?

However, it does appear to handle folio_set_swapbacked() correctly, as only
a dirty PMD will result in dirty PTEs being generated in
__split_huge_pmd_locked():

	} else {
		pte_t entry;

		entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
		if (write)
			entry = pte_mkwrite(entry, vma);

		if (!young)
			entry = pte_mkold(entry);

		/* NOTE: this may set soft-dirty too on some archs */
		if (dirty)
			entry = pte_mkdirty(entry);

		if (soft_dirty)
			entry = pte_mksoft_dirty(entry);

		if (uffd_wp)
			entry = pte_mkuffd_wp(entry);

		for (i = 0; i < HPAGE_PMD_NR; i++)
			VM_WARN_ON(!pte_none(ptep_get(pte + i)));

		set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
	}

Thanks
Barry

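As an illustration of the point about __split_huge_pmd_locked(), the following is a minimal userspace sketch, not kernel code. It assumes a simplified model in which one template entry is built from the PMD's flags and copied into every PTE slot; `struct model_pte`, `MODEL_NR_PTES`, `model_split()` and `model_count_dirty()` are hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for HPAGE_PMD_NR; 512 is only an example value. */
#define MODEL_NR_PTES 512

/* Hypothetical, heavily simplified PTE: just the flags discussed above. */
struct model_pte {
	bool write;
	bool young;
	bool dirty;
};

/*
 * Build one template entry from the PMD's flags and copy it into every
 * slot, loosely mirroring how a single 'entry' is passed to set_ptes()
 * in the excerpt above.
 */
static void model_split(bool write, bool young, bool dirty,
			struct model_pte ptes[MODEL_NR_PTES])
{
	struct model_pte entry = {
		.write = write,
		.young = young,
		.dirty = dirty,
	};

	for (size_t i = 0; i < MODEL_NR_PTES; i++)
		ptes[i] = entry;
}

/* Count how many model PTEs ended up dirty after the split. */
static size_t model_count_dirty(const struct model_pte ptes[MODEL_NR_PTES])
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_NR_PTES; i++)
		if (ptes[i].dirty)
			n++;
	return n;
}
```

In this model the PTEs are either all dirty or all clean, depending only on the `dirty` argument, which is the property the email relies on.
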
On Tue, Jan 7, 2025 at 9:52 AM Barry Song <21cnbao@gmail.com> wrote:
>
> Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
> the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and

Apologies for the typo, I meant splitting a PMD-mapped THP into a PTE-mapped
THP.

> then iterates over each PTE. I'm not sure why it's designed this way; could
> there be a specific reason behind this approach?
>
> However, it does appear to handle folio_set_swapbacked() correctly, as only
> a dirty PMD will result in dirty PTEs being generated in
> __split_huge_pmd_locked():

On Mon, Jan 6, 2025 at 10:39 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

In correction to the earlier email: folio_set_swapbacked() is only called
in __discard_anon_folio_pmd_locked() when '!(vma->vm_flags & VM_DROPPABLE)'
is true, IIUC.

Thanks,
Lance

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();
 
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}
 
 			if (swap_duplicate(entry) < 0) {
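
To make the behavioural change in this hunk easier to compare, here is a minimal userspace sketch, not kernel code. It assumes the decision depends only on the four inputs visible in the diff (dirty, droppable, ref_count, map_count); `struct state`, `enum outcome`, `old_logic()` and `new_logic()` are hypothetical names invented for this sketch, and the remap and counter updates are reduced to a return value.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum outcome {
	DISCARD,              /* dec_mm_counter() + goto discard */
	ABORT_KEEP_LAZYFREE,  /* remap the PTE, goto walk_abort */
	ABORT_SET_SWAPBACKED, /* remap, folio_set_swapbacked(), walk_abort */
};

struct state {
	bool dirty;     /* stands in for folio_test_dirty(folio) */
	bool droppable; /* stands in for vma->vm_flags & VM_DROPPABLE */
	int ref_count;
	int map_count;
};

/* Decision as written on the removed ('-') side of the hunk. */
static enum outcome old_logic(struct state s)
{
	if (s.ref_count == 1 + s.map_count && (!s.dirty || s.droppable))
		return DISCARD;
	if (!s.droppable)
		return ABORT_SET_SWAPBACKED;
	return ABORT_KEEP_LAZYFREE;
}

/* Decision as written on the added ('+') side of the hunk. */
static enum outcome new_logic(struct state s)
{
	if (s.dirty && !s.droppable)
		return ABORT_SET_SWAPBACKED;
	if (s.ref_count != 1 + s.map_count)
		return ABORT_KEEP_LAZYFREE;
	return DISCARD;
}
```

Under these assumptions the two versions differ in exactly one case: a clean, non-droppable folio with an extra reference (for example ref_count 3, map_count 1). The old logic marks it swap-backed again, while the new logic only aborts the walk and leaves it lazy-free, which matches the commit message's statement that 'swapbacked' is only reset when the folio is dirty and not droppable.
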