[1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one

Message ID: 20250106031711.82855-2-21cnbao@gmail.com (mailing list archive)
State: New
Series: mm: batched unmap lazyfree large folios during reclamation

Commit Message

Barry Song Jan. 6, 2025, 3:17 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

A folio's refcount may be raised temporarily or for a long time, but
this does not change the fundamental nature of a folio that has already
been lazy-freed. Therefore, we only set 'swapbacked' again when we are
certain the folio is dirty and not droppable.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)
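
The order of checks described above can be modelled in userspace C. This
is only a sketch for readers of the thread, not part of the patch: the
names lazyfree_state, lazyfree_decide and the LAZYFREE_* outcomes are
hypothetical, and the kernel code operates on struct folio and page
tables and uses goto labels instead of return values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names; the kernel uses goto labels instead. */
enum lazyfree_action {
	LAZYFREE_DISCARD,          /* clean or droppable, no extra references */
	LAZYFREE_REMAP,            /* restore the PTE, leave swapbacked alone */
	LAZYFREE_REMAP_SWAPBACKED, /* restore the PTE and set swapbacked again */
};

/* Hypothetical snapshot of the state the lazyfree branch inspects. */
struct lazyfree_state {
	int ref_count;   /* folio reference count */
	int map_count;   /* folio map count */
	bool dirty;      /* folio_test_dirty(folio) */
	bool droppable;  /* vma->vm_flags & VM_DROPPABLE */
};

static enum lazyfree_action lazyfree_decide(const struct lazyfree_state *s)
{
	assert(s->ref_count >= 0 && s->map_count >= 0);

	/* Dirtiness is checked first and is the only reason to set swapbacked. */
	if (s->dirty && !s->droppable)
		return LAZYFREE_REMAP_SWAPBACKED;

	/* An extra reference only postpones reclaim; the folio stays lazyfree. */
	if (s->ref_count != 1 + s->map_count)
		return LAZYFREE_REMAP;

	return LAZYFREE_DISCARD;
}
```

In this model, a clean folio with an extra reference is remapped without
being marked swap-backed, which is the behavioural change the commit
message describes.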

Comments

Baolin Wang Jan. 6, 2025, 6:40 a.m. UTC | #1
On 2025/1/6 11:17, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The refcount may be temporarily or long-term increased, but this does
> not change the fundamental nature of the folio already being lazy-
> freed. Therefore, we only reset 'swapbacked' when we are certain the
> folio is dirty and not droppable.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

The changes look good to me. While we are at it, could you also change 
the __discard_anon_folio_pmd_locked() to follow the same strategy for 
lazy-freed PMD-sized folio?

Barry Song Jan. 6, 2025, 9:03 a.m. UTC | #2
On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 11:17, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The refcount may be temporarily or long-term increased, but this does
> > not change the fundamental nature of the folio already being lazy-
> > freed. Therefore, we only reset 'swapbacked' when we are certain the
> > folio is dirty and not droppable.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> The changes look good to me. While we are at it, could you also change
> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> lazy-freed PMD-sized folio?

It seems you mean that __discard_anon_folio_pmd_locked() lacks
folio_set_swapbacked(folio) for dirty PMD-mapped folios?
It also seems that !(vma->vm_flags & VM_DROPPABLE) is not
handled properly there?

Thanks
Barry

Baolin Wang Jan. 6, 2025, 9:34 a.m. UTC | #3
On 2025/1/6 17:03, Barry Song wrote:
> On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/1/6 11:17, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The refcount may be temporarily or long-term increased, but this does
>>> not change the fundamental nature of the folio already being lazy-
>>> freed. Therefore, we only reset 'swapbacked' when we are certain the
>>> folio is dirty and not droppable.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>
>> The changes look good to me. While we are at it, could you also change
>> the __discard_anon_folio_pmd_locked() to follow the same strategy for
>> lazy-freed PMD-sized folio?
> 
> it seems you mean __discard_anon_folio_pmd_locked() is lacking
> folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> handled properly?

Right.

Lance Yang Jan. 6, 2025, 2:39 p.m. UTC | #4
On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 17:03, Barry Song wrote:
> > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2025/1/6 11:17, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> The refcount may be temporarily or long-term increased, but this does
> >>> not change the fundamental nature of the folio already being lazy-
> >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> >>> folio is dirty and not droppable.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>
> >> The changes look good to me. While we are at it, could you also change
> >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> >> lazy-freed PMD-sized folio?
> >
> > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > folio_set_swapbacked(folio) for dirty pmd-mapped folios?

Good catch!

Hmm... I don't recall why we don't call folio_set_swapbacked() for dirty
THPs in __discard_anon_folio_pmd_locked() - possibly to align with
previous behavior ;)

If a dirty PMD-mapped THP cannot be discarded, we just split it and
restart the page walk to process the PTE-mapped THP. After that, we
will only mark each folio within the THP as swap-backed individually.

It seems like we could cut the work by calling folio_set_swapbacked()
for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
the restart of the page walk after splitting the THP, IMHO ;)

Thanks,
Lance


> > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > handled properly?


>
> Right.

Barry Song Jan. 6, 2025, 8:52 p.m. UTC | #5
On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
then iterates over each PTE. I'm not sure why it's designed this way. Could
there be a specific reason behind this approach?

However, it does appear to handle folio_set_swapbacked() correctly, as only
a dirty PMD will result in dirty PTEs being generated in
__split_huge_pmd_locked():

        } else {
                pte_t entry;

                entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
                if (write)
                        entry = pte_mkwrite(entry, vma);

                if (!young)
                        entry = pte_mkold(entry);

                /* NOTE: this may set soft-dirty too on some archs */
                if (dirty)
                        entry = pte_mkdirty(entry);

                if (soft_dirty)
                        entry = pte_mksoft_dirty(entry);

                if (uffd_wp)
                        entry = pte_mkuffd_wp(entry);

                for (i = 0; i < HPAGE_PMD_NR; i++)
                        VM_WARN_ON(!pte_none(ptep_get(pte + i)));

                set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
        }
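
A simplified userspace model of the point above, as a sketch only: the
PTE template takes its dirty bit from the PMD, so a clean PMD cannot
produce dirty PTEs. The names sketch_pte, sketch_split_pmd and
sketch_count_dirty are hypothetical, the value 512 assumes 4 KiB pages
with 2 MiB huge pages, and PFNs, write, soft-dirty and uffd-wp bits are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 512 PTEs per PMD, as with 4 KiB pages and 2 MiB huge pages. */
#define SKETCH_PMD_NR 512

/* Hypothetical stand-in for the PTE bits discussed above. */
struct sketch_pte {
	bool present;
	bool young;
	bool dirty;
};

/* Copy the PMD-level young/dirty state into every PTE of the range. */
static void sketch_split_pmd(bool pmd_young, bool pmd_dirty,
			     struct sketch_pte *ptes)
{
	struct sketch_pte entry = { true, pmd_young, pmd_dirty };
	size_t i;

	assert(ptes != NULL);
	for (i = 0; i < SKETCH_PMD_NR; i++)
		ptes[i] = entry;
}

/* Count the PTEs that a later per-PTE walk would see as dirty. */
static size_t sketch_count_dirty(const struct sketch_pte *ptes)
{
	size_t i, n = 0;

	for (i = 0; i < SKETCH_PMD_NR; i++)
		n += ptes[i].dirty ? 1 : 0;
	return n;
}
```

Splitting a clean PMD in this model leaves zero dirty PTEs, and splitting
a dirty one marks all of them dirty, so the per-PTE walk that follows sees
the same dirtiness the PMD had.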



>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.

Thanks
Barry

Barry Song Jan. 6, 2025, 8:56 p.m. UTC | #6
On Tue, Jan 7, 2025 at 9:52 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2025/1/6 17:03, Barry Song wrote:
> > > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > > <baolin.wang@linux.alibaba.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2025/1/6 11:17, Barry Song wrote:
> > > >>> From: Barry Song <v-songbaohua@oppo.com>
> > > >>>
> > > >>> The refcount may be temporarily or long-term increased, but this does
> > > >>> not change the fundamental nature of the folio already being lazy-
> > > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > > >>> folio is dirty and not droppable.
> > > >>>
> > > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > >>
> > > >> The changes look good to me. While we are at it, could you also change
> > > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > > >> lazy-freed PMD-sized folio?
> > > >
> > > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> >
> > Good catch!
> >
> > Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> > THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> > previous behavior ;)
> >
> > If a dirty PMD-mapped THP cannot be discarded, we just split it and
> > restart the page walk to process the PTE-mapped THP. After that, we
> > will only mark each folio within the THP as swap-backed individually.
> >
> > It seems like we could cut the work by calling folio_set_swapbacked()
> > for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> > the restart of the page walk after splitting the THP, IMHO ;)
>
> Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
> the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and

Apologies for the typo, I meant splitting a PMD-mapped THP into a PTE-mapped
THP.

> then iterates over each PTE. I'm not sure why it's designed this way. Could
> there be a specific reason behind this approach?
>
> However, it does appear to handle folio_set_swapbacked() correctly, as only
> a dirty PMD will result in dirty PTEs being generated in
> __split_huge_pmd_locked():
>
>         } else {
>                 pte_t entry;
>
>                 entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
>                 if (write)
>                         entry = pte_mkwrite(entry, vma);
>
>                 if (!young)
>                         entry = pte_mkold(entry);
>
>                 /* NOTE: this may set soft-dirty too on some archs */
>                 if (dirty)
>                         entry = pte_mkdirty(entry);
>
>                 if (soft_dirty)
>                         entry = pte_mksoft_dirty(entry);
>
>                 if (uffd_wp)
>                         entry = pte_mkuffd_wp(entry);
>
>                 for (i = 0; i < HPAGE_PMD_NR; i++)
>                         VM_WARN_ON(!pte_none(ptep_get(pte + i)));
>
>                 set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>         }
>
>
>
> >
> > Thanks,
> > Lance
> >
> >
> > > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > > handled properly?
> >
> >
> > >
> > > Right.
>
> Thanks
> Barry

Lance Yang Jan. 7, 2025, 1:33 a.m. UTC | #7
On Mon, Jan 6, 2025 at 10:39 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

A correction to my earlier email:

folio_set_swapbacked() is only called in __discard_anon_folio_pmd_locked()
when '!(vma->vm_flags & VM_DROPPABLE)' is true, IIUC.

Thanks,
Lance


>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@  static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();
 
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}
 
 			if (swap_duplicate(entry) < 0) {