
[RFC] mm: huge_memory: add folio_mark_accessed() when zapping file THP

Message ID 34bab7a60930472377afbfeefe05b980d0512aa4.1744118089.git.baolin.wang@linux.alibaba.com (mailing list archive)
State: New
Series: [RFC] mm: huge_memory: add folio_mark_accessed() when zapping file THP

Commit Message

Baolin Wang April 8, 2025, 1:16 p.m. UTC
When investigating performance issues during file folio unmap, I noticed a
behavioral difference between non-PMD-sized folios and PMD-sized folios.
For non-PMD-sized file folios, the zap path calls folio_mark_accessed() to
mark the folio as having seen activity, but this is not done for PMD-sized
folios.

This might not cause obvious issues, but a potential problem is that it could
lead to more frequent refaults of PMD-sized file folios under memory pressure.
Therefore, should folio_mark_accessed() also be added for PMD-sized file
folios?

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)
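
For reference, the PTE-level zap path in mm/memory.c (zap_pte_range() and its
helpers) already transfers the young bit for file-backed folios. A paraphrased
sketch of that existing behavior, not the exact upstream code:

        /* For file-backed folios, a set young bit on the zapped PTE is fed
         * back into the folio's LRU state before the mapping goes away. */
        if (!folio_test_anon(folio)) {
                if (pte_dirty(ptent))
                        folio_mark_dirty(folio);
                if (pte_young(ptent) && likely(vma_has_recency(vma)))
                        folio_mark_accessed(folio);
        }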

Comments

Zi Yan April 8, 2025, 3:29 p.m. UTC | #1
On 8 Apr 2025, at 9:16, Baolin Wang wrote:

> When investigating performance issues during file folio unmap, I noticed some
> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> folio as having seen activity, but this is not done for PMD-sized folios.
>
> This might not cause obvious issues, but a potential problem could be that,
> it might lead to more frequent refaults of PMD-sized file folios under memory
> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be

How likely is the system to get PMD-sized file folios when it is under
memory pressure? Johannes’ recent patch increases the THP allocation success
rate, so maybe this was not happening before but will be after that patch?

> added for PMD-sized file folios?

Do you see any performance change after your patch?

>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 6ac6d468af0d..b3ade7ac5bbf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  				zap_deposited_table(tlb->mm, pmd);
>  			add_mm_counter(tlb->mm, mm_counter_file(folio),
>  				       -HPAGE_PMD_NR);
> +
> +			if (flush_needed && pmd_young(orig_pmd) &&
> +			    likely(vma_has_recency(vma)))
> +				folio_mark_accessed(folio);
>  		}
>
>  		spin_unlock(ptl);
> -- 
> 2.43.5


Best Regards,
Yan, Zi
Johannes Weiner April 8, 2025, 4:02 p.m. UTC | #2
On Tue, Apr 08, 2025 at 11:29:43AM -0400, Zi Yan wrote:
> On 8 Apr 2025, at 9:16, Baolin Wang wrote:
> 
> > When investigating performance issues during file folio unmap, I noticed some
> > behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> > For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> > folio as having seen activity, but this is not done for PMD-sized folios.
> >
> > This might not cause obvious issues, but a potential problem could be that,
> > it might lead to more frequent refaults of PMD-sized file folios under memory
> > pressure. Therefore, I am unsure whether the folio_mark_accessed() should be

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> How likely will the system get PMD-sized file folios when it is under
> memory pressure? Johannes’ recent patch increases THP allocation successful
> rate, maybe it was not happening before but will be after the patch?

It's not so much about whether the refault can construct a THP again,
but whether we should have evicted this data under pressure to begin
with. It's more about IO and paging. And it's the same consideration
why we transfer the young bit for base pages.

Sometimes file contents are only accessed through relatively
short-lived mappings. But they can nevertheless be accessed a lot and
be hot. It's important to not lose that information on unmap, and end
up kicking out a frequently used cache page.

> > added for PMD-sized file folios?
> 
> Do you see any performance change after your patch?
> 
> >
> > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > ---
> >  mm/huge_memory.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 6ac6d468af0d..b3ade7ac5bbf 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  				zap_deposited_table(tlb->mm, pmd);
> >  			add_mm_counter(tlb->mm, mm_counter_file(folio),
> >  				       -HPAGE_PMD_NR);
> > +
> > +			if (flush_needed && pmd_young(orig_pmd) &&
> > +			    likely(vma_has_recency(vma)))
> > +				folio_mark_accessed(folio);
> >  		}
> >
> >  		spin_unlock(ptl);
> > -- 
> > 2.43.5
Zi Yan April 8, 2025, 4:12 p.m. UTC | #3
On 8 Apr 2025, at 12:02, Johannes Weiner wrote:

> On Tue, Apr 08, 2025 at 11:29:43AM -0400, Zi Yan wrote:
>> On 8 Apr 2025, at 9:16, Baolin Wang wrote:
>>
>>> When investigating performance issues during file folio unmap, I noticed some
>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>
>>> This might not cause obvious issues, but a potential problem could be that,
>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
>> How likely will the system get PMD-sized file folios when it is under
>> memory pressure? Johannes’ recent patch increases THP allocation successful
>> rate, maybe it was not happening before but will be after the patch?
>
> It's not so much about whether the refault can construct a THP again,
> but whether we should have evicted this data under pressure to begin
> with. It's more about IO and paging. And it's the same consideration
> why we transfer the young bit for base pages.

Got it. It clarifies things a lot.

>
> Sometimes file contents are only accessed through relatively
> short-lived mappings. But they can nevertheless be accessed a lot and
> be hot. It's important to not lose that information on unmap, and end
> up kicking out a frequently used cache page.

So folio_mark_accessed() will prevent the folio from going down in
the LRU lists, when PTE access information is transferred to the folio.
The addition of folio_mark_accessed() makes sense to me now.
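
For reference, on the classic LRU, folio_mark_accessed() in mm/swap.c
implements roughly the following ladder (a paraphrased sketch; under MGLRU the
call bumps the folio's reference counter instead, with a similar net effect on
reclaim ordering):

        if (!folio_test_referenced(folio)) {
                /* first recorded access: remember it */
                folio_set_referenced(folio);
        } else if (!folio_test_unevictable(folio) && !folio_test_active(folio)) {
                /* repeated access: promote inactive -> active */
                folio_activate(folio);
                folio_clear_referenced(folio);
        }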

Baolin, can you include Johannes’s explanation in your commit log?

Feel free to add Acked-by: Zi Yan <ziy@nvidia.com>

>
>>> added for PMD-sized file folios?
>>
>> Do you see any performance change after your patch?
>>
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>  mm/huge_memory.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>  				zap_deposited_table(tlb->mm, pmd);
>>>  			add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>  				       -HPAGE_PMD_NR);
>>> +
>>> +			if (flush_needed && pmd_young(orig_pmd) &&
>>> +			    likely(vma_has_recency(vma)))
>>> +				folio_mark_accessed(folio);
>>>  		}
>>>
>>>  		spin_unlock(ptl);
>>> -- 
>>> 2.43.5


Best Regards,
Yan, Zi
Baolin Wang April 9, 2025, 12:52 a.m. UTC | #4
On 2025/4/9 00:12, Zi Yan wrote:
> On 8 Apr 2025, at 12:02, Johannes Weiner wrote:
> 
>> On Tue, Apr 08, 2025 at 11:29:43AM -0400, Zi Yan wrote:
>>> On 8 Apr 2025, at 9:16, Baolin Wang wrote:
>>>
>>>> When investigating performance issues during file folio unmap, I noticed some
>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>>
>>>> This might not cause obvious issues, but a potential problem could be that,
>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks for taking a look.

>>> How likely will the system get PMD-sized file folios when it is under
>>> memory pressure? Johannes’ recent patch increases THP allocation successful
>>> rate, maybe it was not happening before but will be after the patch?
>>
>> It's not so much about whether the refault can construct a THP again,
>> but whether we should have evicted this data under pressure to begin
>> with. It's more about IO and paging. And it's the same consideration
>> why we transfer the young bit for base pages.
> 
> Got it. It clarifies things a lot.
> 
>>
>> Sometimes file contents are only accessed through relatively
>> short-lived mappings. But they can nevertheless be accessed a lot and
>> be hot. It's important to not lose that information on unmap, and end
>> up kicking out a frequently used cache page.

Yes, that's what I also understand. Thanks for the explanation.

> So folio_mark_accessed() will prevent the folio from going down in
> the LRU lists, when PTE access information is transferred to the folio.
> The addition of folio_mark_accessed() makes sense to me now.
> 
> Baolin, can you include Johannes’s explanation in your commit log?

Sure. Will do.

> 
> Feel free to add Acked-by: Zi Yan <ziy@nvidia.com>

Thanks for reviewing.

>>>> added for PMD-sized file folios?
>>>
>>> Do you see any performance change after your patch?

Not yet, just some theoretical analysis from code inspection.

>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>   mm/huge_memory.c | 4 ++++
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>   				zap_deposited_table(tlb->mm, pmd);
>>>>   			add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>>   				       -HPAGE_PMD_NR);
>>>> +
>>>> +			if (flush_needed && pmd_young(orig_pmd) &&
>>>> +			    likely(vma_has_recency(vma)))
>>>> +				folio_mark_accessed(folio);
>>>>   		}
>>>>
>>>>   		spin_unlock(ptl);
>>>> -- 
>>>> 2.43.5
> 
> 
> Best Regards,
> Yan, Zi
Barry Song April 10, 2025, 8:14 a.m. UTC | #5
On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> When investigating performance issues during file folio unmap, I noticed some
> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> folio as having seen activity, but this is not done for PMD-sized folios.
>
> This might not cause obvious issues, but a potential problem could be that,
> it might lead to more frequent refaults of PMD-sized file folios under memory
> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> added for PMD-sized file folios?
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 6ac6d468af0d..b3ade7ac5bbf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>                                 zap_deposited_table(tlb->mm, pmd);
>                         add_mm_counter(tlb->mm, mm_counter_file(folio),
>                                        -HPAGE_PMD_NR);
> +
> +                       if (flush_needed && pmd_young(orig_pmd) &&
> +                           likely(vma_has_recency(vma)))
> +                               folio_mark_accessed(folio);

Acked-by: Barry Song <baohua@kernel.org>

I also came across an interesting observation: on a memory-limited system,
demoting unmapped file folios in the LRU—specifically when their mapcount
drops from 1 to 0—can actually improve performance.

If others have observed the same behavior, we might not need to mark them
as accessed in that scenario.

>                 }
>
>                 spin_unlock(ptl);
> --
> 2.43.5
>

Thanks
barry
Baolin Wang April 10, 2025, 9:05 a.m. UTC | #6
On 2025/4/10 16:14, Barry Song wrote:
> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> When investigating performance issues during file folio unmap, I noticed some
>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>> folio as having seen activity, but this is not done for PMD-sized folios.
>>
>> This might not cause obvious issues, but a potential problem could be that,
>> it might lead to more frequent refaults of PMD-sized file folios under memory
>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>> added for PMD-sized file folios?
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/huge_memory.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>                                  zap_deposited_table(tlb->mm, pmd);
>>                          add_mm_counter(tlb->mm, mm_counter_file(folio),
>>                                         -HPAGE_PMD_NR);
>> +
>> +                       if (flush_needed && pmd_young(orig_pmd) &&
>> +                           likely(vma_has_recency(vma)))
>> +                               folio_mark_accessed(folio);
> 
> Acked-by: Barry Song <baohua@kernel.org>

Thanks.

> I also came across an interesting observation: on a memory-limited system,
> demoting unmapped file folios in the LRU—specifically when their mapcount
> drops from 1 to 0—can actually improve performance.

Are these file folios used only once? Could folio_set_dropbehind() be used
to optimize this case, which would avoid the LRU activation done by
folio_mark_accessed()?
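
A rough, untested sketch of how the flag mentioned above could be applied,
rather than a worked-out integration into the zap path:

        /* Hypothetical: tag a once-used, now-unmapped file folio so the
         * page cache drops it after use instead of keeping it on the LRU. */
        if (!folio_test_anon(folio) && !folio_mapped(folio))
                folio_set_dropbehind(folio);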

> If others have observed the same behavior, we might not need to mark them
> as accessed in that scenario.
> 
>>                  }
>>
>>                  spin_unlock(ptl);
>> --
>> 2.43.5
>>
> 
> Thanks
> barry
Barry Song April 10, 2025, 10:29 a.m. UTC | #7
On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/4/10 16:14, Barry Song wrote:
> > On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >> When investigating performance issues during file folio unmap, I noticed some
> >> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> >> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> >> folio as having seen activity, but this is not done for PMD-sized folios.
> >>
> >> This might not cause obvious issues, but a potential problem could be that,
> >> it might lead to more frequent refaults of PMD-sized file folios under memory
> >> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> >> added for PMD-sized file folios?
> >>
> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >> ---
> >>   mm/huge_memory.c | 4 ++++
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 6ac6d468af0d..b3ade7ac5bbf 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>                                  zap_deposited_table(tlb->mm, pmd);
> >>                          add_mm_counter(tlb->mm, mm_counter_file(folio),
> >>                                         -HPAGE_PMD_NR);
> >> +
> >> +                       if (flush_needed && pmd_young(orig_pmd) &&
> >> +                           likely(vma_has_recency(vma)))
> >> +                               folio_mark_accessed(folio);
> >
> > Acked-by: Barry Song <baohua@kernel.org>
>
> Thanks.
>
> > I also came across an interesting observation: on a memory-limited system,
> > demoting unmapped file folios in the LRU—specifically when their mapcount
> > drops from 1 to 0—can actually improve performance.
>
> These file folios are used only once? Can folio_set_dropbehind() be used
> to optimize it, which can avoid the LRU activity movement in
> folio_mark_accessed()?

For instance, when a process, such as a game, just exits, it can be expected
that it won't be used again in the near future. As a result, demoting its
previously unmapped file pages can improve performance.

Of course, for file folios mapped by multiple processes, such as common .so
files, it's a different story. Typically, their mapcounts are always high.

>
> > If others have observed the same behavior, we might not need to mark them
> > as accessed in that scenario.
> >
> >>                  }
> >>
> >>                  spin_unlock(ptl);
> >> --
> >> 2.43.5
> >>
> >

Thanks
Barry
Zi Yan April 10, 2025, 3:13 p.m. UTC | #8
On 10 Apr 2025, at 6:29, Barry Song wrote:

> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/4/10 16:14, Barry Song wrote:
>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>
>>>> When investigating performance issues during file folio unmap, I noticed some
>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>>
>>>> This might not cause obvious issues, but a potential problem could be that,
>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>>>> added for PMD-sized file folios?
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>   mm/huge_memory.c | 4 ++++
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>                                  zap_deposited_table(tlb->mm, pmd);
>>>>                          add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>>                                         -HPAGE_PMD_NR);
>>>> +
>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
>>>> +                           likely(vma_has_recency(vma)))
>>>> +                               folio_mark_accessed(folio);
>>>
>>> Acked-by: Barry Song <baohua@kernel.org>
>>
>> Thanks.
>>
>>> I also came across an interesting observation: on a memory-limited system,
>>> demoting unmapped file folios in the LRU—specifically when their mapcount
>>> drops from 1 to 0—can actually improve performance.
>>
>> These file folios are used only once? Can folio_set_dropbehind() be used
>> to optimize it, which can avoid the LRU activity movement in
>> folio_mark_accessed()?
>
> For instance, when a process, such as a game, just exits, it can be expected
> that it won't be used again in the near future. As a result, demoting
> its previously
> unmapped file pages can improve performance.

Is it possible to mark the dying VMAs with either VM_SEQ_READ or VM_RAND_READ
so that folio_mark_accessed() will be skipped? Or add a new vm_flag?
Will it work?
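
For reference, that is exactly what vma_has_recency() keys off; a paraphrased
sketch of the helper in include/linux/mm_inline.h:

        /* VMAs marked VM_SEQ_READ or VM_RAND_READ (and files opened with
         * FMODE_NOREUSE) opt out of the young-bit/LRU feedback, so the zap
         * paths skip folio_mark_accessed() for them. */
        static inline bool vma_has_recency(struct vm_area_struct *vma)
        {
                if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
                        return false;

                if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
                        return false;

                return true;
        }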

>
> Of course, for file folios mapped by multiple processes, such as
> common .so files,
> it's a different story. Typically, their mapcounts are always high.

Text VMAs should not be marked.

>
>>
>>> If others have observed the same behavior, we might not need to mark them
>>> as accessed in that scenario.
>>>
>>>>                  }
>>>>
>>>>                  spin_unlock(ptl);
>>>> --
>>>> 2.43.5
>>>>
>>>
>
> Thanks
> Barry


Best Regards,
Yan, Zi
Barry Song April 10, 2025, 9:56 p.m. UTC | #9
On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 10 Apr 2025, at 6:29, Barry Song wrote:
>
> > On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2025/4/10 16:14, Barry Song wrote:
> >>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> >>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>
> >>>> When investigating performance issues during file folio unmap, I noticed some
> >>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> >>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> >>>> folio as having seen activity, but this is not done for PMD-sized folios.
> >>>>
> >>>> This might not cause obvious issues, but a potential problem could be that,
> >>>> it might lead to more frequent refaults of PMD-sized file folios under memory
> >>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> >>>> added for PMD-sized file folios?
> >>>>
> >>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>>> ---
> >>>>   mm/huge_memory.c | 4 ++++
> >>>>   1 file changed, 4 insertions(+)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>                                  zap_deposited_table(tlb->mm, pmd);
> >>>>                          add_mm_counter(tlb->mm, mm_counter_file(folio),
> >>>>                                         -HPAGE_PMD_NR);
> >>>> +
> >>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
> >>>> +                           likely(vma_has_recency(vma)))
> >>>> +                               folio_mark_accessed(folio);
> >>>
> >>> Acked-by: Barry Song <baohua@kernel.org>
> >>
> >> Thanks.
> >>
> >>> I also came across an interesting observation: on a memory-limited system,
> >>> demoting unmapped file folios in the LRU—specifically when their mapcount
> >>> drops from 1 to 0—can actually improve performance.
> >>
> >> These file folios are used only once? Can folio_set_dropbehind() be used
> >> to optimize it, which can avoid the LRU activity movement in
> >> folio_mark_accessed()?
> >
> > For instance, when a process, such as a game, just exits, it can be expected
> > that it won't be used again in the near future. As a result, demoting
> > its previously
> > unmapped file pages can improve performance.
>
> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
> Will it work?

I actually took a more aggressive approach and observed good performance
improvements on phones. After zap_pte_range() removes the rmap, the
following logic was added:

if (!folio_test_anon(folio) && !folio_mapped(folio))
    deactivate_file_folio(folio);

This helps file folios from exiting processes get reclaimed more quickly
during MGLRU's min-generation scan, rather than lingering in what is probably
the max generation.

I'm not entirely sure if this is universally applicable or worth submitting as
a patch.

>
> >
> > Of course, for file folios mapped by multiple processes, such as
> > common .so files,
> > it's a different story. Typically, their mapcounts are always high.
>
> Text VMAs should not be marked.
>
> >
> >>
> >>> If others have observed the same behavior, we might not need to mark them
> >>> as accessed in that scenario.
> >>>
> >>>>                  }
> >>>>
> >>>>                  spin_unlock(ptl);
> >>>> --
> >>>> 2.43.5
> >>>>
> >>>
> >

Thanks
Barry
Baolin Wang April 11, 2025, 1:20 a.m. UTC | #10
On 2025/4/11 05:56, Barry Song wrote:
> On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 10 Apr 2025, at 6:29, Barry Song wrote:
>>
>>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025/4/10 16:14, Barry Song wrote:
>>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
>>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>>
>>>>>> When investigating performance issues during file folio unmap, I noticed some
>>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>>>>
>>>>>> This might not cause obvious issues, but a potential problem could be that,
>>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>>>>>> added for PMD-sized file folios?
>>>>>>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>>    mm/huge_memory.c | 4 ++++
>>>>>>    1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>>>                                   zap_deposited_table(tlb->mm, pmd);
>>>>>>                           add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>>>>                                          -HPAGE_PMD_NR);
>>>>>> +
>>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
>>>>>> +                           likely(vma_has_recency(vma)))
>>>>>> +                               folio_mark_accessed(folio);
>>>>>
>>>>> Acked-by: Barry Song <baohua@kernel.org>
>>>>
>>>> Thanks.
>>>>
>>>>> I also came across an interesting observation: on a memory-limited system,
>>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
>>>>> drops from 1 to 0—can actually improve performance.
>>>>
>>>> These file folios are used only once? Can folio_set_dropbehind() be used
>>>> to optimize it, which can avoid the LRU activity movement in
>>>> folio_mark_accessed()?
>>>
>>> For instance, when a process, such as a game, just exits, it can be expected
>>> that it won't be used again in the near future. As a result, demoting
>>> its previously
>>> unmapped file pages can improve performance.
>>
>> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
>> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
>> Will it work?
> 
> Actually took a more aggressive approach and observed good performance
> improvements on phones. After zap_pte_range() called remove_rmap(),
> the following logic was added:
> 
> if (file_folio && !folio_mapped())
>      deactivate_file_folio();
> 
> This helps file folios from exiting processes get reclaimed more quickly
> during the MGLRU's min generation scan while the folios are probably
> in max gen.
> 
> I'm not entirely sure if this is universally applicable or worth submitting as
> a patch.

IMHO, I'm afraid this is not universally applicable. Although these file 
folios have been unmapped, it's not certain that they won't be accessed 
again. These file folios might be remapped and accessed again soon, or 
accessed through read()/write() operations using a file descriptor.

I agree with Zi's suggestion. Using some kind of madvise() hint to mark
these file folios as ones that won't be accessed after being unmapped
seems like it could work?
Barry Song April 11, 2025, 2:32 a.m. UTC | #11
On Fri, Apr 11, 2025 at 1:20 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/4/11 05:56, Barry Song wrote:
> > On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 10 Apr 2025, at 6:29, Barry Song wrote:
> >>
> >>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
> >>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2025/4/10 16:14, Barry Song wrote:
> >>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> >>>>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>>>
> >>>>>> When investigating performance issues during file folio unmap, I noticed some
> >>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> >>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> >>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
> >>>>>>
> >>>>>> This might not cause obvious issues, but a potential problem could be that,
> >>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
> >>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> >>>>>> added for PMD-sized file folios?
> >>>>>>
> >>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>>>>> ---
> >>>>>>    mm/huge_memory.c | 4 ++++
> >>>>>>    1 file changed, 4 insertions(+)
> >>>>>>
> >>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
> >>>>>> --- a/mm/huge_memory.c
> >>>>>> +++ b/mm/huge_memory.c
> >>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>>>                                   zap_deposited_table(tlb->mm, pmd);
> >>>>>>                           add_mm_counter(tlb->mm, mm_counter_file(folio),
> >>>>>>                                          -HPAGE_PMD_NR);
> >>>>>> +
> >>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
> >>>>>> +                           likely(vma_has_recency(vma)))
> >>>>>> +                               folio_mark_accessed(folio);
> >>>>>
> >>>>> Acked-by: Barry Song <baohua@kernel.org>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> I also came across an interesting observation: on a memory-limited system,
> >>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
> >>>>> drops from 1 to 0—can actually improve performance.
> >>>>
> >>>> These file folios are used only once? Can folio_set_dropbehind() be used
> >>>> to optimize it, which can avoid the LRU activity movement in
> >>>> folio_mark_accessed()?
> >>>
> >>> For instance, when a process, such as a game, just exits, it can be expected
> >>> that it won't be used again in the near future. As a result, demoting
> >>> its previously
> >>> unmapped file pages can improve performance.
> >>
> >> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
> >> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
> >> Will it work?
> >
> > Actually took a more aggressive approach and observed good performance
> > improvements on phones. After zap_pte_range() called remove_rmap(),
> > the following logic was added:
> >
> > if (file_folio && !folio_mapped())
> >      deactivate_file_folio();
> >
> > This helps file folios from exiting processes get reclaimed more quickly
> > during the MGLRU's min generation scan while the folios are probably
> > in max gen.
> >
> > I'm not entirely sure if this is universally applicable or worth submitting as
> > a patch.
>
> IMHO, I'm afraid this is not universally applicable. Although these file
> folios have been unmapped, it's not certain that they won't be accessed
> again. These file folios might be remapped and accessed again soon, or
> accessed through read()/write() operations using a file descriptor.

This might apply to interactive systems such as desktops and Android phones.
When an app exits, it's unlikely to be reopened very soon. For example,
Firefox’s text and other file-backed pages are of no use to LibreOffice. So,
if we can help reclaim Firefox’s files promptly (rather than promoting them),
we may be able to assist LibreOffice in getting memory more efficiently.

Imagine a desktop system with limited memory that can only hold either Firefox
or LibreOffice at a time. When Firefox exits, its files are still
relatively "young"
in memory. If they’re marked as recently accessed, it becomes harder to reclaim
Firefox’s exclusive files.

Consider the current LRU list:

active -----------------------------------------------------------> inactive

firefox files - common .so file - firefox file - common .so file

If we demote Firefox’s files, the LRU could instead look like this:

active -----------------------------------------------------------> inactive

common .so files - common .so files - firefox files - firefox files

With this arrangement, when launching LibreOffice, the system can quickly
reclaim Firefox's files, rather than spending time evicting the commonly
used .so files that LibreOffice may also need.

>
> I agree with Zi's suggestion. Using some kind of madvise() hint to mark
> these file folios as those that won't be accessed after being unmapped,
> seems can work?

The issue is that userspace doesn’t know why or when it should call
madvise(). From its perspective, it’s simply the app exiting.

But I agree—there are always exceptions to the pattern I described above.
I just don't know how to tell the kernel the proper pattern.

Thanks
Barry
David Hildenbrand April 11, 2025, 8:42 a.m. UTC | #12
On 11.04.25 03:20, Baolin Wang wrote:
> 
> 
> On 2025/4/11 05:56, Barry Song wrote:
>> On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
>>>
>>> On 10 Apr 2025, at 6:29, Barry Song wrote:
>>>
>>>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2025/4/10 16:14, Barry Song wrote:
>>>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
>>>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>>>
>>>>>>> When investigating performance issues during file folio unmap, I noticed some
>>>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>>>>>
>>>>>>> This might not cause obvious issues, but a potential problem could be that,
>>>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>>>>>>> added for PMD-sized file folios?
>>>>>>>
>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>> ---
>>>>>>>     mm/huge_memory.c | 4 ++++
>>>>>>>     1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>>>>>> --- a/mm/huge_memory.c
>>>>>>> +++ b/mm/huge_memory.c
>>>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>>>>                                    zap_deposited_table(tlb->mm, pmd);
>>>>>>>                            add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>>>>>                                           -HPAGE_PMD_NR);
>>>>>>> +
>>>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
>>>>>>> +                           likely(vma_has_recency(vma)))
>>>>>>> +                               folio_mark_accessed(folio);
>>>>>>
>>>>>> Acked-by: Barry Song <baohua@kernel.org>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> I also came across an interesting observation: on a memory-limited system,
>>>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
>>>>>> drops from 1 to 0—can actually improve performance.
>>>>>
>>>>> These file folios are used only once? Can folio_set_dropbehind() be used
>>>>> to optimize it, which can avoid the LRU activity movement in
>>>>> folio_mark_accessed()?
>>>>
>>>> For instance, when a process, such as a game, just exits, it can be expected
>>>> that it won't be used again in the near future. As a result, demoting
>>>> its previously
>>>> unmapped file pages can improve performance.
>>>
>>> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
>>> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
>>> Will it work?
>>
>> Actually took a more aggressive approach and observed good performance
>> improvements on phones. After zap_pte_range() called remove_rmap(),
>> the following logic was added:
>>
>> if (file_folio && !folio_mapped())
>>       deactivate_file_folio();
>>
>> This helps file folios from exiting processes get reclaimed more quickly
>> during the MGLRU's min generation scan while the folios are probably
>> in max gen.
>>
>> I'm not entirely sure if this is universally applicable or worth submitting as
>> a patch.
> 
> IMHO, I'm afraid this is not universally applicable. Although these file
> folios have been unmapped, it's not certain that they won't be accessed
> again. These file folios might be remapped and accessed again soon, or
> accessed through read()/write() operations using a file descriptor.
> 
> I agree with Zi's suggestion. Using some kind of madvise() hint to mark
> these file folios as those that won't be accessed after being unmapped,
> seems can work?

Is that similar to MADV_COLD before unmap?
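
For context, the userspace pattern being referred to would look roughly like
this; a sketch that assumes the application knows it is finished with the
mapping (addr/len are hypothetical):

        #include <sys/mman.h>

        /* Hint that the file mapping is cold before tearing it down, so
         * reclaim prefers these pages over hotter page cache. */
        madvise(addr, len, MADV_COLD);  /* MADV_COLD: since Linux 5.4 */
        munmap(addr, len);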
Barry Song April 11, 2025, 11:51 a.m. UTC | #13
On Fri, Apr 11, 2025 at 4:42 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 11.04.25 03:20, Baolin Wang wrote:
> >
> >
> > On 2025/4/11 05:56, Barry Song wrote:
> >> On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
> >>>
> >>> On 10 Apr 2025, at 6:29, Barry Song wrote:
> >>>
> >>>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
> >>>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2025/4/10 16:14, Barry Song wrote:
> >>>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> >>>>>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>>>>
> >>>>>>> When investigating performance issues during file folio unmap, I noticed some
> >>>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> >>>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> >>>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
> >>>>>>>
> >>>>>>> This might not cause obvious issues, but a potential problem could be that,
> >>>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
> >>>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> >>>>>>> added for PMD-sized file folios?
> >>>>>>>
> >>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>>>>>> ---
> >>>>>>>     mm/huge_memory.c | 4 ++++
> >>>>>>>     1 file changed, 4 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
> >>>>>>> --- a/mm/huge_memory.c
> >>>>>>> +++ b/mm/huge_memory.c
> >>>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>>>>                                    zap_deposited_table(tlb->mm, pmd);
> >>>>>>>                            add_mm_counter(tlb->mm, mm_counter_file(folio),
> >>>>>>>                                           -HPAGE_PMD_NR);
> >>>>>>> +
> >>>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
> >>>>>>> +                           likely(vma_has_recency(vma)))
> >>>>>>> +                               folio_mark_accessed(folio);
> >>>>>>
> >>>>>> Acked-by: Barry Song <baohua@kernel.org>
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>> I also came across an interesting observation: on a memory-limited system,
> >>>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
> >>>>>> drops from 1 to 0—can actually improve performance.
> >>>>>
> >>>>> These file folios are used only once? Can folio_set_dropbehind() be used
> >>>>> to optimize it, which can avoid the LRU activity movement in
> >>>>> folio_mark_accessed()?
> >>>>
> >>>> For instance, when a process, such as a game, just exits, it can be expected
> >>>> that it won't be used again in the near future. As a result, demoting
> >>>> its previously
> >>>> unmapped file pages can improve performance.
> >>>
> >>> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
> >>> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
> >>> Will it work?
> >>
> >> Actually took a more aggressive approach and observed good performance
> >> improvements on phones. After zap_pte_range() called remove_rmap(),
> >> the following logic was added:
> >>
> >> if (file_folio && !folio_mapped())
> >>       deactivate_file_folio();
> >>
> >> This helps file folios from exiting processes get reclaimed more quickly
> >> during the MGLRU's min generation scan while the folios are probably
> >> in max gen.
> >>
> >> I'm not entirely sure if this is universally applicable or worth submitting as
> >> a patch.
> >
> > IMHO, I'm afraid this is not universally applicable. Although these file
> > folios have been unmapped, it's not certain that they won't be accessed
> > again. These file folios might be remapped and accessed again soon, or
> > accessed through read()/write() operations using a file descriptor.
> >
> > I agree with Zi's suggestion. Using some kind of madvise() hint to mark
> > these file folios as those that won't be accessed after being unmapped,
> > seems can work?
>
> Is that similar to MADV_COLD before unmap?

I'm not convinced that's the case. Once the previous app exits, its exclusive
folios aren't useful to the newly launched app. For instance, Firefox's text
and other exclusive file-backed folios have no relevance to LibreOffice. If a
user terminates Firefox and then launches LibreOffice, marking Firefox’s young
PTE-mapped folios as accessed (thus activating them in the LRU) is meaningless
for LibreOffice.

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
Zi Yan April 11, 2025, 2:44 p.m. UTC | #14
On 11 Apr 2025, at 7:51, Barry Song wrote:

> On Fri, Apr 11, 2025 at 4:42 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 11.04.25 03:20, Baolin Wang wrote:
>>>
>>>
>>> On 2025/4/11 05:56, Barry Song wrote:
>>>> On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>
>>>>> On 10 Apr 2025, at 6:29, Barry Song wrote:
>>>>>
>>>>>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
>>>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2025/4/10 16:14, Barry Song wrote:
>>>>>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
>>>>>>>> <baolin.wang@linux.alibaba.com> wrote:
>>>>>>>>>
>>>>>>>>> When investigating performance issues during file folio unmap, I noticed some
>>>>>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
>>>>>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
>>>>>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
>>>>>>>>>
>>>>>>>>> This might not cause obvious issues, but a potential problem could be that,
>>>>>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
>>>>>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
>>>>>>>>> added for PMD-sized file folios?
>>>>>>>>>
>>>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>>>> ---
>>>>>>>>>     mm/huge_memory.c | 4 ++++
>>>>>>>>>     1 file changed, 4 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
>>>>>>>>> --- a/mm/huge_memory.c
>>>>>>>>> +++ b/mm/huge_memory.c
>>>>>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>>>>>>                                    zap_deposited_table(tlb->mm, pmd);
>>>>>>>>>                            add_mm_counter(tlb->mm, mm_counter_file(folio),
>>>>>>>>>                                           -HPAGE_PMD_NR);
>>>>>>>>> +
>>>>>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
>>>>>>>>> +                           likely(vma_has_recency(vma)))
>>>>>>>>> +                               folio_mark_accessed(folio);
>>>>>>>>
>>>>>>>> Acked-by: Barry Song <baohua@kernel.org>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>> I also came across an interesting observation: on a memory-limited system,
>>>>>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
>>>>>>>> drops from 1 to 0—can actually improve performance.
>>>>>>>
>>>>>>> These file folios are used only once? Can folio_set_dropbehind() be used
>>>>>>> to optimize it, which can avoid the LRU activity movement in
>>>>>>> folio_mark_accessed()?
>>>>>>
>>>>>> For instance, when a process, such as a game, just exits, it can be expected
>>>>>> that it won't be used again in the near future. As a result, demoting
>>>>>> its previously
>>>>>> unmapped file pages can improve performance.
>>>>>
>>>>> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
>>>>> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
>>>>> Will it work?
>>>>
>>>> Actually took a more aggressive approach and observed good performance
>>>> improvements on phones. After zap_pte_range() called remove_rmap(),
>>>> the following logic was added:
>>>>
>>>> if (file_folio && !folio_mapped())
>>>>       deactivate_file_folio();
>>>>
>>>> This helps file folios from exiting processes get reclaimed more quickly
>>>> during the MGLRU's min generation scan while the folios are probably
>>>> in max gen.
>>>>
>>>> I'm not entirely sure if this is universally applicable or worth submitting as
>>>> a patch.
>>>
>>> IMHO, I'm afraid this is not universally applicable. Although these file
>>> folios have been unmapped, it's not certain that they won't be accessed
>>> again. These file folios might be remapped and accessed again soon, or
>>> accessed through read()/write() operations using a file descriptor.
>>>
>>> I agree with Zi's suggestion. Using some kind of madvise() hint to mark
>>> these file folios as those that won't be accessed after being unmapped,
>>> seems can work?
>>
>> Is that similar to MADV_COLD before unmap?
>
> I'm not convinced that's the case. Although the previous app exits,
> its exclusive
> folios aren't useful to the newly launched app. For instance, Firefox's text and
> other exclusive file-backed folios have no relevance to LibreOffice. If a user
> terminates Firefox and then launches LibreOffice, marking Firefox’s young
> PTE-mapped folios as accessed—thus activating them in the LRU—is
> meaningless for LibreOffice.

In terms of marking VMAs, can you do it in exit_mmap() by passing a new
parameter, like bool dying_vma, to unmap_vmas()? That way, unmap_vmas()
could treat exclusive file-backed VMAs as !vma_has_recency() to avoid
folio_mark_accessed().


Best Regards,
Yan, Zi
Barry Song April 12, 2025, 9:02 a.m. UTC | #15
On Sat, Apr 12, 2025 at 2:44 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 11 Apr 2025, at 7:51, Barry Song wrote:
>
> > On Fri, Apr 11, 2025 at 4:42 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 11.04.25 03:20, Baolin Wang wrote:
> >>>
> >>>
> >>> On 2025/4/11 05:56, Barry Song wrote:
> >>>> On Fri, Apr 11, 2025 at 3:13 AM Zi Yan <ziy@nvidia.com> wrote:
> >>>>>
> >>>>> On 10 Apr 2025, at 6:29, Barry Song wrote:
> >>>>>
> >>>>>> On Thu, Apr 10, 2025 at 9:05 PM Baolin Wang
> >>>>>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2025/4/10 16:14, Barry Song wrote:
> >>>>>>>> On Wed, Apr 9, 2025 at 1:16 AM Baolin Wang
> >>>>>>>> <baolin.wang@linux.alibaba.com> wrote:
> >>>>>>>>>
> >>>>>>>>> When investigating performance issues during file folio unmap, I noticed some
> >>>>>>>>> behavioral differences in handling non-PMD-sized folios and PMD-sized folios.
> >>>>>>>>> For non-PMD-sized file folios, it will call folio_mark_accessed() to mark the
> >>>>>>>>> folio as having seen activity, but this is not done for PMD-sized folios.
> >>>>>>>>>
> >>>>>>>>> This might not cause obvious issues, but a potential problem could be that,
> >>>>>>>>> it might lead to more frequent refaults of PMD-sized file folios under memory
> >>>>>>>>> pressure. Therefore, I am unsure whether the folio_mark_accessed() should be
> >>>>>>>>> added for PMD-sized file folios?
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>>>>>>>> ---
> >>>>>>>>>     mm/huge_memory.c | 4 ++++
> >>>>>>>>>     1 file changed, 4 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>>>>>> index 6ac6d468af0d..b3ade7ac5bbf 100644
> >>>>>>>>> --- a/mm/huge_memory.c
> >>>>>>>>> +++ b/mm/huge_memory.c
> >>>>>>>>> @@ -2262,6 +2262,10 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>>>>>>                                    zap_deposited_table(tlb->mm, pmd);
> >>>>>>>>>                            add_mm_counter(tlb->mm, mm_counter_file(folio),
> >>>>>>>>>                                           -HPAGE_PMD_NR);
> >>>>>>>>> +
> >>>>>>>>> +                       if (flush_needed && pmd_young(orig_pmd) &&
> >>>>>>>>> +                           likely(vma_has_recency(vma)))
> >>>>>>>>> +                               folio_mark_accessed(folio);
> >>>>>>>>
> >>>>>>>> Acked-by: Barry Song <baohua@kernel.org>
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>>> I also came across an interesting observation: on a memory-limited system,
> >>>>>>>> demoting unmapped file folios in the LRU—specifically when their mapcount
> >>>>>>>> drops from 1 to 0—can actually improve performance.
> >>>>>>>
> >>>>>>> These file folios are used only once? Can folio_set_dropbehind() be used
> >>>>>>> to optimize it, which can avoid the LRU activity movement in
> >>>>>>> folio_mark_accessed()?
> >>>>>>
> >>>>>> For instance, when a process, such as a game, just exits, it can be expected
> >>>>>> that it won't be used again in the near future. As a result, demoting
> >>>>>> its previously
> >>>>>> unmapped file pages can improve performance.
> >>>>>
> >>>>> Is it possible to mark the dying VMAs either VM_SEQ_READ or VM_RAND_READ
> >>>>> so that folio_mark_accessed() will be skipped? Or a new vm_flag?
> >>>>> Will it work?
> >>>>
> >>>> Actually took a more aggressive approach and observed good performance
> >>>> improvements on phones. After zap_pte_range() called remove_rmap(),
> >>>> the following logic was added:
> >>>>
> >>>> if (file_folio && !folio_mapped())
> >>>>       deactivate_file_folio();
> >>>>
> >>>> This helps file folios from exiting processes get reclaimed more quickly
> >>>> during the MGLRU's min generation scan while the folios are probably
> >>>> in max gen.
> >>>>
> >>>> I'm not entirely sure if this is universally applicable or worth submitting as
> >>>> a patch.
> >>>
> >>> IMHO, I'm afraid this is not universally applicable. Although these file
> >>> folios have been unmapped, it's not certain that they won't be accessed
> >>> again. These file folios might be remapped and accessed again soon, or
> >>> accessed through read()/write() operations using a file descriptor.
> >>>
> >>> I agree with Zi's suggestion. Using some kind of madvise() hint to mark
> >>> these file folios as those that won't be accessed after being unmapped,
> >>> seems can work?
> >>
> >> Is that similar to MADV_COLD before unmap?
> >
> > I'm not convinced that's the case. Although the previous app exits,
> > its exclusive
> > folios aren't useful to the newly launched app. For instance, Firefox's text and
> > other exclusive file-backed folios have no relevance to LibreOffice. If a user
> > terminates Firefox and then launches LibreOffice, marking Firefox’s young
> > PTE-mapped folios as accessed—thus activating them in the LRU—is
> > meaningless for LibreOffice.
>
> In terms of marking VMAs, can you do it in exit_mmap() by passing a new
> parameter, like bool dying_vma, to unmap_vmas()? So that unmap_vmas()
> can change exclusive file-backed VMAs to !vma_has_recency() to avoid
> folio_mark_accessed().

Good idea. Alternatively, we could infer the process's exiting or OOM-reaped
state from its mm struct, removing the need for a new parameter, as in the RFC
I just sent:

https://lore.kernel.org/linux-mm/20250412085852.48524-1-21cnbao@gmail.com/

>
>
> Best Regards,
> Yan, Zi

Thanks
Barry

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6ac6d468af0d..b3ade7ac5bbf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2262,6 +2262,10 @@  int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				zap_deposited_table(tlb->mm, pmd);
 			add_mm_counter(tlb->mm, mm_counter_file(folio),
 				       -HPAGE_PMD_NR);
+
+			if (flush_needed && pmd_young(orig_pmd) &&
+			    likely(vma_has_recency(vma)))
+				folio_mark_accessed(folio);
 		}
 
 		spin_unlock(ptl);