
[RFC,5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()

Message ID 20240118111036.72641-6-21cnbao@gmail.com (mailing list archive)
State New
Series mm: support large folios swap-in

Commit Message

Barry Song Jan. 18, 2024, 11:10 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

In do_swap_page(), while supporting large folio swap-in, we use the helper
folio_add_anon_rmap_ptes(). This triggers a WARN_ON in __folio_add_anon_rmap().
We can quiet the warning in two ways:
1. In do_swap_page(), call folio_add_new_anon_rmap() when we are sure the large
folio is a newly allocated one, and call folio_add_anon_rmap_ptes() when we find
the large folio in the swapcache.
2. Always call folio_add_anon_rmap_ptes() in do_swap_page(), but weaken the
WARN_ON in __folio_add_anon_rmap() by making it less sensitive.

Option 2 seems better for do_swap_page() as it can use unified code for
all cases.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Chuanhua Han <hanchuanhua@oppo.com>
---
 mm/rmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
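
For comparison, a rough sketch of what option 1 would look like at the
do_swap_page() call site. Here "start" (the folio-aligned fault address) and
"nr_pages" are assumptions based on this series, and using folio_test_anon()
to tell the two cases apart follows the discussion further down the thread;
none of this is code from the patch:

	/* Hedged sketch only; LRU and error handling elided. */
	if (!folio_test_anon(folio)) {
		/* Freshly allocated large folio: first rmap for the whole folio. */
		folio_add_new_anon_rmap(folio, vma, start);
	} else {
		/* Large folio found in the swapcache, already mapped as anon before. */
		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, start,
					 rmap_flags);
	}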

Comments

David Hildenbrand Jan. 18, 2024, 11:54 a.m. UTC | #1
On 18.01.24 12:10, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> In do_swap_page(), while supporting large folio swap-in, we are using the helper
> folio_add_anon_rmap_ptes. This is triggerring a WARN_ON in __folio_add_anon_rmap.
> We can make the warning quiet by two ways
> 1. in do_swap_page, we call folio_add_new_anon_rmap() if we are sure the large
> folio is new allocated one; we call folio_add_anon_rmap_ptes() if we find the
> large folio in swapcache.
> 2. we always call folio_add_anon_rmap_ptes() in do_swap_page but weaken the
> WARN_ON in __folio_add_anon_rmap() by letting the WARN_ON less sensitive.
> 
> Option 2 seems to be better for do_swap_page() as it can use unified code for
> all cases.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Tested-by: Chuanhua Han <hanchuanhua@oppo.com>
> ---
>   mm/rmap.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f5d43edad529..469fcfd32317 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1304,7 +1304,10 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>   		 * page.
>   		 */
>   		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> -				 level != RMAP_LEVEL_PMD, folio);
> +				 level != RMAP_LEVEL_PMD &&
> +				 (!IS_ALIGNED(address, nr_pages * PAGE_SIZE) ||
> +				 (folio_test_swapcache(folio) && !IS_ALIGNED(folio->index, nr_pages)) ||
> +				 page != &folio->page), folio);
>   		__folio_set_anon(folio, vma, address,
>   				 !!(flags & RMAP_EXCLUSIVE));
>   	} else if (likely(!folio_test_ksm(folio))) {


I have on my todo list to move all that !anon handling out of
folio_add_anon_rmap_ptes(), and instead make the swapin code call
folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
then (-> whole new folio exclusive).

That's the cleaner approach.
Barry Song Jan. 23, 2024, 6:49 a.m. UTC | #2
On Thu, Jan 18, 2024 at 7:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 18.01.24 12:10, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > In do_swap_page(), while supporting large folio swap-in, we are using the helper
> > folio_add_anon_rmap_ptes. This is triggerring a WARN_ON in __folio_add_anon_rmap.
> > We can make the warning quiet by two ways
> > 1. in do_swap_page, we call folio_add_new_anon_rmap() if we are sure the large
> > folio is new allocated one; we call folio_add_anon_rmap_ptes() if we find the
> > large folio in swapcache.
> > 2. we always call folio_add_anon_rmap_ptes() in do_swap_page but weaken the
> > WARN_ON in __folio_add_anon_rmap() by letting the WARN_ON less sensitive.
> >
> > Option 2 seems to be better for do_swap_page() as it can use unified code for
> > all cases.
> >
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > Tested-by: Chuanhua Han <hanchuanhua@oppo.com>
> > ---
> >   mm/rmap.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index f5d43edad529..469fcfd32317 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1304,7 +1304,10 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >                * page.
> >                */
> >               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > -                              level != RMAP_LEVEL_PMD, folio);
> > +                              level != RMAP_LEVEL_PMD &&
> > +                              (!IS_ALIGNED(address, nr_pages * PAGE_SIZE) ||
> > +                              (folio_test_swapcache(folio) && !IS_ALIGNED(folio->index, nr_pages)) ||
> > +                              page != &folio->page), folio);
> >               __folio_set_anon(folio, vma, address,
> >                                !!(flags & RMAP_EXCLUSIVE));
> >       } else if (likely(!folio_test_ksm(folio))) {
>
>
> I have on my todo list to move all that !anon handling out of
> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> then (-> whole new folio exclusive).
>
> That's the cleaner approach.
>

One tricky thing is that sometimes it is hard to know who is the first
one to add the rmap and thus should call folio_add_new_anon_rmap();
especially when we want to support swapin_readahead(), the one who
allocated the large folio might not be the one who first does the rmap.
Is it an acceptable way to do the below in do_swap_page?
if (!folio_test_anon(folio))
      folio_add_new_anon_rmap()
else
      folio_add_anon_rmap_ptes()

> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
Chris Li Jan. 27, 2024, 11:41 p.m. UTC | #3
On Thu, Jan 18, 2024 at 3:12 AM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Barry Song <v-songbaohua@oppo.com>
>
> In do_swap_page(), while supporting large folio swap-in, we are using the helper
> folio_add_anon_rmap_ptes. This is triggerring a WARN_ON in __folio_add_anon_rmap.
> We can make the warning quiet by two ways
> 1. in do_swap_page, we call folio_add_new_anon_rmap() if we are sure the large
> folio is new allocated one; we call folio_add_anon_rmap_ptes() if we find the
> large folio in swapcache.
> 2. we always call folio_add_anon_rmap_ptes() in do_swap_page but weaken the
> WARN_ON in __folio_add_anon_rmap() by letting the WARN_ON less sensitive.
>
> Option 2 seems to be better for do_swap_page() as it can use unified code for
> all cases.
>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Tested-by: Chuanhua Han <hanchuanhua@oppo.com>
> ---
>  mm/rmap.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f5d43edad529..469fcfd32317 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1304,7 +1304,10 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>                  * page.
>                  */
>                 VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> -                                level != RMAP_LEVEL_PMD, folio);
> +                                level != RMAP_LEVEL_PMD &&
> +                                (!IS_ALIGNED(address, nr_pages * PAGE_SIZE) ||
Some minor nitpicks here.
There are two leading "(" in this line and the next. This is the first "(".
> +                                (folio_test_swapcache(folio) && !IS_ALIGNED(folio->index, nr_pages)) ||
 Second "("  here.

These two "(" are NOT at the same nested level. They should not have
the same indentation.
On my first glance, I misread the scope of the "||" due to the same
level indentation.
We can do one of the two
1) add more indentation on the second "(" to reflect the nesting level.

> +                                page != &folio->page), folio);

Also move the trailing "folio" to the next line, because the multiline
expression is huge and complex; that makes it obvious the ending "folio" is
not part of the test condition.

2) Move the multiline test condition into a checking function. Inside
the function it can return early when a shortcut condition is met.
That would also help the readability of this warning condition (see the
sketch below).

Chris

>                 __folio_set_anon(folio, vma, address,
>                                  !!(flags & RMAP_EXCLUSIVE));
>         } else if (likely(!folio_test_ksm(folio))) {
> --
> 2.34.1
>
>
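
For illustration, suggestion 2) above could look roughly like the following
sketch; the helper name is made up and this is not part of the posted patch,
it merely repackages the condition from the hunk above:

/* Hypothetical helper: return true if a PTE-level rmap batch looks suspicious. */
static bool __anon_rmap_pte_batch_suspicious(struct folio *folio,
					     struct page *page,
					     unsigned long address,
					     int nr_pages)
{
	if (!IS_ALIGNED(address, nr_pages * PAGE_SIZE))
		return true;
	if (folio_test_swapcache(folio) && !IS_ALIGNED(folio->index, nr_pages))
		return true;
	return page != &folio->page;
}

		/* ... and the warning in __folio_add_anon_rmap() would become: */
		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
				 level != RMAP_LEVEL_PMD &&
				 __anon_rmap_pte_batch_suspicious(folio, page,
								  address,
								  nr_pages),
				 folio);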
Chris Li Jan. 29, 2024, 3:25 a.m. UTC | #4
Hi David and Barry,

On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@gmail.com> wrote:
>
> >
> >
> > I have on my todo list to move all that !anon handling out of
> > folio_add_anon_rmap_ptes(), and instead make swapin code call add
> > folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> > then (-> whole new folio exclusive).
> >
> > That's the cleaner approach.
> >
>
> one tricky thing is that sometimes it is hard to know who is the first
> one to add rmap and thus should
> call folio_add_new_anon_rmap.
> especially when we want to support swapin_readahead(), the one who
> allocated large filio might not
> be that one who firstly does rmap.

I think Barry has a point. Two tasks might race to swap in the folio,
then race to perform the rmap.
folio_add_new_anon_rmap() should only be called on a folio that is
absolutely "new", not shared. Sharing in the swap cache disqualifies
that condition.

> is it an acceptable way to do the below in do_swap_page?
> if (!folio_test_anon(folio))
>       folio_add_new_anon_rmap()
> else
>       folio_add_anon_rmap_ptes()

I am curious to know the answer as well.

BTW, that test might have a race as well. By the time the task gets the
!anon result, that result might be changed by another task. We need
to make sure that in the caller context this race can't happen; otherwise
we can't do the above safely.

Chris.
David Hildenbrand Jan. 29, 2024, 10:06 a.m. UTC | #5
On 29.01.24 04:25, Chris Li wrote:
> Hi David and Barry,
> 
> On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>>>
>>>
>>> I have on my todo list to move all that !anon handling out of
>>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
>>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
>>> then (-> whole new folio exclusive).
>>>
>>> That's the cleaner approach.
>>>
>>
>> one tricky thing is that sometimes it is hard to know who is the first
>> one to add rmap and thus should
>> call folio_add_new_anon_rmap.
>> especially when we want to support swapin_readahead(), the one who
>> allocated large filio might not
>> be that one who firstly does rmap.
> 
> I think Barry has a point. Two tasks might race to swap in the folio
> then race to perform the rmap.
> folio_add_new_anon_rmap() should only call a folio that is absolutely
> "new", not shared. The sharing in swap cache disqualifies that
> condition.

We have to hold the folio lock. So only one task at a time might do the
folio_add_anon_rmap_ptes() right now, and the 
folio_add_new_shared_anon_rmap() in the future [below].

Also observe how folio_add_anon_rmap_ptes() states that one must hold 
the page lock, because otherwise this would all be completely racy.

 From the pte swp exclusive flags, we know for sure whether we are 
dealing with exclusive vs. shared. I think patch #6 does not properly 
check that all entries are actually the same in that regard (all 
exclusive vs all shared). That likely needs fixing.
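
A minimal sketch of the kind of batch consistency check being referred to
here (a hypothetical helper, not code from this series) might be:

/*
 * Hypothetical: walk the nr swap PTEs covering the folio and make sure
 * they all agree on the swp-exclusive bit, so the swap-in path can treat
 * the whole batch as either exclusive or shared.
 */
static bool swap_pte_batch_same_exclusive(pte_t *ptep, int nr, bool *exclusive)
{
	bool first = pte_swp_exclusive(ptep_get(ptep));
	int i;

	for (i = 1; i < nr; i++) {
		bool cur = pte_swp_exclusive(ptep_get(ptep + i));

		if (cur != first)
			return false;
	}

	*exclusive = first;
	return true;
}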

[I have converting per-page PageAnonExclusive flags to a single 
per-folio flag on my todo list. I suspect that we'll keep the 
per-swp-pte exclusive bits, but the question is rather what we can 
actually make work, because swap and migration just make it much more 
complicated. Anyhow, future work]

> 
>> is it an acceptable way to do the below in do_swap_page?
>> if (!folio_test_anon(folio))
>>        folio_add_new_anon_rmap()
>> else
>>        folio_add_anon_rmap_ptes()
> 
> I am curious to know the answer as well.


Yes, the end code should likely be something like:

/* ksm created a completely new copy */
if (unlikely(folio != swapcache && swapcache)) {
	folio_add_new_anon_rmap(folio, vma, vmf->address);
	folio_add_lru_vma(folio, vma);
} else if (folio_test_anon(folio)) {
	folio_add_anon_rmap_ptes(rmap_flags)
} else {
	folio_add_new_anon_rmap(rmap_flags)
}

Maybe we want to avoid teaching all existing folio_add_new_anon_rmap() 
callers about a new flag, and just have a new 
folio_add_new_shared_anon_rmap() instead. TBD.

> 
> BTW, that test might have a race as well. By the time the task got
> !anon result, this result might get changed by another task. We need
> to make sure in the caller context this race can't happen. Otherwise
> we can't do the above safely.
Again, folio lock. Observe the folio_lock_or_retry() call that covers 
our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
Chris Li Jan. 29, 2024, 4:31 p.m. UTC | #6
On Mon, Jan 29, 2024 at 2:07 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 29.01.24 04:25, Chris Li wrote:
> > Hi David and Barry,
> >
> > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >>>
> >>>
> >>> I have on my todo list to move all that !anon handling out of
> >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> >>> then (-> whole new folio exclusive).
> >>>
> >>> That's the cleaner approach.
> >>>
> >>
> >> one tricky thing is that sometimes it is hard to know who is the first
> >> one to add rmap and thus should
> >> call folio_add_new_anon_rmap.
> >> especially when we want to support swapin_readahead(), the one who
> >> allocated large filio might not
> >> be that one who firstly does rmap.
> >
> > I think Barry has a point. Two tasks might race to swap in the folio
> > then race to perform the rmap.
> > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > "new", not shared. The sharing in swap cache disqualifies that
> > condition.
>
> We have to hold the folio lock. So only one task at a time might do the
> folio_add_anon_rmap_ptes() right now, and the
> folio_add_new_shared_anon_rmap() in the future [below].
>

Ah, I see. The folio_lock() is the answer I am looking for.

> Also observe how folio_add_anon_rmap_ptes() states that one must hold
> the page lock, because otherwise this would all be completely racy.
>
>  From the pte swp exclusive flags, we know for sure whether we are
> dealing with exclusive vs. shared. I think patch #6 does not properly
> check that all entries are actually the same in that regard (all
> exclusive vs all shared). That likely needs fixing.
>
> [I have converting per-page PageAnonExclusive flags to a single
> per-folio flag on my todo list. I suspect that we'll keep the
> per-swp-pte exlusive bits, but the question is rather what we can
> actually make work, because swap and migration just make it much more
> complicated. Anyhow, future work]
>
> >
> >> is it an acceptable way to do the below in do_swap_page?
> >> if (!folio_test_anon(folio))
> >>        folio_add_new_anon_rmap()
> >> else
> >>        folio_add_anon_rmap_ptes()
> >
> > I am curious to know the answer as well.
>
>
> Yes, the end code should likely be something like:
>
> /* ksm created a completely new copy */
> if (unlikely(folio != swapcache && swapcache)) {
>         folio_add_new_anon_rmap(folio, vma, vmf->address);
>         folio_add_lru_vma(folio, vma);
> } else if (folio_test_anon(folio)) {
>         folio_add_anon_rmap_ptes(rmap_flags)
> } else {
>         folio_add_new_anon_rmap(rmap_flags)
> }
>
> Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> callers about a new flag, and just have a new
> folio_add_new_shared_anon_rmap() instead. TBD.

There is more than one caller that needs to perform that dance around
folio_test_anon() and then decide which function to call. It would be nice
to have a wrapper function like folio_add_new_shared_anon_rmap() to
abstract this behavior.


>
> >
> > BTW, that test might have a race as well. By the time the task got
> > !anon result, this result might get changed by another task. We need
> > to make sure in the caller context this race can't happen. Otherwise
> > we can't do the above safely.
> Again, folio lock. Observe the folio_lock_or_retry() call that covers
> our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.

Ack. Thanks for the explanation.

Chris
Barry Song Feb. 26, 2024, 5:05 a.m. UTC | #7
On Tue, Jan 30, 2024 at 5:32 AM Chris Li <chrisl@kernel.org> wrote:
>
> On Mon, Jan 29, 2024 at 2:07 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 29.01.24 04:25, Chris Li wrote:
> > > Hi David and Barry,
> > >
> > > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@gmail.com> wrote:
> > >>
> > >>>
> > >>>
> > >>> I have on my todo list to move all that !anon handling out of
> > >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> > >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> > >>> then (-> whole new folio exclusive).
> > >>>
> > >>> That's the cleaner approach.
> > >>>
> > >>
> > >> one tricky thing is that sometimes it is hard to know who is the first
> > >> one to add rmap and thus should
> > >> call folio_add_new_anon_rmap.
> > >> especially when we want to support swapin_readahead(), the one who
> > >> allocated large filio might not
> > >> be that one who firstly does rmap.
> > >
> > > I think Barry has a point. Two tasks might race to swap in the folio
> > > then race to perform the rmap.
> > > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > > "new", not shared. The sharing in swap cache disqualifies that
> > > condition.
> >
> > We have to hold the folio lock. So only one task at a time might do the
> > folio_add_anon_rmap_ptes() right now, and the
> > folio_add_new_shared_anon_rmap() in the future [below].
> >
>
> Ah, I see. The folio_lock() is the answer I am looking for.
>
> > Also observe how folio_add_anon_rmap_ptes() states that one must hold
> > the page lock, because otherwise this would all be completely racy.
> >
> >  From the pte swp exclusive flags, we know for sure whether we are
> > dealing with exclusive vs. shared. I think patch #6 does not properly
> > check that all entries are actually the same in that regard (all
> > exclusive vs all shared). That likely needs fixing.
> >
> > [I have converting per-page PageAnonExclusive flags to a single
> > per-folio flag on my todo list. I suspect that we'll keep the
> > per-swp-pte exlusive bits, but the question is rather what we can
> > actually make work, because swap and migration just make it much more
> > complicated. Anyhow, future work]
> >
> > >
> > >> is it an acceptable way to do the below in do_swap_page?
> > >> if (!folio_test_anon(folio))
> > >>        folio_add_new_anon_rmap()
> > >> else
> > >>        folio_add_anon_rmap_ptes()
> > >
> > > I am curious to know the answer as well.
> >
> >
> > Yes, the end code should likely be something like:
> >
> > /* ksm created a completely new copy */
> > if (unlikely(folio != swapcache && swapcache)) {
> >         folio_add_new_anon_rmap(folio, vma, vmf->address);
> >         folio_add_lru_vma(folio, vma);
> > } else if (folio_test_anon(folio)) {
> >         folio_add_anon_rmap_ptes(rmap_flags)
> > } else {
> >         folio_add_new_anon_rmap(rmap_flags)
> > }
> >
> > Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> > callers about a new flag, and just have a new
> > folio_add_new_shared_anon_rmap() instead. TBD.

If we have to add a wrapper like folio_add_new_shared_anon_rmap()
to avoid "if (folio_test_anon(folio))" and "else" everywhere, why not
just do it in folio_add_anon_rmap_ptes()?

folio_add_anon_rmap_ptes()
{
      if (!folio_test_anon(folio))
               return folio_add_new_anon_rmap();
}
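
Fleshed out, that fallback might look like the hedged sketch below. The
argument list is taken from the existing folio_add_anon_rmap_ptes() API;
whether the whole folio can really be treated via the "new" path here (and
what to do with the exclusive flag) is exactly the open question:

void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
			      int nr_pages, struct vm_area_struct *vma,
			      unsigned long address, rmap_t flags)
{
	if (!folio_test_anon(folio)) {
		/*
		 * First rmap for this folio: take the "new" path instead.
		 * Note the exclusive flag is dropped here, which is the
		 * unresolved part of this idea.
		 */
		folio_add_new_anon_rmap(folio, vma, address);
		return;
	}

	__folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
			      RMAP_LEVEL_PTE);
}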

Anyway, I am going to change patch 4/6 to the if (folio_test_anon())/else
approach first and drop this 5/6.
We may figure out later whether we need a wrapper.

>
> There is more than one caller needed to perform that dance around
> folio_test_anon() then decide which function to call. It would be nice
> to have a wrapper function folio_add_new_shared_anon_rmap() to
> abstract this behavior.
>
>
> >
> > >
> > > BTW, that test might have a race as well. By the time the task got
> > > !anon result, this result might get changed by another task. We need
> > > to make sure in the caller context this race can't happen. Otherwise
> > > we can't do the above safely.
> > Again, folio lock. Observe the folio_lock_or_retry() call that covers
> > our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> Ack. Thanks for the explanation.
>
> Chris

Thanks
Barry
Barry Song April 6, 2024, 11:27 p.m. UTC | #8
On Mon, Jan 29, 2024 at 11:07 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 29.01.24 04:25, Chris Li wrote:
> > Hi David and Barry,
> >
> > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >>>
> >>>
> >>> I have on my todo list to move all that !anon handling out of
> >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> >>> then (-> whole new folio exclusive).
> >>>
> >>> That's the cleaner approach.
> >>>
> >>
> >> one tricky thing is that sometimes it is hard to know who is the first
> >> one to add rmap and thus should
> >> call folio_add_new_anon_rmap.
> >> especially when we want to support swapin_readahead(), the one who
> >> allocated large filio might not
> >> be that one who firstly does rmap.
> >
> > I think Barry has a point. Two tasks might race to swap in the folio
> > then race to perform the rmap.
> > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > "new", not shared. The sharing in swap cache disqualifies that
> > condition.
>
> We have to hold the folio lock. So only one task at a time might do the
> folio_add_anon_rmap_ptes() right now, and the
> folio_add_new_shared_anon_rmap() in the future [below].
>
> Also observe how folio_add_anon_rmap_ptes() states that one must hold
> the page lock, because otherwise this would all be completely racy.
>
>  From the pte swp exclusive flags, we know for sure whether we are
> dealing with exclusive vs. shared. I think patch #6 does not properly
> check that all entries are actually the same in that regard (all
> exclusive vs all shared). That likely needs fixing.
>
> [I have converting per-page PageAnonExclusive flags to a single
> per-folio flag on my todo list. I suspect that we'll keep the
> per-swp-pte exlusive bits, but the question is rather what we can
> actually make work, because swap and migration just make it much more
> complicated. Anyhow, future work]
>
> >
> >> is it an acceptable way to do the below in do_swap_page?
> >> if (!folio_test_anon(folio))
> >>        folio_add_new_anon_rmap()
> >> else
> >>        folio_add_anon_rmap_ptes()
> >
> > I am curious to know the answer as well.
>
>
> Yes, the end code should likely be something like:
>
> /* ksm created a completely new copy */
> if (unlikely(folio != swapcache && swapcache)) {
>         folio_add_new_anon_rmap(folio, vma, vmf->address);
>         folio_add_lru_vma(folio, vma);
> } else if (folio_test_anon(folio)) {
>         folio_add_anon_rmap_ptes(rmap_flags)
> } else {
>         folio_add_new_anon_rmap(rmap_flags)
> }
>
> Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> callers about a new flag, and just have a new
> folio_add_new_shared_anon_rmap() instead. TBD.

right.

We need to clarify that the new anon folio might not necessarily be exclusive.
Unlike folio_add_new_anon_rmap(), which assumes the new folio is exclusive,
folio_add_anon_rmap_ptes() is capable of handling both exclusive and
non-exclusive new anon folios.

The code would be like:

 if (unlikely(folio != swapcache && swapcache)) {
         folio_add_new_anon_rmap(folio, vma, vmf->address);
         folio_add_lru_vma(folio, vma);
 } else if (!folio_test_anon(folio)) {
         if (exclusive)
                 folio_add_new_anon_rmap();
         else
                 folio_add_new_shared_anon_rmap();
 } else {
         folio_add_anon_rmap_ptes(rmap_flags);
 }

It appears a bit lengthy?

>
> >
> > BTW, that test might have a race as well. By the time the task got
> > !anon result, this result might get changed by another task. We need
> > to make sure in the caller context this race can't happen. Otherwise
> > we can't do the above safely.
> Again, folio lock. Observe the folio_lock_or_retry() call that covers
> our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index f5d43edad529..469fcfd32317 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1304,7 +1304,10 @@  static __always_inline void __folio_add_anon_rmap(struct folio *folio,
 		 * page.
 		 */
 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-				 level != RMAP_LEVEL_PMD, folio);
+				 level != RMAP_LEVEL_PMD &&
+				 (!IS_ALIGNED(address, nr_pages * PAGE_SIZE) ||
+				 (folio_test_swapcache(folio) && !IS_ALIGNED(folio->index, nr_pages)) ||
+				 page != &folio->page), folio);
 		__folio_set_anon(folio, vma, address,
 				 !!(flags & RMAP_EXCLUSIVE));
 	} else if (likely(!folio_test_ksm(folio))) {