[mm-unstable] mm: folio_add_new_anon_rmap() careful __folio_set_swapbacked()

Message ID f3599b1d-8323-0dc5-e9e0-fdb3cfc3dd5a@google.com (mailing list archive)
State New
Series [mm-unstable] mm: folio_add_new_anon_rmap() careful __folio_set_swapbacked()

Commit Message

Hugh Dickins June 25, 2024, 5 a.m. UTC
Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
false" has extended folio_add_new_anon_rmap() to use on non-exclusive
folios, already visible to others in swap cache and on LRU.

That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
overwriting concurrent atomic operations on folio->flags, losing bits
added or restoring bits cleared.  Since it's only used in this risky
way when folio_test_locked and !folio_test_anon, many such races are
excluded; but, for example, isolations by folio_test_clear_lru() are
vulnerable, and setting or clearing active.

It could just use the atomic folio_set_swapbacked(); but this function
does try to avoid atomics where it can, so use a branch instead: just
avoid setting swapbacked when it is already set, that is good enough.
(Swapbacked is normally stable once set: lazyfree can undo it, but
only later, when found anon in a page table.)

This fixes a lot of instability under compaction and swapping loads:
assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
frees - though I've not worked out what races could lead to the latter.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/rmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
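
[Editorial note: the lost-update hazard described above can be illustrated with a
minimal userspace sketch.  This is illustrative only, not kernel code: the flag
names mirror the kernel's, and the "non-atomic" __folio_set_swapbacked() is
modelled as a split load/store over C11 atomics so the program itself has no
undefined behaviour.  Thread A does the plain read-modify-write of the shared
flags word; thread B does an atomic clear of another bit in the same word, much
as folio_test_clear_lru() does when isolating a folio.  If B's update lands
between A's load and store, it is silently overwritten.]

/* Illustrative sketch only, not kernel code.  Build with: cc -pthread race.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define PG_lru         (1UL << 0)
#define PG_swapbacked  (1UL << 1)

static _Atomic unsigned long flags = PG_lru;

static void *set_swapbacked_nonatomic(void *arg)
{
	/* Models the old __folio_set_swapbacked(): plain load, OR, store back. */
	unsigned long old = atomic_load_explicit(&flags, memory_order_relaxed);
	/* If the other thread clears PG_lru here, the store below restores it. */
	atomic_store_explicit(&flags, old | PG_swapbacked, memory_order_relaxed);
	return NULL;
}

static void *clear_lru_atomic(void *arg)
{
	/* Models folio_test_clear_lru(): a single atomic read-modify-write. */
	atomic_fetch_and_explicit(&flags, ~PG_lru, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, set_swapbacked_nonatomic, NULL);
	pthread_create(&b, NULL, clear_lru_atomic, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/*
	 * Expected outcome: PG_lru cleared and PG_swapbacked set.  Under the
	 * unlucky interleaving, PG_lru comes back: the isolation performed by
	 * the atomic thread is silently undone by the non-atomic store.
	 */
	printf("flags = %#lx\n", atomic_load(&flags));
	return 0;
}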

Comments

Barry Song June 25, 2024, 5:55 a.m. UTC | #1
On Tue, Jun 25, 2024 at 5:00 PM Hugh Dickins <hughd@google.com> wrote:
>
> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> folios, already visible to others in swap cache and on LRU.
>
> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> overwriting concurrent atomic operations on folio->flags, losing bits
> added or restoring bits cleared.  Since it's only used in this risky
> way when folio_test_locked and !folio_test_anon, many such races are
> excluded; but, for example, isolations by folio_test_clear_lru() are
> vulnerable, and setting or clearing active.
>
> It could just use the atomic folio_set_swapbacked(); but this function
> does try to avoid atomics where it can, so use a branch instead: just
> avoid setting swapbacked when it is already set, that is good enough.
> (Swapbacked is normally stable once set: lazyfree can undo it, but
> only later, when found anon in a page table.)
>
> This fixes a lot of instability under compaction and swapping loads:
> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> frees - though I've not worked out what races could lead to the latter.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Thanks a lot, Hugh. Sorry for my mistake. I guess we should squash this into
patch 1/3 "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio) ==
false"?
Andrew, could you please help to squash this one?

> ---
>  mm/rmap.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index df1a43295c85..5394c1178bf1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>         VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>         VM_BUG_ON_VMA(address < vma->vm_start ||
>                         address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> -       __folio_set_swapbacked(folio);
> +
> +       if (!folio_test_swapbacked(folio))
> +               __folio_set_swapbacked(folio);
>         __folio_set_anon(folio, vma, address, exclusive);
>
>         if (likely(!folio_test_large(folio))) {
> --
> 2.35.3
>

Thanks
Barry
Baolin Wang June 25, 2024, 7:04 a.m. UTC | #2
On 2024/6/25 13:55, Barry Song wrote:
> On Tue, Jun 25, 2024 at 5:00 PM Hugh Dickins <hughd@google.com> wrote:
>>
>> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
>> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
>> folios, already visible to others in swap cache and on LRU.
>>
>> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
>> overwriting concurrent atomic operations on folio->flags, losing bits
>> added or restoring bits cleared.  Since it's only used in this risky
>> way when folio_test_locked and !folio_test_anon, many such races are
>> excluded; but, for example, isolations by folio_test_clear_lru() are
>> vulnerable, and setting or clearing active.
>>
>> It could just use the atomic folio_set_swapbacked(); but this function
>> does try to avoid atomics where it can, so use a branch instead: just
>> avoid setting swapbacked when it is already set, that is good enough.
>> (Swapbacked is normally stable once set: lazyfree can undo it, but
>> only later, when found anon in a page table.)
>>
>> This fixes a lot of instability under compaction and swapping loads:
>> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
>> frees - though I've not worked out what races could lead to the latter.
>>
>> Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> Thanks a lot, Hugh. Sorry for my mistake. I guess we should squash this into
> patch 1/3 "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio) ==
> false"?
> Andrew, could you please help to squash this one?

I hope the commit message written by Hugh can also be squashed into the
original patch, as it is very helpful to me :)

>> ---
>>   mm/rmap.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index df1a43295c85..5394c1178bf1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>          VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>>          VM_BUG_ON_VMA(address < vma->vm_start ||
>>                          address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>> -       __folio_set_swapbacked(folio);
>> +
>> +       if (!folio_test_swapbacked(folio))
>> +               __folio_set_swapbacked(folio);
>>          __folio_set_anon(folio, vma, address, exclusive);
>>
>>          if (likely(!folio_test_large(folio))) {
>> --
>> 2.35.3
>>
> 
> Thanks
> Barry
David Hildenbrand June 25, 2024, 7:40 a.m. UTC | #3
On 25.06.24 07:00, Hugh Dickins wrote:
> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> folios, already visible to others in swap cache and on LRU.
> 
> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> overwriting concurrent atomic operations on folio->flags, losing bits
> added or restoring bits cleared.  Since it's only used in this risky
> way when folio_test_locked and !folio_test_anon, many such races are
> excluded; but, for example, isolations by folio_test_clear_lru() are
> vulnerable, and setting or clearing active.
> 
> It could just use the atomic folio_set_swapbacked(); but this function
> does try to avoid atomics where it can, so use a branch instead: just
> avoid setting swapbacked when it is already set, that is good enough.
> (Swapbacked is normally stable once set: lazyfree can undo it, but
> only later, when found anon in a page table.)
> 
> This fixes a lot of instability under compaction and swapping loads:
> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> frees - though I've not worked out what races could lead to the latter.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/rmap.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index df1a43295c85..5394c1178bf1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>   	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>   	VM_BUG_ON_VMA(address < vma->vm_start ||
>   			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> -	__folio_set_swapbacked(folio);
> +
> +	if (!folio_test_swapbacked(folio))
> +		__folio_set_swapbacked(folio);
>   	__folio_set_anon(folio, vma, address, exclusive);
>   
>   	if (likely(!folio_test_large(folio))) {

LGTM.

I'll point out that it's sufficient for a PFN walker to do a tryget + 
trylock to cause trouble.

Fortunately isolate_movable_page() will check __folio_test_movable() 
before doing the trylock.

Reviewed-by: David Hildenbrand <david@redhat.com>
Hugh Dickins June 25, 2024, 7:37 p.m. UTC | #4
On Tue, 25 Jun 2024, David Hildenbrand wrote:
> On 25.06.24 07:00, Hugh Dickins wrote:
> > Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
> > false" has extended folio_add_new_anon_rmap() to use on non-exclusive
> > folios, already visible to others in swap cache and on LRU.
> > 
> > That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
> > overwriting concurrent atomic operations on folio->flags, losing bits
> > added or restoring bits cleared.  Since it's only used in this risky
> > way when folio_test_locked and !folio_test_anon, many such races are
> > excluded; but, for example, isolations by folio_test_clear_lru() are
> > vulnerable, and setting or clearing active.
> > 
> > It could just use the atomic folio_set_swapbacked(); but this function
> > does try to avoid atomics where it can, so use a branch instead: just
> > avoid setting swapbacked when it is already set, that is good enough.
> > (Swapbacked is normally stable once set: lazyfree can undo it, but
> > only later, when found anon in a page table.)
> > 
> > This fixes a lot of instability under compaction and swapping loads:
> > assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
> > frees - though I've not worked out what races could lead to the latter.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >   mm/rmap.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index df1a43295c85..5394c1178bf1 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio,
> > struct vm_area_struct *vma,
> >    VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> >    VM_BUG_ON_VMA(address < vma->vm_start ||
> >   			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> > -	__folio_set_swapbacked(folio);
> > +
> > +	if (!folio_test_swapbacked(folio))
> > +		__folio_set_swapbacked(folio);
> >    __folio_set_anon(folio, vma, address, exclusive);
> >   
> >    if (likely(!folio_test_large(folio))) {
> 
> LGTM.
> 
> I'll point out that it's sufficient for a PFN walker to do a tryget + trylock
> to cause trouble.

That surprises me.  I thought a racer's tryget was irrelevant (touching
a different field) and its trylock not a problem, since "we" hold the
folio lock throughout.  If my mental model is too naive there, please
explain in more detail: we all need to understand this better.

> 
> Fortunately isolate_movable_page() will check __folio_test_movable() before
> doing the trylock.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks,
Hugh
David Hildenbrand June 25, 2024, 7:45 p.m. UTC | #5
On 25.06.24 21:37, Hugh Dickins wrote:
> On Tue, 25 Jun 2024, David Hildenbrand wrote:
>> On 25.06.24 07:00, Hugh Dickins wrote:
>>> Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
>>> false" has extended folio_add_new_anon_rmap() to use on non-exclusive
>>> folios, already visible to others in swap cache and on LRU.
>>>
>>> That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
>>> overwriting concurrent atomic operations on folio->flags, losing bits
>>> added or restoring bits cleared.  Since it's only used in this risky
>>> way when folio_test_locked and !folio_test_anon, many such races are
>>> excluded; but, for example, isolations by folio_test_clear_lru() are
>>> vulnerable, and setting or clearing active.
>>>
>>> It could just use the atomic folio_set_swapbacked(); but this function
>>> does try to avoid atomics where it can, so use a branch instead: just
>>> avoid setting swapbacked when it is already set, that is good enough.
>>> (Swapbacked is normally stable once set: lazyfree can undo it, but
>>> only later, when found anon in a page table.)
>>>
>>> This fixes a lot of instability under compaction and swapping loads:
>>> assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
>>> frees - though I've not worked out what races could lead to the latter.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>>    mm/rmap.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index df1a43295c85..5394c1178bf1 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1408,7 +1408,9 @@ void folio_add_new_anon_rmap(struct folio *folio,
>>> struct vm_area_struct *vma,
>>>     VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>>>     VM_BUG_ON_VMA(address < vma->vm_start ||
>>>    			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>> -	__folio_set_swapbacked(folio);
>>> +
>>> +	if (!folio_test_swapbacked(folio))
>>> +		__folio_set_swapbacked(folio);
>>>     __folio_set_anon(folio, vma, address, exclusive);
>>>    
>>>     if (likely(!folio_test_large(folio))) {
>>
>> LGTM.
>>
>> I'll point out that it's sufficient for a PFN walker to do a tryget + trylock
>> to cause trouble.
> 
> That surprises me.  I thought a racer's tryget was irrelevant (touching
> a different field) and its trylock not a problem, since "we" hold the
> folio lock throughout.  If my mental model is too naive there, please
> explain in more detail: we all need to understand this better.

Sorry, I was imprecise.

tryget+trylock should indeed not be a problem, tryget+lock would be, 
because IIRC folio_wait_bit_common()->folio_set_waiters() would be 
messing with folio flags.
Hugh Dickins June 25, 2024, 8:12 p.m. UTC | #6
On Tue, 25 Jun 2024, David Hildenbrand wrote:
> On 25.06.24 21:37, Hugh Dickins wrote:
> > On Tue, 25 Jun 2024, David Hildenbrand wrote:
> >>
> >> I'll point out that it's sufficient for a PFN walker to do a tryget +
> >> trylock
> >> to cause trouble.
> > 
> > That surprises me.  I thought a racer's tryget was irrelevant (touching
> > a different field) and its trylock not a problem, since "we" hold the
> > folio lock throughout.  If my mental model is too naive there, please
> > explain in more detail: we all need to understand this better.
> 
> Sorry, I was imprecise.
> 
> tryget+trylock should indeed not be a problem, tryget+lock would be, because
> IIRC folio_wait_bit_common()->folio_set_waiters() would be messing with folio
> flags.

Interesting observation, thanks.  I had imagined that a folio locker was
safe, but I think you're right that (before the fix) this could have erased
its PG_waiters.  Typically, I guess something else would come along sooner
or later to lock the folio, and that would succeed in waking up the earlier
one: so probably not an issue that would be detected in testing, but not good.

Hugh
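
[Editorial note: to make the PG_waiters interleaving discussed above concrete,
here is a deterministic sketch.  Again illustrative only, not kernel code: the
bit names mirror the kernel's, and the lock/unlock steps are simplified
stand-ins for folio_lock()/folio_wait_bit_common()/folio_unlock(), played out
in order in a single thread.]

/* Illustrative sketch only, not kernel code. */
#include <stdio.h>

#define PG_locked     (1UL << 0)
#define PG_waiters    (1UL << 1)
#define PG_swapbacked (1UL << 2)

int main(void)
{
	/* "We" hold the folio lock in folio_add_new_anon_rmap(). */
	unsigned long flags = PG_locked;

	/* The old non-atomic __folio_set_swapbacked() snapshots the word. */
	unsigned long snapshot = flags;

	/* A PFN walker's folio_lock() cannot take PG_locked, so (roughly)
	 * folio_wait_bit_common() sets PG_waiters and goes to sleep. */
	flags |= PG_waiters;

	/* Our stale store completes: PG_waiters is silently erased. */
	flags = snapshot | PG_swapbacked;

	/* Our folio_unlock() clears PG_locked and, seeing no PG_waiters,
	 * skips the wake-up: the sleeping walker is not woken until some
	 * later lock/unlock of this folio sets PG_waiters again. */
	int wake = (flags & PG_waiters) != 0;
	flags &= ~PG_locked;

	printf("wake waiters at unlock? %s\n", wake ? "yes" : "no (wakeup lost)");
	return 0;
}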

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index df1a43295c85..5394c1178bf1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1408,7 +1408,9 @@  void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
-	__folio_set_swapbacked(folio);
+
+	if (!folio_test_swapbacked(folio))
+		__folio_set_swapbacked(folio);
 	__folio_set_anon(folio, vma, address, exclusive);
 
 	if (likely(!folio_test_large(folio))) {