
[v2] mm/migrate: fix shmem xarray update during migration

Message ID 20250228174953.2222831-1-ziy@nvidia.com (mailing list archive)
State New
Series [v2] mm/migrate: fix shmem xarray update during migration

Commit Message

Zi Yan Feb. 28, 2025, 5:49 p.m. UTC
Pagecache uses multi-index entries for large folios, and so does shmem. Only
the swap cache still stores multiple entries for a single large folio.
Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
fixed the swap cache but got shmem wrong by storing multiple entries for
a large shmem folio. Fix it by storing a single entry for a shmem
folio.

Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
Reported-by: Liu Shixin <liushixin2@huawei.com>
Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
---
 mm/migrate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Baolin Wang March 1, 2025, 1:59 a.m. UTC | #1
On 2025/3/1 01:49, Zi Yan wrote:
> Pagecache uses multi-index entries for large folio, so does shmem. Only
> swap cache still stores multiple entries for a single large folio.
> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> fixed swap cache but got shmem wrong by storing multiple entries for
> a large shmem folio. Fix it by storing a single entry for a shmem
> folio.
> 
> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> Reported-by: Liu Shixin <liushixin2@huawei.com>
> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>

LGTM. Thanks.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/migrate.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..2c9669135a38 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>   			folio_set_swapcache(newfolio);
>   			newfolio->private = folio_get_private(folio);
>   		}
> -		entries = nr;
> +		/* shmem uses high-order entry */
> +		if (!folio_test_anon(folio))
> +			entries = 1;
> +		else
> +			entries = nr;
>   	} else {
>   		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>   		entries = 1;
Zi Yan March 4, 2025, 2:03 a.m. UTC | #2
On 28 Feb 2025, at 12:49, Zi Yan wrote:

> Pagecache uses multi-index entries for large folio, so does shmem. Only
> swap cache still stores multiple entries for a single large folio.
> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> fixed swap cache but got shmem wrong by storing multiple entries for
> a large shmem folio. Fix it by storing a single entry for a shmem
> folio.
>
> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> Reported-by: Liu Shixin <liushixin2@huawei.com>
> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>

+Cc:stable

> ---
>  mm/migrate.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..2c9669135a38 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>  			folio_set_swapcache(newfolio);
>  			newfolio->private = folio_get_private(folio);
>  		}
> -		entries = nr;
> +		/* shmem uses high-order entry */
> +		if (!folio_test_anon(folio))
> +			entries = 1;
> +		else
> +			entries = nr;
>  	} else {
>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>  		entries = 1;
> -- 
> 2.47.2


Best Regards,
Yan, Zi
Greg Kroah-Hartman March 4, 2025, 5:30 a.m. UTC | #3
On Mon, Mar 03, 2025 at 09:03:04PM -0500, Zi Yan wrote:
> On 28 Feb 2025, at 12:49, Zi Yan wrote:
> 
> > Pagecache uses multi-index entries for large folio, so does shmem. Only
> > swap cache still stores multiple entries for a single large folio.
> > Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> > fixed swap cache but got shmem wrong by storing multiple entries for
> > a large shmem folio. Fix it by storing a single entry for a shmem
> > folio.
> >
> > Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> > Reported-by: Liu Shixin <liushixin2@huawei.com>
> > Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
> > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > Reviewed-by: Shivank Garg <shivankg@amd.com>
> 
> +Cc:stable
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Hugh Dickins March 4, 2025, 9:47 a.m. UTC | #4
On Fri, 28 Feb 2025, Zi Yan wrote:

> Pagecache uses multi-index entries for large folio, so does shmem. Only
> swap cache still stores multiple entries for a single large folio.
> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> fixed swap cache but got shmem wrong by storing multiple entries for
> a large shmem folio. Fix it by storing a single entry for a shmem
> folio.
> 
> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
> Reported-by: Liu Shixin <liushixin2@huawei.com>
> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>

It's a great find (I think), and your commit message is okay:
but unless I'm much mistaken, NAK to the patch itself.

First, I say "(I think)" there, because I don't actually know what the
loop writing the same folio nr times to the multi-index entry does to
the xarray: I can imagine it as being completely harmless, just nr
times more work than was needed.

But I guess it does something bad, since Matthew was horrified,
and we have all found that your patch appears to improve behaviour
(or at least improve behaviour in the context of your folio_split()
series: none of us noticed a problem before that, but it may be
that your new series is widening our exposure to existing bugs).

Maybe your original patch, with the shmem_mapping(mapping) check there,
was good, and it's only wrong when changed to !folio_test_anon(folio);
but TBH I find it too confusing, with the conditionals the way they are.
See my preferred alternative below.

The vital point is that multi-index entries are not used in swap cache:
whether the folio in question originates from anon or from shmem.  And
it's easier to understand once you remember that a shmem folio is never
in both page cache and swap cache at the same time (well, there may be an
instant of transition from one to other while that folio is held locked) -
once it's in swap cache, folio->mapping is NULL and it's no longer
recognizable as from a shmem mapping.

The way I read your patch originally, I thought it meant that shmem
folios go into the swap cache as multi-index, but anon folios do not;
which seemed a worrying mixture to me.  But crashes on the
VM_BUG_ON_PAGE(entry != folio, entry) in __delete_from_swap_cache()
yesterday (with your patch in) led me to see how add_to_swap_cache()
inserts multiple non-multi-index entries, whether for anon or for shmem.

If this patch really is needed in old releases, then I suspect that
mm/huge_memory.c needs correction there too; but let me explain in
a response to your folio_split() series.

> ---
>  mm/migrate.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..2c9669135a38 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>  			folio_set_swapcache(newfolio);
>  			newfolio->private = folio_get_private(folio);
>  		}
> -		entries = nr;
> +		/* shmem uses high-order entry */
> +		if (!folio_test_anon(folio))
> +			entries = 1;
> +		else
> +			entries = nr;
>  	} else {
>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>  		entries = 1;
> -- 
> 2.47.2

NAK to that patch above, here's how I think it should be:

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/migrate.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18892c8..822776819ca6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -518,12 +518,12 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 	if (folio_test_anon(folio) && folio_test_large(folio))
 		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
 	folio_ref_add(newfolio, nr); /* add cache reference */
-	if (folio_test_swapbacked(folio)) {
+	if (folio_test_swapbacked(folio))
 		__folio_set_swapbacked(newfolio);
-		if (folio_test_swapcache(folio)) {
-			folio_set_swapcache(newfolio);
-			newfolio->private = folio_get_private(folio);
-		}
+
+	if (folio_test_swapcache(folio)) {
+		folio_set_swapcache(newfolio);
+		newfolio->private = folio_get_private(folio);
 		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
Zi Yan March 4, 2025, 5 p.m. UTC | #5
On 4 Mar 2025, at 0:30, Greg KH wrote:

> On Mon, Mar 03, 2025 at 09:03:04PM -0500, Zi Yan wrote:
>> On 28 Feb 2025, at 12:49, Zi Yan wrote:
>>
>>> Pagecache uses multi-index entries for large folio, so does shmem. Only
>>> swap cache still stores multiple entries for a single large folio.
>>> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>> fixed swap cache but got shmem wrong by storing multiple entries for
>>> a large shmem folio. Fix it by storing a single entry for a shmem
>>> folio.
>>>
>>> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>> Reported-by: Liu Shixin <liushixin2@huawei.com>
>>> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>>
>> +Cc:stable
>>
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
>   https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
>
> </formletter>

Sure. And this is not the right fix. I will resend a new one. Sorry
for the noise.


Best Regards,
Yan, Zi
Zi Yan March 4, 2025, 5:18 p.m. UTC | #6
On 4 Mar 2025, at 4:47, Hugh Dickins wrote:

> On Fri, 28 Feb 2025, Zi Yan wrote:
>
>> Pagecache uses multi-index entries for large folio, so does shmem. Only
>> swap cache still stores multiple entries for a single large folio.
>> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>> fixed swap cache but got shmem wrong by storing multiple entries for
>> a large shmem folio. Fix it by storing a single entry for a shmem
>> folio.
>>
>> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>> Reported-by: Liu Shixin <liushixin2@huawei.com>
>> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>
> It's a great find (I think), and your commit message is okay:
> but unless I'm much mistaken, NAK to the patch itself.

Got it. Thank you for the review.

>
> First, I say "(I think)" there, because I don't actually know what the
> loop writing the same folio nr times to the multi-index entry does to
> the xarray: I can imagine it as being completely harmless, just nr
> times more work than was needed.
>
> But I guess it does something bad, since Matthew was horrified,
> and we have all found that your patch appears to improve behaviour
> (or at least improve behaviour in the context of your folio_split()
> series: none of us noticed a problem before that, but it may be
> that your new series is widening our exposure to existing bugs).
>
> Maybe your orginal patch, with the shmem_mapping(mapping) check there,
> was good, and it's only wrong when changed to !folio_test_anon(folio);
> but TBH I find it too confusing, with the conditionals the way they are.
> See my preferred alternative below.
>
> The vital point is that multi-index entries are not used in swap cache:
> whether the folio in question orginates from anon or from shmem.  And
> it's easier to understand once you remember that a shmem folio is never
> in both page cache and swap cache at the same time (well, there may be an
> instant of transition from one to other while that folio is held locked) -
> once it's in swap cache, folio->mapping is NULL and it's no longer
> recognizable as from a shmem mapping.

Got it. Now it all makes sense to me. Thank you for the explanation.

>
> The way I read your patch originally, I thought it meant that shmem
> folios go into the swap cache as multi-index, but anon folios do not;
> which seemed a worrying mixture to me.  But crashes on the
> VM_BUG_ON_PAGE(entry != folio, entry) in __delete_from_swap_cache()
> yesterday (with your patch in) led me to see how add_to_swap_cache()
> inserts multiple non-multi-index entries, whether for anon or for shmem.

Thanks for the pointer.

>
> If this patch really is needed in old releases, then I suspect that
> mm/huge_memory.c needs correction there too; but let me explain in
> a response to your folio_split() series.
>
>> ---
>>  mm/migrate.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 365c6daa8d1b..2c9669135a38 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>>  			folio_set_swapcache(newfolio);
>>  			newfolio->private = folio_get_private(folio);
>>  		}
>> -		entries = nr;
>> +		/* shmem uses high-order entry */
>> +		if (!folio_test_anon(folio))
>> +			entries = 1;
>> +		else
>> +			entries = nr;
>>  	} else {
>>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>  		entries = 1;
>> -- 
>> 2.47.2
>
> NAK to that patch above, here's how I think it should be:

OK. I will resend your fix together with __split_huge_page() fixes against
Linus’s tree. My folio_split() series will conflict with the fix, but the
merge fixup should be simple, since the related patch just deletes
__split_huge_page() entirely.

>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/migrate.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index fb19a18892c8..822776819ca6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -518,12 +518,12 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>  	if (folio_test_anon(folio) && folio_test_large(folio))
>  		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
>  	folio_ref_add(newfolio, nr); /* add cache reference */
> -	if (folio_test_swapbacked(folio)) {
> +	if (folio_test_swapbacked(folio))
>  		__folio_set_swapbacked(newfolio);
> -		if (folio_test_swapcache(folio)) {
> -			folio_set_swapcache(newfolio);
> -			newfolio->private = folio_get_private(folio);
> -		}
> +
> +	if (folio_test_swapcache(folio)) {
> +		folio_set_swapcache(newfolio);
> +		newfolio->private = folio_get_private(folio);
>  		entries = nr;
>  	} else {
>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> -- 
> 2.43.0


Best Regards,
Yan, Zi
Zi Yan March 4, 2025, 8:07 p.m. UTC | #7
On 4 Mar 2025, at 12:18, Zi Yan wrote:

> On 4 Mar 2025, at 4:47, Hugh Dickins wrote:
>
>> On Fri, 28 Feb 2025, Zi Yan wrote:
>>
>>> Pagecache uses multi-index entries for large folio, so does shmem. Only
>>> swap cache still stores multiple entries for a single large folio.
>>> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>> fixed swap cache but got shmem wrong by storing multiple entries for
>>> a large shmem folio. Fix it by storing a single entry for a shmem
>>> folio.
>>>
>>> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>> Reported-by: Liu Shixin <liushixin2@huawei.com>
>>> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>>
>> It's a great find (I think), and your commit message is okay:
>> but unless I'm much mistaken, NAK to the patch itself.
>
> Got it. Thank you for the review.
>
>>
>> First, I say "(I think)" there, because I don't actually know what the
>> loop writing the same folio nr times to the multi-index entry does to
>> the xarray: I can imagine it as being completely harmless, just nr
>> times more work than was needed.

It seems you are right on this one. I tried to reproduce the issue on
mainline but could not, and I did see shmem hit the entries = nr path.
So it is likely there is no bug in mainline, just inefficiency.

This fix might just mask the bugs introduced by my folio_split() patchset,
since I reverted my xas_try_split()-in-shmem_large_split_entry() patch
and still hit the issue. Let me do more debugging and get back.

>>
>> But I guess it does something bad, since Matthew was horrified,
>> and we have all found that your patch appears to improve behaviour
>> (or at least improve behaviour in the context of your folio_split()
>> series: none of us noticed a problem before that, but it may be
>> that your new series is widening our exposure to existing bugs).
>>
>> Maybe your orginal patch, with the shmem_mapping(mapping) check there,
>> was good, and it's only wrong when changed to !folio_test_anon(folio);
>> but TBH I find it too confusing, with the conditionals the way they are.
>> See my preferred alternative below.
>>
>> The vital point is that multi-index entries are not used in swap cache:
>> whether the folio in question orginates from anon or from shmem.  And
>> it's easier to understand once you remember that a shmem folio is never
>> in both page cache and swap cache at the same time (well, there may be an
>> instant of transition from one to other while that folio is held locked) -
>> once it's in swap cache, folio->mapping is NULL and it's no longer
>> recognizable as from a shmem mapping.
>
> Got it. Now it all makes sense to me. Thank you for the explanation.
>
>>
>> The way I read your patch originally, I thought it meant that shmem
>> folios go into the swap cache as multi-index, but anon folios do not;
>> which seemed a worrying mixture to me.  But crashes on the
>> VM_BUG_ON_PAGE(entry != folio, entry) in __delete_from_swap_cache()
>> yesterday (with your patch in) led me to see how add_to_swap_cache()
>> inserts multiple non-multi-index entries, whether for anon or for shmem.
>
> Thanks for the pointer.
>
>>
>> If this patch really is needed in old releases, then I suspect that
>> mm/huge_memory.c needs correction there too; but let me explain in
>> a response to your folio_split() series.
>>
>>> ---
>>>  mm/migrate.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 365c6daa8d1b..2c9669135a38 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>>>  			folio_set_swapcache(newfolio);
>>>  			newfolio->private = folio_get_private(folio);
>>>  		}
>>> -		entries = nr;
>>> +		/* shmem uses high-order entry */
>>> +		if (!folio_test_anon(folio))
>>> +			entries = 1;
>>> +		else
>>> +			entries = nr;
>>>  	} else {
>>>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>>  		entries = 1;
>>> -- 
>>> 2.47.2
>>
>> NAK to that patch above, here's how I think it should be:
>
> OK. I will resend your fix with __split_huge_page() fixes against Linus’s tree.
> My folio_split() will conflict with the fix, but the merge fix should be
> simple, since the related patch just deletes __split_huge_page() entirely.

Best Regards,
Yan, Zi
Zi Yan March 5, 2025, 3:22 a.m. UTC | #8
On 4 Mar 2025, at 15:07, Zi Yan wrote:

> On 4 Mar 2025, at 12:18, Zi Yan wrote:
>
>> On 4 Mar 2025, at 4:47, Hugh Dickins wrote:
>>
>>> On Fri, 28 Feb 2025, Zi Yan wrote:
>>>
>>>> Pagecache uses multi-index entries for large folio, so does shmem. Only
>>>> swap cache still stores multiple entries for a single large folio.
>>>> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>>> fixed swap cache but got shmem wrong by storing multiple entries for
>>>> a large shmem folio. Fix it by storing a single entry for a shmem
>>>> folio.
>>>>
>>>> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>>>> Reported-by: Liu Shixin <liushixin2@huawei.com>
>>>> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>>>
>>> It's a great find (I think), and your commit message is okay:
>>> but unless I'm much mistaken, NAK to the patch itself.
>>
>> Got it. Thank you for the review.
>>
>>>
>>> First, I say "(I think)" there, because I don't actually know what the
>>> loop writing the same folio nr times to the multi-index entry does to
>>> the xarray: I can imagine it as being completely harmless, just nr
>>> times more work than was needed.
>
> It seems that you are right on this one. I am trying to reproduce the
> issue on mainline but could not and I did see shmem hits the entries = nr.
> So it is likely there is no bug in mainline just inefficiency.
>
> This fix might just mask the bugs introduced in my folio_split() patchset,
> since I reverted my xas_try_split() in shmem_large_split_entry() patch
> and still hit the issue. Let me do more debugging and get back.

I need to take this back. It turns out I had not enabled large folios on
shmem when testing 6.14-rc5. After enabling 64KB-only large folios on
shmem, shmem swapin got stuck with the repro from Liu Shixin (triggering
compact_memory continuously while doing linear shmem swapin). But if I
enable 2MB large folios on shmem, there is no issue.

I see no issue with v6.13 either, so this issue seems to come from the
6.14-rc cycle. I am going to rebase my folio_split() patchset on v6.13
to test the uniform split part (the non-uniform part would need Baolin’s
patchset).

>
>>>
>>> But I guess it does something bad, since Matthew was horrified,
>>> and we have all found that your patch appears to improve behaviour
>>> (or at least improve behaviour in the context of your folio_split()
>>> series: none of us noticed a problem before that, but it may be
>>> that your new series is widening our exposure to existing bugs).
>>>
>>> Maybe your orginal patch, with the shmem_mapping(mapping) check there,
>>> was good, and it's only wrong when changed to !folio_test_anon(folio);
>>> but TBH I find it too confusing, with the conditionals the way they are.
>>> See my preferred alternative below.
>>>
>>> The vital point is that multi-index entries are not used in swap cache:
>>> whether the folio in question orginates from anon or from shmem.  And
>>> it's easier to understand once you remember that a shmem folio is never
>>> in both page cache and swap cache at the same time (well, there may be an
>>> instant of transition from one to other while that folio is held locked) -
>>> once it's in swap cache, folio->mapping is NULL and it's no longer
>>> recognizable as from a shmem mapping.
>>
>> Got it. Now it all makes sense to me. Thank you for the explanation.
>>
>>>
>>> The way I read your patch originally, I thought it meant that shmem
>>> folios go into the swap cache as multi-index, but anon folios do not;
>>> which seemed a worrying mixture to me.  But crashes on the
>>> VM_BUG_ON_PAGE(entry != folio, entry) in __delete_from_swap_cache()
>>> yesterday (with your patch in) led me to see how add_to_swap_cache()
>>> inserts multiple non-multi-index entries, whether for anon or for shmem.
>>
>> Thanks for the pointer.
>>
>>>
>>> If this patch really is needed in old releases, then I suspect that
>>> mm/huge_memory.c needs correction there too; but let me explain in
>>> a response to your folio_split() series.
>>>
>>>> ---
>>>>  mm/migrate.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 365c6daa8d1b..2c9669135a38 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>>>>  			folio_set_swapcache(newfolio);
>>>>  			newfolio->private = folio_get_private(folio);
>>>>  		}
>>>> -		entries = nr;
>>>> +		/* shmem uses high-order entry */
>>>> +		if (!folio_test_anon(folio))
>>>> +			entries = 1;
>>>> +		else
>>>> +			entries = nr;
>>>>  	} else {
>>>>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>>>  		entries = 1;
>>>> -- 
>>>> 2.47.2
>>>
>>> NAK to that patch above, here's how I think it should be:
>>
>> OK. I will resend your fix with __split_huge_page() fixes against Linus’s tree.
>> My folio_split() will conflict with the fix, but the merge fix should be
>> simple, since the related patch just deletes __split_huge_page() entirely.
>
> Best Regards,
> Yan, Zi


Best Regards,
Yan, Zi
Zi Yan March 5, 2025, 7:55 p.m. UTC | #9
On 4 Mar 2025, at 4:47, Hugh Dickins wrote:

> On Fri, 28 Feb 2025, Zi Yan wrote:
>
>> Pagecache uses multi-index entries for large folio, so does shmem. Only
>> swap cache still stores multiple entries for a single large folio.
>> Commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>> fixed swap cache but got shmem wrong by storing multiple entries for
>> a large shmem folio. Fix it by storing a single entry for a shmem
>> folio.
>>
>> Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
>> Reported-by: Liu Shixin <liushixin2@huawei.com>
>> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Shivank Garg <shivankg@amd.com>
>
> It's a great find (I think), and your commit message is okay:
> but unless I'm much mistaken, NAK to the patch itself.
>
> First, I say "(I think)" there, because I don't actually know what the
> loop writing the same folio nr times to the multi-index entry does to
> the xarray: I can imagine it as being completely harmless, just nr
> times more work than was needed.
>
> But I guess it does something bad, since Matthew was horrified,
> and we have all found that your patch appears to improve behaviour
> (or at least improve behaviour in the context of your folio_split()
> series: none of us noticed a problem before that, but it may be
> that your new series is widening our exposure to existing bugs).
>
> Maybe your orginal patch, with the shmem_mapping(mapping) check there,
> was good, and it's only wrong when changed to !folio_test_anon(folio);
> but TBH I find it too confusing, with the conditionals the way they are.
> See my preferred alternative below.
>
> The vital point is that multi-index entries are not used in swap cache:
> whether the folio in question orginates from anon or from shmem.  And
> it's easier to understand once you remember that a shmem folio is never
> in both page cache and swap cache at the same time (well, there may be an
> instant of transition from one to other while that folio is held locked) -
> once it's in swap cache, folio->mapping is NULL and it's no longer
> recognizable as from a shmem mapping.
>
> The way I read your patch originally, I thought it meant that shmem
> folios go into the swap cache as multi-index, but anon folios do not;
> which seemed a worrying mixture to me.  But crashes on the
> VM_BUG_ON_PAGE(entry != folio, entry) in __delete_from_swap_cache()
> yesterday (with your patch in) led me to see how add_to_swap_cache()
> inserts multiple non-multi-index entries, whether for anon or for shmem.
>
> If this patch really is needed in old releases, then I suspect that
> mm/huge_memory.c needs correction there too; but let me explain in
> a response to your folio_split() series.

Actually, mm/huge_memory.c does not need the change, because
split_huge_page*() bails out with EBUSY when folio->mapping is NULL. So
__split_huge_page() will never encounter the case of a large shmem folio
in the swap cache.

Best Regards,
Yan, Zi

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 365c6daa8d1b..2c9669135a38 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -524,7 +524,11 @@  static int __folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
-		entries = nr;
+		/* shmem uses high-order entry */
+		if (!folio_test_anon(folio))
+			entries = 1;
+		else
+			entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
 		entries = 1;