diff mbox series

[RFC] mm/memory-failure: release private data before split THP

Message ID 20220803025243.155798-1-fengwei.yin@intel.com (mailing list archive)
State New
Headers show
Series [RFC] mm/memory-failure: release private data before split THP | expand

Commit Message

Yin Fengwei Aug. 3, 2022, 2:52 a.m. UTC
If there is private data attached to THP, the refcount of
THP will be increased and block the THP split. Which could
further cause the meomry failure not recovered.

Release private data attached to THP before split it to
increase the chance of splitting THP successfully.

The issue was hit during HW error injection testing with
5.18 kernel + xfs as rootfs, test got killed and system
reboot was required to re-run the test.

The issue was tracked down to THP split failure caused the
memory failure not being handled. The page dump showed:

[ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
[ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
[ 1785.452408] memcg:ff4247f2d28e9000
[ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
[ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
[ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000

It was like the error was injected to a large folio for xfs with
private data attached.

With private data released before split THP, the test case
could be run successfully many times without reboot system.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/memory-failure.c | 9 +++++++++
 1 file changed, 9 insertions(+)


base-commit: 9de1f9c8ca5100a02a2e271bdbde36202e251b4b

Comments

HORIGUCHI NAOYA(堀口 直也) Aug. 3, 2022, 9:39 a.m. UTC | #1
On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
> If there is private data attached to THP, the refcount of
> THP will be increased and block the THP split. Which could
> further cause the meomry failure not recovered.
> 
> Release private data attached to THP before split it to
> increase the chance of splitting THP successfully.
> 
> The issue was hit during HW error injection testing with
> 5.18 kernel + xfs as rootfs, test got killed and system
> reboot was required to re-run the test.
> 
> The issue was tracked down to THP split failure caused the
> memory failure not being handled. The page dump showed:
> 
> [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
> [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
> [ 1785.452408] memcg:ff4247f2d28e9000
> [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
> [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
> [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
> [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000
> 
> It was like the error was injected to a large folio for xfs with
> private data attached.
> 
> With private data released before split THP, the test case
> could be run successfully many times without reboot system.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: Aaron Lu <aaron.lu@intel.com>

Thank you for the patch,
It looks reasonable to me so far.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Matthew Wilcox Aug. 3, 2022, 1:01 p.m. UTC | #2
On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index da39ec8afca8..08e21973b120 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
> +	struct page *head = compound_head(page);
 > +
>  	lock_page(page);
> +	/*
> +	 * If thp page has private data attached, thp split will fail.
> +	 * Release private data before split thp.
> +	 */
> +	if (page_has_private(head))
> +		try_to_release_page(head, GFP_KERNEL);
> +
>  	if (unlikely(split_huge_page(page))) {
>  		unsigned long pfn = page_to_pfn(page);

It seems a shame to use the old page approach instead of the
shiny new folio approach.  We're quite close to being able to remove
try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
How about this:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index da39ec8afca8..765b383288b1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p,
 
 static int try_to_split_thp_page(struct page *page, const char *msg)
 {
-	lock_page(page);
+	struct folio *folio = page_folio(page);
+
+	folio_lock(folio);
+	if (folio_test_private(folio))
+		filemap_release_folio(folio, GFP_KERNEL);
 	if (unlikely(split_huge_page(page))) {
 		unsigned long pfn = page_to_pfn(page);
 
-		unlock_page(page);
+		folio_unlock(folio);
 		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
-		put_page(page);
+		folio_put(folio);
 		return -EBUSY;
 	}
-	unlock_page(page);
+	folio = page_folio(page);
+	folio_unlock(folio);
 
 	return 0;
 }
Yin Fengwei Aug. 3, 2022, 1:32 p.m. UTC | #3
On 8/3/2022 9:01 PM, Matthew Wilcox wrote:
> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index da39ec8afca8..08e21973b120 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>  
>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>  {
>> +	struct page *head = compound_head(page);
>  > +
>>  	lock_page(page);
>> +	/*
>> +	 * If thp page has private data attached, thp split will fail.
>> +	 * Release private data before split thp.
>> +	 */
>> +	if (page_has_private(head))
>> +		try_to_release_page(head, GFP_KERNEL);
>> +
>>  	if (unlikely(split_huge_page(page))) {
>>  		unsigned long pfn = page_to_pfn(page);
> 
> It seems a shame to use the old page approach instead of the
> shiny new folio approach.  We're quite close to being able to remove
> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
> How about this:
I am not aware try_to_release_page() was on remove plan. Yes. New API
is good.

> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index da39ec8afca8..765b383288b1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
> -	lock_page(page);
> +	struct folio *folio = page_folio(page);
> +
> +	folio_lock(folio);
> +	if (folio_test_private(folio))
> +		filemap_release_folio(folio, GFP_KERNEL);>  	if (unlikely(split_huge_page(page))) {
>  		unsigned long pfn = page_to_pfn(page);
>  
> -		unlock_page(page);
> +		folio_unlock(folio);
>  		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> -		put_page(page);
> +		folio_put(folio);
>  		return -EBUSY;
>  	}
> -	unlock_page(page);
> +	folio = page_folio(page);
Already got page folio. I suppose don't need above line.
I will re-run the test with the new folio API based patch.


Regards
Yin, Fengwei

> +	folio_unlock(folio);
>  
>  	return 0;
>  }
Matthew Wilcox Aug. 3, 2022, 1:36 p.m. UTC | #4
On Wed, Aug 03, 2022 at 09:32:41PM +0800, Yin, Fengwei wrote:
> On 8/3/2022 9:01 PM, Matthew Wilcox wrote:
> > On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index da39ec8afca8..08e21973b120 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
> >>  
> >>  static int try_to_split_thp_page(struct page *page, const char *msg)
> >>  {
> >> +	struct page *head = compound_head(page);
> >  > +
> >>  	lock_page(page);
> >> +	/*
> >> +	 * If thp page has private data attached, thp split will fail.
> >> +	 * Release private data before split thp.
> >> +	 */
> >> +	if (page_has_private(head))
> >> +		try_to_release_page(head, GFP_KERNEL);
> >> +
> >>  	if (unlikely(split_huge_page(page))) {
> >>  		unsigned long pfn = page_to_pfn(page);
> > 
> > It seems a shame to use the old page approach instead of the
> > shiny new folio approach.  We're quite close to being able to remove
> > try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
> > How about this:
> I am not aware try_to_release_page() was on remove plan. Yes. New API
> is good.

Generally, anything in folio-compat.c is on the remove schedule.
Depending on the callers, that schedule might be a few years away (eg
unlock_page() has around 700 callers).

> >  static int try_to_split_thp_page(struct page *page, const char *msg)
> >  {
> > -	lock_page(page);
> > +	struct folio *folio = page_folio(page);
> > +
> > +	folio_lock(folio);
> > +	if (folio_test_private(folio))
> > +		filemap_release_folio(folio, GFP_KERNEL);
> >  	if (unlikely(split_huge_page(page))) {
> >  		unsigned long pfn = page_to_pfn(page);
> >  
> > -		unlock_page(page);
> > +		folio_unlock(folio);
> >  		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> > -		put_page(page);
> > +		folio_put(folio);
> >  		return -EBUSY;
> >  	}
> > -	unlock_page(page);
> > +	folio = page_folio(page);
> Already got page folio. I suppose don't need above line.

Ah, no.  If the thp split succeeded, we need to get the new folio for
this page.

> I will re-run the test with the new folio API based patch.

Thanks!
Yin Fengwei Aug. 3, 2022, 1:40 p.m. UTC | #5
On 8/3/2022 9:36 PM, Matthew Wilcox wrote:
> Ah, no.  If the thp split succeeded, we need to get the new folio for
> this page.
Yes. You are right. The folio of tail page will be changed. Thanks for
pointing out.


Regards
Yin, Fengwei
Yin Fengwei Aug. 3, 2022, 2:33 p.m. UTC | #6
On 8/3/2022 9:36 PM, Matthew Wilcox wrote:
>> I will re-run the test with the new folio API based patch.
> Thanks!

The new folio API based change passed the test. Should I
re-send the patch with the new folio API with your suggested-by?
And remove "RFC". Thanks.


Regards
Yin, Fengwei
Yin Fengwei Aug. 3, 2022, 2:42 p.m. UTC | #7
On 8/3/2022 5:39 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
>> If there is private data attached to THP, the refcount of
>> THP will be increased and block the THP split. Which could
>> further cause the meomry failure not recovered.
>>
>> Release private data attached to THP before split it to
>> increase the chance of splitting THP successfully.
>>
>> The issue was hit during HW error injection testing with
>> 5.18 kernel + xfs as rootfs, test got killed and system
>> reboot was required to re-run the test.
>>
>> The issue was tracked down to THP split failure caused the
>> memory failure not being handled. The page dump showed:
>>
>> [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
>> [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
>> [ 1785.452408] memcg:ff4247f2d28e9000
>> [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
>> [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
>> [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
>> [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000
>>
>> It was like the error was injected to a large folio for xfs with
>> private data attached.
>>
>> With private data released before split THP, the test case
>> could be run successfully many times without reboot system.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Reviewed-by: Aaron Lu <aaron.lu@intel.com>
> 
> Thank you for the patch,
> It looks reasonable to me so far.
> 
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks a lot for helping review the patch. As Willy suggested to use
new folio API to replace the old try_to_release_page(), there will
be a new patch post soon.


Regards
Yin, Fengwei
Matthew Wilcox Aug. 3, 2022, 3:26 p.m. UTC | #8
On Wed, Aug 03, 2022 at 10:33:49PM +0800, Yin, Fengwei wrote:
> 
> 
> On 8/3/2022 9:36 PM, Matthew Wilcox wrote:
> >> I will re-run the test with the new folio API based patch.
> > Thanks!
> 
> The new folio API based change passed the test. Should I
> re-send the patch with the new folio API with your suggested-by?
> And remove "RFC". Thanks.

Yes please!
Miaohe Lin Aug. 4, 2022, 1:19 a.m. UTC | #9
On 2022/8/3 21:01, Matthew Wilcox wrote:
> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index da39ec8afca8..08e21973b120 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>  
>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>  {
>> +	struct page *head = compound_head(page);
>  > +
>>  	lock_page(page);
>> +	/*
>> +	 * If thp page has private data attached, thp split will fail.
>> +	 * Release private data before split thp.
>> +	 */
>> +	if (page_has_private(head))
>> +		try_to_release_page(head, GFP_KERNEL);
>> +
>>  	if (unlikely(split_huge_page(page))) {
>>  		unsigned long pfn = page_to_pfn(page);
> 
> It seems a shame to use the old page approach instead of the
> shiny new folio approach.  We're quite close to being able to remove
> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
> How about this:
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index da39ec8afca8..765b383288b1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
> -	lock_page(page);
> +	struct folio *folio = page_folio(page);
> +
> +	folio_lock(folio);
> +	if (folio_test_private(folio))
> +		filemap_release_folio(folio, GFP_KERNEL);

If filemap_release_folio fails, we could avoid trying split_huge_page here?

>  	if (unlikely(split_huge_page(page))) {
>  		unsigned long pfn = page_to_pfn(page);
>  
> -		unlock_page(page);
> +		folio_unlock(folio);
>  		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> -		put_page(page);
> +		folio_put(folio);
>  		return -EBUSY;
>  	}
> -	unlock_page(page);
> +	folio = page_folio(page);

Above line (re-fetching folio) might need a comment to avoid future confusing ?

Anyway, this patch looks good to me. Thanks for fixing.

> +	folio_unlock(folio);
>  
>  	return 0;
>  }
> 
> .
>
Yin Fengwei Aug. 4, 2022, 1:54 a.m. UTC | #10
On 2022/8/4 09:19, Miaohe Lin wrote:
> On 2022/8/3 21:01, Matthew Wilcox wrote:
>> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index da39ec8afca8..08e21973b120 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>>  
>>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>>  {
>>> +	struct page *head = compound_head(page);
>>  > +
>>>  	lock_page(page);
>>> +	/*
>>> +	 * If thp page has private data attached, thp split will fail.
>>> +	 * Release private data before split thp.
>>> +	 */
>>> +	if (page_has_private(head))
>>> +		try_to_release_page(head, GFP_KERNEL);
>>> +
>>>  	if (unlikely(split_huge_page(page))) {
>>>  		unsigned long pfn = page_to_pfn(page);
>>
>> It seems a shame to use the old page approach instead of the
>> shiny new folio approach.  We're quite close to being able to remove
>> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
>> How about this:
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index da39ec8afca8..765b383288b1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>  
>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>  {
>> -	lock_page(page);
>> +	struct folio *folio = page_folio(page);
>> +
>> +	folio_lock(folio);
>> +	if (folio_test_private(folio))
>> +		filemap_release_folio(folio, GFP_KERNEL);
> 
> If filemap_release_folio fails, we could avoid trying split_huge_page here?
Maybe we could stick to this regarding this is error recovery path
instead of hot path?

> 
>>  	if (unlikely(split_huge_page(page))) {
>>  		unsigned long pfn = page_to_pfn(page);
>>  
>> -		unlock_page(page);
>> +		folio_unlock(folio);
>>  		pr_info("%s: %#lx: thp split failed\n", msg, pfn);
>> -		put_page(page);
>> +		folio_put(folio);
>>  		return -EBUSY;
>>  	}
>> -	unlock_page(page);
>> +	folio = page_folio(page);
> 
> Above line (re-fetching folio) might need a comment to avoid future confusing ?
I will add one line comment about why page_folio need here.

> 
> Anyway, this patch looks good to me. Thanks for fixing.
Thanks.

Regards
Yin, Fengwei

> 
>> +	folio_unlock(folio);
>>  
>>  	return 0;
>>  }
>>
>> .
>>
>
Miaohe Lin Aug. 4, 2022, 2:05 a.m. UTC | #11
On 2022/8/4 9:54, Yin Fengwei wrote:
> On 2022/8/4 09:19, Miaohe Lin wrote:
>> On 2022/8/3 21:01, Matthew Wilcox wrote:
>>> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote:
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index da39ec8afca8..08e21973b120 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>>>  
>>>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>>>  {
>>>> +	struct page *head = compound_head(page);
>>>  > +
>>>>  	lock_page(page);
>>>> +	/*
>>>> +	 * If thp page has private data attached, thp split will fail.
>>>> +	 * Release private data before split thp.
>>>> +	 */
>>>> +	if (page_has_private(head))
>>>> +		try_to_release_page(head, GFP_KERNEL);
>>>> +
>>>>  	if (unlikely(split_huge_page(page))) {
>>>>  		unsigned long pfn = page_to_pfn(page);
>>>
>>> It seems a shame to use the old page approach instead of the
>>> shiny new folio approach.  We're quite close to being able to remove
>>> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea.
>>> How about this:
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index da39ec8afca8..765b383288b1 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>>  
>>>  static int try_to_split_thp_page(struct page *page, const char *msg)
>>>  {
>>> -	lock_page(page);
>>> +	struct folio *folio = page_folio(page);
>>> +
>>> +	folio_lock(folio);
>>> +	if (folio_test_private(folio))
>>> +		filemap_release_folio(folio, GFP_KERNEL);
>>
>> If filemap_release_folio fails, we could avoid trying split_huge_page here?
> Maybe we could stick to this regarding this is error recovery path
> instead of hot path?

That should be fine. Thanks.
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index da39ec8afca8..08e21973b120 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1484,7 +1484,16 @@  static int identify_page_state(unsigned long pfn, struct page *p,
 
 static int try_to_split_thp_page(struct page *page, const char *msg)
 {
+	struct page *head = compound_head(page);
+
 	lock_page(page);
+	/*
+	 * If thp page has private data attached, thp split will fail.
+	 * Release private data before split thp.
+	 */
+	if (page_has_private(head))
+		try_to_release_page(head, GFP_KERNEL);
+
 	if (unlikely(split_huge_page(page))) {
 		unsigned long pfn = page_to_pfn(page);