Message ID | 20220803025243.155798-1-fengwei.yin@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] mm/memory-failure: release private data before split THP | expand |
On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: > If there is private data attached to THP, the refcount of > THP will be increased and block the THP split. Which could > further cause the meomry failure not recovered. > > Release private data attached to THP before split it to > increase the chance of splitting THP successfully. > > The issue was hit during HW error injection testing with > 5.18 kernel + xfs as rootfs, test got killed and system > reboot was required to re-run the test. > > The issue was tracked down to THP split failure caused the > memory failure not being handled. The page dump showed: > > [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200 > [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0 > [ 1785.452408] memcg:ff4247f2d28e9000 > [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx" > [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2) > [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8 > [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000 > > It was like the error was injected to a large folio for xfs with > private data attached. > > With private data released before split THP, the test case > could be run successfully many times without reboot system. > > Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> > Reviewed-by: Aaron Lu <aaron.lu@intel.com> Thank you for the patch, It looks reasonable to me so far. Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index da39ec8afca8..08e21973b120 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, > > static int try_to_split_thp_page(struct page *page, const char *msg) > { > + struct page *head = compound_head(page); > + > lock_page(page); > + /* > + * If thp page has private data attached, thp split will fail. > + * Release private data before split thp. > + */ > + if (page_has_private(head)) > + try_to_release_page(head, GFP_KERNEL); > + > if (unlikely(split_huge_page(page))) { > unsigned long pfn = page_to_pfn(page); It seems a shame to use the old page approach instead of the shiny new folio approach. We're quite close to being able to remove try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. How about this: diff --git a/mm/memory-failure.c b/mm/memory-failure.c index da39ec8afca8..765b383288b1 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p, static int try_to_split_thp_page(struct page *page, const char *msg) { - lock_page(page); + struct folio *folio = page_folio(page); + + folio_lock(folio); + if (folio_test_private(folio)) + filemap_release_folio(folio, GFP_KERNEL); if (unlikely(split_huge_page(page))) { unsigned long pfn = page_to_pfn(page); - unlock_page(page); + folio_unlock(folio); pr_info("%s: %#lx: thp split failed\n", msg, pfn); - put_page(page); + folio_put(folio); return -EBUSY; } - unlock_page(page); + folio = page_folio(page); + folio_unlock(folio); return 0; }
On 8/3/2022 9:01 PM, Matthew Wilcox wrote: > On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index da39ec8afca8..08e21973b120 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, >> >> static int try_to_split_thp_page(struct page *page, const char *msg) >> { >> + struct page *head = compound_head(page); > > + >> lock_page(page); >> + /* >> + * If thp page has private data attached, thp split will fail. >> + * Release private data before split thp. >> + */ >> + if (page_has_private(head)) >> + try_to_release_page(head, GFP_KERNEL); >> + >> if (unlikely(split_huge_page(page))) { >> unsigned long pfn = page_to_pfn(page); > > It seems a shame to use the old page approach instead of the > shiny new folio approach. We're quite close to being able to remove > try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. > How about this: I am not aware try_to_release_page() was on remove plan. Yes. New API is good. > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index da39ec8afca8..765b383288b1 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p, > > static int try_to_split_thp_page(struct page *page, const char *msg) > { > - lock_page(page); > + struct folio *folio = page_folio(page); > + > + folio_lock(folio); > + if (folio_test_private(folio)) > + filemap_release_folio(folio, GFP_KERNEL);> if (unlikely(split_huge_page(page))) { > unsigned long pfn = page_to_pfn(page); > > - unlock_page(page); > + folio_unlock(folio); > pr_info("%s: %#lx: thp split failed\n", msg, pfn); > - put_page(page); > + folio_put(folio); > return -EBUSY; > } > - unlock_page(page); > + folio = page_folio(page); Already got page folio. I suppose don't need above line. I will re-run the test with the new folio API based patch. Regards Yin, Fengwei > + folio_unlock(folio); > > return 0; > }
On Wed, Aug 03, 2022 at 09:32:41PM +0800, Yin, Fengwei wrote: > On 8/3/2022 9:01 PM, Matthew Wilcox wrote: > > On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index da39ec8afca8..08e21973b120 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, > >> > >> static int try_to_split_thp_page(struct page *page, const char *msg) > >> { > >> + struct page *head = compound_head(page); > > > + > >> lock_page(page); > >> + /* > >> + * If thp page has private data attached, thp split will fail. > >> + * Release private data before split thp. > >> + */ > >> + if (page_has_private(head)) > >> + try_to_release_page(head, GFP_KERNEL); > >> + > >> if (unlikely(split_huge_page(page))) { > >> unsigned long pfn = page_to_pfn(page); > > > > It seems a shame to use the old page approach instead of the > > shiny new folio approach. We're quite close to being able to remove > > try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. > > How about this: > I am not aware try_to_release_page() was on remove plan. Yes. New API > is good. Generally, anything in folio-compat.c is on the remove schedule. Depending on the callers, that schedule might be a few years away (eg unlock_page() has around 700 callers). > > static int try_to_split_thp_page(struct page *page, const char *msg) > > { > > - lock_page(page); > > + struct folio *folio = page_folio(page); > > + > > + folio_lock(folio); > > + if (folio_test_private(folio)) > > + filemap_release_folio(folio, GFP_KERNEL); > > if (unlikely(split_huge_page(page))) { > > unsigned long pfn = page_to_pfn(page); > > > > - unlock_page(page); > > + folio_unlock(folio); > > pr_info("%s: %#lx: thp split failed\n", msg, pfn); > > - put_page(page); > > + folio_put(folio); > > return -EBUSY; > > } > > - unlock_page(page); > > + folio = page_folio(page); > Already got page folio. I suppose don't need above line. Ah, no. If the thp split succeeded, we need to get the new folio for this page. > I will re-run the test with the new folio API based patch. Thanks!
On 8/3/2022 9:36 PM, Matthew Wilcox wrote: > Ah, no. If the thp split succeeded, we need to get the new folio for > this page. Yes. You are right. The folio of tail page will be changed. Thanks for pointing out. Regards Yin, Fengwei
On 8/3/2022 9:36 PM, Matthew Wilcox wrote: >> I will re-run the test with the new folio API based patch. > Thanks! The new folio API based change passed the test. Should I re-send the patch with the new folio API with your suggested-by? And remove "RFC". Thanks. Regards Yin, Fengwei
On 8/3/2022 5:39 PM, HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: >> If there is private data attached to THP, the refcount of >> THP will be increased and block the THP split. Which could >> further cause the meomry failure not recovered. >> >> Release private data attached to THP before split it to >> increase the chance of splitting THP successfully. >> >> The issue was hit during HW error injection testing with >> 5.18 kernel + xfs as rootfs, test got killed and system >> reboot was required to re-run the test. >> >> The issue was tracked down to THP split failure caused the >> memory failure not being handled. The page dump showed: >> >> [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200 >> [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0 >> [ 1785.452408] memcg:ff4247f2d28e9000 >> [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx" >> [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2) >> [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8 >> [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000 >> >> It was like the error was injected to a large folio for xfs with >> private data attached. >> >> With private data released before split THP, the test case >> could be run successfully many times without reboot system. >> >> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> >> Reviewed-by: Aaron Lu <aaron.lu@intel.com> > > Thank you for the patch, > It looks reasonable to me so far. > > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Thanks a lot for helping review the patch. As Willy suggested to use new folio API to replace the old try_to_release_page(), there will be a new patch post soon. Regards Yin, Fengwei
On Wed, Aug 03, 2022 at 10:33:49PM +0800, Yin, Fengwei wrote: > > > On 8/3/2022 9:36 PM, Matthew Wilcox wrote: > >> I will re-run the test with the new folio API based patch. > > Thanks! > > The new folio API based change passed the test. Should I > re-send the patch with the new folio API with your suggested-by? > And remove "RFC". Thanks. Yes please!
On 2022/8/3 21:01, Matthew Wilcox wrote: > On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index da39ec8afca8..08e21973b120 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, >> >> static int try_to_split_thp_page(struct page *page, const char *msg) >> { >> + struct page *head = compound_head(page); > > + >> lock_page(page); >> + /* >> + * If thp page has private data attached, thp split will fail. >> + * Release private data before split thp. >> + */ >> + if (page_has_private(head)) >> + try_to_release_page(head, GFP_KERNEL); >> + >> if (unlikely(split_huge_page(page))) { >> unsigned long pfn = page_to_pfn(page); > > It seems a shame to use the old page approach instead of the > shiny new folio approach. We're quite close to being able to remove > try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. > How about this: > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index da39ec8afca8..765b383288b1 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p, > > static int try_to_split_thp_page(struct page *page, const char *msg) > { > - lock_page(page); > + struct folio *folio = page_folio(page); > + > + folio_lock(folio); > + if (folio_test_private(folio)) > + filemap_release_folio(folio, GFP_KERNEL); If filemap_release_folio fails, we could avoid trying split_huge_page here? > if (unlikely(split_huge_page(page))) { > unsigned long pfn = page_to_pfn(page); > > - unlock_page(page); > + folio_unlock(folio); > pr_info("%s: %#lx: thp split failed\n", msg, pfn); > - put_page(page); > + folio_put(folio); > return -EBUSY; > } > - unlock_page(page); > + folio = page_folio(page); Above line (re-fetching folio) might need a comment to avoid future confusing ? Anyway, this patch looks good to me. Thanks for fixing. > + folio_unlock(folio); > > return 0; > } > > . >
On 2022/8/4 09:19, Miaohe Lin wrote: > On 2022/8/3 21:01, Matthew Wilcox wrote: >> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>> index da39ec8afca8..08e21973b120 100644 >>> --- a/mm/memory-failure.c >>> +++ b/mm/memory-failure.c >>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, >>> >>> static int try_to_split_thp_page(struct page *page, const char *msg) >>> { >>> + struct page *head = compound_head(page); >> > + >>> lock_page(page); >>> + /* >>> + * If thp page has private data attached, thp split will fail. >>> + * Release private data before split thp. >>> + */ >>> + if (page_has_private(head)) >>> + try_to_release_page(head, GFP_KERNEL); >>> + >>> if (unlikely(split_huge_page(page))) { >>> unsigned long pfn = page_to_pfn(page); >> >> It seems a shame to use the old page approach instead of the >> shiny new folio approach. We're quite close to being able to remove >> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. >> How about this: >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index da39ec8afca8..765b383288b1 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p, >> >> static int try_to_split_thp_page(struct page *page, const char *msg) >> { >> - lock_page(page); >> + struct folio *folio = page_folio(page); >> + >> + folio_lock(folio); >> + if (folio_test_private(folio)) >> + filemap_release_folio(folio, GFP_KERNEL); > > If filemap_release_folio fails, we could avoid trying split_huge_page here? Maybe we could stick to this regarding this is error recovery path instead of hot path? > >> if (unlikely(split_huge_page(page))) { >> unsigned long pfn = page_to_pfn(page); >> >> - unlock_page(page); >> + folio_unlock(folio); >> pr_info("%s: %#lx: thp split failed\n", msg, pfn); >> - put_page(page); >> + folio_put(folio); >> return -EBUSY; >> } >> - unlock_page(page); >> + folio = page_folio(page); > > Above line (re-fetching folio) might need a comment to avoid future confusing ? I will add one line comment about why page_folio need here. > > Anyway, this patch looks good to me. Thanks for fixing. Thanks. Regards Yin, Fengwei > >> + folio_unlock(folio); >> >> return 0; >> } >> >> . >> >
On 2022/8/4 9:54, Yin Fengwei wrote: > On 2022/8/4 09:19, Miaohe Lin wrote: >> On 2022/8/3 21:01, Matthew Wilcox wrote: >>> On Wed, Aug 03, 2022 at 10:52:43AM +0800, Yin Fengwei wrote: >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>> index da39ec8afca8..08e21973b120 100644 >>>> --- a/mm/memory-failure.c >>>> +++ b/mm/memory-failure.c >>>> @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, >>>> >>>> static int try_to_split_thp_page(struct page *page, const char *msg) >>>> { >>>> + struct page *head = compound_head(page); >>> > + >>>> lock_page(page); >>>> + /* >>>> + * If thp page has private data attached, thp split will fail. >>>> + * Release private data before split thp. >>>> + */ >>>> + if (page_has_private(head)) >>>> + try_to_release_page(head, GFP_KERNEL); >>>> + >>>> if (unlikely(split_huge_page(page))) { >>>> unsigned long pfn = page_to_pfn(page); >>> >>> It seems a shame to use the old page approach instead of the >>> shiny new folio approach. We're quite close to being able to remove >>> try_to_release_page() in 6.1 or 6.2 so adding a new caller is a bad idea. >>> How about this: >>> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>> index da39ec8afca8..765b383288b1 100644 >>> --- a/mm/memory-failure.c >>> +++ b/mm/memory-failure.c >>> @@ -1484,16 +1484,21 @@ static int identify_page_state(unsigned long pfn, struct page *p, >>> >>> static int try_to_split_thp_page(struct page *page, const char *msg) >>> { >>> - lock_page(page); >>> + struct folio *folio = page_folio(page); >>> + >>> + folio_lock(folio); >>> + if (folio_test_private(folio)) >>> + filemap_release_folio(folio, GFP_KERNEL); >> >> If filemap_release_folio fails, we could avoid trying split_huge_page here? > Maybe we could stick to this regarding this is error recovery path > instead of hot path? That should be fine. Thanks.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index da39ec8afca8..08e21973b120 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1484,7 +1484,16 @@ static int identify_page_state(unsigned long pfn, struct page *p, static int try_to_split_thp_page(struct page *page, const char *msg) { + struct page *head = compound_head(page); + lock_page(page); + /* + * If thp page has private data attached, thp split will fail. + * Release private data before split thp. + */ + if (page_has_private(head)) + try_to_release_page(head, GFP_KERNEL); + if (unlikely(split_huge_page(page))) { unsigned long pfn = page_to_pfn(page);