Message ID | 20220805062844.439152-1-fengwei.yin@intel.com (mailing list archive) |
---|---|
State | New |
Series | [v2] mm: release private data before split THP |
On Thu, Aug 4, 2022 at 11:29 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
> If there is private data attached to a THP, the refcount of the THP
> is increased, which blocks the THP split. Release the private data
> attached to the THP before splitting it, to increase the chance of
> splitting the THP successfully.
>
> There was a memory failure issue hit during HW error injection
> testing with a 5.18 kernel + xfs as rootfs. The test got killed and
> a system reboot was required to re-run the test.
>
> The issue was tracked down to a THP split failure, which caused the
> memory failure not to be handled. The page dump showed:
>
> [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
> [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
> [ 1785.452408] memcg:ff4247f2d28e9000
> [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
> [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
> [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
> [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000
>
> It looked like the error was injected into a large folio for xfs
> with private data attached.
>
> With the private data released before splitting the THP, the test
> case could be run successfully many times without rebooting the
> system.
>
> Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Reviewed-by: Aaron Lu <aaron.lu@intel.com>
> ---
> Changelog from v1:
> - Move the private data release to split_huge_page_to_list()
>   to cover a wider path, per Yang's comment
> - Update the commit message
>
> Changelog from RFC:
> - Use the new folio API, per Matthew Wilcox's suggestion
> - Add a one-line comment before re-getting the folio of the page,
>   per Miaohe's comment
> - Remove the RFC tag
> - Add Co-developed-by for Qiuxu, who did a lot of the debugging
>   work to locate where the real issue is
>
>  mm/huge_memory.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 15965084816d..edcbc6c2bb3f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2590,6 +2590,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  		goto out;
>  	}
>
> +	if (folio_test_private(folio) &&
> +	    !filemap_release_folio(folio, GFP_KERNEL)) {

GFP_KERNEL is fine for most THP split callsites except for the
memory reclaim path, since it might not allow certain flags to avoid
recursion, for example nested reclaim, issuing I/O, etc. Most
filesystems clear __GFP_FS. However, it should not be a real-life
problem now since AFAIK just xfs supports large folios for now, and
xfs uses the iomap release_folio() method, which actually ignores
gfp flags.

So it sounds safer to follow the gfp convention used by
xas_split_alloc() below. The best way is to pass in the gfp flags
from the reclaimer IMO, but it seems like overkill at the moment.

> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
>  	xas_split_alloc(&xas, head, compound_order(head),
>  			mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
>  	if (xas_error(&xas)) {
>
> base-commit: 31be1d0fbd950395701d9fd47d8fb1f99c996f61
> --
> 2.25.1
>
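For reference, following the xas_split_alloc() gfp convention that the
review suggests would amount to something like the sketch below. This
is an illustration only, reusing the folio/mapping variables from the
surrounding split_huge_page_to_list() code, not the final patch:

	/*
	 * Sketch of the reviewer's suggestion: derive the gfp mask
	 * from the mapping, as xas_split_alloc() already does,
	 * instead of hard-coding GFP_KERNEL.
	 */
	gfp_t gfp = mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK;

	if (folio_test_private(folio) &&
	    !filemap_release_folio(folio, gfp)) {
		ret = -EBUSY;
		goto out;
	}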
Hi Yang,

On 2022/8/9 01:49, Yang Shi wrote:
> GFP_KERNEL is fine for most THP split callsites except for the
> memory reclaim path, since it might not allow certain flags to avoid
> recursion, for example nested reclaim, issuing I/O, etc. Most
> filesystems clear __GFP_FS. However, it should not be a real-life
> problem now since AFAIK just xfs supports large folios for now, and
> xfs uses the iomap release_folio() method, which actually ignores
> gfp flags.
Thanks a lot for the valuable comments.

>
> So it sounds safer to follow the gfp convention used by
> xas_split_alloc() below. The best way is to pass in the gfp flags
> from the reclaimer IMO, but it seems like overkill at the moment.

It's possible that the gfp used by xas_split_alloc() has __GFP_FS/IO
set. What about using current_gfp_context(gfp_as_xas_split_alloc)?

Regards
Yin, Fengwei
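For context, current_gfp_context() (defined in
include/linux/sched/mm.h) narrows a gfp mask according to the calling
task's memalloc scope, which is why it addresses the __GFP_FS/__GFP_IO
concern: a caller running inside a memalloc_nofs/memalloc_noio section
gets those bits stripped automatically. A simplified sketch of the
helper as it looked around this time (the in-tree version also handles
other PF_MEMALLOC_* flags):

	static inline gfp_t current_gfp_context(gfp_t flags)
	{
		unsigned int pflags = READ_ONCE(current->flags);

		if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) {
			/* NOIO is stricter: strip both I/O and FS reentry. */
			if (pflags & PF_MEMALLOC_NOIO)
				flags &= ~(__GFP_IO | __GFP_FS);
			else if (pflags & PF_MEMALLOC_NOFS)
				flags &= ~__GFP_FS;
		}
		return flags;
	}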
On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
> Hi Yang,
>
> On 2022/8/9 01:49, Yang Shi wrote:
> > GFP_KERNEL is fine for most THP split callsites except for the
> > memory reclaim path, since it might not allow certain flags to avoid
> > recursion, for example nested reclaim, issuing I/O, etc. Most
> > filesystems clear __GFP_FS. However, it should not be a real-life
> > problem now since AFAIK just xfs supports large folios for now, and
> > xfs uses the iomap release_folio() method, which actually ignores
> > gfp flags.
> Thanks a lot for the valuable comments.
>
> >
> > So it sounds safer to follow the gfp convention used by
> > xas_split_alloc() below. The best way is to pass in the gfp flags
> > from the reclaimer IMO, but it seems like overkill at the moment.
>
> It's possible that the gfp used by xas_split_alloc() has __GFP_FS/IO
> set. What about using current_gfp_context(gfp_as_xas_split_alloc)?
>

Sounds reasonable to me.

Also, the gfp used by xas_split_alloc() should be modified to
current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK)
too, since they are in the same context.
On Tue, Aug 9, 2022 at 2:08 AM Aaron Lu <aaron.lu@intel.com> wrote:
>
> On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
> > Hi Yang,
> >
> > On 2022/8/9 01:49, Yang Shi wrote:
> > > GFP_KERNEL is fine for most THP split callsites except for the
> > > memory reclaim path, since it might not allow certain flags to avoid
> > > recursion, for example nested reclaim, issuing I/O, etc. Most
> > > filesystems clear __GFP_FS. However, it should not be a real-life
> > > problem now since AFAIK just xfs supports large folios for now, and
> > > xfs uses the iomap release_folio() method, which actually ignores
> > > gfp flags.
> > Thanks a lot for the valuable comments.
> >
> > >
> > > So it sounds safer to follow the gfp convention used by
> > > xas_split_alloc() below. The best way is to pass in the gfp flags
> > > from the reclaimer IMO, but it seems like overkill at the moment.
> >
> > It's possible that the gfp used by xas_split_alloc() has __GFP_FS/IO
> > set. What about using current_gfp_context(gfp_as_xas_split_alloc)?
> >
>
> Sounds reasonable to me.
>
> Also, the gfp used by xas_split_alloc() should be modified to
> current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK)
> too, since they are in the same context.

Good point, fine to me.
On 2022/8/10 00:45, Yang Shi wrote:
> On Tue, Aug 9, 2022 at 2:08 AM Aaron Lu <aaron.lu@intel.com> wrote:
>>
>> On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
>>> Hi Yang,
>>>
>>> On 2022/8/9 01:49, Yang Shi wrote:
>>>> GFP_KERNEL is fine for most THP split callsites except for the
>>>> memory reclaim path, since it might not allow certain flags to avoid
>>>> recursion, for example nested reclaim, issuing I/O, etc. Most
>>>> filesystems clear __GFP_FS. However, it should not be a real-life
>>>> problem now since AFAIK just xfs supports large folios for now, and
>>>> xfs uses the iomap release_folio() method, which actually ignores
>>>> gfp flags.
>>> Thanks a lot for the valuable comments.
>>>
>>>>
>>>> So it sounds safer to follow the gfp convention used by
>>>> xas_split_alloc() below. The best way is to pass in the gfp flags
>>>> from the reclaimer IMO, but it seems like overkill at the moment.
>>>
>>> It's possible that the gfp used by xas_split_alloc() has __GFP_FS/IO
>>> set. What about using current_gfp_context(gfp_as_xas_split_alloc)?
>>>
>>
>> Sounds reasonable to me.
>>
>> Also, the gfp used by xas_split_alloc() should be modified to
>> current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK)
>> too, since they are in the same context.
>
> Good point, fine to me.
Thanks to both of you for the comments. I will update the patch.

Regards
Yin, Fengwei
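Putting the thread's conclusion together, the updated hunk would
presumably look something like the following sketch. The local
variable gfp is illustrative, and the actual follow-up patch may
differ in detail:

	gfp_t gfp = current_gfp_context(mapping_gfp_mask(mapping) &
					GFP_RECLAIM_MASK);

	/*
	 * Drop any private data so its extra reference does not
	 * block the split; honor the caller's memalloc scope.
	 */
	if (folio_test_private(folio) &&
	    !filemap_release_folio(folio, gfp)) {
		ret = -EBUSY;
		goto out;
	}

	xas_split_alloc(&xas, head, compound_order(head), gfp);
	if (xas_error(&xas)) {
		ret = xas_error(&xas);
		goto out;
	}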
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 15965084816d..edcbc6c2bb3f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2590,6 +2590,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		goto out;
 	}
 
+	if (folio_test_private(folio) &&
+	    !filemap_release_folio(folio, GFP_KERNEL)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	xas_split_alloc(&xas, head, compound_order(head),
 			mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
 	if (xas_error(&xas)) {