Message ID | 20240426112938.124740-1-wangkefeng.wang@huawei.com (mailing list archive) |
---|---|
State | New |
Series | mm: use memalloc_nofs_save() in page_cache_ra_order() |
On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead
> path"), ensure that page_cache_ra_order() do not attempt to reclaim
> file-backed pages too, or it leads to a deadlock, found issue when
> test ext4 large folio.
>
>  INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
>  Call trace:
>   __switch_to+0x14c/0x240
>   __schedule+0x82c/0xdd0
>   schedule+0x58/0xf0
>   io_schedule+0x24/0xa0
>   __folio_lock+0x130/0x300
>   migrate_pages_batch+0x378/0x918
>   migrate_pages+0x350/0x700
>   compact_zone+0x63c/0xb38
>   compact_zone_order+0xc0/0x118
>   try_to_compact_pages+0xb0/0x280
>   __alloc_pages_direct_compact+0x98/0x248
>   __alloc_pages+0x510/0x1110
>   alloc_pages+0x9c/0x130
>   folio_alloc+0x20/0x78
>   filemap_alloc_folio+0x8c/0x1b0
>   page_cache_ra_order+0x174/0x308
>   ondemand_readahead+0x1c8/0x2b8
>   page_cache_async_ra+0x68/0xb8
>   filemap_readahead.isra.0+0x64/0xa8
>   filemap_get_pages+0x3fc/0x5b0
>   filemap_splice_read+0xf4/0x280
>   ext4_file_splice_read+0x2c/0x48 [ext4]
>   vfs_splice_read.part.0+0xa8/0x118
>   splice_direct_to_actor+0xbc/0x288
>   do_splice_direct+0x9c/0x108
>   do_sendfile+0x328/0x468
>   __arm64_sys_sendfile64+0x8c/0x148
>   invoke_syscall+0x4c/0x118
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x4c/0x1f8
>   el0t_64_sync_handler+0xc0/0xc8
>   el0t_64_sync+0x188/0x190
>
> Cc: zhangyi (F) <yi.zhang@huawei.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

I'm thinking

Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Cc: stable

> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -494,6 +494,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  	pgoff_t index = readahead_index(ractl);
>  	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
>  	pgoff_t mark = index + ra->size - ra->async_size;
> +	unsigned int nofs;
>  	int err = 0;
>  	gfp_t gfp = readahead_gfp_mask(mapping);
>
> @@ -508,6 +509,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
>  	}
>
> +	/* See comment in page_cache_ra_unbounded() */
> +	nofs = memalloc_nofs_save();
>  	filemap_invalidate_lock_shared(mapping);
>  	while (index <= limit) {
>  		unsigned int order = new_order;
> @@ -531,6 +534,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>
>  	read_pages(ractl);
>  	filemap_invalidate_unlock_shared(mapping);
> +	memalloc_nofs_restore(nofs);
>
>  	/*
>  	 * If there were already pages in the page cache, then we may have
> --
> 2.41.0
On Fri, Apr 26, 2024 at 11:49:05AM -0700, Andrew Morton wrote:
> On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >   io_schedule+0x24/0xa0
> >   __folio_lock+0x130/0x300
> >   migrate_pages_batch+0x378/0x918
> >   migrate_pages+0x350/0x700
> >   compact_zone+0x63c/0xb38
> >   compact_zone_order+0xc0/0x118
> >   try_to_compact_pages+0xb0/0x280
> >   __alloc_pages_direct_compact+0x98/0x248
> >   __alloc_pages+0x510/0x1110
> >   alloc_pages+0x9c/0x130
> >   folio_alloc+0x20/0x78
> >   filemap_alloc_folio+0x8c/0x1b0
> >   page_cache_ra_order+0x174/0x308
> >   ondemand_readahead+0x1c8/0x2b8
>
> I'm thinking
>
> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> Cc: stable

I think it goes back earlier than that.
https://lore.kernel.org/linux-mm/20200128060304.GA6615@bombadil.infradead.org/
details how it can happen with the old readpages code. It's just easier
to hit now.
On 2024/4/27 11:45, Matthew Wilcox wrote:
> On Fri, Apr 26, 2024 at 11:49:05AM -0700, Andrew Morton wrote:
>> On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>   io_schedule+0x24/0xa0
>>>   __folio_lock+0x130/0x300
>>>   migrate_pages_batch+0x378/0x918
>>>   migrate_pages+0x350/0x700
>>>   compact_zone+0x63c/0xb38
>>>   compact_zone_order+0xc0/0x118
>>>   try_to_compact_pages+0xb0/0x280
>>>   __alloc_pages_direct_compact+0x98/0x248
>>>   __alloc_pages+0x510/0x1110
>>>   alloc_pages+0x9c/0x130
>>>   folio_alloc+0x20/0x78
>>>   filemap_alloc_folio+0x8c/0x1b0
>>>   page_cache_ra_order+0x174/0x308
>>>   ondemand_readahead+0x1c8/0x2b8
>>
>> I'm thinking
>>
>> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
>> Cc: stable
>
> I think it goes back earlier than that.
> https://lore.kernel.org/linux-mm/20200128060304.GA6615@bombadil.infradead.org/
> details how it can happen with the old readpages code. It's just easier
> to hit now.
>

page_cache_ra_order() was introduced by 793917d997df, but the previous
bugfix f2c817bed58d ("mm: use memalloc_nofs_save in readahead path")
did not Cc stable, so should the previous patch be posted to stable
as well?
diff --git a/mm/readahead.c b/mm/readahead.c
index 63d6000103f0..c1b23989d9ca 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -494,6 +494,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	pgoff_t index = readahead_index(ractl);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
+	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
 
@@ -508,6 +509,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 	}
 
+	/* See comment in page_cache_ra_unbounded() */
+	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
@@ -531,6 +534,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 	read_pages(ractl);
 	filemap_invalidate_unlock_shared(mapping);
+	memalloc_nofs_restore(nofs);
 
 	/*
 	 * If there were already pages in the page cache, then we may have
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead
path"). Ensure that page_cache_ra_order() does not attempt to reclaim
file-backed pages either, or it leads to a deadlock. The issue was found
when testing ext4 large folios.

 INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
 Call trace:
  __switch_to+0x14c/0x240
  __schedule+0x82c/0xdd0
  schedule+0x58/0xf0
  io_schedule+0x24/0xa0
  __folio_lock+0x130/0x300
  migrate_pages_batch+0x378/0x918
  migrate_pages+0x350/0x700
  compact_zone+0x63c/0xb38
  compact_zone_order+0xc0/0x118
  try_to_compact_pages+0xb0/0x280
  __alloc_pages_direct_compact+0x98/0x248
  __alloc_pages+0x510/0x1110
  alloc_pages+0x9c/0x130
  folio_alloc+0x20/0x78
  filemap_alloc_folio+0x8c/0x1b0
  page_cache_ra_order+0x174/0x308
  ondemand_readahead+0x1c8/0x2b8
  page_cache_async_ra+0x68/0xb8
  filemap_readahead.isra.0+0x64/0xa8
  filemap_get_pages+0x3fc/0x5b0
  filemap_splice_read+0xf4/0x280
  ext4_file_splice_read+0x2c/0x48 [ext4]
  vfs_splice_read.part.0+0xa8/0x118
  splice_direct_to_actor+0xbc/0x288
  do_splice_direct+0x9c/0x108
  do_sendfile+0x328/0x468
  __arm64_sys_sendfile64+0x8c/0x148
  invoke_syscall+0x4c/0x118
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x4c/0x1f8
  el0t_64_sync_handler+0xc0/0xc8
  el0t_64_sync+0x188/0x190

Cc: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/readahead.c | 4 ++++
 1 file changed, 4 insertions(+)
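
For readers unfamiliar with the scoped-NOFS API, the memalloc_nofs_save() / memalloc_nofs_restore() pairing that this patch adds can be modeled in plain userspace C. The following is only a sketch under stated assumptions: every model_ / MODEL_ name and bit value is hypothetical and chosen for illustration, the single static variable stands in for the per-task flags, and nothing here is the kernel's actual implementation. It shows that an allocation mask evaluated between save and restore loses its FS bit, and that scopes nest because restore puts back the state returned by the matching save.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel flag bits; the values are illustrative only. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_NOFS    0x1u

/* Models the flags word of a single task (assumption: one task, no threads). */
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state of the NOFS bit. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the state returned by the matching save. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Models how an allocation mask is narrowed while the task is in a NOFS scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Shaped like the patched function: the mask is evaluated between save and
 * restore, so it never carries the FS bit, whatever the caller passed in.
 * Returns the mask the modeled allocation would have used.
 */
static unsigned int model_ra_order(unsigned int gfp)
{
	unsigned int nofs = model_nofs_save();
	unsigned int seen;

	assert(model_task_flags & MODEL_PF_NOFS);
	seen = model_effective_gfp(gfp);
	model_nofs_restore(nofs);
	return seen;
}
```

In this model, model_ra_order(MODEL_GFP_KERNEL) yields MODEL_GFP_IO, and a caller that was already inside its own NOFS scope stays in that scope after the call returns.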