From patchwork Wed Feb 15 12:56:56 2017
Subject: Re: [Bug 192981] New: page allocation stalls
From: Alexander Polakov
Organization: Beget
Reply-To: apolyakov@beget.ru
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-xfs@vger.kernel.org, bugzilla-daemon@bugzilla.kernel.org
Date: Wed, 15 Feb 2017 15:56:56 +0300
Message-ID: <8f450abd-4e05-92d3-2533-72b05fea2012@beget.ru>
In-Reply-To: <20170123135111.13ac3e47110de10a4bd503ef@linux-foundation.org>
List-ID: linux-xfs@vger.kernel.org

On 01/24/2017 12:51 AM, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> A 2100 second page allocation stall!
>

I think we have finally figured out the problem, using Tetsuo Handa's mallocwd patch: it appears to be in the XFS direct reclaim path.
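For context, here is (roughly) the path from the MM shrinker into XFS that all three traces below go through, as I read this era's fs/xfs/xfs_super.c and fs/xfs/xfs_icache.c. This is a sketch, not verbatim source; the second function is the one my hack at the end of this mail changes:

static long
xfs_fs_free_cached_objects(
        struct super_block      *sb,
        struct shrink_control   *sc)
{
        /* called from super_cache_scan() during shrink_slab() */
        return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
}

int
xfs_reclaim_inodes_nr(
        struct xfs_mount        *mp,
        int                     nr_to_scan)
{
        /* kick background reclaimer and push the AIL */
        xfs_reclaim_work_queue(mp);
        xfs_ail_push_all(mp->m_ail);

        /* SYNC_WAIT makes this a synchronous, blocking scan */
        return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
}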
Here's how it goes: memory is low, rsync goes into direct reclaim, taking the XFS per-AG reclaim mutex via xfs_reclaim_inodes_nr():

[24646.497290] MemAlloc: rsync(19706) flags=0x404840 switches=8692 seq=340 gfp=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK) order=0 delay=6795
[24646.497550] rsync           R     0 19706   5000 0x00000000
[24646.497690] ffffa4361dc36c00 0000000c1dc36c00 ffffa4361dc36c00 ffffa44347792580
[24646.497951] 0000000000000000 ffffb1cab6343458 ffffffffb383e799 0000000000000000
[24646.498208] ffffa443ffff7a00 0000000000000001 ffffb1cab6343448 0000000000000002
[24646.498462] Call Trace:
[24646.498587] [] ? __schedule+0x179/0x5c8
[24646.498718] [] ? schedule+0x32/0x80
[24646.498846] [] ? schedule_timeout+0x159/0x2a0
[24646.498977] [] ? add_timer_on+0x130/0x130
[24646.499108] [] ? __alloc_pages_nodemask+0xe73/0x16b0
[24646.499240] [] ? alloc_pages_current+0x9a/0x120
[24646.499371] [] ? xfs_buf_allocate_memory+0x171/0x2c0
[24646.499503] [] ? xfs_buf_get_map+0x18b/0x1d0
[24646.499634] [] ? xfs_buf_read_map+0x3b/0x160
[24646.499765] [] ? xfs_trans_read_buf_map+0x1f0/0x490
[24646.499897] [] ? xfs_imap_to_bp+0x79/0x120
[24646.500029] [] ? xfs_iflush+0x118/0x380
[24646.500158] [] ? wake_atomic_t_function+0x40/0x40
[24646.500289] [] ? xfs_reclaim_inode+0x274/0x3f0
[24646.500421] [] ? xfs_reclaim_inodes_ag+0x1b9/0x2c0
[24646.500553] [] ? radix_tree_next_chunk+0x108/0x2a0
[24646.500685] [] ? lock_timer_base+0x51/0x70
[24646.500814] [] ? radix_tree_gang_lookup_tag+0xae/0x180
[24646.500946] [] ? xfs_perag_get_tag+0x48/0x100
[24646.501079] [] ? __list_lru_walk_one.isra.7+0x31/0x120
[24646.501212] [] ? iget5_locked+0x240/0x240
[24646.501342] [] ? xfs_reclaim_inodes_nr+0x31/0x40
[24646.501472] [] ? super_cache_scan+0x1a0/0x1b0
[24646.501603] [] ? shrink_slab+0x262/0x440
[24646.501734] [] ? drop_slab_node+0x2b/0x60
[24646.501864] [] ? drop_slab+0x42/0x70
[24646.501994] [] ? out_of_memory+0x220/0x560
[24646.502122] [] ? __alloc_pages_nodemask+0x1312/0x16b0
[24646.502255] [] ? alloc_pages_current+0x9a/0x120
[24646.502386] [] ? pte_alloc_one+0x13/0x40
[24646.502517] [] ? handle_mm_fault+0xc7f/0x14b0
[24646.502648] [] ? __do_page_fault+0x1cf/0x5a0
[24646.502777] [] ? page_fault+0x22/0x30
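What this trace shows: direct reclaim went through the shrinker into xfs_reclaim_inode(), which flushes the inode via xfs_iflush(), and flushing needs the inode cluster buffer read in, which in turn allocates memory. Simplified shape of xfs_reclaim_inode() as I read it in this kernel (a sketch, error handling trimmed):

        xfs_ilock(ip, XFS_ILOCK_EXCL);
        if (!xfs_iflock_nowait(ip)) {
                if (!(sync_mode & SYNC_WAIT))
                        goto out;
                /* SYNC_WAIT: block until we get the flush lock */
                xfs_iflock(ip);
        }
        ...
        error = xfs_iflush(ip, &bp);    /* -> xfs_imap_to_bp()
                                         * -> xfs_trans_read_buf_map()
                                         * -> xfs_buf_get_map()
                                         * -> xfs_buf_allocate_memory() */

So the thread that fell into direct reclaim must now allocate even more pages to make progress, while holding both the per-AG reclaim mutex and the inode's XFS_ILOCK_EXCL.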
But that nested allocation cannot be satisfied either, because memory is low, so the thread stays blocked, still holding the mutex. Other processes enter the same path, but they can't get past the mutex in xfs_reclaim_inodes_nr():

[24646.502909] MemAlloc: rsync(19707) flags=0x404840 switches=6344 seq=638 gfp=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) order=0 delay=3257 uninterruptible
[24646.503167] rsync           D     0 19707   5000 0x00000000
[24646.503299] 0000000000000000 ffffa44347793840 ffffa4277a36ec00 ffffa4361dc35100
[24646.503554] ffffa4437a514300 ffffb1cab0823638 ffffffffb383e799 ffffa44365967b00
[24646.503809] ffffb1cab0823680 0000000000000001 ffffa443424062a8 0000000000000000
[24646.504061] Call Trace:
[24646.504189] [] ? __schedule+0x179/0x5c8
[24646.504321] [] ? schedule+0x32/0x80
[24646.504451] [] ? schedule_preempt_disabled+0xe/0x20
[24646.504582] [] ? __mutex_lock_slowpath+0x8a/0x100
[24646.504715] [] ? mutex_lock+0x13/0x22
[24646.504846] [] ? xfs_reclaim_inodes_ag+0x228/0x2c0
[24646.504977] [] ? radix_tree_next_chunk+0x108/0x2a0
[24646.505107] [] ? radix_tree_gang_lookup_tag+0xae/0x180
[24646.505238] [] ? xfs_perag_get_tag+0x48/0x100
[24646.505370] [] ? __list_lru_walk_one.isra.7+0x31/0x120
[24646.505502] [] ? iget5_locked+0x240/0x240
[24646.505634] [] ? xfs_reclaim_inodes_nr+0x31/0x40
[24646.505766] [] ? super_cache_scan+0x1a0/0x1b0
[24646.505897] [] ? shrink_slab+0x262/0x440
[24646.506028] [] ? shrink_node+0xef/0x2d0
[24646.506158] [] ? do_try_to_free_pages+0xc2/0x2b0
[24646.506288] [] ? try_to_free_pages+0xe2/0x1c0
[24646.506415] [] ? __perform_reclaim.isra.80+0x79/0xc0
[24646.506542] [] ? __alloc_pages_nodemask+0x861/0x16b0
[24646.506669] [] ? __radix_tree_lookup+0x7b/0xe0
[24646.506796] [] ? alloc_pages_current+0x9a/0x120
[24646.506922] [] ? filemap_fault+0x396/0x540
[24646.507048] [] ? radix_tree_next_chunk+0x108/0x2a0
[24646.507176] [] ? ext4_filemap_fault+0x3f/0x60
[24646.507303] [] ? __do_fault+0x71/0x120
[24646.507428] [] ? handle_mm_fault+0xdcd/0x14b0
[24646.507555] [] ? kmem_cache_free+0x204/0x220
[24646.507682] [] ? __fput+0x149/0x200
[24646.507806] [] ? __do_page_fault+0x1cf/0x5a0
[24646.507934] [] ? page_fault+0x22/0x30
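The mutex_lock() this trace blocks on is the per-AG reclaim lock taken in xfs_reclaim_inodes_ag(). Trimmed sketch of that function as I read it in this kernel (again not verbatim):

        int     trylock = flags & SYNC_TRYLOCK;
        int     skipped;

restart:
        skipped = 0;
        while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
                if (trylock) {
                        if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
                                skipped++;      /* AG busy, skip for now */
                                xfs_perag_put(pag);
                                continue;
                        }
                } else
                        mutex_lock(&pag->pag_ici_reclaim_lock); /* <- rsync(19707) */
                /* ... scan and reclaim inodes in this AG ... */
                mutex_unlock(&pag->pag_ici_reclaim_lock);
        }

        /*
         * SYNC_WAIT: if any AG was skipped, restart with blocking lock
         * semantics, so every direct reclaimer eventually queues up
         * behind the one stuck allocating memory in xfs_iflush().
         */
        if (skipped && (flags & SYNC_WAIT) && *nr_to_scan > 0) {
                trylock = 0;
                goto restart;
        }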
Even kswapd gets stuck:

[24656.991375] MemAlloc: kswapd0(114) flags=0xa40840 switches=22109 uninterruptible
[24656.991622] kswapd0         D     0   114      2 0x00000000
[24656.991753] 0000000000000000 ffffa43830be0f00 ffffa44342ff2880 ffffa443600e0d80
[24656.992000] ffffa4437ad14300 ffffb1ca8d2737d8 ffffffffb383e799 ffffb1ca8d273788
[24656.992245] ffffb1ca8d273788 ffffb1ca8d273798 ffffb1ca8d273798 0000000002400001
[24656.992489] Call Trace:
[24656.992611] [] ? __schedule+0x179/0x5c8
[24656.992736] [] ? schedule+0x32/0x80
[24656.992861] [] ? rwsem_down_read_failed+0xb2/0x100
[24656.992987] [] ? call_rwsem_down_read_failed+0x14/0x30
[24656.993114] [] ? down_read+0x13/0x30
[24656.993239] [] ? xfs_map_blocks+0x90/0x2f0
[24656.993366] [] ? xfs_do_writepage+0x2b6/0x6a0
[24656.993496] [] ? submit_bh_wbc+0x169/0x200
[24656.993627] [] ? xfs_vm_writepage+0x2c/0x50
[24656.993760] [] ? pageout.isra.60+0xeb/0x2e0
[24656.993891] [] ? shrink_page_list+0x736/0xa50
[24656.994022] [] ? shrink_inactive_list+0x202/0x4b0
[24656.994152] [] ? shrink_node_memcg+0x2e1/0x790
[24656.994282] [] ? shrink_node+0xc9/0x2d0
[24656.994416] [] ? kswapd+0x2e4/0x690
[24656.994546] [] ? mem_cgroup_shrink_node+0x1a0/0x1a0
[24656.994678] [] ? mem_cgroup_shrink_node+0x1a0/0x1a0
[24656.994808] [] ? kthread+0xc2/0xe0
[24656.994937] [] ? __kthread_init_worker+0xb0/0xb0
[24656.995069] [] ? ret_from_fork+0x22/0x30
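Here is the dependency chain as we read it. The kswapd leg is an assumption based on the locks visible in the traces: the down_read() under xfs_map_blocks() should be the inode's XFS_ILOCK taken shared, which the reclaiming thread holds exclusive:

/*
 * rsync(19706): holds pag_ici_reclaim_lock and XFS_ILOCK_EXCL,
 *               stuck allocating pages for the inode cluster buffer
 * rsync(19707): mutex_lock(&pag->pag_ici_reclaim_lock)
 *                       -> waits on rsync(19706)
 * kswapd0:      xfs_map_blocks() -> xfs_ilock(ip, XFS_ILOCK_SHARED)
 *                       -> waits on rsync(19706)  [assumed]
 *
 * Nothing can free memory, so the allocation in rsync(19706) never
 * succeeds and the whole system wedges.
 */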
Which finally leads to "Kernel panic - not syncing: Out of memory and no killable processes...", as no process is able to make progress.

I quickly hacked this:

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9ef152b..8adfb0a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1254,7 +1254,7 @@ struct xfs_inode *
 	xfs_reclaim_work_queue(mp);
 	xfs_ail_push_all(mp->m_ail);
 
-	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
+	return 0; // xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
 }

We ran two of our machines with this patch for a night; no more lockups or stalls were detected. xfsaild does its work asynchronously, so xfs_inode objects don't run wild, as confirmed by slabtop. I put netconsole logs here for anyone interested: http://aplkv.beget.tech/lkml/xfs/
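For comparison with the unpatched function quoted near the top of this mail, the patched entry point effectively becomes:

int
xfs_reclaim_inodes_nr(
        struct xfs_mount        *mp,
        int                     nr_to_scan)
{
        /* still kick the background reclaim worker and push the AIL ... */
        xfs_reclaim_work_queue(mp);
        xfs_ail_push_all(mp->m_ail);

        /* ... but never reclaim inodes synchronously from reclaim context */
        return 0;
}

That is, direct reclaim only kicks the asynchronous workers and never blocks on inode reclaim itself. It is a hack rather than a proper fix, but it shows where the stalls come from.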