From patchwork Wed Jul 13 13:19:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 9227671
X-Patchwork-Delegate: snitzer@redhat.com
Resent-From: Ondrej Kozina
Resent-To: dm-devel@redhat.com
Resent-Date: Wed, 13 Jul 2016 15:50:23 +0200
Resent-Message-ID: <358c2211-bd6a-d60c-e353-4b63be631b93@redhat.com>
To: Mikulas Patocka, Michal Hocko
References: <57837CEE.1010609@redhat.com>
 <9be09452-de7f-d8be-fd5d-4a80d1cd1ba3@redhat.com>
 <20160712064905.GA14586@dhcp22.suse.cz>
From: Tetsuo Handa
Message-ID: <2d5e1f84-e886-7b98-cb11-170d7104fd13@I-love.SAKURA.ne.jp>
Date: Wed, 13 Jul 2016 22:19:23 +0900
Cc: Ondrej Kozina, Stanislav Kozina, Jerome Marchand,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [dm-devel] System freezes after OOM
List-Id: device-mapper development

> On Tue, 12 Jul 2016, Michal Hocko wrote:
>
>> On Mon 11-07-16 11:43:02, Mikulas Patocka wrote:
>> [...]
>>> The general problem is that the memory allocator does 16 retries to
>>> allocate a page and then triggers the OOM killer (and it doesn't take into
>>> account how much swap space is free or how many dirty pages were really
>>> swapped out while it waited).
>>
>> Well, that is not how it works exactly.
>> We retry as long as there is a
>> reclaim progress (at least one page freed) back off only if the
>> reclaimable memory can exceed watermarks which is scaled down in 16
>> retries. The overall size of free swap is not really that important if we
>> cannot swap out like here due to complete memory reserves depletion:
>> https://okozina.fedorapeople.org/bugs/swap_on_dmcrypt/vmlog-1462458369-00000/sample-00011/dmesg:
>> [ 90.491276] Node 0 DMA free:0kB min:60kB low:72kB high:84kB active_anon:4096kB inactive_anon:4636kB active_file:212kB inactive_file:280kB unevictable:488kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:488kB dirty:276kB writeback:4636kB mapped:476kB shmem:12kB slab_reclaimable:204kB slab_unreclaimable:4700kB kernel_stack:48kB pagetables:120kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:61132 all_unreclaimable? yes
>> [ 90.491283] lowmem_reserve[]: 0 977 977 977
>> [ 90.491286] Node 0 DMA32 free:0kB min:3828kB low:4824kB high:5820kB active_anon:423820kB inactive_anon:424916kB active_file:17996kB inactive_file:21800kB unevictable:20724kB isolated(anon):384kB isolated(file):0kB present:1032184kB managed:1001260kB mlocked:20724kB dirty:25236kB writeback:49972kB mapped:23076kB shmem:1364kB slab_reclaimable:13796kB slab_unreclaimable:43008kB kernel_stack:2816kB pagetables:7320kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5635400 all_unreclaimable? yes
>>
>> Look at the amount of free memory. It is completely depleted. So it
>> smells like a process which has access to memory reserves has consumed
>> all of it. I suspect a __GFP_MEMALLOC resp. PF_MEMALLOC from softirq
>> context user which went off the leash.
>
> It is caused by the commit f9054c70d28bc214b2857cf8db8269f4f45a5e23. Prior
> to this commit, mempool allocations set __GFP_NOMEMALLOC, so they never
> exhausted reserved memory.
> With this commit, mempool allocations drop
> __GFP_NOMEMALLOC, so they can dig deeper (if the process has PF_MEMALLOC,
> they can bypass all limits).

I wonder whether commit f9054c70d28bc214 ("mm, mempool: only set
__GFP_NOMEMALLOC if there are free elements") is doing the correct thing.
It says

  If an oom killed thread calls mempool_alloc(), it is possible that it'll
  loop forever if there are no elements on the freelist since
  __GFP_NOMEMALLOC prevents it from accessing needed memory reserves in
  oom conditions.

but we can allow mempool_alloc(__GFP_NOMEMALLOC) requests to access
memory reserves via the change below, can't we? The purpose of allowing
ALLOC_NO_WATERMARKS via TIF_MEMDIE is to make sure the current allocation
request does not loop forever inside the page allocator, isn't it?
Why do we need to allow mempool_alloc(__GFP_NOMEMALLOC) requests to use
ALLOC_NO_WATERMARKS when TIF_MEMDIE is not set?

---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6903b69..e4e3700 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3439,14 +3439,14 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	} else if (unlikely(rt_task(current)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
-	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
+	if (!in_interrupt() && unlikely(test_thread_flag(TIF_MEMDIE)))
+		alloc_flags |= ALLOC_NO_WATERMARKS;
+	else if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
 		if (gfp_mask & __GFP_MEMALLOC)
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 		else if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
-		else if (!in_interrupt() &&
-				((current->flags & PF_MEMALLOC) ||
-				 unlikely(test_thread_flag(TIF_MEMDIE))))
+		else if (!in_interrupt() && (current->flags & PF_MEMALLOC))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
 #ifdef CONFIG_CMA