From patchwork Thu Jun 2 05:48:38 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcin Wojtas
X-Patchwork-Id: 9149083
In-Reply-To: <20160531131520.GI24936@arm.com>
References: <574D64A0.2070207@arm.com> <60e8df74202e40b28a4d53dbc7fd0b22@IL-EXCH02.marvell.com> <20160531131520.GI24936@arm.com>
Date: Thu, 2 Jun 2016 07:48:38 +0200
Subject: Re: [BUG] Page allocation failures with newest kernels
From: Marcin Wojtas
To: Will Deacon
Cc: Lior Amsalem, Thomas Petazzoni,
 Yehuda Yitschak, mgorman@techsingularity.net, Arnd Bergmann, Catalin Marinas, "linux-kernel@vger.kernel.org", Nadav Haklai, "linux-mm@kvack.org", Grzegorz Jaszczyk, Gregory Clément, Tomasz Nowicki, Robin Murphy, "linux-arm-kernel@lists.infradead.org"

Hi Will,

I think I found the right trail. The one-liner at the end of this mail
fixes the issue for kernels from v4.2-rc1 up to and including v4.4.

The regression was introduced by commit 7e18adb4f80b ("mm: meminit:
initialise remaining struct pages in parallel with kswapd"), which in
effect disabled the memblock reserve entirely on all platforms that do
not use CONFIG_DEFERRED_STRUCT_PAGE_INIT (x86 is the only user), hence
the temporary shortage of allocatable memory during my test.

Since v4.4-rc1 the following changes of approach have been introduced:

97a16fc - mm, page_alloc: only enforce watermarks for order-0 allocations
0aaa29a - mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand
974a786 - mm, page_alloc: remove MIGRATE_RESERVE

As far as I understand, order-0 allocations now keep no reserve at all.
I checked all the logs I gathered, and indeed it was order-0 allocations
that failed, apparently without being able to reclaim successfully.

Since the problem is very easy to reproduce (at least in my test, and
also when stressing a device in a NAS setup), is there any way to avoid
these page allocation failures? Or any trick with fragmentation
parameters, etc.?

I would be grateful for any hint.

Best regards,
Marcin

2016-05-31 15:15 GMT+02:00 Will Deacon:
> On Tue, May 31, 2016 at 01:10:44PM +0000, Yehuda Yitschak wrote:
>> During some of the stress tests we also came across a different warning
>> from the arm64 page management code
>> It looks like a race is detected between HW and SW marking a bit in the PTE
>
> A72 (which I believe is the CPU in that SoC) is a v8.0 CPU and therefore
> doesn't have hardware DBM.
>
>> Not sure it's really related but I thought it might give a clue on the issue
>> http://pastebin.com/ASv19vZP
>
> There have been a few patches from Catalin to fix up the hardware DBM
> patches, so it might be worth trying to reproduce this failure with a
> more recent kernel. I doubt this is related to the allocation failures,
> however.
>
> Will

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -294,7 +294,7 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
 {
-	return false;
+	return true;
 }
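
To illustrate why flipping a single return value can matter this much, here
is a minimal user-space sketch. It is not kernel code. It assumes that the
reserve-setup path skips a pageblock whenever the helper returns false; the
loop shape and the names count_reserved_blocks, stub_returns_false and
stub_returns_true are hypothetical and stand in only for the role that
early_page_nid_uninitialised() plays in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type standing in for early_page_nid_uninitialised(). */
typedef bool (*nid_check_fn)(unsigned long pfn, int nid);

/* Models the stub before the one-liner: it always reports false. */
bool stub_returns_false(unsigned long pfn, int nid)
{
	(void)pfn;
	(void)nid;
	return false;
}

/* Models the stub after the one-liner: it always reports true. */
bool stub_returns_true(unsigned long pfn, int nid)
{
	(void)pfn;
	(void)nid;
	return true;
}

/*
 * Hypothetical model of a reserve-setup loop (an assumption, not the
 * kernel's actual code): walk nr_blocks pageblocks, skip every block for
 * which check() returns false, and count the rest as reserved until
 * `wanted` blocks have been collected.
 */
size_t count_reserved_blocks(size_t nr_blocks, size_t wanted, nid_check_fn check)
{
	size_t reserved = 0;

	assert(check != NULL);

	for (size_t i = 0; i < nr_blocks && reserved < wanted; i++) {
		/* A stub that always returns false makes every block skip here. */
		if (!check((unsigned long)i, 0))
			continue;
		reserved++;
	}
	return reserved;
}
```

Under these assumptions, calling count_reserved_blocks() with
stub_returns_false never reserves a block however many blocks are scanned,
while stub_returns_true lets the loop collect the requested number. That
matches the behaviour described in the mail: no reserve is set up on
platforms built without CONFIG_DEFERRED_STRUCT_PAGE_INIT.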