From patchwork Wed Aug  5 09:51:21 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko <mhocko@kernel.org>
X-Patchwork-Id: 6947511
Return-Path: <linux-btrfs-owner@kernel.org>
X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id 0FD8D9F358
	for <patchwork-linux-btrfs@patchwork.kernel.org>;
	Wed,  5 Aug 2015 09:56:26 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 0A23320384
	for <patchwork-linux-btrfs@patchwork.kernel.org>;
	Wed,  5 Aug 2015 09:56:25 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id CD43C2034A
	for <patchwork-linux-btrfs@patchwork.kernel.org>;
	Wed,  5 Aug 2015 09:56:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752895AbbHEJ4A (ORCPT
	<rfc822;patchwork-linux-btrfs@patchwork.kernel.org>);
	Wed, 5 Aug 2015 05:56:00 -0400
Received: from mail-wi0-f181.google.com ([209.85.212.181]:34450 "EHLO
	mail-wi0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752450AbbHEJvy (ORCPT
	<rfc822; linux-btrfs@vger.kernel.org>); Wed, 5 Aug 2015 05:51:54 -0400
Received: by wibcd8 with SMTP id cd8so16047914wib.1;
	Wed, 05 Aug 2015 02:51:53 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=9cCLGwEzX2DIUvfJopWUySr9f4MzlWrFEpNPUFH1sdo=;
	b=b9pVVMIJYSuOXSUilixUR6OcxZ+6qxbD01s519n2lXPK9H1KSLc+5qQM5pj7u8yv0F
	tDVMuLPobi70Tsbh0pnIsqOGCsdTPS6g21yBkrv2jYjvVv6Vh58OyL6/1AkqeOOQuCly
	ckQEPCJ8dYRCkJ4PxPvHRv0A7wblygFCITmUK1YlOohsF7T5bE7POfAoL7+K4ACNYQro
	jzkFuVJischUCQFQAK+/9/ZwwSnmhl0mp0HMOLMNW0lMsd254EIRUNJvpuEKlGmhMtBL
	P/aQ3viFKxiIqHN06Oq8+Z34iVM2wAvxKmQsl2w/8ORxt1tnobd1Gh+xo/H5OnQiPCpJ
	eG2g==
X-Received: by 10.181.12.111 with SMTP id ep15mr17120091wid.15.1438768313071;
	Wed, 05 Aug 2015 02:51:53 -0700 (PDT)
Received: from tiehlicka.suse.cz (nat1.scz.suse.com. [213.151.88.250])
	by smtp.gmail.com with ESMTPSA id
	yu4sm3229106wjc.43.2015.08.05.02.51.52
	(version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
	Wed, 05 Aug 2015 02:51:52 -0700 (PDT)
From: mhocko@kernel.org
To: LKML <linux-kernel@vger.kernel.org>
Cc: <linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Ts'o" <tytso@mit.edu>, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, Jan Kara <jack@suse.cz>,
	Michal Hocko <mhocko@suse.com>
Subject: [RFC 5/8] ext4: Do not fail journal due to block allocator
Date: Wed,  5 Aug 2015 11:51:21 +0200
Message-Id: <1438768284-30927-6-git-send-email-mhocko@kernel.org>
X-Mailer: git-send-email 2.5.0
In-Reply-To: <1438768284-30927-1-git-send-email-mhocko@kernel.org>
References: <1438768284-30927-1-git-send-email-mhocko@kernel.org>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-btrfs.vger.kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Spam-Status: No, score=-7.0 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Michal Hocko <mhocko@suse.com>

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
memory allocator doesn't endlessly loop to satisfy low-order allocations
and instead fails them to allow callers to handle them gracefully.

Some of the callers are not yet prepared for this behavior though. ext4
block allocator relies solely on GFP_NOFS allocation requests and
allocation failures lead to aborting yournal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
[  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
[  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
[  345.028343] Call Trace:
[  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
[  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
[  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
[  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
[  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
[  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
[  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
[  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
[  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
[  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
[  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
[  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted

The failure is really premature because GFP_NOFS allocation context is
very restricted - especially in the fs metadata heavy loads. Before we
go with a more sofisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocation and use __GFP_NOFAIL for the
buddy block allocator. I wasn't able to trigger the issue with this
patch anymore.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.com>
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..e6361622bfd5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
@@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {