From patchwork Mon Apr 10 17:44:31 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sumit Semwal <sumit.semwal@linaro.org>
X-Patchwork-Id: 9673877
Return-Path: <linux-block-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	B3AE26020C for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 10 Apr 2017 17:46:00 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9855127D4D
	for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 10 Apr 2017 17:46:00 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 8BCB727F60; Mon, 10 Apr 2017 17:46:00 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU,
	RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1E13327D4D
	for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 10 Apr 2017 17:46:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754017AbdDJRp4 (ORCPT
	<rfc822;patchwork-linux-block@patchwork.kernel.org>);
	Mon, 10 Apr 2017 13:45:56 -0400
Received: from mail-pg0-f48.google.com ([74.125.83.48]:35862 "EHLO
	mail-pg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753534AbdDJRpx (ORCPT
	<rfc822;linux-block@vger.kernel.org>);
	Mon, 10 Apr 2017 13:45:53 -0400
Received: by mail-pg0-f48.google.com with SMTP id g2so107980203pge.3
	for <linux-block@vger.kernel.org>;
	Mon, 10 Apr 2017 10:45:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;
	h=from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=18SwAii/IQRWqTWRoBqgj1Esna8k0ayO/Q1NCJQo/6w=;
	b=GxcXzCf3j3w47y1PpGWS+XGJjtUg6/T4PGLfj3lhdfyJtGidc0L9j0wG19Q5AhULGb
	xkZcufhbbhZUMD46KLGV7KCSVEsvu7r75LGTik4hB2oXut5goTQVn2DfTcFvfvyw+0kw
	dlnK/FbfFg3jPdAUPA2Wfwt7Ax81MLAKtuymg=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
	:references;
	bh=18SwAii/IQRWqTWRoBqgj1Esna8k0ayO/Q1NCJQo/6w=;
	b=oBFcdvpi+Iw0vbMvJWC81ZHvzQrKz9i+QA6hyLetcONQ+9mNlapNqEj/ch2FyP742h
	pVlOE59a/dmit2UUmQQ4Clhxf93B4XttrufYzPYsCxKS5DUi7jp3hdlm+A0Yprw+qD99
	l4nYQp1nwquPbmdDkDva5dfDwzmuA8JX9tAl4YHHjN7eiT/tCMJ2W3cPDcN2fMhaX+im
	j8Mwsd7OZCJvmBca2YERN4pZJX35/y6Qx/c6DJaXEJFn3AsVuZJLKbnXNH7UEGSMKqDO
	M3fpB3Tqsby5nOHpmtzVS8hmDRpVa0LhOBaHF3NyyufPP4eLyvnhB15A7BemtOg/nYxA
	kbvg==
X-Gm-Message-State: 
 AFeK/H29XzMPEdMbPzIf215ffTfQP4UgRs2yxUF0FwH0/bPJ6iUSZZ/YclVuc4iVXrVXxXX+
X-Received: by 10.98.141.67 with SMTP id z64mr52688704pfd.91.1491846352825;
	Mon, 10 Apr 2017 10:45:52 -0700 (PDT)
Received: from phantom.lan ([106.51.225.38])
	by smtp.gmail.com with ESMTPSA id
	y6sm768833pfa.83.2017.04.10.10.45.49
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
	Mon, 10 Apr 2017 10:45:51 -0700 (PDT)
From: Sumit Semwal <sumit.semwal@linaro.org>
To: stable@vger.kernel.org
Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	Douglas Miller <dougmill@linux.vnet.ibm.com>,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	Jens Axboe <axboe@fb.com>, Sumit Semwal <sumit.semwal@linaro.org>
Subject: [PATCH for-4.4 15/16] blk-mq: Avoid memory reclaim when remapping
	queues
Date: Mon, 10 Apr 2017 23:14:31 +0530
Message-Id: <1491846272-14882-16-git-send-email-sumit.semwal@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1491846272-14882-1-git-send-email-sumit.semwal@linaro.org>
References: <1491846272-14882-1-git-send-email-sumit.semwal@linaro.org>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>

[ Upstream commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 ]

While stressing memory and I/O and changing SMT settings at the same
time, we were able to consistently trigger deadlocks in the mm system,
which froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remapping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to allocate with GFP_NOIO in this path, so
that reclaim triggered by these allocations cannot recurse into the
filesystem or block layer.  With this patch, we could no longer hit the
issue.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
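
Note for reviewers: the reclaim recursion described in the commit message
can be sketched in plain userspace C.  This is only a model under stated
assumptions, not kernel code: the SK_* names and bit values are
hypothetical, and the mask composition reflects my reading of
include/linux/gfp.h, where GFP_NOIO is GFP_KERNEL without __GFP_IO and
__GFP_FS.  The shrinker check mirrors the __GFP_FS test at the top of
super_cache_scan(), which is the frame that leads into XFS in the trace.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration; the real ones differ. */
#define SK_GFP_IO	0x1u	/* reclaim may start physical I/O */
#define SK_GFP_FS	0x2u	/* reclaim may call into filesystem code */
#define SK_GFP_RECLAIM	0x4u	/* the allocation may enter reclaim at all */

/* Assumed composition, modelled on GFP_KERNEL and GFP_NOIO. */
#define SK_GFP_KERNEL	(SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOIO	(SK_GFP_RECLAIM)

/* Can an allocation with this mask enter reclaim? */
static bool sk_may_enter_reclaim(unsigned int gfp_mask)
{
	return (gfp_mask & SK_GFP_RECLAIM) != 0;
}

/*
 * Models the filesystem shrinker gate: without the FS bit the shrinker
 * backs off instead of calling into the filesystem, where it could
 * block on locks held by tasks waiting for the frozen queue.
 */
static bool sk_fs_shrinker_may_run(unsigned int gfp_mask)
{
	return (gfp_mask & SK_GFP_FS) != 0;
}

/* Can reclaim on behalf of this allocation submit new I/O? */
static bool sk_reclaim_may_start_io(unsigned int gfp_mask)
{
	return (gfp_mask & SK_GFP_IO) != 0;
}
```

In this model a GFP_NOIO allocation can still enter reclaim, but the
filesystem shrinker and I/O submission are both skipped, so the
allocation in the queue remap path does not end up waiting on itself.
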

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8d63c3..0d1af3e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1470,7 +1470,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
 	INIT_LIST_HEAD(&tags->page_list);
 
 	tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *),
-				 GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
+				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 				 set->numa_node);
 	if (!tags->rqs) {
 		blk_mq_free_tags(tags);
@@ -1496,7 +1496,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
 
 		do {
 			page = alloc_pages_node(set->numa_node,
-				GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
+				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
 				this_order);
 			if (page)
 				break;
@@ -1517,7 +1517,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
 		 * Allow kmemleak to scan these pages as they contain pointers
 		 * to additional allocations like via ops->init_request().
 		 */
-		kmemleak_alloc(p, order_to_size(this_order), 1, GFP_KERNEL);
+		kmemleak_alloc(p, order_to_size(this_order), 1, GFP_NOIO);
 		entries_per_page = order_to_size(this_order) / rq_size;
 		to_do = min(entries_per_page, set->queue_depth - i);
 		left -= to_do * rq_size;