From patchwork Tue Aug 8 23:10:45 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 9889285
From: Bart Van Assche
To: "dm-devel@redhat.com", "linux-scsi@vger.kernel.org", "linux-block@vger.kernel.org", "ming.lei@redhat.com"
CC: "loberman@redhat.com"
Subject: Re: [v4.13-rc BUG] system lockup when running big buffered write(4M) to IB SRP via mpath
Date: Tue, 8 Aug 2017 23:10:45 +0000
Message-ID: <1502233843.2686.4.camel@wdc.com>
References: <20170808141715.GB22763@ming.t460p>
In-Reply-To: <20170808141715.GB22763@ming.t460p>
X-Mailing-List: linux-scsi@vger.kernel.org

On Tue, 2017-08-08 at 22:17 +0800, Ming Lei wrote:
> Laurence and I see a system lockup issue when running concurrent
> big buffered write(4M bytes) to IB SRP on v4.13-rc3.
> [ ... ]
> #cat hammer_write.sh
> #!/bin/bash
> while true; do
> 	dd if=/dev/zero of=/dev/mapper/$1 bs=4096k count=800
> done

Hello Laurence,

Is your goal perhaps to simulate a DDN workload? In that case I think you
need to add oflag=direct to the dd argument list so that the page cache
writeback code does not alter the size of the write requests. Anyway, this
test should not trigger a lockup. Can you check whether the patch below
makes the soft lockup complaints disappear (without changing the
hammer_write.sh test script)?

Thanks,

Bart.
----------------------------------------------------------------------------
[PATCH] block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time

Drivers like dm-mpath requeue requests if no paths are available and if
configured to do so. If the queue depth is sufficiently high and the queue
rerunning delay sufficiently short then .requeue_work can be queued so
often that other work items queued on the same work queue do not get
executed.
Avoid this by only rerunning the queue after no
blk_mq_delay_kick_requeue_list() calls have occurred for @msecs
milliseconds. Since the device mapper core is the only user of
blk_mq_delay_kick_requeue_list(), modify the implementation of this
function instead of creating a new function.

Signed-off-by: Bart Van Assche
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 041f7b7fa0d6..8bfea36e92f9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -679,8 +679,8 @@ EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
 void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 				    unsigned long msecs)
 {
-	kblockd_schedule_delayed_work(&q->requeue_work,
-				      msecs_to_jiffies(msecs));
+	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
+				    msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
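
For readers who want to see the intended change in behavior without
booting a test kernel, here is a rough userspace sketch. It is not kernel
code: struct delayed_work_model, model_schedule(), model_mod() and
count_runs() are hypothetical names made up for this illustration. It
assumes a single delayed work item and models only the difference between
"arm the timer unless it is already pending" and "always re-arm the
timer"; it ignores CPUs, worker threads and jiffies rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one delayed work item; not a kernel structure. */
struct delayed_work_model {
	bool pending;		/* timer armed, work has not run yet */
	long expires_ms;	/* absolute time at which the work runs */
};

/* "schedule" semantics: a request is ignored while the work is pending. */
static void model_schedule(struct delayed_work_model *w, long now_ms,
			   long delay_ms)
{
	if (w->pending)
		return;
	w->pending = true;
	w->expires_ms = now_ms + delay_ms;
}

/* "mod" semantics: every request pushes the expiry time out again. */
static void model_mod(struct delayed_work_model *w, long now_ms,
		      long delay_ms)
{
	w->pending = true;
	w->expires_ms = now_ms + delay_ms;
}

/*
 * Replay kick requests arriving at the ascending times in kicks_ms[] and
 * return how often the modelled work item would have run.
 */
int count_runs(const long *kicks_ms, size_t n, long delay_ms, bool use_mod)
{
	struct delayed_work_model w = { .pending = false, .expires_ms = 0 };
	int runs = 0;

	assert(delay_ms >= 0);

	for (size_t i = 0; i < n; i++) {
		long now_ms = kicks_ms[i];

		/* Let the timer fire if it expired before this kick. */
		if (w.pending && w.expires_ms <= now_ms) {
			runs++;
			w.pending = false;
		}

		if (use_mod)
			model_mod(&w, now_ms, delay_ms);
		else
			model_schedule(&w, now_ms, delay_ms);
	}

	/* Whatever is still armed after the last kick runs once more. */
	if (w.pending)
		runs++;

	return runs;
}
```

With 100 kicks spaced 10 ms apart and a 100 ms delay (made-up numbers),
count_runs() reports 10 runs under the "schedule" model and a single run
under the "mod" model, because in the latter the work only runs once the
kicks have stopped for a full delay period.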