From patchwork Fri Nov 17 17:01:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10063021
From: Bart Van Assche
To: "xjtuwjp@gmail.com"
CC: "jejb@linux.vnet.ibm.com", "linux-scsi@vger.kernel.org",
 "hare@suse.de", "martin.petersen@oracle.com", "snitzer@redhat.com"
Subject: Re: [PATCH 0/2] sd: Fix a deadlock between event checking and
 device removal
Date: Fri, 17 Nov 2017 17:01:24 +0000
Message-ID: <1510938082.2846.23.camel@wdc.com>
References: <1479016028.17624.16.camel@linux.vnet.ibm.com>
 <9500e9b2-2d34-099d-aa90-d38fb3feb02e@sandisk.com>
 <1510680828.2280.16.camel@sandisk.com>
Sender: linux-scsi-owner@vger.kernel.org
X-Mailing-List: linux-scsi@vger.kernel.org

On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote:
> I suspect could be missing run queue or lost IO, IMHO it's unlikely
> below disk probing fix the bug.

If the system is still in this state or if you can reproduce this issue,
please collect and analyze the information under /sys/kernel/debug/block.
That's the only way I know of to verify whether or not a lockup has been
caused by a missing queue run.
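
A minimal sketch of one way to collect that information, assuming debugfs
is mounted at /sys/kernel/debug and the shell runs as root. The function
name dump_blk_debugfs is made up for illustration and is not part of any
kernel tool; it makes no assumption about which attribute files exist and
simply walks whatever is there:

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): print every regular
# file below a block debugfs directory, each preceded by its path.
dump_blk_debugfs() {
    # Directory to walk; defaults to the location named in this mail.
    root=${1:-/sys/kernel/debug/block}
    [ -d "$root" ] || { echo "no such directory: $root" >&2; return 1; }
    find "$root" -type f | sort | while IFS= read -r f; do
        echo "== $f"
        # Some attributes may be write-only or fail on read; note that
        # in the output instead of aborting the whole dump.
        cat "$f" 2>/dev/null || echo "(unreadable)"
    done
}
```

Something like "dump_blk_debugfs >/tmp/blk-debugfs.txt" saves a snapshot
that can be attached to a report. Take the snapshot before kicking the
queues, since a kick changes the state that needs to be examined.
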
If the following command resolves the lockup then the root cause is
definitely a missing queue run:

for f in /sys/kernel/debug/block/*; do echo kick >$f/state; done

When analyzing queue lockups it's important to also have information about
requests that have been queued but that have not yet been started. I'm using
the following patch locally (will split this patch and submit it properly
when I have the time):

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 29e8451931ff..3c9d64793865 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -408,8 +408,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 {
 	const struct show_busy_params *params = data;
 
-	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..41d1e3a01786 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -7,10 +7,14 @@
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int timeout_ms = jiffies_to_msecs(rq->timeout);
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";
 
-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
-	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-		   cmd->retries, msecs / 1000, msecs % 1000);
+	if ((cmd->flags & SCMD_INITIALIZED) && cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
+	seq_printf(m, ", .cmd=%s, .retries=%d, .timeout=%d.%03d, allocated %d.%03d s ago",
+		   buf, cmd->retries, timeout_ms / 1000, timeout_ms % 1000,
+		   alloc_ms / 1000, alloc_ms % 1000);
 }