From patchwork Mon Dec 4 17:30:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10091173
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche,
 Ming Lei, Hannes Reinecke, Johannes Thumshirn,
 "James E . J . Bottomley", "Martin K . Petersen",
 linux-scsi@vger.kernel.org
Subject: [PATCH] blk-mq: Fix several SCSI request queue lockups
Date: Mon, 4 Dec 2017 09:30:32 -0800
Message-Id: <20171204173032.16330-1-bart.vanassche@wdc.com>
X-Mailer: git-send-email 2.15.0
X-Mailing-List: linux-scsi@vger.kernel.org

Commit 0df21c86bdbf introduced several bugs:
* A SCSI queue stall for queue depths > 1, addressed by commit
  88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
* A systematic lockup for SCSI queues with queue depth 1. The
  following test reproduces that bug systematically:
  - Change the SRP initiator such that SCSI target queue depth is
    limited to 1.
  - Run the following command:
      srp-test/run_tests -f xfs -d -e none -r 60 -t 01
  See also "[PATCH 4/7] blk-mq: Avoid that request processing stalls
  when sharing tags"
  (https://marc.info/?l=linux-block&m=151208695316857).
  Note: reverting commit 0df21c86bdbf also fixes a sporadic SCSI
  request queue lockup while inserting a
  blk_mq_sched_mark_restart_hctx() before all
  blk_mq_dispatch_rq_list() calls only fixes the systematic lockup
  for queue depth 1.
* A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
  device is blocked in scsi_dev_queue_ready()"
  (https://marc.info/?l=linux-block&m=151223233407154).

I think the above means that it is too risky to try to fix all bugs
introduced by commit 0df21c86bdbf before kernel v4.15 is released.
Hence revert that commit.

Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Bart Van Assche
Cc: Ming Lei
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: James E.J. Bottomley
Cc: Martin K. Petersen
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/scsi_lib.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 84bd2b16d216..a7e7966f1477 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1976,9 +1976,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
-	blk_status_t ret;
+	blk_status_t ret = BLK_STS_RESOURCE;
 	int reason;
 
+	if (!scsi_mq_get_budget(hctx))
+		goto out;
 	ret = prep_to_mq(scsi_prep_state_check(sdev, req));
 	if (ret != BLK_STS_OK)
 		goto out_put_budget;
@@ -2022,6 +2024,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
 	scsi_mq_put_budget(hctx);
+out:
 	switch (ret) {
 	case BLK_STS_OK:
 		break;
@@ -2225,8 +2228,6 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 }
 
 static const struct blk_mq_ops scsi_mq_ops = {
-	.get_budget	= scsi_mq_get_budget,
-	.put_budget	= scsi_mq_put_budget,
 	.queue_rq	= scsi_queue_rq,
 	.complete	= scsi_softirq_done,
 	.timeout	= scsi_timeout,
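
The control flow that the scsi_queue_rq() hunks introduce can be modelled outside the kernel. What follows is a minimal user-space sketch, not kernel code: every sketch_* name, the struct layout and the prep_ok flag are hypothetical stand-ins chosen for illustration, and the budget is assumed to be a simple in-flight counter bounded by the queue depth. It only shows the shape of the change: the budget is taken at the top of queue_rq, a failed acquisition jumps to the new "out" label so that nothing is released that was never acquired, and a later failure goes through "out_put_budget".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for blk_status_t values; not the kernel definitions. */
enum sketch_status { SKETCH_STS_OK, SKETCH_STS_RESOURCE };

/* Hypothetical device model: the budget is assumed to be an in-flight counter. */
struct sketch_dev {
	int busy;		/* commands currently holding budget */
	int queue_depth;	/* maximum number of commands in flight */
	bool prep_ok;		/* outcome of a hypothetical state check */
};

/* Take one unit of budget; fail if the device is already at queue depth. */
bool sketch_get_budget(struct sketch_dev *dev)
{
	if (dev->busy >= dev->queue_depth)
		return false;
	dev->busy++;
	return true;
}

/* Give back one unit of budget; only valid if one is currently held. */
void sketch_put_budget(struct sketch_dev *dev)
{
	assert(dev->busy > 0);
	dev->busy--;
}

/* Mirrors the label structure of the patched function, nothing more. */
enum sketch_status sketch_queue_rq(struct sketch_dev *dev)
{
	enum sketch_status ret = SKETCH_STS_RESOURCE;

	if (!sketch_get_budget(dev))
		goto out;		/* nothing acquired, nothing to release */

	if (!dev->prep_ok)
		goto out_put_budget;	/* acquired, but the request cannot proceed */

	ret = SKETCH_STS_OK;		/* in this model the budget stays held */
	goto out;

out_put_budget:
	sketch_put_budget(dev);
out:
	return ret;
}
```

With queue_depth set to 1, a first call succeeds and leaves busy at 1, and a second call returns SKETCH_STS_RESOURCE without touching the counter. That is the queue depth 1 case the commit message singles out.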