From patchwork Mon Aug  1 20:46:28 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Snitzer
X-Patchwork-Id: 9254901
Date: Mon, 1 Aug 2016 16:46:28 -0400
From: Mike Snitzer
To: Bart Van Assche
Cc: Laurence Oberman, "dm-devel@redhat.com", "linux-scsi@vger.kernel.org"
Subject: Re: dm-mq and end_clone_request()
Message-ID: <20160801204628.GA94704@redhat.com>
References: <6880321d-e14f-169b-d100-6e460dd9bd09@sandisk.com>
 <1110327939.7305916.1469819453678.JavaMail.zimbra@redhat.com>
 <757522831.7667712.1470059860543.JavaMail.zimbra@redhat.com>
 <536022978.7668211.1470060125271.JavaMail.zimbra@redhat.com>
 <931235537.7668834.1470060339483.JavaMail.zimbra@redhat.com>
 <1264951811.7684268.1470065187014.JavaMail.zimbra@redhat.com>
 <17da3ab0-233a-2cec-f921-bfd42c953ccc@sandisk.com>
 <20160801175948.GA6685@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
List-ID: X-Mailing-List: linux-scsi@vger.kernel.org

On Mon, Aug 01 2016 at 2:55P -0400,
Bart Van Assche wrote:

> On 08/01/2016 10:59 AM, Mike Snitzer wrote:
> >This says to me that must_push_back is returning false because
> >dm_noflush_suspending() is false. When this happens -EIO will escape up
> >the IO stack.
> >
> >And this confirms that must_push_back() calling dm_noflush_suspending()
> >is quite suspect given queue_if_no_path was configured: we should
> >_always_ pushback if no paths are available.
> >
> >I'll dig deeper on really understanding _why_ must_push_back() is coded
> >like it is.
>
> Hello Mike,
>
> Earlier I had reported that I observe this behavior with
> CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have been
> able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but it takes a
> large number of iterations to trigger this behavior. The output that appears
> on my setup in the kernel log with a bunch of printk()'s added in the
> dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n is as follows (mpath 254:0 and
> /dev/mapper/mpathbe refer to the same multipath device):
>
> [  314.755582] mpath 254:0: queue_if_no_path 0 -> 1
> [  314.770571] executing DM ioctl DEV_SUSPEND on mpathbe
> [  314.770622] mpath 254:0: queue_if_no_path 1 -> 0
> [  314.770657] __multipath_map(): (a) returning -5
> [  314.770657] map_request(): clone_and_map_rq() returned -5
> [  314.770658] dm_complete_request: error = -5

Hi Bart,

Please retry both variants (CONFIG_DM_MQ_DEFAULT=y first) with this patch
applied. I'm interested to see if things look better for you.

The WARN_ON_ONCEs are added just to see if we hit the corresponding
suspended/stopped state while mapping requests. If we do, that points to
an inherently racy problem that will need further investigation for a
proper fix, but the results from this should tell us whether we're
getting closer.

---
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b2f962..0e0f6e0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2007,6 +2007,9 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
 	struct dm_target *ti = tio->ti;
 	struct request *clone = NULL;
 
+	if (WARN_ON_ONCE(unlikely(dm_suspended_md(md))))
+		return DM_MAPIO_REQUEUE;
+
 	if (tio->clone) {
 		clone = tio->clone;
 		r = ti->type->map_rq(ti, clone, &tio->info);
@@ -2722,6 +2725,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
+	if (WARN_ON_ONCE(unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))))
+		return BLK_MQ_RQ_QUEUE_BUSY;
+
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
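
For anyone following along, here is a rough userspace sketch of what the two guards in the patch are meant to achieve: a request that arrives while the device is suspended gets requeued instead of reaching the target, so the -5 (-EIO) seen in the log above cannot escape from that path. Everything in it is a hypothetical model (the `model_*` names, the enum values, the struct fields, and the assumption that a requeue is reported to the queue layer as "busy"); none of it is kernel API, and it ignores the locking and races that the WARN_ON_ONCEs are there to detect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel return codes; values are arbitrary. */
enum model_map_result { MODEL_REMAPPED, MODEL_REQUEUE, MODEL_EIO };
enum model_queue_result { MODEL_QUEUE_OK, MODEL_QUEUE_BUSY };

/* Hypothetical device state: models dm_suspended_md() and path availability. */
struct model_dev {
	bool suspended;
	bool has_paths;
};

/* Hypothetical hardware-queue state: models BLK_MQ_S_STOPPED and ->busy(). */
struct model_hwq {
	bool stopped;
	bool target_busy;
};

/* Unguarded target map step: fails with an I/O error when no path is usable. */
static enum model_map_result model_target_map(const struct model_dev *d)
{
	return d->has_paths ? MODEL_REMAPPED : MODEL_EIO;
}

/* Guard 1: while suspended, requeue instead of calling into the target. */
static enum model_map_result model_map_request(const struct model_dev *d)
{
	assert(d != NULL);
	if (d->suspended)
		return MODEL_REQUEUE;
	return model_target_map(d);
}

/* Guard 2: a stopped hardware queue reports busy before any mapping happens. */
static enum model_queue_result model_queue_rq(const struct model_hwq *q,
					      const struct model_dev *d)
{
	assert(q != NULL);
	if (q->stopped || q->target_busy)
		return MODEL_QUEUE_BUSY;
	/* Assumption of this model: a requeue is surfaced as "busy". */
	if (model_map_request(d) == MODEL_REQUEUE)
		return MODEL_QUEUE_BUSY;
	return MODEL_QUEUE_OK;
}
```

In this model, calling `model_map_request()` on a suspended device with no paths yields `MODEL_REQUEUE` rather than `MODEL_EIO`, which is the behavior change the test run is supposed to confirm or rule out.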