From patchwork Wed Aug 24 18:34:42 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 9298193
Subject: Re: Oops when completing request on the wrong queue
To: Gabriel Krisman Bertazi
References: <87a8gltgks.fsf@linux.vnet.ibm.com>
 <871t1kq455.fsf@linux.vnet.ibm.com>
 <8fc9ae38-9488-ef52-f620-08499edebffa@kernel.dk>
 <87shu0hfye.fsf@linux.vnet.ibm.com>
 <87a8g39pg4.fsf@linux.vnet.ibm.com>
 <43693064-dd37-92ce-7753-2a8edb43eab5@kernel.dk>
 <164a4c63-065b-b766-36f3-bcef4aa46a38@kernel.dk>
Cc: Keith Busch, Christoph Hellwig, linux-nvme@lists.infradead.org,
 Brian King, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org
From: Jens Axboe
Message-ID: <49a954e6-2f96-8a63-ce15-2c82c1a1d36d@kernel.dk>
Date: Wed, 24 Aug 2016 12:34:42 -0600
In-Reply-To: <164a4c63-065b-b766-36f3-bcef4aa46a38@kernel.dk>
X-Mailing-List: linux-block@vger.kernel.org

On 08/23/2016 03:14 PM, Jens Axboe wrote:
> On 08/23/2016 03:11 PM, Jens Axboe wrote:
>> On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote:
>>> Gabriel Krisman Bertazi writes:
>>>
>>>>> Can you share what you ran to online/offline CPUs? I can't reproduce
>>>>> this here.
>>>>
>>>> I was using the ppc64_cpu tool, which shouldn't do nothing more than
>>>> write to sysfs. but I just reproduced it with the script below.
>>>>
>>>> Note that this is ppc64le. I don't have a x86 in hand to attempt to
>>>> reproduce right now, but I'll look for one and see how it goes.
>>>
>>> Hi,
>>>
>>> Any luck on reproducing it? We were initially reproducing with a
>>> proprietary stress test, but I gave a try to a generated fio jobfile
>>> associated with the SMT script I shared earlier and I could reproduce
>>> the crash consistently in less than 10 minutes of execution. this was
>>> still ppc64le, though. I couldn't get my hands on nvme on x86 yet.
>>
>> Nope, I have not been able to reproduce it. How long does the CPU
>> offline/online actions take on ppc64? It's pretty slow on x86, which may
>> hide the issue. I took out the various printk's associated with bringing
>> a CPU off/online, as well as IRQ breaking parts, but didn't help in
>> reproducing it.
>>
>>> The job file I used, as well as the smt.sh script, in case you want to
>>> give it a try:
>>>
>>> jobfile: http://krisman.be/k/nvmejob.fio
>>> smt.sh: http://krisman.be/k/smt.sh
>>>
>>> Still, the trigger seems to be consistently a heavy load of IO
>>> associated with CPU addition/removal.
>>
>> My workload looks similar to yours, in that it's high depth and with a
>> lot of jobs to keep most CPUs loaded.
>> My bash script is different than yours, I'll try that and see if it
>> helps here.
>
> Actually, I take that back. You're not using O_DIRECT, hence all your
> jobs are running at QD=1, not the 256 specified. That looks odd, but
> I'll try, maybe it'll hit something different.

Can you try this patch? It's not perfect, but I'll be interested if it
makes a difference for you.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 758a9b5..41def54 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -810,11 +810,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
-
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
+
 	hctx->run++;
 
 	/*
@@ -1075,15 +1075,11 @@ static void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx,
 }
 
 void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
-		bool async)
+			   bool async)
 {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct request_queue *q = rq->q;
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
-
-	current_ctx = blk_mq_get_ctx(q);
-	if (!cpu_online(ctx->cpu))
-		rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
@@ -1093,8 +1089,6 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 
 	if (run_queue)
 		blk_mq_run_hw_queue(hctx, async);
-
-	blk_mq_put_ctx(current_ctx);
 }
 
 static void blk_mq_insert_requests(struct request_queue *q,
@@ -1105,14 +1099,9 @@ static void blk_mq_insert_requests(struct request_queue *q,
 
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *current_ctx;
 
 	trace_block_unplug(q, depth, !from_schedule);
 
-	current_ctx = blk_mq_get_ctx(q);
-
-	if (!cpu_online(ctx->cpu))
-		ctx = current_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	/*
@@ -1125,14 +1114,12 @@ static void blk_mq_insert_requests(struct request_queue *q,
 
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
-		rq->mq_ctx = ctx;
 		__blk_mq_insert_req_list(hctx, ctx, rq, false);
 	}
 	blk_mq_hctx_mark_pending(hctx, ctx);
 	spin_unlock(&ctx->lock);
 
 	blk_mq_run_hw_queue(hctx, from_schedule);
-	blk_mq_put_ctx(current_ctx);
 }
 
 static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -1692,6 +1679,11 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
 	while (!list_empty(&tmp)) {
 		struct request *rq;
 
+		/*
+		 * FIXME: we can't just move the req here. We'd have to
+		 * pull off the bio chain and add it to a new request
+		 * on the target hw queue
+		 */
 		rq = list_first_entry(&tmp, struct request, queuelist);
 		rq->mq_ctx = ctx;
 		list_move_tail(&rq->queuelist, &ctx->rq_list);
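
For anyone following along, here is a rough user-space sketch of what the
FIXME in blk_mq_hctx_cpu_offline() is getting at. It is not kernel code and
not part of the patch. All toy_* names are made up for illustration, and the
model assumes that a request's tag stays tied to the hw queue it was
allocated from, so pointing the request at another queue without taking a
new request there leads to a completion on the wrong queue.

```c
/* User-space toy, NOT kernel code. Every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_NR_TAGS 4

/* Stand-in for a hardware queue with its own private tag space. */
struct toy_hctx {
	int id;
	bool tag_busy[TOY_NR_TAGS];
};

/* Stand-in for a request: remembers which queue its tag came from. */
struct toy_rq {
	int tag;
	struct toy_hctx *owner;		/* queue the tag was allocated from */
	struct toy_hctx *mapped;	/* queue the request currently points at */
};

/* Take the first free tag on h. Returns 0 on success, -1 if none is free. */
static int toy_alloc(struct toy_hctx *h, struct toy_rq *rq)
{
	for (int t = 0; t < TOY_NR_TAGS; t++) {
		if (!h->tag_busy[t]) {
			h->tag_busy[t] = true;
			rq->tag = t;
			rq->owner = h;
			rq->mapped = h;
			return 0;
		}
	}
	return -1;
}

/* What the FIXME warns about: the mapping changes, the tag stays behind. */
static void toy_move_naive(struct toy_rq *rq, struct toy_hctx *dst)
{
	assert(dst != NULL);
	rq->mapped = dst;
}

/* Direction the FIXME suggests: take a fresh request on the target queue. */
static int toy_move_realloc(struct toy_rq *rq, struct toy_hctx *dst)
{
	struct toy_rq fresh;

	if (toy_alloc(dst, &fresh))
		return -1;
	rq->owner->tag_busy[rq->tag] = false;	/* give the old tag back */
	*rq = fresh;
	return 0;
}

/* Returns false if the completion lands on a queue that doesn't own the tag. */
static bool toy_complete(struct toy_rq *rq)
{
	if (rq->mapped != rq->owner)
		return false;
	rq->owner->tag_busy[rq->tag] = false;
	return true;
}

/* Runs both variants once. Returns 0 unless the toy runs out of tags. */
int toy_demo(void)
{
	struct toy_hctx q0 = { .id = 0 }, q1 = { .id = 1 };
	struct toy_rq rq;

	if (toy_alloc(&q0, &rq))
		return -1;
	toy_move_naive(&rq, &q1);
	printf("naive move:   completion %s\n",
	       toy_complete(&rq) ? "ok" : "hits the wrong queue");

	if (toy_alloc(&q0, &rq) || toy_move_realloc(&rq, &q1))
		return -1;
	printf("realloc move: completion %s\n",
	       toy_complete(&rq) ? "ok" : "hits the wrong queue");
	return 0;
}
```

Call toy_demo() from a main() to try it. The naive move also leaves the
original tag marked busy on the source queue, which is the second half of
the problem in this toy model.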