From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Minwoo Im, Ming Lei, Bart Van Assche,
 Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Keith Busch
Subject: [PATCH V2 1/5] blk-mq: add new state of BLK_MQ_S_INTERNAL_STOPPED
Date: Mon, 12 Aug 2019 21:43:08 +0800
Message-Id: <20190812134312.16732-2-ming.lei@redhat.com>
In-Reply-To: <20190812134312.16732-1-ming.lei@redhat.com>

Add a new hw queue state, BLK_MQ_S_INTERNAL_STOPPED, in preparation for
stopping a hw queue before all CPUs of its hctx go offline.

BLK_MQ_S_STOPPED can't be reused for this purpose, because that state
may be cleared during IO completion.

Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq-debugfs.c | 1 +
 block/blk-mq.h         | 3 ++-
 include/linux/blk-mq.h | 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..af40a02c46ee 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -213,6 +213,7 @@ static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(STOPPED),
 	HCTX_STATE_NAME(TAG_ACTIVE),
 	HCTX_STATE_NAME(SCHED_RESTART),
+	HCTX_STATE_NAME(INTERNAL_STOPPED),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..63717573bc16 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -176,7 +176,8 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
 
 static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
 {
-	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
+	return test_bit(BLK_MQ_S_STOPPED, &hctx->state) ||
+		test_bit(BLK_MQ_S_INTERNAL_STOPPED, &hctx->state);
 }
 
 static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 21cebe901ac0..5b2d263e0646 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -235,6 +235,9 @@ enum {
 	BLK_MQ_S_TAG_ACTIVE	= 1,
 	BLK_MQ_S_SCHED_RESTART	= 2,
 
+	/* hw queue is stopped internally by blk-mq; drivers must not use it */
+	BLK_MQ_S_INTERNAL_STOPPED	= 3,
+
 	BLK_MQ_MAX_DEPTH	= 10240,
 
 	BLK_MQ_CPU_WORK_BATCH	= 8,
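Why a separate bit is needed can be shown with a minimal userspace sketch in C. It is not kernel code: the SKETCH_S_* bit numbers mirror the hctx state enum above, while sketch_set_bit(), sketch_clear_bit(), sketch_test_bit() and sketch_hctx_stopped() are hypothetical, non-atomic stand-ins for set_bit(), clear_bit(), test_bit() and blk_mq_hctx_stopped(), operating on a plain unsigned long state word.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers mirroring the hctx state enum in include/linux/blk-mq.h. */
enum {
	SKETCH_S_STOPPED		= 0,
	SKETCH_S_TAG_ACTIVE		= 1,
	SKETCH_S_SCHED_RESTART		= 2,
	SKETCH_S_INTERNAL_STOPPED	= 3,
};

/* Non-atomic userspace stand-ins for set_bit()/clear_bit()/test_bit(). */
static void sketch_set_bit(int nr, unsigned long *state)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*state)));
	*state |= 1UL << nr;
}

static void sketch_clear_bit(int nr, unsigned long *state)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*state)));
	*state &= ~(1UL << nr);
}

static bool sketch_test_bit(int nr, const unsigned long *state)
{
	return (*state >> nr) & 1UL;
}

/* Same rule as the patched blk_mq_hctx_stopped(): either bit stops the queue. */
static bool sketch_hctx_stopped(const unsigned long *state)
{
	return sketch_test_bit(SKETCH_S_STOPPED, state) ||
	       sketch_test_bit(SKETCH_S_INTERNAL_STOPPED, state);
}
```

In the sketch, clearing SKETCH_S_STOPPED (as happens when a driver restarts its stopped queues, for example through blk_mq_start_stopped_hw_queues()) leaves the queue stopped while SKETCH_S_INTERNAL_STOPPED is still set; only clearing the internal bit, which patch 3/5 does in blk_mq_hctx_notify_dead(), makes it runnable again.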
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Minwoo Im, Ming Lei, Bart Van Assche,
 Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Keith Busch
Subject: [PATCH V2 2/5] blk-mq: add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ
Date: Mon, 12 Aug 2019 21:43:09 +0800
Message-Id: <20190812134312.16732-3-ming.lei@redhat.com>
In-Reply-To: <20190812134312.16732-1-ming.lei@redhat.com>

In the following patch, we will stop the hw queue and wait for in-flight
requests to complete when an hctx is about to become dead. This may
cause a deadlock for some stacking blk-mq drivers, such as dm-rq and
loop.

Add the blk-mq flag BLK_MQ_F_NO_MANAGED_IRQ and set it for dm-rq and
loop, so that we don't need to wait for their in-flight requests to
complete, and the potential deadlock is avoided.

Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq-debugfs.c | 1 +
 drivers/block/loop.c   | 2 +-
 drivers/md/dm-rq.c     | 2 +-
 include/linux/blk-mq.h | 1 +
 4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index af40a02c46ee..24fff8c90942 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -240,6 +240,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
+	HCTX_FLAG_NAME(NO_MANAGED_IRQ),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a7461f482467..50328b572853 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1989,7 +1989,7 @@ static int loop_add(struct loop_device **l, int i)
 	lo->tag_set.queue_depth = 128;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
 	lo->tag_set.driver_data = lo;
 
 	err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 21d5c1784d0c..684f92988d40 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	md->tag_set->ops = &dm_mq_ops;
 	md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
 	md->tag_set->numa_node = md->numa_node_id;
-	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
 	md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
 	md->tag_set->driver_data = md;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5b2d263e0646..838a22888413 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -226,6 +226,7 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_NO_MANAGED_IRQ	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Minwoo Im, Ming Lei, Bart Van Assche,
 Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Keith Busch
Subject: [PATCH V2 3/5] blk-mq: stop to handle IO before hctx's all CPUs become offline
Date: Mon, 12 Aug 2019 21:43:10 +0800
Message-Id: <20190812134312.16732-4-ming.lei@redhat.com>
In-Reply-To: <20190812134312.16732-1-ming.lei@redhat.com>

Most blk-mq drivers depend on the auto-affinity of managed IRQs to set
up their queue mapping. Thomas mentioned the following point[1]:

"
 That was the constraint of managed interrupts from the very beginning:
 The driver/subsystem has to quiesce the interrupt line and the associated
 queue _before_ it gets shutdown in CPU unplug and not fiddle with it
 until it's restarted by the core when the CPU is plugged in again.
"

However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU of the hctx is shut down. Even worse,
CPUHP_BLK_MQ_DEAD is a cpuhp state that is handled after the CPU is
down, so blk-mq has no chance to quiesce the hctx with respect to CPU
hotplug.

Add a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, for blk-mq to stop the
queues and wait for in-flight requests to complete.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq-tag.c         |  2 +-
 block/blk-mq-tag.h         |  2 ++
 block/blk-mq.c             | 65 ++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h     |  1 +
 include/linux/cpuhotplug.h |  1 +
 5 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 008388e82b5c..31828b82552b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -325,7 +325,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
  */
-static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..321fd6f440e6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -35,6 +35,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+		busy_tag_iter_fn *fn, void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6968de9d7402..6931b2ba2776 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2206,6 +2206,61 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static bool blk_mq_count_inflight_rq(struct request *rq, void *data,
+				     bool reserved)
+{
+	unsigned *count = data;
+
+	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
+		(*count)++;
+
+	return true;
+}
+
+static unsigned blk_mq_tags_inflight_rqs(struct blk_mq_tags *tags)
+{
+	unsigned count = 0;
+
+	blk_mq_all_tag_busy_iter(tags, blk_mq_count_inflight_rq, &count);
+
+	return count;
+}
+
+static void blk_mq_drain_inflight_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	while (1) {
+		if (!blk_mq_tags_inflight_rqs(hctx->tags))
+			break;
+		msleep(5);
+	}
+}
+
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
+			struct blk_mq_hw_ctx, cpuhp_online);
+	unsigned prev_cpu = -1;
+
+	while (true) {
+		unsigned next_cpu = cpumask_next_and(prev_cpu, hctx->cpumask,
+				cpu_online_mask);
+
+		if (next_cpu >= nr_cpu_ids)
+			break;
+
+		/* return if there is another online CPU on this hctx */
+		if (next_cpu != cpu)
+			return 0;
+
+		prev_cpu = next_cpu;
+	}
+
+	set_bit(BLK_MQ_S_INTERNAL_STOPPED, &hctx->state);
+	blk_mq_drain_inflight_rqs(hctx);
+
+	return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -2222,6 +2277,8 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 	type = hctx->type;
 
+	clear_bit(BLK_MQ_S_INTERNAL_STOPPED, &hctx->state);
+
 	spin_lock(&ctx->lock);
 	if (!list_empty(&ctx->rq_lists[type])) {
 		list_splice_init(&ctx->rq_lists[type], &tmp);
@@ -2242,6 +2299,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
+	if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+		cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+						    &hctx->cpuhp_online);
 	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
 					    &hctx->cpuhp_dead);
 }
@@ -2301,6 +2361,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 {
 	hctx->queue_num = hctx_idx;
 
+	if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+		cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+				&hctx->cpuhp_online);
 	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
 
 	hctx->tags = set->tags[hctx_idx];
@@ -3537,6 +3600,8 @@ static int __init blk_mq_init(void)
 {
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+	cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",
+				NULL, blk_mq_hctx_notify_online);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 838a22888413..49413dcdb6aa 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -58,6 +58,7 @@ struct blk_mq_hw_ctx {
 
 	atomic_t		nr_active;
 
+	struct hlist_node	cpuhp_online;
 	struct hlist_node	cpuhp_dead;
 	struct kobject		kobj;
 
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 068793a619ca..bb80f52040cb 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -147,6 +147,7 @@ enum cpuhp_state {
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
 	CPUHP_AP_IRQ_AFFINITY_ONLINE,
+	CPUHP_AP_BLK_MQ_ONLINE,
 	CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
 	CPUHP_AP_X86_INTEL_EPB_ONLINE,
 	CPUHP_AP_PERF_ONLINE,
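The two decisions made by blk_mq_hctx_notify_online() (is the outgoing CPU the last online CPU of the hctx, and have all in-flight requests completed) can be modeled in userspace C. This is a sketch, not the kernel implementation: CPU masks are assumed to fit in one unsigned long long, SKETCH_NR_CPUS plays the role of nr_cpu_ids, nanosleep() replaces msleep(), and every sketch_* function is hypothetical. Like the patch, it assumes the outgoing CPU is still set in the online mask when the check runs.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SKETCH_NR_CPUS 64	/* hypothetical: one bit per CPU in a 64-bit mask */

/* Stand-in for cpumask_next_and(): first CPU after n that is set in both masks. */
static int sketch_next_and(int n, unsigned long long a, unsigned long long b)
{
	for (int cpu = n + 1; cpu < SKETCH_NR_CPUS; cpu++)
		if (((a & b) >> cpu) & 1ULL)
			return cpu;
	return SKETCH_NR_CPUS;	/* plays the role of nr_cpu_ids: none found */
}

/*
 * Same walk as blk_mq_hctx_notify_online(): true only if cpu is the sole
 * online CPU in hctx_mask, i.e. the hctx dies once cpu goes offline.
 */
static bool sketch_is_last_online_cpu(int cpu, unsigned long long hctx_mask,
				      unsigned long long online_mask)
{
	int prev = -1;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	for (;;) {
		int next = sketch_next_and(prev, hctx_mask, online_mask);

		if (next >= SKETCH_NR_CPUS)
			break;
		if (next != cpu)
			return false;	/* another online CPU still serves the hctx */
		prev = next;
	}
	return true;
}

/* Polling drain in the style of blk_mq_drain_inflight_rqs(): 5 ms per round. */
static void sketch_drain(unsigned (*inflight)(void *), void *arg)
{
	const struct timespec five_ms = { .tv_sec = 0, .tv_nsec = 5000000L };

	while (inflight(arg) != 0)
		nanosleep(&five_ms, NULL);
}

/* Toy in-flight counter for the example: one request completes per poll. */
static unsigned sketch_countdown(void *arg)
{
	unsigned *n = arg;

	return *n ? (*n)-- : 0;
}
```

For an hctx mapped to CPUs 2 and 3 (mask 0xC), offlining CPU 3 while CPU 2 is online returns false, so the hctx keeps running; once CPU 2 is already offline (online mask 0xB), offlining CPU 3 returns true, which is the point where the patch sets BLK_MQ_S_INTERNAL_STOPPED and drains.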
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Minwoo Im, Ming Lei, Bart Van Assche,
 Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Keith Busch
Subject: [PATCH V2 4/5] blk-mq: re-submit IO in case that hctx is dead
Date: Mon, 12 Aug 2019 21:43:11 +0800
Message-Id: <20190812134312.16732-5-ming.lei@redhat.com>
In-Reply-To: <20190812134312.16732-1-ming.lei@redhat.com>

When all CPUs of an hctx are offline, we shouldn't run that hw queue to
complete requests any more.

Instead, steal the bios from each such request, re-submit them, and
finally free the request in blk_mq_hctx_notify_dead().

Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 48 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6931b2ba2776..ed334fd867c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2261,10 +2261,30 @@ static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
 
+static void blk_mq_resubmit_io(struct request *rq)
+{
+	struct bio_list list;
+	struct bio *bio;
+
+	bio_list_init(&list);
+	blk_steal_bios(&list, rq);
+
+	while (true) {
+		bio = bio_list_pop(&list);
+		if (!bio)
+			break;
+
+		generic_make_request(bio);
+	}
+
+	blk_mq_cleanup_rq(rq);
+	blk_mq_end_request(rq, 0);
+}
+
 /*
- * 'cpu' is going away. splice any existing rq_list entries from this
- * software queue to the hw queue dispatch list, and ensure that it
- * gets run.
+ * 'cpu' has gone away. If this hctx is dead, we can't dispatch requests
+ * to it any more, so steal the bios from its requests, re-submit them
+ * to the request queue, and finally free the requests.
  */
 static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 {
@@ -2272,6 +2292,8 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 	enum hctx_type type;
+	bool hctx_dead;
+	struct request *rq;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
@@ -2279,6 +2301,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 	clear_bit(BLK_MQ_S_INTERNAL_STOPPED, &hctx->state);
 
+	hctx_dead = cpumask_first_and(hctx->cpumask, cpu_online_mask) >=
+		nr_cpu_ids;
+
 	spin_lock(&ctx->lock);
 	if (!list_empty(&ctx->rq_lists[type])) {
 		list_splice_init(&ctx->rq_lists[type], &tmp);
@@ -2289,11 +2314,20 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	if (list_empty(&tmp))
 		return 0;
 
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	if (!hctx_dead) {
+		spin_lock(&hctx->lock);
+		list_splice_tail_init(&tmp, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
+		blk_mq_run_hw_queue(hctx, true);
+		return 0;
+	}
+
+	while (!list_empty(&tmp)) {
+		rq = list_entry(tmp.next, struct request, queuelist);
+		list_del_init(&rq->queuelist);
+		blk_mq_resubmit_io(rq);
+	}
 
-	blk_mq_run_hw_queue(hctx, true);
 	return 0;
 }
 
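The resubmission path can also be sketched in userspace C. Everything here is hypothetical: struct sketch_bio and struct sketch_request are toy types, a caller-supplied sketch_submit_fn stands in for generic_make_request(), and setting rq->ended stands in for blk_mq_cleanup_rq() plus blk_mq_end_request(rq, 0). sketch_hctx_dead() expresses the same test as cpumask_first_and(hctx->cpumask, cpu_online_mask) >= nr_cpu_ids, assuming a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: a request owns a singly linked chain of bios. */
struct sketch_bio {
	struct sketch_bio *next;
	int sector;
};

struct sketch_request {
	struct sketch_bio *bios;	/* head of the request's bio chain */
	bool ended;
};

/* Dead means no CPU of the hctx is still online (empty intersection). */
static bool sketch_hctx_dead(unsigned long long hctx_mask,
			     unsigned long long online_mask)
{
	return (hctx_mask & online_mask) == 0;
}

/* Resubmission hook; generic_make_request() plays this role in the patch. */
typedef void (*sketch_submit_fn)(struct sketch_bio *bio, void *arg);

/* Mirrors blk_mq_resubmit_io(): steal the bios, resubmit each, end the request. */
static void sketch_resubmit_io(struct sketch_request *rq,
			       sketch_submit_fn submit, void *arg)
{
	struct sketch_bio *list = rq->bios;

	assert(!rq->ended);
	rq->bios = NULL;		/* the request no longer owns the bios */
	while (list != NULL) {
		struct sketch_bio *bio = list;

		list = bio->next;	/* pop before handing the bio off */
		bio->next = NULL;
		submit(bio, arg);
	}
	rq->ended = true;		/* stands in for blk_mq_end_request(rq, 0) */
}

/* Example submit hook: count the bios and sum their sectors. */
struct sketch_tally {
	int count;
	int sector_sum;
};

static void sketch_tally_submit(struct sketch_bio *bio, void *arg)
{
	struct sketch_tally *t = arg;

	t->count++;
	t->sector_sum += bio->sector;
}
```

Popping each bio before handing it to the submit hook matters: once resubmitted, a bio belongs to the new submission path, so the sketch never touches bio->next after calling submit().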
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Minwoo Im, Ming Lei, Bart Van Assche,
 Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Keith Busch
Subject: [PATCH V2 5/5] blk-mq: handle requests dispatched from IO scheduler in case that hctx is dead
Date: Mon, 12 Aug 2019 21:43:12 +0800
Message-Id: <20190812134312.16732-6-ming.lei@redhat.com>
In-Reply-To: <20190812134312.16732-1-ming.lei@redhat.com>

If an hctx becomes dead, all queued IO requests aimed at it have to be
re-submitted, so also cover the requests queued in the IO scheduler.

Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Cc: Keith Busch
Signed-off-by: Ming Lei
---
 block/blk-mq.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed334fd867c4..a722ce53fb39 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2294,6 +2294,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	enum hctx_type type;
 	bool hctx_dead;
 	struct request *rq;
+	struct elevator_queue *e;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
@@ -2304,12 +2305,31 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	hctx_dead = cpumask_first_and(hctx->cpumask, cpu_online_mask) >=
 		nr_cpu_ids;
 
-	spin_lock(&ctx->lock);
-	if (!list_empty(&ctx->rq_lists[type])) {
-		list_splice_init(&ctx->rq_lists[type], &tmp);
-		blk_mq_hctx_clear_pending(hctx, ctx);
+	e = hctx->queue->elevator;
+	if (!e) {
+		spin_lock(&ctx->lock);
+		if (!list_empty(&ctx->rq_lists[type])) {
+			list_splice_init(&ctx->rq_lists[type], &tmp);
+			blk_mq_hctx_clear_pending(hctx, ctx);
+		}
+		spin_unlock(&ctx->lock);
+	} else if (hctx_dead) {
+		LIST_HEAD(sched_tmp);
+
+		while ((rq = e->type->ops.dispatch_request(hctx))) {
+			if (rq->mq_hctx != hctx)
+				list_add(&rq->queuelist, &sched_tmp);
+			else
+				list_add(&rq->queuelist, &tmp);
+		}
+
+		while (!list_empty(&sched_tmp)) {
+			rq = list_entry(sched_tmp.next, struct request,
+					queuelist);
+			list_del_init(&rq->queuelist);
+			blk_mq_sched_insert_request(rq, true, true, true);
+		}
 	}
-	spin_unlock(&ctx->lock);
 
 	if (list_empty(&tmp))
 		return 0;
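The scheduler branch added here can be modeled the same way. The sketch assumes a single scheduler queue shared by all hctxs, which is why requests mapped to other hctxs come out of dispatch and must be put back; struct sketch_rq, struct sketch_sched and all sketch_* functions are hypothetical, and head re-insertion stands in for blk_mq_sched_insert_request(rq, true, true, true).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request tagged with the index of the hctx it is mapped to. */
struct sketch_rq {
	int hctx;
	struct sketch_rq *next;
};

/* Hypothetical scheduler queue shared by all hctxs; dispatch pops the head. */
struct sketch_sched {
	struct sketch_rq *head;
};

static struct sketch_rq *sketch_dispatch(struct sketch_sched *s)
{
	struct sketch_rq *rq = s->head;

	if (rq != NULL) {
		s->head = rq->next;
		rq->next = NULL;
	}
	return rq;
}

/* Head insertion, like list_add(). */
static void sketch_push(struct sketch_rq **list, struct sketch_rq *rq)
{
	rq->next = *list;
	*list = rq;
}

/*
 * Mirrors the "else if (hctx_dead)" branch: drain the scheduler, keep the
 * dead hctx's requests in *mine for resubmission, put the rest back.
 */
static void sketch_split_sched(struct sketch_sched *s, int dead_hctx,
			       struct sketch_rq **mine)
{
	struct sketch_rq *others = NULL, *rq;

	assert(s != NULL && mine != NULL);
	while ((rq = sketch_dispatch(s)) != NULL) {
		if (rq->hctx != dead_hctx)
			sketch_push(&others, rq);
		else
			sketch_push(mine, rq);
	}
	/* Re-insert requests of still-live hctxs at the scheduler head. */
	while ((rq = others) != NULL) {
		others = rq->next;
		sketch_push(&s->head, rq);
	}
}
```

In the sketch, re-inserting the reversed list of other requests at the head restores their original dispatch order, while the requests collected in *mine come out in reverse order, as list_add() would leave them.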