From patchwork Fri Aug 5 17:41:30 2016
X-Patchwork-Submitter: Roman Pen
X-Patchwork-Id: 9265639
From: Roman Pen
To: unlisted-recipients:; (no To-header on input)
Cc: Roman Pen, Akinobu Mita, Tejun Heo, Jens Axboe, Christoph Hellwig,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence
Date: Fri, 5 Aug 2016 19:41:30 +0200
Message-Id: <20160805174131.22043-1-roman.penyaev@profitbricks.com>
X-Mailer: git-send-email 2.9.0
X-Mailing-List: linux-block@vger.kernel.org

A long time ago a similar fix was proposed by Akinobu Mita[1], but it
seems at that time everyone decided to fix this subtle race in
percpu-refcount itself, and Tejun Heo[2] made an attempt (as far as I
can see, that patchset was not applied). The following is a
description of the queue hang - the same fix, but the bug seen from
another angle.
The hang happens on queue freeze because of simultaneous calls of
blk_mq_freeze_queue() and blk_mq_unfreeze_queue() from different
threads: because of a reference race, percpu_ref_reinit() and
percpu_ref_kill() get swapped.

CPU#0                              CPU#1
----------------                   -----------------
percpu_ref_kill()
                                   percpu_ref_kill() << atomic reference does not
                                   percpu_ref_reinit() << guarantee the order
blk_mq_freeze_queue_wait() << HANG HERE
percpu_ref_reinit()

Firstly, this wrong sequence raises two kernel warnings:

1st. WARNING at lib/percpu-refcount.c:309
     percpu_ref_kill_and_confirm called more than once
2nd. WARNING at lib/percpu-refcount.c:331

But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
which waits for q_usage_counter to reach zero; that never happens,
because the percpu-ref was not reinited and stays in PERCPU state
forever.

The simplified sequence above is reproduced on shared tags, when one
queue is going to die while another one is initing:

CPU#0                              CPU#1
------------------------------     ------------------------------------
q1 = blk_mq_init_queue(shared_tags)

                                   q2 = blk_mq_init_queue(shared_tags):
                                     blk_mq_add_queue_tag_set(shared_tags):
                                       blk_mq_update_tag_set_depth(shared_tags):
                                         blk_mq_freeze_queue(q1)
blk_cleanup_queue(q1)                    ...
  blk_mq_freeze_queue(q1)  <<<->>>       blk_mq_unfreeze_queue(q1)

[1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org

Signed-off-by: Roman Pen
Cc: Akinobu Mita
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 block/blk-core.c       |  1 +
 block/blk-mq.c         | 22 +++++++++++-----------
 include/linux/blkdev.h |  7 ++++++-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ef78848..01dcb02 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -740,6 +740,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
 	init_waitqueue_head(&q->mq_freeze_wq);
+	mutex_init(&q->mq_freeze_lock);
 
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d6f8fe..1f3e81b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -80,13 +80,13 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 
 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
-	if (freeze_depth == 1) {
+	mutex_lock(&q->mq_freeze_lock);
+	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
+		mutex_unlock(&q->mq_freeze_lock);
 		blk_mq_run_hw_queues(q, false);
-	}
+	} else
+		mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
@@ -124,14 +124,14 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
-	WARN_ON_ONCE(freeze_depth < 0);
-	if (!freeze_depth) {
+	mutex_lock(&q->mq_freeze_lock);
+	q->mq_freeze_depth--;
+	WARN_ON_ONCE(q->mq_freeze_depth < 0);
+	if (!q->mq_freeze_depth) {
 		percpu_ref_reinit(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
+	mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
@@ -2105,7 +2105,7 @@ void blk_mq_free_queue(struct request_queue *q)
 static void blk_mq_queue_reinit(struct request_queue *q,
 				const struct cpumask *online_mask)
 {
-	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
+	WARN_ON_ONCE(!q->mq_freeze_depth);
 
 	blk_mq_sysfs_unregister(q);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f6ff9d1..d692c16 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -445,7 +445,7 @@ struct request_queue {
 	struct mutex		sysfs_lock;
 
 	int			bypass_depth;
-	atomic_t		mq_freeze_depth;
+	int			mq_freeze_depth;
 
 #if defined(CONFIG_BLK_DEV_BSG)
 	bsg_job_fn		*bsg_job_fn;
@@ -459,6 +459,11 @@ struct request_queue {
 #endif
 	struct rcu_head		rcu_head;
 	wait_queue_head_t	mq_freeze_wq;
+	/*
+	 * Protect concurrent access to q_usage_counter by
+	 * percpu_ref_kill() and percpu_ref_reinit().
+	 */
+	struct mutex		mq_freeze_lock;
 	struct percpu_ref	q_usage_counter;
 	struct list_head	all_q_node;