From patchwork Fri Apr 12 03:30:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897175 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EDFEC1669 for ; Fri, 12 Apr 2019 03:30:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D487F28E20 for ; Fri, 12 Apr 2019 03:30:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C8DBE28E23; Fri, 12 Apr 2019 03:30:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5D9EA28E20 for ; Fri, 12 Apr 2019 03:30:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726702AbfDLDar (ORCPT ); Thu, 11 Apr 2019 23:30:47 -0400 Received: from mx1.redhat.com ([209.132.183.28]:38352 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726678AbfDLDar (ORCPT ); Thu, 11 Apr 2019 23:30:47 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 54F4858E4E; Fri, 12 Apr 2019 03:30:47 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 574351001DED; Fri, 12 Apr 2019 03:30:44 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 1/9] blk-mq: grab .q_usage_counter when queuing request from plug code path Date: Fri, 12 Apr 2019 11:30:24 +0800 Message-Id: <20190412033032.10418-2-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Fri, 12 Apr 2019 03:30:47 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Just like aio/io_uring, we need to grab 2 refcount for queuing one request, one is for submission, another is for completion. If the request isn't queued from plug code path, the refcount grabbed in generic_make_request() serves for submission. In theroy, this refcount should have been released after the sumission(async run queue) is done. blk_freeze_queue() works with blk_sync_queue() together for avoiding race between cleanup queue and IO submission, given async run queue activities are canceled because hctx->run_work is scheduled with the refcount held, so it is fine to not hold the refcount when running the run queue work function for dispatch IO. However, if request is staggered into plug list, and finally queued from plug code path, the refcount in submission side is actually missed. And we may start to run queue after queue is removed because the queue's kobject refcount isn't guaranteed to be grabbed in flushing plug list context, then kernel oops is triggered, see the following race: blk_mq_flush_plug_list(): blk_mq_sched_insert_requests() insert requests to sw queue or scheduler queue blk_mq_run_hw_queue Because of concurrent run queue, all requests inserted above may be completed before calling the above blk_mq_run_hw_queue. Then queue can be freed during the above blk_mq_run_hw_queue(). Fixes the issue by grab .q_usage_counter before calling blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is safe because the queue is absolutely alive before inserting request. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei Reviewed-by: Johannes Thumshirn Reviewed-by: Hannes Reinecke --- block/blk-mq.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ff3d7b49969..5b586affee09 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1728,9 +1728,12 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule) if (rq->mq_hctx != this_hctx || rq->mq_ctx != this_ctx) { if (this_hctx) { trace_block_unplug(this_q, depth, !from_schedule); + + percpu_ref_get(&this_q->q_usage_counter); blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list, from_schedule); + percpu_ref_put(&this_q->q_usage_counter); } this_q = rq->q; @@ -1749,8 +1752,11 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule) */ if (this_hctx) { trace_block_unplug(this_q, depth, !from_schedule); + + percpu_ref_get(&this_q->q_usage_counter); blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list, from_schedule); + percpu_ref_put(&this_q->q_usage_counter); } } From patchwork Fri Apr 12 03:30:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897181 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4B541669 for ; Fri, 12 Apr 2019 03:30:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B88028E20 for ; Fri, 12 Apr 2019 03:30:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8FC6028E22; Fri, 12 Apr 2019 03:30:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 408CC28E20 for ; Fri, 12 Apr 2019 03:30:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726691AbfDLDay (ORCPT ); Thu, 11 Apr 2019 23:30:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:38170 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726678AbfDLDax (ORCPT ); Thu, 11 Apr 2019 23:30:53 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 44FBE86663; Fri, 12 Apr 2019 03:30:53 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id ED70F5C21E; Fri, 12 Apr 2019 03:30:49 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 2/9] blk-mq: move cancel of requeue_work into blk_mq_release Date: Fri, 12 Apr 2019 11:30:25 +0800 Message-Id: <20190412033032.10418-3-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Fri, 12 Apr 2019 03:30:53 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei Reviewed-by: Johannes Thumshirn --- block/blk-core.c | 1 - block/blk-mq.c | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 4673ebe42255..6583d67f3e34 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -237,7 +237,6 @@ void blk_sync_queue(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - cancel_delayed_work_sync(&q->requeue_work); queue_for_each_hw_ctx(q, hctx, i) cancel_delayed_work_sync(&hctx->run_work); } diff --git a/block/blk-mq.c b/block/blk-mq.c index 5b586affee09..b512ba0cb359 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2626,6 +2626,8 @@ void blk_mq_release(struct request_queue *q) struct blk_mq_hw_ctx *hctx; unsigned int i; + cancel_delayed_work_sync(&q->requeue_work); + /* hctx kobj stays in hctx */ queue_for_each_hw_ctx(q, hctx, i) { if (!hctx) From patchwork Fri Apr 12 03:30:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897185 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B224922 for ; Fri, 12 Apr 2019 03:31:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E5E1028E20 for ; Fri, 12 Apr 2019 03:31:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DA85928E23; Fri, 12 Apr 2019 03:31:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5378A28E20 for ; Fri, 12 Apr 2019 03:31:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726857AbfDLDa7 (ORCPT ); Thu, 11 Apr 2019 23:30:59 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52802 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726713AbfDLDa7 (ORCPT ); Thu, 11 Apr 2019 23:30:59 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B60613092647; Fri, 12 Apr 2019 03:30:58 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id BF9DC60BF7; Fri, 12 Apr 2019 03:30:55 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang , stable@vger.kernel.org Subject: [PATCH V5 3/9] blk-mq: free hw queue's resource in hctx's release handler Date: Fri, 12 Apr 2019 11:30:26 +0800 Message-Id: <20190412033032.10418-4-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Fri, 12 Apr 2019 03:30:58 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2 ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before 45a9c9d909b2, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done") c2856ae2f315 ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reported-by: James Smart Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-core.c | 2 +- block/blk-mq-sysfs.c | 6 ++++++ block/blk-mq.c | 8 ++------ block/blk-mq.h | 2 +- 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 6583d67f3e34..20298aa5a77c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -374,7 +374,7 @@ void blk_cleanup_queue(struct request_queue *q) blk_exit_queue(q); if (queue_is_mq(q)) - blk_mq_free_queue(q); + blk_mq_exit_queue(q); percpu_ref_exit(&q->q_usage_counter); diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 3f9c3f4ac44c..4040e62c3737 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -10,6 +10,7 @@ #include #include +#include "blk.h" #include "blk-mq.h" #include "blk-mq-tag.h" @@ -33,6 +34,11 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj) { struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj); + + if (hctx->flags & BLK_MQ_F_BLOCKING) + cleanup_srcu_struct(hctx->srcu); + blk_free_flush_queue(hctx->fq); + sbitmap_free(&hctx->ctx_map); free_cpumask_var(hctx->cpumask); kfree(hctx->ctxs); kfree(hctx); diff --git a/block/blk-mq.c b/block/blk-mq.c index b512ba0cb359..afc9912e2e42 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2259,12 +2259,7 @@ static void blk_mq_exit_hctx(struct request_queue *q, if (set->ops->exit_hctx) set->ops->exit_hctx(hctx, hctx_idx); - if (hctx->flags & BLK_MQ_F_BLOCKING) - cleanup_srcu_struct(hctx->srcu); - blk_mq_remove_cpuhp(hctx); - blk_free_flush_queue(hctx->fq); - sbitmap_free(&hctx->ctx_map); } static void blk_mq_exit_hw_queues(struct request_queue *q, @@ -2899,7 +2894,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, } EXPORT_SYMBOL(blk_mq_init_allocated_queue); -void blk_mq_free_queue(struct request_queue *q) +/* tags can _not_ be used after returning from blk_mq_exit_queue */ +void blk_mq_exit_queue(struct request_queue *q) { struct blk_mq_tag_set *set = q->tag_set; diff --git a/block/blk-mq.h b/block/blk-mq.h index d704fc7766f4..c421e3a16e36 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -37,7 +37,7 @@ struct blk_mq_ctx { struct kobject kobj; } ____cacheline_aligned_in_smp; -void blk_mq_free_queue(struct request_queue *q); +void blk_mq_exit_queue(struct request_queue *q); int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr); void blk_mq_wake_waiters(struct request_queue *q); bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool); From patchwork Fri Apr 12 03:30:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897189 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8B20922 for ; Fri, 12 Apr 2019 03:31:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D1B628E20 for ; Fri, 12 Apr 2019 03:31:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8143A28E23; Fri, 12 Apr 2019 03:31:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1003228E20 for ; Fri, 12 Apr 2019 03:31:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726862AbfDLDbD (ORCPT ); Thu, 11 Apr 2019 23:31:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56546 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726713AbfDLDbD (ORCPT ); Thu, 11 Apr 2019 23:31:03 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 65CB6C049E24; Fri, 12 Apr 2019 03:31:02 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6AFB55C21E; Fri, 12 Apr 2019 03:31:01 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 4/9] blk-mq: move all hctx alloction & initialization into __blk_mq_alloc_and_init_hctx Date: Fri, 12 Apr 2019 11:30:27 +0800 Message-Id: <20190412033032.10418-5-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 12 Apr 2019 03:31:02 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Move all hctx related allocation and initialization into __blk_mq_alloc_and_init_hctx, and prepare for splitting blk_mq_alloc_and_init_hctx into real two functions, one is for allocate everything, and another is for initializing everyting. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-mq.c | 93 +++++++++++++++++++++++++++------------------------------- 1 file changed, 44 insertions(+), 49 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index afc9912e2e42..5ad58169ad6a 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2276,15 +2276,45 @@ static void blk_mq_exit_hw_queues(struct request_queue *q, } } -static int blk_mq_init_hctx(struct request_queue *q, - struct blk_mq_tag_set *set, - struct blk_mq_hw_ctx *hctx, unsigned hctx_idx) +static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set) { - int node; + int hw_ctx_size = sizeof(struct blk_mq_hw_ctx); + + BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu), + __alignof__(struct blk_mq_hw_ctx)) != + sizeof(struct blk_mq_hw_ctx)); + + if (tag_set->flags & BLK_MQ_F_BLOCKING) + hw_ctx_size += sizeof(struct srcu_struct); + + return hw_ctx_size; +} + +static struct blk_mq_hw_ctx * +__blk_mq_alloc_and_init_hctx(struct request_queue *q, + struct blk_mq_tag_set *set, + unsigned hctx_idx, int node) +{ + struct blk_mq_hw_ctx *hctx; + + hctx = kzalloc_node(blk_mq_hw_ctx_size(set), + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, + node); + if (!hctx) + goto fail_alloc_hctx; + + if (!zalloc_cpumask_var_node(&hctx->cpumask, + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, + node)) + goto free_hctx; + + atomic_set(&hctx->nr_active, 0); + hctx->numa_node = node; + hctx->queue_num = hctx_idx; - node = hctx->numa_node; if (node == NUMA_NO_NODE) - node = hctx->numa_node = set->numa_node; + hctx->numa_node = set->numa_node; + node = hctx->numa_node; INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn); spin_lock_init(&hctx->lock); @@ -2329,8 +2359,9 @@ static int blk_mq_init_hctx(struct request_queue *q, if (hctx->flags & BLK_MQ_F_BLOCKING) init_srcu_struct(hctx->srcu); + blk_mq_hctx_kobj_init(hctx); - return 0; + return hctx; free_fq: kfree(hctx->fq); @@ -2343,7 +2374,11 @@ static int blk_mq_init_hctx(struct request_queue *q, kfree(hctx->ctxs); unregister_cpu_notifier: blk_mq_remove_cpuhp(hctx); - return -1; + free_cpumask_var(hctx->cpumask); + free_hctx: + kfree(hctx); + fail_alloc_hctx: + return NULL; } static void blk_mq_init_cpu_queues(struct request_queue *q, @@ -2689,51 +2724,11 @@ struct request_queue *blk_mq_init_sq_queue(struct blk_mq_tag_set *set, } EXPORT_SYMBOL(blk_mq_init_sq_queue); -static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set) -{ - int hw_ctx_size = sizeof(struct blk_mq_hw_ctx); - - BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu), - __alignof__(struct blk_mq_hw_ctx)) != - sizeof(struct blk_mq_hw_ctx)); - - if (tag_set->flags & BLK_MQ_F_BLOCKING) - hw_ctx_size += sizeof(struct srcu_struct); - - return hw_ctx_size; -} - static struct blk_mq_hw_ctx *blk_mq_alloc_and_init_hctx( struct blk_mq_tag_set *set, struct request_queue *q, int hctx_idx, int node) { - struct blk_mq_hw_ctx *hctx; - - hctx = kzalloc_node(blk_mq_hw_ctx_size(set), - GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, - node); - if (!hctx) - return NULL; - - if (!zalloc_cpumask_var_node(&hctx->cpumask, - GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, - node)) { - kfree(hctx); - return NULL; - } - - atomic_set(&hctx->nr_active, 0); - hctx->numa_node = node; - hctx->queue_num = hctx_idx; - - if (blk_mq_init_hctx(q, set, hctx, hctx_idx)) { - free_cpumask_var(hctx->cpumask); - kfree(hctx); - return NULL; - } - blk_mq_hctx_kobj_init(hctx); - - return hctx; + return __blk_mq_alloc_and_init_hctx(q, set, hctx_idx, node); } static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, From patchwork Fri Apr 12 03:30:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897193 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8A1F41669 for ; Fri, 12 Apr 2019 03:31:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6B0FC28E20 for ; Fri, 12 Apr 2019 03:31:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5FA9128E23; Fri, 12 Apr 2019 03:31:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E0B2428E20 for ; Fri, 12 Apr 2019 03:31:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726837AbfDLDbK (ORCPT ); Thu, 11 Apr 2019 23:31:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52538 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726678AbfDLDbK (ORCPT ); Thu, 11 Apr 2019 23:31:10 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A492C075D72; Fri, 12 Apr 2019 03:31:09 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id E22F65D9CD; Fri, 12 Apr 2019 03:31:04 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 5/9] blk-mq: split blk_mq_alloc_and_init_hctx into two parts Date: Fri, 12 Apr 2019 11:30:28 +0800 Message-Id: <20190412033032.10418-6-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 12 Apr 2019 03:31:09 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Split blk_mq_alloc_and_init_hctx into two parts, and one is blk_mq_alloc_hctx() which is for allocating all hctx resources, another is blk_mq_init_hctx() which is for initializing hctx, and serves as counter-part of blk_mq_exit_hctx(). Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-mq.c | 69 ++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 45 insertions(+), 24 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 5ad58169ad6a..71996fe494eb 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2290,10 +2290,36 @@ static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set) return hw_ctx_size; } +static int blk_mq_init_hctx(struct request_queue *q, + struct blk_mq_tag_set *set, + struct blk_mq_hw_ctx *hctx, + unsigned hctx_idx) +{ + hctx->queue_num = hctx_idx; + + cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead); + + hctx->tags = set->tags[hctx_idx]; + + if (set->ops->init_hctx && + set->ops->init_hctx(hctx, set->driver_data, hctx_idx)) + goto fail; + + if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, + hctx->numa_node)) + goto exit_hctx; + return 0; + exit_hctx: + if (set->ops->exit_hctx) + set->ops->exit_hctx(hctx, hctx_idx); + fail: + return -1; +} + static struct blk_mq_hw_ctx * -__blk_mq_alloc_and_init_hctx(struct request_queue *q, - struct blk_mq_tag_set *set, - unsigned hctx_idx, int node) +blk_mq_alloc_hctx(struct request_queue *q, + struct blk_mq_tag_set *set, + unsigned hctx_idx, int node) { struct blk_mq_hw_ctx *hctx; @@ -2310,8 +2336,6 @@ __blk_mq_alloc_and_init_hctx(struct request_queue *q, atomic_set(&hctx->nr_active, 0); hctx->numa_node = node; - hctx->queue_num = hctx_idx; - if (node == NUMA_NO_NODE) hctx->numa_node = set->numa_node; node = hctx->numa_node; @@ -2322,10 +2346,6 @@ __blk_mq_alloc_and_init_hctx(struct request_queue *q, hctx->queue = q; hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED; - cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead); - - hctx->tags = set->tags[hctx_idx]; - /* * Allocate space for all possible cpus to avoid allocation at * runtime @@ -2338,24 +2358,16 @@ __blk_mq_alloc_and_init_hctx(struct request_queue *q, if (sbitmap_init_node(&hctx->ctx_map, nr_cpu_ids, ilog2(8), GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, node)) goto free_ctxs; - hctx->nr_ctx = 0; spin_lock_init(&hctx->dispatch_wait_lock); init_waitqueue_func_entry(&hctx->dispatch_wait, blk_mq_dispatch_wake); INIT_LIST_HEAD(&hctx->dispatch_wait.entry); - if (set->ops->init_hctx && - set->ops->init_hctx(hctx, set->driver_data, hctx_idx)) - goto free_bitmap; - hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size, GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); if (!hctx->fq) - goto exit_hctx; - - if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, node)) - goto free_fq; + goto free_bitmap; if (hctx->flags & BLK_MQ_F_BLOCKING) init_srcu_struct(hctx->srcu); @@ -2363,11 +2375,6 @@ __blk_mq_alloc_and_init_hctx(struct request_queue *q, return hctx; - free_fq: - kfree(hctx->fq); - exit_hctx: - if (set->ops->exit_hctx) - set->ops->exit_hctx(hctx, hctx_idx); free_bitmap: sbitmap_free(&hctx->ctx_map); free_ctxs: @@ -2728,7 +2735,21 @@ static struct blk_mq_hw_ctx *blk_mq_alloc_and_init_hctx( struct blk_mq_tag_set *set, struct request_queue *q, int hctx_idx, int node) { - return __blk_mq_alloc_and_init_hctx(q, set, hctx_idx, node); + struct blk_mq_hw_ctx *hctx; + + hctx = blk_mq_alloc_hctx(q, set, hctx_idx, node); + if (!hctx) + goto fail; + + if (blk_mq_init_hctx(q, set, hctx, hctx_idx)) + goto free_hctx; + + return hctx; + + free_hctx: + kobject_put(&hctx->kobj); + fail: + return NULL; } static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, From patchwork Fri Apr 12 03:30:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897197 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C1756922 for ; Fri, 12 Apr 2019 03:31:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A8DD028E20 for ; Fri, 12 Apr 2019 03:31:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9DA7B28E23; Fri, 12 Apr 2019 03:31:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1FED928E20 for ; Fri, 12 Apr 2019 03:31:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726708AbfDLDbN (ORCPT ); Thu, 11 Apr 2019 23:31:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33608 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726678AbfDLDbN (ORCPT ); Thu, 11 Apr 2019 23:31:13 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EC98830BC665; Fri, 12 Apr 2019 03:31:12 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 008665D9CD; Fri, 12 Apr 2019 03:31:11 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 6/9] blk-mq: always free hctx after request queue is freed Date: Fri, 12 Apr 2019 11:30:29 +0800 Message-Id: <20190412033032.10418-7-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Fri, 12 Apr 2019 03:31:13 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In normal queue cleanup path, hctx is released after request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because of hw queues shrinking. This way is easy to cause use-after-free, because: one implicit rule is that it is safe to call almost all block layer APIs if the request queue is alive; and one hctx may be retrieved by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(); finally use-after-free is triggered. Fixes this issue by always freeing hctx after releasing request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to resuse these hctxs if numa node is matched. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Signed-off-by: Ming Lei --- block/blk-mq.c | 40 +++++++++++++++++++++++++++------------- include/linux/blk-mq.h | 2 ++ include/linux/blkdev.h | 7 +++++++ 3 files changed, 36 insertions(+), 13 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 71996fe494eb..886fbb678617 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2260,6 +2260,10 @@ static void blk_mq_exit_hctx(struct request_queue *q, set->ops->exit_hctx(hctx, hctx_idx); blk_mq_remove_cpuhp(hctx); + + spin_lock(&q->dead_hctx_lock); + list_add(&hctx->hctx_list, &q->dead_hctx_list); + spin_unlock(&q->dead_hctx_lock); } static void blk_mq_exit_hw_queues(struct request_queue *q, @@ -2660,15 +2664,13 @@ static int blk_mq_alloc_ctxs(struct request_queue *q) */ void blk_mq_release(struct request_queue *q) { - struct blk_mq_hw_ctx *hctx; - unsigned int i; + struct blk_mq_hw_ctx *hctx, *next; cancel_delayed_work_sync(&q->requeue_work); - /* hctx kobj stays in hctx */ - queue_for_each_hw_ctx(q, hctx, i) { - if (!hctx) - continue; + /* all hctx are in .dead_hctx_list now */ + list_for_each_entry_safe(hctx, next, &q->dead_hctx_list, hctx_list) { + list_del_init(&hctx->hctx_list); kobject_put(&hctx->kobj); } @@ -2735,9 +2737,22 @@ static struct blk_mq_hw_ctx *blk_mq_alloc_and_init_hctx( struct blk_mq_tag_set *set, struct request_queue *q, int hctx_idx, int node) { - struct blk_mq_hw_ctx *hctx; + struct blk_mq_hw_ctx *hctx = NULL, *tmp; + + /* reuse dead hctx first */ + spin_lock(&q->dead_hctx_lock); + list_for_each_entry(tmp, &q->dead_hctx_list, hctx_list) { + if (tmp->numa_node == node) { + hctx = tmp; + break; + } + } + if (hctx) + list_del_init(&hctx->hctx_list); + spin_unlock(&q->dead_hctx_lock); - hctx = blk_mq_alloc_hctx(q, set, hctx_idx, node); + if (!hctx) + hctx = blk_mq_alloc_hctx(q, set, hctx_idx, node); if (!hctx) goto fail; @@ -2775,10 +2790,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, hctx = blk_mq_alloc_and_init_hctx(set, q, i, node); if (hctx) { - if (hctxs[i]) { + if (hctxs[i]) blk_mq_exit_hctx(q, set, hctxs[i], i); - kobject_put(&hctxs[i]->kobj); - } hctxs[i] = hctx; } else { if (hctxs[i]) @@ -2809,9 +2822,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, if (hctx->tags) blk_mq_free_map_and_requests(set, j); blk_mq_exit_hctx(q, set, hctx, j); - kobject_put(&hctx->kobj); hctxs[j] = NULL; - } } mutex_unlock(&q->sysfs_lock); @@ -2854,6 +2865,9 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, if (!q->queue_hw_ctx) goto err_sys_init; + INIT_LIST_HEAD(&q->dead_hctx_list); + spin_lock_init(&q->dead_hctx_lock); + blk_mq_realloc_hw_ctxs(set, q); if (!q->nr_hw_queues) goto err_hctxs; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index cb2aa7ecafff..a44c3f95dcc1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -70,6 +70,8 @@ struct blk_mq_hw_ctx { struct dentry *sched_debugfs_dir; #endif + struct list_head hctx_list; + /* Must be the last member - see also blk_mq_hw_ctx_size(). */ struct srcu_struct srcu[0]; }; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 4b85dc066264..1325f941f0be 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -535,6 +535,13 @@ struct request_queue { struct mutex sysfs_lock; + /* + * for reusing dead hctx instance in case of updating + * nr_hw_queues + */ + struct list_head dead_hctx_list; + spinlock_t dead_hctx_lock; + atomic_t mq_freeze_depth; #if defined(CONFIG_BLK_DEV_BSG) From patchwork Fri Apr 12 03:30:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897201 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A92E51669 for ; Fri, 12 Apr 2019 03:31:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D3E728E20 for ; Fri, 12 Apr 2019 03:31:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 821FA28E23; Fri, 12 Apr 2019 03:31:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 239B828E20 for ; Fri, 12 Apr 2019 03:31:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726678AbfDLDbT (ORCPT ); Thu, 11 Apr 2019 23:31:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37398 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726801AbfDLDbT (ORCPT ); Thu, 11 Apr 2019 23:31:19 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2D9A13088A69; Fri, 12 Apr 2019 03:31:19 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id A47165C21E; Fri, 12 Apr 2019 03:31:15 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 7/9] blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release Date: Fri, 12 Apr 2019 11:30:30 +0800 Message-Id: <20190412033032.10418-8-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Fri, 12 Apr 2019 03:31:19 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP hctx is always released after requeue is freed. With holding queue's kobject refcount, it is safe for driver to run queue, so one run queue might be scheduled after blk_sync_queue() is done. So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release() for avoiding run released queue. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-core.c | 8 -------- block/blk-mq-sysfs.c | 2 ++ 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 20298aa5a77c..ad17e999f79e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -232,14 +232,6 @@ void blk_sync_queue(struct request_queue *q) { del_timer_sync(&q->timeout); cancel_work_sync(&q->timeout_work); - - if (queue_is_mq(q)) { - struct blk_mq_hw_ctx *hctx; - int i; - - queue_for_each_hw_ctx(q, hctx, i) - cancel_delayed_work_sync(&hctx->run_work); - } } EXPORT_SYMBOL(blk_sync_queue); diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 4040e62c3737..25c0d0a6a556 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -35,6 +35,8 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj) struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj); + cancel_delayed_work_sync(&hctx->run_work); + if (hctx->flags & BLK_MQ_F_BLOCKING) cleanup_srcu_struct(hctx->srcu); blk_free_flush_queue(hctx->fq); From patchwork Fri Apr 12 03:30:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897205 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4C47922 for ; Fri, 12 Apr 2019 03:31:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A778B28E20 for ; Fri, 12 Apr 2019 03:31:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9BD6728E23; Fri, 12 Apr 2019 03:31:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 42E8328E20 for ; Fri, 12 Apr 2019 03:31:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726872AbfDLDbW (ORCPT ); Thu, 11 Apr 2019 23:31:22 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52874 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726731AbfDLDbW (ORCPT ); Thu, 11 Apr 2019 23:31:22 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 49F20308FC4D; Fri, 12 Apr 2019 03:31:22 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7C6E35D6A9; Fri, 12 Apr 2019 03:31:21 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 8/9] block: don't drain in-progress dispatch in blk_cleanup_queue() Date: Fri, 12 Apr 2019 11:30:31 +0800 Message-Id: <20190412033032.10418-9-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Fri, 12 Apr 2019 03:31:22 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now freeing hw queue resource is moved to hctx's release handler, we don't need to worry about the race between blk_cleanup_queue and run queue any more. So don't drain in-progress dispatch in blk_cleanup_queue(). This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before freeing queue"). Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-core.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index ad17e999f79e..82b630da7e2c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -338,18 +338,6 @@ void blk_cleanup_queue(struct request_queue *q) blk_queue_flag_set(QUEUE_FLAG_DEAD, q); - /* - * make sure all in-progress dispatch are completed because - * blk_freeze_queue() can only complete all requests, and - * dispatch may still be in-progress since we dispatch requests - * from more than one contexts. - * - * We rely on driver to deal with the race in case that queue - * initialization isn't done. - */ - if (queue_is_mq(q) && blk_queue_init_done(q)) - blk_mq_quiesce_queue(q); - /* for synchronous bio-based driver finish in-flight integrity i/o */ blk_flush_integrity(); From patchwork Fri Apr 12 03:30:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 10897211 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 308E8922 for ; Fri, 12 Apr 2019 03:31:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0FA1928E20 for ; Fri, 12 Apr 2019 03:31:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0420628E23; Fri, 12 Apr 2019 03:31:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 87CDA28E20 for ; Fri, 12 Apr 2019 03:31:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726877AbfDLDb0 (ORCPT ); Thu, 11 Apr 2019 23:31:26 -0400 Received: from mx1.redhat.com ([209.132.183.28]:38260 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726731AbfDLDbZ (ORCPT ); Thu, 11 Apr 2019 23:31:25 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8136387627; Fri, 12 Apr 2019 03:31:25 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id A6D4C5C21E; Fri, 12 Apr 2019 03:31:24 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Dongli Zhang , James Smart , Bart Van Assche , linux-scsi@vger.kernel.org, "Martin K . Petersen" , Christoph Hellwig , "James E . J . Bottomley" , jianchao wang Subject: [PATCH V5 9/9] SCSI: don't hold device refcount in IO path Date: Fri, 12 Apr 2019 11:30:32 +0800 Message-Id: <20190412033032.10418-10-ming.lei@redhat.com> In-Reply-To: <20190412033032.10418-1-ming.lei@redhat.com> References: <20190412033032.10418-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Fri, 12 Apr 2019 03:31:25 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP scsi_device's refcount is always grabbed in IO path. Turns out it isn't necessary, because blk_queue_cleanup() will drain any in-flight IOs, then cancel timeout/requeue work, and SCSI's requeue_work is canceled too in __scsi_remove_device(). Also scsi_device won't go away until blk_cleanup_queue() is done. So don't hold the refcount in IO path, especially the refcount isn't required in IO path since blk_queue_enter() / blk_queue_exit() is introduced in the legacy block layer. Cc: Dongli Zhang Cc: James Smart Cc: Bart Van Assche Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen , Cc: Christoph Hellwig , Cc: James E . J . Bottomley , Cc: jianchao wang Reviewed-by: Bart Van Assche Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke Acked-by: Martin K. Petersen --- drivers/scsi/scsi_lib.c | 28 ++-------------------------- 1 file changed, 2 insertions(+), 26 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 601b9f1de267..3369d66911eb 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -141,8 +141,6 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason) static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd) { - struct scsi_device *sdev = cmd->device; - if (cmd->request->rq_flags & RQF_DONTPREP) { cmd->request->rq_flags &= ~RQF_DONTPREP; scsi_mq_uninit_cmd(cmd); @@ -150,7 +148,6 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd) WARN_ON_ONCE(true); } blk_mq_requeue_request(cmd->request, true); - put_device(&sdev->sdev_gendev); } /** @@ -189,19 +186,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy) */ cmd->result = 0; - /* - * Before a SCSI command is dispatched, - * get_device(&sdev->sdev_gendev) is called and the host, - * target and device busy counters are increased. Since - * requeuing a request causes these actions to be repeated and - * since scsi_device_unbusy() has already been called, - * put_device(&device->sdev_gendev) must still be called. Call - * put_device() after blk_mq_requeue_request() to avoid that - * removal of the SCSI device can start before requeueing has - * happened. - */ blk_mq_requeue_request(cmd->request, true); - put_device(&device->sdev_gendev); } /* @@ -619,7 +604,6 @@ static bool scsi_end_request(struct request *req, blk_status_t error, blk_mq_run_hw_queues(q, true); percpu_ref_put(&q->q_usage_counter); - put_device(&sdev->sdev_gendev); return false; } @@ -1613,7 +1597,6 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx) struct scsi_device *sdev = q->queuedata; atomic_dec(&sdev->device_busy); - put_device(&sdev->sdev_gendev); } static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) @@ -1621,16 +1604,9 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) struct request_queue *q = hctx->queue; struct scsi_device *sdev = q->queuedata; - if (!get_device(&sdev->sdev_gendev)) - goto out; - if (!scsi_dev_queue_ready(q, sdev)) - goto out_put_device; - - return true; + if (scsi_dev_queue_ready(q, sdev)) + return true; -out_put_device: - put_device(&sdev->sdev_gendev); -out: if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); return false;