From patchwork Wed Apr 21 00:02:31 2021
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Daniel Wagner,
    "Martin K. Petersen", Khazhismel Kumykov, Shin'ichiro Kawasaki,
    Ming Lei, Hannes Reinecke, Johannes Thumshirn, John Garry
Subject: [PATCH v7 1/5] blk-mq: Move the elevator_exit() definition
Date: Tue, 20 Apr 2021 17:02:31 -0700
Message-Id: <20210421000235.2028-2-bvanassche@acm.org>
In-Reply-To: <20210421000235.2028-1-bvanassche@acm.org>
References: <20210421000235.2028-1-bvanassche@acm.org>
X-Mailing-List: linux-block@vger.kernel.org

Since elevator_exit() has only one caller, move its definition from
block/blk.h into block/elevator.c. Remove the inline keyword since modern
compilers are smart enough to decide when to inline functions that occur
in the same compilation unit.

Reviewed-by: Christoph Hellwig
Acked-by: Martin K. Petersen
Reviewed-by: Khazhismel Kumykov
Reviewed-by: Daniel Wagner
Tested-by: Shin'ichiro Kawasaki
Cc: Ming Lei
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Khazhy Kumykov
Signed-off-by: Bart Van Assche
---
 block/blk.h      | 9 ---------
 block/elevator.c | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 8b3591aee0a5..529233957207 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -199,15 +199,6 @@ void __elevator_exit(struct request_queue *, struct elevator_queue *);
 int elv_register_queue(struct request_queue *q, bool uevent);
 void elv_unregister_queue(struct request_queue *q);
 
-static inline void elevator_exit(struct request_queue *q,
-		struct elevator_queue *e)
-{
-	lockdep_assert_held(&q->sysfs_lock);
-
-	blk_mq_sched_free_requests(q);
-	__elevator_exit(q, e);
-}
-
 ssize_t part_size_show(struct device *dev, struct device_attribute *attr,
 		char *buf);
 ssize_t part_stat_show(struct device *dev, struct device_attribute *attr,
diff --git a/block/elevator.c b/block/elevator.c
index 440699c28119..7c486ce858e0 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -197,6 +197,14 @@ void __elevator_exit(struct request_queue *q, struct elevator_queue *e)
 	kobject_put(&e->kobj);
 }
 
+static void elevator_exit(struct request_queue *q, struct elevator_queue *e)
+{
+	lockdep_assert_held(&q->sysfs_lock);
+
+	blk_mq_sched_free_requests(q);
+	__elevator_exit(q, e);
+}
+
 static inline void __elv_rqhash_del(struct request *rq)
 {
 	hash_del(&rq->hash);

From patchwork Wed Apr 21 00:02:32 2021
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Daniel Wagner,
    "Martin K. Petersen", Khazhismel Kumykov, Shin'ichiro Kawasaki,
    Ming Lei, Hannes Reinecke, Johannes Thumshirn, John Garry
Subject: [PATCH v7 2/5] blk-mq: Introduce atomic variants of blk_mq_(all_tag|tagset_busy)_iter
Date: Tue, 20 Apr 2021 17:02:32 -0700
Message-Id: <20210421000235.2028-3-bvanassche@acm.org>
In-Reply-To: <20210421000235.2028-1-bvanassche@acm.org>
References: <20210421000235.2028-1-bvanassche@acm.org>

Since the next patch needs to know whether or not sleeping is allowed
inside the tag iteration functions, pass this context information to the
tag iteration functions. I have reviewed all callers of tag iteration
functions to verify these annotations by starting from the output of the
following grep command:

    git grep -nHE 'blk_mq_(all_tag|tagset_busy)_iter'

My conclusions from that analysis are as follows:
- Sleeping is allowed in the blk-mq-debugfs code that iterates over tags.
- The nbd_clear_req() callback that is passed to blk_mq_tagset_busy_iter()
  by the nbd driver may sleep.
- Since the blk_mq_tagset_busy_iter() calls in the mtip32xx driver are
  preceded by a function that sleeps (blk_mq_quiesce_queue()), sleeping is
  safe in the context of the blk_mq_tagset_busy_iter() calls.
- The same reasoning also applies to the nbd driver.
- All blk_mq_tagset_busy_iter() calls in the NVMe drivers are followed by a
  call to a function that sleeps, so sleeping inside
  blk_mq_tagset_busy_iter() when called from the NVMe driver is fine.
- scsi_host_busy(), scsi_host_complete_all_commands() and
  scsi_host_busy_iter() are used by multiple SCSI LLDs, so analyzing
  whether or not these functions may sleep is hard.
Instead of performing that analysis, make it safe to call these functions
from atomic context.

Reviewed-by: Christoph Hellwig
Acked-by: Martin K. Petersen
Reviewed-by: Khazhismel Kumykov
Reviewed-by: Daniel Wagner
Tested-by: Shin'ichiro Kawasaki
Cc: Ming Lei
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Khazhy Kumykov
Signed-off-by: Bart Van Assche
---
 block/blk-mq-tag.c        | 38 +++++++++++++++++++++++++++++++++-----
 block/blk-mq-tag.h        |  2 +-
 block/blk-mq.c            |  2 +-
 drivers/scsi/hosts.c      | 16 ++++++++--------
 drivers/scsi/ufs/ufshcd.c |  4 ++--
 include/linux/blk-mq.h    |  2 ++
 6 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..d8eaa38a1bd1 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -322,18 +322,19 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
 }
 
 /**
- * blk_mq_all_tag_iter - iterate over all requests in a tag map
+ * blk_mq_all_tag_iter_atomic - iterate over all requests in a tag map
  * @tags:	Tag map to iterate over.
  * @fn:		Pointer to the function that will be called for each
  *		request. @fn will be called as follows: @fn(rq, @priv,
  *		reserved) where rq is a pointer to a request. 'reserved'
  *		indicates whether or not @rq is a reserved request. Return
- *		true to continue iterating tags, false to stop.
+ *		true to continue iterating tags, false to stop. Must not
+ *		sleep.
  * @priv:	Will be passed as second argument to @fn.
  *
- * Caller has to pass the tag map from which requests are allocated.
+ * Does not sleep.
  */
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 		void *priv)
 {
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
@@ -348,6 +349,8 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * May sleep.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
@@ -362,6 +365,31 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+/**
+ * blk_mq_tagset_busy_iter_atomic - iterate over all started requests in a tag set
+ * @tagset:	Tag set to iterate over.
+ * @fn:		Pointer to the function that will be called for each started
+ *		request. @fn will be called as follows: @fn(rq, @priv,
+ *		reserved) where rq is a pointer to a request. 'reserved'
+ *		indicates whether or not @rq is a reserved request. Return
+ *		true to continue iterating tags, false to stop. Must not sleep.
+ * @priv:	Will be passed as second argument to @fn.
+ *
+ * Does not sleep.
+ */
+void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
+		busy_tag_iter_fn *fn, void *priv)
+{
+	int i;
+
+	for (i = 0; i < tagset->nr_hw_queues; i++) {
+		if (tagset->tags && tagset->tags[i])
+			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
+					      BT_TAG_ITER_STARTED);
+	}
+}
+EXPORT_SYMBOL(blk_mq_tagset_busy_iter_atomic);
+
 static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
 		void *data, bool reserved)
 {
@@ -384,7 +412,7 @@ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
 	while (true) {
 		unsigned count = 0;
 
-		blk_mq_tagset_busy_iter(tagset,
+		blk_mq_tagset_busy_iter_atomic(tagset,
 				blk_mq_tagset_count_completed_rqs, &count);
 		if (!count)
 			break;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..0290c308ece9 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -43,7 +43,7 @@ extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
-void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 		void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 927189a55575..79c01b1f885c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2484,7 +2484,7 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
 		.hctx	= hctx,
 	};
 
-	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
+	blk_mq_all_tag_iter_atomic(tags, blk_mq_has_request, &data);
 
 	return data.has_rq;
 }
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 697c09ef259b..f8863aa88642 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -573,8 +573,8 @@ int scsi_host_busy(struct Scsi_Host *shost)
 {
 	int cnt = 0;
 
-	blk_mq_tagset_busy_iter(&shost->tag_set,
-				scsi_host_check_in_flight, &cnt);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set,
+				       scsi_host_check_in_flight, &cnt);
 	return cnt;
 }
 EXPORT_SYMBOL(scsi_host_busy);
@@ -672,8 +672,8 @@ static bool complete_all_cmds_iter(struct request *rq, void *data, bool rsvd)
  */
 void scsi_host_complete_all_commands(struct Scsi_Host *shost, int status)
 {
-	blk_mq_tagset_busy_iter(&shost->tag_set, complete_all_cmds_iter,
-				&status);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set, complete_all_cmds_iter,
+				       &status);
 }
 EXPORT_SYMBOL_GPL(scsi_host_complete_all_commands);
@@ -694,11 +694,11 @@ static bool __scsi_host_busy_iter_fn(struct request *req, void *priv,
 /**
  * scsi_host_busy_iter - Iterate over all busy commands
  * @shost:	Pointer to Scsi_Host.
- * @fn:		Function to call on each busy command
+ * @fn:		Function to call on each busy command. Must not sleep.
  * @priv:	Data pointer passed to @fn
  *
  * If locking against concurrent command completions is required
- * ithas to be provided by the caller
+ * it has to be provided by the caller.
 **/
 void scsi_host_busy_iter(struct Scsi_Host *shost,
 			 bool (*fn)(struct scsi_cmnd *, void *, bool),
@@ -709,7 +709,7 @@ void scsi_host_busy_iter(struct Scsi_Host *shost,
 		.priv = priv,
 	};
 
-	blk_mq_tagset_busy_iter(&shost->tag_set, __scsi_host_busy_iter_fn,
-				&iter_data);
+	blk_mq_tagset_busy_iter_atomic(&shost->tag_set,
+				       __scsi_host_busy_iter_fn, &iter_data);
 }
 EXPORT_SYMBOL_GPL(scsi_host_busy_iter);
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d3d05e997c13..d6975c501e44 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1380,7 +1380,7 @@ static bool ufshcd_any_tag_in_use(struct ufs_hba *hba)
 	struct request_queue *q = hba->cmd_queue;
 	int busy = 0;
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_is_busy, &busy);
+	blk_mq_tagset_busy_iter_atomic(q->tag_set, ufshcd_is_busy, &busy);
 	return busy;
 }
@@ -6269,7 +6269,7 @@ static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
 	};
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	blk_mq_tagset_busy_iter_atomic(q->tag_set, ufshcd_compl_tm, &ci);
 
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2c473c9b8990..dfa0114a49fd 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -526,6 +526,8 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
+void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
+		busy_tag_iter_fn *fn, void *priv);
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);

From patchwork Wed Apr 21 00:02:33 2021
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Daniel Wagner,
    Khazhismel Kumykov, Shin'ichiro Kawasaki, "Martin K. Petersen",
    Ming Lei, Hannes Reinecke, Johannes Thumshirn, John Garry
Subject: [PATCH v7 3/5] blk-mq: Fix races between iterating over requests and freeing requests
Date: Tue, 20 Apr 2021 17:02:33 -0700
Message-Id: <20210421000235.2028-4-bvanassche@acm.org>
In-Reply-To: <20210421000235.2028-1-bvanassche@acm.org>
References: <20210421000235.2028-1-bvanassche@acm.org>

When multiple request queues share a tag set and when switching the I/O
scheduler for one of the request queues associated with that tag set, the
following race can happen:
* blk_mq_tagset_busy_iter() calls bt_tags_iter() and bt_tags_iter() assigns
  a pointer to a scheduler request to the local variable 'rq'.
* blk_mq_sched_free_requests() is called to free hctx->sched_tags.
* blk_mq_tagset_busy_iter() dereferences 'rq' and triggers a use-after-free.

Fix this race as follows:
* Use rcu_assign_pointer() and rcu_dereference() to access
  hctx->tags->rqs[]. The performance impact of the assignments added to the
  hot path is minimal (about 1% according to one particular test).
* Protect hctx->tags->rqs[] reads with an RCU read-side lock or with a
  semaphore. Which mechanism is used depends on whether or not it is
  allowed to sleep and also on whether or not the callback function may
  sleep.
* Wait for all preexisting readers to finish before freeing scheduler
  requests.

Another race is as follows:
* blk_mq_sched_free_requests() is called to free hctx->sched_tags.
* blk_mq_queue_tag_busy_iter() iterates over the same tag set but for
  another request queue than the queue for which scheduler tags are being
  freed.
* bt_iter() examines rq->q after *rq has been freed.

Fix this race by protecting the rq->q read in bt_iter() with an RCU read
lock and by calling synchronize_rcu() before freeing scheduler tags.
Multiple users have reported use-after-free complaints similar to the following (from https://lore.kernel.org/linux-block/1545261885.185366.488.camel@acm.org/ ): BUG: KASAN: use-after-free in bt_iter+0x86/0xf0 Read of size 8 at addr ffff88803b335240 by task fio/21412 CPU: 0 PID: 21412 Comm: fio Tainted: G W 4.20.0-rc6-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_address_description+0x71/0x239 kasan_report.cold.5+0x242/0x301 __asan_load8+0x54/0x90 bt_iter+0x86/0xf0 blk_mq_queue_tag_busy_iter+0x373/0x5e0 blk_mq_in_flight+0x96/0xb0 part_in_flight+0x40/0x140 part_round_stats+0x18e/0x370 blk_account_io_start+0x3d7/0x670 blk_mq_bio_to_request+0x19c/0x3a0 blk_mq_make_request+0x7a9/0xcb0 generic_make_request+0x41d/0x960 submit_bio+0x9b/0x250 do_blockdev_direct_IO+0x435c/0x4c70 __blockdev_direct_IO+0x79/0x88 ext4_direct_IO+0x46c/0xc00 generic_file_direct_write+0x119/0x210 __generic_file_write_iter+0x11c/0x280 ext4_file_write_iter+0x1b8/0x6f0 aio_write+0x204/0x310 io_submit_one+0x9d3/0xe80 __x64_sys_io_submit+0x115/0x340 do_syscall_64+0x71/0x210 Reviewed-by: Khazhismel Kumykov Tested-by: Shin'ichiro Kawasaki Cc: Christoph Hellwig Cc: Martin K. 
Petersen Cc: Shin'ichiro Kawasaki Cc: Ming Lei Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: John Garry Cc: Khazhy Kumykov Signed-off-by: Bart Van Assche --- block/blk-core.c | 34 ++++++++++++++++++++++++++++++- block/blk-mq-tag.c | 51 ++++++++++++++++++++++++++++++++++++++++------ block/blk-mq-tag.h | 4 +++- block/blk-mq.c | 21 +++++++++++++++---- block/blk-mq.h | 1 + block/blk.h | 2 ++ block/elevator.c | 1 + 7 files changed, 102 insertions(+), 12 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 9bcdae93f6d4..ca7f833e25a8 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -279,6 +279,36 @@ void blk_dump_rq_flags(struct request *rq, char *msg) } EXPORT_SYMBOL(blk_dump_rq_flags); +/** + * blk_mq_wait_for_tag_iter - wait for preexisting tag iteration functions to finish + * @set: Tag set to wait on. + * + * Waits for preexisting calls of blk_mq_all_tag_iter(), + * blk_mq_tagset_busy_iter() and also for their atomic variants to finish + * accessing hctx->tags->rqs[]. New readers may start while this function is + * in progress or after this function has returned. Use this function to make + * sure that hctx->tags->rqs[] changes have become globally visible. + * + * Waits for preexisting blk_mq_queue_tag_busy_iter(q, fn, priv) calls to + * finish accessing requests associated with other request queues than 'q'. + */ +void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set) +{ + struct blk_mq_tags *tags; + int i; + + if (set->tags) { + for (i = 0; i < set->nr_hw_queues; i++) { + tags = set->tags[i]; + if (!tags) + continue; + down_write(&tags->iter_rwsem); + up_write(&tags->iter_rwsem); + } + } + synchronize_rcu(); +} + /** * blk_sync_queue - cancel any pending callbacks on a queue * @q: the queue @@ -412,8 +442,10 @@ void blk_cleanup_queue(struct request_queue *q) * it is safe to free requests now. 
*/ mutex_lock(&q->sysfs_lock); - if (q->elevator) + if (q->elevator) { + blk_mq_wait_for_tag_iter(q->tag_set); blk_mq_sched_free_requests(q); + } mutex_unlock(&q->sysfs_lock); percpu_ref_exit(&q->q_usage_counter); diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index d8eaa38a1bd1..39d5c9190a6b 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -209,14 +209,24 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) if (!reserved) bitnr += tags->nr_reserved_tags; - rq = tags->rqs[bitnr]; - + rcu_read_lock(); + /* + * The request 'rq' points at is protected by an RCU read lock until + * its queue pointer has been verified and by q_usage_count while the + * callback function is being invoked. See also the + * percpu_ref_tryget() and blk_queue_exit() calls in + * blk_mq_queue_tag_busy_iter(). + */ + rq = rcu_dereference(tags->rqs[bitnr]); /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assigning ->rqs[]. */ - if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx) + if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx) { + rcu_read_unlock(); return iter_data->fn(hctx, rq, iter_data->data, reserved); + } + rcu_read_unlock(); return true; } @@ -254,11 +264,17 @@ struct bt_tags_iter_data { unsigned int flags; }; +/* Include reserved tags. */ #define BT_TAG_ITER_RESERVED (1 << 0) +/* Only include started requests. */ #define BT_TAG_ITER_STARTED (1 << 1) +/* Iterate over tags->static_rqs[] instead of tags->rqs[]. */ #define BT_TAG_ITER_STATIC_RQS (1 << 2) +/* The callback function may sleep. 
*/ +#define BT_TAG_ITER_MAY_SLEEP (1 << 3) -static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) +static bool __bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, + void *data) { struct bt_tags_iter_data *iter_data = data; struct blk_mq_tags *tags = iter_data->tags; @@ -275,7 +291,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) if (iter_data->flags & BT_TAG_ITER_STATIC_RQS) rq = tags->static_rqs[bitnr]; else - rq = tags->rqs[bitnr]; + rq = rcu_dereference_check(tags->rqs[bitnr], + lockdep_is_held(&tags->iter_rwsem)); if (!rq) return true; if ((iter_data->flags & BT_TAG_ITER_STARTED) && @@ -284,6 +301,25 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) return iter_data->fn(rq, iter_data->data, reserved); } +static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) +{ + struct bt_tags_iter_data *iter_data = data; + struct blk_mq_tags *tags = iter_data->tags; + bool res; + + if (iter_data->flags & BT_TAG_ITER_MAY_SLEEP) { + down_read(&tags->iter_rwsem); + res = __bt_tags_iter(bitmap, bitnr, data); + up_read(&tags->iter_rwsem); + } else { + rcu_read_lock(); + res = __bt_tags_iter(bitmap, bitnr, data); + rcu_read_unlock(); + } + + return res; +} + /** * bt_tags_for_each - iterate over the requests in a tag map * @tags: Tag map to iterate over. 
@@ -357,10 +393,12 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset, { int i; + might_sleep(); + for (i = 0; i < tagset->nr_hw_queues; i++) { if (tagset->tags && tagset->tags[i]) __blk_mq_all_tag_iter(tagset->tags[i], fn, priv, - BT_TAG_ITER_STARTED); + BT_TAG_ITER_STARTED | BT_TAG_ITER_MAY_SLEEP); } } EXPORT_SYMBOL(blk_mq_tagset_busy_iter); @@ -544,6 +582,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, tags->nr_tags = total_tags; tags->nr_reserved_tags = reserved_tags; + init_rwsem(&tags->iter_rwsem); if (blk_mq_is_sbitmap_shared(flags)) return tags; diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index 0290c308ece9..d1d73d7cc7df 100644 --- a/block/blk-mq-tag.h +++ b/block/blk-mq-tag.h @@ -17,9 +17,11 @@ struct blk_mq_tags { struct sbitmap_queue __bitmap_tags; struct sbitmap_queue __breserved_tags; - struct request **rqs; + struct request __rcu **rqs; struct request **static_rqs; struct list_head page_list; + + struct rw_semaphore iter_rwsem; }; extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, diff --git a/block/blk-mq.c b/block/blk-mq.c index 79c01b1f885c..8b59f6b4ec8e 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -496,8 +496,10 @@ static void __blk_mq_free_request(struct request *rq) blk_crypto_free_request(rq); blk_pm_mark_last_busy(rq); rq->mq_hctx = NULL; - if (rq->tag != BLK_MQ_NO_TAG) + if (rq->tag != BLK_MQ_NO_TAG) { blk_mq_put_tag(hctx->tags, ctx, rq->tag); + rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL); + } if (sched_tag != BLK_MQ_NO_TAG) blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag); blk_mq_sched_restart(hctx); @@ -839,9 +841,20 @@ EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list); struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) { + struct request *rq; + if (tag < tags->nr_tags) { - prefetch(tags->rqs[tag]); - return tags->rqs[tag]; + /* + * Freeing tags happens with the request queue frozen so the + * rcu dereference below is protected by the request queue + * 
usage count. We can only verify that usage count after + * having read the request pointer. + */ + rq = rcu_dereference_check(tags->rqs[tag], true); + WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && rq && + percpu_ref_is_zero(&rq->q->q_usage_counter)); + prefetch(rq); + return rq; } return NULL; @@ -1112,7 +1125,7 @@ static bool blk_mq_get_driver_tag(struct request *rq) rq->rq_flags |= RQF_MQ_INFLIGHT; __blk_mq_inc_active_requests(hctx); } - hctx->tags->rqs[rq->tag] = rq; + rcu_assign_pointer(hctx->tags->rqs[rq->tag], rq); return true; } diff --git a/block/blk-mq.h b/block/blk-mq.h index 3616453ca28c..9ccb1818303b 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -226,6 +226,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq) { blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag); + rcu_assign_pointer(hctx->tags->rqs[rq->tag], NULL); rq->tag = BLK_MQ_NO_TAG; if (rq->rq_flags & RQF_MQ_INFLIGHT) { diff --git a/block/blk.h b/block/blk.h index 529233957207..d88b0823738c 100644 --- a/block/blk.h +++ b/block/blk.h @@ -185,6 +185,8 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list, void blk_account_io_start(struct request *req); void blk_account_io_done(struct request *req, u64 now); +void blk_mq_wait_for_tag_iter(struct blk_mq_tag_set *set); + /* * Internal elevator interface */ diff --git a/block/elevator.c b/block/elevator.c index 7c486ce858e0..aae9cff6d5ae 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -201,6 +201,7 @@ static void elevator_exit(struct request_queue *q, struct elevator_queue *e) { lockdep_assert_held(&q->sysfs_lock); + blk_mq_wait_for_tag_iter(q->tag_set); blk_mq_sched_free_requests(q); __elevator_exit(q, e); } From patchwork Wed Apr 21 00:02:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bart Van Assche X-Patchwork-Id: 12215173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B023EC433B4 for ; Wed, 21 Apr 2021 00:02:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7D97B6141C for ; Wed, 21 Apr 2021 00:02:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233992AbhDUADV (ORCPT ); Tue, 20 Apr 2021 20:03:21 -0400 Received: from mail-pl1-f169.google.com ([209.85.214.169]:34454 "EHLO mail-pl1-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234453AbhDUADU (ORCPT ); Tue, 20 Apr 2021 20:03:20 -0400 Received: by mail-pl1-f169.google.com with SMTP id t22so20271234ply.1 for ; Tue, 20 Apr 2021 17:02:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1Di2CfCL8byMqwy8hyI2sa5jc4+9OFqr/SRk2zQLdeM=; b=Be//+5Kzgfyhyek2MfNdOeOR885lbyMu40idi4VddoPjTBpPBOJdywAYU0ceeKGxvn abUStz+eicnIFLi0DPt3mJK41HiKHIsQT0igTfNw4oUvOLfl+ARhE+zs+/UF9w9JpXWF CoRvhtcvdjYanvetHkDtx+tMpoXHHAzrudgcZypcBW+8BfYFf+gZ5MUO3lJTb4gm7MwN oPoJPlsfaCacJAABgKDf3pOURqxlUxv3Cl0I9yu4FU70rqUylcDd4oa7QH9fiHltXJS7 unT8hfXmI1xJAKEf0mjFbrOZIEEQTqMT1bAska+9Vd5tKRd/klnsrbA2ipvjGVIIvCBm WzcA== X-Gm-Message-State: AOAM5315ZtlPXyUEeM9DmkRkLqMYIdloPkcsfW53h+VWiXuGQIdOiUGJ Im2xKu5+pMWqwK0MnfvFlXc= X-Google-Smtp-Source: ABdhPJxfyTo7XojfssKUK2iHKCKuXVlqbyWus86u58yQlhdaawkPQnY1WRA/NH9dcKcZ2UYo+n5O5A== X-Received: by 2002:a17:902:c745:b029:eb:6fc0:39e7 with SMTP id 
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Daniel Wagner,
 Bart Van Assche, Khazhismel Kumykov, Shin'ichiro Kawasaki,
 "Martin K. Petersen", Ming Lei, Hannes Reinecke, Johannes Thumshirn,
 John Garry
Subject: [PATCH v7 4/5] blk-mq: Make it safe to use RCU to iterate over
 blk_mq_tag_set.tag_list
Date: Tue, 20 Apr 2021 17:02:34 -0700
Message-Id: <20210421000235.2028-5-bvanassche@acm.org>
In-Reply-To: <20210421000235.2028-1-bvanassche@acm.org>
References: <20210421000235.2028-1-bvanassche@acm.org>
X-Mailing-List: linux-block@vger.kernel.org

Since the next patch in this series will use RCU to iterate over
tag_list, make this safe.

Note: call_rcu() is already used to free the request queue. From
blk-sysfs.c:

	call_rcu(&q->rcu_head, blk_free_queue_rcu);

See also:
* Commit 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate
  over blk_mq_tag_set.tag_list"; v4.12).
* Commit 08c875cbf481 ("block: Use non _rcu version of list functions
  for tag_set_list"; v5.9).

Reviewed-by: Khazhismel Kumykov
Tested-by: Shin'ichiro Kawasaki
Cc: Christoph Hellwig
Cc: Martin K. Petersen
Cc: Shin'ichiro Kawasaki
Cc: Ming Lei
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Daniel Wagner
Signed-off-by: Bart Van Assche
---
 block/blk-mq.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8b59f6b4ec8e..7d2ea6357c7d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2947,7 +2947,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	struct blk_mq_tag_set *set = q->tag_set;
 
 	mutex_lock(&set->tag_list_lock);
-	list_del(&q->tag_set_list);
+	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
@@ -2955,7 +2955,11 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_shared(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-	INIT_LIST_HEAD(&q->tag_set_list);
+	/*
+	 * Calling synchronize_rcu() and INIT_LIST_HEAD(&q->tag_set_list) is
+	 * not necessary since blk_mq_del_queue_tag_set() is only called from
+	 * blk_cleanup_queue().
+	 */
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,

From patchwork Wed Apr 21 00:02:35 2021
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 12215177
From: Bart Van Assche
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, Daniel Wagner,
 Bart Van Assche, Khazhy Kumykov, Shin'ichiro Kawasaki,
 "Martin K. Petersen", Ming Lei, Hannes Reinecke, Johannes Thumshirn,
 John Garry
Subject: [PATCH v7 5/5] blk-mq: Fix races between blk_mq_update_nr_hw_queues()
 and iterating over tags
Date: Tue, 20 Apr 2021 17:02:35 -0700
Message-Id: <20210421000235.2028-6-bvanassche@acm.org>
In-Reply-To: <20210421000235.2028-1-bvanassche@acm.org>
References: <20210421000235.2028-1-bvanassche@acm.org>
X-Mailing-List: linux-block@vger.kernel.org

Serialize the tag set modifications performed by
blk_mq_update_nr_hw_queues() and the iteration over a tag set, to
prevent crashes caused by a tag set being examined while it is being
modified.

Reported-by: Khazhy Kumykov
Reviewed-by: Khazhismel Kumykov
Tested-by: Shin'ichiro Kawasaki
Cc: Christoph Hellwig
Cc: Martin K. Petersen
Cc: Shin'ichiro Kawasaki
Cc: Ming Lei
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Khazhy Kumykov
Signed-off-by: Bart Van Assche
---
 block/blk-mq-tag.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 39d5c9190a6b..b0e0f074a864 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -376,6 +376,31 @@ void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
 }
 
+/*
+ * Iterate over all request queues in a tag set, find the first queue with a
+ * non-zero usage count, increment its usage count and return the pointer to
+ * that request queue. This prevents that blk_mq_update_nr_hw_queues() will
+ * modify @set->nr_hw_queues while iterating over tags since
+ * blk_mq_update_nr_hw_queues() only modifies @set->nr_hw_queues while the
+ * usage count of all queues associated with a tag set is zero.
+ */
+static struct request_queue *
+blk_mq_get_any_tagset_queue(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(q, &set->tag_list, tag_set_list) {
+		if (percpu_ref_tryget(&q->q_usage_counter)) {
+			rcu_read_unlock();
+			return q;
+		}
+	}
+	rcu_read_unlock();
+
+	return NULL;
+}
+
 /**
  * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
  * @tagset:	Tag set to iterate over.
@@ -391,15 +416,22 @@ void blk_mq_all_tag_iter_atomic(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
+	struct request_queue *q;
 	int i;
 
 	might_sleep();
 
+	q = blk_mq_get_any_tagset_queue(tagset);
+	if (!q)
+		return;
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
 				BT_TAG_ITER_STARTED | BT_TAG_ITER_MAY_SLEEP);
 	}
+
+	blk_queue_exit(q);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
@@ -418,13 +450,20 @@ EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 void blk_mq_tagset_busy_iter_atomic(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
+	struct request_queue *q;
 	int i;
 
+	q = blk_mq_get_any_tagset_queue(tagset);
+	if (!q)
+		return;
+
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
 					      BT_TAG_ITER_STARTED);
 	}
+
+	blk_queue_exit(q);
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter_atomic);