From patchwork Wed May 13 03:47:52 2020
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 11544563
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, John Garry, Bart Van Assche, Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, Mike Snitzer, dm-devel@redhat.com, Hannes Reinecke, "Martin K. Petersen"
Subject: [PATCH V11 01/12] block: clone nr_integrity_segments and write_hint in blk_rq_prep_clone
Date: Wed, 13 May 2020 11:47:52 +0800
Message-Id: <20200513034803.1844579-2-ming.lei@redhat.com>
In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com>
References: <20200513034803.1844579-1-ming.lei@redhat.com>

So far blk_rq_prep_clone() is only used to set up the underlying cloned request from a dm-rq request. Block integrity can be enabled for both dm-rq and the underlying queues, so it is reasonable to clone the request's nr_integrity_segments as well. write_hint comes from the bio, so it should have been cloned too.

So clone nr_integrity_segments and write_hint in blk_rq_prep_clone.
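For readers following along, here is a minimal sketch of the attribute copy this patch extends. It mirrors the hunk in the diff below; the field names come from the kernel's struct request, but the helper name clone_rq_attrs() is purely illustrative, not the upstream function.

#include <linux/blkdev.h>

/* Illustrative sketch only; the real change lives in blk_rq_prep_clone(). */
static void clone_rq_attrs(struct request *rq, const struct request *rq_src)
{
#ifdef CONFIG_BLK_DEV_INTEGRITY
	/* new in this patch: the integrity segment count follows the clone */
	rq->nr_integrity_segments = rq_src->nr_integrity_segments;
#endif
	/* attributes already copied before this patch */
	rq->nr_phys_segments = rq_src->nr_phys_segments;
	rq->ioprio = rq_src->ioprio;
	/* new in this patch: the bio-derived write hint follows the clone */
	rq->write_hint = rq_src->write_hint;
}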
Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Cc: Mike Snitzer Cc: dm-devel@redhat.com Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Signed-off-by: Ming Lei --- block/blk-core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index cf5b2163edfe..08ee92baa451 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1669,8 +1669,12 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src, rq->rq_flags |= RQF_SPECIAL_PAYLOAD; rq->special_vec = rq_src->special_vec; } +#ifdef CONFIG_BLK_DEV_INTEGRITY + rq->nr_integrity_segments = rq_src->nr_integrity_segments; +#endif rq->nr_phys_segments = rq_src->nr_phys_segments; rq->ioprio = rq_src->ioprio; + rq->write_hint = rq_src->write_hint; return 0; From patchwork Wed May 13 03:47:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544567 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE3DF1668 for ; Wed, 13 May 2020 03:48:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C70D420718 for ; Wed, 13 May 2020 03:48:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="P8RQ4YNS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728651AbgEMDsn (ORCPT ); Tue, 12 May 2020 23:48:43 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:42595 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727107AbgEMDsn (ORCPT ); Tue, 12 May 2020 23:48:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341721; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vIAYDXBt9wwIZZ2W14gLNPcdGmjLM07bSq3AjCxVeVE=; b=P8RQ4YNSmlg/8uHTqxigR1z0Ezzm/4iHomBBgfJ7mczjjSA60P5THET0/5+S14IxWsWqaQ +i6SpPk/RfeBqtdowK/LNw0My0GUGwm0HeANh/+0TDfeluRqizrwTAK7ZNsKPH8DRLE4XE 6PXKb/PCtDCOdBfXJAS+v6Ag/MBVYMM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-226-h5eSeWcaNK2SRraL2O8TkQ-1; Tue, 12 May 2020 23:48:37 -0400 X-MC-Unique: h5eSeWcaNK2SRraL2O8TkQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6028D800687; Wed, 13 May 2020 03:48:36 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id E1BA25D9DD; Wed, 13 May 2020 03:48:27 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner , Mike Snitzer , dm-devel@redhat.com, Hannes Reinecke , "Martin K . 
Petersen" Subject: [PATCH V11 02/12] block: add helper for copying request Date: Wed, 13 May 2020 11:47:53 +0800 Message-Id: <20200513034803.1844579-3-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add one new helper of blk_rq_copy_request() to copy request, and the helper will be used in this patch for re-submitting request, so make it as a block layer internal helper. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Cc: Mike Snitzer Cc: dm-devel@redhat.com Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Signed-off-by: Ming Lei --- block/blk-core.c | 31 ++++++++++++++++++------------- block/blk.h | 2 ++ 2 files changed, 20 insertions(+), 13 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 08ee92baa451..ffb1579fd4da 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1620,6 +1620,23 @@ void blk_rq_unprep_clone(struct request *rq) } EXPORT_SYMBOL_GPL(blk_rq_unprep_clone); +void blk_rq_copy_request(struct request *rq, struct request *rq_src) +{ + /* Copy attributes of the original request to the clone request. */ + rq->__sector = blk_rq_pos(rq_src); + rq->__data_len = blk_rq_bytes(rq_src); + if (rq_src->rq_flags & RQF_SPECIAL_PAYLOAD) { + rq->rq_flags |= RQF_SPECIAL_PAYLOAD; + rq->special_vec = rq_src->special_vec; + } +#ifdef CONFIG_BLK_DEV_INTEGRITY + rq->nr_integrity_segments = rq_src->nr_integrity_segments; +#endif + rq->nr_phys_segments = rq_src->nr_phys_segments; + rq->ioprio = rq_src->ioprio; + rq->write_hint = rq_src->write_hint; +} + /** * blk_rq_prep_clone - Helper function to setup clone request * @rq: the request to be setup @@ -1662,19 +1679,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src, rq->bio = rq->biotail = bio; } - /* Copy attributes of the original request to the clone request. 
*/ - rq->__sector = blk_rq_pos(rq_src); - rq->__data_len = blk_rq_bytes(rq_src); - if (rq_src->rq_flags & RQF_SPECIAL_PAYLOAD) { - rq->rq_flags |= RQF_SPECIAL_PAYLOAD; - rq->special_vec = rq_src->special_vec; - } -#ifdef CONFIG_BLK_DEV_INTEGRITY - rq->nr_integrity_segments = rq_src->nr_integrity_segments; -#endif - rq->nr_phys_segments = rq_src->nr_phys_segments; - rq->ioprio = rq_src->ioprio; - rq->write_hint = rq_src->write_hint; + blk_rq_copy_request(rq, rq_src); return 0; diff --git a/block/blk.h b/block/blk.h index e5cd350ca379..faf616cb0463 100644 --- a/block/blk.h +++ b/block/blk.h @@ -120,6 +120,8 @@ static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio, rq->rq_disk = bio->bi_disk; } +void blk_rq_copy_request(struct request *rq, struct request *rq_src); + #ifdef CONFIG_BLK_DEV_INTEGRITY void blk_flush_integrity(void); bool __bio_integrity_endio(struct bio *); From patchwork Wed May 13 03:47:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544569 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AF65A159A for ; Wed, 13 May 2020 03:48:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9864320718 for ; Wed, 13 May 2020 03:48:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PGWxl/Pr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728654AbgEMDss (ORCPT ); Tue, 12 May 2020 23:48:48 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:60002 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727107AbgEMDsr (ORCPT ); Tue, 12 May 2020 23:48:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341726; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cRN4fvBlbA692u6w3wONDOYbS3q4737Kd7pKANbh94s=; b=PGWxl/Pr3ZftcB2cpSqBgzmop3mXP6HvWR+nS+QbdyjID66khScIRkrdDFYsfEZSkT4Knm /MkBajOoyNKt8GQdJUUAfFgGGeVDquynq0g7TKtPcrZiUecnituGoEme17IJajiXKnPlg0 22fLBNIiOkEjJL5suP/RBAGsfa1INi4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-289-PuIq99YOOdSHPQoXOzc_5w-1; Tue, 12 May 2020 23:48:44 -0400 X-MC-Unique: PuIq99YOOdSHPQoXOzc_5w-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6049C83DE6C; Wed, 13 May 2020 03:48:42 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id AC04175286; Wed, 13 May 2020 03:48:38 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner , John Garry , Hannes Reinecke , "Martin K . 
Petersen" Subject: [PATCH V11 03/12] blk-mq: mark blk_mq_get_driver_tag as static Date: Wed, 13 May 2020 11:47:54 +0800 Message-Id: <20200513034803.1844579-4-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Now all callers of blk_mq_get_driver_tag are in blk-mq.c, so mark it as static. Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Cc: John Garry Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen Signed-off-by: Ming Lei --- block/blk-mq.c | 2 +- block/blk-mq.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 9ee695bdf873..53c6e7678c14 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1015,7 +1015,7 @@ static inline unsigned int queued_to_index(unsigned int queued) return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1); } -bool blk_mq_get_driver_tag(struct request *rq) +static bool blk_mq_get_driver_tag(struct request *rq) { struct blk_mq_alloc_data data = { .q = rq->q, diff --git a/block/blk-mq.h b/block/blk-mq.h index 10bfdfb494fa..e7d1da4b1f73 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -44,7 +44,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool); void blk_mq_add_to_requeue_list(struct request *rq, bool at_head, bool kick_requeue_list); void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list); -bool blk_mq_get_driver_tag(struct request *rq); struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *start); From patchwork Wed May 13 03:47:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544571 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D35561668 for ; Wed, 13 May 2020 03:48:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B697020718 for ; Wed, 13 May 2020 03:48:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hXOa8vVF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727790AbgEMDsy (ORCPT ); Tue, 12 May 2020 23:48:54 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:32763 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727107AbgEMDsy (ORCPT ); Tue, 12 May 2020 23:48:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341731; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xv5DqVnejg416O03SiKWNsVe+d+9iJXyGmze0RRSMnQ=; b=hXOa8vVF8c+TaR9n4UhCPZ76Dd3khpFLjaYPhE1uWtY0nNo2srRawPfpGqWtYUEnRty1pR M8m7tt9DJNxqicpMJEpwQGHCVupfuerbUZsJVA+P5JobWTfi8LEqeZ46W61LKaUDE5U1Uz sA5Igwv4rMPi0neXqzW55RmHzXsKnsM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id 
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Bart Van Assche, Hannes Reinecke, Christoph Hellwig, Thomas Gleixner, John Garry, Hannes Reinecke
Subject: [PATCH V11 04/12] blk-mq: assign rq->tag in blk_mq_get_driver_tag
Date: Wed, 13 May 2020 11:47:55 +0800
Message-Id: <20200513034803.1844579-5-ming.lei@redhat.com>
In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com>
References: <20200513034803.1844579-1-ming.lei@redhat.com>

Especially with the none elevator, rq->tag is assigned right after the request is allocated, so there isn't any way to figure out whether a request is being dispatched. The code path wrt. the driver tag also becomes a bit different between none and an io scheduler.

When one hctx becomes inactive, we have to prevent any request from being dispatched to the LLD, and getting the driver tag provides a perfect chance to do that. Meanwhile we can drain any such requests by checking whether rq->tag is assigned.

So only assign rq->tag when blk_mq_get_driver_tag() is called. This also simplifies the code dealing with the driver tag a lot.
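To make the new invariant concrete, here is a small illustrative sketch with simplified stand-in types (this is not the kernel's struct request or the real blk_mq_get_driver_tag(); the actual change is in the diff below): after this patch a request always carries its internal tag from allocation time, while the driver tag stays -1 until dispatch, so "has a driver tag" becomes the test for "is being dispatched".

#include <stdbool.h>

struct sketch_rq {
	int internal_tag;	/* assigned when the request is allocated */
	int tag;		/* driver tag; stays -1 until dispatch time */
};

static void sketch_alloc_rq(struct sketch_rq *rq, int internal_tag)
{
	rq->internal_tag = internal_tag;
	rq->tag = -1;			/* no driver tag yet */
}

static void sketch_get_driver_tag(struct sketch_rq *rq, int driver_tag)
{
	rq->tag = driver_tag;		/* only assigned at dispatch */
}

/* Draining can then simply count requests with a driver tag assigned. */
static bool sketch_rq_being_dispatched(const struct sketch_rq *rq)
{
	return rq->tag >= 0;
}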
Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Cc: John Garry Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei --- block/blk-flush.c | 18 ++---------- block/blk-mq.c | 75 ++++++++++++++++++++++++----------------------- block/blk-mq.h | 21 +++++++------ block/blk.h | 5 ---- 4 files changed, 51 insertions(+), 68 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index c7f396e3d5e2..977edf95d711 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -236,13 +236,8 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error) error = fq->rq_status; hctx = flush_rq->mq_hctx; - if (!q->elevator) { - blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq); - flush_rq->tag = -1; - } else { - blk_mq_put_driver_tag(flush_rq); - flush_rq->internal_tag = -1; - } + flush_rq->internal_tag = -1; + blk_mq_put_driver_tag(flush_rq); running = &fq->flush_queue[fq->flush_running_idx]; BUG_ON(fq->flush_pending_idx == fq->flush_running_idx); @@ -317,14 +312,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq, flush_rq->mq_ctx = first_rq->mq_ctx; flush_rq->mq_hctx = first_rq->mq_hctx; - if (!q->elevator) { - fq->orig_rq = first_rq; - flush_rq->tag = first_rq->tag; - blk_mq_tag_set_rq(flush_rq->mq_hctx, first_rq->tag, flush_rq); - } else { - flush_rq->internal_tag = first_rq->internal_tag; - } - + flush_rq->internal_tag = first_rq->internal_tag; flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH; flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK); flush_rq->rq_flags |= RQF_FLUSH_SEQ; diff --git a/block/blk-mq.c b/block/blk-mq.c index 53c6e7678c14..80d25e1d792a 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -276,18 +276,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data, struct request *rq = tags->static_rqs[tag]; req_flags_t rq_flags = 0; - if (data->flags & BLK_MQ_REQ_INTERNAL) { - rq->tag = -1; - rq->internal_tag = tag; - } else { - if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) { - rq_flags = RQF_MQ_INFLIGHT; - atomic_inc(&data->hctx->nr_active); - } - rq->tag = tag; - rq->internal_tag = -1; - data->hctx->tags->rqs[rq->tag] = rq; - } + rq->internal_tag = tag; + rq->tag = -1; /* csd/requeue_work/fifo_time is initialized before use */ rq->q = data->q; @@ -471,14 +461,18 @@ static void __blk_mq_free_request(struct request *rq) struct request_queue *q = rq->q; struct blk_mq_ctx *ctx = rq->mq_ctx; struct blk_mq_hw_ctx *hctx = rq->mq_hctx; - const int sched_tag = rq->internal_tag; blk_pm_mark_last_busy(rq); rq->mq_hctx = NULL; - if (rq->tag != -1) - blk_mq_put_tag(hctx->tags, ctx, rq->tag); - if (sched_tag != -1) - blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag); + + if (hctx->sched_tags) { + if (rq->tag >= 0) + blk_mq_put_tag(hctx->tags, ctx, rq->tag); + blk_mq_put_tag(hctx->sched_tags, ctx, rq->internal_tag); + } else { + blk_mq_put_tag(hctx->tags, ctx, rq->internal_tag); + } + blk_mq_sched_restart(hctx); blk_queue_exit(q); } @@ -526,7 +520,7 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error) blk_stat_add(rq, now); } - if (rq->internal_tag != -1) + if (rq->q->elevator && rq->internal_tag != -1) blk_mq_sched_completed_request(rq, now); blk_account_io_done(rq, now); @@ -1015,33 +1009,40 @@ static inline unsigned int queued_to_index(unsigned int queued) return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1); } -static bool blk_mq_get_driver_tag(struct request *rq) +static bool __blk_mq_get_driver_tag(struct 
request *rq) { struct blk_mq_alloc_data data = { - .q = rq->q, - .hctx = rq->mq_hctx, - .flags = BLK_MQ_REQ_NOWAIT, - .cmd_flags = rq->cmd_flags, + .q = rq->q, + .hctx = rq->mq_hctx, + .flags = BLK_MQ_REQ_NOWAIT, + .cmd_flags = rq->cmd_flags, }; - bool shared; - if (rq->tag != -1) - return true; + if (data.hctx->sched_tags) { + if (blk_mq_tag_is_reserved(data.hctx->sched_tags, + rq->internal_tag)) + data.flags |= BLK_MQ_REQ_RESERVED; + rq->tag = blk_mq_get_tag(&data); + } else { + rq->tag = rq->internal_tag; + } - if (blk_mq_tag_is_reserved(data.hctx->sched_tags, rq->internal_tag)) - data.flags |= BLK_MQ_REQ_RESERVED; + if (rq->tag == -1) + return false; - shared = blk_mq_tag_busy(data.hctx); - rq->tag = blk_mq_get_tag(&data); - if (rq->tag >= 0) { - if (shared) { - rq->rq_flags |= RQF_MQ_INFLIGHT; - atomic_inc(&data.hctx->nr_active); - } - data.hctx->tags->rqs[rq->tag] = rq; + if (blk_mq_tag_busy(data.hctx)) { + rq->rq_flags |= RQF_MQ_INFLIGHT; + atomic_inc(&data.hctx->nr_active); } + data.hctx->tags->rqs[rq->tag] = rq; + return true; +} - return rq->tag != -1; +static bool blk_mq_get_driver_tag(struct request *rq) +{ + if (rq->tag != -1) + return true; + return __blk_mq_get_driver_tag(rq); } static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, diff --git a/block/blk-mq.h b/block/blk-mq.h index e7d1da4b1f73..d0c72d7d07c8 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -196,26 +196,25 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) return true; } -static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, - struct request *rq) +static inline void blk_mq_put_driver_tag(struct request *rq) { - blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag); + struct blk_mq_hw_ctx *hctx = rq->mq_hctx; + int tag = rq->tag; + + if (tag < 0) + return; + rq->tag = -1; + if (hctx->sched_tags) + blk_mq_put_tag(hctx->tags, rq->mq_ctx, tag); + if (rq->rq_flags & RQF_MQ_INFLIGHT) { rq->rq_flags &= ~RQF_MQ_INFLIGHT; atomic_dec(&hctx->nr_active); } } -static inline void blk_mq_put_driver_tag(struct request *rq) -{ - if (rq->tag == -1 || rq->internal_tag == -1) - return; - - __blk_mq_put_driver_tag(rq->mq_hctx, rq); -} - static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) { int cpu; diff --git a/block/blk.h b/block/blk.h index faf616cb0463..002104739465 100644 --- a/block/blk.h +++ b/block/blk.h @@ -26,11 +26,6 @@ struct blk_flush_queue { struct list_head flush_data_in_flight; struct request *flush_rq; - /* - * flush_rq shares tag with this rq, both can't be active - * at the same time - */ - struct request *orig_rq; struct lock_class_key key; spinlock_t mq_flush_lock; }; From patchwork Wed May 13 03:47:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544573 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 512101668 for ; Wed, 13 May 2020 03:49:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 39D14206F5 for ; Wed, 13 May 2020 03:49:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bpn4YGSE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728461AbgEMDtA (ORCPT ); Tue, 12 May 2020 23:49:00 -0400 Received: from us-smtp-delivery-1.mimecast.com 
([207.211.31.120]:35672 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727107AbgEMDtA (ORCPT ); Tue, 12 May 2020 23:49:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341739; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uLqS0yyxOJBQJd9pqTgMlI3LiIt9PX4rxkrzPTZC2A4=; b=bpn4YGSEaLA/kff/pynVdRB8Jxmr/h6WOIEVrQI6XJ0c6+tadgQqWKhXroCh64nczXCKHo GokQot/nMoIKDhvl7jsQauI6Z71KiiVvXB7Fm+Gz9SoXt7TOUSF4hXWz5YrtUz2ksI8F1p WBKvN14pGJxptG/+aSO/5kvCX5sDULA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-223-1_4_wLChPti8Q6EeLwDKng-1; Tue, 12 May 2020 23:48:55 -0400 X-MC-Unique: 1_4_wLChPti8Q6EeLwDKng-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 13CA680183C; Wed, 13 May 2020 03:48:54 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9E6265D9DD; Wed, 13 May 2020 03:48:50 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner , John Garry Subject: [PATCH V11 05/12] blk-mq: add blk_mq_all_tag_iter Date: Wed, 13 May 2020 11:47:56 +0800 Message-Id: <20200513034803.1844579-6-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Now request is thought as in-flight only when its state is updated as MQ_RQ_IN_FLIGHT, which is done by driver via blk_mq_start_request(). Actually from blk-mq's view, one rq can be thought as in-flight after its tag is >= 0. Add one new function of blk_mq_all_tag_iter so that we can iterate every in-flight request, and this way is more flexible since caller can decide to handle request according to rq's state. Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Cc: John Garry Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig --- block/blk-mq-tag.c | 33 +++++++++++++++++++++++++++++---- block/blk-mq-tag.h | 2 ++ 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 586c9d6e904a..c27c6dfc7d36 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -257,6 +257,7 @@ struct bt_tags_iter_data { busy_tag_iter_fn *fn; void *data; bool reserved; + bool iterate_all; }; static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) @@ -274,7 +275,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) * test and set the bit before assining ->rqs[]. 
*/ rq = tags->rqs[bitnr]; - if (rq && blk_mq_request_started(rq)) + if (rq && (iter_data->iterate_all || blk_mq_request_started(rq))) return iter_data->fn(rq, iter_data->data, reserved); return true; @@ -294,13 +295,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) * bitmap_tags member of struct blk_mq_tags. */ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt, - busy_tag_iter_fn *fn, void *data, bool reserved) + busy_tag_iter_fn *fn, void *data, bool reserved, + bool iterate_all) { struct bt_tags_iter_data iter_data = { .tags = tags, .fn = fn, .data = data, .reserved = reserved, + .iterate_all = iterate_all, }; if (tags->rqs) @@ -321,8 +324,30 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, void *priv) { if (tags->nr_reserved_tags) - bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true); - bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false); + bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true, + false); + bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false, false); +} + +/** + * blk_mq_all_tag_iter - iterate over all requests in a tag map + * @tags: Tag map to iterate over. + * @fn: Pointer to the function that will be called for each + * request. @fn will be called as follows: @fn(rq, @priv, + * reserved) where rq is a pointer to a request. 'reserved' + * indicates whether or not @rq is a reserved request. Return + * true to continue iterating tags, false to stop. + * @priv: Will be passed as second argument to @fn. + * + * It is the caller's responsibility to check rq's state in @fn. + */ +void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, + void *priv) +{ + if (tags->nr_reserved_tags) + bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true, + true); + bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false, true); } /** diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index 2b8321efb682..d19546e8246b 100644 --- a/block/blk-mq-tag.h +++ b/block/blk-mq-tag.h @@ -34,6 +34,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx, extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool); void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void *priv); +void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, + void *priv); static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt, struct blk_mq_hw_ctx *hctx) From patchwork Wed May 13 03:47:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544575 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 652B0159A for ; Wed, 13 May 2020 03:49:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 45D85206F5 for ; Wed, 13 May 2020 03:49:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dDdVDZhQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728659AbgEMDtF (ORCPT ); Tue, 12 May 2020 23:49:05 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:42245 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727107AbgEMDtF (ORCPT ); Tue, 12 May 2020 23:49:05 -0400 
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, John Garry, Bart Van Assche, Hannes Reinecke, Christoph Hellwig, Thomas Gleixner
Subject: [PATCH V11 06/12] blk-mq: prepare for draining IO when hctx's all CPUs are offline
Date: Wed, 13 May 2020 11:47:57 +0800
Message-Id: <20200513034803.1844579-7-ming.lei@redhat.com>
In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com>
References: <20200513034803.1844579-1-ming.lei@redhat.com>

Most blk-mq drivers depend on managed IRQ auto-affinity to set up their queue mapping. Thomas mentioned the following point[1]:

"That was the constraint of managed interrupts from the very beginning: The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again."

However, the current blk-mq implementation doesn't quiesce the hw queue before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx for blk-mq wrt. CPU hotplug.

Add a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, for blk-mq to stop queues and wait for completion of in-flight requests. The following patch will stop the hw queue and wait for completion of in-flight requests when one hctx is becoming dead.

This could cause a deadlock for some stacking blk-mq drivers, such as dm-rq and loop. Add the blk-mq flag BLK_MQ_F_NO_MANAGED_IRQ and set it for dm-rq and loop, so we needn't wait for completion of in-flight requests from dm-rq & loop, and the potential deadlock can be avoided.
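As a usage illustration (not part of the patch itself), a driver whose interrupts are not managed by the kernel would opt out of the new draining by setting the flag on its tag_set, mirroring the loop and dm-rq hunks in the diff below; the example_* names are placeholders, not a real driver.

#include <linux/blk-mq.h>
#include <linux/numa.h>

static struct blk_mq_tag_set example_tag_set;	/* placeholder driver state */

static void example_init_tag_set(void)
{
	example_tag_set.nr_hw_queues = 1;
	example_tag_set.queue_depth = 128;
	example_tag_set.numa_node = NUMA_NO_NODE;
	/*
	 * BLK_MQ_F_NO_MANAGED_IRQ (introduced by this patch) tells blk-mq
	 * that this queue's interrupts are not managed, so its in-flight
	 * requests need not be drained when all CPUs of a hctx go offline.
	 */
	example_tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
}

Stacking drivers such as loop and dm-rq need this because waiting for their in-flight requests could deadlock against the lower queues, which is exactly the case the commit message describes.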
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke --- block/blk-mq-debugfs.c | 1 + block/blk-mq.c | 19 +++++++++++++++++++ drivers/block/loop.c | 2 +- drivers/md/dm-rq.c | 2 +- include/linux/blk-mq.h | 4 ++++ include/linux/cpuhotplug.h | 1 + 6 files changed, 27 insertions(+), 2 deletions(-) diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 96b7a35c898a..ddec58743e88 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -239,6 +239,7 @@ static const char *const hctx_flag_name[] = { HCTX_FLAG_NAME(TAG_SHARED), HCTX_FLAG_NAME(BLOCKING), HCTX_FLAG_NAME(NO_SCHED), + HCTX_FLAG_NAME(NO_MANAGED_IRQ), }; #undef HCTX_FLAG_NAME diff --git a/block/blk-mq.c b/block/blk-mq.c index 80d25e1d792a..25d2cbe9c716 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2300,6 +2300,16 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, return -ENOMEM; } +static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) +{ + return 0; +} + +static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node) +{ + return 0; +} + /* * 'cpu' is going away. splice any existing rq_list entries from this * software queue to the hw queue dispatch list, and ensure that it @@ -2336,6 +2346,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx) { + if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ)) + cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE, + &hctx->cpuhp_online); cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead); } @@ -2395,6 +2408,9 @@ static int blk_mq_init_hctx(struct request_queue *q, { hctx->queue_num = hctx_idx; + if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ)) + cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE, + &hctx->cpuhp_online); cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead); hctx->tags = set->tags[hctx_idx]; @@ -3649,6 +3665,9 @@ static int __init blk_mq_init(void) { cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL, blk_mq_hctx_notify_dead); + cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online", + blk_mq_hctx_notify_online, + blk_mq_hctx_notify_offline); return 0; } subsys_initcall(blk_mq_init); diff --git a/drivers/block/loop.c b/drivers/block/loop.c index da693e6a834e..784f2e038b55 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2037,7 +2037,7 @@ static int loop_add(struct loop_device **l, int i) lo->tag_set.queue_depth = 128; lo->tag_set.numa_node = NUMA_NO_NODE; lo->tag_set.cmd_size = sizeof(struct loop_cmd); - lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; + lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ; lo->tag_set.driver_data = lo; err = blk_mq_alloc_tag_set(&lo->tag_set); diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 3f8577e2c13b..5f1ff70ac029 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t) md->tag_set->ops = &dm_mq_ops; md->tag_set->queue_depth = dm_get_blk_mq_queue_depth(); md->tag_set->numa_node = md->numa_node_id; - md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE; + md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ; md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues(); 
md->tag_set->driver_data = md; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index d7307795439a..ddd2cb6ed21c 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -140,6 +140,8 @@ struct blk_mq_hw_ctx { */ atomic_t nr_active; + /** @cpuhp_online: List to store request if CPU is going to die */ + struct hlist_node cpuhp_online; /** @cpuhp_dead: List to store request if some CPU die. */ struct hlist_node cpuhp_dead; /** @kobj: Kernel object for sysfs. */ @@ -391,6 +393,8 @@ struct blk_mq_ops { enum { BLK_MQ_F_SHOULD_MERGE = 1 << 0, BLK_MQ_F_TAG_SHARED = 1 << 1, + /* set when device's interrupt affinity isn't managed by kernel */ + BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2, BLK_MQ_F_BLOCKING = 1 << 5, BLK_MQ_F_NO_SCHED = 1 << 6, BLK_MQ_F_ALLOC_POLICY_START_BIT = 8, diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 77d70b633531..24b3a77810b6 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -152,6 +152,7 @@ enum cpuhp_state { CPUHP_AP_SMPBOOT_THREADS, CPUHP_AP_X86_VDSO_VMA_ONLINE, CPUHP_AP_IRQ_AFFINITY_ONLINE, + CPUHP_AP_BLK_MQ_ONLINE, CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS, CPUHP_AP_X86_INTEL_EPB_ONLINE, CPUHP_AP_PERF_ONLINE, From patchwork Wed May 13 03:47:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544577 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 098F8159A for ; Wed, 13 May 2020 03:49:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3272206F5 for ; Wed, 13 May 2020 03:49:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Z9DuQUZO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727107AbgEMDtM (ORCPT ); Tue, 12 May 2020 23:49:12 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:60603 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726550AbgEMDtM (ORCPT ); Tue, 12 May 2020 23:49:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341750; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ryC6FEiwS3u60/b6l8JZoioXhfj2tV4lRb7eP7HuQHs=; b=Z9DuQUZOJRi3RCX3JRujsup2h8Q1buzlU8FgR29dwU05o/QPQv8iMMw3O1H2m1P4e42C/S hYbMp8JEe5N7PRH4axVDVvjBN7yXzgrhUXkbvtvBIV4Uo58Yjni8DBz8Xow+HZNWm9oFBZ V0/3C5YLtzGYUw0uvwpmybMeCb/R+kA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-58-kHVIbkk3NaWkuJHG6w8jDg-1; Tue, 12 May 2020 23:49:07 -0400 X-MC-Unique: kHVIbkk3NaWkuJHG6w8jDg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CF445461; Wed, 13 May 2020 03:49:05 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 653ED7D8ED; Wed, 13 May 2020 03:49:02 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming 
Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner Subject: [PATCH V11 07/12] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Date: Wed, 13 May 2020 11:47:58 +0800 Message-Id: <20200513034803.1844579-8-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Before one CPU becomes offline, check if it is the last online CPU of hctx. If yes, mark this hctx as inactive, meantime wait for completion of all in-flight IOs originated from this hctx. Meantime check if this hctx has become inactive in blk_mq_get_driver_tag(), if yes, release the allocated tag. This way guarantees that there isn't any inflight IO before shutdowning the managed IRQ line when all CPUs of this IRQ line is offline. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Signed-off-by: Ming Lei --- block/blk-mq-debugfs.c | 1 + block/blk-mq.c | 117 +++++++++++++++++++++++++++++++++++++---- include/linux/blk-mq.h | 3 ++ 3 files changed, 110 insertions(+), 11 deletions(-) diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index ddec58743e88..dc66cb689d2f 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -213,6 +213,7 @@ static const char *const hctx_state_name[] = { HCTX_STATE_NAME(STOPPED), HCTX_STATE_NAME(TAG_ACTIVE), HCTX_STATE_NAME(SCHED_RESTART), + HCTX_STATE_NAME(INACTIVE), }; #undef HCTX_STATE_NAME diff --git a/block/blk-mq.c b/block/blk-mq.c index 25d2cbe9c716..171bbf2fbc56 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1038,11 +1038,36 @@ static bool __blk_mq_get_driver_tag(struct request *rq) return true; } -static bool blk_mq_get_driver_tag(struct request *rq) +static bool blk_mq_get_driver_tag(struct request *rq, bool direct_issue) { if (rq->tag != -1) return true; - return __blk_mq_get_driver_tag(rq); + + if (!__blk_mq_get_driver_tag(rq)) + return false; + /* + * In case that direct issue IO process is migrated to other CPU + * which may not belong to this hctx, add one memory barrier so we + * can order driver tag assignment and checking BLK_MQ_S_INACTIVE. + * Otherwise, barrier() is enough given both setting BLK_MQ_S_INACTIVE + * and driver tag assignment are run on the same CPU because + * BLK_MQ_S_INACTIVE is only set after the last CPU of this hctx is + * becoming offline. + * + * Process migration might happen after the check on current processor + * id, smp_mb() is implied by processor migration, so no need to worry + * about it. + */ + if (unlikely(direct_issue && rq->mq_ctx->cpu != raw_smp_processor_id())) + smp_mb(); + else + barrier(); + + if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &rq->mq_hctx->state))) { + blk_mq_put_driver_tag(rq); + return false; + } + return true; } static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, @@ -1091,7 +1116,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, * Don't clear RESTART here, someone else could have set it. * At most this will cost an extra queue run. */ - return blk_mq_get_driver_tag(rq); + return blk_mq_get_driver_tag(rq, false); } wait = &hctx->dispatch_wait; @@ -1117,7 +1142,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, * allocation failure and adding the hardware queue to the wait * queue. 
*/ - ret = blk_mq_get_driver_tag(rq); + ret = blk_mq_get_driver_tag(rq, false); if (!ret) { spin_unlock(&hctx->dispatch_wait_lock); spin_unlock_irq(&wq->lock); @@ -1232,7 +1257,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, break; } - if (!blk_mq_get_driver_tag(rq)) { + if (!blk_mq_get_driver_tag(rq, false)) { /* * The initial allocation attempt failed, so we need to * rerun the hardware queue when a tag is freed. The @@ -1264,7 +1289,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, bd.last = true; else { nxt = list_first_entry(list, struct request, queuelist); - bd.last = !blk_mq_get_driver_tag(nxt); + bd.last = !blk_mq_get_driver_tag(nxt, false); } ret = q->mq_ops->queue_rq(hctx, &bd); @@ -1891,7 +1916,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, if (!blk_mq_get_dispatch_budget(hctx)) goto insert; - if (!blk_mq_get_driver_tag(rq)) { + if (!blk_mq_get_driver_tag(rq, true)) { blk_mq_put_dispatch_budget(hctx); goto insert; } @@ -2300,13 +2325,80 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, return -ENOMEM; } -static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) +struct count_inflight_data { + unsigned count; + struct blk_mq_hw_ctx *hctx; +}; + +static bool blk_mq_count_inflight_rq(struct request *rq, void *data, + bool reserved) { - return 0; + struct count_inflight_data *count_data = data; + + /* + * Can't check rq's state because it is updated to MQ_RQ_IN_FLIGHT + * in blk_mq_start_request(), at that time we can't prevent this rq + * from being issued. + * + * So check if driver tag is assigned, if yes, count this rq as + * inflight. + */ + if (rq->tag >= 0 && rq->mq_hctx == count_data->hctx) + count_data->count++; + + return true; +} + +static unsigned blk_mq_tags_inflight_rqs(struct blk_mq_hw_ctx *hctx) +{ + struct count_inflight_data count_data = { + .hctx = hctx, + }; + + blk_mq_all_tag_iter(hctx->tags, blk_mq_count_inflight_rq, &count_data); + return count_data.count; +} + +static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu, + struct blk_mq_hw_ctx *hctx) +{ + if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu) + return false; + if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids) + return false; + return true; } static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node) { + struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, + struct blk_mq_hw_ctx, cpuhp_online); + + if (!cpumask_test_cpu(cpu, hctx->cpumask)) + return 0; + + if (!blk_mq_last_cpu_in_hctx(cpu, hctx)) + return 0; + + /* + * Order setting BLK_MQ_S_INACTIVE versus checking rq->tag and rqs[tag], + * in blk_mq_tags_inflight_rqs. It pairs with the smp_mb() in + * blk_mq_get_driver_tag. 
+ */ + set_bit(BLK_MQ_S_INACTIVE, &hctx->state); + smp_mb__after_atomic(); + while (blk_mq_tags_inflight_rqs(hctx)) + msleep(5); + return 0; +} + +static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) +{ + struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, + struct blk_mq_hw_ctx, cpuhp_online); + + if (cpumask_test_cpu(cpu, hctx->cpumask)) + clear_bit(BLK_MQ_S_INACTIVE, &hctx->state); return 0; } @@ -2317,12 +2409,15 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node) */ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) { - struct blk_mq_hw_ctx *hctx; + struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, + struct blk_mq_hw_ctx, cpuhp_dead); struct blk_mq_ctx *ctx; LIST_HEAD(tmp); enum hctx_type type; - hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead); + if (!cpumask_test_cpu(cpu, hctx->cpumask)) + return 0; + ctx = __blk_mq_get_ctx(hctx->queue, cpu); type = hctx->type; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index ddd2cb6ed21c..c2ea0a6e5b56 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -404,6 +404,9 @@ enum { BLK_MQ_S_TAG_ACTIVE = 1, BLK_MQ_S_SCHED_RESTART = 2, + /* hw queue is inactive after all its CPUs become offline */ + BLK_MQ_S_INACTIVE = 3, + BLK_MQ_MAX_DEPTH = 10240, BLK_MQ_CPU_WORK_BATCH = 8, From patchwork Wed May 13 03:47:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544581 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D75A17EA for ; Wed, 13 May 2020 03:49:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2651E207BC for ; Wed, 13 May 2020 03:49:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hxZXiFc8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728669AbgEMDtT (ORCPT ); Tue, 12 May 2020 23:49:19 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:33554 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726550AbgEMDtT (ORCPT ); Tue, 12 May 2020 23:49:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341757; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Rd6+amCIzl+hP4fRgrg9M4OifLZpwuR6RSp6cM8jhEk=; b=hxZXiFc8Ba6KIXLw/S0ym0J27nVTaU/vNFRHMSSwdRIBPDb5mvgPsbnl7QSvoG/APHexUP mmFJiQKuEVT/ZhPqrhjO26IZ5keqMv6hUE7kBnXifk66+LGK16btWQ/X8DMWnUveZV9qg6 GDMghAp+C9gGE5Hg8pEO7K59DDlMwgg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-388-q71_r4QCMgm64yCvjmyL5w-1; Tue, 12 May 2020 23:49:13 -0400 X-MC-Unique: q71_r4QCMgm64yCvjmyL5w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CFB8A184040A; Wed, 13 May 2020 03:49:11 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) 
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, John Garry, Bart Van Assche, Hannes Reinecke, Christoph Hellwig, Thomas Gleixner
Subject: [PATCH V11 08/12] block: add blk_end_flush_machinery
Date: Wed, 13 May 2020 11:47:59 +0800
Message-Id: <20200513034803.1844579-9-ming.lei@redhat.com>
In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com>
References: <20200513034803.1844579-1-ming.lei@redhat.com>

Flush requests aren't the same as normal FS requests:

1) one dedicated per-hctx flush rq is pre-allocated for sending the flush request

2) the flush request is issued to hardware via one machinery so that flush merging can be applied

We can't simply re-submit flush rqs via blk_steal_bios(), so add blk_end_flush_machinery to collect the flush requests which need to be resubmitted:

- if one flush command without DATA is enough, send one flush and complete these requests

- otherwise, add the request to a list and let the caller re-submit it.

Cc: John Garry
Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Thomas Gleixner
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
---
 block/blk-flush.c | 123 +++++++++++++++++++++++++++++++++++++++++++---
 block/blk.h | 4 ++
 2 files changed, 120 insertions(+), 7 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 977edf95d711..745d878697ed 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -170,10 +170,11 @@ static void blk_flush_complete_seq(struct request *rq,
 	unsigned int cmd_flags;
 
 	BUG_ON(rq->flush.seq & seq);
-	rq->flush.seq |= seq;
+	if (!error)
+		rq->flush.seq |= seq;
 	cmd_flags = rq->cmd_flags;
 
-	if (likely(!error))
+	if (likely(!error && !fq->flush_queue_terminating))
 		seq = blk_flush_cur_seq(rq);
 	else
 		seq = REQ_FSEQ_DONE;
@@ -200,9 +201,15 @@ static void blk_flush_complete_seq(struct request *rq,
 		 * normal completion and end it.
*/ BUG_ON(!list_empty(&rq->queuelist)); - list_del_init(&rq->flush.list); - blk_flush_restore_request(rq); - blk_mq_end_request(rq, error); + + /* Terminating code will end the request from flush queue */ + if (likely(!fq->flush_queue_terminating)) { + list_del_init(&rq->flush.list); + blk_flush_restore_request(rq); + blk_mq_end_request(rq, error); + } else { + list_move_tail(&rq->flush.list, pending); + } break; default: @@ -279,7 +286,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq, struct request *flush_rq = fq->flush_rq; /* C1 described at the top of this file */ - if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending)) + if (fq->flush_pending_idx != fq->flush_running_idx || + list_empty(pending) || fq->flush_queue_terminating) return; /* C2 and C3 @@ -331,7 +339,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error) struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx); if (q->elevator) { - WARN_ON(rq->tag < 0); + WARN_ON(rq->tag < 0 && !fq->flush_queue_terminating); blk_mq_put_driver_tag(rq); } @@ -503,3 +511,104 @@ void blk_free_flush_queue(struct blk_flush_queue *fq) kfree(fq->flush_rq); kfree(fq); } + +static void __blk_end_queued_flush(struct blk_flush_queue *fq, + unsigned int queue_idx, struct list_head *resubmit_list, + struct list_head *flush_list) +{ + struct list_head *queue = &fq->flush_queue[queue_idx]; + struct request *rq, *nxt; + + list_for_each_entry_safe(rq, nxt, queue, flush.list) { + unsigned int seq = blk_flush_cur_seq(rq); + + list_del_init(&rq->flush.list); + blk_flush_restore_request(rq); + if (!blk_rq_sectors(rq) || seq == REQ_FSEQ_POSTFLUSH ) + list_add_tail(&rq->queuelist, flush_list); + else + list_add_tail(&rq->queuelist, resubmit_list); + } +} + +static void blk_end_queued_flush(struct blk_flush_queue *fq, + struct list_head *resubmit_list, struct list_head *flush_list) +{ + unsigned long flags; + + spin_lock_irqsave(&fq->mq_flush_lock, flags); + __blk_end_queued_flush(fq, 0, resubmit_list, flush_list); + __blk_end_queued_flush(fq, 1, resubmit_list, flush_list); + spin_unlock_irqrestore(&fq->mq_flush_lock, flags); +} + +/* complete requests which just requires one flush command */ +static void blk_complete_flush_requests(struct blk_flush_queue *fq, + struct list_head *flush_list) +{ + struct block_device *bdev; + struct request *rq; + int error = -ENXIO; + + if (list_empty(flush_list)) + return; + + rq = list_first_entry(flush_list, struct request, queuelist); + + /* Send flush via one active hctx so we can move on */ + bdev = bdget_disk(rq->rq_disk, 0); + if (bdev) { + error = blkdev_issue_flush(bdev, GFP_KERNEL, NULL); + bdput(bdev); + } + + while (!list_empty(flush_list)) { + rq = list_first_entry(flush_list, struct request, queuelist); + list_del_init(&rq->queuelist); + blk_mq_end_request(rq, error); + } +} + +/* + * Called when this hctx is inactive and all CPUs of this hctx is dead, + * otherwise don't reuse this function. + * + * Terminate this hw queue's flush machinery, and try to complete flush + * IO requests if possible, such as any flush IO without data, or flush + * data IO in POSTFLUSH stage. Otherwise, add the flush IOs into @list + * and let caller to re-submit them. 
+ */ +void blk_end_flush_machinery(struct blk_mq_hw_ctx *hctx, + struct list_head *in, struct list_head *out) +{ + LIST_HEAD(resubmit_list); + LIST_HEAD(flush_list); + struct blk_flush_queue *fq = hctx->fq; + struct request *rq, *nxt; + unsigned long flags; + + spin_lock_irqsave(&fq->mq_flush_lock, flags); + fq->flush_queue_terminating = 1; + spin_unlock_irqrestore(&fq->mq_flush_lock, flags); + + /* End inflight flush requests */ + list_for_each_entry_safe(rq, nxt, in, queuelist) { + WARN_ON(!(rq->rq_flags & RQF_FLUSH_SEQ)); + list_del_init(&rq->queuelist); + rq->end_io(rq, BLK_STS_AGAIN); + } + + /* End queued requests */ + blk_end_queued_flush(fq, &resubmit_list, &flush_list); + + /* Send flush and complete requests which just need one flush req */ + blk_complete_flush_requests(fq, &flush_list); + + spin_lock_irqsave(&fq->mq_flush_lock, flags); + /* reset flush queue so that it is ready to work next time */ + fq->flush_pending_idx = fq->flush_running_idx = 0; + fq->flush_queue_terminating = 0; + spin_unlock_irqrestore(&fq->mq_flush_lock, flags); + + list_splice_init(&resubmit_list, out); +} diff --git a/block/blk.h b/block/blk.h index 002104739465..79aaf976d15d 100644 --- a/block/blk.h +++ b/block/blk.h @@ -20,6 +20,7 @@ struct blk_flush_queue { unsigned int flush_queue_delayed:1; unsigned int flush_pending_idx:1; unsigned int flush_running_idx:1; + unsigned int flush_queue_terminating:1; blk_status_t rq_status; unsigned long flush_pending_since; struct list_head flush_queue[2]; @@ -453,4 +454,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, unsigned int max_sectors, bool *same_page); +void blk_end_flush_machinery(struct blk_mq_hw_ctx *hctx, + struct list_head *in, struct list_head *out); + #endif /* BLK_INTERNAL_H */ From patchwork Wed May 13 03:48:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544579 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D3041668 for ; Wed, 13 May 2020 03:49:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03E8F205ED for ; Wed, 13 May 2020 03:49:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="g7+WeNIm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728672AbgEMDtT (ORCPT ); Tue, 12 May 2020 23:49:19 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:36085 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728669AbgEMDtT (ORCPT ); Tue, 12 May 2020 23:49:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341758; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YFwNFbvN8lXiw5io30W1wc29b7T/2QA1xOMLcFcuIUc=; b=g7+WeNImJz7eghFb38HlxkhgQ0PKQJ6yNDvubCAwvqWi4bfJGOsGDEVX5IJxtR0E0OVhSP Z3wcJcR3qsUOSEiCXQwshCXY+tOrYbPm/edJXrA2eOlSjgoY2wisjiTjt9kCvTYvdtTdTo KXzno+i+iG6XCqoXp3+unK9h37Fv7mo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id 
us-mta-219-0AjD-BFxPK2C3Agk4poDGg-1; Tue, 12 May 2020 23:49:16 -0400 X-MC-Unique: 0AjD-BFxPK2C3Agk4poDGg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B8DE1107ACCA; Wed, 13 May 2020 03:49:14 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1AE716A960; Wed, 13 May 2020 03:49:13 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner Subject: [PATCH V11 09/12] blk-mq: add blk_mq_hctx_handle_dead_cpu for handling cpu dead Date: Wed, 13 May 2020 11:48:00 +0800 Message-Id: <20200513034803.1844579-10-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add helper of blk_mq_hctx_handle_dead_cpu for handling cpu dead, and prepare for handling inactive hctx. No functional change. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Reviewed-by: Hannes Reinecke Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig --- block/blk-mq.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 171bbf2fbc56..7c640482fb24 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2402,22 +2402,13 @@ static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) return 0; } -/* - * 'cpu' is going away. splice any existing rq_list entries from this - * software queue to the hw queue dispatch list, and ensure that it - * gets run. - */ -static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) +static void blk_mq_hctx_handle_dead_cpu(struct blk_mq_hw_ctx *hctx, + unsigned int cpu) { - struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, - struct blk_mq_hw_ctx, cpuhp_dead); struct blk_mq_ctx *ctx; LIST_HEAD(tmp); enum hctx_type type; - if (!cpumask_test_cpu(cpu, hctx->cpumask)) - return 0; - ctx = __blk_mq_get_ctx(hctx->queue, cpu); type = hctx->type; @@ -2429,13 +2420,27 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) spin_unlock(&ctx->lock); if (list_empty(&tmp)) - return 0; + return; spin_lock(&hctx->lock); list_splice_tail_init(&tmp, &hctx->dispatch); spin_unlock(&hctx->lock); blk_mq_run_hw_queue(hctx, true); +} + +/* + * 'cpu' is going away. splice any existing rq_list entries from this + * software queue to the hw queue dispatch list, and ensure that it + * gets run. 
+ */ +static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) +{ + struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, + struct blk_mq_hw_ctx, cpuhp_dead); + + if (cpumask_test_cpu(cpu, hctx->cpumask)) + blk_mq_hctx_handle_dead_cpu(hctx, cpu); return 0; } From patchwork Wed May 13 03:48:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544585 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E90CA1668 for ; Wed, 13 May 2020 03:49:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D2AA020718 for ; Wed, 13 May 2020 03:49:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gHvcOs5p" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728680AbgEMDtZ (ORCPT ); Tue, 12 May 2020 23:49:25 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:33573 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726550AbgEMDtZ (ORCPT ); Tue, 12 May 2020 23:49:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341764; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I/R2lR/AT07am7BztQhuIhR00xaUfYSL/TEZVE3OysM=; b=gHvcOs5pBzGBHaRWGtjnKaAFdCG+KnYWnCigwPFexhIYCvwSDD2+CxF7Qfrubbe+VsrWnA SlZB4VtW7pL7et+OSqqdl3Mtj/fX8mNsZuwrBvRHKcOAkczyIGWNgzD8o7F9TMS+fEMwmg I4ufGTkhPqxhUPoDrSh6gYOIKEOivv0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-164-U-_4LdnvPiWdP2ymD8UUbQ-1; Tue, 12 May 2020 23:49:22 -0400 X-MC-Unique: U-_4LdnvPiWdP2ymD8UUbQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F1A93461; Wed, 13 May 2020 03:49:20 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 76CC25D9E5; Wed, 13 May 2020 03:49:17 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner Subject: [PATCH V11 10/12] block: add request allocation flag of BLK_MQ_REQ_FORCE Date: Wed, 13 May 2020 11:48:01 +0800 Message-Id: <20200513034803.1844579-11-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org When one hctx becomes inactive, there may be requests allocated from this hctx, we can't queue them to the inactive hctx, one approach is to re-submit them via one active hctx. However, the request queue may have been started to freeze, and allocating request becomes not possible. 
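To make that dependency concrete, here is a rough sketch of the allocation-side check (illustrative only, not part of the patch: the function name is invented, and the PM/pm-only handling and exact wait condition of the real blk_queue_enter() are simplified):

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/percpu-refcount.h>

/*
 * Simplified model of blk_queue_enter() plus the force branch this
 * patch introduces; not the real implementation.
 */
static int sketch_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	while (true) {
		/* fast path: no freeze in progress */
		if (percpu_ref_tryget_live(&q->q_usage_counter))
			return 0;

		/*
		 * Forced path: the caller still owns uncompleted requests
		 * from the inactive hctx, so the freeze cannot finish yet
		 * and taking a bare reference is safe.
		 */
		if (flags & BLK_MQ_REQ_FORCE) {
			percpu_ref_get(&q->q_usage_counter);
			return 0;
		}

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/*
		 * Without the force flag, a re-submitting caller sleeps
		 * here forever: unfreeze waits for its old requests to
		 * complete, and completing them waits on this allocation.
		 */
		wait_event(q->mq_freeze_wq,
			   !percpu_ref_is_dying(&q->q_usage_counter) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}

The actual change below only adds the BLK_MQ_REQ_FORCE branch to blk_queue_enter() and defines the new flag in blk-mq.h.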
Add a BLK_MQ_REQ_FORCE flag to allow the block layer to allocate a request in this case, because the queue won't be frozen completely before all requests allocated from the inactive hctx are completed. A similar approach has been applied in commit 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done"). This can also help with other request-dependency cases, for example when the storage device has a problem, IO requests can't be queued to the device successfully, and a passthrough request is required to fix the device problem. If a queue freeze starts just before the passthrough request is allocated, the queue freeze process, the IO processes and the context allocating the passthrough request all hang forever. See commit 01e99aeca397 ("blk-mq: insert passthrough request into hctx->dispatch directly") for the background of this kind of issue. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Signed-off-by: Ming Lei --- block/blk-core.c | 5 +++++ include/linux/blk-mq.h | 7 +++++++ 2 files changed, 12 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index ffb1579fd4da..82be15c1fde4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -430,6 +430,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) if (success) return 0; + if (flags & BLK_MQ_REQ_FORCE) { + percpu_ref_get(&q->q_usage_counter); + return 0; + } + if (flags & BLK_MQ_REQ_NOWAIT) return -EBUSY; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index c2ea0a6e5b56..7d7aa5305a67 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -448,6 +448,13 @@ enum { BLK_MQ_REQ_INTERNAL = (__force blk_mq_req_flags_t)(1 << 2), /* set RQF_PREEMPT */ BLK_MQ_REQ_PREEMPT = (__force blk_mq_req_flags_t)(1 << 3), + + /* + * force to allocate request and caller has to make sure queue + * won't be frozen completely during allocation, and this flag + * is only applied after queue freeze is started + */ + BLK_MQ_REQ_FORCE = (__force blk_mq_req_flags_t)(1 << 4), }; struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, From patchwork Wed May 13 03:48:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C8A341668 for ; Wed, 13 May 2020 03:49:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B158D206F5 for ; Wed, 13 May 2020 03:49:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="a5y2f0vK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726550AbgEMDtf (ORCPT ); Tue, 12 May 2020 23:49:35 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:37307 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728692AbgEMDtf (ORCPT ); Tue, 12 May 2020 23:49:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341773; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Uy9ng9J/9IM+YQnMXypI0xMgHqWb5jb/uhDT7d6eVMo=;
b=a5y2f0vKwGmq+HTdeKn9qN/O0NbRgupDxpvDZGxSmSK7iY0Aari197K0i4ISh+q6BWKUly X5vnk/Bt72AM5Pd6HwWrvNZMJnMScxh1hubcMjJcB0OZJhuvg+Hvmt3eq75N3NBed2b28x S7DD9Jl+invr8rots8IjbXallAR/A8w= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-489-aM2dvNq6M8GyJctDPXCziw-1; Tue, 12 May 2020 23:49:28 -0400 X-MC-Unique: aM2dvNq6M8GyJctDPXCziw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E4DE4107ACCD; Wed, 13 May 2020 03:49:26 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9AB1E6A960; Wed, 13 May 2020 03:49:23 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner Subject: [PATCH V11 11/12] blk-mq: re-submit IO in case that hctx is inactive Date: Wed, 13 May 2020 11:48:02 +0800 Message-Id: <20200513034803.1844579-12-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org When all CPUs in one hctx are offline and this hctx becomes inactive, we shouldn't run this hw queue for completing request any more. So allocate request from one live hctx, and clone & resubmit the request, either it is from sw queue or scheduler queue. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Signed-off-by: Ming Lei --- block/blk-mq.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 111 insertions(+), 5 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 7c640482fb24..c9a3e48a1ebc 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2402,6 +2402,109 @@ static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) return 0; } +static void blk_mq_resubmit_end_rq(struct request *rq, blk_status_t error) +{ + struct request *orig_rq = rq->end_io_data; + + blk_mq_cleanup_rq(orig_rq); + blk_mq_end_request(orig_rq, error); + + blk_put_request(rq); +} + +static void blk_mq_resubmit_rq(struct request *rq) +{ + struct request *nrq; + unsigned int flags = 0; + struct blk_mq_hw_ctx *hctx = rq->mq_hctx; + struct blk_mq_tags *tags = rq->q->elevator ? hctx->sched_tags : + hctx->tags; + bool reserved = blk_mq_tag_is_reserved(tags, rq->internal_tag); + + if (rq->rq_flags & RQF_PREEMPT) + flags |= BLK_MQ_REQ_PREEMPT; + if (reserved) + flags |= BLK_MQ_REQ_RESERVED; + /* + * Queue freezing might be in-progress, and wait freeze can't be + * done now because we have request not completed yet, so mark this + * allocation as BLK_MQ_REQ_FORCE for avoiding this allocation & + * freeze hung forever. 
+ */ + flags |= BLK_MQ_REQ_FORCE; + + /* avoid allocation failure by clearing NOWAIT */ + nrq = blk_get_request(rq->q, rq->cmd_flags & ~REQ_NOWAIT, flags); + if (!nrq) + return; + + blk_rq_copy_request(nrq, rq); + + nrq->timeout = rq->timeout; + nrq->rq_disk = rq->rq_disk; + nrq->part = rq->part; + + memcpy(blk_mq_rq_to_pdu(nrq), blk_mq_rq_to_pdu(rq), + rq->q->tag_set->cmd_size); + + nrq->end_io = blk_mq_resubmit_end_rq; + nrq->end_io_data = rq; + nrq->bio = rq->bio; + nrq->biotail = rq->biotail; + + /* bios ownership has been transfered to new request */ + rq->bio = rq->biotail = NULL; + rq->__data_len = 0; + + if (blk_insert_cloned_request(nrq->q, nrq) != BLK_STS_OK) + blk_mq_request_bypass_insert(nrq, false, true); +} + +static void blk_mq_hctx_deactivate(struct blk_mq_hw_ctx *hctx) +{ + LIST_HEAD(sched); + LIST_HEAD(re_submit); + LIST_HEAD(flush_in); + LIST_HEAD(flush_out); + struct request *rq, *nxt; + struct elevator_queue *e = hctx->queue->elevator; + + if (!e) { + blk_mq_flush_busy_ctxs(hctx, &re_submit); + } else { + while ((rq = e->type->ops.dispatch_request(hctx))) { + if (rq->mq_hctx != hctx) + list_add(&rq->queuelist, &sched); + else + list_add(&rq->queuelist, &re_submit); + } + } + while (!list_empty(&sched)) { + rq = list_first_entry(&sched, struct request, queuelist); + list_del_init(&rq->queuelist); + blk_mq_sched_insert_request(rq, true, true, true); + } + + /* requests in dispatch list have to be re-submitted too */ + spin_lock(&hctx->lock); + list_splice_tail_init(&hctx->dispatch, &re_submit); + spin_unlock(&hctx->lock); + + /* blk_end_flush_machinery will cover flush request */ + list_for_each_entry_safe(rq, nxt, &re_submit, queuelist) { + if (rq->rq_flags & RQF_FLUSH_SEQ) + list_move(&rq->queuelist, &flush_in); + } + blk_end_flush_machinery(hctx, &flush_in, &flush_out); + list_splice_tail(&flush_out, &re_submit); + + while (!list_empty(&re_submit)) { + rq = list_first_entry(&re_submit, struct request, queuelist); + list_del_init(&rq->queuelist); + blk_mq_resubmit_rq(rq); + } +} + static void blk_mq_hctx_handle_dead_cpu(struct blk_mq_hw_ctx *hctx, unsigned int cpu) { @@ -2430,17 +2533,20 @@ static void blk_mq_hctx_handle_dead_cpu(struct blk_mq_hw_ctx *hctx, } /* - * 'cpu' is going away. splice any existing rq_list entries from this - * software queue to the hw queue dispatch list, and ensure that it - * gets run. + * @cpu has gone away. 
If this hctx is inactive, we can't dispatch request + * to the hctx any more, so clone and re-submit requests from this hctx */ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) { struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead); - if (cpumask_test_cpu(cpu, hctx->cpumask)) - blk_mq_hctx_handle_dead_cpu(hctx, cpu); + if (cpumask_test_cpu(cpu, hctx->cpumask)) { + if (test_bit(BLK_MQ_S_INACTIVE, &hctx->state)) + blk_mq_hctx_deactivate(hctx); + else + blk_mq_hctx_handle_dead_cpu(hctx, cpu); + } return 0; } From patchwork Wed May 13 03:48:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 11544589 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9076C159A for ; Wed, 13 May 2020 03:49:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 782B420718 for ; Wed, 13 May 2020 03:49:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Bo2Qw9pL" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728693AbgEMDtk (ORCPT ); Tue, 12 May 2020 23:49:40 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:49071 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728692AbgEMDtj (ORCPT ); Tue, 12 May 2020 23:49:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1589341778; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pEMInoM2lu0EeNIl5HpQaYfK+bDyEr3VKJtd1ijPTP8=; b=Bo2Qw9pLj2wvxgof11LXqlS3l/5M260n5jQjaM2FJJPgfl6GWx3SHfarjZqjRqDN/NhV4Z XxlfBk8yoQI1qldMCBd81Nr6OfN3LwT/588xwzv9u1SezpZGnqzIyXAdK0hJHKaGfbZQfm TZwk9FKcp1JRAr+TZRtk63HxED6pLTc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-45-iQdk0fS1NUapLCFcSGDE2Q-1; Tue, 12 May 2020 23:49:34 -0400 X-MC-Unique: iQdk0fS1NUapLCFcSGDE2Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5DC80800687; Wed, 13 May 2020 03:49:33 +0000 (UTC) Received: from localhost (ovpn-12-166.pek2.redhat.com [10.72.12.166]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8A5463A0; Wed, 13 May 2020 03:49:29 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, Ming Lei , John Garry , Bart Van Assche , Hannes Reinecke , Christoph Hellwig , Thomas Gleixner , Hannes Reinecke Subject: [PATCH V11 12/12] block: deactivate hctx when the hctx is actually inactive Date: Wed, 13 May 2020 11:48:03 +0800 Message-Id: <20200513034803.1844579-13-ming.lei@redhat.com> In-Reply-To: <20200513034803.1844579-1-ming.lei@redhat.com> References: <20200513034803.1844579-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Run queue on dead CPU still may be triggered 
in some corner cases, for example when a request is requeued after CPU hotplug has already been handled. So handle this corner case when running the hw queue. Cc: John Garry Cc: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Thomas Gleixner Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei --- block/blk-mq.c | 30 ++++++++++-------------------- 1 file changed, 10 insertions(+), 20 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index c9a3e48a1ebc..0f1afa2ab5e3 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -43,6 +43,8 @@ static void blk_mq_poll_stats_start(struct request_queue *q); static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb); +static void blk_mq_hctx_deactivate(struct blk_mq_hw_ctx *hctx); + static int blk_mq_poll_stats_bkt(const struct request *rq) { int ddir, sectors, bucket; @@ -1400,28 +1402,16 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) int srcu_idx; /* - * We should be running this queue from one of the CPUs that - * are mapped to it. - * - * There are at least two related races now between setting - * hctx->next_cpu from blk_mq_hctx_next_cpu() and running - * __blk_mq_run_hw_queue(): - * - * - hctx->next_cpu is found offline in blk_mq_hctx_next_cpu(), - * but later it becomes online, then this warning is harmless - * at all - * - * - hctx->next_cpu is found online in blk_mq_hctx_next_cpu(), - * but later it becomes offline, then the warning can't be - * triggered, and we depend on blk-mq timeout handler to - * handle dispatched requests to this hctx + * BLK_MQ_S_INACTIVE may not deal with some requeue corner case: + * one request is requeued after cpu unplug is handled, so check + * if the hctx is actually inactive. If yes, deactivate it and + * re-submit all requests in the queue. */ if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && - cpu_online(hctx->next_cpu)) { - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", - raw_smp_processor_id(), - cpumask_empty(hctx->cpumask) ? "inactive": "active"); - dump_stack(); + cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) >= + nr_cpu_ids) { + blk_mq_hctx_deactivate(hctx); + return; } /*