From patchwork Mon Jul 17 04:00:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chengming Zhou X-Patchwork-Id: 13315171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B0C2EB64DC for ; Mon, 17 Jul 2023 04:01:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230515AbjGQEBy (ORCPT ); Mon, 17 Jul 2023 00:01:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50930 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230396AbjGQEBp (ORCPT ); Mon, 17 Jul 2023 00:01:45 -0400 Received: from out-5.mta0.migadu.com (out-5.mta0.migadu.com [IPv6:2001:41d0:1004:224b::5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FD5EE46 for ; Sun, 16 Jul 2023 21:01:43 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1689566501; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pFXxdBNitgVGeb7jBJprwYc2SHfcR9ZpmuG2rofSs+Y=; b=aS4UxvHW74aNMeRLZvydKEju8QtYZX1rq7sUBQyrdw3ssRMah0U3Ms/FhSKdxr95m8UelC vvhsbRDT0jZse4zpCCnho2PyeLd7OduBgtBI3TyYuXW/WKAQWkpD/lHoT66Hx8WAHO8ulG zwfulZcJCFUrhltkkZkq/xBWyVjG3LQ= From: chengming.zhou@linux.dev To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, bvanassche@acm.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com Subject: [PATCH v4 1/4] blk-mq: use percpu csd to remote complete instead of per-rq csd Date: Mon, 17 Jul 2023 12:00:55 +0800 Message-ID: <20230717040058.3993930-2-chengming.zhou@linux.dev> In-Reply-To: <20230717040058.3993930-1-chengming.zhou@linux.dev> References: <20230717040058.3993930-1-chengming.zhou@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Chengming Zhou If request need to be completed remotely, we insert it into percpu llist, and smp_call_function_single_async() if llist is empty previously. We don't need to use per-rq csd, percpu csd is enough. And the size of struct request is decreased by 24 bytes. This way is cleaner, and looks correct, given block softirq is guaranteed to be scheduled to consume the list if one new request is added to this percpu list, either smp_call_function_single_async() returns -EBUSY or 0. Signed-off-by: Chengming Zhou Reviewed-by: Ming Lei Reviewed-by: Christoph Hellwig --- block/blk-mq.c | 12 ++++++------ include/linux/blk-mq.h | 5 +---- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index d50b1d62a3d9..d98654869615 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -43,6 +43,7 @@ #include "blk-ioprio.h" static DEFINE_PER_CPU(struct llist_head, blk_cpu_done); +static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd); static void blk_mq_insert_request(struct request *rq, blk_insert_t flags); static void blk_mq_request_bypass_insert(struct request *rq, @@ -1157,15 +1158,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq) static void blk_mq_complete_send_ipi(struct request *rq) { - struct llist_head *list; unsigned int cpu; cpu = rq->mq_ctx->cpu; - list = &per_cpu(blk_cpu_done, cpu); - if (llist_add(&rq->ipi_list, list)) { - INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq); - smp_call_function_single_async(cpu, &rq->csd); - } + if (llist_add(&rq->ipi_list, &per_cpu(blk_cpu_done, cpu))) + smp_call_function_single_async(cpu, &per_cpu(blk_cpu_csd, cpu)); } static void blk_mq_raise_softirq(struct request *rq) @@ -4829,6 +4826,9 @@ static int __init blk_mq_init(void) for_each_possible_cpu(i) init_llist_head(&per_cpu(blk_cpu_done, i)); + for_each_possible_cpu(i) + INIT_CSD(&per_cpu(blk_cpu_csd, i), + __blk_mq_complete_request_remote, NULL); open_softirq(BLOCK_SOFTIRQ, blk_done_softirq); cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD, diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index b96e00499f9e..67f810857634 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -182,10 +182,7 @@ struct request { rq_end_io_fn *saved_end_io; } flush; - union { - struct __call_single_data csd; - u64 fifo_time; - }; + u64 fifo_time; /* * completion callback. From patchwork Mon Jul 17 04:00:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chengming Zhou X-Patchwork-Id: 13315172 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 199D8C001DC for ; Mon, 17 Jul 2023 04:01:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230214AbjGQEBz (ORCPT ); Mon, 17 Jul 2023 00:01:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230406AbjGQEBv (ORCPT ); Mon, 17 Jul 2023 00:01:51 -0400 Received: from out-54.mta0.migadu.com (out-54.mta0.migadu.com [91.218.175.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7060E4E for ; Sun, 16 Jul 2023 21:01:45 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1689566504; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XhULMMQy5GVi2OU0Ad6+wHzsWWfxFQNSA1oYnzcztlU=; b=lPOIEuBY1mU/WbJfF2rw4P8sXiym+C3cSXl0kNPZNRT90H6I/FvAu+UzHTynqe+69gD0KN cHfJK3HBQQC6Gh+LnCz5YdMsNCcw114I2TMNiaHhJD7j2R5/C0+sBirkICF2EV7/U3WbSe ghsduXyN6K2AawbReid6mQHSJ5U0dcM= From: chengming.zhou@linux.dev To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, bvanassche@acm.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com Subject: [PATCH v4 2/4] blk-flush: fix rq->flush.seq for post-flush requests Date: Mon, 17 Jul 2023 12:00:56 +0800 Message-ID: <20230717040058.3993930-3-chengming.zhou@linux.dev> In-Reply-To: <20230717040058.3993930-1-chengming.zhou@linux.dev> References: <20230717040058.3993930-1-chengming.zhou@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Chengming Zhou If the policy == (REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH), it means that the data sequence and post-flush sequence need to be done for this request. The rq->flush.seq should record what sequences have been done (or don't need to be done). So in this case, pre-flush doesn't need to be done, we should init rq->flush.seq to REQ_FSEQ_PREFLUSH not REQ_FSEQ_POSTFLUSH. Fixes: 615939a2ae73 ("blk-mq: defer to the normal submission path for post-flush requests") Signed-off-by: Chengming Zhou Reviewed-by: Christoph Hellwig --- block/blk-flush.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index 8220517c2d67..fdc489e0ea16 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -443,7 +443,7 @@ bool blk_insert_flush(struct request *rq) * the post flush, and then just pass the command on. */ blk_rq_init_flush(rq); - rq->flush.seq |= REQ_FSEQ_POSTFLUSH; + rq->flush.seq |= REQ_FSEQ_PREFLUSH; spin_lock_irq(&fq->mq_flush_lock); list_move_tail(&rq->flush.list, &fq->flush_data_in_flight); spin_unlock_irq(&fq->mq_flush_lock); From patchwork Mon Jul 17 04:00:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chengming Zhou X-Patchwork-Id: 13315173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6D5FEB64DC for ; Mon, 17 Jul 2023 04:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231152AbjGQECG (ORCPT ); Mon, 17 Jul 2023 00:02:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231140AbjGQEB4 (ORCPT ); Mon, 17 Jul 2023 00:01:56 -0400 Received: from out-44.mta0.migadu.com (out-44.mta0.migadu.com [91.218.175.44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21071E6A for ; Sun, 16 Jul 2023 21:01:48 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1689566506; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vDpuVCWvXzou2B7OXWC1deh3uuUZ5lSIACnNEl46ljk=; b=gpKgExO2AR+T/Bj9kjyrPBzUoeybLo9OrrTg5UdcGIDGxeXfbzRBk9yEwPT5OaFDuNmzs/ G0ZGilzX2aVZLRlwQRBq6A8xKzQ3fhBTHG/jP+A3OzV4gShsa4bY4zJvPlVNVYWVECW1WF LZVjRGhopjLHLNIyAeglKcuZDCurIeY= From: chengming.zhou@linux.dev To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, bvanassche@acm.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com Subject: [PATCH v4 3/4] blk-flush: count inflight flush_data requests Date: Mon, 17 Jul 2023 12:00:57 +0800 Message-ID: <20230717040058.3993930-4-chengming.zhou@linux.dev> In-Reply-To: <20230717040058.3993930-1-chengming.zhou@linux.dev> References: <20230717040058.3993930-1-chengming.zhou@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Chengming Zhou The flush state machine use a double list to link all inflight flush_data requests, to avoid issuing separate post-flushes for these flush_data requests which shared PREFLUSH. So we can't reuse rq->queuelist, this is why we need rq->flush.list In preparation of the next patch that reuse rq->queuelist for flush state machine, we change the double linked list to unsigned long counter, which count all inflight flush_data requests. This is ok since we only need to know if there is any inflight flush_data request, so unsigned long counter is good. Signed-off-by: Chengming Zhou Reviewed-by: Ming Lei Reviewed-by: Christoph Hellwig --- block/blk-flush.c | 9 +++++---- block/blk.h | 5 ++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index fdc489e0ea16..fedb39031647 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -187,7 +187,8 @@ static void blk_flush_complete_seq(struct request *rq, break; case REQ_FSEQ_DATA: - list_move_tail(&rq->flush.list, &fq->flush_data_in_flight); + list_del_init(&rq->flush.list); + fq->flush_data_in_flight++; spin_lock(&q->requeue_lock); list_add(&rq->queuelist, &q->requeue_list); spin_unlock(&q->requeue_lock); @@ -299,7 +300,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq, return; /* C2 and C3 */ - if (!list_empty(&fq->flush_data_in_flight) && + if (fq->flush_data_in_flight && time_before(jiffies, fq->flush_pending_since + FLUSH_PENDING_TIMEOUT)) return; @@ -374,6 +375,7 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq, * the comment in flush_end_io(). */ spin_lock_irqsave(&fq->mq_flush_lock, flags); + fq->flush_data_in_flight--; blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error); spin_unlock_irqrestore(&fq->mq_flush_lock, flags); @@ -445,7 +447,7 @@ bool blk_insert_flush(struct request *rq) blk_rq_init_flush(rq); rq->flush.seq |= REQ_FSEQ_PREFLUSH; spin_lock_irq(&fq->mq_flush_lock); - list_move_tail(&rq->flush.list, &fq->flush_data_in_flight); + fq->flush_data_in_flight++; spin_unlock_irq(&fq->mq_flush_lock); return false; default: @@ -496,7 +498,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size, INIT_LIST_HEAD(&fq->flush_queue[0]); INIT_LIST_HEAD(&fq->flush_queue[1]); - INIT_LIST_HEAD(&fq->flush_data_in_flight); return fq; diff --git a/block/blk.h b/block/blk.h index 608c5dcc516b..686712e13835 100644 --- a/block/blk.h +++ b/block/blk.h @@ -15,15 +15,14 @@ struct elevator_type; extern struct dentry *blk_debugfs_root; struct blk_flush_queue { + spinlock_t mq_flush_lock; unsigned int flush_pending_idx:1; unsigned int flush_running_idx:1; blk_status_t rq_status; unsigned long flush_pending_since; struct list_head flush_queue[2]; - struct list_head flush_data_in_flight; + unsigned long flush_data_in_flight; struct request *flush_rq; - - spinlock_t mq_flush_lock; }; bool is_flush_rq(struct request *req); From patchwork Mon Jul 17 04:00:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chengming Zhou X-Patchwork-Id: 13315174 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7971EB64DC for ; Mon, 17 Jul 2023 04:02:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231250AbjGQECN (ORCPT ); Mon, 17 Jul 2023 00:02:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51286 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230502AbjGQECC (ORCPT ); Mon, 17 Jul 2023 00:02:02 -0400 Received: from out-17.mta0.migadu.com (out-17.mta0.migadu.com [91.218.175.17]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA0CE1B8 for ; Sun, 16 Jul 2023 21:01:51 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1689566509; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L62IGS5yyrINSCSaBLEOOOMpycO+zZjyfsnojgz6mDg=; b=qxiNEW0Gf1Z9a7yqlE6MdCGCZ93ZTCLqy2TQPpQqsynOcXPGx0yCxEYYdteDz8U64+tFqt j32AmN9HvZC8brL+sUv/zp5ZGOv+3y7BTWhIZS2npJ1rnQcTdhTDJ1HJl3tLW9Mb5th/J5 p/DtxG4I7FIIn2nslGxhivLDtgi8DGI= From: chengming.zhou@linux.dev To: axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de, bvanassche@acm.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com Subject: [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine Date: Mon, 17 Jul 2023 12:00:58 +0800 Message-ID: <20230717040058.3993930-5-chengming.zhou@linux.dev> In-Reply-To: <20230717040058.3993930-1-chengming.zhou@linux.dev> References: <20230717040058.3993930-1-chengming.zhou@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Chengming Zhou Since we don't need to maintain inflight flush_data requests list anymore, we can reuse rq->queuelist for flush pending list. Note in mq_flush_data_end_io(), we need to re-initialize rq->queuelist before reusing it in the state machine when end, since the rq->rq_next also reuse it, may have corrupted rq->queuelist by the driver. This patch decrease the size of struct request by 16 bytes. Signed-off-by: Chengming Zhou Reviewed-by: Christoph Hellwig Reviewed-by: Ming Lei --- block/blk-flush.c | 17 ++++++++++------- include/linux/blk-mq.h | 1 - 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index fedb39031647..e73dc22d05c1 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -183,14 +183,13 @@ static void blk_flush_complete_seq(struct request *rq, /* queue for flush */ if (list_empty(pending)) fq->flush_pending_since = jiffies; - list_move_tail(&rq->flush.list, pending); + list_move_tail(&rq->queuelist, pending); break; case REQ_FSEQ_DATA: - list_del_init(&rq->flush.list); fq->flush_data_in_flight++; spin_lock(&q->requeue_lock); - list_add(&rq->queuelist, &q->requeue_list); + list_move(&rq->queuelist, &q->requeue_list); spin_unlock(&q->requeue_lock); blk_mq_kick_requeue_list(q); break; @@ -202,7 +201,7 @@ static void blk_flush_complete_seq(struct request *rq, * flush data request completion path. Restore @rq for * normal completion and end it. */ - list_del_init(&rq->flush.list); + list_del_init(&rq->queuelist); blk_flush_restore_request(rq); blk_mq_end_request(rq, error); break; @@ -258,7 +257,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq, fq->flush_running_idx ^= 1; /* and push the waiting requests to the next stage */ - list_for_each_entry_safe(rq, n, running, flush.list) { + list_for_each_entry_safe(rq, n, running, queuelist) { unsigned int seq = blk_flush_cur_seq(rq); BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH); @@ -292,7 +291,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq, { struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx]; struct request *first_rq = - list_first_entry(pending, struct request, flush.list); + list_first_entry(pending, struct request, queuelist); struct request *flush_rq = fq->flush_rq; /* C1 described at the top of this file */ @@ -376,6 +375,11 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq, */ spin_lock_irqsave(&fq->mq_flush_lock, flags); fq->flush_data_in_flight--; + /* + * May have been corrupted by rq->rq_next reuse, we need to + * re-initialize rq->queuelist before reusing it here. + */ + INIT_LIST_HEAD(&rq->queuelist); blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error); spin_unlock_irqrestore(&fq->mq_flush_lock, flags); @@ -386,7 +390,6 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq, static void blk_rq_init_flush(struct request *rq) { rq->flush.seq = 0; - INIT_LIST_HEAD(&rq->flush.list); rq->rq_flags |= RQF_FLUSH_SEQ; rq->flush.saved_end_io = rq->end_io; /* Usually NULL */ rq->end_io = mq_flush_data_end_io; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 67f810857634..01e8c31db665 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -178,7 +178,6 @@ struct request { struct { unsigned int seq; - struct list_head list; rq_end_io_fn *saved_end_io; } flush;