From patchwork Wed Dec 23 11:26:18 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988231
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 1/7] block: move definition of blk_qc_t to types.h
Date: Wed, 23 Dec 2020 19:26:18 +0800
Message-Id: <20201223112624.78955-2-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

Move the definition of blk_qc_t into types.h, so that kiocb.ki_cookie can
be defined as blk_qc_t, which enforces the encapsulation.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
---
 include/linux/blk_types.h | 2 +-
 include/linux/fs.h        | 2 +-
 include/linux/types.h     | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 866f74261b3b..2e05244fc16d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -532,7 +532,7 @@ static inline int op_stat_group(unsigned int op)
 	return op_is_write(op);
 }
 
-typedef unsigned int blk_qc_t;
+/* Macros for blk_qc_t */
 #define BLK_QC_T_NONE		-1U
 #define BLK_QC_T_SHIFT		16
 #define BLK_QC_T_INTERNAL	(1U << 31)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ad4cf1bae586..58db714c4834 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -330,7 +330,7 @@ struct kiocb {
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
 	union {
-		unsigned int		ki_cookie; /* for ->iopoll */
+		blk_qc_t		ki_cookie; /* for ->iopoll */
 		struct wait_page_queue	*ki_waitq; /* for async buffered IO */
 	};
 
diff --git a/include/linux/types.h b/include/linux/types.h
index a147977602b5..da5ca7e1bea9 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -125,6 +125,9 @@ typedef s64 int64_t;
 typedef u64 sector_t;
 typedef u64 blkcnt_t;
 
+/* cookie used for IO polling */
+typedef unsigned int blk_qc_t;
+
 /*
  * The type of an index into the pagecache.
  */

From patchwork Wed Dec 23 11:26:19 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988243
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 2/7] block: add helper function fetching gendisk from queue
Date: Wed, 23 Dec 2020 19:26:19 +0800
Message-Id: <20201223112624.78955-3-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org
Sometimes we need to get the corresponding gendisk from a request_queue.
One such use case is when a block device driver used to store the same
private data both in queue->queuedata and gendisk->private_data, while
nowadays gendisk->private_data is preferred in such a case, e.g. commit
c4a59c4e5db3 ("dm: stop using ->queuedata"). So if only the request_queue
is given, we need to get the corresponding gendisk from the queue to reach
the private data stored in the gendisk.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/linux/blkdev.h       | 2 ++
 include/trace/events/kyber.h | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 070de09425ad..2303d06a5a82 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -691,6 +691,8 @@ static inline bool blk_account_rq(struct request *rq)
 	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
 	(dir), (attrs))
 
+#define queue_to_disk(q)	(dev_to_disk(kobj_to_dev((q)->kobj.parent)))
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h
index c0e7d24ca256..f9802562edf6 100644
--- a/include/trace/events/kyber.h
+++ b/include/trace/events/kyber.h
@@ -30,7 +30,7 @@ TRACE_EVENT(kyber_latency,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev		= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		strlcpy(__entry->type, type, sizeof(__entry->type));
 		__entry->percentile	= percentile;
@@ -59,7 +59,7 @@ TRACE_EVENT(kyber_adjust,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev		= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		__entry->depth		= depth;
 	),
@@ -81,7 +81,7 @@ TRACE_EVENT(kyber_throttled,
 	),
 
 	TP_fast_assign(
-		__entry->dev	= disk_devt(dev_to_disk(kobj_to_dev(q->kobj.parent)));
+		__entry->dev	= disk_devt(queue_to_disk(q));
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 	),
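As a usage illustration, here is a minimal sketch of how a driver-side
helper could use the new queue_to_disk() macro to reach private data kept
in the gendisk; struct example_dev and example_dev_from_queue() are
hypothetical names, not part of this series:

	#include <linux/blkdev.h>

	struct example_dev {
		/* driver-private state ... */
	};

	/* Given only a request_queue, reach the data stored in the gendisk. */
	static struct example_dev *example_dev_from_queue(struct request_queue *q)
	{
		/* gendisk->private_data is preferred over q->queuedata nowadays */
		return queue_to_disk(q)->private_data;
	}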
From patchwork Wed Dec 23 11:26:20 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988237
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 3/7] block: add iopoll method for non-mq device
Date: Wed, 23 Dec 2020 19:26:20 +0800
Message-Id: <20201223112624.78955-4-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

->poll_fn was introduced in commit ea435e1b9392 ("block: add a poll_fn
callback to struct request_queue") to support non-mq queues such as nvme
multipath, but was removed in commit 529262d56dbe ("block: remove
->poll_fn"). To support IO polling for non-mq devices, this method needs
to be brought back. Since commit c62b37d96b6e ("block: move
->make_request_fn to struct block_device_operations") moved all callbacks
into struct block_device_operations in the gendisk, we add the new method,
named ->iopoll, there as well.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 block/blk-core.c       | 79 ++++++++++++++++++++++++++++++++++++++++++
 block/blk-mq.c         | 70 +++++-------------------------------
 include/linux/blk-mq.h |  3 ++
 include/linux/blkdev.h |  1 +
 4 files changed, 92 insertions(+), 61 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 96e5fcd7f071..2f5c51ce32e3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1131,6 +1131,85 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
+static bool blk_poll_hybrid(struct request_queue *q, blk_qc_t cookie)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	/* TODO: bio-based device doesn't support hybrid poll. */
+	if (!queue_is_mq(q))
+		return false;
+
+	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
+	if (blk_mq_poll_hybrid(q, hctx, cookie))
+		return true;
+
+	hctx->poll_considered++;
+	return false;
+}
+
+/**
+ * blk_poll - poll for IO completions
+ * @q:  the queue
+ * @cookie: cookie passed back at IO submission time
+ * @spin: whether to spin for completions
+ *
+ * Description:
+ *    Poll for completions on the passed in queue. Returns number of
+ *    completed entries found. If @spin is true, then blk_poll will continue
+ *    looping until at least one completion is found, unless the task is
+ *    otherwise marked running (or we need to reschedule).
+ */
+int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	long state;
+
+	if (!blk_qc_t_valid(cookie) ||
+	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+		return 0;
+
+	if (current->plug)
+		blk_flush_plug_list(current->plug, false);
+
+	/*
+	 * If we sleep, have the caller restart the poll loop to reset
+	 * the state. Like for the other success return cases, the
+	 * caller is responsible for checking if the IO completed. If
+	 * the IO isn't complete, we'll get called again and will go
+	 * straight to the busy poll loop. If specified not to spin,
+	 * we also should not sleep.
+	 */
+	if (spin && blk_poll_hybrid(q, cookie))
+		return 1;
+
+	state = current->state;
+	do {
+		int ret;
+		struct gendisk *disk = queue_to_disk(q);
+
+		if (disk->fops->iopoll)
+			ret = disk->fops->iopoll(q, cookie);
+		else
+			ret = blk_mq_poll(q, cookie);
+		if (ret > 0) {
+			__set_current_state(TASK_RUNNING);
+			return ret;
+		}
+
+		if (signal_pending_state(state, current))
+			__set_current_state(TASK_RUNNING);
+
+		if (current->state == TASK_RUNNING)
+			return 1;
+		if (ret < 0 || !spin)
+			break;
+		cpu_relax();
+	} while (!need_resched());
+
+	__set_current_state(TASK_RUNNING);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blk_poll);
+
 /**
  * blk_cloned_rq_check_limits - Helper function to check a cloned request
  * for the new queue limits
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b09ce00cc6af..85258958e9f1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3818,8 +3818,8 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	return true;
 }
 
-static bool blk_mq_poll_hybrid(struct request_queue *q,
-		struct blk_mq_hw_ctx *hctx, blk_qc_t cookie)
+bool blk_mq_poll_hybrid(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx, blk_qc_t cookie)
 {
 	struct request *rq;
 
@@ -3843,72 +3843,20 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
-/**
- * blk_poll - poll for IO completions
- * @q:  the queue
- * @cookie: cookie passed back at IO submission time
- * @spin: whether to spin for completions
- *
- * Description:
- *    Poll for completions on the passed in queue. Returns number of
- *    completed entries found. If @spin is true, then blk_poll will continue
- *    looping until at least one completion is found, unless the task is
- *    otherwise marked running (or we need to reschedule).
- */
-int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
+	int ret;
 	struct blk_mq_hw_ctx *hctx;
-	long state;
-
-	if (!blk_qc_t_valid(cookie) ||
-	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		return 0;
-
-	if (current->plug)
-		blk_flush_plug_list(current->plug, false);
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 
-	/*
-	 * If we sleep, have the caller restart the poll loop to reset
-	 * the state. Like for the other success return cases, the
-	 * caller is responsible for checking if the IO completed. If
-	 * the IO isn't complete, we'll get called again and will go
-	 * straight to the busy poll loop. If specified not to spin,
-	 * we also should not sleep.
-	 */
-	if (spin && blk_mq_poll_hybrid(q, hctx, cookie))
-		return 1;
-
-	hctx->poll_considered++;
+	hctx->poll_invoked++;
+	ret = q->mq_ops->poll(hctx);
+	if (ret > 0)
+		hctx->poll_success++;
 
-	state = current->state;
-	do {
-		int ret;
-
-		hctx->poll_invoked++;
-
-		ret = q->mq_ops->poll(hctx);
-		if (ret > 0) {
-			hctx->poll_success++;
-			__set_current_state(TASK_RUNNING);
-			return ret;
-		}
-
-		if (signal_pending_state(state, current))
-			__set_current_state(TASK_RUNNING);
-
-		if (current->state == TASK_RUNNING)
-			return 1;
-		if (ret < 0 || !spin)
-			break;
-		cpu_relax();
-	} while (!need_resched());
-
-	__set_current_state(TASK_RUNNING);
-	return 0;
+	return ret;
 }
-EXPORT_SYMBOL_GPL(blk_poll);
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 47b021952ac7..032e08ecd42e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -607,6 +607,9 @@ static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio,
 }
 
 blk_qc_t blk_mq_submit_bio(struct bio *bio);
+int blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+bool blk_mq_poll_hybrid(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx, blk_qc_t cookie);
 void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
 		struct lock_class_key *key);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2303d06a5a82..e8965879eb90 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1845,6 +1845,7 @@ static inline void blk_ksm_unregister(struct request_queue *q) { }
 
 struct block_device_operations {
 	blk_qc_t (*submit_bio) (struct bio *bio);
+	int (*iopoll)(struct request_queue *q, blk_qc_t cookie);
 	int (*open) (struct block_device *, fmode_t);
 	void (*release) (struct gendisk *, fmode_t);
 	int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int);
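To make the new method's contract concrete, below is an illustrative
sketch (not part of this series) of what a bio-based driver's ->iopoll
could look like; all example_* names are hypothetical. The return
convention mirrors blk_mq_poll(): greater than zero means that many
completions were reaped, zero means nothing has completed yet, and
negative is an error; blk_poll() keeps iterating while zero is returned
and @spin is set.

	/* Hypothetical submission side, returning a pollable cookie. */
	static blk_qc_t example_submit_bio(struct bio *bio)
	{
		/* queue @bio somewhere pollable, return a cookie for it ... */
		return BLK_QC_T_NONE;
	}

	static int example_iopoll(struct request_queue *q, blk_qc_t cookie)
	{
		int completed = 0;

		/* reap driver-specific completions associated with @cookie ... */

		return completed;
	}

	static const struct block_device_operations example_fops = {
		.submit_bio	= example_submit_bio,
		.iopoll		= example_iopoll,
	};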
From patchwork Wed Dec 23 11:26:21 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988229
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 4/7] block: define blk_qc_t as uintptr_t
Date: Wed, 23 Dec 2020 19:26:21 +0800
Message-Id: <20201223112624.78955-5-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

To support iopoll for bio-based devices, the returned cookie is actually
a pointer to an implementation-specific object, e.g. an object maintaining
all split bios. In that case blk_qc_t must be large enough to hold a
pointer, so uintptr_t is used here. Since uintptr_t is an integer type in
essence, no type casting is needed in the original mq path, while casting
is indeed needed in the bio-based routine.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/linux/types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/types.h b/include/linux/types.h
index da5ca7e1bea9..f6301014a459 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -126,7 +126,7 @@ typedef u64 sector_t;
 typedef u64 blkcnt_t;
 
 /* cookie used for IO polling */
-typedef unsigned int blk_qc_t;
+typedef uintptr_t blk_qc_t;
 
 /*
  * The type of an index into the pagecache.
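A minimal sketch of the round-trip this change permits; these helpers are
hypothetical, and the real casts are performed later in the series, in
__submit_bio_noacct() and blk_bio_poll() of the split-bio tracking patch:

	#include <linux/types.h>

	struct bio;

	/* Submission side: hand a pointer back as the polling cookie. */
	static inline blk_qc_t example_cookie_from_bio(struct bio *root)
	{
		/* uintptr_t holds a pointer losslessly */
		return (blk_qc_t)root;
	}

	/* Polling side: recover the object from the cookie. */
	static inline struct bio *example_bio_from_cookie(blk_qc_t cookie)
	{
		return (struct bio *)cookie;
	}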
From patchwork Wed Dec 23 11:26:22 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988233
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 5/7] dm: always return BLK_QC_T_NONE for bio-based device
Date: Wed, 23 Dec 2020 19:26:22 +0800
Message-Id: <20201223112624.78955-6-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

Currently the returned cookie of a bio-based device is not used at all.
In the following patches, a bio-based device will actually return a
pointer to a specific object as the returned cookie.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5b2f371ec4bb..03c2b867acaa 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1252,14 +1252,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
-static blk_qc_t __map_bio(struct dm_target_io *tio)
+static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
 	struct bio *clone = &tio->clone;
 	struct dm_io *io = tio->io;
 	struct dm_target *ti = tio->ti;
-	blk_qc_t ret = BLK_QC_T_NONE;
 
 	clone->bi_end_io = clone_endio;
 
@@ -1278,7 +1277,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 	case DM_MAPIO_REMAPPED:
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone, bio_dev(io->orig_bio), sector);
-		ret = submit_bio_noacct(clone);
+		submit_bio_noacct(clone);
 		break;
 	case DM_MAPIO_KILL:
 		free_tio(tio);
@@ -1292,8 +1291,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		DMWARN("unimplemented target map return value: %d", r);
 		BUG();
 	}
-
-	return ret;
 }
 
 static void bio_setup_sector(struct bio *bio, sector_t sector, unsigned len)
@@ -1380,7 +1377,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
 	}
 }
 
-static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
+static void __clone_and_map_simple_bio(struct clone_info *ci,
 		struct dm_target_io *tio, unsigned *len)
 {
 	struct bio *clone = &tio->clone;
@@ -1391,7 +1388,7 @@ static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
 	if (len)
 		bio_setup_sector(clone, ci->sector, *len);
 
-	return __map_bio(tio);
+	__map_bio(tio);
 }
 
 static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
@@ -1405,7 +1402,7 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 
 	while ((bio = bio_list_pop(&blist))) {
 		tio = container_of(bio, struct dm_target_io, clone);
-		(void) __clone_and_map_simple_bio(ci, tio, len);
+		__clone_and_map_simple_bio(ci, tio, len);
 	}
 }
 
@@ -1450,7 +1447,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 		free_tio(tio);
 		return r;
 	}
-	(void) __map_bio(tio);
+	__map_bio(tio);
 
 	return 0;
 }
@@ -1565,11 +1562,10 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 /*
  * Entry point to split a bio into clones and submit them to the targets.
 */
-static blk_qc_t __split_and_process_bio(struct mapped_device *md,
+static void __split_and_process_bio(struct mapped_device *md,
 					struct dm_table *map, struct bio *bio)
 {
 	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
 	int error = 0;
 
 	init_clone_info(&ci, md, map, bio);
@@ -1613,7 +1609,7 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 			bio_chain(b, bio);
 			trace_block_split(b, bio->bi_iter.bi_sector);
-			ret = submit_bio_noacct(bio);
+			submit_bio_noacct(bio);
 			break;
 		}
 	}
@@ -1621,13 +1617,11 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 
 	/* drop the extra reference count */
 	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
 }
 
 static blk_qc_t dm_submit_bio(struct bio *bio)
 {
 	struct mapped_device *md = bio->bi_disk->private_data;
-	blk_qc_t ret = BLK_QC_T_NONE;
 	int srcu_idx;
 	struct dm_table *map;
 
@@ -1657,10 +1651,10 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
 	if (is_abnormal_io(bio))
 		blk_queue_split(&bio);
 
-	ret = __split_and_process_bio(md, map, bio);
+	__split_and_process_bio(md, map, bio);
 out:
 	dm_put_live_table(md, srcu_idx);
-	return ret;
+	return BLK_QC_T_NONE;
 }
 
 /*-----------------------------------------------------------------

From patchwork Wed Dec 23 11:26:23 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988241
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 6/7] block: track cookies of split bios for bio-based device
Date: Wed, 23 Dec 2020 19:26:23 +0800
Message-Id: <20201223112624.78955-7-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

This is actually the core of iopoll support for bio-based devices.
A list is maintained in the top bio (the original bio submitted to the dm
device) to track all valid cookies of split bios. The IO polling routine
iterates this list and polls the corresponding hardware queues of the
underlying mq devices.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 block/bio.c               |  8 ++++
 block/blk-core.c          | 84 ++++++++++++++++++++++++++++++++++++++-
 include/linux/blk_types.h | 39 ++++++++++++++++++
 3 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..ca6d1a7ee196 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -284,6 +284,10 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 
 	bio->bi_io_vec = table;
 	bio->bi_max_vecs = max_vecs;
+
+	INIT_LIST_HEAD(&bio->bi_plist);
+	INIT_LIST_HEAD(&bio->bi_pnode);
+	spin_lock_init(&bio->bi_plock);
 }
 EXPORT_SYMBOL(bio_init);
 
@@ -689,6 +693,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
+	bio->bi_root = bio_src->bi_root;
 
 	bio_clone_blkg_association(bio, bio_src);
 	blkcg_bio_issue_init(bio);
@@ -1425,6 +1430,8 @@ void bio_endio(struct bio *bio)
 	if (bio->bi_disk)
 		rq_qos_done_bio(bio->bi_disk->queue, bio);
 
+	bio_del_poll_list(bio);
+
 	/*
 	 * Need to have a real endio function for chained bios, otherwise
 	 * various corner cases will break (like stacking block devices that
@@ -1446,6 +1453,7 @@ void bio_endio(struct bio *bio)
 	blk_throtl_bio_endio(bio);
 	/* release cgroup info */
 	bio_uninit(bio);
+
 	if (bio->bi_end_io)
 		bio->bi_end_io(bio);
 }
diff --git a/block/blk-core.c b/block/blk-core.c
index 2f5c51ce32e3..5a332af01939 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -960,12 +960,31 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 {
 	struct bio_list bio_list_on_stack[2];
 	blk_qc_t ret = BLK_QC_T_NONE;
+	bool iopoll;
+	struct bio *root;
 
 	BUG_ON(bio->bi_next);
 
 	bio_list_init(&bio_list_on_stack[0]);
 	current->bio_list = bio_list_on_stack;
 
+	iopoll = test_bit(QUEUE_FLAG_POLL, &bio->bi_disk->queue->queue_flags);
+	iopoll = iopoll && (bio->bi_opf & REQ_HIPRI);
+
+	if (iopoll) {
+		bio->bi_root = root = bio;
+		/*
+		 * We need to pin root bio here since there's a reference from
+		 * the returned cookie. bio_get() is not enough since the whole
+		 * bio and the corresponding kiocb/dio may have already
+		 * completed and thus won't call blk_poll() at all, in which
+		 * case the pairing bio_put() in blk_bio_poll() won't be called.
+		 * The side effect of bio_inc_remaining() is that, the whole bio
+		 * won't complete until blk_poll() called.
+		 */
+		bio_inc_remaining(root);
+	}
+
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
 		struct bio_list lower, same;
@@ -979,7 +998,18 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		bio_list_on_stack[1] = bio_list_on_stack[0];
 		bio_list_init(&bio_list_on_stack[0]);
 
-		ret = __submit_bio(bio);
+		if (iopoll) {
+			/* See the comments of above bio_inc_remaining(). */
+			bio_inc_remaining(bio);
+			bio->bi_cookie = __submit_bio(bio);
+
+			if (blk_qc_t_valid(bio->bi_cookie))
+				bio_add_poll_list(bio);
+
+			bio_endio(bio);
+		} else {
+			ret = __submit_bio(bio);
+		}
 
 		/*
 		 * Sort new bios into those for a lower level and those for the
@@ -1002,7 +1032,11 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 	} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
 
 	current->bio_list = NULL;
-	return ret;
+
+	if (iopoll)
+		return (blk_qc_t)root;
+
+	return BLK_QC_T_NONE;
 }
 
 static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
@@ -1131,6 +1165,52 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
+int blk_bio_poll(struct request_queue *q, blk_qc_t cookie)
+{
+	int ret = 0;
+	struct bio *bio, *root = (struct bio *)cookie;
+
+	if (list_empty(&root->bi_plist)) {
+		bio_endio(root);
+		return 1;
+	}
+
+	spin_lock(&root->bi_plock);
+	bio = list_first_entry_or_null(&root->bi_plist, struct bio, bi_pnode);
+
+	while (bio) {
+		struct request_queue *q = bio->bi_disk->queue;
+		blk_qc_t cookie = bio->bi_cookie;
+
+		spin_unlock(&root->bi_plock);
+		BUG_ON(!blk_qc_t_valid(cookie));
+
+		ret += blk_mq_poll(q, cookie);
+
+		spin_lock(&root->bi_plock);
+		/*
+		 * One blk_mq_poll() call could complete multiple bios, and
+		 * thus multiple bios could be removed from root->bi_plock
+		 * list.
+		 */
+		bio = list_first_entry_or_null(&root->bi_plist, struct bio, bi_pnode);
+	}
+
+	spin_unlock(&root->bi_plock);
+
+	if (list_empty(&root->bi_plist)) {
+		bio_endio(root);
+		/*
+		 * 'ret' may be 0 here. root->bi_plist may be empty once we
+		 * acquire the list spinlock.
+		 */
+		ret = max(ret, 1);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(blk_bio_poll);
+
 static bool blk_poll_hybrid(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 2e05244fc16d..2cf5d8f0ea34 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -277,6 +277,12 @@ struct bio {
 
 	struct bio_set		*bi_pool;
 
+	struct bio		*bi_root;	/* original bio of submit_bio() */
+	struct list_head	bi_plist;
+	struct list_head	bi_pnode;
+	struct spinlock		bi_plock;
+	blk_qc_t		bi_cookie;
+
 	/*
 	 * We can inline a number of vecs at the end of the bio, to avoid
 	 * double allocations for a small number of bio_vecs. This member
@@ -557,6 +563,39 @@ static inline bool blk_qc_t_is_internal(blk_qc_t cookie)
 	return (cookie & BLK_QC_T_INTERNAL) != 0;
 }
 
+static inline void bio_add_poll_list(struct bio *bio)
+{
+	struct bio *root = bio->bi_root;
+
+	/*
+	 * The spin_lock() variant is enough since bios in root->bi_plist are
+	 * all enqueued into polling mode hardware queue, thus the list_del()
+	 * operation is handled only in process context.
+	 */
+	spin_lock(&root->bi_plock);
+	list_add_tail(&bio->bi_pnode, &root->bi_plist);
+	spin_unlock(&root->bi_plock);
+}
+
+static inline void bio_del_poll_list(struct bio *bio)
+{
+	struct bio *root = bio->bi_root;
+
+	/*
+	 * bios in mq routine: @bi_root is NULL, @bi_cookie is 0;
+	 * bios in bio-based routine: @bi_root is non-NULL, @bi_cookie is valid
+	 * (including 0) for those in root->bi_plist, invalid for the
+	 * remaining.
+	 */
+	if (bio->bi_root && blk_qc_t_valid(bio->bi_cookie)) {
+		spin_lock(&root->bi_plock);
+		list_del(&bio->bi_pnode);
+		spin_unlock(&root->bi_plock);
+	}
+}
+
+int blk_bio_poll(struct request_queue *q, blk_qc_t cookie);
+
 struct blk_rq_stat {
 	u64 mean;
 	u64 min;
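The list walk in blk_bio_poll() above follows a common discipline: copy
out whatever is needed under the lock, drop the lock for the actual work,
then reacquire the lock and re-fetch the list head rather than trusting a
saved next pointer, since one blk_mq_poll() call may complete and unlink
several bios. Below is a standalone userspace rendering of the same
discipline, with a pthread mutex standing in for bi_plock; every name in
it is hypothetical:

	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct node {
		struct node *prev, *next;	/* circular doubly linked list */
	};

	static struct node head = { &head, &head };
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

	static void node_add_tail(struct node *n)
	{
		pthread_mutex_lock(&lock);
		n->prev = head.prev;
		n->next = &head;
		head.prev->next = n;
		head.prev = n;
		pthread_mutex_unlock(&lock);
	}

	/* Completion path: unlinks an entry under the lock, then frees it. */
	static void node_complete(struct node *n)
	{
		pthread_mutex_lock(&lock);
		n->prev->next = n->next;
		n->next->prev = n->prev;
		pthread_mutex_unlock(&lock);
		free(n);
	}

	/*
	 * Poll loop in the style of blk_bio_poll(): take the first entry
	 * under the lock, drop the lock for the work, then reacquire it and
	 * re-fetch the head, because entries may have vanished meanwhile.
	 */
	static int poll_all(void)
	{
		int polled = 0;

		pthread_mutex_lock(&lock);
		while (head.next != &head) {
			struct node *n = head.next;

			pthread_mutex_unlock(&lock);
			node_complete(n);	/* stands in for blk_mq_poll() */
			polled++;
			pthread_mutex_lock(&lock);
		}
		pthread_mutex_unlock(&lock);

		return polled;
	}

	int main(void)
	{
		for (int i = 0; i < 3; i++)
			node_add_tail(malloc(sizeof(struct node)));

		printf("polled %d entries\n", poll_all());
		return 0;
	}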
From patchwork Wed Dec 23 11:26:24 2020
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 11988239
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com, io-uring@vger.kernel.org
Subject: [PATCH RFC 7/7] dm: add support for IO polling
Date: Wed, 23 Dec 2020 19:26:24 +0800
Message-Id: <20201223112624.78955-8-jefflexu@linux.alibaba.com>
In-Reply-To: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
References: <20201223112624.78955-1-jefflexu@linux.alibaba.com>
X-Mailing-List: linux-block@vger.kernel.org

Enable iopoll when all underlying target devices support iopoll.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 drivers/md/dm-table.c | 28 ++++++++++++++++++++++++++++
 drivers/md/dm.c       |  1 +
 2 files changed, 29 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 188f41287f18..b0cd5bf58c3c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1791,6 +1791,31 @@ static bool dm_table_requires_stable_pages(struct dm_table *t)
 	return false;
 }
 
+static int device_supports_poll(struct dm_target *ti, struct dm_dev *dev,
+				sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && test_bit(QUEUE_FLAG_POLL, &q->queue_flags);
+}
+
+static bool dm_table_supports_poll(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned int i;
+
+	/* Ensure that all targets support iopoll. */
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (!ti->type->iterate_devices ||
+		    !ti->type->iterate_devices(ti, device_supports_poll, NULL))
+			return false;
+	}
+
+	return true;
+}
+
 void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 			       struct queue_limits *limits)
 {
@@ -1883,6 +1908,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 #endif
 
 	blk_queue_update_readahead(q);
+
+	if (dm_table_supports_poll(t))
+		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 03c2b867acaa..ffd2c5ead256 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3049,6 +3049,7 @@ static const struct pr_ops dm_pr_ops = {
 };
 
 static const struct block_device_operations dm_blk_dops = {
+	.iopoll = blk_bio_poll,
 	.submit_bio = dm_submit_bio,
 	.open = dm_blk_open,
 	.release = dm_blk_close,
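Once a dm table advertises QUEUE_FLAG_POLL, the path can be exercised from
userspace with a synchronous polled read: preadv2() with RWF_HIPRI makes
the kernel busy-poll for the completion via blk_poll(), which with this
series lands in blk_bio_poll() for dm. A hedged usage sketch, with
/dev/mapper/example as a placeholder device name:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/uio.h>
	#include <unistd.h>

	int main(void)
	{
		struct iovec iov;
		void *buf;
		int fd;

		fd = open("/dev/mapper/example", O_RDONLY | O_DIRECT);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* O_DIRECT wants block-aligned buffers and sizes. */
		if (posix_memalign(&buf, 4096, 4096))
			return 1;
		iov.iov_base = buf;
		iov.iov_len = 4096;

		/* RWF_HIPRI asks the kernel to poll for the completion. */
		if (preadv2(fd, &iov, 1, 0, RWF_HIPRI) < 0) {
			perror("preadv2");
			return 1;
		}

		free(buf);
		close(fd);
		return 0;
	}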