From patchwork Thu Apr 27 12:44:08 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13225458
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, ZiyangZhang, Harris James R, Ming Lei, Pavel Begunkov
Subject: [PATCH V2 1/7] ublk: kill queuing request by task_work_add
Date: Thu, 27 Apr 2023 20:44:08 +0800
Message-Id: <20230427124414.319945-2-ming.lei@redhat.com>
In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com>
References: <20230427124414.319945-1-ming.lei@redhat.com>

task_work_add() has been used since the early ublk development stage for
handling requests in batch. However, since commit 7d4a93176e01 ("ublk_drv:
don't forward io commands in reserve order"), we can get similar batch
processing with io_uring_cmd_complete_in_task(), and similar performance
data is observed between task_work_add() and io_uring_cmd_complete_in_task().

Meanwhile we can kill one fast code path, which is actually seldom used
given that it is common to build the ublk driver as a module.
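For reference, the single dispatch path that remains after this patch looks
roughly like the condensed sketch below (comments and a few details trimmed;
see the diff for the exact change). io_uring_cmd_complete_in_task() runs the
callback as task work in the ubq daemon task, so io commands are still
forwarded in batch much like with task_work_add(..., TWA_SIGNAL_NO_IPI):

static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd,
				 unsigned issue_flags)
{
	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);

	/* runs in the ubq daemon's task context, handles queued rqs in batch */
	ublk_forward_io_cmds(pdu->ubq, issue_flags);
}

static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
{
	struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
	struct ublk_io *io = &ubq->ios[rq->tag];

	/* requests are collected on a per-queue llist and drained in batch */
	llist_add(&data->node, &ubq->io_cmds);

	if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) {
		ublk_abort_io_cmds(ubq);
	} else {
		struct io_uring_cmd *cmd = io->cmd;
		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);

		pdu->ubq = ubq;
		io_uring_cmd_complete_in_task(cmd, ublk_rq_task_work_cb);
	}
}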
Cc: Pavel Begunkov Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 40 ++-------------------------------------- 1 file changed, 2 insertions(+), 38 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index afbef182820b..068d8b99605b 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -62,7 +62,6 @@ struct ublk_rq_data { struct llist_node node; - struct callback_head work; }; struct ublk_uring_cmd_pdu { @@ -289,14 +288,6 @@ static int ublk_apply_params(struct ublk_device *ub) return 0; } -static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) -{ - if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) && - !(ubq->flags & UBLK_F_URING_CMD_COMP_IN_TASK)) - return true; - return false; -} - static inline bool ublk_need_get_data(const struct ublk_queue *ubq) { return ubq->flags & UBLK_F_NEED_GET_DATA; @@ -851,17 +842,6 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags) ublk_forward_io_cmds(ubq, issue_flags); } -static void ublk_rq_task_work_fn(struct callback_head *work) -{ - struct ublk_rq_data *data = container_of(work, - struct ublk_rq_data, work); - struct request *req = blk_mq_rq_from_pdu(data); - struct ublk_queue *ubq = req->mq_hctx->driver_data; - unsigned issue_flags = IO_URING_F_UNLOCKED; - - ublk_forward_io_cmds(ubq, issue_flags); -} - static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) { struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); @@ -885,10 +865,6 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) */ if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) { ublk_abort_io_cmds(ubq); - } else if (ublk_can_use_task_work(ubq)) { - if (task_work_add(ubq->ubq_daemon, &data->work, - TWA_SIGNAL_NO_IPI)) - ublk_abort_io_cmds(ubq); } else { struct io_uring_cmd *cmd = io->cmd; struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); @@ -944,19 +920,9 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data, return 0; } -static int ublk_init_rq(struct blk_mq_tag_set *set, struct request *req, - unsigned int hctx_idx, unsigned int numa_node) -{ - struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); - - init_task_work(&data->work, ublk_rq_task_work_fn); - return 0; -} - static const struct blk_mq_ops ublk_mq_ops = { .queue_rq = ublk_queue_rq, .init_hctx = ublk_init_hctx, - .init_request = ublk_init_rq, }; static int ublk_ch_open(struct inode *inode, struct file *filp) @@ -1783,10 +1749,8 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) */ ub->dev_info.flags &= UBLK_F_ALL; - if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) - ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; - - ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE; + ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE | + UBLK_F_URING_CMD_COMP_IN_TASK; /* We are not ready to support zero copy */ ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; From patchwork Thu Apr 27 12:44:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13225457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C91EC77B73 for ; Thu, 27 Apr 2023 12:45:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243748AbjD0MpO (ORCPT ); Thu, 27 Apr 2023 08:45:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56752 
"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243510AbjD0MpN (ORCPT ); Thu, 27 Apr 2023 08:45:13 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05DE24233 for ; Thu, 27 Apr 2023 05:44:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1682599473; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dehvtx8JiMw1iS4HkfW+eV7Iijc2rl3zXXDIKkCePzc=; b=apApMMoEjFcwV4RlChWHRCTLrfVIuBmQj7qKvLCf7q242IiGWNdHTn0K5fwguqw8oN/k/y mhRGdVro9lF0RqZaXl83Bt1/38wNW4wAfMHf0ydJk8HPJ2uHbQ90Vl8Vrkbp++b77l1IGX kruE6aTZTmyrYMOHVJ1ZELmToiZrkzo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-70-o1tqt8aHP5SNzeRZeaiLIA-1; Thu, 27 Apr 2023 08:44:27 -0400 X-MC-Unique: o1tqt8aHP5SNzeRZeaiLIA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 261AF2A5955D; Thu, 27 Apr 2023 12:44:27 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5968240C2064; Thu, 27 Apr 2023 12:44:26 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, ZiyangZhang , Harris James R , Ming Lei Subject: [PATCH V2 2/7] ublk: cleanup io cmd code path by adding ublk_fill_io() Date: Thu, 27 Apr 2023 20:44:09 +0800 Message-Id: <20230427124414.319945-3-ming.lei@redhat.com> In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com> References: <20230427124414.319945-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add one small helper to cleanup io command hanlding code path. 
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 068d8b99605b..0980c9fec669 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1238,6 +1238,14 @@ static inline int ublk_check_cmd_op(u32 cmd_op) return 0; } +static inline void ublk_fill_io(struct ublk_io *io, struct io_uring_cmd *cmd, + unsigned long buf_addr) +{ + io->cmd = cmd; + io->flags |= UBLK_IO_FLAG_ACTIVE; + io->addr = buf_addr; +} + static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags, struct ublksrv_io_cmd *ub_cmd) @@ -1304,10 +1312,8 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */ if (!ub_cmd->addr && !ublk_need_get_data(ubq)) goto out; - io->cmd = cmd; - io->flags |= UBLK_IO_FLAG_ACTIVE; - io->addr = ub_cmd->addr; + ublk_fill_io(io, cmd, ub_cmd->addr); ublk_mark_io_ready(ub, ubq); break; case UBLK_IO_COMMIT_AND_FETCH_REQ: @@ -1320,17 +1326,13 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, goto out; if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) goto out; - io->addr = ub_cmd->addr; - io->flags |= UBLK_IO_FLAG_ACTIVE; - io->cmd = cmd; + ublk_fill_io(io, cmd, ub_cmd->addr); ublk_commit_completion(ub, ub_cmd); break; case UBLK_IO_NEED_GET_DATA: if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) goto out; - io->addr = ub_cmd->addr; - io->cmd = cmd; - io->flags |= UBLK_IO_FLAG_ACTIVE; + ublk_fill_io(io, cmd, ub_cmd->addr); ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag); break; default: From patchwork Thu Apr 27 12:44:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13225460 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D627CC77B61 for ; Thu, 27 Apr 2023 12:45:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243518AbjD0MpU (ORCPT ); Thu, 27 Apr 2023 08:45:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243761AbjD0MpT (ORCPT ); Thu, 27 Apr 2023 08:45:19 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A8A52705 for ; Thu, 27 Apr 2023 05:44:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1682599474; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gEm5i1LbavXr6i+ImxHTVweX02SduZdqexagwMfzZ0Q=; b=GtEwKIETUfPeRc/HntBk4Ha63tbwMlHvRdIKlMw2UWvuLQtMbMCt7o0srRLjeNWJqXwlYn PCyE7xidM0m/HjTEgPizN2MwVirCs0krOWIT1z8COhdsCUFy8FB+IAGDIURj9kwWEiBpWO eqkEDCAahhKaCaaEe8Z5JxSvPx07jNg= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-210-nPcD0l-DNry89qMCJOoFWg-1; Thu, 27 Apr 2023 08:44:31 -0400 X-MC-Unique: nPcD0l-DNry89qMCJOoFWg-1 Received: from 
smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 865ED2807D6C; Thu, 27 Apr 2023 12:44:30 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9177740C6EC4; Thu, 27 Apr 2023 12:44:29 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, ZiyangZhang , Harris James R , Ming Lei Subject: [PATCH V2 3/7] ublk: cleanup ublk_copy_user_pages Date: Thu, 27 Apr 2023 20:44:10 +0800 Message-Id: <20230427124414.319945-4-ming.lei@redhat.com> In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com> References: <20230427124414.319945-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Clean up ublk_copy_user_pages() by using iov_iter_get_pages2, and code gets simplified a lot and becomes much more readable than before. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 112 +++++++++++++++++---------------------- 1 file changed, 49 insertions(+), 63 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 0980c9fec669..7455c4c40fc0 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -411,49 +411,39 @@ static const struct block_device_operations ub_fops = { #define UBLK_MAX_PIN_PAGES 32 -struct ublk_map_data { - const struct request *rq; - unsigned long ubuf; - unsigned int len; -}; - struct ublk_io_iter { struct page *pages[UBLK_MAX_PIN_PAGES]; - unsigned pg_off; /* offset in the 1st page in pages */ - int nr_pages; /* how many page pointers in pages */ struct bio *bio; struct bvec_iter iter; }; -static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, - unsigned max_bytes, bool to_vm) +/* return how many pages are copied */ +static void ublk_copy_io_pages(struct ublk_io_iter *data, + size_t total, size_t pg_off, int dir) { - const unsigned total = min_t(unsigned, max_bytes, - PAGE_SIZE - data->pg_off + - ((data->nr_pages - 1) << PAGE_SHIFT)); unsigned done = 0; unsigned pg_idx = 0; while (done < total) { struct bio_vec bv = bio_iter_iovec(data->bio, data->iter); - const unsigned int bytes = min3(bv.bv_len, total - done, - (unsigned)(PAGE_SIZE - data->pg_off)); + unsigned int bytes = min3(bv.bv_len, (unsigned)total - done, + (unsigned)(PAGE_SIZE - pg_off)); void *bv_buf = bvec_kmap_local(&bv); void *pg_buf = kmap_local_page(data->pages[pg_idx]); - if (to_vm) - memcpy(pg_buf + data->pg_off, bv_buf, bytes); + if (dir == ITER_DEST) + memcpy(pg_buf + pg_off, bv_buf, bytes); else - memcpy(bv_buf, pg_buf + data->pg_off, bytes); + memcpy(bv_buf, pg_buf + pg_off, bytes); kunmap_local(pg_buf); kunmap_local(bv_buf); /* advance page array */ - data->pg_off += bytes; - if (data->pg_off == PAGE_SIZE) { + pg_off += bytes; + if (pg_off == PAGE_SIZE) { pg_idx += 1; - data->pg_off = 0; + pg_off = 0; } done += bytes; @@ -467,41 +457,40 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, data->iter = data->bio->bi_iter; } } - - return done; } -static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) +/* + * Copy data between request pages and io_iter, and 'offset' + * is the start point of linear offset of request. + */ +static size_t ublk_copy_user_pages(const struct request *req, + struct iov_iter *uiter, int dir) { - const unsigned int gup_flags = to_vm ? 
FOLL_WRITE : 0; - const unsigned long start_vm = data->ubuf; - unsigned int done = 0; struct ublk_io_iter iter = { - .pg_off = start_vm & (PAGE_SIZE - 1), - .bio = data->rq->bio, - .iter = data->rq->bio->bi_iter, + .bio = req->bio, + .iter = req->bio->bi_iter, }; - const unsigned int nr_pages = round_up(data->len + - (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT; - - while (done < nr_pages) { - const unsigned to_pin = min_t(unsigned, UBLK_MAX_PIN_PAGES, - nr_pages - done); - unsigned i, len; - - iter.nr_pages = get_user_pages_fast(start_vm + - (done << PAGE_SHIFT), to_pin, gup_flags, - iter.pages); - if (iter.nr_pages <= 0) - return done == 0 ? iter.nr_pages : done; - len = ublk_copy_io_pages(&iter, data->len, to_vm); - for (i = 0; i < iter.nr_pages; i++) { - if (to_vm) + size_t done = 0; + + while (iov_iter_count(uiter) && iter.bio) { + unsigned nr_pages; + size_t len, off; + int i; + + len = iov_iter_get_pages2(uiter, iter.pages, + iov_iter_count(uiter), + UBLK_MAX_PIN_PAGES, &off); + if (len <= 0) + return done; + + ublk_copy_io_pages(&iter, len, off, dir); + nr_pages = DIV_ROUND_UP(len + off, PAGE_SIZE); + for (i = 0; i < nr_pages; i++) { + if (dir == ITER_DEST) set_page_dirty(iter.pages[i]); put_page(iter.pages[i]); } - data->len -= len; - done += iter.nr_pages; + done += len; } return done; @@ -528,15 +517,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, * context is pretty fast, see ublk_pin_user_pages */ if (ublk_need_map_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = rq_bytes, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_DEST; - ublk_copy_user_pages(&data, true); + import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes, + &iov, &iter); - return rq_bytes - data.len; + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; } @@ -548,17 +536,15 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, const unsigned int rq_bytes = blk_rq_bytes(req); if (ublk_need_unmap_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = io->res, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_SOURCE; WARN_ON_ONCE(io->res > rq_bytes); - ublk_copy_user_pages(&data, false); - - return io->res - data.len; + import_single_range(dir, u64_to_user_ptr(io->addr), io->res, + &iov, &iter); + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; } From patchwork Thu Apr 27 12:44:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13225463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 560D2C77B7C for ; Thu, 27 Apr 2023 12:45:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243730AbjD0Mp1 (ORCPT ); Thu, 27 Apr 2023 08:45:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243987AbjD0Mp0 (ORCPT ); Thu, 27 Apr 2023 08:45:26 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82C1030E8 for ; Thu, 27 Apr 2023 05:44:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, ZiyangZhang, Harris James R, Ming Lei
Subject: [PATCH V2 4/7] ublk: grab request reference when the request is handled by userspace
Date: Thu, 27 Apr 2023 20:44:11 +0800
Message-Id: <20230427124414.319945-5-ming.lei@redhat.com>
In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com>
References: <20230427124414.319945-1-ming.lei@redhat.com>

Add one reference counter into the request pdu data, and hold this
reference for the request's lifetime.

This prepares for moving the request data copy into userspace, which
needs to copy request data by read()/write() on /dev/ublkcN, so we have
to guarantee that read()/write() is only done on a valid/active request;
that will be enforced by holding the io request reference in
read()/write().
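The get/put discipline that the later read()/write() support relies on is
roughly the sketch below. The ublk_get_req_ref()/ublk_put_req_ref() helpers
are the ones added by this patch; the caller shown is illustrative only (the
real lookup, __ublk_check_and_get_req(), is added together with the char
device read()/write() support):

/* illustrative caller only */
static struct request *lookup_active_req(struct ublk_device *ub,
					 struct ublk_queue *ubq, int tag)
{
	struct request *req;

	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
	if (!req)
		return NULL;

	/* fails once the request has dropped its last reference */
	if (!ublk_get_req_ref(ubq, req))
		return NULL;

	/* re-check state now that the request cannot be completed under us */
	if (!blk_mq_request_started(req) || req->tag != tag) {
		ublk_put_req_ref(ubq, req);
		return NULL;
	}
	return req;
}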
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 67 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 64 insertions(+), 3 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 7455c4c40fc0..a0935b08e6c4 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #define UBLK_MINORS (1U << MINORBITS) @@ -62,6 +63,8 @@ struct ublk_rq_data { struct llist_node node; + + struct kref ref; }; struct ublk_uring_cmd_pdu { @@ -180,6 +183,9 @@ struct ublk_params_header { __u32 types; }; +static inline void __ublk_complete_rq(struct request *req); +static void ublk_complete_rq(struct kref *ref); + static dev_t ublk_chr_devt; static struct class *ublk_chr_class; @@ -288,6 +294,45 @@ static int ublk_apply_params(struct ublk_device *ub) return 0; } +static inline bool ublk_need_req_ref(const struct ublk_queue *ubq) +{ + return false; +} + +static inline void ublk_init_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_need_req_ref(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + kref_init(&data->ref); + } +} + +static inline bool ublk_get_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_need_req_ref(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + return kref_get_unless_zero(&data->ref); + } + + return true; +} + +static inline void ublk_put_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_need_req_ref(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + kref_put(&data->ref, ublk_complete_rq); + } else { + __ublk_complete_rq(req); + } +} + static inline bool ublk_need_get_data(const struct ublk_queue *ubq) { return ubq->flags & UBLK_F_NEED_GET_DATA; @@ -624,13 +669,19 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) } /* todo: handle partial completion */ -static void ublk_complete_rq(struct request *req) +static inline void __ublk_complete_rq(struct request *req) { struct ublk_queue *ubq = req->mq_hctx->driver_data; struct ublk_io *io = &ubq->ios[req->tag]; unsigned int unmapped_bytes; blk_status_t res = BLK_STS_OK; + /* called from ublk_abort_queue() code path */ + if (io->flags & UBLK_IO_FLAG_ABORTED) { + res = BLK_STS_IOERR; + goto exit; + } + /* failed read IO if nothing is read */ if (!io->res && req_op(req) == REQ_OP_READ) io->res = -EIO; @@ -670,6 +721,15 @@ static void ublk_complete_rq(struct request *req) blk_mq_end_request(req, res); } +static void ublk_complete_rq(struct kref *ref) +{ + struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data, + ref); + struct request *req = blk_mq_rq_from_pdu(data); + + __ublk_complete_rq(req); +} + /* * Since __ublk_rq_task_work always fails requests immediately during * exiting, __ublk_fail_req() is only called from abort context during @@ -688,7 +748,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io, if (ublk_queue_can_use_recovery_reissue(ubq)) blk_mq_requeue_request(req, false); else - blk_mq_end_request(req, BLK_STS_IOERR); + ublk_put_req_ref(ubq, req); } } @@ -797,6 +857,7 @@ static inline void __ublk_rq_task_work(struct request *req, mapped_bytes >> 9; } + ublk_init_req_ref(ubq, req); ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags); } @@ -984,7 +1045,7 @@ static void ublk_commit_completion(struct ublk_device *ub, req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag); if (req && likely(!blk_should_fake_timeout(req->q))) - ublk_complete_rq(req); + 
ublk_put_req_ref(ubq, req); } /*

From patchwork Thu Apr 27 12:44:12 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13225462
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, ZiyangZhang, Harris James R, Ming Lei
Subject: [PATCH V2 5/7] ublk: support to copy any part of request pages
Date: Thu, 27 Apr 2023 20:44:12 +0800
Message-Id: <20230427124414.319945-6-ming.lei@redhat.com>
In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com>
References: <20230427124414.319945-1-ming.lei@redhat.com>

Add an 'offset' parameter to ublk_copy_user_pages(), so that it can be
used to copy any sub-buffer (linearly mapped) of the request.
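With the new parameter, copying an arbitrary request-relative byte range only
needs an iov_iter over the user buffer. The helper below is illustrative only
(the real callers are ublk_map_io()/ublk_unmap_io() and, later, the char
device read()/write() paths); import_single_range() and the ITER_DEST
convention are the ones already used in this series:

/*
 * Illustrative only: copy 'len' bytes of the request payload, starting at
 * request-relative byte offset 'offset', into the user buffer 'ubuf'
 * (ITER_DEST: the user iter is the destination, i.e. request -> user).
 */
static ssize_t copy_req_range_to_user(const struct request *req,
				      void __user *ubuf, size_t len,
				      unsigned int offset)
{
	struct iov_iter iter;
	struct iovec iov;

	if (import_single_range(ITER_DEST, ubuf, len, &iov, &iter))
		return -EFAULT;

	/* walks req->bio starting at 'offset', stops when 'iter' is consumed */
	return ublk_copy_user_pages(req, offset, &iter, ITER_DEST);
}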
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index a0935b08e6c4..7f75355687fa 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -504,19 +504,36 @@ static void ublk_copy_io_pages(struct ublk_io_iter *data, } } +static bool ublk_advance_io_iter(const struct request *req, + struct ublk_io_iter *iter, unsigned int offset) +{ + struct bio *bio = req->bio; + + for_each_bio(bio) { + if (bio->bi_iter.bi_size > offset) { + iter->bio = bio; + iter->iter = bio->bi_iter; + bio_advance_iter(iter->bio, &iter->iter, offset); + return true; + } + offset -= bio->bi_iter.bi_size; + } + return false; +} + /* * Copy data between request pages and io_iter, and 'offset' * is the start point of linear offset of request. */ static size_t ublk_copy_user_pages(const struct request *req, - struct iov_iter *uiter, int dir) + unsigned offset, struct iov_iter *uiter, int dir) { - struct ublk_io_iter iter = { - .bio = req->bio, - .iter = req->bio->bi_iter, - }; + struct ublk_io_iter iter; size_t done = 0; + if (!ublk_advance_io_iter(req, &iter, offset)) + return 0; + while (iov_iter_count(uiter) && iter.bio) { unsigned nr_pages; size_t len, off; @@ -569,7 +586,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; } @@ -589,7 +606,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, import_single_range(dir, u64_to_user_ptr(io->addr), io->res, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; } From patchwork Thu Apr 27 12:44:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13225461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C8F9C77B61 for ; Thu, 27 Apr 2023 12:45:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243734AbjD0MpZ (ORCPT ); Thu, 27 Apr 2023 08:45:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56794 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243662AbjD0MpY (ORCPT ); Thu, 27 Apr 2023 08:45:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A479E46AB for ; Thu, 27 Apr 2023 05:44:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1682599482; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aX4g8JCOelFK1bQCPrjtPrVlO6beDV2+uo7pB2SrazQ=; b=Dc0uqC6ty4Os9RIPOEDBxb0+MIGGkdSU7qiJzjsPFoQgxwtAUumtQts5sRAl9BL+DNijoH r07osssJWW1npeNLJOXv2LctGGgD3XHcAefu5ccjoZ+cUVxMA7jymoKyJzLs1mgqAGUzdu 54nIMvJwrmOWe6h1iVwqoMmB/taQRbo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with 
ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-519-yweBIM6RMeazwsOth0oFiA-1; Thu, 27 Apr 2023 08:44:41 -0400 X-MC-Unique: yweBIM6RMeazwsOth0oFiA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 15EBC2807D6E; Thu, 27 Apr 2023 12:44:41 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3AFBC2027043; Thu, 27 Apr 2023 12:44:39 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, ZiyangZhang , Harris James R , Ming Lei Subject: [PATCH V2 6/7] ublk: add read()/write() support for ublk char device Date: Thu, 27 Apr 2023 20:44:13 +0800 Message-Id: <20230427124414.319945-7-ming.lei@redhat.com> In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com> References: <20230427124414.319945-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. todo: implement .mmap() for mapping io request pages into userspace, so that big chunk IO zero copy can be supported Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 151 ++++++++++++++++++++++++++++++++++ include/uapi/linux/ublk_cmd.h | 22 ++++- 2 files changed, 172 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 7f75355687fa..ebe37bb23ec6 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -206,6 +206,23 @@ static unsigned int ublks_added; /* protected by ublk_ctl_mutex */ static struct miscdevice ublk_misc; +static inline unsigned ublk_pos_to_hwq(loff_t pos) +{ + return ((pos - UBLKSRV_IO_BUF_OFFSET) >> UBLK_QID_OFF) & + UBLK_QID_BITS_MASK; +} + +static inline unsigned ublk_pos_to_buf_off(loff_t pos) +{ + return (pos - UBLKSRV_IO_BUF_OFFSET) & UBLK_IO_BUF_BITS_MASK; +} + +static inline unsigned ublk_pos_to_tag(loff_t pos) +{ + return ((pos - UBLKSRV_IO_BUF_OFFSET) >> UBLK_TAG_OFF) & + UBLK_TAG_BITS_MASK; +} + static void ublk_dev_param_basic_apply(struct ublk_device *ub) { struct request_queue *q = ub->ub_disk->queue; @@ -1411,6 +1428,36 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, return -EIOCBQUEUED; } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_need_req_ref(ubq)) + return NULL; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_src = (struct ublksrv_io_cmd *) cmd->cmd; @@ -1428,11 +1475,112 @@ static int 
ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) return __ublk_ch_uring_cmd(cmd, issue_flags, &ub_cmd); } +static inline bool ublk_check_ubuf_dir(const struct request *req, + int ubuf_dir) +{ + /* copy ubuf to request pages */ + if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE) + return true; + + /* copy request pages to ubuf */ + if (req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST) + return true; + + return false; +} + +static struct request *ublk_check_and_get_req(struct kiocb *iocb, + struct iov_iter *iter, size_t *off, int dir) +{ + struct ublk_device *ub = iocb->ki_filp->private_data; + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + u16 tag, q_id; + + if (!ub) + return ERR_PTR(-EACCES); + + if (!user_backed_iter(iter)) + return ERR_PTR(-EACCES); + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + return ERR_PTR(-EACCES); + + tag = ublk_pos_to_tag(iocb->ki_pos); + q_id = ublk_pos_to_hwq(iocb->ki_pos); + buf_off = ublk_pos_to_buf_off(iocb->ki_pos); + + if (q_id >= ub->dev_info.nr_hw_queues) + return ERR_PTR(-EINVAL); + + ubq = ublk_get_queue(ub, q_id); + if (!ubq) + return ERR_PTR(-EINVAL); + + if (tag >= ubq->q_depth) + return ERR_PTR(-EINVAL); + + req = __ublk_check_and_get_req(ub, ubq, tag, buf_off); + if (!req) + return ERR_PTR(-EINVAL); + + if (!req->mq_hctx || !req->mq_hctx->driver_data) + goto fail; + + if (!ublk_check_ubuf_dir(req, dir)) + goto fail; + + *off = buf_off; + return req; +fail: + ublk_put_req_ref(ubq, req); + return ERR_PTR(-EACCES); +} + +static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST); + if (IS_ERR(req)) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + +static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE); + if (IS_ERR(req)) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + static const struct file_operations ublk_ch_fops = { .owner = THIS_MODULE, .open = ublk_ch_open, .release = ublk_ch_release, .llseek = no_llseek, + .read_iter = ublk_ch_read_iter, + .write_iter = ublk_ch_write_iter, .uring_cmd = ublk_ch_uring_cmd, .mmap = ublk_ch_mmap, }; @@ -2331,6 +2479,9 @@ static int __init ublk_init(void) { int ret; + BUILD_BUG_ON((unsigned long long)UBLKSRV_IO_BUF_OFFSET + + UBLKSRV_IO_BUF_BITS_SIZE < UBLKSRV_IO_BUF_OFFSET); + init_waitqueue_head(&ublk_idr_wq); ret = misc_register(&ublk_misc); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index 640bf687b94a..a20ea079ca9b 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -93,9 +93,29 @@ #define UBLKSRV_CMD_BUF_OFFSET 0 #define UBLKSRV_IO_BUF_OFFSET 0x80000000 -/* tag bit is 12bit, so at most 4096 IOs for each queue */ +/* tag bit is 15bit, so far limit at most 4096 IOs for each queue */ #define UBLK_MAX_QUEUE_DEPTH 4096 +/* single IO buffer max size is 32MB */ +#define UBLK_IO_BUF_OFF 0 +#define UBLK_IO_BUF_BITS 25 +#define UBLK_IO_BUF_BITS_MASK ((1ULL << UBLK_IO_BUF_BITS) - 1) + +/* so at most 32K IOs for each queue */ +#define 
UBLK_TAG_OFF UBLK_IO_BUF_BITS +#define UBLK_TAG_BITS 15 +#define UBLK_TAG_BITS_MASK ((1ULL << UBLK_TAG_BITS) - 1) + +/* max 4096 queues */ +#define UBLK_QID_OFF (UBLK_TAG_OFF + UBLK_TAG_BITS) +#define UBLK_QID_BITS 12 +#define UBLK_QID_BITS_MASK ((1ULL << UBLK_QID_BITS) - 1) + +#define UBLK_MAX_NR_QUEUES (1U << UBLK_QID_BITS) + +#define UBLKSRV_IO_BUF_BITS (UBLK_QID_OFF + UBLK_QID_BITS) +#define UBLKSRV_IO_BUF_BITS_SIZE (1ULL << UBLKSRV_IO_BUF_BITS) + /* * zero copy requires 4k block size, and can remap ublk driver's io * request into ublksrv's vm space From patchwork Thu Apr 27 12:44:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13225464 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 324F4C77B61 for ; Thu, 27 Apr 2023 12:45:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243662AbjD0Mpb (ORCPT ); Thu, 27 Apr 2023 08:45:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243650AbjD0Mpa (ORCPT ); Thu, 27 Apr 2023 08:45:30 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7CED49CF for ; Thu, 27 Apr 2023 05:44:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1682599490; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EU/1ZmJlG2YNYj9Bm2/heLVqDeyFP/r0V2ix6bzUxXI=; b=c30grteM7XdP4SnXf2Y7iyfWqCXvZ7nuwz7ofWhaUlvYUnqD/H4GnfKAak9z3vpWm1NEL5 qc03VIXRTYVeSclbDOAQZMSenHSWWlcwgaKUnSrPXcOzgxt9alpeyy6+qr/qzpyFcWrQZE Rg1QxfTT6aBQclpOuQGaMkw+ThM31ag= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-225-krqkicsEPjqTXmTKRRansw-1; Thu, 27 Apr 2023 08:44:45 -0400 X-MC-Unique: krqkicsEPjqTXmTKRRansw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3E4CA1066544; Thu, 27 Apr 2023 12:44:44 +0000 (UTC) Received: from localhost (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 582D8C15BAD; Thu, 27 Apr 2023 12:44:43 +0000 (UTC) From: Ming Lei To: Jens Axboe Cc: linux-block@vger.kernel.org, ZiyangZhang , Harris James R , Ming Lei Subject: [PATCH V2 7/7] ublk: support user copy Date: Thu, 27 Apr 2023 20:44:14 +0800 Message-Id: <20230427124414.319945-8-ming.lei@redhat.com> In-Reply-To: <20230427124414.319945-1-ming.lei@redhat.com> References: <20230427124414.319945-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently copy between io request buffer(pages) and userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). 
This way performs very well in case of pre-allocated userspace io buffer. For dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA is added for ublk server to provide buffer by one extra command communication for WRITE request. For READ, userspace simply provides buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY by moving io data copy out of kernel by providing read()/write() on /dev/ublkcN, and simply let ublk server do the io data copy. This way makes both side cleaner, the cost is that one extra syscall for copy io data between request and backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, such as, only do zero copy for big size IO, so it can be thought as one prep patch for supporting zero copy. Meantime zero copy still needs to expose read()/write() buffer for some corner case, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 57 ++++++++++++++++++++++++++++------- include/uapi/linux/ublk_cmd.h | 3 ++ 2 files changed, 49 insertions(+), 11 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index ebe37bb23ec6..40e0de6479eb 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -55,7 +55,8 @@ | UBLK_F_USER_RECOVERY \ | UBLK_F_USER_RECOVERY_REISSUE \ | UBLK_F_UNPRIVILEGED_DEV \ - | UBLK_F_CMD_IOCTL_ENCODE) + | UBLK_F_CMD_IOCTL_ENCODE \ + | UBLK_F_USER_COPY) /* All UBLK_PARAM_TYPE_* should be included here */ #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | \ @@ -311,9 +312,18 @@ static int ublk_apply_params(struct ublk_device *ub) return 0; } +static inline bool ublk_support_user_copy(const struct ublk_queue *ubq) +{ + return ubq->flags & UBLK_F_USER_COPY; +} + static inline bool ublk_need_req_ref(const struct ublk_queue *ubq) { - return false; + /* + * read()/write() is involved in user copy, so request reference + * has to be grabbed + */ + return ublk_support_user_copy(ubq); } static inline void ublk_init_req_ref(const struct ublk_queue *ubq, @@ -590,6 +600,9 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_user_copy(ubq)) + return rq_bytes; + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current @@ -614,6 +627,9 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_user_copy(ubq)) + return rq_bytes; + if (ublk_need_unmap_req(req)) { struct iov_iter iter; struct iovec iov; @@ -1372,6 +1388,11 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, ^ (_IOC_NR(cmd_op) == UBLK_IO_NEED_GET_DATA)) goto out; + if (ublk_support_user_copy(ubq) && ub_cmd->addr) { + ret = -EINVAL; + goto out; + } + ret = ublk_check_cmd_op(cmd_op); if (ret) goto out; @@ -1390,23 +1411,34 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, */ if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV) goto out; - /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */ - if (!ub_cmd->addr && !ublk_need_get_data(ubq)) - goto out; + + if (!ublk_support_user_copy(ubq)) { + /* + * FETCH_RQ has to provide IO buffer if NEED GET + * DATA is 
not enabled + */ + if (!ub_cmd->addr && !ublk_need_get_data(ubq)) + goto out; + } ublk_fill_io(io, cmd, ub_cmd->addr); ublk_mark_io_ready(ub, ubq); break; case UBLK_IO_COMMIT_AND_FETCH_REQ: req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag); - /* - * COMMIT_AND_FETCH_REQ has to provide IO buffer if NEED GET DATA is - * not enabled or it is Read IO. - */ - if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || req_op(req) == REQ_OP_READ)) - goto out; + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) goto out; + + if (!ublk_support_user_copy(ubq)) { + /* + * COMMIT_AND_FETCH_REQ has to provide IO buffer if + * NEED GET DATA is not enabled or it is Read IO. + */ + if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || + req_op(req) == REQ_OP_READ)) + goto out; + } ublk_fill_io(io, cmd, ub_cmd->addr); ublk_commit_completion(ub, ub_cmd); break; @@ -1966,6 +1998,9 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE | UBLK_F_URING_CMD_COMP_IN_TASK; + if (ub->dev_info.flags & UBLK_F_USER_COPY) + ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA; + /* We are not ready to support zero copy */ ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index a20ea079ca9b..e26ca0bdd7c5 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -165,6 +165,9 @@ /* use ioctl encoding for uring command */ #define UBLK_F_CMD_IOCTL_ENCODE (1UL << 6) +/* Copy between request and user buffer by pread()/pwrite() */ +#define UBLK_F_USER_COPY (1UL << 7) + /* device state */ #define UBLK_S_DEV_DEAD 0 #define UBLK_S_DEV_LIVE 1
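To illustrate how a ublk server is expected to consume UBLK_F_USER_COPY end
to end, here is a hedged userspace sketch. The helper names
(ublk_user_copy_pos, fetch_write_payload, commit_read_payload) are made up
for illustration; UBLKSRV_IO_BUF_OFFSET, UBLK_QID_OFF and UBLK_TAG_OFF come
from the ublk_cmd.h changes earlier in this series, the device has to be
created with UBLK_F_USER_COPY set, and cdev_fd is an open fd on /dev/ublkcN
(use a 64-bit off_t, e.g. -D_FILE_OFFSET_BITS=64 on 32-bit builds):

#include <unistd.h>
#include <linux/ublk_cmd.h>	/* assumes the headers from this series */

/* file offset addressing byte 'byte_off' of request (q_id, tag) */
static inline __u64 ublk_user_copy_pos(__u16 q_id, __u16 tag, __u32 byte_off)
{
	return UBLKSRV_IO_BUF_OFFSET + ((__u64)q_id << UBLK_QID_OFF) +
		((__u64)tag << UBLK_TAG_OFF) + byte_off;
}

/* WRITE request: pull the payload out of the kernel request pages */
static ssize_t fetch_write_payload(int cdev_fd, __u16 q_id, __u16 tag,
				   void *buf, size_t len)
{
	return pread(cdev_fd, buf, len, ublk_user_copy_pos(q_id, tag, 0));
}

/* READ request: publish backend data before UBLK_IO_COMMIT_AND_FETCH_REQ */
static ssize_t commit_read_payload(int cdev_fd, __u16 q_id, __u16 tag,
				   const void *buf, size_t len)
{
	return pwrite(cdev_fd, buf, len, ublk_user_copy_pos(q_id, tag, 0));
}

These are the same offsets the kernel side decodes with ublk_pos_to_hwq(),
ublk_pos_to_tag() and ublk_pos_to_buf_off() before looking up and
reference-counting the request.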