
[PATCHv8,5/6] io_uring: enable per-io hinting capability

Message ID 20241017160937.2283225-6-kbusch@meta.com
State New
Series write hints for nvme fdp

Commit Message

Keith Busch Oct. 17, 2024, 4:09 p.m. UTC
From: Kanchan Joshi <joshi.k@samsung.com>

With the F_SET_RW_HINT fcntl, a user can set a hint on the file inode, and
all subsequent writes to the file pass that hint value down. This can be
limiting for a block device, since all writes can be tagged with only one
lifetime hint value. Concurrent writes with different hint values are hard
to manage. Per-IO hinting solves that problem.

Allow userspace to pass additional metadata in the SQE.

	__u16 write_hint;

This accepts all hint values that the file allows.

The write handlers (io_prep_rw, io_write) pass the hint value to the
lower layer through the kiocb. This works for supporting direct IO, but
not for paths where the kiocb is not available (e.g., buffered IO).

When per-IO hints are not passed, the per-inode hint value is set in
the kiocb (as before). Otherwise, per-IO hints take precedence over
per-inode hints.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/uapi/linux/io_uring.h |  4 ++++
 io_uring/rw.c                 | 11 +++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig Oct. 18, 2024, 5:53 a.m. UTC | #1
Same hint vs. write stream issue here as well.

> +	if (ddir == ITER_SOURCE &&
> +	    req->file->f_op->fop_flags & FOP_PER_IO_HINTS)
> +		rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
> +	else
> +		rw->kiocb.ki_write_hint = WRITE_LIFE_NOT_SET;

WRITE_LIFE_NOT_SET is in the wrong namespace vs. the separate streams.

Either use 0 directly or add a separate constant for it.

Hannes Reinecke Oct. 18, 2024, 6:03 a.m. UTC | #2
On 10/17/24 18:09, Keith Busch wrote:
> From: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

Patch

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 86cb385fe0b53..bd9acc0053318 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -92,6 +92,10 @@  struct io_uring_sqe {
 			__u16	addr_len;
 			__u16	__pad3[1];
 		};
+		struct {
+			__u16	write_hint;
+			__u16	__pad4[1];
+		};
 	};
 	union {
 		struct {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index ffd637ca0bd17..9a6d3ba76af4f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -279,7 +279,11 @@  static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
-
+	if (ddir == ITER_SOURCE &&
+	    req->file->f_op->fop_flags & FOP_PER_IO_HINTS)
+		rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
+	else
+		rw->kiocb.ki_write_hint = WRITE_LIFE_NOT_SET;
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);
@@ -1027,7 +1031,10 @@  int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		return ret;
 	req->cqe.res = iov_iter_count(&io->iter);
-	rw->kiocb.ki_write_hint = file_write_hint(rw->kiocb.ki_filp);
+
+	/* Use per-file hint only if per-io hint is not set. */
+	if (rw->kiocb.ki_write_hint == WRITE_LIFE_NOT_SET)
+		rw->kiocb.ki_write_hint = file_write_hint(rw->kiocb.ki_filp);
 
 	if (force_nonblock) {
 		/* If the file doesn't support async, just async punt */