diff mbox series

[PATCHv9,5/7] io_uring: enable per-io hinting capability

Message ID 20241025213645.3464331-6-kbusch@meta.com (mailing list archive)
State New
Headers show
Series write hints with nvme fdp, scsi streams | expand

Commit Message

Keith Busch Oct. 25, 2024, 9:36 p.m. UTC
From: Kanchan Joshi <joshi.k@samsung.com>

With F_SET_RW_HINT fcntl, user can set a hint on the file inode, and
all the subsequent writes on the file pass that hint value down. This
can be limiting for block device as all the writes will be tagged with
only one lifetime hint value. Concurrent writes (with different hint
values) are hard to manage. Per-IO hinting solves that problem.

Allow userspace to pass additional metadata in the SQE.

	__u16 write_hint;

If the hint is provided, filesystems may optionally use it. A filesytem
may ignore this field if it does not support per-io hints, or if the
value is invalid for its backing storage. Just like the inode hints,
requesting values that are not supported by the hardware are not an
error.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/uapi/linux/io_uring.h | 4 ++++
 io_uring/rw.c                 | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Anuj gupta Oct. 29, 2024, 12:46 p.m. UTC | #1
On Sat, Oct 26, 2024 at 3:13 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Kanchan Joshi <joshi.k@samsung.com>
>
> With F_SET_RW_HINT fcntl, user can set a hint on the file inode, and
> all the subsequent writes on the file pass that hint value down. This
> can be limiting for block device as all the writes will be tagged with
> only one lifetime hint value. Concurrent writes (with different hint
> values) are hard to manage. Per-IO hinting solves that problem.
>
> Allow userspace to pass additional metadata in the SQE.
>
>         __u16 write_hint;
>
> If the hint is provided, filesystems may optionally use it. A filesytem
> may ignore this field if it does not support per-io hints, or if the
> value is invalid for its backing storage. Just like the inode hints,
> requesting values that are not supported by the hardware are not an
> error.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  include/uapi/linux/io_uring.h | 4 ++++
>  io_uring/rw.c                 | 3 ++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 60b9c98595faf..8cdcc461d464c 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -92,6 +92,10 @@ struct io_uring_sqe {
>                         __u16   addr_len;
>                         __u16   __pad3[1];
>                 };
> +               struct {
> +                       __u16   write_hint;
> +                       __u16   __pad4[1];
> +               };
>         };
>         union {
>                 struct {
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 8080ffd6d5712..5a1231bfecc3a 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -279,7 +279,8 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>                 rw->kiocb.ki_ioprio = get_current_ioprio();
>         }
>         rw->kiocb.dio_complete = NULL;
> -
> +       if (ddir == ITER_SOURCE)
> +               rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
>         rw->addr = READ_ONCE(sqe->addr);
>         rw->len = READ_ONCE(sqe->len);
>         rw->flags = READ_ONCE(sqe->rw_flags);
> --
> 2.43.5
>

Since this patch adds a couple of new fields, it makes sense to add
BUILD_BUG_ON() checks in io_uring_init for these fields to assert the
layout of struct io_uring_sqe. And probably a zero check for pad4 in
io_prep_rw.
--
Anuj Gupta
diff mbox series

Patch

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 60b9c98595faf..8cdcc461d464c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -92,6 +92,10 @@  struct io_uring_sqe {
 			__u16	addr_len;
 			__u16	__pad3[1];
 		};
+		struct {
+			__u16	write_hint;
+			__u16	__pad4[1];
+		};
 	};
 	union {
 		struct {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 8080ffd6d5712..5a1231bfecc3a 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -279,7 +279,8 @@  static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
-
+	if (ddir == ITER_SOURCE)
+		rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint);
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);