Message ID | 20240809173552.929988-1-maharmstone@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: add io_uring interface for encoded reads | expand |
On Fri, Aug 09, 2024 at 06:35:27PM +0100, Mark Harmstone wrote: > Adds an io_uring interface for asynchronous encoded reads, using the > same interface as for the ioctl. To use this you would use an SQE opcode > of IORING_OP_URING_CMD, the cmd_op would be BTRFS_IOC_ENCODED_READ, and > addr would point to the userspace address of the > btrfs_ioctl_encoded_io_args struct. As with the ioctl, you need to have > CAP_SYS_ADMIN for this to work. Where is the CAP_SYS_ADMIN check done? It's not in this patch so I assume it's somewhere in io_uring but I don't see it. Thanks.
On 9/8/24 19:34, David Sterba wrote: > Where is the CAP_SYS_ADMIN check done? It's not in this patch so I > assume it's somewhere in io_uring but I don't see it. Thanks. It still gets done in btrfs_ioctl_encoded_read, which gets called by io_uring in a task with the process context. Mark
On Mon, Aug 12, 2024 at 09:20:19AM +0000, Mark Harmstone wrote: > On 9/8/24 19:34, David Sterba wrote: > > Where is the CAP_SYS_ADMIN check done? It's not in this patch so I > > assume it's somewhere in io_uring but I don't see it. Thanks. > > It still gets done in btrfs_ioctl_encoded_read, which gets called by > io_uring in a task with the process context. Oh of course. The io_uring wrapper is simple so the check does not have to be lifted from the function.
On Fri, Aug 09, 2024 at 06:35:27PM +0100, Mark Harmstone wrote: > Adds an io_uring interface for asynchronous encoded reads, using the > same interface as for the ioctl. To use this you would use an SQE opcode > of IORING_OP_URING_CMD, the cmd_op would be BTRFS_IOC_ENCODED_READ, and > addr would point to the userspace address of the > btrfs_ioctl_encoded_io_args struct. As with the ioctl, you need to have > CAP_SYS_ADMIN for this to work. What is the point if this doesn't actually do anything but returning -EIOCBQUEUED? Note that that the internals of the btrfs encoded read is built around kiocbs anyway, so you might as well turn things upside down, implement a real async io_uring cmd and just wait for it to complete to implement the existing synchronous ioctl. > > Signed-off-by: Mark Harmstone <maharmstone@fb.com> > --- > fs/btrfs/file.c | 1 + > fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/ioctl.h | 1 + > 3 files changed, 50 insertions(+) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index f9d76072398d..974f9e85b46e 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -3850,6 +3850,7 @@ const struct file_operations btrfs_file_operations = { > .compat_ioctl = btrfs_compat_ioctl, > #endif > .remap_file_range = btrfs_remap_file_range, > + .uring_cmd = btrfs_uring_cmd, > }; > > int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end) > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 0493272a7668..8f5cc7d1429c 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -29,6 +29,7 @@ > #include <linux/fileattr.h> > #include <linux/fsverity.h> > #include <linux/sched/xacct.h> > +#include <linux/io_uring/cmd.h> > #include "ctree.h" > #include "disk-io.h" > #include "export.h" > @@ -4648,6 +4649,53 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool > return ret; > } > > +static void btrfs_uring_encoded_read_cb(struct io_uring_cmd *cmd, > + unsigned int issue_flags) > +{ > + int ret; > + > + ret = btrfs_ioctl_encoded_read(cmd->file, (void __user *)cmd->sqe->addr, > + false); > + > + io_uring_cmd_done(cmd, ret, 0, issue_flags); > +} > + > +static void btrfs_uring_encoded_read_compat_cb(struct io_uring_cmd *cmd, > + unsigned int issue_flags) > +{ > + int ret; > + > + ret = btrfs_ioctl_encoded_read(cmd->file, (void __user *)cmd->sqe->addr, > + true); > + > + io_uring_cmd_done(cmd, ret, 0, issue_flags); > +} > + > +static int btrfs_uring_encoded_read(struct io_uring_cmd *cmd, > + unsigned int issue_flags) > +{ > + if (issue_flags & IO_URING_F_COMPAT) > + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_compat_cb); > + else > + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_cb); > + > + return -EIOCBQUEUED; > +} > + > +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) > +{ > + switch (cmd->cmd_op) { > + case BTRFS_IOC_ENCODED_READ: > +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) > + case BTRFS_IOC_ENCODED_READ_32: > +#endif > + return btrfs_uring_encoded_read(cmd, issue_flags); > + } > + > + io_uring_cmd_done(cmd, -EINVAL, 0, issue_flags); > + return -EIOCBQUEUED; > +} > + > long btrfs_ioctl(struct file *file, unsigned int > cmd, unsigned long arg) > { > diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h > index 2c5dc25ec670..33578f4b5f46 100644 > --- a/fs/btrfs/ioctl.h > +++ b/fs/btrfs/ioctl.h > @@ -22,5 +22,6 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode); > int __pure btrfs_is_empty_uuid(u8 *uuid); > void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info, > struct btrfs_ioctl_balance_args *bargs); > +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); > > #endif > -- > 2.44.2 > > ---end quoted text---
On 12/8/24 12:26, Christoph Hellwig wrote: > What is the point if this doesn't actually do anything but returning > -EIOCBQUEUED? It returns EIOCBQUEUED to say that io_uring has queued the request, and adds the task to io_uring's thread pool for it to be completed. > Note that that the internals of the btrfs encoded read is built > around kiocbs anyway, so you might as well turn things upside down, > implement a real async io_uring cmd and just wait for it to complete > to implement the existing synchronous ioctl. I'd have to look into it, but that sounds like it could be an interesting future refactor. Mark
On 8/12/24 15:46, Mark Harmstone wrote: > On 12/8/24 12:26, Christoph Hellwig wrote: >> What is the point if this doesn't actually do anything but returning >> -EIOCBQUEUED? > > It returns EIOCBQUEUED to say that io_uring has queued the request, and > adds the task to io_uring's thread pool for it to be completed. No, it doesn't. With your patch it'll be executed by the task submitting the request and seemingly get blocked, which is not how an async interface can work. I'll comment on the main patch. >> Note that that the internals of the btrfs encoded read is built >> around kiocbs anyway, so you might as well turn things upside down, >> implement a real async io_uring cmd and just wait for it to complete >> to implement the existing synchronous ioctl. > > I'd have to look into it, but that sounds like it could be an > interesting future refactor. > > Mark
On 8/12/24 12:26, Christoph Hellwig wrote: > On Fri, Aug 09, 2024 at 06:35:27PM +0100, Mark Harmstone wrote: Mark, please CC io_uring for next versions, I've only found out about the patch because Christoph added us. >> Adds an io_uring interface for asynchronous encoded reads, using the >> same interface as for the ioctl. To use this you would use an SQE opcode >> of IORING_OP_URING_CMD, the cmd_op would be BTRFS_IOC_ENCODED_READ, and >> addr would point to the userspace address of the >> btrfs_ioctl_encoded_io_args struct. As with the ioctl, you need to have >> CAP_SYS_ADMIN for this to work. > > What is the point if this doesn't actually do anything but returning > -EIOCBQUEUED? > > Note that that the internals of the btrfs encoded read is built > around kiocbs anyway, so you might as well turn things upside down, > implement a real async io_uring cmd and just wait for it to complete > to implement the existing synchronous ioctl. > >> >> Signed-off-by: Mark Harmstone <maharmstone@fb.com> >> --- ... >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >> index 0493272a7668..8f5cc7d1429c 100644 >> --- a/fs/btrfs/ioctl.c >> +++ b/fs/btrfs/ioctl.c ... >> +static void btrfs_uring_encoded_read_compat_cb(struct io_uring_cmd *cmd, >> + unsigned int issue_flags) >> +{ >> + int ret; >> + >> + ret = btrfs_ioctl_encoded_read(cmd->file, (void __user *)cmd->sqe->addr, >> + true); >> + >> + io_uring_cmd_done(cmd, ret, 0, issue_flags); >> +} >> + >> +static int btrfs_uring_encoded_read(struct io_uring_cmd *cmd, >> + unsigned int issue_flags) >> +{ >> + if (issue_flags & IO_URING_F_COMPAT) Instead of two different callbacks we can add a helper # include/linux/io_uring/cmd.h static inline bool io_uring_cmd_is_compat(struct io_uring_cmd *cmd) { #ifdef COMPAT struct io_kiocb *req = cmd_to_io_kiocb(ioucmd); return req->ctx->compat; #else return false; #endif } But then we pass the flag in btrfs_ioctl_encoded_read() and use to interpret struct btrfs_ioctl_encoded_io_args, but next just call import_iovec(), which derives compat differently. Since io_uring worker threads are now forks of the original thread, ctx->compat vs in_compat_syscall() are only marginally different, e.g. when we pass a ring to another process with a different compat'ness, but I don't think that something we care about much. Let's just make it consistent. And the last point, I'm surprised there are two versions of btrfs_ioctl_encoded_io_args. Maybe, it's a good moment to fix it if we're creating a new interface. E.g. by adding a new structure defined right with u64 and such, use it in io_uring, and cast to it in the ioctl code when it's x64 (with a good set of BUILD_BUG_ON sprinkled) and convert structures otherwise? >> + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_compat_cb); >> + else >> + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_cb); As mentioned, the callback will be executed by the submitter task. You mentioned offloading to a thread/iowq, that would look like: btrfs_uring_encoded_read() { if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN; // it's a worker thread, block is allowed } It's a bad pattern though for anything requiring good performance. At minimum it should try to execute with a nowait flag set first nowait = issue_flags & IO_URING_F_NONBLOCK; btrfs_ioctl_encoded_read(..., nowait); If needs to block it would return -EAGAIN, so that the core io_uring would spin up a worker thread for it. Even better if it does it asynchronously as Christoph mentioned. >> + >> + return -EIOCBQUEUED; >> +} >> + >> +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) >> +{ >> + switch (cmd->cmd_op) { >> + case BTRFS_IOC_ENCODED_READ: >> +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) >> + case BTRFS_IOC_ENCODED_READ_32: >> +#endif >> + return btrfs_uring_encoded_read(cmd, issue_flags); >> + } >> + >> + io_uring_cmd_done(cmd, -EINVAL, 0, issue_flags); >> + return -EIOCBQUEUED; >> +} >> + >> long btrfs_ioctl(struct file *file, unsigned int
On Mon, Aug 12, 2024 at 05:10:15PM +0100, Pavel Begunkov wrote: > And the last point, I'm surprised there are two versions of > btrfs_ioctl_encoded_io_args. Maybe, it's a good moment to fix it if > we're creating a new interface. > > E.g. by adding a new structure defined right with u64 and such, use it > in io_uring, and cast to it in the ioctl code when it's x64 (with > a good set of BUILD_BUG_ON sprinkled) and convert structures otherwise? If you mean the 32bit version of the ioctl struct (btrfs_ioctl_encoded_io_args_32), I don't think we can fix it. It's been there from the beginning and it's not a mistake. I don't remember the details why and only vaguely remember that I'd asked why we need it. Similar 64/32 struct is in the send ioctl but that was a mistake due to a pointer being passed in the structure and that needs to be handled due to different type width.
On 8/12/24 17:58, David Sterba wrote: > On Mon, Aug 12, 2024 at 05:10:15PM +0100, Pavel Begunkov wrote: >> And the last point, I'm surprised there are two versions of >> btrfs_ioctl_encoded_io_args. Maybe, it's a good moment to fix it if >> we're creating a new interface. >> >> E.g. by adding a new structure defined right with u64 and such, use it >> in io_uring, and cast to it in the ioctl code when it's x64 (with >> a good set of BUILD_BUG_ON sprinkled) and convert structures otherwise? > > If you mean the 32bit version of the ioctl struct > (btrfs_ioctl_encoded_io_args_32), I don't think we can fix it. It's been Right, I meant btrfs_ioctl_encoded_io_args_32. And to clarify, nothing can be done for the ioctl(2) part, I only suggested to have a single structure when it comes to io_uring. > there from the beginning and it's not a mistake. I don't remember the > details why and only vaguely remember that I'd asked why we need it. > Similar 64/32 struct is in the send ioctl but that was a mistake due to > a pointer being passed in the structure and that needs to be handled due > to different type width. Would be interesting to learn why, maybe Omar remembers? Only two fields are not explicitly sized, both could've been just u64. The structure iov points to (struct iovec) would've had a compat flavour, but that doesn't require a separate btrfs_ioctl_encoded_io_args.
On Mon, Aug 12, 2024 at 08:17:43PM +0100, Pavel Begunkov wrote: > On 8/12/24 17:58, David Sterba wrote: > > On Mon, Aug 12, 2024 at 05:10:15PM +0100, Pavel Begunkov wrote: > >> And the last point, I'm surprised there are two versions of > >> btrfs_ioctl_encoded_io_args. Maybe, it's a good moment to fix it if > >> we're creating a new interface. > >> > >> E.g. by adding a new structure defined right with u64 and such, use it > >> in io_uring, and cast to it in the ioctl code when it's x64 (with > >> a good set of BUILD_BUG_ON sprinkled) and convert structures otherwise? > > > > If you mean the 32bit version of the ioctl struct > > (btrfs_ioctl_encoded_io_args_32), I don't think we can fix it. It's been > > Right, I meant btrfs_ioctl_encoded_io_args_32. And to clarify, nothing > can be done for the ioctl(2) part, I only suggested to have a single > structure when it comes to io_uring. > > > there from the beginning and it's not a mistake. I don't remember the > > details why and only vaguely remember that I'd asked why we need it. > > Similar 64/32 struct is in the send ioctl but that was a mistake due to > > a pointer being passed in the structure and that needs to be handled due > > to different type width. > > Would be interesting to learn why, maybe Omar remembers? Only two > fields are not explicitly sized, both could've been just u64. > The structure iov points to (struct iovec) would've had a compat > flavour, but that doesn't require a separate > btrfs_ioctl_encoded_io_args. Found it: "why don't we avoid the send 32bit workaround" https://lore.kernel.org/linux-btrfs/20190828120650.GZ2752@twin.jikos.cz/ "because big-endian" https://lore.kernel.org/linux-btrfs/20190903171458.GA7452@vader/
On 8/13/24 01:49, David Sterba wrote: > On Mon, Aug 12, 2024 at 08:17:43PM +0100, Pavel Begunkov wrote: >> On 8/12/24 17:58, David Sterba wrote: >>> On Mon, Aug 12, 2024 at 05:10:15PM +0100, Pavel Begunkov wrote: >>>> And the last point, I'm surprised there are two versions of >>>> btrfs_ioctl_encoded_io_args. Maybe, it's a good moment to fix it if >>>> we're creating a new interface. >>>> >>>> E.g. by adding a new structure defined right with u64 and such, use it >>>> in io_uring, and cast to it in the ioctl code when it's x64 (with >>>> a good set of BUILD_BUG_ON sprinkled) and convert structures otherwise? >>> >>> If you mean the 32bit version of the ioctl struct >>> (btrfs_ioctl_encoded_io_args_32), I don't think we can fix it. It's been >> >> Right, I meant btrfs_ioctl_encoded_io_args_32. And to clarify, nothing >> can be done for the ioctl(2) part, I only suggested to have a single >> structure when it comes to io_uring. >> >>> there from the beginning and it's not a mistake. I don't remember the >>> details why and only vaguely remember that I'd asked why we need it. >>> Similar 64/32 struct is in the send ioctl but that was a mistake due to >>> a pointer being passed in the structure and that needs to be handled due >>> to different type width. >> >> Would be interesting to learn why, maybe Omar remembers? Only two >> fields are not explicitly sized, both could've been just u64. >> The structure iov points to (struct iovec) would've had a compat >> flavour, but that doesn't require a separate >> btrfs_ioctl_encoded_io_args. > > Found it: > > "why don't we avoid the send 32bit workaround" > https://lore.kernel.org/linux-btrfs/20190828120650.GZ2752@twin.jikos.cz/ > > "because big-endian" > https://lore.kernel.org/linux-btrfs/20190903171458.GA7452@vader/ union { void __user *buf; __u64 __buf_alignment; }; Endianness is indeed a problem here, but I don't see the purpose of aliasing it with a pointer instead of keeping just u64, which is a common pattern. struct btrfs_ioctl_encoded_io_args { __u64 buf; ... }; // user void *buf = ...; arg.buf = (__u64)(uintptr_t)buf; // kernel void __user *p = u64_to_user_ptr(arg.buf);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f9d76072398d..974f9e85b46e 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3850,6 +3850,7 @@ const struct file_operations btrfs_file_operations = { .compat_ioctl = btrfs_compat_ioctl, #endif .remap_file_range = btrfs_remap_file_range, + .uring_cmd = btrfs_uring_cmd, }; int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0493272a7668..8f5cc7d1429c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -29,6 +29,7 @@ #include <linux/fileattr.h> #include <linux/fsverity.h> #include <linux/sched/xacct.h> +#include <linux/io_uring/cmd.h> #include "ctree.h" #include "disk-io.h" #include "export.h" @@ -4648,6 +4649,53 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool return ret; } +static void btrfs_uring_encoded_read_cb(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + int ret; + + ret = btrfs_ioctl_encoded_read(cmd->file, (void __user *)cmd->sqe->addr, + false); + + io_uring_cmd_done(cmd, ret, 0, issue_flags); +} + +static void btrfs_uring_encoded_read_compat_cb(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + int ret; + + ret = btrfs_ioctl_encoded_read(cmd->file, (void __user *)cmd->sqe->addr, + true); + + io_uring_cmd_done(cmd, ret, 0, issue_flags); +} + +static int btrfs_uring_encoded_read(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + if (issue_flags & IO_URING_F_COMPAT) + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_compat_cb); + else + io_uring_cmd_complete_in_task(cmd, btrfs_uring_encoded_read_cb); + + return -EIOCBQUEUED; +} + +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) +{ + switch (cmd->cmd_op) { + case BTRFS_IOC_ENCODED_READ: +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + case BTRFS_IOC_ENCODED_READ_32: +#endif + return btrfs_uring_encoded_read(cmd, issue_flags); + } + + io_uring_cmd_done(cmd, -EINVAL, 0, issue_flags); + return -EIOCBQUEUED; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 2c5dc25ec670..33578f4b5f46 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -22,5 +22,6 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode); int __pure btrfs_is_empty_uuid(u8 *uuid); void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_balance_args *bargs); +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); #endif
Adds an io_uring interface for asynchronous encoded reads, using the same interface as for the ioctl. To use this you would use an SQE opcode of IORING_OP_URING_CMD, the cmd_op would be BTRFS_IOC_ENCODED_READ, and addr would point to the userspace address of the btrfs_ioctl_encoded_io_args struct. As with the ioctl, you need to have CAP_SYS_ADMIN for this to work. Signed-off-by: Mark Harmstone <maharmstone@fb.com> --- fs/btrfs/file.c | 1 + fs/btrfs/ioctl.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.h | 1 + 3 files changed, 50 insertions(+)