Message ID | 20230422-uring-getdents-v2-1-2db1e37dc55e@codewreck.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring: add getdents support, take 2 | expand |
On Wed, May 10, 2023 at 07:52:49PM +0900, Dominique Martinet wrote: > This splits off the vfs_getdents function from the getdents64 system > call. > This will allow io_uring to call the vfs_getdents function. > > Co-authored-by: Stefan Roesch <shr@fb.com> > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > --- > fs/internal.h | 8 ++++++++ > fs/readdir.c | 34 ++++++++++++++++++++++++++-------- > 2 files changed, 34 insertions(+), 8 deletions(-) > > diff --git a/fs/internal.h b/fs/internal.h > index bd3b2810a36b..e8ca000e6613 100644 > --- a/fs/internal.h > +++ b/fs/internal.h > @@ -260,3 +260,11 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po > struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns); > struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap); > void mnt_idmap_put(struct mnt_idmap *idmap); > + > +/* > + * fs/readdir.c > + */ > +struct linux_dirent64; > + > +int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, > + unsigned int count); > diff --git a/fs/readdir.c b/fs/readdir.c > index 9c53edb60c03..ed0803d0011e 100644 > --- a/fs/readdir.c > +++ b/fs/readdir.c > @@ -21,6 +21,7 @@ > #include <linux/unistd.h> > #include <linux/compat.h> > #include <linux/uaccess.h> > +#include "internal.h" > > #include <asm/unaligned.h> > > @@ -351,10 +352,16 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen, > return false; > } > > -SYSCALL_DEFINE3(getdents64, unsigned int, fd, > - struct linux_dirent64 __user *, dirent, unsigned int, count) > + > +/** > + * vfs_getdents - getdents without fdget > + * @file : pointer to file struct of directory > + * @dirent : pointer to user directory structure > + * @count : size of buffer > + */ > +int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, > + unsigned int count) > { > - struct fd f; > struct getdents_callback64 buf = { > .ctx.actor = filldir64, > .count = count, > @@ -362,11 +369,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd, > }; > int error; > > - f = fdget_pos(fd); > - if (!f.file) > - return -EBADF; > - > - error = iterate_dir(f.file, &buf.ctx); > + error = iterate_dir(file, &buf.ctx); So afaict this isn't enough. If you look into iterate_shared() you should see that it uses and updates f_pos. But that can't work for io_uring and also isn't how io_uring handles read and write. You probably need to use a local pos similar to what io_uring does in rw.c for rw->kiocb.ki_pos. But in contrast simply disallow any offsets for getdents completely. Thus not relying on f_pos anywhere at all.
Christian Brauner wrote on Tue, May 23, 2023 at 05:39:08PM +0200: > > @@ -362,11 +369,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd, > > }; > > int error; > > > > - f = fdget_pos(fd); > > - if (!f.file) > > - return -EBADF; > > - > > - error = iterate_dir(f.file, &buf.ctx); > > + error = iterate_dir(file, &buf.ctx); > > So afaict this isn't enough. > If you look into iterate_shared() you should see that it uses and > updates f_pos. But that can't work for io_uring and also isn't how > io_uring handles read and write. You probably need to use a local pos > similar to what io_uring does in rw.c for rw->kiocb.ki_pos. But in > contrast simply disallow any offsets for getdents completely. Thus not > relying on f_pos anywhere at all. Using a private offset from the sqe was the previous implementation discussed around here[1], and Al Viro pointed out that the iterate filesystem implementations don't validate the offset makes sense as it's either costly or for some filesystems downright impossible, so I went into a don't let users modify it approach. [1] https://lore.kernel.org/all/20211221164004.119663-1-shr@fb.com/T/#m517583f23502f32b040c819d930359313b3db00c I agree it's not how io_uring usually works -- it dislikes global states -- but it works perfectly well as long as you don't have multiple users on the same file, which the application can take care of. Not having any offsets would work for small directories but make reading large directories impossible so some sort of continuation is required, which means we need to keep the offset around; I also suggested keeping the offset in argument as the previous version but only allowing the last known offset (... so ultimately still updating f_pos anyway as we don't have anywhere else to store it) or 0, but if we're going to do that it looks much simpler to me to expose the same API as getdents.
diff --git a/fs/internal.h b/fs/internal.h index bd3b2810a36b..e8ca000e6613 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -260,3 +260,11 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns); struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap); void mnt_idmap_put(struct mnt_idmap *idmap); + +/* + * fs/readdir.c + */ +struct linux_dirent64; + +int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, + unsigned int count); diff --git a/fs/readdir.c b/fs/readdir.c index 9c53edb60c03..ed0803d0011e 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -21,6 +21,7 @@ #include <linux/unistd.h> #include <linux/compat.h> #include <linux/uaccess.h> +#include "internal.h" #include <asm/unaligned.h> @@ -351,10 +352,16 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen, return false; } -SYSCALL_DEFINE3(getdents64, unsigned int, fd, - struct linux_dirent64 __user *, dirent, unsigned int, count) + +/** + * vfs_getdents - getdents without fdget + * @file : pointer to file struct of directory + * @dirent : pointer to user directory structure + * @count : size of buffer + */ +int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, + unsigned int count) { - struct fd f; struct getdents_callback64 buf = { .ctx.actor = filldir64, .count = count, @@ -362,11 +369,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd, }; int error; - f = fdget_pos(fd); - if (!f.file) - return -EBADF; - - error = iterate_dir(f.file, &buf.ctx); + error = iterate_dir(file, &buf.ctx); if (error >= 0) error = buf.error; if (buf.prev_reclen) { @@ -379,6 +382,21 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd, else error = count - buf.count; } + return error; +} + +SYSCALL_DEFINE3(getdents64, unsigned int, fd, + struct linux_dirent64 __user *, dirent, unsigned int, count) +{ + struct fd f; + int error; + + f = fdget_pos(fd); + if (!f.file) + return -EBADF; + + error = vfs_getdents(f.file, dirent, count); + fdput_pos(f); return error; }
This splits off the vfs_getdents function from the getdents64 system call. This will allow io_uring to call the vfs_getdents function. Co-authored-by: Stefan Roesch <shr@fb.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> --- fs/internal.h | 8 ++++++++ fs/readdir.c | 34 ++++++++++++++++++++++++++-------- 2 files changed, 34 insertions(+), 8 deletions(-)