Message ID | 20211124231700.1158521-1-shr@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | io_uring: add getdents64 support | expand |
I seem to recall making a few comments the last time a getdents64 for io_uring was proposed; in particular I wanted to bring up one here. This applies only to altering the internal interface, which io_uring would use, although wiring up a new syscall would be a nice addition. The current interface has 2 issues: 1) getdents64 requires at least two calls to read a directory. One or more to get the dents and a final call to see the EOF. With small directories, this means literally 50% of the calls are wasted. 2) The fpos cannot be changed atomically with a read, so it is not possible to to safely perform concurrent reads on the same fd. But, the kernel knows (most, if not all of the time) that it is at EOF at the time it returns the last buffer. So, it would be very useful to get an EOF indicator back with the final buffer. This could just a flag, or for instance make an fpos parameter which is both input and output, returning the (post read) fpos or zero at EOF. Futhermore, for input, one could supply: 0: Start from directory beginning -1: Read from current position other: (output from previous call) Read from here On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus: > This series adds support for getdents64 in liburing. The intent is to > provide a more complete I/O interface for io_uring. > > Patch 1: fs: add parameter use_fpos to iterate_dir() > This adds a new parameter to the function iterate_dir() so the > caller can specify if the position is the file position or the > position stored in the buffer context. > > Patch 2: fs: split off vfs_getdents function from getdents64 system call > This splits of the iterate_dir part of the syscall in its own > dedicated function. This allows to call the function directly from > liburing. > > Patch 3: io_uring: add support for getdents64 > Adds the functions to io_uring to support getdents64. > > There is also a patch series for the changes to liburing. This includes > a new test. The patch series is called "liburing: add getdents support." > > The following tests have been performed: > - new liburing getdents test program has been run > - xfstests have been run > - both tests have been repeated with the kernel memory leak checker > and no leaks have been reported. > > Signed-off-by: Stefan Roesch <shr@fb.com> > --- > V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with > the additional parameter. > > Stefan Roesch (3): > fs: add parameter use_fpos to iterate_dir function > fs: split off vfs_getdents function of getdents64 syscall > io_uring: add support for getdents64 > > arch/alpha/kernel/osf_sys.c | 2 +- > fs/ecryptfs/file.c | 2 +- > fs/exportfs/expfs.c | 2 +- > fs/internal.h | 8 +++++ > fs/io_uring.c | 52 ++++++++++++++++++++++++++++ > fs/ksmbd/smb2pdu.c | 2 +- > fs/ksmbd/vfs.c | 4 +-- > fs/nfsd/nfs4recover.c | 2 +- > fs/nfsd/vfs.c | 2 +- > fs/overlayfs/readdir.c | 6 ++-- > fs/readdir.c | 64 ++++++++++++++++++++++++++--------- > include/linux/fs.h | 2 +- > include/uapi/linux/io_uring.h | 1 + > 13 files changed, 121 insertions(+), 28 deletions(-) > > > base-commit: f0afafc21027c39544a2c1d889b0cff75b346932 > -- > 2.30.2
On 11/24/21 8:42 PM, Clay Harris wrote: > > I seem to recall making a few comments the last time a getdents64 > for io_uring was proposed; in particular I wanted to bring up one > here. This applies only to altering the internal interface, which > io_uring would use, although wiring up a new syscall would be a nice > addition. > > The current interface has 2 issues: > > 1) > getdents64 requires at least two calls to read a directory. > One or more to get the dents and a final call to see the EOF. > With small directories, this means literally 50% of the calls > are wasted. > > 2) > The fpos cannot be changed atomically with a read, so it is not > possible to to safely perform concurrent reads on the same fd. > > But, the kernel knows (most, if not all of the time) that it is at > EOF at the time it returns the last buffer. So, it would be very > useful to get an EOF indicator back with the final buffer. This > could just a flag, or for instance make an fpos parameter which is > both input and output, returning the (post read) fpos or zero at > EOF. > > Futhermore, for input, one could supply: > 0: Start from directory beginning > -1: Read from current position > other: (output from previous call) Read from here > While I can understand the wish to optimize the getdents call, this has its own set of challenges: - The getdents API is following the logic of other read API's. None of these API's has the logic you described above. This would be inconsistent. - The eof needs to be stored in another field. The dirent structure does not have space in the field, so a new data structure needs to be defined. - However the goal is to provide a familiar interface to the user. - If the user wants to reduce the number of calls he can still provide a bigger user buffer. > On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus: > >> This series adds support for getdents64 in liburing. The intent is to >> provide a more complete I/O interface for io_uring. >> >> Patch 1: fs: add parameter use_fpos to iterate_dir() >> This adds a new parameter to the function iterate_dir() so the >> caller can specify if the position is the file position or the >> position stored in the buffer context. >> >> Patch 2: fs: split off vfs_getdents function from getdents64 system call >> This splits of the iterate_dir part of the syscall in its own >> dedicated function. This allows to call the function directly from >> liburing. >> >> Patch 3: io_uring: add support for getdents64 >> Adds the functions to io_uring to support getdents64. >> >> There is also a patch series for the changes to liburing. This includes >> a new test. The patch series is called "liburing: add getdents support." >> >> The following tests have been performed: >> - new liburing getdents test program has been run >> - xfstests have been run >> - both tests have been repeated with the kernel memory leak checker >> and no leaks have been reported. >> >> Signed-off-by: Stefan Roesch <shr@fb.com> >> --- >> V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with >> the additional parameter. >> >> Stefan Roesch (3): >> fs: add parameter use_fpos to iterate_dir function >> fs: split off vfs_getdents function of getdents64 syscall >> io_uring: add support for getdents64 >> >> arch/alpha/kernel/osf_sys.c | 2 +- >> fs/ecryptfs/file.c | 2 +- >> fs/exportfs/expfs.c | 2 +- >> fs/internal.h | 8 +++++ >> fs/io_uring.c | 52 ++++++++++++++++++++++++++++ >> fs/ksmbd/smb2pdu.c | 2 +- >> fs/ksmbd/vfs.c | 4 +-- >> fs/nfsd/nfs4recover.c | 2 +- >> fs/nfsd/vfs.c | 2 +- >> fs/overlayfs/readdir.c | 6 ++-- >> fs/readdir.c | 64 ++++++++++++++++++++++++++--------- >> include/linux/fs.h | 2 +- >> include/uapi/linux/io_uring.h | 1 + >> 13 files changed, 121 insertions(+), 28 deletions(-) >> >> >> base-commit: f0afafc21027c39544a2c1d889b0cff75b346932 >> -- >> 2.30.2
On Tue, Nov 30 2021 at 22:01:30 -0800, Stefan Roesch quoth thus: > > > On 11/24/21 8:42 PM, Clay Harris wrote: > > > > I seem to recall making a few comments the last time a getdents64 > > for io_uring was proposed; in particular I wanted to bring up one > > here. This applies only to altering the internal interface, which > > io_uring would use, although wiring up a new syscall would be a nice > > addition. > > > > The current interface has 2 issues: > > > > 1) > > getdents64 requires at least two calls to read a directory. > > One or more to get the dents and a final call to see the EOF. > > With small directories, this means literally 50% of the calls > > are wasted. > > > > 2) > > The fpos cannot be changed atomically with a read, so it is not > > possible to to safely perform concurrent reads on the same fd. > > > > But, the kernel knows (most, if not all of the time) that it is at > > EOF at the time it returns the last buffer. So, it would be very > > useful to get an EOF indicator back with the final buffer. This > > could just a flag, or for instance make an fpos parameter which is > > both input and output, returning the (post read) fpos or zero at > > EOF. > > > > Futhermore, for input, one could supply: > > 0: Start from directory beginning > > -1: Read from current position > > other: (output from previous call) Read from here > > Thank you for taking the time to respond! > While I can understand the wish to optimize the getdents call, this > has its own set of challenges: > > - The getdents API is following the logic of other read API's. None > of these API's has the logic you described above. This would be > inconsistent. The point was that the familiar interface can be improved by changing its API. Consider as if you were implementing a new syscall: ssize_t getdents2(int fd, void *buf, size_t count, off_t *offset, int flags); This is close enough to pread() that no one would be slowed down by its unfamiliarity. It would work just like getdents64(), except that the fpos at the end of the read would always be returned in offset. At EOF, fpos 0 would be returned. On the calling side, one would set a flag and set offset to a value other than -1 to cause it to atomically change fpos before the read. Just considering an io_uring (only) call, io_uring is already a new interface, so the inconsistency point doesn't really seem to apply. > - The eof needs to be stored in another field. The dirent structure > does not have space in the field, so a new data structure needs to be defined. I believe the important part of the idea is the EOF processing, so I'll drop the fpos discussion for now. In this case you could use exactly the old getdents64() API for io_uring with the minor addition of a returned flag bit. My understanding, which I'll admit could be flawed, is that the directory iterator should be testable for the EOF condition. If correct, I'd think that testing it and setting a bit in the cqe flags field is all you'd need. So, as long as the internal call has access to the iterator, no new structures are required. > - However the goal is to provide a familiar interface to the user. > - If the user wants to reduce the number of calls he can still provide > a bigger user buffer. In io_uring, if there is no EOF flag as suggested above, the best you could probably due is to queue two requests, IOSQE_IO_HARDLINK-ed together, every time you want some directory entries, and then look for a zero length return. That's extra logic, the extra wasted call, and already doubling the required buffering. > > > On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus: > > > >> This series adds support for getdents64 in liburing. The intent is to > >> provide a more complete I/O interface for io_uring. > >> > >> Patch 1: fs: add parameter use_fpos to iterate_dir() > >> This adds a new parameter to the function iterate_dir() so the > >> caller can specify if the position is the file position or the > >> position stored in the buffer context. > >> > >> Patch 2: fs: split off vfs_getdents function from getdents64 system call > >> This splits of the iterate_dir part of the syscall in its own > >> dedicated function. This allows to call the function directly from > >> liburing. > >> > >> Patch 3: io_uring: add support for getdents64 > >> Adds the functions to io_uring to support getdents64. > >> > >> There is also a patch series for the changes to liburing. This includes > >> a new test. The patch series is called "liburing: add getdents support." > >> > >> The following tests have been performed: > >> - new liburing getdents test program has been run > >> - xfstests have been run > >> - both tests have been repeated with the kernel memory leak checker > >> and no leaks have been reported. > >> > >> Signed-off-by: Stefan Roesch <shr@fb.com> > >> --- > >> V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with > >> the additional parameter. > >> > >> Stefan Roesch (3): > >> fs: add parameter use_fpos to iterate_dir function > >> fs: split off vfs_getdents function of getdents64 syscall > >> io_uring: add support for getdents64 > >> > >> arch/alpha/kernel/osf_sys.c | 2 +- > >> fs/ecryptfs/file.c | 2 +- > >> fs/exportfs/expfs.c | 2 +- > >> fs/internal.h | 8 +++++ > >> fs/io_uring.c | 52 ++++++++++++++++++++++++++++ > >> fs/ksmbd/smb2pdu.c | 2 +- > >> fs/ksmbd/vfs.c | 4 +-- > >> fs/nfsd/nfs4recover.c | 2 +- > >> fs/nfsd/vfs.c | 2 +- > >> fs/overlayfs/readdir.c | 6 ++-- > >> fs/readdir.c | 64 ++++++++++++++++++++++++++--------- > >> include/linux/fs.h | 2 +- > >> include/uapi/linux/io_uring.h | 1 + > >> 13 files changed, 121 insertions(+), 28 deletions(-) > >> > >> > >> base-commit: f0afafc21027c39544a2c1d889b0cff75b346932 > >> -- > >> 2.30.2
On 11/24/21 4:16 PM, Stefan Roesch wrote: > This series adds support for getdents64 in liburing. The intent is to > provide a more complete I/O interface for io_uring. > > Patch 1: fs: add parameter use_fpos to iterate_dir() > This adds a new parameter to the function iterate_dir() so the > caller can specify if the position is the file position or the > position stored in the buffer context. > > Patch 2: fs: split off vfs_getdents function from getdents64 system call > This splits of the iterate_dir part of the syscall in its own > dedicated function. This allows to call the function directly from > liburing. > > Patch 3: io_uring: add support for getdents64 > Adds the functions to io_uring to support getdents64. > > There is also a patch series for the changes to liburing. This includes > a new test. The patch series is called "liburing: add getdents support." Al, ping on this one as well, same question on the VFS side.
On Wed, Dec 15, 2021 at 02:09:52PM -0700, Jens Axboe wrote: > On 11/24/21 4:16 PM, Stefan Roesch wrote: > > This series adds support for getdents64 in liburing. The intent is to > > provide a more complete I/O interface for io_uring. > > > > Patch 1: fs: add parameter use_fpos to iterate_dir() > > This adds a new parameter to the function iterate_dir() so the > > caller can specify if the position is the file position or the > > position stored in the buffer context. > > > > Patch 2: fs: split off vfs_getdents function from getdents64 system call > > This splits of the iterate_dir part of the syscall in its own > > dedicated function. This allows to call the function directly from > > liburing. > > > > Patch 3: io_uring: add support for getdents64 > > Adds the functions to io_uring to support getdents64. > > > > There is also a patch series for the changes to liburing. This includes > > a new test. The patch series is called "liburing: add getdents support." > > Al, ping on this one as well, same question on the VFS side. First of all, apologies for being MIA - had been sick for a couple of months, so right now I'm digging through the huge pile of mail. The problem is not with VFS side - it's with ->iterate_dir() *instances*. The logics for validation of file position in there is not pretty, and it's based upon the assumption that all position changes (other than those from ->iterate_dir() itself) come through lseek(2), with its per-opened-file exclusion with getdents(2). Your API just shoves an arbitrary number at them, with no indication as far as struct file instance is concerned that something might need to be checked. Hell, the file might've been just opened with no directory-changing operations done since then; of course its position is valid. And has nothing whatsoever with the number you put into ctx->pos. Normalization is sufficiently costly to show up on real-world loads. Even "assume it bogus" flag somewhere in ctx will seriously cost on io_uring side. And that doesn't help with the case of tmpfs et.al. where position is ignored - there's a per-instance cursor moved around on lseek/readdir and that's the authoritative thing there. So please, use a saner userland ABI - pread-like stuff is a really, really bad idea for directories.
This series adds support for getdents64 in liburing. The intent is to provide a more complete I/O interface for io_uring. Patch 1: fs: add parameter use_fpos to iterate_dir() This adds a new parameter to the function iterate_dir() so the caller can specify if the position is the file position or the position stored in the buffer context. Patch 2: fs: split off vfs_getdents function from getdents64 system call This splits of the iterate_dir part of the syscall in its own dedicated function. This allows to call the function directly from liburing. Patch 3: io_uring: add support for getdents64 Adds the functions to io_uring to support getdents64. There is also a patch series for the changes to liburing. This includes a new test. The patch series is called "liburing: add getdents support." The following tests have been performed: - new liburing getdents test program has been run - xfstests have been run - both tests have been repeated with the kernel memory leak checker and no leaks have been reported. Signed-off-by: Stefan Roesch <shr@fb.com> --- V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with the additional parameter. Stefan Roesch (3): fs: add parameter use_fpos to iterate_dir function fs: split off vfs_getdents function of getdents64 syscall io_uring: add support for getdents64 arch/alpha/kernel/osf_sys.c | 2 +- fs/ecryptfs/file.c | 2 +- fs/exportfs/expfs.c | 2 +- fs/internal.h | 8 +++++ fs/io_uring.c | 52 ++++++++++++++++++++++++++++ fs/ksmbd/smb2pdu.c | 2 +- fs/ksmbd/vfs.c | 4 +-- fs/nfsd/nfs4recover.c | 2 +- fs/nfsd/vfs.c | 2 +- fs/overlayfs/readdir.c | 6 ++-- fs/readdir.c | 64 ++++++++++++++++++++++++++--------- include/linux/fs.h | 2 +- include/uapi/linux/io_uring.h | 1 + 13 files changed, 121 insertions(+), 28 deletions(-) base-commit: f0afafc21027c39544a2c1d889b0cff75b346932