[0/3] readfile(2): a new syscall to make open/read/close faster

Message ID 20200704140250.423345-1-gregkh@linuxfoundation.org

Message

Greg KH July 4, 2020, 2:02 p.m. UTC
Here is a tiny new syscall, readfile, that makes it simpler to read
small/medium sized files all in one shot, no need to do open/read/close.
This is especially helpful for tools that poke around in procfs or
sysfs, putting slightly less load on the system than before, especially
as syscall overheads go up over time due to various CPU bugs being
addressed.
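
For illustration only (this sketch is not part of the series): the
conventional sequence that readfile is meant to collapse looks roughly
like the helper below. The name read_small_file and its signature are
hypothetical, chosen for this example; it shows the three separate
syscalls a tool pays for each procfs or sysfs file it reads today.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch series): read up to count
 * bytes from path into buf using the usual three-syscall sequence.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_small_file(const char *path, void *buf, size_t count)
{
	int fd = open(path, O_RDONLY);        /* syscall 1: open */
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, count);     /* syscall 2: read */

	close(fd);                            /* syscall 3: close */
	return n;
}
```

A tool that scans many small files would call such a helper once per
file, so the open and close costs are paid again for every file.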

There are 4 patches in this series; the first 3 are against the kernel
tree, adding the syscall logic, wiring up the syscall, and adding some
tests for it.

The last patch is against the man-pages project, adding a tiny man page
to try to describe the new syscall.

Greg Kroah-Hartman (3):
  readfile: implement readfile syscall
  arch: wire up the readfile syscall
  selftests: add readfile(2) selftests

 arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
 arch/arm/tools/syscall.tbl                    |   1 +
 arch/arm64/include/asm/unistd.h               |   2 +-
 arch/arm64/include/asm/unistd32.h             |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl         |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
 arch/s390/kernel/syscalls/syscall.tbl         |   1 +
 arch/sh/kernel/syscalls/syscall.tbl           |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
 fs/open.c                                     |  50 +++
 include/linux/syscalls.h                      |   2 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/readfile/.gitignore   |   3 +
 tools/testing/selftests/readfile/Makefile     |   7 +
 tools/testing/selftests/readfile/readfile.c   | 285 +++++++++++++++++
 .../selftests/readfile/readfile_speed.c       | 301 ++++++++++++++++++
 26 files changed, 671 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/readfile/.gitignore
 create mode 100644 tools/testing/selftests/readfile/Makefile
 create mode 100644 tools/testing/selftests/readfile/readfile.c
 create mode 100644 tools/testing/selftests/readfile/readfile_speed.c

Comments

Al Viro July 4, 2020, 7:30 p.m. UTC | #1
On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
> Here is a tiny new syscall, readfile, that makes it simpler to read
> small/medium sized files all in one shot, no need to do open/read/close.
> This is especially helpful for tools that poke around in procfs or
> sysfs, putting slightly less load on the system than before, especially
> as syscall overheads go up over time due to various CPU bugs being
> addressed.

Nice series, but you are 3 months late with it...  Next AFD, perhaps?

Seriously, the rationale is bollocks.  If the overhead of 2 extra
syscalls is anywhere near the costs of the real work being done by
that thing, we have already lost and the best thing to do is to
throw the system away and start with saner hardware.

Greg KH July 5, 2020, 11:47 a.m. UTC | #2
On Sat, Jul 04, 2020 at 08:30:40PM +0100, Al Viro wrote:
> On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
> > Here is a tiny new syscall, readfile, that makes it simpler to read
> > small/medium sized files all in one shot, no need to do open/read/close.
> > This is especially helpful for tools that poke around in procfs or
> > sysfs, putting slightly less load on the system than before, especially
> > as syscall overheads go up over time due to various CPU bugs being
> > addressed.
> 
> Nice series, but you are 3 months late with it...  Next AFD, perhaps?

Perhaps :)

> Seriously, the rationale is bollocks.  If the overhead of 2 extra
> syscalls is anywhere near the costs of the real work being done by
> that thing, we have already lost and the best thing to do is to
> throw the system away and start with saner hardware.

The real work the kernel does is almost negligible compared to the
open/close overhead of the syscalls on some platforms today.  I'll post
benchmarks with the next version of this patch series to hopefully show
that.  If not, then yeah, this isn't worth it, but it was fun to write.
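
A rough way to check this claim on a given machine, sketched under
assumptions: time a loop of open/read/close against one small file and
look at the per-iteration cost. The helper name time_open_read_close,
the 4096-byte buffer, and the use of CLOCK_MONOTONIC are choices made
for this sketch; it is not the benchmark from the series. It measures
the whole cycle, so separating the open/close share from the read
itself would need a second loop that keeps the descriptor open.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock nanoseconds for one
 * open/read/close cycle on path, or -1.0 on any error.
 */
static double time_open_read_close(const char *path, int iterations)
{
	char buf[4096];
	struct timespec start, end;

	if (iterations <= 0)
		return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;

	for (int i = 0; i < iterations; i++) {
		int fd = open(path, O_RDONLY);
		if (fd < 0)
			return -1.0;
		ssize_t n = read(fd, buf, sizeof(buf));
		close(fd);
		if (n < 0)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	/* Elapsed time in nanoseconds, averaged over all iterations. */
	double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
		  + (double)(end.tv_nsec - start.tv_nsec);
	return ns / iterations;
}
```

Running it against a sysfs or procfs attribute with a large iteration
count gives a per-cycle figure to compare across platforms.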

thanks,

greg k-h

Dave Martin July 6, 2020, 5:25 p.m. UTC | #3
On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
> Here is a tiny new syscall, readfile, that makes it simpler to read
> small/medium sized files all in one shot, no need to do open/read/close.
> This is especially helpful for tools that poke around in procfs or
> sysfs, putting slightly less load on the system than before, especially
> as syscall overheads go up over time due to various CPU bugs being
> addressed.
> 
> There are 4 patches in this series; the first 3 are against the kernel
> tree, adding the syscall logic, wiring up the syscall, and adding some
> tests for it.
> 
> The last patch is against the man-pages project, adding a tiny man page
> to try to describe the new syscall.

General question, using this series as an illustration only:


At the risk of starting a flamewar, why is this needed?  Is there a
realistic use case that would get significant benefit from this?

A lot of syscalls seem to get added that combine or refactor the
functionality of existing syscalls without justifying why this is
needed (or even wise).  This case feels like a solution, not a
primitive, so I wonder if the long-term ABI fragmentation is worth the
benefit.

I ask because I'd like to get an idea of the policy on what is and is
not considered a frivolous ABI extension.

(I'm sure a use case must be in mind, but it isn't mentioned here.
Certainly the time it takes top to dump the contents of /proc leaves
something to be desired.)

Cheers
---Dave