mbox series

[GIT,PULL] vfs mount api updates

Message ID 20240105-vfs-mount-5e94596bd1d1@brauner (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] vfs mount api updates | expand

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.8.mount

Message

Christian Brauner Jan. 5, 2024, 12:46 p.m. UTC
Hey Linus,

/* Summary */
This contains the work to retrieve detailed information about mounts via two
new system calls. This is hopefully the beginning of the end of the saga that
started with fsinfo() years ago. The LWN articles in [1] and [2] can serve as a
summary so we can avoid rehashing everything here.

At LSFMM in May 2022 we got into a room and agreed on what we want to do about
fsinfo(). Basically, split it into pieces. This is the first part of that
agreement. Specifically, it is concerned with retrieving information about
mounts. So this only concerns the mount information retrieval, not the mount
table change notification, or the extended filesystem specific mount option
work. That is separate work.

Currently mounts have a 32bit id. Mount ids are already in heavy use by
libmount and other low-level userspace but they can't be relied upon because
they're recycled very quickly. We agreed that mounts should carry a unique
64bit id by which they can be referenced directly. This is now implemented as
part of this work. The new 64bit mount id is exposed in statx() through the new
STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
returned. If it is raised and the kernel supports the new 64bit mount id the
flag is raised in the result mask and the new 64bit mount id is returned. New
and old mount ids do not overlap so they cannot be conflated.

Two new system calls are introduced that operate on the 64bit mount id:
statmount() and listmount(). A summary of the api and usage can be found on LWN
as well (cf. [3]) but of course, I'll provide a summary here as well.

Both system calls rely on struct mnt_id_req. Which is the request struct used
to pass the 64bit mount id identifying the mount to operate on. It is
extensible to allow for the addition of new parameters and for future use in
other apis that make use of mount ids.

statmount() mimicks the semantics of statx() and exposes a set flags that
userspace may raise in mnt_id_req to request specific information to be
retrieved. A statmount() call returns a struct statmount filled in with
information about the requested mount. Supported requests are indicated by
raising the request flag passed in struct mnt_id_req in the @mask argument in
struct statmount. Currently we do support:

* STATMOUNT_SB_BASIC:
  Basic filesystem info.

* STATMOUNT_MNT_BASIC
  Mount information (mount id, parent mount id, mount attributes etc.).

* STATMOUNT_PROPAGATE_FROM
  Propagation from what mount in current namespace.

* STATMOUNT_MNT_ROOT
  Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla).

* STATMOUNT_MNT_POINT
  Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt).

* STATMOUNT_FS_TYPE
  Name of the filesystem type as the magic number isn't enough due to submounts.

The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE are
appended to the end of the struct. Userspace can use the offsets in @fs_type,
@mnt_root, and @mnt_point to reference those strings easily.

The struct statmount reserves quite a bit of space currently for future
extensibility. This isn't really a problem and if this bothers us we can just
send a follow-up pull request during this cycle.

listmount() is given a 64bit mount id via mnt_id_req just as statmount(). It
takes a buffer and a size to return an array of the 64bit ids of the child
mounts of the requested mount. Userspace can thus choose to either retrieve
child mounts for a mount in batches or iterate through the child mounts. For
most use-cases it will be sufficient to just leave space for a few child
mounts. But for big mount tables having an iterator is really helpful.
Iterating through a mount table works by setting @param in mnt_id_req to the
mount id of the last child mount retrieved in the previous listmount() call.

[1]: https://lwn.net/Articles/934469
[2]: https://lwn.net/Articles/829212
[3]: https://lwn.net/Articles/950569

/* Testing */
clang: Debian clang version 16.0.6 (19)
gcc: (Debian 13.2.0-7) 13.2.0

All patches are based on v6.7-rc1 and have been sitting in linux-next.
No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with other trees
================================

[1] linux-next: manual merge of the security tree with the vfs-brauner tree
    https://lore.kernel.org/linux-next/20231120143106.3f8faedd@canb.auug.org.au

    Possible conflict in system call numbering depending on whether the LSM
    people do send a pull request including their new system calls.

[2] This will have a merge conflict with the vfs misc pull request.
    https://lore.kernel.org/r/20240105-vfs-misc-62acb84c5066@brauner

    The resolution is as follows:

    diff --cc tools/testing/selftests/Makefile
    index 27f9f679ed15,da2e1b0e4dd8..000000000000
    --- a/tools/testing/selftests/Makefile
    +++ b/tools/testing/selftests/Makefile
    @@@ -26,7 -26,7 +26,8 @@@ TARGETS += filesystem
      TARGETS += filesystems/binderfs
      TARGETS += filesystems/epoll
      TARGETS += filesystems/fat
     +TARGETS += filesystems/overlayfs
    + TARGETS += filesystems/statmount
      TARGETS += firmware
      TARGETS += fpu
      TARGETS += ftrace

The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86:

  Linux 6.7-rc1 (2023-11-12 16:19:07 -0800)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.8.mount

for you to fetch changes up to 5bd3cf8cbc8a286308ef3f40656659d5abc89995:

  add selftest for statmount/listmount (2023-12-14 16:13:59 +0100)

Please consider pulling these changes from the signed vfs-6.8.mount tag.

Happy New Year!
Christian

----------------------------------------------------------------
vfs-6.8.mount

----------------------------------------------------------------
Christian Brauner (3):
      statmount: simplify numeric option retrieval
      statmount: simplify string option retrieval
      fs: keep struct mnt_id_req extensible

Miklos Szeredi (7):
      add unique mount ID
      mounts: keep list of mounts in an rbtree
      namespace: extract show_path() helper
      add statmount(2) syscall
      add listmount(2) syscall
      wire up syscalls for statmount/listmount
      add selftest for statmount/listmount

 arch/alpha/kernel/syscalls/syscall.tbl             |   2 +
 arch/arm/tools/syscall.tbl                         |   2 +
 arch/arm64/include/asm/unistd32.h                  |   4 +
 arch/m68k/kernel/syscalls/syscall.tbl              |   2 +
 arch/microblaze/kernel/syscalls/syscall.tbl        |   2 +
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   2 +
 arch/mips/kernel/syscalls/syscall_n64.tbl          |   2 +
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   2 +
 arch/parisc/kernel/syscalls/syscall.tbl            |   2 +
 arch/powerpc/kernel/syscalls/syscall.tbl           |   2 +
 arch/s390/kernel/syscalls/syscall.tbl              |   2 +
 arch/sh/kernel/syscalls/syscall.tbl                |   2 +
 arch/sparc/kernel/syscalls/syscall.tbl             |   2 +
 arch/x86/entry/syscalls/syscall_32.tbl             |   2 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   2 +
 arch/xtensa/kernel/syscalls/syscall.tbl            |   2 +
 fs/internal.h                                      |   2 +
 fs/mount.h                                         |  27 +-
 fs/namespace.c                                     | 627 +++++++++++++++++----
 fs/pnode.c                                         |   2 +-
 fs/proc_namespace.c                                |  13 +-
 fs/stat.c                                          |   9 +-
 include/linux/mount.h                              |   5 +-
 include/linux/syscalls.h                           |   8 +
 include/uapi/asm-generic/unistd.h                  |   8 +-
 include/uapi/linux/mount.h                         |  70 +++
 include/uapi/linux/stat.h                          |   1 +
 tools/testing/selftests/Makefile                   |   1 +
 .../selftests/filesystems/statmount/.gitignore     |   2 +
 .../selftests/filesystems/statmount/Makefile       |   6 +
 .../filesystems/statmount/statmount_test.c         | 612 ++++++++++++++++++++
 31 files changed, 1298 insertions(+), 129 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/statmount/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/statmount/Makefile
 create mode 100644 tools/testing/selftests/filesystems/statmount/statmount_test.c

Comments

pr-tracker-bot@kernel.org Jan. 8, 2024, 8 p.m. UTC | #1
The pull request you sent on Fri,  5 Jan 2024 13:46:53 +0100:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.8.mount

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8c9440fea77440772542d6dbcb5c36182495c164

Thank you!
Linus Torvalds Jan. 9, 2024, 1:02 a.m. UTC | #2
On Fri, 5 Jan 2024 at 04:47, Christian Brauner <brauner@kernel.org> wrote:
>
> This contains the work to retrieve detailed information about mounts via two
> new system calls.

Gaah. While I have an arm64 laptop now, I don't do arm64 builds in
between each pull like I do x86 ones.

I *did* just start one, because I got the arm64 pull request.

And this fails the arm64 build, because __NR_statmount and
__NR_listmount (457 and 458 respectively) exceed the compat system
call array size, which is

arch/arm64/include/asm/unistd.h:
  #define __NR_compat_syscalls            457

I don't think this is a merge error, I think the error is there in the
original, but I'm about to go off and have dinner, so I'm just sending
this out for now.

How was this not noted in linux-next? Am I missing something?

Now, admittedly this looks like an easy mistake to make due to that
whole odd situation where the compat system calls are listed in
unistd32.h, but then the max number is in unistd.h, but I would still
have expected this to have raised flags before it hit my tree..

                Linus
Will Deacon Jan. 9, 2024, 9:52 a.m. UTC | #3
Hi Linus,

On Mon, Jan 08, 2024 at 05:02:48PM -0800, Linus Torvalds wrote:
> On Fri, 5 Jan 2024 at 04:47, Christian Brauner <brauner@kernel.org> wrote:
> >
> > This contains the work to retrieve detailed information about mounts via two
> > new system calls.
> 
> Gaah. While I have an arm64 laptop now, I don't do arm64 builds in
> between each pull like I do x86 ones.
> 
> I *did* just start one, because I got the arm64 pull request.
> 
> And this fails the arm64 build, because __NR_statmount and
> __NR_listmount (457 and 458 respectively) exceed the compat system
> call array size, which is
> 
> arch/arm64/include/asm/unistd.h:
>   #define __NR_compat_syscalls            457
> 
> I don't think this is a merge error, I think the error is there in the
> original, but I'm about to go off and have dinner, so I'm just sending
> this out for now.
> 
> How was this not noted in linux-next? Am I missing something?

Urgh, that is surprising, and I just confirmed that linux-next builds
fine! The reason seems to be because there are also some new lsm
syscalls being added there (lsm_get_self_attr and friends) which bump
__NR_compat_syscalls to 460 and then Stephen Rothwell's mighty merging
magic adjusted this up to 462 in the merge of the lsm tree.

> Now, admittedly this looks like an easy mistake to make due to that
> whole odd situation where the compat system calls are listed in
> unistd32.h, but then the max number is in unistd.h, but I would still
> have expected this to have raised flags before it hit my tree..

I suppose the two options for now are either to merge the lsm stuff and
adjust __NR_compat_syscalls as Stephen did, or to take this patch from
Florian in the meantime:

https://lore.kernel.org/r/20240109010906.429652-1-florian.fainelli@broadcom.com

Cheers,

Will
Sedat Dilek Jan. 9, 2024, 10:45 a.m. UTC | #4
On Tue, Jan 9, 2024 at 10:52 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Linus,
>
> On Mon, Jan 08, 2024 at 05:02:48PM -0800, Linus Torvalds wrote:
> > On Fri, 5 Jan 2024 at 04:47, Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > This contains the work to retrieve detailed information about mounts via two
> > > new system calls.
> >
> > Gaah. While I have an arm64 laptop now, I don't do arm64 builds in
> > between each pull like I do x86 ones.
> >
> > I *did* just start one, because I got the arm64 pull request.
> >
> > And this fails the arm64 build, because __NR_statmount and
> > __NR_listmount (457 and 458 respectively) exceed the compat system
> > call array size, which is
> >
> > arch/arm64/include/asm/unistd.h:
> >   #define __NR_compat_syscalls            457
> >
> > I don't think this is a merge error, I think the error is there in the
> > original, but I'm about to go off and have dinner, so I'm just sending
> > this out for now.
> >
> > How was this not noted in linux-next? Am I missing something?
>
> Urgh, that is surprising, and I just confirmed that linux-next builds
> fine! The reason seems to be because there are also some new lsm
> syscalls being added there (lsm_get_self_attr and friends) which bump
> __NR_compat_syscalls to 460 and then Stephen Rothwell's mighty merging
> magic adjusted this up to 462 in the merge of the lsm tree.
>
> > Now, admittedly this looks like an easy mistake to make due to that
> > whole odd situation where the compat system calls are listed in
> > unistd32.h, but then the max number is in unistd.h, but I would still
> > have expected this to have raised flags before it hit my tree..
>
> I suppose the two options for now are either to merge the lsm stuff and
> adjust __NR_compat_syscalls as Stephen did, or to take this patch from
> Florian in the meantime:
>
> https://lore.kernel.org/r/20240109010906.429652-1-florian.fainelli@broadcom.com
>

Applied upstream:

https://git.kernel.org/linus/f0a78b3e2a0c842cc7b4c2686f4a35681f02ca72

-Sedat-
Stephen Rothwell Jan. 9, 2024, 11:56 a.m. UTC | #5
Hi Linus,

On Mon, 8 Jan 2024 17:02:48 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> How was this not noted in linux-next? Am I missing something?

I suspect I noticed this when the other syscall adding commits arrived
(in the security tree) and sent
https://lore.kernel.org/all/20231120143106.3f8faedd@canb.auug.org.au/
 - which gave a hint and would have hidden the error you got.  I also do
not do arm64 builds along the way and only at the end of my day.
Christian Brauner Jan. 9, 2024, 2:23 p.m. UTC | #6
On Mon, Jan 08, 2024 at 05:02:48PM -0800, Linus Torvalds wrote:
> On Fri, 5 Jan 2024 at 04:47, Christian Brauner <brauner@kernel.org> wrote:
> >
> > This contains the work to retrieve detailed information about mounts via two
> > new system calls.
> 
> Gaah. While I have an arm64 laptop now, I don't do arm64 builds in
> between each pull like I do x86 ones.
> 
> I *did* just start one, because I got the arm64 pull request.
> 
> And this fails the arm64 build, because __NR_statmount and
> __NR_listmount (457 and 458 respectively) exceed the compat system
> call array size, which is
> 
> arch/arm64/include/asm/unistd.h:
>   #define __NR_compat_syscalls            457
> 
> I don't think this is a merge error, I think the error is there in the
> original, but I'm about to go off and have dinner, so I'm just sending
> this out for now.
> 
> How was this not noted in linux-next? Am I missing something?
> 
> Now, admittedly this looks like an easy mistake to make due to that
> whole odd situation where the compat system calls are listed in
> unistd32.h, but then the max number is in unistd.h, but I would still
> have expected this to have raised flags before it hit my tree..

Bah.

I think Will already provided a good explantion for how this came to be.
But for full transparency: I've ran into this exact issue before with
other system calls we added and I've been notified/saved by Arnd who
pointed out that this file needs to be updated.

32 bit arm has this annoying extra file where you need to bump that
single line. But it'd be nice if we finally had some:

./add-new-syscall

script that could automate adding a new system call number into all
relevant architectures.

Sorry for the breakage. I see that it's already fixed. I'll make a note
to reactivate my cross-compilation toolsuite.