
[RFC,v3,0/2] fhandle: expose u64 mount id to name_to_handle_at(2)

Message ID: 20240801-exportfs-u64-mount-id-v3-0-be5d6283144a@cyphar.com

Message

Aleksa Sarai Aug. 1, 2024, 3:52 a.m. UTC
Now that statx(2) provides a unique 64-bit mount ID, we can give
name_to_handle_at(2) a race-free way to return a file handle together
with the mount it belongs to, without users having to worry about races
when parsing /proc/mountinfo, or having to open a file just to call
statx(2).

This is not necessary if you already use AT_EMPTY_PATH and don't mind an
extra statx(2) call. However, users that pass full paths to
name_to_handle_at(2) need to know which mount the file handle comes from
(so that they don't pass open_by_handle_at(2) a file handle from a
different filesystem), and switching to AT_EMPTY_PATH would require
allocating a file for every name_to_handle_at(2) call, turning

  err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid,
                          AT_HANDLE_MNT_ID_UNIQUE);

into

  int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
  err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH);
  err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf);
  mntid = statxbuf.stx_mnt_id;
  close(fd);
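The two variants above can be combined into a helper that tries the new flag first and falls back to the O_PATH + statx(2) sequence. This is a minimal sketch, assuming Linux with glibc (for name_to_handle_at(), statx() and struct file_handle); handle_with_unique_mntid() is a hypothetical name, and the #define fallbacks for AT_HANDLE_MNT_ID_UNIQUE and STATX_MNT_ID_UNIQUE are assumed uapi values for building against older headers. It relies on kernels without this series rejecting the unknown flag bit with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallbacks for headers that predate these flags (assumed uapi values). */
#ifndef AT_HANDLE_MNT_ID_UNIQUE
#define AT_HANDLE_MNT_ID_UNIQUE 0x001
#endif
#ifndef STATX_MNT_ID_UNIQUE
#define STATX_MNT_ID_UNIQUE 0x00004000U
#endif

/*
 * Hypothetical helper: fill fh (whose handle_bytes the caller has set) with a
 * handle for path, and store the unique 64-bit ID of its mount in *mnt_id.
 * Trailing symlinks are not followed on either code path.
 * Returns 0 on success, or -1 with errno set.
 */
static int handle_with_unique_mntid(const char *path, struct file_handle *fh,
                                    uint64_t *mnt_id)
{
    /* One syscall: with AT_HANDLE_MNT_ID_UNIQUE the kernel writes a u64
     * through the mount_id pointer instead of an int. */
    if (name_to_handle_at(AT_FDCWD, path, fh, (int *)mnt_id,
                          AT_HANDLE_MNT_ID_UNIQUE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1; /* e.g. EOPNOTSUPP if the filesystem can't export handles */

    /* Older kernel: pin the file with an O_PATH fd so that the handle and
     * the mount ID are both taken from the same object. */
    int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int legacy_id; /* recyclable 32-bit mount ID, ignored here */
    struct statx stx;
    int ret = -1;
    if (name_to_handle_at(fd, "", fh, &legacy_id, AT_EMPTY_PATH) == 0 &&
        statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx) == 0) {
        if (stx.stx_mask & STATX_MNT_ID_UNIQUE) {
            *mnt_id = stx.stx_mnt_id;
            ret = 0;
        } else {
            errno = EOPNOTSUPP; /* kernel predates STATX_MNT_ID_UNIQUE */
        }
    }
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

Trying the flag first keeps the single-syscall path on kernels that have this series, while the fallback still works (at the cost of an extra file and syscall) on older ones.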

This series also adds a patch clarifying how AT_* flags should be
allocated going forward.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
Changes in v3:
- Added a patch describing how AT_* flags should be allocated in the
  future, based on Amir's suggestions.
- Included AT_* aliases for RENAME_* flags to further indicate that
  renameat2(2) is an *at(2) syscall and to indicate that those flags
  have been allocated already in the per-syscall range.
- Switched AT_HANDLE_MNT_ID_UNIQUE to 0x01, reusing the
  (AT_)RENAME_NOREPLACE bit.
- v2: <https://lore.kernel.org/r/20240523-exportfs-u64-mount-id-v2-1-f9f959f17eb1@cyphar.com>
Changes in v2:
- Fixed a few minor compiler warnings and a buggy copy_to_user() check.
- Renamed AT_HANDLE_UNIQUE_MOUNT_ID -> AT_HANDLE_MNT_ID_UNIQUE to match statx.
- Switched to using an AT_* bit from the 0xFF range and defined that range
  as "per-syscall" for future usage.
- Synced the tools/ copy of <linux/fcntl.h> with these changes.
- v1: <https://lore.kernel.org/r/20240520-exportfs-u64-mount-id-v1-1-f55fd9215b8e@cyphar.com>
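To illustrate what a "per-syscall" flag range means in practice, here is a small sketch. The constant values follow the uapi headers (AT_HANDLE_FID already shares 0x200 with AT_REMOVEDIR, and with this series AT_HANDLE_MNT_ID_UNIQUE shares 0x01 with AT_RENAME_NOREPLACE); they are redefined behind #ifndef guards only so the example builds on older headers. name_to_handle_at_flags_ok() is a hypothetical user-space mirror of the "reject unknown bits" check a syscall performs, covering the flag set of this series, not the kernel's actual code.

```c
#include <stdbool.h>

/* Generic AT_* flags honoured by name_to_handle_at(2). */
#ifndef AT_SYMLINK_FOLLOW
#define AT_SYMLINK_FOLLOW 0x400
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/* Per-syscall bits: the same value means different things to different
 * syscalls. */
#ifndef AT_REMOVEDIR
#define AT_REMOVEDIR 0x200                /* unlinkat(2) */
#endif
#ifndef AT_HANDLE_FID
#define AT_HANDLE_FID AT_REMOVEDIR        /* name_to_handle_at(2), reuses 0x200 */
#endif
#ifndef AT_RENAME_NOREPLACE
#define AT_RENAME_NOREPLACE 0x0001        /* renameat2(2), == RENAME_NOREPLACE */
#endif
#ifndef AT_HANDLE_MNT_ID_UNIQUE
#define AT_HANDLE_MNT_ID_UNIQUE 0x001     /* name_to_handle_at(2), reuses 0x01 */
#endif

/*
 * Hypothetical mirror of a syscall's flag validation: accept only the generic
 * flags it honours plus its own per-syscall bits; anything else would fail
 * with EINVAL.
 */
static bool name_to_handle_at_flags_ok(int flags)
{
    const int allowed = AT_SYMLINK_FOLLOW | AT_EMPTY_PATH |
                        AT_HANDLE_FID | AT_HANDLE_MNT_ID_UNIQUE;
    return (flags & ~allowed) == 0;
}
```

Because the values overlap, name_to_handle_at(2) cannot tell a stray AT_RENAME_NOREPLACE from AT_HANDLE_MNT_ID_UNIQUE. That is only acceptable because per-syscall flags are meaningless outside the syscall that defines them.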

---
Aleksa Sarai (2):
      uapi: explain how per-syscall AT_* flags should be allocated
      fhandle: expose u64 mount id to name_to_handle_at(2)

 fs/fhandle.c                                       | 29 ++++++--
 include/linux/syscalls.h                           |  2 +-
 include/uapi/linux/fcntl.h                         | 81 ++++++++++++++-------
 tools/perf/trace/beauty/include/uapi/linux/fcntl.h | 84 +++++++++++++++-------
 4 files changed, 140 insertions(+), 56 deletions(-)
---
base-commit: c7b9563b58a77423d4c6e026ff831a69612b02fc
change-id: 20240515-exportfs-u64-mount-id-9ebb5c58b53c

Best regards,

Comments

Josef Bacik Aug. 1, 2024, 2:28 p.m. UTC | #1
On Thu, Aug 01, 2024 at 01:52:39PM +1000, Aleksa Sarai wrote:

Wasn't the conclusion from this discussion last time that we needed to revisit
this API completely?  Christoph had some pretty adamant objections.

That being said the uapi comments patch looks good to me, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to that one.  The other one I'm going to let others who have stronger opinions
than me argue about.  Thanks, 

Josef

Aleksa Sarai Aug. 2, 2024, 1:43 a.m. UTC | #2
On 2024-08-01, Josef Bacik <josef@toxicpanda.com> wrote:
> Wasn't the conclusion from this discussion last time that we needed to revisit
> this API completely?  Christoph had some pretty adamant objections.

There was a discussion about reworking the API, and I agree with most of
the issues raised about file handles (I personally don't really like
this interface, and it's a shame that it seems set to become the
replacement for inode numbers), so I'm not at all opposed to reworking
it.

However, I agree with Christian[1] that we can fix this issue in the
existing API fairly easily and work on a new API separately. The
existing usage of name_to_handle_at() is fundamentally unsafe (as
outlined in the man page), and this series is a small fix for that.

[1]: https://lore.kernel.org/all/20240527-hagel-thunfisch-75781b0cf75d@brauner/
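For contrast, the legacy pattern described above as fundamentally unsafe looks roughly like this: name_to_handle_at(2) returns the old 32-bit mount ID, which the caller then has to look up in /proc/self/mountinfo. Those IDs are recycled once a mount goes away, so if the mount is replaced between the two steps, the lookup can land on a different mount. This is a minimal sketch; mountinfo_lookup() and legacy_handle_mountpoint() are hypothetical helpers, and escape sequences such as \040 in mount points are not decoded.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_BUF 4096

/*
 * Scan a mountinfo-format stream for the entry whose first field (the legacy
 * mount ID) equals mnt_id, and copy its mount point (fifth field) into out.
 * Returns 0 if found, -1 otherwise.
 */
static int mountinfo_lookup(FILE *f, int mnt_id, char out[PATH_BUF])
{
    char *line = NULL;
    size_t cap = 0;
    int ret = -1;

    while (getline(&line, &cap, f) != -1) {
        int id;
        char mnt[PATH_BUF];
        /* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
        if (sscanf(line, "%d %*d %*s %*s %4095s", &id, mnt) == 2 &&
            id == mnt_id) {
            strcpy(out, mnt);
            ret = 0;
            break;
        }
    }
    free(line);
    return ret;
}

/* The racy two-step pattern: handle + recyclable int ID, then a lookup. */
static int legacy_handle_mountpoint(const char *path, struct file_handle *fh,
                                    char out[PATH_BUF])
{
    int mnt_id;
    if (name_to_handle_at(AT_FDCWD, path, fh, &mnt_id, 0) != 0)
        return -1;

    /* RACE: if the mount is unmounted here, mnt_id may be handed to a
     * different mount before mountinfo is read below. */
    FILE *f = fopen("/proc/self/mountinfo", "r");
    if (f == NULL)
        return -1;
    int ret = mountinfo_lookup(f, mnt_id, out);
    fclose(f);
    return ret;
}
```

With AT_HANDLE_MNT_ID_UNIQUE the second step disappears: the returned 64-bit ID is never reused, so it can be compared against statx(2) output without this window.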
