[0/4] statmount: allow to retrieve idmappings

Message ID 20250130-work-mnt_idmap-statmount-v1-0-d4ced5874e14@kernel.org (mailing list archive)

Message

Christian Brauner Jan. 29, 2025, 11:19 p.m. UTC
This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options.
It allows the retrieval of idmappings via statmount().

Currently it isn't possible to figure out what idmappings are applied to
an idmapped mount. This information is often crucial. Before statmount()
the only realistic options for an interface like this would have been to
add it to /proc/<pid>/fdinfo/<nr> or to expose it in
/proc/<pid>/mountinfo. Both solutions would have been pretty ugly and
would've shown information that is of strong interest to some
applications but not all. statmount() is perfect for this.

The idmappings applied to an idmapped mount are shown relative to the
caller's user namespace. This is the most useful solution that doesn't
risk leaking information or confusing the caller.

For example, an idmapped mount might have been created with the
following idmappings:

    mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt

Listing the idmappings through statmount() in the same context shows:

    mnt_id:        2147485088
    mnt_parent_id: 2147484816
    fs_type:       btrfs
    mnt_root:      /srv
    mnt_point:     /opt
    mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
    mnt_uidmap[0]: 0 10000 1000
    mnt_uidmap[1]: 2000 2000 1
    mnt_uidmap[2]: 3000 3000 1
    mnt_gidmap[0]: 0 10000 1000
    mnt_gidmap[1]: 2000 2000 1
    mnt_gidmap[2]: 3000 3000 1

But the idmappings might not always be resolvable in the caller's user
namespace. For example:

    unshare --user --map-root

In this case statmount() will indicate the failure to resolve the idmappings
in the caller's user namespace by listing 4294967295 aka (uid_t) -1 as
the target of the mapping while still showing the source and range of
the mapping:

    mnt_id:        2147485087
    mnt_parent_id: 2147484016
    fs_type:       btrfs
    mnt_root:      /srv
    mnt_point:     /opt
    mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
    mnt_uidmap[0]: 0 4294967295 1000
    mnt_uidmap[1]: 2000 4294967295 1
    mnt_uidmap[2]: 3000 4294967295 1
    mnt_gidmap[0]: 0 4294967295 1000
    mnt_gidmap[1]: 2000 4294967295 1
    mnt_gidmap[2]: 3000 4294967295 1
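
A consumer can detect such entries by comparing the second field against
(uid_t) -1. The following is a minimal sketch, assuming each entry has the
"source target range" layout shown above; the helper name
idmap_entry_unresolved() is hypothetical and not part of this series:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: inspect one "source target range" entry.
 * Returns 1 if the target is 4294967295, i.e. (uid_t) -1,
 * 0 if the target is a resolved ID, and -1 if the entry does not parse.
 */
static int idmap_entry_unresolved(const char *entry)
{
        unsigned long long source, target, range;

        if (sscanf(entry, "%llu %llu %llu", &source, &target, &range) != 3)
                return -1;

        return target == (unsigned long long)UINT32_MAX;
}
```

With the output above, idmap_entry_unresolved("0 4294967295 1000") reports
the entry as unresolved, while "2000 2000 1" from the first example does not.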

Note that statmount() requires the whole range to be resolvable in the
caller's user namespace. If even a subrange fails to map, the whole map is
listed as not resolvable. This is a practical compromise to avoid having to
find which subranges are resolvable and which aren't.
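
The all-or-nothing rule can be illustrated in userspace. This is only a
sketch and not the kernel implementation: struct id_extent,
sketch_map_range_up() and ID_UNRESOLVED are names invented here, and the
sketch assumes a range resolves only if it fits entirely inside a single
extent of the caller's mapping:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: [lower, lower + count) appears to the caller
 * as [first, first + count). */
struct id_extent {
        uint32_t first;
        uint32_t lower;
        uint32_t count;
};

#define ID_UNRESOLVED UINT32_MAX /* same value as (uid_t) -1 */

/*
 * Translate the start of [id, id + count) into the caller's view.
 * The range resolves only if one extent covers all of it; a range that
 * fits partially is reported as unresolved, never split.
 */
static uint32_t sketch_map_range_up(const struct id_extent *ext, size_t nr,
                                    uint32_t id, uint32_t count)
{
        uint64_t last;

        if (count == 0)
                return ID_UNRESOLVED;
        last = (uint64_t)id + count - 1;

        for (size_t i = 0; i < nr; i++) {
                uint64_t ext_last;

                if (ext[i].count == 0)
                        continue;
                ext_last = (uint64_t)ext[i].lower + ext[i].count - 1;
                if (id >= ext[i].lower && last <= ext_last)
                        return ext[i].first + (id - ext[i].lower);
        }
        return ID_UNRESOLVED;
}
```

Given a single extent { 0, 10000, 1000 }, the range starting at 10500 with
1000 IDs is reported as unresolved even though its first 500 IDs are covered.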

Idmappings are listed as a string array with each mapping separated by
zero bytes. This allows retrieving the idmappings and immediately writing
them to, e.g., /proc/<pid>/{g,u}id_map, and it also allows for simple
iteration like:

    if (stmnt->mask & STATMOUNT_MNT_UIDMAP) {
            const char *idmap = stmnt->str + stmnt->mnt_uidmap;

            for (size_t idx = 0; idx < stmnt->mnt_uidmap_nr; idx++) {
                    printf("mnt_uidmap[%zu]: %s\n", idx, idmap);
                    idmap += strlen(idmap) + 1;
            }
    }
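
If numeric values are needed rather than strings, the same walk can feed a
parser. The following is a sketch under the assumption that every entry is a
"source target range" triple of 32-bit decimal values separated by single
spaces; struct idmap_entry and parse_idmaps() are hypothetical names, not
part of the uapi:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed mapping. */
struct idmap_entry {
        uint32_t source;
        uint32_t target;
        uint32_t range;
};

/*
 * Parse up to max entries from a buffer holding nr NUL-separated
 * mappings. Returns the number of entries parsed; stops early at the
 * first entry that does not match the expected layout.
 */
static size_t parse_idmaps(const char *buf, size_t nr,
                           struct idmap_entry *out, size_t max)
{
        size_t parsed = 0;

        for (size_t idx = 0; idx < nr && parsed < max; idx++) {
                struct idmap_entry *e = &out[parsed];

                if (sscanf(buf, "%" SCNu32 " %" SCNu32 " %" SCNu32,
                           &e->source, &e->target, &e->range) != 3)
                        break;
                parsed++;
                buf += strlen(buf) + 1; /* skip the entry and its NUL */
        }
        return parsed;
}
```

A caller would pass stmnt->str + stmnt->mnt_uidmap as buf and
stmnt->mnt_uidmap_nr as nr, after checking stmnt->mask as shown above.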

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (4):
      uidgid: add map_id_range_up()
      statmount: allow to retrieve idmappings
      samples/vfs: check whether flag was raised
      samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP

 fs/internal.h                      |  1 +
 fs/mnt_idmapping.c                 | 49 ++++++++++++++++++++++++++++++++++++++
 fs/namespace.c                     | 43 ++++++++++++++++++++++++++++++++-
 include/linux/uidgid.h             |  6 +++++
 include/uapi/linux/mount.h         |  8 ++++++-
 kernel/user_namespace.c            | 26 +++++++++++++-------
 samples/vfs/samples-vfs.h          | 14 ++++++++++-
 samples/vfs/test-list-all-mounts.c | 35 ++++++++++++++++++++++-----
 8 files changed, 164 insertions(+), 18 deletions(-)
---
base-commit: 6d61a53dd6f55405ebcaea6ee38d1ab5a8856c2c
change-id: 20250129-work-mnt_idmap-statmount-e57f258fef8e

Comments

Jeff Layton Jan. 30, 2025, 12:22 p.m. UTC | #1
On Thu, 2025-01-30 at 00:19 +0100, Christian Brauner wrote:
> This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options.
> It allows the retrieval of idmappings via statmount().
> 
> Currently it isn't possible to figure out what idmappings are applied to
> an idmapped mount. This information is often crucial. Before statmount()
> the only realistic options for an interface like this would have been to
> add it to /proc/<pid>/fdinfo/<nr> or to expose it in
> /proc/<pid>/mountinfo. Both solutions would have been pretty ugly and
> would've shown information that is of strong interest to some
> applications but not all. statmount() is perfect for this.
> 
> The idmappings applied to an idmapped mount are shown relative to the
> caller's user namespace. This is the most useful solution that doesn't
> risk leaking information or confusing the caller.
> 
> For example, an idmapped mount might have been created with the
> following idmappings:
> 
>     mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt
> 
> Listing the idmappings through statmount() in the same context shows:
> 
>     mnt_id:        2147485088
>     mnt_parent_id: 2147484816
>     fs_type:       btrfs
>     mnt_root:      /srv
>     mnt_point:     /opt
>     mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
>     mnt_uidmap[0]: 0 10000 1000
>     mnt_uidmap[1]: 2000 2000 1
>     mnt_uidmap[2]: 3000 3000 1
>     mnt_gidmap[0]: 0 10000 1000
>     mnt_gidmap[1]: 2000 2000 1
>     mnt_gidmap[2]: 3000 3000 1
> 

nit: any reason not to separate the fields with ':' like the mount
option syntax?

> But the idmappings might not always be resolvable in the caller's user
> namespace. For example:
> 
>     unshare --user --map-root
> 
> In this case statmount() will indicate the failure to resolve the idmappings
> in the caller's user namespace by listing 4294967295 aka (uid_t) -1 as
> the target of the mapping while still showing the source and range of
> the mapping:
> 
>     mnt_id:        2147485087
>     mnt_parent_id: 2147484016
>     fs_type:       btrfs
>     mnt_root:      /srv
>     mnt_point:     /opt
>     mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
>     mnt_uidmap[0]: 0 4294967295 1000
>     mnt_uidmap[1]: 2000 4294967295 1
>     mnt_uidmap[2]: 3000 4294967295 1
>     mnt_gidmap[0]: 0 4294967295 1000
>     mnt_gidmap[1]: 2000 4294967295 1
>     mnt_gidmap[2]: 3000 4294967295 1
> 

From a UI standpoint, this behavior is pretty ugly. What if we
(hypothetically) move to 64-bit uids one day? Maybe it'd be better to
note an inability to resolve with more distinct output? Like a '?'
instead of a -1 cast to unsigned?

If I can't resolve the range, maybe it'd be better to just not return
the info at all? Are the first and third fields of any value without
the second?

Christian Brauner Jan. 30, 2025, 3:16 p.m. UTC | #2
On Thu, Jan 30, 2025 at 07:22:42AM -0500, Jeff Layton wrote:
> On Thu, 2025-01-30 at 00:19 +0100, Christian Brauner wrote:
> > This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options.
> > It allows the retrieval of idmappings via statmount().
> > 
> > Currently it isn't possible to figure out what idmappings are applied to
> > an idmapped mount. This information is often crucial. Before statmount()
> > the only realistic options for an interface like this would have been to
> > add it to /proc/<pid>/fdinfo/<nr> or to expose it in
> > /proc/<pid>/mountinfo. Both solutions would have been pretty ugly and
> > would've shown information that is of strong interest to some
> > applications but not all. statmount() is perfect for this.
> > 
> > The idmappings applied to an idmapped mount are shown relative to the
> > caller's user namespace. This is the most useful solution that doesn't
> > risk leaking information or confusing the caller.
> > 
> > For example, an idmapped mount might have been created with the
> > following idmappings:
> > 
> >     mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt
> > 
> > Listing the idmappings through statmount() in the same context shows:
> > 
> >     mnt_id:        2147485088
> >     mnt_parent_id: 2147484816
> >     fs_type:       btrfs
> >     mnt_root:      /srv
> >     mnt_point:     /opt
> >     mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
> >     mnt_uidmap[0]: 0 10000 1000
> >     mnt_uidmap[1]: 2000 2000 1
> >     mnt_uidmap[2]: 3000 3000 1
> >     mnt_gidmap[0]: 0 10000 1000
> >     mnt_gidmap[1]: 2000 2000 1
> >     mnt_gidmap[2]: 3000 3000 1
> > 
> 
> nit: any reason not to separate the fields with ':' like the mount
> option syntax?

I followed the format of how idmappings are written and shown in
/proc/<PID>/{g,u}id_map.

> 
> > But the idmappings might not always be resolvable in the caller's user
> > namespace. For example:
> > 
> >     unshare --user --map-root
> > 
> > In this case statmount() will indicate the failure to resolve the idmappings
> > in the caller's user namespace by listing 4294967295 aka (uid_t) -1 as
> > the target of the mapping while still showing the source and range of
> > the mapping:
> > 
> >     mnt_id:        2147485087
> >     mnt_parent_id: 2147484016
> >     fs_type:       btrfs
> >     mnt_root:      /srv
> >     mnt_point:     /opt
> >     mnt_opts:      ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
> >     mnt_uidmap[0]: 0 4294967295 1000
> >     mnt_uidmap[1]: 2000 4294967295 1
> >     mnt_uidmap[2]: 3000 4294967295 1
> >     mnt_gidmap[0]: 0 4294967295 1000
> >     mnt_gidmap[1]: 2000 4294967295 1
> >     mnt_gidmap[2]: 3000 4294967295 1
> > 
> 
> From a UI standpoint, this behavior is pretty ugly. What if we
> (hypothetically) move to 64-bit uids one day? Maybe it'd be better to
> note an inability to resolve with more distinct output? Like a '?'
> instead of a -1 cast to unsigned?
> 
> If I can't resolve the range, maybe it'd be better to just not return
> the info at all? Are the first and third fields of any value without
> the second?

That's an option but then users can only distinguish between no
idmapping and an empty idmapping by checking sm->mask for
STATMOUNT_MNT_{G,U}IDMAP. Which is probably fine. I'm just pointing it
out.

Leaving them out is probably a good idea rather than adding some special
syntax.
