mbox series

[v4,00/15] fuse: basic support for idmapped mounts

Message ID 20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com (mailing list archive)
Headers show
Series fuse: basic support for idmapped mounts | expand

Message

Alexander Mikhalitsyn Sept. 3, 2024, 3:16 p.m. UTC
Dear friends,

This patch series aimed to provide support for idmapped mounts
for fuse & virtiofs. We already have idmapped mounts support for almost all
widely-used filesystems:
* local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
* network (ceph)

Git tree (based on torvalds/master):
v4: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v4
current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts

Changelog for version 4:
- heavily reworked to comply with Miklos's suggestion to start sending idmapped uid/gid
  just in fuse header instead of adding a new FUSE_OWNER_UID_GID_EXT extension (please, refer to [6], [7])
- added ("fs/fuse: handle idmappings properly in ->write_iter")
- added ("fs/fuse: warn if fuse_access is called when idmapped mounts are allowed")
- added handling for idmapped mounts in FUSE_EXT_GROUPS extension
- now RENAME_WHITEOUT can be (and is) supported

Changelog for version 3:
- introduce and use a new SB_I_NOIDMAP flag (suggested by Christian)
- add support for virtiofs (+user space virtiofsd conversion)

Changelog for version 2:
- removed "fs/namespace: introduce fs_type->allow_idmap hook" and simplified logic
to return -EIO if a fuse daemon does not support idmapped mounts (suggested
by Christian Brauner)
- passed an "idmap" in more cases even when it's not necessary to simplify things (suggested
by Christian Brauner)
- take ->rename() RENAME_WHITEOUT into account and forbid it for idmapped mount case

Links to previous versions:
v3: https://lore.kernel.org/all/20240815092429.103356-1-aleksandr.mikhalitsyn@canonical.com
tree: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v3
v2: https://lore.kernel.org/linux-fsdevel/20240814114034.113953-1-aleksandr.mikhalitsyn@canonical.com
tree: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v2
v1: https://lore.kernel.org/all/20240108120824.122178-1-aleksandr.mikhalitsyn@canonical.com/#r
tree: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v1

Having fuse (+virtiofs) supported looks like a good next step. At the same time
fuse conceptually close to the network filesystems and supporting it is
a quite challenging task.

Let me briefly explain what was done in this series and which obstacles we have.

With this series, you can use idmapped mounts with fuse if the following
conditions are met:
1. The filesystem daemon declares idmap support (new FUSE_INIT response feature
flag FUSE_ALLOW_IDMAP)
2. The filesystem superblock was mounted with the "default_permissions" parameter
3. The filesystem fuse daemon does not perform any UID/GID-based checks internally
and fully trusts the kernel to do that (yes, it's almost the same as 2.)

I have prepared a bunch of real-world examples of the user space modifications
that can be done to use this extension:
- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support
- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support
- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap
- glusterfs conversion example (there is a conceptual issue)
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap
- virtiofsd conversion example
https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/245

The glusterfs is a bit problematic, unfortunately, because even if the glusterfs
superblock was mounted with the "default_permissions" parameter (1 and 2 conditions
are satisfied), it fails to satisfy the 3rd condition. The glusterfs fuse daemon sends
caller UIDs/GIDs over the wire and all the permission checks are done twice (first
on the client side (in the fuse kernel module) and second on the glusterfs server side).
Just for demonstration's sake, I found a hacky (but working) solution for glusterfs
that disables these server-side permission checks (see [1]). This allows you to play
with the filesystem and idmapped mounts and it works just fine.

The problem described above is the main problem that we can meet when
working on idmapped mounts support for network-based filesystems (or network-like filesystems
like fuse). When people look at the idmapped mounts feature at first they tend to think
that idmaps are for faking caller UIDs/GIDs, but that's not the case. There was a big
discussion about this in the "ceph: support idmapped mounts" patch series [2], [3].
The brief outcome from this discussion is that we don't want and don't have to fool
filesystem code and map a caller's UID/GID everywhere, but only in VFS i_op's
which are provided with a "struct mnt_idmap *idmap"). For example ->lookup()
callback is not provided with it and that's on purpose! We don't expect the low-level
filesystem code to do any permissions checks inside this callback because everything
was already checked on the higher level (see may_lookup() helper). For local filesystems
this assumption works like a charm, but for network-based, unfortunately, not.
For example, the cephfs kernel client *always* send called UID/GID with *any* request
(->lookup included!) and then *may* (depending on the MDS configuration) perform any
permissions checks on the server side based on these values, which obviously leads
to issues/inconsistencies if VFS idmaps are involved.

Fuse filesystem very-very close to cephfs example, because we have req->in.h.uid/req->in.h.gid
and these values are present in all fuse requests and userspace may use them as it wants.

All of the above explains why we have a "default_permissions" requirement. If filesystem
does not use it, then permission checks will be widespread across all the i_op's like
->lookup, ->unlink, ->readlink instead of being consolidated in the one place (->permission callback).

How to play with it:
1. take any patched filesystem from the list (fuse-overlayfs, cephfs-fuse, glusterfs) and mount it
2. ./mount-idmapped --map-mount b:1000:0:2 /mnt/my_fuse_mount /mnt/my_fuse_mount_idmapped
(maps UID/GIDs as 1000 -> 0, 1001 -> 1)
[ taken from https://raw.githubusercontent.com/brauner/mount-idmapped/master/mount-idmapped.c ]

[1] https://github.com/mihalicyn/glusterfs/commit/ab3ec2c7cbe22618cba9cc94a52a492b1904d0b2
[2] https://lore.kernel.org/lkml/20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com/
[3] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[4] https://github.com/ceph/ceph/pull/52575
[5] https://lore.kernel.org/all/20230807132626.182101-4-aleksandr.mikhalitsyn@canonical.com/
[6] https://lore.kernel.org/all/CAJfpegsVY97_5mHSc06mSw79FehFWtoXT=hhTUK_E-Yhr7OAuQ@mail.gmail.com/
[7] https://lore.kernel.org/all/CAJfpegtHQsEUuFq1k4ZbTD3E1h-GsrN3PWyv7X8cg6sfU_W2Yw@mail.gmail.com/


Thanks!
Alex

Cc: Christian Brauner <brauner@kernel.org>
Cc: Seth Forshee <sforshee@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Bernd Schubert <bschubert@ddn.com>
Cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Alexander Mikhalitsyn (15):
  fs/namespace: introduce SB_I_NOIDMAP flag
  fs/fuse: add basic infrastructure to support idmappings
  fs/fuse: add an idmap argument to fuse_simple_request
  fs/fuse: support idmapped FUSE_EXT_GROUPS
  fs/fuse: support idmap for mkdir/mknod/symlink/create/tmpfile
  fs/fuse: support idmapped getattr inode op
  fs/fuse: support idmapped ->permission inode op
  fs/fuse: support idmapped ->setattr op
  fs/fuse: drop idmap argument from __fuse_get_acl
  fs/fuse: support idmapped ->set_acl
  fs/fuse: support idmapped ->rename op
  fs/fuse: handle idmappings properly in ->write_iter
  fs/fuse: warn if fuse_access is called when idmapped mounts are
    allowed
  fs/fuse: allow idmapped mounts
  fs/fuse/virtio_fs: allow idmapped mounts

 fs/fuse/acl.c             |  10 +--
 fs/fuse/dax.c             |   4 +-
 fs/fuse/dev.c             |  54 +++++++++---
 fs/fuse/dir.c             | 169 ++++++++++++++++++++++----------------
 fs/fuse/file.c            |  37 +++++----
 fs/fuse/fuse_i.h          |   7 +-
 fs/fuse/inode.c           |  19 +++--
 fs/fuse/ioctl.c           |   2 +-
 fs/fuse/readdir.c         |   4 +-
 fs/fuse/virtio_fs.c       |   1 +
 fs/fuse/xattr.c           |   8 +-
 fs/namespace.c            |   4 +
 include/linux/fs.h        |   1 +
 include/uapi/linux/fuse.h |  22 ++++-
 14 files changed, 216 insertions(+), 126 deletions(-)

Comments

Miklos Szeredi Sept. 4, 2024, 3:15 p.m. UTC | #1
On Tue, 3 Sept 2024 at 17:16, Alexander Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
>
> Dear friends,
>
> This patch series aimed to provide support for idmapped mounts
> for fuse & virtiofs. We already have idmapped mounts support for almost all
> widely-used filesystems:
> * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> * network (ceph)

Looks good.

Applied with some tweaks and pushed.

Thanks,
Miklos
Christian Brauner Sept. 4, 2024, 3:29 p.m. UTC | #2
On Wed, Sep 04, 2024 at 05:15:40PM GMT, Miklos Szeredi wrote:
> On Tue, 3 Sept 2024 at 17:16, Alexander Mikhalitsyn
> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >
> > Dear friends,
> >
> > This patch series aimed to provide support for idmapped mounts
> > for fuse & virtiofs. We already have idmapped mounts support for almost all
> > widely-used filesystems:
> > * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> > * network (ceph)
> 
> Looks good.
> 
> Applied with some tweaks and pushed.

Ah, I didn't see your reply. Fwiw, if you agree with my suggestion then
Alex can just put that patch on top of the series or do it after we
landed it. I just think passing NULL 38 times is a bit ugly.
Miklos Szeredi Sept. 4, 2024, 4:59 p.m. UTC | #3
On Wed, 4 Sept 2024 at 17:29, Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Sep 04, 2024 at 05:15:40PM GMT, Miklos Szeredi wrote:
> > On Tue, 3 Sept 2024 at 17:16, Alexander Mikhalitsyn
> > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > >
> > > Dear friends,
> > >
> > > This patch series aimed to provide support for idmapped mounts
> > > for fuse & virtiofs. We already have idmapped mounts support for almost all
> > > widely-used filesystems:
> > > * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> > > * network (ceph)
> >
> > Looks good.
> >
> > Applied with some tweaks and pushed.
>
> Ah, I didn't see your reply. Fwiw, if you agree with my suggestion then
> Alex can just put that patch on top of the series or do it after we
> landed it. I just think passing NULL 38 times is a bit ugly.

Yes, I agree with this comment.   I'm fine with either a redone series
or an incremental patch.

Thanks,
Miklos
Alexander Mikhalitsyn Sept. 4, 2024, 5:21 p.m. UTC | #4
On Wed, Sep 4, 2024 at 7:00 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 4 Sept 2024 at 17:29, Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Sep 04, 2024 at 05:15:40PM GMT, Miklos Szeredi wrote:
> > > On Tue, 3 Sept 2024 at 17:16, Alexander Mikhalitsyn
> > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > >
> > > > Dear friends,
> > > >
> > > > This patch series aimed to provide support for idmapped mounts
> > > > for fuse & virtiofs. We already have idmapped mounts support for almost all
> > > > widely-used filesystems:
> > > > * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> > > > * network (ceph)
> > >
> > > Looks good.
> > >
> > > Applied with some tweaks and pushed.
> >
> > Ah, I didn't see your reply. Fwiw, if you agree with my suggestion then
> > Alex can just put that patch on top of the series or do it after we
> > landed it. I just think passing NULL 38 times is a bit ugly.
>
> Yes, I agree with this comment.   I'm fine with either a redone series
> or an incremental patch.

Dear Christian,
Dear Miklos,

I'm happy to send a patch/patches on top to refactor that.

Kind regards,
Alex

>
> Thanks,
> Miklos