Message ID | 20240815092429.103356-1-aleksandr.mikhalitsyn@canonical.com (mailing list archive) |
---|---|
Series | fuse: basic support for idmapped mounts |
On Thu, Aug 15, 2024 at 11:24:17AM GMT, Alexander Mikhalitsyn wrote:
> Dear friends,
>
> This patch series aimed to provide support for idmapped mounts
> for fuse & virtiofs. We already have idmapped mounts support for almost all
> widely-used filesystems:
> * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> * network (ceph)
>
> [...]

So I have no further comments on this and from my perspective this is:

Reviewed-by: Christian Brauner <brauner@kernel.org>

I would really like to see tests for this feature as this is available
to unprivileged users.
On Fri, Aug 16, 2024 at 10:02 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Aug 15, 2024 at 11:24:17AM GMT, Alexander Mikhalitsyn wrote:
> > Dear friends,
> >
> > This patch series aimed to provide support for idmapped mounts
> > for fuse & virtiofs.
> >
> > [...]
>
> So I have no further comments on this and from my perspective this is:
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>

Thanks, Christian! ;-)

>
> I would really like to see tests for this feature as this is available
> to unprivileged users.

Sure. I can confirm that this thing passes xfstests for virtiofs.

My setup:

- host machine

Virtiofsd options:

[ virtiofsd sources from https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/245 ]
./target/debug/virtiofsd --socket-path=/tmp/vfsd.sock --shared-dir /home/alex/Documents/dev/tmp --announce-submounts --inode-file-handles=mandatory --posix-acl

QEMU options:
-object memory-backend-memfd,id=mem,size=$RAM,share=on \
-numa node,memdev=mem \
-chardev socket,id=char0,path=/tmp/vfsd.sock \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \

- guest

xfstests version:

root@ubuntu:/home/ubuntu/xfstests-dev# git log | head -n 3
commit f5ada754d5838d29fd270257003d0d123a9d1cd2
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Fri Jul 26 09:51:07 2024 -0700

root@ubuntu:/home/ubuntu/xfstests-dev# cat local.config
export TEST_DEV=myfs
export TEST_DIR=/mnt/test
export FSTYP=virtiofs

root@ubuntu:/home/ubuntu/xfstests-dev# ./check -g idmapped
FSTYP         -- virtiofs
PLATFORM      -- Linux/x86_64 ubuntu 6.11.0-rc3+ #2 SMP PREEMPT_DYNAMIC Fri Aug 16 10:23:41 CEST 2024

generic/633 1s ...  0s
generic/644 0s ...  1s
generic/645 18s ...  18s
generic/656       [not run] fsgqa user not defined.
generic/689       [not run] fsgqa user not defined.
generic/696       [not run] this test requires a valid $SCRATCH_DEV
generic/697 0s ...  1s
generic/698       [not run] this test requires a valid $SCRATCH_DEV
generic/699       [not run] this test requires a valid $SCRATCH_DEV
Ran: generic/633 generic/644 generic/645 generic/656 generic/689 generic/696 generic/697 generic/698 generic/699
Not run: generic/656 generic/689 generic/696 generic/698 generic/699
Passed all 9 tests

I'll try to do more tests, for example with fuse-overlayfs and get
back with results.

Kind regards,
Alex
On Fri, Aug 16, 2024 at 10:58 AM Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote:
>
> On Fri, Aug 16, 2024 at 10:02 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > I would really like to see tests for this feature as this is available
> > to unprivileged users.
>
> Sure. I can confirm that this thing passes xfstests for virtiofs.
>
> [...]
>
> I'll try to do more tests, for example with fuse-overlayfs and get
> back with results.

Ok, it wasn't smooth to get xfstests running with fuse-overlayfs.
It only started to work after I commented out a bunch of checks in
_check_if_dev_already_mounted/_check_mounted_on:

https://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git/tree/common/rc#n1613
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git/tree/common/rc#n1635
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/xfstests-dev.git/tree/common/rc#n1644

I think we have some room for improvement in the xfstests+fuse combination. :-)

$ cat /sbin/mount.fuse.overlayfs
#!/bin/bash

ulimit -n 1048576
exec /mnt/fuse-overlayfs/fuse-overlayfs -o $4 $1 $2

$ cat local.config
export TEST_DEV=non1
export TEST_DIR=/mnt2
export FSTYP=fuse
export FUSE_SUBTYP=.overlayfs
export MOUNT_OPTIONS="-olowerdir=/home/ubuntu/fuse_tmp/scratch_lower,upperdir=/home/ubuntu/fuse_tmp/scratch_upper,workdir=/home/ubuntu/fuse_tmp/scratch_work,allow_other,default_permissions"
export TEST_FS_MOUNT_OPTS="-olowerdir=/home/ubuntu/fuse_tmp/lower,upperdir=/home/ubuntu/fuse_tmp/upper,workdir=/home/ubuntu/fuse_tmp/work,allow_other,default_permissions"

================== without idmapped mounts support ====================

# ./check -g idmapped
FSTYP         -- fuse
PLATFORM      -- Linux/x86_64 ubuntu 6.11.0-rc3+ #2 SMP PREEMPT_DYNAMIC Fri Aug 16 10:23:41 CEST 2024

generic/633 0s ... [failed, exit status 1]- output mismatch (see /home/ubuntu/xfstests-dev/results//generic/633.out.bad)
    --- tests/generic/633.out 2023-06-07 12:19:04.309062045 +0000
    +++ /home/ubuntu/xfstests-dev/results//generic/633.out.bad 2024-08-16 13:30:20.471569848 +0000
    @@ -1,2 +1,4 @@
     QA output created by 633
     Silence is golden
    +vfstest.c: 1561: setgid_create - Success - failure: is_setgid
    +vfstest.c: 2418: run_test - Success - failure: create operations in directories with setgid bit set
    ...
    (Run 'diff -u /home/ubuntu/xfstests-dev/tests/generic/633.out /home/ubuntu/xfstests-dev/results//generic/633.out.bad' to see the entire diff)
generic/644 0s ... [not run] vfstest not support by fuse
generic/645 10s ... [not run] vfstest not support by fuse
generic/656 0s ... [not run] vfstest not support by fuse
generic/689 0s ... [not run] vfstest not support by fuse
generic/696       [not run] this test requires a valid $SCRATCH_DEV
generic/697 1s ... - output mismatch (see /home/ubuntu/xfstests-dev/results//generic/697.out.bad)
    --- tests/generic/697.out 2023-06-07 12:19:04.313062164 +0000
    +++ /home/ubuntu/xfstests-dev/results//generic/697.out.bad 2024-08-16 13:30:21.919598831 +0000
    @@ -1,2 +1,4 @@
     QA output created by 697
    +vfstest.c: 2018: setgid_create_acl - Success - failure: is_setgid
    +vfstest.c: 2418: run_test - Success - failure: create operations in directories with setgid bit set under posix acl
     Silence is golden
    ...
    (Run 'diff -u /home/ubuntu/xfstests-dev/tests/generic/697.out /home/ubuntu/xfstests-dev/results//generic/697.out.bad' to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
      1639a49ccdce fs: move S_ISGID stripping into the vfs_*() helpers

generic/698       [not run] this test requires a valid $SCRATCH_DEV
generic/699       [not run] this test requires a valid $SCRATCH_DEV
Ran: generic/633 generic/644 generic/645 generic/656 generic/689 generic/696 generic/697 generic/698 generic/699
Not run: generic/644 generic/645 generic/656 generic/689 generic/696 generic/698 generic/699
Failures: generic/633 generic/697
Failed 2 of 9 tests

================== with idmapped mounts support ====================

# ./check -g idmapped
FSTYP         -- fuse
PLATFORM      -- Linux/x86_64 ubuntu 6.11.0-rc3+ #2 SMP PREEMPT_DYNAMIC Fri Aug 16 10:23:41 CEST 2024

generic/633 0s ... [failed, exit status 1]- output mismatch (see /home/ubuntu/xfstests-dev/results//generic/633.out.bad)
    --- tests/generic/633.out 2023-06-07 12:19:04.309062045 +0000
    +++ /home/ubuntu/xfstests-dev/results//generic/633.out.bad 2024-08-16 13:29:30.358557063 +0000
    @@ -1,2 +1,4 @@
     QA output created by 633
     Silence is golden
    +vfstest.c: 1561: setgid_create - Success - failure: is_setgid
    +vfstest.c: 2418: run_test - Success - failure: create operations in directories with setgid bit set
    ...
    (Run 'diff -u /home/ubuntu/xfstests-dev/tests/generic/633.out /home/ubuntu/xfstests-dev/results//generic/633.out.bad' to see the entire diff)
generic/644 0s ... 0s
generic/645 10s ... 10s
generic/656 0s
generic/689 0s
generic/696       [not run] this test requires a valid $SCRATCH_DEV
generic/697 1s ... - output mismatch (see /home/ubuntu/xfstests-dev/results//generic/697.out.bad)
    --- tests/generic/697.out 2023-06-07 12:19:04.313062164 +0000
    +++ /home/ubuntu/xfstests-dev/results//generic/697.out.bad 2024-08-16 13:29:41.466783240 +0000
    @@ -1,2 +1,4 @@
     QA output created by 697
    +vfstest.c: 2018: setgid_create_acl - Success - failure: is_setgid
    +vfstest.c: 2418: run_test - Success - failure: create operations in directories with setgid bit set under posix acl
     Silence is golden
    ...
    (Run 'diff -u /home/ubuntu/xfstests-dev/tests/generic/697.out /home/ubuntu/xfstests-dev/results//generic/697.out.bad' to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
      1639a49ccdce fs: move S_ISGID stripping into the vfs_*() helpers

generic/698       [not run] this test requires a valid $SCRATCH_DEV
generic/699       [not run] this test requires a valid $SCRATCH_DEV
Ran: generic/633 generic/644 generic/645 generic/656 generic/689 generic/696 generic/697 generic/698 generic/699
Not run: generic/696 generic/698 generic/699
Failures: generic/633 generic/697
Failed 2 of 9 tests

As we can see, the two failures are clearly not related to idmapped mounts:
they show up in both cases I compared, fuse-overlayfs compiled *without*
support for idmapped mounts and *with* it.

>
> Kind regards,
> Alex
On Thu, 15 Aug 2024 11:24:17 +0200, Alexander Mikhalitsyn wrote:
> This patch series aimed to provide support for idmapped mounts
> for fuse & virtiofs. We already have idmapped mounts support for almost all
> widely-used filesystems:
> * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> * network (ceph)
>
> Git tree (based on torvalds/master):
> v3: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v3
> current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts
>
> [...]

I've taken this but can drop should it need to end up in a fuse tree.

---

Applied to the vfs.idmap branch of the vfs/vfs.git tree.
Patches in the vfs.idmap branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.idmap

[01/11] fs/namespace: introduce SB_I_NOIDMAP flag
        https://git.kernel.org/vfs/vfs/c/cc3e8969ffb2
[02/11] fs/fuse: add FUSE_OWNER_UID_GID_EXT extension
        https://git.kernel.org/vfs/vfs/c/d2c5937035e5
[03/11] fs/fuse: support idmap for mkdir/mknod/symlink/create
        https://git.kernel.org/vfs/vfs/c/9961d396252b
[04/11] fs/fuse: support idmapped getattr inode op
        https://git.kernel.org/vfs/vfs/c/52dfd148ff75
[05/11] fs/fuse: support idmapped ->permission inode op
        https://git.kernel.org/vfs/vfs/c/34ddf0de71be
[06/11] fs/fuse: support idmapped ->setattr op
        https://git.kernel.org/vfs/vfs/c/27b622529cdc
[07/11] fs/fuse: drop idmap argument from __fuse_get_acl
        https://git.kernel.org/vfs/vfs/c/6d8f2f4fde13
[08/11] fs/fuse: support idmapped ->set_acl
        https://git.kernel.org/vfs/vfs/c/ab7c30987cbb
[09/11] fs/fuse: properly handle idmapped ->rename op
        https://git.kernel.org/vfs/vfs/c/76c0baad3782
[10/11] fs/fuse: allow idmapped mounts
        https://git.kernel.org/vfs/vfs/c/9aace2eda1bd
[11/11] fs/fuse/virtio_fs: allow idmapped mounts
        https://git.kernel.org/vfs/vfs/c/020a698f136c
Dear friends,

This patch series aimed to provide support for idmapped mounts
for fuse & virtiofs. We already have idmapped mounts support for almost all
widely-used filesystems:
* local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
* network (ceph)

Git tree (based on torvalds/master):
v3: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v3
current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts

Changelog for version 3:
- introduce and use a new SB_I_NOIDMAP flag (suggested by Christian)
- add support for virtiofs (+user space virtiofsd conversion)

Changelog for version 2:
- removed "fs/namespace: introduce fs_type->allow_idmap hook" and simplified logic
to return -EIO if a fuse daemon does not support idmapped mounts (suggested
by Christian Brauner)
- passed an "idmap" in more cases even when it's not necessary to simplify things (suggested
by Christian Brauner)
- take ->rename() RENAME_WHITEOUT into account and forbid it for idmapped mount case

Links to previous versions:
v2: https://lore.kernel.org/linux-fsdevel/20240814114034.113953-1-aleksandr.mikhalitsyn@canonical.com
tree: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v2
v1: https://lore.kernel.org/all/20240108120824.122178-1-aleksandr.mikhalitsyn@canonical.com/#r
tree: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v1

Having fuse (+virtiofs) supported looks like a good next step. At the same time,
fuse is conceptually close to the network filesystems and supporting it is
quite a challenging task.

Let me briefly explain what was done in this series and which obstacles we have.

With this series, you can use idmapped mounts with fuse if the following
conditions are met:
1. The filesystem daemon declares idmap support (new FUSE_INIT response feature
flags FUSE_OWNER_UID_GID_EXT and FUSE_ALLOW_IDMAP)
2. The filesystem superblock was mounted with the "default_permissions" parameter
3. The filesystem fuse daemon does not perform any UID/GID-based checks internally
and fully trusts the kernel to do that (yes, it's almost the same as 2.)

I have prepared a bunch of real-world examples of the user space modifications
that can be done to use this extension:
- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support
- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support
- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap
- glusterfs conversion example (there is a conceptual issue)
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap
- virtiofsd conversion example
https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/245

glusterfs is a bit problematic, unfortunately: even if the glusterfs
superblock is mounted with the "default_permissions" parameter (so conditions
1 and 2 are satisfied), it fails to satisfy the 3rd condition. The glusterfs fuse daemon sends
caller UIDs/GIDs over the wire and all the permission checks are done twice (first
on the client side (in the fuse kernel module) and second on the glusterfs server side).
Just for demonstration's sake, I found a hacky (but working) solution for glusterfs
that disables these server-side permission checks (see [1]). This allows you to play with
the filesystem and idmapped mounts and it works just fine.

The problem described above is the main one we run into when
working on idmapped mounts support for network-based filesystems (or network-like filesystems
like fuse). When people first look at the idmapped mounts feature, they tend to think
that idmaps are for faking caller UIDs/GIDs, but that's not the case. There was a big
discussion about this in the "ceph: support idmapped mounts" patch series [2], [3].
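As a rough illustration of the three conditions listed earlier in this letter, here is a minimal C sketch of the decision as a pure function. It is not code from the series: the struct, its field names and the flag bit positions are placeholders invented for this example (the real flag values are defined in include/uapi/linux/fuse.h and are not reproduced here), and condition 3 is something only the daemon author can vouch for, so it appears as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only, NOT the real uapi values. */
#define SKETCH_OWNER_UID_GID_EXT (UINT64_C(1) << 0)
#define SKETCH_ALLOW_IDMAP       (UINT64_C(1) << 1)

/* Hypothetical summary of what is known about one fuse mount. */
struct sketch_fuse_mount {
	uint64_t init_reply_flags;  /* feature flags from the daemon's FUSE_INIT reply */
	bool default_permissions;   /* mounted with "default_permissions" */
	bool daemon_checks_ids;     /* daemon (or its server) does its own UID/GID checks */
};

/* Returns true only if all three conditions from the cover letter hold. */
static bool sketch_idmap_allowed(const struct sketch_fuse_mount *m)
{
	const uint64_t need = SKETCH_OWNER_UID_GID_EXT | SKETCH_ALLOW_IDMAP;

	assert(m != NULL);
	if ((m->init_reply_flags & need) != need)
		return false;           /* condition 1: daemon declared support */
	if (!m->default_permissions)
		return false;           /* condition 2: kernel does the permission checks */
	return !m->daemon_checks_ids;   /* condition 3: nobody re-checks behind the kernel */
}
```

In this model the glusterfs case described above has both flags set and "default_permissions" enabled, but daemon_checks_ids is true because of the server-side checks, so the function returns false.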
The brief outcome of this discussion is that we don't want to, and don't have to, fool
filesystem code by mapping a caller's UID/GID everywhere; we map them only in the VFS i_op's
which are provided with a "struct mnt_idmap *idmap". For example, the ->lookup()
callback is not provided with it and that's on purpose! We don't expect the low-level
filesystem code to do any permission checks inside this callback because everything
was already checked on the higher level (see the may_lookup() helper). For local filesystems
this assumption works like a charm, but for network-based ones, unfortunately, it does not.
For example, the cephfs kernel client *always* sends the caller UID/GID with *any* request
(->lookup included!) and permission checks *may* then (depending on the MDS configuration)
be performed on the server side based on these values, which obviously leads to
issues/inconsistencies if VFS idmaps are involved.

The fuse filesystem is very close to the cephfs example, because we have req->in.h.uid/req->in.h.gid:
these values are present in all fuse requests and userspace may use them as it wants.

All of the above explains why we have a "default_permissions" requirement. If a filesystem
does not use it, permission checks are spread across all the i_op's like
->lookup, ->unlink, ->readlink instead of being consolidated in one place (the ->permission callback).

In this series, my approach is the same as in cephfs [4], [5]. Don't touch req->in.h.uid/req->in.h.gid values
at all (because we can't properly idmap them as we don't have "struct mnt_idmap *idmap" everywhere).
Instead, provide the userspace with a new optional (FUSE_OWNER_UID_GID_EXT) UID/GID, suitable
only for ->mknod, ->mkdir, ->symlink, ->atomic_open; these values have to be used as the
owner UID and GID for newly created inodes.
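To make the last point concrete, here is a small daemon-side sketch of how the owner of a newly created inode might be chosen. Everything in it is hypothetical modelling: the struct names and the "present" flag are invented for the example, and the actual wire format of the extension (patch 02/11, include/uapi/linux/fuse.h) is not reproduced. The fallback to the request header ids when no extension is supplied is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models req->in.h.uid / req->in.h.gid: the caller's ids, left untouched. */
struct sketch_req_header {
	uint32_t uid;
	uint32_t gid;
};

/* Hypothetical stand-in for the optional FUSE_OWNER_UID_GID_EXT payload. */
struct sketch_owner_ext {
	bool present;
	uint32_t owner_uid;
	uint32_t owner_gid;
};

struct sketch_owner {
	uint32_t uid;
	uint32_t gid;
};

/*
 * Pick the owner for an inode created by mknod/mkdir/symlink/atomic_open:
 * use the extension's ids when supplied, otherwise (assumption) fall back
 * to the ids from the request header.
 */
static struct sketch_owner
sketch_new_inode_owner(const struct sketch_req_header *h,
		       const struct sketch_owner_ext *ext)
{
	struct sketch_owner o;

	assert(h != NULL);
	if (ext != NULL && ext->present) {
		o.uid = ext->owner_uid;
		o.gid = ext->owner_gid;
	} else {
		o.uid = h->uid;
		o.gid = h->gid;
	}
	return o;
}
```

The point of keeping the two sources separate is that the header ids keep describing the caller, while only the ownership of the new inode goes through the idmapping.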
Things to discuss:
- we enable idmapped mounts support only if "default_permissions" mode is enabled,
because otherwise we would need to deal with UID/GID mappings on the userspace side OR
provide the userspace with idmapped req->in.h.uid/req->in.h.gid values, which is not
something that we probably want to do. The idmapped mounts philosophy is not about faking
the caller's uid/gid.

How to play with it:
1. take any patched filesystem from the list (fuse-overlayfs, cephfs-fuse, glusterfs) and mount it
2. ./mount-idmapped --map-mount b:1000:0:2 /mnt/my_fuse_mount /mnt/my_fuse_mount_idmapped
(maps UID/GIDs as 1000 -> 0, 1001 -> 1)
[ taken from https://raw.githubusercontent.com/brauner/mount-idmapped/master/mount-idmapped.c ]

[1] https://github.com/mihalicyn/glusterfs/commit/ab3ec2c7cbe22618cba9cc94a52a492b1904d0b2
[2] https://lore.kernel.org/lkml/20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com/
[3] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[4] https://github.com/ceph/ceph/pull/52575
[5] https://lore.kernel.org/all/20230807132626.182101-4-aleksandr.mikhalitsyn@canonical.com/

Thanks!
Alex

Cc: Christian Brauner <brauner@kernel.org>
Cc: Seth Forshee <sforshee@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Bernd Schubert <bschubert@ddn.com>
Cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Alexander Mikhalitsyn (11):
  fs/namespace: introduce SB_I_NOIDMAP flag
  fs/fuse: add FUSE_OWNER_UID_GID_EXT extension
  fs/fuse: support idmap for mkdir/mknod/symlink/create
  fs/fuse: support idmapped getattr inode op
  fs/fuse: support idmapped ->permission inode op
  fs/fuse: support idmapped ->setattr op
  fs/fuse: drop idmap argument from __fuse_get_acl
  fs/fuse: support idmapped ->set_acl
  fs/fuse: properly handle idmapped ->rename op
  fs/fuse: allow idmapped mounts
  fs/fuse/virtio_fs: allow idmapped mounts

 fs/fuse/acl.c             |  10 ++-
 fs/fuse/dir.c             | 146 +++++++++++++++++++++++++-------------
 fs/fuse/file.c            |   2 +-
 fs/fuse/fuse_i.h          |   7 +-
 fs/fuse/inode.c           |  16 ++++-
 fs/fuse/virtio_fs.c       |   1 +
 fs/namespace.c            |   4 ++
 include/linux/fs.h        |   1 +
 include/uapi/linux/fuse.h |  24 ++++++-
 9 files changed, 148 insertions(+), 63 deletions(-)
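For readers unfamiliar with the "b:1000:0:2" argument in the "How to play with it" steps of the cover letter, the arithmetic behind "1000 -> 0, 1001 -> 1" can be sketched in C as a single contiguous range. This is only a model of the translation, not kernel or mount-idmapped code; the struct, the function name and the "unmapped" sentinel are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sentinel for ids that fall outside the mapped range. */
#define SKETCH_ID_UNMAPPED UINT32_MAX

/* One contiguous range: ids [from, from + count) become [to, to + count). */
struct sketch_id_range {
	uint32_t from;   /* first id on the source mount (1000 in the example) */
	uint32_t to;     /* first id it is mapped to (0 in the example) */
	uint32_t count;  /* number of ids in the range (2 in the example) */
};

/* Translate one id through the range, or report it as unmapped. */
static uint32_t sketch_map_id(const struct sketch_id_range *r, uint32_t id)
{
	assert(r != NULL);
	if (id < r->from || id - r->from >= r->count)
		return SKETCH_ID_UNMAPPED;
	return r->to + (id - r->from);
}
```

With a range of { 1000, 0, 2 }, ids 1000 and 1001 map to 0 and 1, and any other id (999 or 1002, say) is reported as unmapped. The "b" prefix in the command applies the same range to both UIDs and GIDs.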