Message ID | 20250321-xattrat-syscall-v4-0-3e82e6fb3264@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | fs: introduce getfsxattrat and setfsxattrat syscalls | expand |
On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > This patchset introduced two new syscalls getfsxattrat() and > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > except they use *at() semantics. Therefore, there's no need to open the > file to get an fd. > > These syscalls allow userspace to set filesystem inode attributes on > special files. One of the usage examples is XFS quota projects. > > XFS has project quotas which could be attached to a directory. All > new inodes in these directories inherit project ID set on parent > directory. > > The project is created from userspace by opening and calling > FS_IOC_FSSETXATTR on each inode. This is not possible for special > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > with empty project ID. Those inodes then are not shown in the quota > accounting but still exist in the directory. This is not critical but in > the case when special files are created in the directory with already > existing project quota, these new inodes inherit extended attributes. > This creates a mix of special files with and without attributes. > Moreover, special files with attributes don't have a possibility to > become clear or change the attributes. This, in turn, prevents userspace > from re-creating quota project on these existing files. > > Christian, if this get in some mergeable state, please don't merge it > yet. Amir suggested these syscalls better to use updated struct fsxattr > with masking from Pali Rohár patchset, so, let's see how it goes. Andrey, To be honest I don't think it would be fair to delay your syscalls more than needed. If Pali can follow through and post patches on top of your syscalls for next merge window that would be great, but otherwise, I think the minimum requirement is that the syscalls return EINVAL if fsx_pad is not zero. we can take it from there later. We can always also increase the size of struct fsxattr, but let's first use the padding space already available. Thanks, Amir. > > NAME > > getfsxattrat/setfsxattrat - get/set filesystem inode attributes > > SYNOPSIS > > #include <sys/syscall.h> /* Definition of SYS_* constants */ > #include <unistd.h> > > long syscall(SYS_getfsxattrat, int dirfd, const char *pathname, > struct fsxattr *fsx, size_t size, > unsigned int at_flags); > long syscall(SYS_setfsxattrat, int dirfd, const char *pathname, > struct fsxattr *fsx, size_t size, > unsigned int at_flags); > > Note: glibc doesn't provide for getfsxattrat()/setfsxattrat(), > use syscall(2) instead. > > DESCRIPTION > > The syscalls take fd and path to the child together with struct > fsxattr. If path is absolute, fd is not used. If path is empty, > inode under fd is used to get/set attributes on. > > This is an alternative to FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR > ioctl with a difference that file don't need to be open as we > can reference it with a path instead of fd. By having this we > can manipulated filesystem inode attributes not only on regular > files but also on special ones. This is not possible with > FS_IOC_FSSETXATTR ioctl as with special files we can not call > ioctl() directly on the filesystem inode using file descriptor. > > RETURN VALUE > > On success, 0 is returned. On error, -1 is returned, and errno > is set to indicate the error. > > ERRORS > > EINVAL Invalid at_flag specified (only > AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH is > supported). > > EINVAL Size was smaller than any known version of > struct fsxattr. > > EINVAL Invalid combination of parameters provided in > fsxattr for this type of file. > > E2BIG Size of input argument **struct fsxattr** is too > big. > > EBADF Invalid file descriptor was provided. > > EPERM No permission to change this file. > > EOPNOTSUPP Filesystem does not support setting attributes > on this type of inode > > HISTORY > > Added in Linux 6.14. > > EXAMPLE > > Create directory and file "mkdir ./dir && touch ./dir/foo" and then > execute the following program: > > #include <fcntl.h> > #include <errno.h> > #include <string.h> > #include <linux/fs.h> > #include <stdio.h> > #include <sys/syscall.h> > #include <unistd.h> > > int > main(int argc, char **argv) { > int dfd; > int error; > struct fsxattr fsx; > > dfd = open("./dir", O_RDONLY); > if (dfd == -1) { > printf("can not open ./dir"); > return dfd; > } > > error = syscall(467, dfd, "./foo", &fsx, 0); > if (error) { > printf("can not call 467: %s", strerror(errno)); > return error; > } > > printf("dir/foo flags: %d\n", fsx.fsx_xflags); > > fsx.fsx_xflags |= FS_XFLAG_NODUMP; > error = syscall(468, dfd, "./foo", &fsx, 0); > if (error) { > printf("can not call 468: %s", strerror(errno)); > return error; > } > > printf("dir/foo flags: %d\n", fsx.fsx_xflags); > > return error; > } > > SEE ALSO > > ioctl(2), ioctl_iflags(2), ioctl_xfs_fsgetxattr(2) > > --- > Changes in v4: > - Use getname_maybe_null() for correct handling of dfd + path semantic > - Remove restriction for special files on which flags are allowed > - Utilize copy_struct_from_user() for better future compatibility > - Add draft man page to cover letter > - Convert -ENOIOCTLCMD to -EOPNOSUPP as more appropriate for syscall > - Add missing __user to header declaration of syscalls > - Link to v3: https://lore.kernel.org/r/20250211-xattrat-syscall-v3-1-a07d15f898b2@kernel.org > > Changes in v3: > - Remove unnecessary "dfd is dir" check as it checked in user_path_at() > - Remove unnecessary "same filesystem" check > - Use CLASS() instead of directly calling fdget/fdput > - Link to v2: https://lore.kernel.org/r/20250122-xattrat-syscall-v2-1-5b360d4fbcb2@kernel.org > > v1: > https://lore.kernel.org/linuxppc-dev/20250109174540.893098-1-aalbersh@kernel.org/ > > Previous discussion: > https://lore.kernel.org/linux-xfs/20240520164624.665269-2-aalbersh@redhat.com/ > > --- > Andrey Albershteyn (3): > lsm: introduce new hooks for setting/getting inode fsxattr > fs: split fileattr/fsxattr converters into helpers > fs: introduce getfsxattrat and setfsxattrat syscalls > > arch/alpha/kernel/syscalls/syscall.tbl | 2 + > arch/arm/tools/syscall.tbl | 2 + > arch/arm64/tools/syscall_32.tbl | 2 + > arch/m68k/kernel/syscalls/syscall.tbl | 2 + > arch/microblaze/kernel/syscalls/syscall.tbl | 2 + > arch/mips/kernel/syscalls/syscall_n32.tbl | 2 + > arch/mips/kernel/syscalls/syscall_n64.tbl | 2 + > arch/mips/kernel/syscalls/syscall_o32.tbl | 2 + > arch/parisc/kernel/syscalls/syscall.tbl | 2 + > arch/powerpc/kernel/syscalls/syscall.tbl | 2 + > arch/s390/kernel/syscalls/syscall.tbl | 2 + > arch/sh/kernel/syscalls/syscall.tbl | 2 + > arch/sparc/kernel/syscalls/syscall.tbl | 2 + > arch/x86/entry/syscalls/syscall_32.tbl | 2 + > arch/x86/entry/syscalls/syscall_64.tbl | 2 + > arch/xtensa/kernel/syscalls/syscall.tbl | 2 + > fs/inode.c | 130 ++++++++++++++++++++++++++++ > fs/ioctl.c | 39 ++++++--- > include/linux/fileattr.h | 2 + > include/linux/lsm_hook_defs.h | 4 + > include/linux/security.h | 16 ++++ > include/linux/syscalls.h | 6 ++ > include/uapi/asm-generic/unistd.h | 8 +- > include/uapi/linux/fs.h | 3 + > security/security.c | 32 +++++++ > 25 files changed, 259 insertions(+), 13 deletions(-) > --- > base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 > change-id: 20250114-xattrat-syscall-6a1136d2db59 > > Best regards, > -- > Andrey Albershteyn <aalbersh@kernel.org> > >
On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > This patchset introduced two new syscalls getfsxattrat() and > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > except they use *at() semantics. Therefore, there's no need to open the > > file to get an fd. > > > > These syscalls allow userspace to set filesystem inode attributes on > > special files. One of the usage examples is XFS quota projects. > > > > XFS has project quotas which could be attached to a directory. All > > new inodes in these directories inherit project ID set on parent > > directory. > > > > The project is created from userspace by opening and calling > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > with empty project ID. Those inodes then are not shown in the quota > > accounting but still exist in the directory. This is not critical but in > > the case when special files are created in the directory with already > > existing project quota, these new inodes inherit extended attributes. > > This creates a mix of special files with and without attributes. > > Moreover, special files with attributes don't have a possibility to > > become clear or change the attributes. This, in turn, prevents userspace > > from re-creating quota project on these existing files. > > > > Christian, if this get in some mergeable state, please don't merge it > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > with masking from Pali Rohár patchset, so, let's see how it goes. > > Andrey, > > To be honest I don't think it would be fair to delay your syscalls more > than needed. I agree. > If Pali can follow through and post patches on top of your syscalls for > next merge window that would be great, but otherwise, I think the > minimum requirement is that the syscalls return EINVAL if fsx_pad > is not zero. we can take it from there later. IMHO SYS_getfsxattrat is fine in this form. For SYS_setfsxattrat I think there are needed some modifications otherwise we would have problem again with backward compatibility as is with ioctl if the syscall wants to be extended in future. I would suggest for following modifications for SYS_setfsxattrat: - return EINVAL if fsx_xflags contains some reserved or unsupported flag - add some flag to completely ignore fsx_extsize, fsx_projid, and fsx_cowextsize fields, so SYS_setfsxattrat could be used just to change fsx_xflags, and so could be used without the preceding SYS_getfsxattrat call. What do you think about it? Use cases for future without breaking backward compatibility: - atomically / race-free do set or clear just one flag in fsx_xflags (so avoid getfsxattrat - modify buffer - setfsxattrat roundtrip) - use fsx_pad[] for some new purposes > We can always also increase the size of struct fsxattr, but let's first > use the padding space already available. > > Thanks, > Amir. > > > > > NAME > > > > getfsxattrat/setfsxattrat - get/set filesystem inode attributes > > > > SYNOPSIS > > > > #include <sys/syscall.h> /* Definition of SYS_* constants */ > > #include <unistd.h> > > > > long syscall(SYS_getfsxattrat, int dirfd, const char *pathname, > > struct fsxattr *fsx, size_t size, > > unsigned int at_flags); > > long syscall(SYS_setfsxattrat, int dirfd, const char *pathname, > > struct fsxattr *fsx, size_t size, > > unsigned int at_flags); > > > > Note: glibc doesn't provide for getfsxattrat()/setfsxattrat(), > > use syscall(2) instead. > > > > DESCRIPTION > > > > The syscalls take fd and path to the child together with struct > > fsxattr. If path is absolute, fd is not used. If path is empty, > > inode under fd is used to get/set attributes on. > > > > This is an alternative to FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR > > ioctl with a difference that file don't need to be open as we > > can reference it with a path instead of fd. By having this we > > can manipulated filesystem inode attributes not only on regular > > files but also on special ones. This is not possible with > > FS_IOC_FSSETXATTR ioctl as with special files we can not call > > ioctl() directly on the filesystem inode using file descriptor. > > > > RETURN VALUE > > > > On success, 0 is returned. On error, -1 is returned, and errno > > is set to indicate the error. > > > > ERRORS > > > > EINVAL Invalid at_flag specified (only > > AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH is > > supported). > > > > EINVAL Size was smaller than any known version of > > struct fsxattr. > > > > EINVAL Invalid combination of parameters provided in > > fsxattr for this type of file. > > > > E2BIG Size of input argument **struct fsxattr** is too > > big. > > > > EBADF Invalid file descriptor was provided. > > > > EPERM No permission to change this file. > > > > EOPNOTSUPP Filesystem does not support setting attributes > > on this type of inode > > > > HISTORY > > > > Added in Linux 6.14. > > > > EXAMPLE > > > > Create directory and file "mkdir ./dir && touch ./dir/foo" and then > > execute the following program: > > > > #include <fcntl.h> > > #include <errno.h> > > #include <string.h> > > #include <linux/fs.h> > > #include <stdio.h> > > #include <sys/syscall.h> > > #include <unistd.h> > > > > int > > main(int argc, char **argv) { > > int dfd; > > int error; > > struct fsxattr fsx; > > > > dfd = open("./dir", O_RDONLY); > > if (dfd == -1) { > > printf("can not open ./dir"); > > return dfd; > > } > > > > error = syscall(467, dfd, "./foo", &fsx, 0); > > if (error) { > > printf("can not call 467: %s", strerror(errno)); > > return error; > > } > > > > printf("dir/foo flags: %d\n", fsx.fsx_xflags); > > > > fsx.fsx_xflags |= FS_XFLAG_NODUMP; > > error = syscall(468, dfd, "./foo", &fsx, 0); > > if (error) { > > printf("can not call 468: %s", strerror(errno)); > > return error; > > } > > > > printf("dir/foo flags: %d\n", fsx.fsx_xflags); > > > > return error; > > } > > > > SEE ALSO > > > > ioctl(2), ioctl_iflags(2), ioctl_xfs_fsgetxattr(2) > > > > --- > > Changes in v4: > > - Use getname_maybe_null() for correct handling of dfd + path semantic > > - Remove restriction for special files on which flags are allowed > > - Utilize copy_struct_from_user() for better future compatibility > > - Add draft man page to cover letter > > - Convert -ENOIOCTLCMD to -EOPNOSUPP as more appropriate for syscall > > - Add missing __user to header declaration of syscalls > > - Link to v3: https://lore.kernel.org/r/20250211-xattrat-syscall-v3-1-a07d15f898b2@kernel.org > > > > Changes in v3: > > - Remove unnecessary "dfd is dir" check as it checked in user_path_at() > > - Remove unnecessary "same filesystem" check > > - Use CLASS() instead of directly calling fdget/fdput > > - Link to v2: https://lore.kernel.org/r/20250122-xattrat-syscall-v2-1-5b360d4fbcb2@kernel.org > > > > v1: > > https://lore.kernel.org/linuxppc-dev/20250109174540.893098-1-aalbersh@kernel.org/ > > > > Previous discussion: > > https://lore.kernel.org/linux-xfs/20240520164624.665269-2-aalbersh@redhat.com/ > > > > --- > > Andrey Albershteyn (3): > > lsm: introduce new hooks for setting/getting inode fsxattr > > fs: split fileattr/fsxattr converters into helpers > > fs: introduce getfsxattrat and setfsxattrat syscalls > > > > arch/alpha/kernel/syscalls/syscall.tbl | 2 + > > arch/arm/tools/syscall.tbl | 2 + > > arch/arm64/tools/syscall_32.tbl | 2 + > > arch/m68k/kernel/syscalls/syscall.tbl | 2 + > > arch/microblaze/kernel/syscalls/syscall.tbl | 2 + > > arch/mips/kernel/syscalls/syscall_n32.tbl | 2 + > > arch/mips/kernel/syscalls/syscall_n64.tbl | 2 + > > arch/mips/kernel/syscalls/syscall_o32.tbl | 2 + > > arch/parisc/kernel/syscalls/syscall.tbl | 2 + > > arch/powerpc/kernel/syscalls/syscall.tbl | 2 + > > arch/s390/kernel/syscalls/syscall.tbl | 2 + > > arch/sh/kernel/syscalls/syscall.tbl | 2 + > > arch/sparc/kernel/syscalls/syscall.tbl | 2 + > > arch/x86/entry/syscalls/syscall_32.tbl | 2 + > > arch/x86/entry/syscalls/syscall_64.tbl | 2 + > > arch/xtensa/kernel/syscalls/syscall.tbl | 2 + > > fs/inode.c | 130 ++++++++++++++++++++++++++++ > > fs/ioctl.c | 39 ++++++--- > > include/linux/fileattr.h | 2 + > > include/linux/lsm_hook_defs.h | 4 + > > include/linux/security.h | 16 ++++ > > include/linux/syscalls.h | 6 ++ > > include/uapi/asm-generic/unistd.h | 8 +- > > include/uapi/linux/fs.h | 3 + > > security/security.c | 32 +++++++ > > 25 files changed, 259 insertions(+), 13 deletions(-) > > --- > > base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 > > change-id: 20250114-xattrat-syscall-6a1136d2db59 > > > > Best regards, > > -- > > Andrey Albershteyn <aalbersh@kernel.org> > > > >
On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár <pali@kernel.org> wrote: > > On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > > > This patchset introduced two new syscalls getfsxattrat() and > > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > > except they use *at() semantics. Therefore, there's no need to open the > > > file to get an fd. > > > > > > These syscalls allow userspace to set filesystem inode attributes on > > > special files. One of the usage examples is XFS quota projects. > > > > > > XFS has project quotas which could be attached to a directory. All > > > new inodes in these directories inherit project ID set on parent > > > directory. > > > > > > The project is created from userspace by opening and calling > > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > > with empty project ID. Those inodes then are not shown in the quota > > > accounting but still exist in the directory. This is not critical but in > > > the case when special files are created in the directory with already > > > existing project quota, these new inodes inherit extended attributes. > > > This creates a mix of special files with and without attributes. > > > Moreover, special files with attributes don't have a possibility to > > > become clear or change the attributes. This, in turn, prevents userspace > > > from re-creating quota project on these existing files. > > > > > > Christian, if this get in some mergeable state, please don't merge it > > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > > with masking from Pali Rohár patchset, so, let's see how it goes. > > > > Andrey, > > > > To be honest I don't think it would be fair to delay your syscalls more > > than needed. > > I agree. > > > If Pali can follow through and post patches on top of your syscalls for > > next merge window that would be great, but otherwise, I think the > > minimum requirement is that the syscalls return EINVAL if fsx_pad > > is not zero. we can take it from there later. > > IMHO SYS_getfsxattrat is fine in this form. > > For SYS_setfsxattrat I think there are needed some modifications > otherwise we would have problem again with backward compatibility as > is with ioctl if the syscall wants to be extended in future. > > I would suggest for following modifications for SYS_setfsxattrat: > > - return EINVAL if fsx_xflags contains some reserved or unsupported flag > > - add some flag to completely ignore fsx_extsize, fsx_projid, and > fsx_cowextsize fields, so SYS_setfsxattrat could be used just to > change fsx_xflags, and so could be used without the preceding > SYS_getfsxattrat call. > > What do you think about it? I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero. You can use this later to extend for the semantics of flags/fields mask and we can have a long discussion later on what this semantics should be. Right? Amir.
On Thursday 27 March 2025 12:47:02 Amir Goldstein wrote: > On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár <pali@kernel.org> wrote: > > > > On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > > > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > > > > > This patchset introduced two new syscalls getfsxattrat() and > > > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > > > except they use *at() semantics. Therefore, there's no need to open the > > > > file to get an fd. > > > > > > > > These syscalls allow userspace to set filesystem inode attributes on > > > > special files. One of the usage examples is XFS quota projects. > > > > > > > > XFS has project quotas which could be attached to a directory. All > > > > new inodes in these directories inherit project ID set on parent > > > > directory. > > > > > > > > The project is created from userspace by opening and calling > > > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > > > with empty project ID. Those inodes then are not shown in the quota > > > > accounting but still exist in the directory. This is not critical but in > > > > the case when special files are created in the directory with already > > > > existing project quota, these new inodes inherit extended attributes. > > > > This creates a mix of special files with and without attributes. > > > > Moreover, special files with attributes don't have a possibility to > > > > become clear or change the attributes. This, in turn, prevents userspace > > > > from re-creating quota project on these existing files. > > > > > > > > Christian, if this get in some mergeable state, please don't merge it > > > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > > > with masking from Pali Rohár patchset, so, let's see how it goes. > > > > > > Andrey, > > > > > > To be honest I don't think it would be fair to delay your syscalls more > > > than needed. > > > > I agree. > > > > > If Pali can follow through and post patches on top of your syscalls for > > > next merge window that would be great, but otherwise, I think the > > > minimum requirement is that the syscalls return EINVAL if fsx_pad > > > is not zero. we can take it from there later. > > > > IMHO SYS_getfsxattrat is fine in this form. > > > > For SYS_setfsxattrat I think there are needed some modifications > > otherwise we would have problem again with backward compatibility as > > is with ioctl if the syscall wants to be extended in future. > > > > I would suggest for following modifications for SYS_setfsxattrat: > > > > - return EINVAL if fsx_xflags contains some reserved or unsupported flag > > > > - add some flag to completely ignore fsx_extsize, fsx_projid, and > > fsx_cowextsize fields, so SYS_setfsxattrat could be used just to > > change fsx_xflags, and so could be used without the preceding > > SYS_getfsxattrat call. > > > > What do you think about it? > > I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero. > > You can use this later to extend for the semantics of flags/fields mask > and we can have a long discussion later on what this semantics should be. > > Right? > > Amir. It is really enough? All new extensions later would have to be added into fsx_pad fields, and currently unused bits in fsx_xflags would be unusable for extensions.
On Thu, Mar 27, 2025 at 8:26 PM Pali Rohár <pali@kernel.org> wrote: > > On Thursday 27 March 2025 12:47:02 Amir Goldstein wrote: > > On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár <pali@kernel.org> wrote: > > > > > > On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > > > > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > > > > > > > This patchset introduced two new syscalls getfsxattrat() and > > > > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > > > > except they use *at() semantics. Therefore, there's no need to open the > > > > > file to get an fd. > > > > > > > > > > These syscalls allow userspace to set filesystem inode attributes on > > > > > special files. One of the usage examples is XFS quota projects. > > > > > > > > > > XFS has project quotas which could be attached to a directory. All > > > > > new inodes in these directories inherit project ID set on parent > > > > > directory. > > > > > > > > > > The project is created from userspace by opening and calling > > > > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > > > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > > > > with empty project ID. Those inodes then are not shown in the quota > > > > > accounting but still exist in the directory. This is not critical but in > > > > > the case when special files are created in the directory with already > > > > > existing project quota, these new inodes inherit extended attributes. > > > > > This creates a mix of special files with and without attributes. > > > > > Moreover, special files with attributes don't have a possibility to > > > > > become clear or change the attributes. This, in turn, prevents userspace > > > > > from re-creating quota project on these existing files. > > > > > > > > > > Christian, if this get in some mergeable state, please don't merge it > > > > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > > > > with masking from Pali Rohár patchset, so, let's see how it goes. > > > > > > > > Andrey, > > > > > > > > To be honest I don't think it would be fair to delay your syscalls more > > > > than needed. > > > > > > I agree. > > > > > > > If Pali can follow through and post patches on top of your syscalls for > > > > next merge window that would be great, but otherwise, I think the > > > > minimum requirement is that the syscalls return EINVAL if fsx_pad > > > > is not zero. we can take it from there later. > > > > > > IMHO SYS_getfsxattrat is fine in this form. > > > > > > For SYS_setfsxattrat I think there are needed some modifications > > > otherwise we would have problem again with backward compatibility as > > > is with ioctl if the syscall wants to be extended in future. > > > > > > I would suggest for following modifications for SYS_setfsxattrat: > > > > > > - return EINVAL if fsx_xflags contains some reserved or unsupported flag > > > > > > - add some flag to completely ignore fsx_extsize, fsx_projid, and > > > fsx_cowextsize fields, so SYS_setfsxattrat could be used just to > > > change fsx_xflags, and so could be used without the preceding > > > SYS_getfsxattrat call. > > > > > > What do you think about it? > > > > I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero. > > > > You can use this later to extend for the semantics of flags/fields mask > > and we can have a long discussion later on what this semantics should be. > > > > Right? > > > > Amir. > > It is really enough? I don't know. Let's see... > All new extensions later would have to be added > into fsx_pad fields, and currently unused bits in fsx_xflags would be > unusable for extensions. I am working under the assumption that the first extension would be to support fsx_xflags_mask and from there, you could add filesystem flags support checks and then new flags. Am I wrong? Obviously, fsx_xflags_mask would be taken from fsx_pad space. After that extension is implemented, calling SYS_setfsxattrat() with a zero fsx_xflags_mask would be silly for programs that do not do the legacy get+set. So when we introduce fsx_xflags_mask, we could say that a value of zero means that the mask is not being checked at all and unknown flags in set syscall are ignored (a.k.a legacy ioctl behavior). Programs that actually want to try and set without get will have to set a non zero fsx_xflags_mask to do something useful. I don't think this is great. I would rather that the first version of syscalls will require the mask and will always enforce filesystems supported flags. If you can get those patches (on top of current series) posted and reviewed in time for the next merge window, including consensus on the actual semantics, that would be the best IMO. But I am just preparing a plan B in case you do not have time to work on the patches or if consensus on the API extensions is not reached on time. I think that for plan B, the minimum is to verify zero pad field and that is something that this syscall has to do anyway, because this is the way that backward compact APIs work. If you want the syscall to always return -EINVAL for setting xflags that are currently undefined I agree that would be nice as well. Thanks, Amir.
On Thursday 27 March 2025 21:57:34 Amir Goldstein wrote: > On Thu, Mar 27, 2025 at 8:26 PM Pali Rohár <pali@kernel.org> wrote: > > > > On Thursday 27 March 2025 12:47:02 Amir Goldstein wrote: > > > On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár <pali@kernel.org> wrote: > > > > > > > > On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > > > > > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > > > > > > > > > This patchset introduced two new syscalls getfsxattrat() and > > > > > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > > > > > except they use *at() semantics. Therefore, there's no need to open the > > > > > > file to get an fd. > > > > > > > > > > > > These syscalls allow userspace to set filesystem inode attributes on > > > > > > special files. One of the usage examples is XFS quota projects. > > > > > > > > > > > > XFS has project quotas which could be attached to a directory. All > > > > > > new inodes in these directories inherit project ID set on parent > > > > > > directory. > > > > > > > > > > > > The project is created from userspace by opening and calling > > > > > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > > > > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > > > > > with empty project ID. Those inodes then are not shown in the quota > > > > > > accounting but still exist in the directory. This is not critical but in > > > > > > the case when special files are created in the directory with already > > > > > > existing project quota, these new inodes inherit extended attributes. > > > > > > This creates a mix of special files with and without attributes. > > > > > > Moreover, special files with attributes don't have a possibility to > > > > > > become clear or change the attributes. This, in turn, prevents userspace > > > > > > from re-creating quota project on these existing files. > > > > > > > > > > > > Christian, if this get in some mergeable state, please don't merge it > > > > > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > > > > > with masking from Pali Rohár patchset, so, let's see how it goes. > > > > > > > > > > Andrey, > > > > > > > > > > To be honest I don't think it would be fair to delay your syscalls more > > > > > than needed. > > > > > > > > I agree. > > > > > > > > > If Pali can follow through and post patches on top of your syscalls for > > > > > next merge window that would be great, but otherwise, I think the > > > > > minimum requirement is that the syscalls return EINVAL if fsx_pad > > > > > is not zero. we can take it from there later. > > > > > > > > IMHO SYS_getfsxattrat is fine in this form. > > > > > > > > For SYS_setfsxattrat I think there are needed some modifications > > > > otherwise we would have problem again with backward compatibility as > > > > is with ioctl if the syscall wants to be extended in future. > > > > > > > > I would suggest for following modifications for SYS_setfsxattrat: > > > > > > > > - return EINVAL if fsx_xflags contains some reserved or unsupported flag > > > > > > > > - add some flag to completely ignore fsx_extsize, fsx_projid, and > > > > fsx_cowextsize fields, so SYS_setfsxattrat could be used just to > > > > change fsx_xflags, and so could be used without the preceding > > > > SYS_getfsxattrat call. > > > > > > > > What do you think about it? > > > > > > I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero. > > > > > > You can use this later to extend for the semantics of flags/fields mask > > > and we can have a long discussion later on what this semantics should be. > > > > > > Right? > > > > > > Amir. > > > > It is really enough? > > I don't know. Let's see... > > > All new extensions later would have to be added > > into fsx_pad fields, and currently unused bits in fsx_xflags would be > > unusable for extensions. > > I am working under the assumption that the first extension would be > to support fsx_xflags_mask and from there, you could add filesystem > flags support checks and then new flags. Am I wrong? > > Obviously, fsx_xflags_mask would be taken from fsx_pad space. > After that extension is implemented, calling SYS_setfsxattrat() with > a zero fsx_xflags_mask would be silly for programs that do not do > the legacy get+set. > > So when we introduce fsx_xflags_mask, we could say that a value > of zero means that the mask is not being checked at all and unknown > flags in set syscall are ignored (a.k.a legacy ioctl behavior). > > Programs that actually want to try and set without get will have to set > a non zero fsx_xflags_mask to do something useful. Here we need to also solve the problem that without GET call we do not have valid values for fsx_extsize, fsx_projid, and fsx_cowextsize. So maybe we would need some flag in fsx_pad that fsx_extsize, fsx_projid, or fsx_cowextsize are ignored/masked. > I don't think this is great. > I would rather that the first version of syscalls will require the mask > and will always enforce filesystems supported flags. It is not great... But what about this? In a first step (part of this syscall patch series) would be just a check that fsx_pad is zero. Non-zero will return -EINVAL. In next changes would added fsx_filter bit field, which for each fsx_xflags and also for fsx_extsize, fsx_projid, and fsx_cowextsize fields would add a new bit flag which would say (when SET) that the particular thing has to be ignored. So when fsx_pad is all-zeros then fsx_filter (first field in fsx_pad) would say that nothing in fsx_xflags, fsx_extsize, fsx_projid, and fsx_cowextsize is ignored, and hence behave like before. And when something in fsx_pad/fsx_filter is set then it says which fields are ignored/filtered-out. > If you can get those patches (on top of current series) posted and > reviewed in time for the next merge window, including consensus > on the actual semantics, that would be the best IMO. I think that this starting to be more complicated to rebase my patches in a way that they do not affect IOCTL path but implement it properly for new syscall path. It does not sounds like a trivial thing which I would finish in merge window time and having proper review and consensus on this. > But I am just preparing a plan B in case you do not have time to > work on the patches or if consensus on the API extensions is not > reached on time. > > I think that for plan B, the minimum is to verify zero pad field and > that is something that this syscall has to do anyway, because this > is the way that backward compact APIs work. > > If you want the syscall to always return -EINVAL for setting xflags > that are currently undefined I agree that would be nice as well. > > Thanks, > Amir.
On Thu, Mar 27, 2025 at 10:13 PM Pali Rohár <pali@kernel.org> wrote: > > On Thursday 27 March 2025 21:57:34 Amir Goldstein wrote: > > On Thu, Mar 27, 2025 at 8:26 PM Pali Rohár <pali@kernel.org> wrote: > > > > > > On Thursday 27 March 2025 12:47:02 Amir Goldstein wrote: > > > > On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár <pali@kernel.org> wrote: > > > > > > > > > > On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote: > > > > > > On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn <aalbersh@redhat.com> wrote: > > > > > > > > > > > > > > This patchset introduced two new syscalls getfsxattrat() and > > > > > > > setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl() > > > > > > > except they use *at() semantics. Therefore, there's no need to open the > > > > > > > file to get an fd. > > > > > > > > > > > > > > These syscalls allow userspace to set filesystem inode attributes on > > > > > > > special files. One of the usage examples is XFS quota projects. > > > > > > > > > > > > > > XFS has project quotas which could be attached to a directory. All > > > > > > > new inodes in these directories inherit project ID set on parent > > > > > > > directory. > > > > > > > > > > > > > > The project is created from userspace by opening and calling > > > > > > > FS_IOC_FSSETXATTR on each inode. This is not possible for special > > > > > > > files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left > > > > > > > with empty project ID. Those inodes then are not shown in the quota > > > > > > > accounting but still exist in the directory. This is not critical but in > > > > > > > the case when special files are created in the directory with already > > > > > > > existing project quota, these new inodes inherit extended attributes. > > > > > > > This creates a mix of special files with and without attributes. > > > > > > > Moreover, special files with attributes don't have a possibility to > > > > > > > become clear or change the attributes. This, in turn, prevents userspace > > > > > > > from re-creating quota project on these existing files. > > > > > > > > > > > > > > Christian, if this get in some mergeable state, please don't merge it > > > > > > > yet. Amir suggested these syscalls better to use updated struct fsxattr > > > > > > > with masking from Pali Rohár patchset, so, let's see how it goes. > > > > > > > > > > > > Andrey, > > > > > > > > > > > > To be honest I don't think it would be fair to delay your syscalls more > > > > > > than needed. > > > > > > > > > > I agree. > > > > > > > > > > > If Pali can follow through and post patches on top of your syscalls for > > > > > > next merge window that would be great, but otherwise, I think the > > > > > > minimum requirement is that the syscalls return EINVAL if fsx_pad > > > > > > is not zero. we can take it from there later. > > > > > > > > > > IMHO SYS_getfsxattrat is fine in this form. > > > > > > > > > > For SYS_setfsxattrat I think there are needed some modifications > > > > > otherwise we would have problem again with backward compatibility as > > > > > is with ioctl if the syscall wants to be extended in future. > > > > > > > > > > I would suggest for following modifications for SYS_setfsxattrat: > > > > > > > > > > - return EINVAL if fsx_xflags contains some reserved or unsupported flag > > > > > > > > > > - add some flag to completely ignore fsx_extsize, fsx_projid, and > > > > > fsx_cowextsize fields, so SYS_setfsxattrat could be used just to > > > > > change fsx_xflags, and so could be used without the preceding > > > > > SYS_getfsxattrat call. > > > > > > > > > > What do you think about it? > > > > > > > > I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero. > > > > > > > > You can use this later to extend for the semantics of flags/fields mask > > > > and we can have a long discussion later on what this semantics should be. > > > > > > > > Right? > > > > > > > > Amir. > > > > > > It is really enough? > > > > I don't know. Let's see... > > > > > All new extensions later would have to be added > > > into fsx_pad fields, and currently unused bits in fsx_xflags would be > > > unusable for extensions. > > > > I am working under the assumption that the first extension would be > > to support fsx_xflags_mask and from there, you could add filesystem > > flags support checks and then new flags. Am I wrong? > > > > Obviously, fsx_xflags_mask would be taken from fsx_pad space. > > After that extension is implemented, calling SYS_setfsxattrat() with > > a zero fsx_xflags_mask would be silly for programs that do not do > > the legacy get+set. > > > > So when we introduce fsx_xflags_mask, we could say that a value > > of zero means that the mask is not being checked at all and unknown > > flags in set syscall are ignored (a.k.a legacy ioctl behavior). > > > > Programs that actually want to try and set without get will have to set > > a non zero fsx_xflags_mask to do something useful. > > Here we need to also solve the problem that without GET call we do not > have valid values for fsx_extsize, fsx_projid, and fsx_cowextsize. So > maybe we would need some flag in fsx_pad that fsx_extsize, fsx_projid, > or fsx_cowextsize are ignored/masked. > > > I don't think this is great. > > I would rather that the first version of syscalls will require the mask > > and will always enforce filesystems supported flags. > > It is not great... But what about this? In a first step (part of this > syscall patch series) would be just a check that fsx_pad is zero. > Non-zero will return -EINVAL. > > In next changes would added fsx_filter bit field, which for each > fsx_xflags and also for fsx_extsize, fsx_projid, and fsx_cowextsize > fields would add a new bit flag which would say (when SET) that the > particular thing has to be ignored. 1. I don't like the inverse mask. statx already has the stx_mask and stx_attributes_mask, so I rather stick to same semantics because some of those attributes are exposed via statx as well 2. fsx_*extsize already have a bit that says if that the particular attribute is valid or not, so setting a zero fsx_cowextsize with the flag FS_XFLAG_COWEXTSIZE has no effect in xfs: /* * Only set the extent size hint if we've already determined that the * extent size hint should be set on the inode. If no extent size flags * are set on the inode then unconditionally clear the extent size hint. */ if (ip->i_diflags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)) ip->i_extsize = XFS_B_TO_FSB(mp, fa->fsx_extsize); else ip->i_extsize = 0; if (xfs_has_v3inodes(mp)) { if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) ip->i_cowextsize = XFS_B_TO_FSB(mp, fa->fsx_cowextsize); else ip->i_cowextsize = 0; } I think we need to enforce this logic in fileattr_set_prepare() and I think we need to add a flag FS_XFLAG_PROJID that will be set in GET when fsx_projid != 0 and similarly required when setting fsx_projid != 0. Probably will need to add some backward compat glue for this flag in GET ioctl to avoid breaking out of tree fs and fuse. > > So when fsx_pad is all-zeros then fsx_filter (first field in fsx_pad) > would say that nothing in fsx_xflags, fsx_extsize, fsx_projid, and > fsx_cowextsize is ignored, and hence behave like before. > > And when something in fsx_pad/fsx_filter is set then it says which > fields are ignored/filtered-out. > > > If you can get those patches (on top of current series) posted and > > reviewed in time for the next merge window, including consensus > > on the actual semantics, that would be the best IMO. > > I think that this starting to be more complicated to rebase my patches > in a way that they do not affect IOCTL path but implement it properly > for new syscall path. It does not sounds like a trivial thing which I > would finish in merge window time and having proper review and consensus > on this. > Yes, it is better to separate the two efforts. wrt erroring on unsupported SET flags, all fs other than xfs already have some variant of fileattr_has_fsx(), so xfs is the only filesystem that requires special care with the new syscalls. It's easier to write a patch than it is to explain what I mean, so I'll try to write a patch. Thanks, Amir.