diff mbox series

fs: report f_fsid from s_dev for "simple" filesystems

Message ID 20231023143049.2944970-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series fs: report f_fsid from s_dev for "simple" filesystems | expand

Commit Message

Amir Goldstein Oct. 23, 2023, 2:30 p.m. UTC
There are many "simple" filesystems (*) that report null f_fsid in
statfs(2).  Those "simple" filesystems report sb->s_dev as the st_dev
field of the stat syscalls for all inodes of the filesystem (**).

In order to enable fanotify reporting of events with fsid on those
"simple" filesystems, report the sb->s_dev number in f_fsid field of
statfs(2).

(*) For most of the "simple" filesystem refered to in this commit, the
->statfs() operation is simple_statfs(). Some of those fs assign the
simple_statfs() method directly in their ->s_op struct and some assign it
indirectly via a call to simple_fill_super() or to pseudo_fs_fill_super()
with either custom or "simple" s_op.
We also make the same change to efivarfs and hugetlbfs, although they do
not use simple_statfs(), because they use the simple_* inode opreations
(e.g. simple_lookup()).

(**) For most of the "simple" filesystems, the ->getattr() method is not
assigned, so stat() is implemented by generic_fillattr().  A few "simple"
filesystem use the simple_getattr() method which also calls
generic_fillattr() to fill most of the stat struct.

The two exceptions are procfs and 9p. procfs implements several different
->getattr() methods, but they all end up calling generic_fillattr() to
fill the st_dev field from sb->s_dev.

9p has more complicated ->getattr() methods, but they too, end up calling
generic_fillattr() to fill the st_dev field from sb->s_dev.

Note that 9p and kernfs also call simple_statfs() from custom ->statfs()
methods which already fill the f_fsid field, but v9fs_statfs() calls
simple_statfs() only in case f_fsid was not filled and kenrfs_statfs()
overwrites f_fsid after calling simple_statfs().

Link: https://lore.kernel.org/r/20230919094820.g5bwharbmy2dq46w@quack3/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

This is a variant of the approach that you suggested in the Link above.
The two variations from your suggestion are:

1. I chose to use s_dev instead of s_uuid - I see no point in generating
   s_uuid for those simple filesystems. IMO, for the simple filesystems
   without open_by_handle_at() support, fanotify fid doesn't need to be
   more unique than {st_dev,st_ino}, because the inode mark pins the
   inode and prevent st_dev,st_ino collisions.

2. f_fsid is not filled by vfs according to fstype flag, but by
   ->statfs() implementations (simple_statfs() for the majority).

When applied together with the generic AT_HANDLE_FID support patches [1],
all of those simple filesystems can be watches with FAN_ERPORT_FID.

According to your audit of filesystems in the Link above, this leaves:
"afs, coda, nfs - networking filesystems where inotify and fanotify have
                  dubious value anyway.

 freevxfs - the only real filesystem without f_fsid. Trivial to handle one
            way or the other.
"

There are two other filesystems that I found in my audit which also don't
fill f_fsid: fuse and gfs2.

fuse is also a sort of a networking filesystems. Also, fuse supports NFS
export (as does nfs in some configurations) and I would like to stick to
the rule that filesystems the support decodable file handles, use an fsid
that is more unique than s_dev.

gfs2 already has s_uuid, so we know what f_fsid should be.
BTW, afs also has a server uuid, it just doesn't set s_uuid.

For btrfs, which fills a non-null, but non-uniform fsid, I already have
patches for inode_get_fsid [2] per your suggestion.

IMO, we can defer dealing with all those remaining cases for later and
solve the "simple" cases first.

Do you agree?

So far, there were no objections to the generic AT_HANDLE_FID support
patches [1], although I am still waiting on an ACK from you on the last
patch. If this fsid patch is also aaceptable, do you think they could
be candidated for upcoming 6.7?

Thanks,
Amir.

[1] https://lore.kernel.org/r/20231018100000.2453965-1-amir73il@gmail.com/
[2] https://github.com/amir73il/linux/commits/inode_fsid

 fs/efivarfs/super.c  | 2 ++
 fs/hugetlbfs/inode.c | 2 ++
 fs/libfs.c           | 3 +++
 3 files changed, 7 insertions(+)

Comments

Jan Kara Oct. 24, 2023, 10:58 a.m. UTC | #1
On Mon 23-10-23 17:30:49, Amir Goldstein wrote:
> There are many "simple" filesystems (*) that report null f_fsid in
> statfs(2).  Those "simple" filesystems report sb->s_dev as the st_dev
> field of the stat syscalls for all inodes of the filesystem (**).
> 
> In order to enable fanotify reporting of events with fsid on those
> "simple" filesystems, report the sb->s_dev number in f_fsid field of
> statfs(2).
> 
> (*) For most of the "simple" filesystem refered to in this commit, the
> ->statfs() operation is simple_statfs(). Some of those fs assign the
> simple_statfs() method directly in their ->s_op struct and some assign it
> indirectly via a call to simple_fill_super() or to pseudo_fs_fill_super()
> with either custom or "simple" s_op.
> We also make the same change to efivarfs and hugetlbfs, although they do
> not use simple_statfs(), because they use the simple_* inode opreations
> (e.g. simple_lookup()).
> 
> (**) For most of the "simple" filesystems, the ->getattr() method is not
> assigned, so stat() is implemented by generic_fillattr().  A few "simple"
> filesystem use the simple_getattr() method which also calls
> generic_fillattr() to fill most of the stat struct.
> 
> The two exceptions are procfs and 9p. procfs implements several different
> ->getattr() methods, but they all end up calling generic_fillattr() to
> fill the st_dev field from sb->s_dev.
> 
> 9p has more complicated ->getattr() methods, but they too, end up calling
> generic_fillattr() to fill the st_dev field from sb->s_dev.
> 
> Note that 9p and kernfs also call simple_statfs() from custom ->statfs()
> methods which already fill the f_fsid field, but v9fs_statfs() calls
> simple_statfs() only in case f_fsid was not filled and kenrfs_statfs()
> overwrites f_fsid after calling simple_statfs().
> 
> Link: https://lore.kernel.org/r/20230919094820.g5bwharbmy2dq46w@quack3/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good. I agree with your choice of sb->s_dev for fsid. Feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> 
> Jan,
> 
> This is a variant of the approach that you suggested in the Link above.
> The two variations from your suggestion are:
> 
> 1. I chose to use s_dev instead of s_uuid - I see no point in generating
>    s_uuid for those simple filesystems. IMO, for the simple filesystems
>    without open_by_handle_at() support, fanotify fid doesn't need to be
>    more unique than {st_dev,st_ino}, because the inode mark pins the
>    inode and prevent st_dev,st_ino collisions.
> 
> 2. f_fsid is not filled by vfs according to fstype flag, but by
>    ->statfs() implementations (simple_statfs() for the majority).
> 
> When applied together with the generic AT_HANDLE_FID support patches [1],
> all of those simple filesystems can be watches with FAN_ERPORT_FID.
> 
> According to your audit of filesystems in the Link above, this leaves:
> "afs, coda, nfs - networking filesystems where inotify and fanotify have
>                   dubious value anyway.
> 
>  freevxfs - the only real filesystem without f_fsid. Trivial to handle one
>             way or the other.
> "
> 
> There are two other filesystems that I found in my audit which also don't
> fill f_fsid: fuse and gfs2.
> 
> fuse is also a sort of a networking filesystems. Also, fuse supports NFS
> export (as does nfs in some configurations) and I would like to stick to
> the rule that filesystems the support decodable file handles, use an fsid
> that is more unique than s_dev.
> 
> gfs2 already has s_uuid, so we know what f_fsid should be.
> BTW, afs also has a server uuid, it just doesn't set s_uuid.
> 
> For btrfs, which fills a non-null, but non-uniform fsid, I already have
> patches for inode_get_fsid [2] per your suggestion.
> 
> IMO, we can defer dealing with all those remaining cases for later and
> solve the "simple" cases first.
> 
> Do you agree?
> 
> So far, there were no objections to the generic AT_HANDLE_FID support
> patches [1], although I am still waiting on an ACK from you on the last
> patch. If this fsid patch is also aaceptable, do you think they could
> be candidated for upcoming 6.7?
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/r/20231018100000.2453965-1-amir73il@gmail.com/
> [2] https://github.com/amir73il/linux/commits/inode_fsid
> 
>  fs/efivarfs/super.c  | 2 ++
>  fs/hugetlbfs/inode.c | 2 ++
>  fs/libfs.c           | 3 +++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
> index 996271473609..2933090ad11f 100644
> --- a/fs/efivarfs/super.c
> +++ b/fs/efivarfs/super.c
> @@ -30,6 +30,7 @@ static int efivarfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  			 EFI_VARIABLE_BOOTSERVICE_ACCESS |
>  			 EFI_VARIABLE_RUNTIME_ACCESS;
>  	u64 storage_space, remaining_space, max_variable_size;
> +	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
>  	efi_status_t status;
>  
>  	/* Some UEFI firmware does not implement QueryVariableInfo() */
> @@ -53,6 +54,7 @@ static int efivarfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	buf->f_blocks	= storage_space;
>  	buf->f_bfree	= remaining_space;
>  	buf->f_type	= dentry->d_sb->s_magic;
> +	buf->f_fsid	= u64_to_fsid(id);
>  
>  	/*
>  	 * In f_bavail we declare the free space that the kernel will allow writing
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 316c4cebd3f3..c003a27be6fe 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1204,7 +1204,9 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(dentry->d_sb);
>  	struct hstate *h = hstate_inode(d_inode(dentry));
> +	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
>  
> +	buf->f_fsid = u64_to_fsid(id);
>  	buf->f_type = HUGETLBFS_MAGIC;
>  	buf->f_bsize = huge_page_size(h);
>  	if (sbinfo) {
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 37f2d34ee090..8117b24b929d 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL(simple_getattr);
>  
>  int simple_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
> +	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
> +
> +	buf->f_fsid = u64_to_fsid(id);
>  	buf->f_type = dentry->d_sb->s_magic;
>  	buf->f_bsize = PAGE_SIZE;
>  	buf->f_namelen = NAME_MAX;
> -- 
> 2.34.1
>
Christian Brauner Oct. 24, 2023, 4:07 p.m. UTC | #2
On Mon, 23 Oct 2023 17:30:49 +0300, Amir Goldstein wrote:
> There are many "simple" filesystems (*) that report null f_fsid in
> statfs(2).  Those "simple" filesystems report sb->s_dev as the st_dev
> field of the stat syscalls for all inodes of the filesystem (**).
> 
> In order to enable fanotify reporting of events with fsid on those
> "simple" filesystems, report the sb->s_dev number in f_fsid field of
> statfs(2).
> 
> [...]

Applied to the vfs.f_fsid branch of the vfs/vfs.git tree.
Patches in the vfs.f_fsid branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.f_fsid

[1/1] fs: report f_fsid from s_dev for "simple" filesystems
      https://git.kernel.org/vfs/vfs/c/14673976a658
diff mbox series

Patch

diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 996271473609..2933090ad11f 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -30,6 +30,7 @@  static int efivarfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 			 EFI_VARIABLE_BOOTSERVICE_ACCESS |
 			 EFI_VARIABLE_RUNTIME_ACCESS;
 	u64 storage_space, remaining_space, max_variable_size;
+	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
 	efi_status_t status;
 
 	/* Some UEFI firmware does not implement QueryVariableInfo() */
@@ -53,6 +54,7 @@  static int efivarfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_blocks	= storage_space;
 	buf->f_bfree	= remaining_space;
 	buf->f_type	= dentry->d_sb->s_magic;
+	buf->f_fsid	= u64_to_fsid(id);
 
 	/*
 	 * In f_bavail we declare the free space that the kernel will allow writing
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 316c4cebd3f3..c003a27be6fe 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1204,7 +1204,9 @@  static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(dentry->d_sb);
 	struct hstate *h = hstate_inode(d_inode(dentry));
+	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
 
+	buf->f_fsid = u64_to_fsid(id);
 	buf->f_type = HUGETLBFS_MAGIC;
 	buf->f_bsize = huge_page_size(h);
 	if (sbinfo) {
diff --git a/fs/libfs.c b/fs/libfs.c
index 37f2d34ee090..8117b24b929d 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -41,6 +41,9 @@  EXPORT_SYMBOL(simple_getattr);
 
 int simple_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
+	u64 id = huge_encode_dev(dentry->d_sb->s_dev);
+
+	buf->f_fsid = u64_to_fsid(id);
 	buf->f_type = dentry->d_sb->s_magic;
 	buf->f_bsize = PAGE_SIZE;
 	buf->f_namelen = NAME_MAX;