nfs: derive f_fsid from server's fsid

Message ID	20231024110109.3007794-1-amir73il@gmail.com (mailing list archive)
State	New, archived
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9A367498 for <linux-fsdevel@vger.kernel.org>; Tue, 24 Oct 2023 11:01:17 +0000 (UTC) From: Amir Goldstein <amir73il@gmail.com> To: Trond Myklebust <trond.myklebust@hammerspace.com>, Anna Schumaker <anna@kernel.org> Cc: Jeff Layton <jlayton@kernel.org>, Chuck Lever <chuck.lever@oracle.com>, Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org Subject: [PATCH] nfs: derive f_fsid from server's fsid Date: Tue, 24 Oct 2023 14:01:09 +0300 Message-Id: <20231024110109.3007794-1-amir73il@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	nfs: derive f_fsid from server's fsid \| expand nfs: derive f_fsid from server's fsid

Amir Goldstein Oct. 24, 2023, 11:01 a.m. UTC

Fold the server's 128bit fsid to report f_fsid in statfs(2).
This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.

This allows nfs client to be monitored by fanotify filesystem watch
for local client access if nfs supports re-export.

For example, with inotify-tools 4.23.8.0, the following command can be
used to watch local client access over entire nfs filesystem:

  fsnotifywatch --filesystem /mnt/nfs

Note that fanotify filesystem watch does not report remote changes on
server.  It provides the same notifications as inotify, but it watches
over the entire filesystem and reports file handle of objects and fsid
with events.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Anna, Trond,

I realize that the value of watching local changes without getting
notifications on remote changes is questionable, but still, we want
fanotify to be on-par with inotify in that regard.

Remote notification via fanotify has been requested in the past for fuse
and for smb3. If we ever implement those, they will most likely require
a new opt-in flag to fanotify.

I think that exporting a digest of the server's fsid via statfs(2) on the
client mounts is useful regardless of fanotify, so please consider this
change to NFS client.

Thanks,
Amir.

 fs/nfs/super.c | 1 +
 1 file changed, 1 insertion(+)

Benjamin Coddington Oct. 24, 2023, 2:01 p.m. UTC | #1

On 24 Oct 2023, at 7:01, Amir Goldstein wrote:

> Fold the server's 128bit fsid to report f_fsid in statfs(2).
> This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.
>
> This allows nfs client to be monitored by fanotify filesystem watch
> for local client access if nfs supports re-export.
>
> For example, with inotify-tools 4.23.8.0, the following command can be
> used to watch local client access over entire nfs filesystem:
>
>   fsnotifywatch --filesystem /mnt/nfs
>
> Note that fanotify filesystem watch does not report remote changes on
> server.  It provides the same notifications as inotify, but it watches
> over the entire filesystem and reports file handle of objects and fsid
> with events.

I think this will run into trouble where an NFSv4 will report both
fsid.major and fsid.minor as zero for the special root filesystem.   We can
expect an NFSv4 client to have one of these per server.

Could use s_dev from nfs_server for a unique major/minor for each mount on
the client, but these values won't be stable against a particular server
export.

Ben

Amir Goldstein Oct. 24, 2023, 2:58 p.m. UTC | #2

On Tue, Oct 24, 2023 at 5:01 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 24 Oct 2023, at 7:01, Amir Goldstein wrote:
>
> > Fold the server's 128bit fsid to report f_fsid in statfs(2).
> > This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.
> >
> > This allows nfs client to be monitored by fanotify filesystem watch
> > for local client access if nfs supports re-export.
> >
> > For example, with inotify-tools 4.23.8.0, the following command can be
> > used to watch local client access over entire nfs filesystem:
> >
> >   fsnotifywatch --filesystem /mnt/nfs
> >
> > Note that fanotify filesystem watch does not report remote changes on
> > server.  It provides the same notifications as inotify, but it watches
> > over the entire filesystem and reports file handle of objects and fsid
> > with events.
>
> I think this will run into trouble where an NFSv4 will report both
> fsid.major and fsid.minor as zero for the special root filesystem.   We can
> expect an NFSv4 client to have one of these per server.
>
> Could use s_dev from nfs_server for a unique major/minor for each mount on
> the client, but these values won't be stable against a particular server
> export.
>

That's a good point.
Not sure I understand the relation between mount/server/export.

If the client mounts the special NFSv4 root filesystem at /mnt/nfs,
are the rest of the server exports going to be accessible via the same
mount/sb or via new auto mounts of different nfs sb?

In any case, f_fsid does not have to be uniform across all inodes
of the same sb. This is the case with btrfs, where the the btrfs sb
has inodes from the root volume and from sub-volumes.
inodes from btrfs sub-volumes have a different f_fsid than inodes
in the root btrfs volume.

We try to detect this case in fanotify, which currently does not
support watching btrfs sub-volume for that reason.
I have a WIP branch [1] for handling non-uniform f_fsid in
fanotify by introducing the s_op->get_fsid(inode) method.

Anyway, IIUC, my proposed f_fsid change is going to be fine for
NFSv2/3 and best effort for NFSv4:
- For NFSv2/3 mount, f_fsid is a good identifier?
- For NFSv4 mount of a specific export, f_fsid is a good identifier?
- For the NFSv4 root export mount, f_fsid remains zero as it is now

Am I understanding this correctly?
Do you see a reason not to make this change?
Do you see a reason to limit this change for NFSv2/3?

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/inode_fsid

Benjamin Coddington Oct. 24, 2023, 3:32 p.m. UTC | #3

On 24 Oct 2023, at 10:58, Amir Goldstein wrote:

> On Tue, Oct 24, 2023 at 5:01 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 24 Oct 2023, at 7:01, Amir Goldstein wrote:
>>
>>> Fold the server's 128bit fsid to report f_fsid in statfs(2).
>>> This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.
>>>
>>> This allows nfs client to be monitored by fanotify filesystem watch
>>> for local client access if nfs supports re-export.
>>>
>>> For example, with inotify-tools 4.23.8.0, the following command can be
>>> used to watch local client access over entire nfs filesystem:
>>>
>>>   fsnotifywatch --filesystem /mnt/nfs
>>>
>>> Note that fanotify filesystem watch does not report remote changes on
>>> server.  It provides the same notifications as inotify, but it watches
>>> over the entire filesystem and reports file handle of objects and fsid
>>> with events.
>>
>> I think this will run into trouble where an NFSv4 will report both
>> fsid.major and fsid.minor as zero for the special root filesystem.   We can
>> expect an NFSv4 client to have one of these per server.
>>
>> Could use s_dev from nfs_server for a unique major/minor for each mount on
>> the client, but these values won't be stable against a particular server
>> export.
>>
>
> That's a good point.
> Not sure I understand the relation between mount/server/export.
>
> If the client mounts the special NFSv4 root filesystem at /mnt/nfs,
> are the rest of the server exports going to be accessible via the same
> mount/sb or via new auto mounts of different nfs sb?

If we cross into a new filesystem on the server, then the client will also
cross and leave the "root" and have a new sb with non-zero fsid.

> In any case, f_fsid does not have to be uniform across all inodes
> of the same sb. This is the case with btrfs, where the the btrfs sb
> has inodes from the root volume and from sub-volumes.
> inodes from btrfs sub-volumes have a different f_fsid than inodes
> in the root btrfs volume.

This isn't what I'm worried about.  I'm worried about the case where an nfs
client will have multiple mounts with fsid's of 0:0, and those are
distinctly different mounts of the "root" of NFSv4 on different servers.

> We try to detect this case in fanotify, which currently does not
> support watching btrfs sub-volume for that reason.
> I have a WIP branch [1] for handling non-uniform f_fsid in
> fanotify by introducing the s_op->get_fsid(inode) method.
>
> Anyway, IIUC, my proposed f_fsid change is going to be fine for
> NFSv2/3 and best effort for NFSv4:
> - For NFSv2/3 mount, f_fsid is a good identifier?

Yes, it should represent the same filesystem on the server.  You could still
get duplicates between servers. What's returned in the protocol's u64 fsid
goes into major with minor always zero.

I'm sure there was discussion about what implementations should use long
ago, but that predates me.

> - For NFSv4 mount of a specific export, f_fsid is a good identifier?

Yes, but if the specific export is on the same server's filesystem as the
"root", you'll still get zero.  There are various ways to set fsid on
exports for linux servers, but the fsid will be the same for all exports of
the same filesystem on the server.

> - For the NFSv4 root export mount, f_fsid remains zero as it is now

Yes.

> Am I understanding this correctly?

I think so.

> Do you see a reason not to make this change?
> Do you see a reason to limit this change for NFSv2/3?

I'm not familiar with fanotify enough to know if having multiple fsid 0
mounts of different filesystems on different servers will do the right
thing.  I wanted to point out that very real possibility for v4.

Ben

Amir Goldstein Oct. 24, 2023, 5:12 p.m. UTC | #4

On Tue, Oct 24, 2023 at 6:32 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 24 Oct 2023, at 10:58, Amir Goldstein wrote:
>
> > On Tue, Oct 24, 2023 at 5:01 PM Benjamin Coddington <bcodding@redhat.com> wrote:
> >>
> >> On 24 Oct 2023, at 7:01, Amir Goldstein wrote:
> >>
> >>> Fold the server's 128bit fsid to report f_fsid in statfs(2).
> >>> This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.
> >>>
> >>> This allows nfs client to be monitored by fanotify filesystem watch
> >>> for local client access if nfs supports re-export.
> >>>
> >>> For example, with inotify-tools 4.23.8.0, the following command can be
> >>> used to watch local client access over entire nfs filesystem:
> >>>
> >>>   fsnotifywatch --filesystem /mnt/nfs
> >>>
> >>> Note that fanotify filesystem watch does not report remote changes on
> >>> server.  It provides the same notifications as inotify, but it watches
> >>> over the entire filesystem and reports file handle of objects and fsid
> >>> with events.
> >>
> >> I think this will run into trouble where an NFSv4 will report both
> >> fsid.major and fsid.minor as zero for the special root filesystem.   We can
> >> expect an NFSv4 client to have one of these per server.
> >>
> >> Could use s_dev from nfs_server for a unique major/minor for each mount on
> >> the client, but these values won't be stable against a particular server
> >> export.
> >>
> >
> > That's a good point.
> > Not sure I understand the relation between mount/server/export.
> >
> > If the client mounts the special NFSv4 root filesystem at /mnt/nfs,
> > are the rest of the server exports going to be accessible via the same
> > mount/sb or via new auto mounts of different nfs sb?
>
> If we cross into a new filesystem on the server, then the client will also
> cross and leave the "root" and have a new sb with non-zero fsid.
>
> > In any case, f_fsid does not have to be uniform across all inodes
> > of the same sb. This is the case with btrfs, where the the btrfs sb
> > has inodes from the root volume and from sub-volumes.
> > inodes from btrfs sub-volumes have a different f_fsid than inodes
> > in the root btrfs volume.
>
> This isn't what I'm worried about.  I'm worried about the case where an nfs
> client will have multiple mounts with fsid's of 0:0, and those are
> distinctly different mounts of the "root" of NFSv4 on different servers.
>

fanotify_mark() fails with -ENODEV when trying to watch an sb with
zero f_fsid. This is the current state with nfs, so there is no concern
for a problem - it just means that fanotify will not be able to watch
those mounts. It's not great.

> > We try to detect this case in fanotify, which currently does not
> > support watching btrfs sub-volume for that reason.
> > I have a WIP branch [1] for handling non-uniform f_fsid in
> > fanotify by introducing the s_op->get_fsid(inode) method.
> >
> > Anyway, IIUC, my proposed f_fsid change is going to be fine for
> > NFSv2/3 and best effort for NFSv4:
> > - For NFSv2/3 mount, f_fsid is a good identifier?
>
> Yes, it should represent the same filesystem on the server.  You could still
> get duplicates between servers. What's returned in the protocol's u64 fsid
> goes into major with minor always zero.
>
> I'm sure there was discussion about what implementations should use long
> ago, but that predates me.
>

Yeh, duplicates aren't great when watching two different sb with same
f_fsid, it is not possible to know which sb the events came from.
However, the process setting the sb watches can know that in advance.

> > - For NFSv4 mount of a specific export, f_fsid is a good identifier?
>
> Yes, but if the specific export is on the same server's filesystem as the
> "root", you'll still get zero.  There are various ways to set fsid on
> exports for linux servers, but the fsid will be the same for all exports of
> the same filesystem on the server.
>

OK. good to know. I thought zero fsid was only for the root itself.

> > - For the NFSv4 root export mount, f_fsid remains zero as it is now
>
> Yes.
>
> > Am I understanding this correctly?
>
> I think so.
>
> > Do you see a reason not to make this change?
> > Do you see a reason to limit this change for NFSv2/3?
>
> I'm not familiar with fanotify enough to know if having multiple fsid 0
> mounts of different filesystems on different servers will do the right
> thing.  I wanted to point out that very real possibility for v4.
>

The fact that fsid 0 would be very common for many nfs mounts
makes this patch much less attractive.

Because we only get events for local client changes, we do not
have to tie the fsid with the server's fsid, we could just use a local
volatile fsid, as we do in other non-blockdev fs (tmpfs, kernfs).

I am not decisive about the best way to tackle this and since
Jan was not sure about the value of local-only notifications
for network filesystems, I am going to put this one on hold.

Thanks for the useful feedback!
Amir.

Benjamin Coddington Oct. 24, 2023, 6 p.m. UTC | #5

On 24 Oct 2023, at 13:12, Amir Goldstein wrote:
> On Tue, Oct 24, 2023 at 6:32 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>> Yes, but if the specific export is on the same server's filesystem as the
>> "root", you'll still get zero.  There are various ways to set fsid on
>> exports for linux servers, but the fsid will be the same for all exports of
>> the same filesystem on the server.
>>
>
> OK. good to know. I thought zero fsid was only for the root itself.

Yes, but by "root" here I always mean the special NFSv4 root - the special
per-server global root filehandle.

...

>> I'm not familiar with fanotify enough to know if having multiple fsid 0
>> mounts of different filesystems on different servers will do the right
>> thing.  I wanted to point out that very real possibility for v4.
>>
>
> The fact that fsid 0 would be very common for many nfs mounts
> makes this patch much less attractive.
>
> Because we only get events for local client changes, we do not
> have to tie the fsid with the server's fsid, we could just use a local
> volatile fsid, as we do in other non-blockdev fs (tmpfs, kernfs).

A good way to do this would be to use the nfs_server->s_dev's major:minor -
this represents the results of nfs_compare_super(), so it should be the same
value if NFS is treating it as the same filesystem.

> I am not decisive about the best way to tackle this and since
> Jan was not sure about the value of local-only notifications
> for network filesystems, I am going to put this one on hold.
>
> Thanks for the useful feedback!

Sure!

Ben

Amir Goldstein Oct. 25, 2023, 3:20 a.m. UTC | #6

On Tue, Oct 24, 2023 at 9:01 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 24 Oct 2023, at 13:12, Amir Goldstein wrote:
> > On Tue, Oct 24, 2023 at 6:32 PM Benjamin Coddington <bcodding@redhat.com> wrote:
> >> Yes, but if the specific export is on the same server's filesystem as the
> >> "root", you'll still get zero.  There are various ways to set fsid on
> >> exports for linux servers, but the fsid will be the same for all exports of
> >> the same filesystem on the server.
> >>
> >
> > OK. good to know. I thought zero fsid was only for the root itself.
>
> Yes, but by "root" here I always mean the special NFSv4 root - the special
> per-server global root filehandle.
>
> ...
>
> >> I'm not familiar with fanotify enough to know if having multiple fsid 0
> >> mounts of different filesystems on different servers will do the right
> >> thing.  I wanted to point out that very real possibility for v4.
> >>
> >
> > The fact that fsid 0 would be very common for many nfs mounts
> > makes this patch much less attractive.
> >
> > Because we only get events for local client changes, we do not
> > have to tie the fsid with the server's fsid, we could just use a local
> > volatile fsid, as we do in other non-blockdev fs (tmpfs, kernfs).
>
> A good way to do this would be to use the nfs_server->s_dev's major:minor -
> this represents the results of nfs_compare_super(), so it should be the same
> value if NFS is treating it as the same filesystem.
>

Yes, that would avoid local collisions and this is what we are going
to do for most of the simple fs with anon_bdev [1].

But anon_bdev major is 0 and minor is quickly recyclable.
fanotify identified objects by {f_fsid, f_handle} pair.
Since nfs client encodes persistent file handles, I would like to try to hold
its f_fsid to higher standards than those of the simple fs.

You say that server->fsid.minor is always 0.
Perhaps we should mix server->fsid.major with server->s_dev's minor?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20231023143049.2944970-1-amir73il@gmail.com/

nfs: derive f_fsid from server's fsid

Commit Message

Comments

Patch