mbox series

[0/8,RFC] tidy up various VFS lookup functions

Message ID 20250314045655.603377-1-neil@brown.name (mailing list archive)
Headers show
Series tidy up various VFS lookup functions | expand

Message

NeilBrown March 14, 2025, 12:34 a.m. UTC
VFS has some functions with names containing "lookup_one_len" and others
without the "_len".  This difference has nothing to do with "len".

The functions without "_len" take a "mnt_idmap" pointer.  This is found
in the "vfsmount" and that is an important question when choosing which
to use: do you have a vfsmount, or are you "inside" the filesystem.  A
related question is "is permission checking relevant here?".

nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
functions.  They pass nop_mnt_idmap which is not correct if the vfsmount
is actually idmaped.  For cachefiles it probably (?) doesn't matter as
the accesses to the backing filesystem are always does with elevated privileged (?).

For nfsd it would matter if anyone exported an idmapped filesystem.  I
wonder if anyone has tried...

These patches change the "lookup_one" functions to take a vfsmount
instead of a mnt_idmap because I think that makes the intention clearer.

It also renames the "_one" functions to be "_noperm" and removes the
permission checking completely.  In all cases where they are (correctly)
used permission checking is irrelevant.

I haven't included changes to afs because there are patches in vfs.all
which make a lot of changes to lookup in afs.  I think (if they are seen
as a good idea) these patches should aim to land after the afs patches
and any further fixup in afs can happen then.

The nfsd and cachefiles patches probably should be separate.  Maybe I
should submit those to relevant maintainers first, and one afs,
cachefiles, and nfsd changes have landed I can submit this series with
appropriate modifications.

May main question for review is : have I understood mnt_idmap correctly?

These patches are based on vfs-6.15.async.dir as they touch mkdir
related code.  There is a small conflict with the recently posted patch
to remove locking from try_lookup_one_len() calls.

Thanks,
NeilBrown


 [PATCH 1/8] VFS: improve interface for lookup_one functions
 [PATCH 2/8] nfsd: Use lookup_one() rather than lookup_one_len()
 [PATCH 3/8] nfsd: use correct idmap for all accesses.
 [PATCH 4/8] cachefiles: Use lookup_one() rather than lookup_one_len()
 [PATCH 5/8] cachefiles: use correct mnt_idmap
 [PATCH 6/8] VFS: rename lookup_one_len() family to lookup_noperm()
 [PATCH 7/8] Use try_lookup_noperm() instead of d_hash_and_lookup()
 [PATCH 8/8] VFS: change lookup_one_common and lookup_noperm_common to

Comments

Christian Brauner March 14, 2025, 10:38 a.m. UTC | #1
On Fri, Mar 14, 2025 at 11:34:06AM +1100, NeilBrown wrote:
> VFS has some functions with names containing "lookup_one_len" and others
> without the "_len".  This difference has nothing to do with "len".
> 
> The functions without "_len" take a "mnt_idmap" pointer.  This is found

When we added idmapped mounts there were all these *_len() helpers and I
orignally had just ported them to pass mnt_idmap. But we decided to not
do this. The argument might have been that most callers don't need to be
switched (I'm not actually sure if that's still true now that we have
quite a few filesystems that do support idmapped mounts.).

So then we added new helper and then we decided to use better naming
then that *_len() stuff. That's about it.

> in the "vfsmount" and that is an important question when choosing which
> to use: do you have a vfsmount, or are you "inside" the filesystem.  A
> related question is "is permission checking relevant here?".
> 
> nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
> functions.  They pass nop_mnt_idmap which is not correct if the vfsmount
> is actually idmaped.  For cachefiles it probably (?) doesn't matter as
> the accesses to the backing filesystem are always does with elevated privileged (?).

Cachefiles explicitly refuse being mounted on top of an idmapped mount
and they require that the mount is attached (check_mnt()) and an
attached mount can never be idmapped as it has already been exposed in
the filesystem hierarchy.

> 
> For nfsd it would matter if anyone exported an idmapped filesystem.  I
> wonder if anyone has tried...

nfsd doesn't support exporting idmapped mounts. See check_export() where
that's explicitly checked.

If there are ways to circumvent this I'd be good to know.

> 
> These patches change the "lookup_one" functions to take a vfsmount
> instead of a mnt_idmap because I think that makes the intention clearer.

Please don't!

These internal lookup helpers intentionally do not take a vfsmount.
First, because they can be called in places where access to a vfsmount
isn't possible and we don't want to pass vfsmounts down to filesystems
ever!

Second, the mnt_idmap pointer is - with few safe exceptions - is
retrieved once in the VFS and then passed down so that e.g., permission
checking and file creation are guaranteed to use the same mnt_idmap
pointer.

A caller may start out with a non-idmapped detached mount (e.g., via
fsmount() or OPEN_TREE_CLONE) (nop_mnt_idmap) and call
inode_permission(). Now someone idmaps that mount. Now further down the
callchain someone calls lookup_one() which now retrieves the idmapping
again and now it's an idmapped mount. Now permission checking is out of
sync. That's an unlikely scenario but it's possible so lookup_one() is
not supposed to retrieve the idmapping again. Please keep passing it
explicitly. I've also written that down in the Documenation somewhere.

> 
> It also renames the "_one" functions to be "_noperm" and removes the
> permission checking completely.  In all cases where they are (correctly)
> used permission checking is irrelevant.

Ok, that sounds fine. Though I haven't taken the time to check the
callers yet. I'll try to do that during the weekend.

> 
> I haven't included changes to afs because there are patches in vfs.all
> which make a lot of changes to lookup in afs.  I think (if they are seen
> as a good idea) these patches should aim to land after the afs patches
> and any further fixup in afs can happen then.
> 
> The nfsd and cachefiles patches probably should be separate.  Maybe I
> should submit those to relevant maintainers first, and one afs,
> cachefiles, and nfsd changes have landed I can submit this series with
> appropriate modifications.
> 
> May main question for review is : have I understood mnt_idmap correctly?

I mean, you didn't ask semantic questions so much as syntactic, I think.
I hope I explained the reasoning sufficiently.

Thanks for the cleanups.
NeilBrown March 17, 2025, 2:06 a.m. UTC | #2
On Fri, 14 Mar 2025, Christian Brauner wrote:
> On Fri, Mar 14, 2025 at 11:34:06AM +1100, NeilBrown wrote:
> > VFS has some functions with names containing "lookup_one_len" and others
> > without the "_len".  This difference has nothing to do with "len".
> > 
> > The functions without "_len" take a "mnt_idmap" pointer.  This is found
> 
> When we added idmapped mounts there were all these *_len() helpers and I
> orignally had just ported them to pass mnt_idmap. But we decided to not
> do this. The argument might have been that most callers don't need to be
> switched (I'm not actually sure if that's still true now that we have
> quite a few filesystems that do support idmapped mounts.).
> 
> So then we added new helper and then we decided to use better naming
> then that *_len() stuff. That's about it.
> 
> > in the "vfsmount" and that is an important question when choosing which
> > to use: do you have a vfsmount, or are you "inside" the filesystem.  A
> > related question is "is permission checking relevant here?".
> > 
> > nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
> > functions.  They pass nop_mnt_idmap which is not correct if the vfsmount
> > is actually idmaped.  For cachefiles it probably (?) doesn't matter as
> > the accesses to the backing filesystem are always does with elevated privileged (?).
> 
> Cachefiles explicitly refuse being mounted on top of an idmapped mount
> and they require that the mount is attached (check_mnt()) and an
> attached mount can never be idmapped as it has already been exposed in
> the filesystem hierarchy.
> 
> > 
> > For nfsd it would matter if anyone exported an idmapped filesystem.  I
> > wonder if anyone has tried...
> 
> nfsd doesn't support exporting idmapped mounts. See check_export() where
> that's explicitly checked.
> 
> If there are ways to circumvent this I'd be good to know.

I should have checked that they rejected idmapped mounts
(is_idmapped_mnt()).  But I think that just changes my justification for
the change, not my desire to make the change.

There are two contexts in which lookup is done.  One is the common
context when there is a vfsmount present and permission checking is
expected.  nfsd and cachefiles both fit this context.

The other is when there is no vfsmount and/or permission checking is not
relevant.  This happens after lookup_parentat when the permission check
has already been performed, and in various virtual filesystems when the
filesystem itself is adding/removing files or in normal filesystems
where dedicated names like "lost+found" and "quota" are being accessed.

I would like to make a clear distinction between these, and for that to
be done nfsd and cachefiles need to be changed to clearly fit the first
context.  Whether they should allow idmapped mounts or not is to some
extent a separate question.  They do want to do permission checking
(certainly nfsd does) so they should use the same API as other
permission-checking code.

> 
> > 
> > These patches change the "lookup_one" functions to take a vfsmount
> > instead of a mnt_idmap because I think that makes the intention clearer.
> 
> Please don't!
> 
> These internal lookup helpers intentionally do not take a vfsmount.
> First, because they can be called in places where access to a vfsmount
> isn't possible and we don't want to pass vfsmounts down to filesystems
> ever!

There are two sorts of internal lookup helpers.
Those that currently don't even take a mnt_idmap and are called, as you
say, in places where a vfsmount isn't available.
And those that are currently called with a mnt_idmap and called (after a
few cleanup) in places where a vfsmount is readily available.


> 
> Second, the mnt_idmap pointer is - with few safe exceptions - is
> retrieved once in the VFS and then passed down so that e.g., permission
> checking and file creation are guaranteed to use the same mnt_idmap
> pointer.

In every case that I changed a call to pass a vfsmount instead of a
mnt_idmap, the mnt_idmap had recently been fetched from the vfsmount,
often by mnt_idmap() in the first argument to lookup_one().  Sometimes
by file_mnt_idmap() or similar.  So the patch never changed the safety
of the idmap.

One purpose of this change is to force callers to either acknowledge
that not permission checking is needed (by using a _noperm interface),
or to provide the correct idmap (not nop_mnt_idmap).

I understand that for object creation interfaces (vfs_mkdir etc) this
wouldn't work.  For these the idmap isn't needed only for permission
checking, but also for assigning an initial uid so the simple "vfsmount
or noperm" distinction doesn't work.  But for lookup I think it does.

> 
> A caller may start out with a non-idmapped detached mount (e.g., via
> fsmount() or OPEN_TREE_CLONE) (nop_mnt_idmap) and call
> inode_permission(). Now someone idmaps that mount. Now further down the
> callchain someone calls lookup_one() which now retrieves the idmapping
> again and now it's an idmapped mount. Now permission checking is out of
> sync. That's an unlikely scenario but it's possible so lookup_one() is
> not supposed to retrieve the idmapping again. Please keep passing it
> explicitly. I've also written that down in the Documenation somewhere.
> 

I couldn't easily find such documentation but I didn't look *very* hard.
fsmount and OPEN_TREE_CLONE only seem to be used in selftests though I
guess they could get used elsewhere.  However current callers of
lookup_one() always get the idmap from the vfsmount shortly before
calling lookup_one.

> 
> > 
> > It also renames the "_one" functions to be "_noperm" and removes the
> > permission checking completely.  In all cases where they are (correctly)
> > used permission checking is irrelevant.
> 
> Ok, that sounds fine. Though I haven't taken the time to check the
> callers yet. I'll try to do that during the weekend.
> 
> > 
> > I haven't included changes to afs because there are patches in vfs.all
> > which make a lot of changes to lookup in afs.  I think (if they are seen
> > as a good idea) these patches should aim to land after the afs patches
> > and any further fixup in afs can happen then.
> > 
> > The nfsd and cachefiles patches probably should be separate.  Maybe I
> > should submit those to relevant maintainers first, and one afs,
> > cachefiles, and nfsd changes have landed I can submit this series with
> > appropriate modifications.
> > 
> > May main question for review is : have I understood mnt_idmap correctly?
> 
> I mean, you didn't ask semantic questions so much as syntactic, I think.
> I hope I explained the reasoning sufficiently.
> 

You have helped me see some aspects of idmapping which I hadn't seen as
clearly before - thanks.

NeilBrown