diff mbox series

[1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry

Message ID 20250207034040.3402438-2-neilb@suse.de (mailing list archive)
State New
Headers show
Series VFS: minor improvements to a couple of interfaces | expand

Commit Message

NeilBrown Feb. 7, 2025, 3:36 a.m. UTC
No callers of kern_path_locked() or user_path_locked_at() want a
negative dentry.  So change them to return -ENOENT instead.  This
simplifies callers.

This results in a subtle change to bcachefs in that an ioctl will now
return -ENOENT in preference to -EXDEV.  I believe this restores the
behaviour to what it was prior to
 Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/base/devtmpfs.c | 65 +++++++++++++++++++----------------------
 fs/bcachefs/fs-ioctl.c  |  4 ---
 fs/namei.c              |  4 +++
 kernel/audit_watch.c    | 12 ++++----
 4 files changed, 40 insertions(+), 45 deletions(-)

Comments

Kent Overstreet Feb. 7, 2025, 3:46 a.m. UTC | #1
On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> No callers of kern_path_locked() or user_path_locked_at() want a
> negative dentry.  So change them to return -ENOENT instead.  This
> simplifies callers.
> 
> This results in a subtle change to bcachefs in that an ioctl will now
> return -ENOENT in preference to -EXDEV.  I believe this restores the
> behaviour to what it was prior to

I'm not following how the code change matches the commit message?

>  Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index 15725b4ce393..595b57fabc9a 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
>  		ret = -EXDEV;
>  		goto err;
>  	}
> -	if (!d_is_positive(victim)) {
> -		ret = -ENOENT;
> -		goto err;
> -	}
>  	ret = __bch2_unlink(dir, victim, true);
>  	if (!ret) {
>  		fsnotify_rmdir(dir, victim);
NeilBrown Feb. 7, 2025, 4:53 a.m. UTC | #2
On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > No callers of kern_path_locked() or user_path_locked_at() want a
> > negative dentry.  So change them to return -ENOENT instead.  This
> > simplifies callers.
> > 
> > This results in a subtle change to bcachefs in that an ioctl will now
> > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > behaviour to what it was prior to
> 
> I'm not following how the code change matches the commit message?

Maybe it doesn't.  Let me checked.

Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
-EXDEV.

-ENOENT is returned if the path named in arg.dst_ptr cannot be found.
-EXDEV is returned if the filesystem on which that path exists is not
 the one that the ioctl is called on.

If the target filesystem is "/foo" and the path given is "/bar/baz" and
/bar exists but /bar/baz does not, then user_path_locked_at or
user_path_at will return a negative dentry corresponding to the
(non-existent) name "baz" in /bar.

In this case the dentry exists so the filesystem on which it was found
can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
are credible return values.


- before bbe6a7c899e7 the -EXDEV is tested immediately after the call
  to user_path_att() so there is no chance that ENOENT will be returned.
  I cannot actually find where ENOENT could be returned ...  but that
  doesn't really matter now.

- after that patch .... again the -EXDEV test comes first. That isn't
  what I remember.  I must have misread it.

- after my patch user_path_locked_at() will return -ENOENT if the whole
  name cannot be found.  So now you get -ENOENT instead of -EXDEV.

So with my patch, ENOENT always wins, and it was never like that before.
Thanks for challenging me!

Do you think there could be a problem with changing the error returned
in this circumstance? i.e. if you try to destroy a subvolume with a
non-existant name on a different filesystem could getting -ENOENT
instead of -EXDEV be noticed?

Thanks,
NeilBrown
Kent Overstreet Feb. 7, 2025, 6:13 a.m. UTC | #3
On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > negative dentry.  So change them to return -ENOENT instead.  This
> > > simplifies callers.
> > > 
> > > This results in a subtle change to bcachefs in that an ioctl will now
> > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > behaviour to what it was prior to
> > 
> > I'm not following how the code change matches the commit message?
> 
> Maybe it doesn't.  Let me checked.
> 
> Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> -EXDEV.
> 
> -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> -EXDEV is returned if the filesystem on which that path exists is not
>  the one that the ioctl is called on.
> 
> If the target filesystem is "/foo" and the path given is "/bar/baz" and
> /bar exists but /bar/baz does not, then user_path_locked_at or
> user_path_at will return a negative dentry corresponding to the
> (non-existent) name "baz" in /bar.
> 
> In this case the dentry exists so the filesystem on which it was found
> can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> are credible return values.
> 
> 
> - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
>   to user_path_att() so there is no chance that ENOENT will be returned.
>   I cannot actually find where ENOENT could be returned ...  but that
>   doesn't really matter now.
> 
> - after that patch .... again the -EXDEV test comes first. That isn't
>   what I remember.  I must have misread it.
> 
> - after my patch user_path_locked_at() will return -ENOENT if the whole
>   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> 
> So with my patch, ENOENT always wins, and it was never like that before.
> Thanks for challenging me!

How do you always manage to be unfailingly polite? :)

> 
> Do you think there could be a problem with changing the error returned
> in this circumstance? i.e. if you try to destroy a subvolume with a
> non-existant name on a different filesystem could getting -ENOENT
> instead of -EXDEV be noticed?

-EXDEV is the standard error code for "we're crossing a filesystem
boundary and we can't or aren't supposed to be", so no, let's not change
that.
NeilBrown Feb. 7, 2025, 6:34 a.m. UTC | #4
On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > > negative dentry.  So change them to return -ENOENT instead.  This
> > > > simplifies callers.
> > > > 
> > > > This results in a subtle change to bcachefs in that an ioctl will now
> > > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > > behaviour to what it was prior to
> > > 
> > > I'm not following how the code change matches the commit message?
> > 
> > Maybe it doesn't.  Let me checked.
> > 
> > Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> > which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> > -EXDEV.
> > 
> > -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> > -EXDEV is returned if the filesystem on which that path exists is not
> >  the one that the ioctl is called on.
> > 
> > If the target filesystem is "/foo" and the path given is "/bar/baz" and
> > /bar exists but /bar/baz does not, then user_path_locked_at or
> > user_path_at will return a negative dentry corresponding to the
> > (non-existent) name "baz" in /bar.
> > 
> > In this case the dentry exists so the filesystem on which it was found
> > can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> > are credible return values.
> > 
> > 
> > - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
> >   to user_path_att() so there is no chance that ENOENT will be returned.
> >   I cannot actually find where ENOENT could be returned ...  but that
> >   doesn't really matter now.
> > 
> > - after that patch .... again the -EXDEV test comes first. That isn't
> >   what I remember.  I must have misread it.
> > 
> > - after my patch user_path_locked_at() will return -ENOENT if the whole
> >   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> > 
> > So with my patch, ENOENT always wins, and it was never like that before.
> > Thanks for challenging me!
> 
> How do you always manage to be unfailingly polite? :)

My dad impressed the value of politeness on me at an early age.  It is
an effective way of manipulating people!

> 
> > 
> > Do you think there could be a problem with changing the error returned
> > in this circumstance? i.e. if you try to destroy a subvolume with a
> > non-existant name on a different filesystem could getting -ENOENT
> > instead of -EXDEV be noticed?
> 
> -EXDEV is the standard error code for "we're crossing a filesystem
> boundary and we can't or aren't supposed to be", so no, let's not change
> that.
> 

OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
be too hard.

Thanks,
NeilBrown
Kent Overstreet Feb. 7, 2025, 6:51 a.m. UTC | #5
On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > Do you think there could be a problem with changing the error returned
> > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > non-existant name on a different filesystem could getting -ENOENT
> > > instead of -EXDEV be noticed?
> > 
> > -EXDEV is the standard error code for "we're crossing a filesystem
> > boundary and we can't or aren't supposed to be", so no, let's not change
> > that.
> > 
> 
> OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> be too hard.

Hang on, why does that require keeping user_path_locked_at()? Just
compare i_sb...
NeilBrown Feb. 7, 2025, 6:53 a.m. UTC | #6
On Fri, 07 Feb 2025, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote:
> > > No callers of kern_path_locked() or user_path_locked_at() want a
> > > negative dentry.  So change them to return -ENOENT instead.  This
> > > simplifies callers.
> > > 
> > > This results in a subtle change to bcachefs in that an ioctl will now
> > > return -ENOENT in preference to -EXDEV.  I believe this restores the
> > > behaviour to what it was prior to
> > 
> > I'm not following how the code change matches the commit message?
> 
> Maybe it doesn't.  Let me checked.
> 
> Two of the possible error returns from bch2_ioctl_subvolume_destroy(),
> which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and
> -EXDEV.
> 
> -ENOENT is returned if the path named in arg.dst_ptr cannot be found.
> -EXDEV is returned if the filesystem on which that path exists is not
>  the one that the ioctl is called on.
> 
> If the target filesystem is "/foo" and the path given is "/bar/baz" and
> /bar exists but /bar/baz does not, then user_path_locked_at or
> user_path_at will return a negative dentry corresponding to the
> (non-existent) name "baz" in /bar.
> 
> In this case the dentry exists so the filesystem on which it was found
> can be tested, but the dentry is negative.  So both -ENOENT and -EXDEV
> are credible return values.
> 
> 
> - before bbe6a7c899e7 the -EXDEV is tested immediately after the call
>   to user_path_at() so there is no chance that ENOENT will be returned.
>   I cannot actually find where ENOENT could be returned ...  but that
>   doesn't really matter now.

No, that's not right.  user_path_at() never returns a negative dentry.
It isn't documented and I tried reading the code instead of looking at
callers.

So before that commit it DID return ENOENT rather than -EXDEV where as
after that commit it returns -EXDEV rather than -ENOENT.

So my patch IS restoring the old behaviour.  Are you *sure* you want
-EXDEV when given a non-existent path?

Thanks,
NeilBrown


> 
> - after that patch .... again the -EXDEV test comes first. That isn't
>   what I remember.  I must have misread it.
> 
> - after my patch user_path_locked_at() will return -ENOENT if the whole
>   name cannot be found.  So now you get -ENOENT instead of -EXDEV.
> 
> So with my patch, ENOENT always wins, and it was never like that before.
> Thanks for challenging me!
> 
> Do you think there could be a problem with changing the error returned
> in this circumstance? i.e. if you try to destroy a subvolume with a
> non-existant name on a different filesystem could getting -ENOENT
> instead of -EXDEV be noticed?
> 
> Thanks,
> NeilBrown
> 
>
NeilBrown Feb. 7, 2025, 7:30 a.m. UTC | #7
On Fri, 07 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > Do you think there could be a problem with changing the error returned
> > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > non-existant name on a different filesystem could getting -ENOENT
> > > > instead of -EXDEV be noticed?
> > > 
> > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > that.
> > > 
> > 
> > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > be too hard.
> 
> Hang on, why does that require keeping user_path_locked_at()? Just
> compare i_sb...
> 

I changed user_path_locked_at() to not return a dentry at all when the
full path couldn't be found.  If there is no dentry, then there is no
->d_sb.
(if there was an ->i_sb, there would be an inode and this all wouldn't
be an issue).

To recap: the difference happens if the path DOESN'T exist but the
parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
case and the error code shouldn't matter.  But I had to ask...

NeilBrown
Kent Overstreet Feb. 7, 2025, 1:35 p.m. UTC | #8
On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > Do you think there could be a problem with changing the error returned
> > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > instead of -EXDEV be noticed?
> > > > 
> > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > that.
> > > > 
> > > 
> > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > be too hard.
> > 
> > Hang on, why does that require keeping user_path_locked_at()? Just
> > compare i_sb...
> > 
> 
> I changed user_path_locked_at() to not return a dentry at all when the
> full path couldn't be found.  If there is no dentry, then there is no
> ->d_sb.
> (if there was an ->i_sb, there would be an inode and this all wouldn't
> be an issue).
> 
> To recap: the difference happens if the path DOESN'T exist but the
> parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> case and the error code shouldn't matter.  But I had to ask...

Ahh...

Well, if I've scanned the series correctly (sorry, we're on different
timezones and I haven't had much caffeine yet) I hope you don't have to
keep that function just for bcachefs - but I do think the error code is
important.

Userspace getting -ENOENT and reporting -ENOENT to the user will
inevitably lead to head banging frustration by someone, somewhere, when
they're trying to delete something and the system is tell them it
doesn't exist when they can see it very much does exist, right there :)
the more precise error code is a very helpful cue...
Paul Moore Feb. 7, 2025, 7:09 p.m. UTC | #9
On Thu, Feb 6, 2025 at 10:41 PM NeilBrown <neilb@suse.de> wrote:
>
> No callers of kern_path_locked() or user_path_locked_at() want a
> negative dentry.  So change them to return -ENOENT instead.  This
> simplifies callers.
>
> This results in a subtle change to bcachefs in that an ioctl will now
> return -ENOENT in preference to -EXDEV.  I believe this restores the
> behaviour to what it was prior to
>  Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking")
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/base/devtmpfs.c | 65 +++++++++++++++++++----------------------
>  fs/bcachefs/fs-ioctl.c  |  4 ---
>  fs/namei.c              |  4 +++
>  kernel/audit_watch.c    | 12 ++++----
>  4 files changed, 40 insertions(+), 45 deletions(-)

...

> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 7f358740e958..e3130675ee6b 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
>         struct dentry *d = kern_path_locked(watch->path, parent);
>         if (IS_ERR(d))
>                 return PTR_ERR(d);
> -       if (d_is_positive(d)) {
> -               /* update watch filter fields */
> -               watch->dev = d->d_sb->s_dev;
> -               watch->ino = d_backing_inode(d)->i_ino;
> -       }
> +       /* update watch filter fields */
> +       watch->dev = d->d_sb->s_dev;
> +       watch->ino = d_backing_inode(d)->i_ino;
> +
>         inode_unlock(d_backing_inode(parent->dentry));
>         dput(d);
>         return 0;
> @@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>         /* caller expects mutex locked */
>         mutex_lock(&audit_filter_mutex);
>
> -       if (ret) {
> +       if (ret && ret != -ENOENT) {
>                 audit_put_watch(watch);
>                 return ret;
>         }
> @@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>
>         h = audit_hash_ino((u32)watch->ino);
>         *list = &audit_inode_hash[h];
> +       ret = 0;

If you have to respin this patch for any reason I'd prefer to move the
'ret = 0' up to just after the if block you're modifying in the chunk
above, but that's a trivial nitpick so please don't respin only for
that.  Otherwise it looks fine to me from an audit perspective.

Acked-by: Paul Moore <paul@paul-moore.com> (Audit)

>  error:
>         path_put(&parent_path);
>         audit_put_watch(watch);
> --
> 2.47.1
NeilBrown Feb. 10, 2025, 1:20 a.m. UTC | #10
On Sat, 08 Feb 2025, Kent Overstreet wrote:
> On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > Do you think there could be a problem with changing the error returned
> > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > instead of -EXDEV be noticed?
> > > > > 
> > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > that.
> > > > > 
> > > > 
> > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > be too hard.
> > > 
> > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > compare i_sb...
> > > 
> > 
> > I changed user_path_locked_at() to not return a dentry at all when the
> > full path couldn't be found.  If there is no dentry, then there is no
> > ->d_sb.
> > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > be an issue).
> > 
> > To recap: the difference happens if the path DOESN'T exist but the
> > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > case and the error code shouldn't matter.  But I had to ask...
> 
> Ahh...
> 
> Well, if I've scanned the series correctly (sorry, we're on different
> timezones and I haven't had much caffeine yet) I hope you don't have to
> keep that function just for bcachefs - but I do think the error code is
> important.
> 
> Userspace getting -ENOENT and reporting -ENOENT to the user will
> inevitably lead to head banging frustration by someone, somewhere, when
> they're trying to delete something and the system is tell them it
> doesn't exist when they can see it very much does exist, right there :)
> the more precise error code is a very helpful cue...
> 

???
You will only get -ENOENT if there is no ent.  There is no question of a
confusing error message.
If you ask for a non-exist name on the correct filesystem, you get -ENOENT
If you ask for an existing name of the wrong filesystem, you get -EXDEV
That all works as expected and always has.

But what if you ask for a non-existing name in a directory on the
wrong filesystem?  
The code you originally wrote in 42d237320e9817a9 would return
-ENOENT because that it what user_path_at() would return.
But using user_path_at() is "wrong" because it doesn't lock the directory
so ->d_parent is not guaranteed to be stable.
Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
that doesn't check for a negative dentry so Al added a check to return
-ENOENT, but that was added *after* the test that returns -EXDEV.

So now if you call subvolume_destroy on a non-existing name in a
directory on the wrong filesystem, you get -EXDEV.  I think that is
a bit weird but not a lot weird.
My patch will change it back to -ENOENT - the way you originally wrote
it.

I hope you are ok with that.

NeilBrown
Kent Overstreet Feb. 10, 2025, 4:33 p.m. UTC | #11
On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote:
> On Sat, 08 Feb 2025, Kent Overstreet wrote:
> > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > > Do you think there could be a problem with changing the error returned
> > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > > instead of -EXDEV be noticed?
> > > > > > 
> > > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > > that.
> > > > > > 
> > > > > 
> > > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > > be too hard.
> > > > 
> > > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > > compare i_sb...
> > > > 
> > > 
> > > I changed user_path_locked_at() to not return a dentry at all when the
> > > full path couldn't be found.  If there is no dentry, then there is no
> > > ->d_sb.
> > > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > > be an issue).
> > > 
> > > To recap: the difference happens if the path DOESN'T exist but the
> > > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > > case and the error code shouldn't matter.  But I had to ask...
> > 
> > Ahh...
> > 
> > Well, if I've scanned the series correctly (sorry, we're on different
> > timezones and I haven't had much caffeine yet) I hope you don't have to
> > keep that function just for bcachefs - but I do think the error code is
> > important.
> > 
> > Userspace getting -ENOENT and reporting -ENOENT to the user will
> > inevitably lead to head banging frustration by someone, somewhere, when
> > they're trying to delete something and the system is tell them it
> > doesn't exist when they can see it very much does exist, right there :)
> > the more precise error code is a very helpful cue...
> > 
> 
> ???
> You will only get -ENOENT if there is no ent.  There is no question of a
> confusing error message.
> If you ask for a non-exist name on the correct filesystem, you get -ENOENT
> If you ask for an existing name of the wrong filesystem, you get -EXDEV
> That all works as expected and always has.
> 
> But what if you ask for a non-existing name in a directory on the
> wrong filesystem?  
> The code you originally wrote in 42d237320e9817a9 would return
> -ENOENT because that it what user_path_at() would return.

Ahh - ok, I think I see where I misread before

> But using user_path_at() is "wrong" because it doesn't lock the directory
> so ->d_parent is not guaranteed to be stable.
> Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
> that doesn't check for a negative dentry so Al added a check to return
> -ENOENT, but that was added *after* the test that returns -EXDEV.
> 
> So now if you call subvolume_destroy on a non-existing name in a
> directory on the wrong filesystem, you get -EXDEV.  I think that is
> a bit weird but not a lot weird.

Yeah, we don't need to preserve that. As long as calling it on a name
that _does_ exist on a different filesystem returns -EXDEV, that's all I
care about.

So assuming that's the case you can go ahead and add my acked-by...

Nit: I would go back and stare at the patch some more, but threading got
completely fubar so I can't find anything. Doh.

> My patch will change it back to -ENOENT - the way you originally wrote
> it.
> 
> I hope you are ok with that.

Yes, sounds good.
NeilBrown Feb. 12, 2025, 3:24 a.m. UTC | #12
On Tue, 11 Feb 2025, Kent Overstreet wrote:
> On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote:
> > On Sat, 08 Feb 2025, Kent Overstreet wrote:
> > > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote:
> > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote:
> > > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote:
> > > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote:
> > > > > > > > Do you think there could be a problem with changing the error returned
> > > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a
> > > > > > > > non-existant name on a different filesystem could getting -ENOENT
> > > > > > > > instead of -EXDEV be noticed?
> > > > > > > 
> > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem
> > > > > > > boundary and we can't or aren't supposed to be", so no, let's not change
> > > > > > > that.
> > > > > > > 
> > > > > > 
> > > > > > OK.  As bcachefs is the only user of user_path_locked_at() it shouldn't
> > > > > > be too hard.
> > > > > 
> > > > > Hang on, why does that require keeping user_path_locked_at()? Just
> > > > > compare i_sb...
> > > > > 
> > > > 
> > > > I changed user_path_locked_at() to not return a dentry at all when the
> > > > full path couldn't be found.  If there is no dentry, then there is no
> > > > ->d_sb.
> > > > (if there was an ->i_sb, there would be an inode and this all wouldn't
> > > > be an issue).
> > > > 
> > > > To recap: the difference happens if the path DOESN'T exist but the
> > > > parent DOES exist on a DIFFERENT filesystem.  It is very much a corner
> > > > case and the error code shouldn't matter.  But I had to ask...
> > > 
> > > Ahh...
> > > 
> > > Well, if I've scanned the series correctly (sorry, we're on different
> > > timezones and I haven't had much caffeine yet) I hope you don't have to
> > > keep that function just for bcachefs - but I do think the error code is
> > > important.
> > > 
> > > Userspace getting -ENOENT and reporting -ENOENT to the user will
> > > inevitably lead to head banging frustration by someone, somewhere, when
> > > they're trying to delete something and the system is tell them it
> > > doesn't exist when they can see it very much does exist, right there :)
> > > the more precise error code is a very helpful cue...
> > > 
> > 
> > ???
> > You will only get -ENOENT if there is no ent.  There is no question of a
> > confusing error message.
> > If you ask for a non-exist name on the correct filesystem, you get -ENOENT
> > If you ask for an existing name of the wrong filesystem, you get -EXDEV
> > That all works as expected and always has.
> > 
> > But what if you ask for a non-existing name in a directory on the
> > wrong filesystem?  
> > The code you originally wrote in 42d237320e9817a9 would return
> > -ENOENT because that it what user_path_at() would return.
> 
> Ahh - ok, I think I see where I misread before
> 
> > But using user_path_at() is "wrong" because it doesn't lock the directory
> > so ->d_parent is not guaranteed to be stable.
> > Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but
> > that doesn't check for a negative dentry so Al added a check to return
> > -ENOENT, but that was added *after* the test that returns -EXDEV.
> > 
> > So now if you call subvolume_destroy on a non-existing name in a
> > directory on the wrong filesystem, you get -EXDEV.  I think that is
> > a bit weird but not a lot weird.
> 
> Yeah, we don't need to preserve that. As long as calling it on a name
> that _does_ exist on a different filesystem returns -EXDEV, that's all I
> care about.
> 
> So assuming that's the case you can go ahead and add my acked-by...

Cool - thanks.

> 
> Nit: I would go back and stare at the patch some more, but threading got
> completely fubar so I can't find anything. Doh.
> 

:-)

NeilBrown



> > My patch will change it back to -ENOENT - the way you originally wrote
> > it.
> > 
> > I hope you are ok with that.
> 
> Yes, sounds good.
>
diff mbox series

Patch

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index b848764ef018..c9e34842139f 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -245,15 +245,12 @@  static int dev_rmdir(const char *name)
 	dentry = kern_path_locked(name, &parent);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
-	if (d_really_is_positive(dentry)) {
-		if (d_inode(dentry)->i_private == &thread)
-			err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
-					dentry);
-		else
-			err = -EPERM;
-	} else {
-		err = -ENOENT;
-	}
+	if (d_inode(dentry)->i_private == &thread)
+		err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry),
+				dentry);
+	else
+		err = -EPERM;
+
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
 	path_put(&parent);
@@ -310,6 +307,8 @@  static int handle_remove(const char *nodename, struct device *dev)
 {
 	struct path parent;
 	struct dentry *dentry;
+	struct kstat stat;
+	struct path p;
 	int deleted = 0;
 	int err;
 
@@ -317,32 +316,28 @@  static int handle_remove(const char *nodename, struct device *dev)
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	if (d_really_is_positive(dentry)) {
-		struct kstat stat;
-		struct path p = {.mnt = parent.mnt, .dentry = dentry};
-		err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
-				  AT_STATX_SYNC_AS_STAT);
-		if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
-			struct iattr newattrs;
-			/*
-			 * before unlinking this node, reset permissions
-			 * of possible references like hardlinks
-			 */
-			newattrs.ia_uid = GLOBAL_ROOT_UID;
-			newattrs.ia_gid = GLOBAL_ROOT_GID;
-			newattrs.ia_mode = stat.mode & ~0777;
-			newattrs.ia_valid =
-				ATTR_UID|ATTR_GID|ATTR_MODE;
-			inode_lock(d_inode(dentry));
-			notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
-			inode_unlock(d_inode(dentry));
-			err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
-					 dentry, NULL);
-			if (!err || err == -ENOENT)
-				deleted = 1;
-		}
-	} else {
-		err = -ENOENT;
+	p.mnt = parent.mnt;
+	p.dentry = dentry;
+	err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE,
+			  AT_STATX_SYNC_AS_STAT);
+	if (!err && dev_mynode(dev, d_inode(dentry), &stat)) {
+		struct iattr newattrs;
+		/*
+		 * before unlinking this node, reset permissions
+		 * of possible references like hardlinks
+		 */
+		newattrs.ia_uid = GLOBAL_ROOT_UID;
+		newattrs.ia_gid = GLOBAL_ROOT_GID;
+		newattrs.ia_mode = stat.mode & ~0777;
+		newattrs.ia_valid =
+			ATTR_UID|ATTR_GID|ATTR_MODE;
+		inode_lock(d_inode(dentry));
+		notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL);
+		inode_unlock(d_inode(dentry));
+		err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry),
+				 dentry, NULL);
+		if (!err || err == -ENOENT)
+			deleted = 1;
 	}
 	dput(dentry);
 	inode_unlock(d_inode(parent.dentry));
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 15725b4ce393..595b57fabc9a 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -511,10 +511,6 @@  static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
 		ret = -EXDEV;
 		goto err;
 	}
-	if (!d_is_positive(victim)) {
-		ret = -ENOENT;
-		goto err;
-	}
 	ret = __bch2_unlink(dir, victim, true);
 	if (!ret) {
 		fsnotify_rmdir(dir, victim);
diff --git a/fs/namei.c b/fs/namei.c
index 3a4039acdb3f..e3047db7b2b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2736,6 +2736,10 @@  static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
 	d = lookup_one_qstr_excl(&last, path->dentry, 0);
+	if (!IS_ERR(d) && d_is_negative(d)) {
+		dput(d);
+		d = ERR_PTR(-ENOENT);
+	}
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 7f358740e958..e3130675ee6b 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -350,11 +350,10 @@  static int audit_get_nd(struct audit_watch *watch, struct path *parent)
 	struct dentry *d = kern_path_locked(watch->path, parent);
 	if (IS_ERR(d))
 		return PTR_ERR(d);
-	if (d_is_positive(d)) {
-		/* update watch filter fields */
-		watch->dev = d->d_sb->s_dev;
-		watch->ino = d_backing_inode(d)->i_ino;
-	}
+	/* update watch filter fields */
+	watch->dev = d->d_sb->s_dev;
+	watch->ino = d_backing_inode(d)->i_ino;
+
 	inode_unlock(d_backing_inode(parent->dentry));
 	dput(d);
 	return 0;
@@ -419,7 +418,7 @@  int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 	/* caller expects mutex locked */
 	mutex_lock(&audit_filter_mutex);
 
-	if (ret) {
+	if (ret && ret != -ENOENT) {
 		audit_put_watch(watch);
 		return ret;
 	}
@@ -438,6 +437,7 @@  int audit_add_watch(struct audit_krule *krule, struct list_head **list)
 
 	h = audit_hash_ino((u32)watch->ino);
 	*list = &audit_inode_hash[h];
+	ret = 0;
 error:
 	path_put(&parent_path);
 	audit_put_watch(watch);