diff mbox series

[RFC] fanotify: disallow mount/sb marks on kernel internal pseudo fs

Message ID 20230629042044.25723-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] fanotify: disallow mount/sb marks on kernel internal pseudo fs | expand

Commit Message

Amir Goldstein June 29, 2023, 4:20 a.m. UTC
Hopefully, nobody is trying to abuse mount/sb marks for watching all
anonymous pipes/inodes.

I cannot think of a good reason to allow this - it looks like an
oversight that dated back to the original fanotify API.

Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

As discussed, allowing sb/mount mark on anonymous pipes
makes no sense and we should not allow it.

I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
backport to maintained LTS kernels event though this dates back to day one
with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.

The reason this is an RFC and that I have not included also the
optimization patch is because we may want to consider banning kernel
internal inodes from fanotify and/or inotify altogether.

The tricky point in banning anonymous pipes from inotify, which
could have existing users (?), but maybe not, so maybe this is
something that we need to try out.

I think we can easily get away with banning anonymous pipes from
fanotify altogeter, but I would not like to get to into a situation
where new applications will be written to rely on inotify for
functionaly that fanotify is never going to have.

Thoughts?
Am I over thinking this?

Amir.

 fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Jan Kara June 29, 2023, 10:18 a.m. UTC | #1
On Thu 29-06-23 07:20:44, Amir Goldstein wrote:
> Hopefully, nobody is trying to abuse mount/sb marks for watching all
> anonymous pipes/inodes.
> 
> I cannot think of a good reason to allow this - it looks like an
> oversight that dated back to the original fanotify API.
> 
> Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> As discussed, allowing sb/mount mark on anonymous pipes
> makes no sense and we should not allow it.
> 
> I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> backport to maintained LTS kernels event though this dates back to day one
> with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.

I can add CC to stable. We can also modify the Fixes tag to:

Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code")

to make things a bit more accurate. Not that it would matter much...

> The reason this is an RFC and that I have not included also the
> optimization patch is because we may want to consider banning kernel
> internal inodes from fanotify and/or inotify altogether.

So here I guess you mean to ban also inode marks for them? And by
kernel-internal I suppose you mean on SB_NOUSER superblock?
 
> The tricky point in banning anonymous pipes from inotify, which
> could have existing users (?), but maybe not, so maybe this is
> something that we need to try out.
> 
> I think we can easily get away with banning anonymous pipes from
> fanotify altogeter, but I would not like to get to into a situation
> where new applications will be written to rely on inotify for
> functionaly that fanotify is never going to have.

Yeah, so didn't we try to already disable inotify on some virtual inodes
and didn't it break something? I have a vague feeling we've already tried
that in the past and it didn't quite fly but searching the history didn't
reveal anything so maybe I'm mistaking it with something else.

I guess you are looking for this so that fsnotify code can bail early when
it sees such inodes and thus improve performance?

								Honza

> index 95d7d8790bc3..8240a3fdbef0 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
>  	    path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
>  		return -EINVAL;
>  
> +	/*
> +	 * mount and sb marks are not allowed on kernel internal pseudo fs,
> +	 * like pipe_mnt, because that would subscribe to events on all the
> +	 * anonynous pipes in the system.
> +	 *
> +	 * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> +	 * are not exposed to user's mount namespace, but there are other
> +	 * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> +	 * allowing sb and mount mark is questionable.
> +	 */
> +	if (mark_type != FAN_MARK_INODE &&
> +	    path->mnt->mnt_sb->s_flags & SB_NOUSER)
> +		return -EINVAL;
> +
>  	/*
>  	 * We shouldn't have allowed setting dirent events and the directory
>  	 * flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode,
> -- 
> 2.34.1
>
Amir Goldstein June 29, 2023, 12:20 p.m. UTC | #2
On Thu, Jun 29, 2023 at 1:18 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 29-06-23 07:20:44, Amir Goldstein wrote:
> > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > anonymous pipes/inodes.
> >
> > I cannot think of a good reason to allow this - it looks like an
> > oversight that dated back to the original fanotify API.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > As discussed, allowing sb/mount mark on anonymous pipes
> > makes no sense and we should not allow it.
> >
> > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > backport to maintained LTS kernels event though this dates back to day one
> > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
>
> I can add CC to stable. We can also modify the Fixes tag to:
>
> Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code")
>
> to make things a bit more accurate. Not that it would matter much...
>

Whatever you decide.
I guess that this could wait for 6.6?
but maybe before, because I wouldn't want to additional
fsnotify splice hooks to be added without this, so then
this restriction can be in place by the time vfs maintainers
merge the splice patches.

> > The reason this is an RFC and that I have not included also the
> > optimization patch is because we may want to consider banning kernel
> > internal inodes from fanotify and/or inotify altogether.
>
> So here I guess you mean to ban also inode marks for them? And by
> kernel-internal I suppose you mean on SB_NOUSER superblock?
>

Yes and yes.

> > The tricky point in banning anonymous pipes from inotify, which
> > could have existing users (?), but maybe not, so maybe this is
> > something that we need to try out.
> >
> > I think we can easily get away with banning anonymous pipes from
> > fanotify altogeter, but I would not like to get to into a situation
> > where new applications will be written to rely on inotify for
> > functionaly that fanotify is never going to have.
>
> Yeah, so didn't we try to already disable inotify on some virtual inodes
> and didn't it break something? I have a vague feeling we've already tried
> that in the past and it didn't quite fly but searching the history didn't
> reveal anything so maybe I'm mistaking it with something else.
>

I do have the same memory now that you mention it.
I will try to track it down.

> I guess you are looking for this so that fsnotify code can bail early when
> it sees such inodes and thus improve performance?
>

Not exactly.

Bailing early is easy even if we allow a mark on anonymous inode.
That is what the optimization patch looks like:

 /* Could the inode be watched by inode/mount/sb mark? */
 static inline bool fsnotify_inode_has_watchers(struct inode *inode, __u32 mask)
 {
+       /*
+        * For objects that are not mapped into user accessible path like
+        * anonymous pipes/inodes, we do not need to check for watchers on
+        * parent/mount/sb and the sb watchers optimizations below are
+        * not as effective, so check the inode mask directly.
+        */
+       if (inode->i_sb->s_flags & SB_NOUSER &&
+           !(mask & inode->i_fsnotify_mask))
+               return 0;
+
        if (mask & ALL_FSNOTIFY_PERM_EVENTS)
                return atomic_long_read(&inode->i_sb->s_fsnotify_perm_watchers);

        return atomic_long_read(&inode->i_sb->s_fsnotify_connectors);
}

My question was about: do we need this optimization patch or could
we just ban SB_NOUSER from inotify and fanotify altogether and then
s_fsnotify_connectors will be zero on the pseudo fs anyway.

Thanks,
Amir.
Amir Goldstein June 29, 2023, 12:51 p.m. UTC | #3
On Thu, Jun 29, 2023 at 3:20 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Jun 29, 2023 at 1:18 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 29-06-23 07:20:44, Amir Goldstein wrote:
> > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > anonymous pipes/inodes.
> > >
> > > I cannot think of a good reason to allow this - it looks like an
> > > oversight that dated back to the original fanotify API.
> > >
> > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Jan,
> > >
> > > As discussed, allowing sb/mount mark on anonymous pipes
> > > makes no sense and we should not allow it.
> > >
> > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > backport to maintained LTS kernels event though this dates back to day one
> > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> >
> > I can add CC to stable. We can also modify the Fixes tag to:
> >
> > Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code")
> >
> > to make things a bit more accurate. Not that it would matter much...
> >
>
> Whatever you decide.
> I guess that this could wait for 6.6?
> but maybe before, because I wouldn't want to additional
> fsnotify splice hooks to be added without this, so then
> this restriction can be in place by the time vfs maintainers
> merge the splice patches.
>
> > > The reason this is an RFC and that I have not included also the
> > > optimization patch is because we may want to consider banning kernel
> > > internal inodes from fanotify and/or inotify altogether.
> >
> > So here I guess you mean to ban also inode marks for them? And by
> > kernel-internal I suppose you mean on SB_NOUSER superblock?
> >
>
> Yes and yes.
>
> > > The tricky point in banning anonymous pipes from inotify, which
> > > could have existing users (?), but maybe not, so maybe this is
> > > something that we need to try out.
> > >
> > > I think we can easily get away with banning anonymous pipes from
> > > fanotify altogeter, but I would not like to get to into a situation
> > > where new applications will be written to rely on inotify for
> > > functionaly that fanotify is never going to have.
> >
> > Yeah, so didn't we try to already disable inotify on some virtual inodes
> > and didn't it break something? I have a vague feeling we've already tried
> > that in the past and it didn't quite fly but searching the history didn't
> > reveal anything so maybe I'm mistaking it with something else.
> >
>
> I do have the same memory now that you mention it.
> I will try to track it down.
>

Here it is:
https://lore.kernel.org/linux-fsdevel/20200629130915.GF26507@quack2.suse.cz/

A regression report on Mel's patch:
e9c15badbb7b ("fs: Do not check if there is a fsnotify watcher on
pseudo inodes")

Chromium needs IN_OPEN/IN_CLOSE on anon pipes.
It does not need IN_ACCESS/IN_MODIFY, but the value of eliminating
those was deemed as marginal after the alternative optimizations by Mel:

71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when
there is no watcher")

The reason I would like to ban the "global" watch on all anon inodes
is because it is just wrong and an oversight of sb/mount marks that
needs to be fixed.

The SB_NOUSER optimization is something that we can consider later.
It's not critical, but just a very low hanging fruit to pick.

Based on this finding, I would go with this RFC patch as is.
I will let you decide how to CC stable and about the timing
of sending this to Linus.

Thanks,
Amir.
Jan Kara June 29, 2023, 1:49 p.m. UTC | #4
On Thu 29-06-23 15:51:58, Amir Goldstein wrote:
> On Thu, Jun 29, 2023 at 3:20 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > The tricky point in banning anonymous pipes from inotify, which
> > > > could have existing users (?), but maybe not, so maybe this is
> > > > something that we need to try out.
> > > >
> > > > I think we can easily get away with banning anonymous pipes from
> > > > fanotify altogeter, but I would not like to get to into a situation
> > > > where new applications will be written to rely on inotify for
> > > > functionaly that fanotify is never going to have.
> > >
> > > Yeah, so didn't we try to already disable inotify on some virtual inodes
> > > and didn't it break something? I have a vague feeling we've already tried
> > > that in the past and it didn't quite fly but searching the history didn't
> > > reveal anything so maybe I'm mistaking it with something else.
> > >
> >
> > I do have the same memory now that you mention it.
> > I will try to track it down.
> >
> 
> Here it is:
> https://lore.kernel.org/linux-fsdevel/20200629130915.GF26507@quack2.suse.cz/
> 
> A regression report on Mel's patch:
> e9c15badbb7b ("fs: Do not check if there is a fsnotify watcher on
> pseudo inodes")
> 
> Chromium needs IN_OPEN/IN_CLOSE on anon pipes.
> It does not need IN_ACCESS/IN_MODIFY, but the value of eliminating
> those was deemed as marginal after the alternative optimizations by Mel:
> 
> 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when
> there is no watcher")

Ah, yes. Thanks for the history digging! My grep-foo over the changelogs
was not good enough to find it :)

> The reason I would like to ban the "global" watch on all anon inodes
> is because it is just wrong and an oversight of sb/mount marks that
> needs to be fixed.
> 
> The SB_NOUSER optimization is something that we can consider later.
> It's not critical, but just a very low hanging fruit to pick.
> 
> Based on this finding, I would go with this RFC patch as is.
> I will let you decide how to CC stable and about the timing
> of sending this to Linus.

Yes, let's go with the patch as is. Currently I have pull request pending
with Linus so I won't merge the patch yet but I plan on merging it early
next week and then sending it to Linus towards the end of the next week.

								Honza
Christian Brauner June 30, 2023, 7:29 a.m. UTC | #5
On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> Hopefully, nobody is trying to abuse mount/sb marks for watching all
> anonymous pipes/inodes.
> 
> I cannot think of a good reason to allow this - it looks like an
> oversight that dated back to the original fanotify API.
> 
> Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> As discussed, allowing sb/mount mark on anonymous pipes
> makes no sense and we should not allow it.
> 
> I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> backport to maintained LTS kernels event though this dates back to day one
> with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> 
> The reason this is an RFC and that I have not included also the
> optimization patch is because we may want to consider banning kernel
> internal inodes from fanotify and/or inotify altogether.
> 
> The tricky point in banning anonymous pipes from inotify, which
> could have existing users (?), but maybe not, so maybe this is
> something that we need to try out.
> 
> I think we can easily get away with banning anonymous pipes from
> fanotify altogeter, but I would not like to get to into a situation
> where new applications will be written to rely on inotify for
> functionaly that fanotify is never going to have.
> 
> Thoughts?
> Am I over thinking this?
> 
> Amir.
> 
>  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 95d7d8790bc3..8240a3fdbef0 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
>  	    path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
>  		return -EINVAL;
>  
> +	/*
> +	 * mount and sb marks are not allowed on kernel internal pseudo fs,
> +	 * like pipe_mnt, because that would subscribe to events on all the
> +	 * anonynous pipes in the system.

s/anonynous/anonymous/

> +	 *
> +	 * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> +	 * are not exposed to user's mount namespace, but there are other
> +	 * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> +	 * allowing sb and mount mark is questionable.
> +	 */
> +	if (mark_type != FAN_MARK_INODE &&
> +	    path->mnt->mnt_sb->s_flags & SB_NOUSER)
> +		return -EINVAL;

It's a good starting point.

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>
Amir Goldstein July 1, 2023, 4:25 p.m. UTC | #6
On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > anonymous pipes/inodes.
> >
> > I cannot think of a good reason to allow this - it looks like an
> > oversight that dated back to the original fanotify API.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > As discussed, allowing sb/mount mark on anonymous pipes
> > makes no sense and we should not allow it.
> >
> > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > backport to maintained LTS kernels event though this dates back to day one
> > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> >
> > The reason this is an RFC and that I have not included also the
> > optimization patch is because we may want to consider banning kernel
> > internal inodes from fanotify and/or inotify altogether.
> >
> > The tricky point in banning anonymous pipes from inotify, which
> > could have existing users (?), but maybe not, so maybe this is
> > something that we need to try out.
> >
> > I think we can easily get away with banning anonymous pipes from
> > fanotify altogeter, but I would not like to get to into a situation
> > where new applications will be written to rely on inotify for
> > functionaly that fanotify is never going to have.
> >
> > Thoughts?
> > Am I over thinking this?
> >
> > Amir.
> >
> >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 95d7d8790bc3..8240a3fdbef0 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> >               return -EINVAL;
> >
> > +     /*
> > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > +      * like pipe_mnt, because that would subscribe to events on all the
> > +      * anonynous pipes in the system.
>
> s/anonynous/anonymous/
>
> > +      *
> > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > +      * are not exposed to user's mount namespace, but there are other
> > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > +      * allowing sb and mount mark is questionable.
> > +      */
> > +     if (mark_type != FAN_MARK_INODE &&
> > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > +             return -EINVAL;
>

On second thought, I am not sure about  the EINVAL error code here.
I used the same error code that Jan used for permission events on
proc fs, but the problem is that applications do not have a decent way
to differentiate between
"sb mark not supported by kernel" (i.e. < v4.20) vs.
"sb mark not supported by fs" (the case above)

same for permission events:
"kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
"permission events not supported by fs" (procfs)

I have looked for other syscalls that react to SB_NOUSER and I've
found that mount also returns EINVAL.

So far, fanotify_mark() and fanotify_init() mostly return EINVAL
for invalid flag combinations (also across the two syscalls),
but not because of the type of object being marked, except for
the special case of procfs and permission events.

mount(2) syscall OTOH, has many documented EINVAL cases
due to the type of source object (e.g. propagation type shared).

I know there is no standard and EINVAL can mean many
different things in syscalls, but I thought that maybe EACCES
would convey more accurately the message:
"The sb/mount of this fs is not accessible for placing a mark".

WDYT? worth changing?
worth changing procfs also?
We don't have that EINVAL for procfs documented in man page btw.

Thanks,
Amir.
Christian Brauner July 3, 2023, 8:27 a.m. UTC | #7
On Sat, Jul 01, 2023 at 07:25:14PM +0300, Amir Goldstein wrote:
> On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > anonymous pipes/inodes.
> > >
> > > I cannot think of a good reason to allow this - it looks like an
> > > oversight that dated back to the original fanotify API.
> > >
> > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Jan,
> > >
> > > As discussed, allowing sb/mount mark on anonymous pipes
> > > makes no sense and we should not allow it.
> > >
> > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > backport to maintained LTS kernels event though this dates back to day one
> > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > >
> > > The reason this is an RFC and that I have not included also the
> > > optimization patch is because we may want to consider banning kernel
> > > internal inodes from fanotify and/or inotify altogether.
> > >
> > > The tricky point in banning anonymous pipes from inotify, which
> > > could have existing users (?), but maybe not, so maybe this is
> > > something that we need to try out.
> > >
> > > I think we can easily get away with banning anonymous pipes from
> > > fanotify altogeter, but I would not like to get to into a situation
> > > where new applications will be written to rely on inotify for
> > > functionaly that fanotify is never going to have.
> > >
> > > Thoughts?
> > > Am I over thinking this?
> > >
> > > Amir.
> > >
> > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > >               return -EINVAL;
> > >
> > > +     /*
> > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > +      * anonynous pipes in the system.
> >
> > s/anonynous/anonymous/
> >
> > > +      *
> > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > +      * are not exposed to user's mount namespace, but there are other
> > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > +      * allowing sb and mount mark is questionable.
> > > +      */
> > > +     if (mark_type != FAN_MARK_INODE &&
> > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > +             return -EINVAL;
> >
> 
> On second thought, I am not sure about  the EINVAL error code here.
> I used the same error code that Jan used for permission events on
> proc fs, but the problem is that applications do not have a decent way
> to differentiate between
> "sb mark not supported by kernel" (i.e. < v4.20) vs.
> "sb mark not supported by fs" (the case above)
> 
> same for permission events:
> "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> "permission events not supported by fs" (procfs)
> 
> I have looked for other syscalls that react to SB_NOUSER and I've
> found that mount also returns EINVAL.
> 
> So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> for invalid flag combinations (also across the two syscalls),
> but not because of the type of object being marked, except for
> the special case of procfs and permission events.
> 
> mount(2) syscall OTOH, has many documented EINVAL cases
> due to the type of source object (e.g. propagation type shared).

Many is an understatement. There's so many EINVALs.

> 
> I know there is no standard and EINVAL can mean many
> different things in syscalls, but I thought that maybe EACCES
> would convey more accurately the message:
> "The sb/mount of this fs is not accessible for placing a mark".
> 
> WDYT? worth changing?

I think it's not crazy to use EACCES to let users figure out that the fs
isn't supported. It really depends on how useful that is.

> worth changing procfs also?

Not sure if people would be confused if they got EACCES suddenly but
then again, they could've never used it.

> We don't have that EINVAL for procfs documented in man page btw.
Jan Kara July 3, 2023, 11:25 a.m. UTC | #8
On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > anonymous pipes/inodes.
> > >
> > > I cannot think of a good reason to allow this - it looks like an
> > > oversight that dated back to the original fanotify API.
> > >
> > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Jan,
> > >
> > > As discussed, allowing sb/mount mark on anonymous pipes
> > > makes no sense and we should not allow it.
> > >
> > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > backport to maintained LTS kernels event though this dates back to day one
> > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > >
> > > The reason this is an RFC and that I have not included also the
> > > optimization patch is because we may want to consider banning kernel
> > > internal inodes from fanotify and/or inotify altogether.
> > >
> > > The tricky point in banning anonymous pipes from inotify, which
> > > could have existing users (?), but maybe not, so maybe this is
> > > something that we need to try out.
> > >
> > > I think we can easily get away with banning anonymous pipes from
> > > fanotify altogeter, but I would not like to get to into a situation
> > > where new applications will be written to rely on inotify for
> > > functionaly that fanotify is never going to have.
> > >
> > > Thoughts?
> > > Am I over thinking this?
> > >
> > > Amir.
> > >
> > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > >               return -EINVAL;
> > >
> > > +     /*
> > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > +      * anonynous pipes in the system.
> >
> > s/anonynous/anonymous/
> >
> > > +      *
> > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > +      * are not exposed to user's mount namespace, but there are other
> > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > +      * allowing sb and mount mark is questionable.
> > > +      */
> > > +     if (mark_type != FAN_MARK_INODE &&
> > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > +             return -EINVAL;
> >
> 
> On second thought, I am not sure about  the EINVAL error code here.
> I used the same error code that Jan used for permission events on
> proc fs, but the problem is that applications do not have a decent way
> to differentiate between
> "sb mark not supported by kernel" (i.e. < v4.20) vs.
> "sb mark not supported by fs" (the case above)
> 
> same for permission events:
> "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> "permission events not supported by fs" (procfs)
> 
> I have looked for other syscalls that react to SB_NOUSER and I've
> found that mount also returns EINVAL.

We tend to return EINVAL both for invalid (combination of) flags as well as
for flags applied to invalid objects in various calls. In practice there is
rarely a difference.

> So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> for invalid flag combinations (also across the two syscalls),
> but not because of the type of object being marked, except for
> the special case of procfs and permission events.
> 
> mount(2) syscall OTOH, has many documented EINVAL cases
> due to the type of source object (e.g. propagation type shared).
> 
> I know there is no standard and EINVAL can mean many
> different things in syscalls, but I thought that maybe EACCES
> would convey more accurately the message:
> "The sb/mount of this fs is not accessible for placing a mark".
> 
> WDYT? worth changing?
> worth changing procfs also?
> We don't have that EINVAL for procfs documented in man page btw.

Well, EACCES translates to message "Permission denied" which as Christian
writes is justifiable but frankly I find it more confusing. Because when I
get "Permission denied", I go looking which permissions are wrong, perhaps
suspecting SELinux or other LSM and don't think that object type / location
is at fault.

I agree that with EINVAL it is impossible to distinguish "unsupported on
this object only" vs "completely unknown flag" but it doesn't seem like a
huge problem for userspace to me as I can think of workarounds even if
userspace wants to do something else than "report error and bail".

								Honza
Christian Brauner July 4, 2023, 9:58 a.m. UTC | #9
On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > anonymous pipes/inodes.
> > > >
> > > > I cannot think of a good reason to allow this - it looks like an
> > > > oversight that dated back to the original fanotify API.
> > > >
> > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >
> > > > Jan,
> > > >
> > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > makes no sense and we should not allow it.
> > > >
> > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > backport to maintained LTS kernels event though this dates back to day one
> > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > >
> > > > The reason this is an RFC and that I have not included also the
> > > > optimization patch is because we may want to consider banning kernel
> > > > internal inodes from fanotify and/or inotify altogether.
> > > >
> > > > The tricky point in banning anonymous pipes from inotify, which
> > > > could have existing users (?), but maybe not, so maybe this is
> > > > something that we need to try out.
> > > >
> > > > I think we can easily get away with banning anonymous pipes from
> > > > fanotify altogeter, but I would not like to get to into a situation
> > > > where new applications will be written to rely on inotify for
> > > > functionaly that fanotify is never going to have.
> > > >
> > > > Thoughts?
> > > > Am I over thinking this?
> > > >
> > > > Amir.
> > > >
> > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > >               return -EINVAL;
> > > >
> > > > +     /*
> > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > +      * anonynous pipes in the system.
> > >
> > > s/anonynous/anonymous/
> > >
> > > > +      *
> > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > +      * are not exposed to user's mount namespace, but there are other
> > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > +      * allowing sb and mount mark is questionable.
> > > > +      */
> > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > +             return -EINVAL;
> > >
> > 
> > On second thought, I am not sure about  the EINVAL error code here.
> > I used the same error code that Jan used for permission events on
> > proc fs, but the problem is that applications do not have a decent way
> > to differentiate between
> > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > "sb mark not supported by fs" (the case above)
> > 
> > same for permission events:
> > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > "permission events not supported by fs" (procfs)
> > 
> > I have looked for other syscalls that react to SB_NOUSER and I've
> > found that mount also returns EINVAL.
> 
> We tend to return EINVAL both for invalid (combination of) flags as well as
> for flags applied to invalid objects in various calls. In practice there is
> rarely a difference.
> 
> > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > for invalid flag combinations (also across the two syscalls),
> > but not because of the type of object being marked, except for
> > the special case of procfs and permission events.
> > 
> > mount(2) syscall OTOH, has many documented EINVAL cases
> > due to the type of source object (e.g. propagation type shared).
> > 
> > I know there is no standard and EINVAL can mean many
> > different things in syscalls, but I thought that maybe EACCES
> > would convey more accurately the message:
> > "The sb/mount of this fs is not accessible for placing a mark".
> > 
> > WDYT? worth changing?
> > worth changing procfs also?
> > We don't have that EINVAL for procfs documented in man page btw.
> 
> Well, EACCES translates to message "Permission denied" which as Christian
> writes is justifiable but frankly I find it more confusing. Because when I
> get "Permission denied", I go looking which permissions are wrong, perhaps
> suspecting SELinux or other LSM and don't think that object type / location
> is at fault.
> 
> I agree that with EINVAL it is impossible to distinguish "unsupported on
> this object only" vs "completely unknown flag" but it doesn't seem like a
> huge problem for userspace to me as I can think of workarounds even if
> userspace wants to do something else than "report error and bail".

Userspace is pretty used to the flood of EINVAL from the vfs apis so
they often have good workarounds. It doesn't mean it's something we
should just discount ofc. I think having ways to surface more
descriptive errors would overall be a good thing.
Jan Kara July 4, 2023, 11:18 a.m. UTC | #10
On Tue 04-07-23 11:58:07, Christian Brauner wrote:
> On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> > On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > > anonymous pipes/inodes.
> > > > >
> > > > > I cannot think of a good reason to allow this - it looks like an
> > > > > oversight that dated back to the original fanotify API.
> > > > >
> > > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >
> > > > > Jan,
> > > > >
> > > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > > makes no sense and we should not allow it.
> > > > >
> > > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > > backport to maintained LTS kernels event though this dates back to day one
> > > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > > >
> > > > > The reason this is an RFC and that I have not included also the
> > > > > optimization patch is because we may want to consider banning kernel
> > > > > internal inodes from fanotify and/or inotify altogether.
> > > > >
> > > > > The tricky point in banning anonymous pipes from inotify, which
> > > > > could have existing users (?), but maybe not, so maybe this is
> > > > > something that we need to try out.
> > > > >
> > > > > I think we can easily get away with banning anonymous pipes from
> > > > > fanotify altogeter, but I would not like to get to into a situation
> > > > > where new applications will be written to rely on inotify for
> > > > > functionaly that fanotify is never going to have.
> > > > >
> > > > > Thoughts?
> > > > > Am I over thinking this?
> > > > >
> > > > > Amir.
> > > > >
> > > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > > >               return -EINVAL;
> > > > >
> > > > > +     /*
> > > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > > +      * anonynous pipes in the system.
> > > >
> > > > s/anonynous/anonymous/
> > > >
> > > > > +      *
> > > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > > +      * are not exposed to user's mount namespace, but there are other
> > > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > > +      * allowing sb and mount mark is questionable.
> > > > > +      */
> > > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > > +             return -EINVAL;
> > > >
> > > 
> > > On second thought, I am not sure about  the EINVAL error code here.
> > > I used the same error code that Jan used for permission events on
> > > proc fs, but the problem is that applications do not have a decent way
> > > to differentiate between
> > > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > > "sb mark not supported by fs" (the case above)
> > > 
> > > same for permission events:
> > > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > > "permission events not supported by fs" (procfs)
> > > 
> > > I have looked for other syscalls that react to SB_NOUSER and I've
> > > found that mount also returns EINVAL.
> > 
> > We tend to return EINVAL both for invalid (combination of) flags as well as
> > for flags applied to invalid objects in various calls. In practice there is
> > rarely a difference.
> > 
> > > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > > for invalid flag combinations (also across the two syscalls),
> > > but not because of the type of object being marked, except for
> > > the special case of procfs and permission events.
> > > 
> > > mount(2) syscall OTOH, has many documented EINVAL cases
> > > due to the type of source object (e.g. propagation type shared).
> > > 
> > > I know there is no standard and EINVAL can mean many
> > > different things in syscalls, but I thought that maybe EACCES
> > > would convey more accurately the message:
> > > "The sb/mount of this fs is not accessible for placing a mark".
> > > 
> > > WDYT? worth changing?
> > > worth changing procfs also?
> > > We don't have that EINVAL for procfs documented in man page btw.
> > 
> > Well, EACCES translates to message "Permission denied" which as Christian
> > writes is justifiable but frankly I find it more confusing. Because when I
> > get "Permission denied", I go looking which permissions are wrong, perhaps
> > suspecting SELinux or other LSM and don't think that object type / location
> > is at fault.
> > 
> > I agree that with EINVAL it is impossible to distinguish "unsupported on
> > this object only" vs "completely unknown flag" but it doesn't seem like a
> > huge problem for userspace to me as I can think of workarounds even if
> > userspace wants to do something else than "report error and bail".
> 
> Userspace is pretty used to the flood of EINVAL from the vfs apis so
> they often have good workarounds. It doesn't mean it's something we
> should just discount ofc. I think having ways to surface more
> descriptive errors would overall be a good thing.

Oh, I absolutely agree with that. I'm just not sure whether returning
EACCES in this particular case is going to cause more or less confusion.

								Honza
Christian Brauner July 4, 2023, 12:47 p.m. UTC | #11
On Tue, Jul 04, 2023 at 01:18:33PM +0200, Jan Kara wrote:
> On Tue 04-07-23 11:58:07, Christian Brauner wrote:
> > On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> > > On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > > > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > > > anonymous pipes/inodes.
> > > > > >
> > > > > > I cannot think of a good reason to allow this - it looks like an
> > > > > > oversight that dated back to the original fanotify API.
> > > > > >
> > > > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > > ---
> > > > > >
> > > > > > Jan,
> > > > > >
> > > > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > > > makes no sense and we should not allow it.
> > > > > >
> > > > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > > > backport to maintained LTS kernels event though this dates back to day one
> > > > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > > > >
> > > > > > The reason this is an RFC and that I have not included also the
> > > > > > optimization patch is because we may want to consider banning kernel
> > > > > > internal inodes from fanotify and/or inotify altogether.
> > > > > >
> > > > > > The tricky point in banning anonymous pipes from inotify, which
> > > > > > could have existing users (?), but maybe not, so maybe this is
> > > > > > something that we need to try out.
> > > > > >
> > > > > > I think we can easily get away with banning anonymous pipes from
> > > > > > fanotify altogeter, but I would not like to get to into a situation
> > > > > > where new applications will be written to rely on inotify for
> > > > > > functionaly that fanotify is never going to have.
> > > > > >
> > > > > > Thoughts?
> > > > > > Am I over thinking this?
> > > > > >
> > > > > > Amir.
> > > > > >
> > > > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > > > >  1 file changed, 14 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > > > >               return -EINVAL;
> > > > > >
> > > > > > +     /*
> > > > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > > > +      * anonynous pipes in the system.
> > > > >
> > > > > s/anonynous/anonymous/
> > > > >
> > > > > > +      *
> > > > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > > > +      * are not exposed to user's mount namespace, but there are other
> > > > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > > > +      * allowing sb and mount mark is questionable.
> > > > > > +      */
> > > > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > > > +             return -EINVAL;
> > > > >
> > > > 
> > > > On second thought, I am not sure about  the EINVAL error code here.
> > > > I used the same error code that Jan used for permission events on
> > > > proc fs, but the problem is that applications do not have a decent way
> > > > to differentiate between
> > > > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > > > "sb mark not supported by fs" (the case above)
> > > > 
> > > > same for permission events:
> > > > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > > > "permission events not supported by fs" (procfs)
> > > > 
> > > > I have looked for other syscalls that react to SB_NOUSER and I've
> > > > found that mount also returns EINVAL.
> > > 
> > > We tend to return EINVAL both for invalid (combination of) flags as well as
> > > for flags applied to invalid objects in various calls. In practice there is
> > > rarely a difference.
> > > 
> > > > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > > > for invalid flag combinations (also across the two syscalls),
> > > > but not because of the type of object being marked, except for
> > > > the special case of procfs and permission events.
> > > > 
> > > > mount(2) syscall OTOH, has many documented EINVAL cases
> > > > due to the type of source object (e.g. propagation type shared).
> > > > 
> > > > I know there is no standard and EINVAL can mean many
> > > > different things in syscalls, but I thought that maybe EACCES
> > > > would convey more accurately the message:
> > > > "The sb/mount of this fs is not accessible for placing a mark".
> > > > 
> > > > WDYT? worth changing?
> > > > worth changing procfs also?
> > > > We don't have that EINVAL for procfs documented in man page btw.
> > > 
> > > Well, EACCES translates to message "Permission denied" which as Christian
> > > writes is justifiable but frankly I find it more confusing. Because when I
> > > get "Permission denied", I go looking which permissions are wrong, perhaps
> > > suspecting SELinux or other LSM and don't think that object type / location
> > > is at fault.
> > > 
> > > I agree that with EINVAL it is impossible to distinguish "unsupported on
> > > this object only" vs "completely unknown flag" but it doesn't seem like a
> > > huge problem for userspace to me as I can think of workarounds even if
> > > userspace wants to do something else than "report error and bail".
> > 
> > Userspace is pretty used to the flood of EINVAL from the vfs apis so
> > they often have good workarounds. It doesn't mean it's something we
> > should just discount ofc. I think having ways to surface more
> > descriptive errors would overall be a good thing.
> 
> Oh, I absolutely agree with that. I'm just not sure whether returning
> EACCES in this particular case is going to cause more or less confusion.

Yeah, I really meant it as a generic comment that also applies to my
own comment about EINVAL being ubiquitous; so not directed at your
comment about EACCES specifically.
Amir Goldstein July 4, 2023, 1:19 p.m. UTC | #12
On Tue, Jul 4, 2023 at 3:47 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jul 04, 2023 at 01:18:33PM +0200, Jan Kara wrote:
> > On Tue 04-07-23 11:58:07, Christian Brauner wrote:
> > > On Mon, Jul 03, 2023 at 01:25:51PM +0200, Jan Kara wrote:
> > > > On Sat 01-07-23 19:25:14, Amir Goldstein wrote:
> > > > > On Fri, Jun 30, 2023 at 10:29 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 29, 2023 at 07:20:44AM +0300, Amir Goldstein wrote:
> > > > > > > Hopefully, nobody is trying to abuse mount/sb marks for watching all
> > > > > > > anonymous pipes/inodes.
> > > > > > >
> > > > > > > I cannot think of a good reason to allow this - it looks like an
> > > > > > > oversight that dated back to the original fanotify API.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/
> > > > > > > Fixes: d54f4fba889b ("fanotify: add API to attach/detach super block mark")
> > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > > > ---
> > > > > > >
> > > > > > > Jan,
> > > > > > >
> > > > > > > As discussed, allowing sb/mount mark on anonymous pipes
> > > > > > > makes no sense and we should not allow it.
> > > > > > >
> > > > > > > I've noted FAN_MARK_FILESYSTEM as the Fixes commit as a trigger to
> > > > > > > backport to maintained LTS kernels event though this dates back to day one
> > > > > > > with FAN_MARK_MOUNT. Not sure if we should keep the Fixes tag or not.
> > > > > > >
> > > > > > > The reason this is an RFC and that I have not included also the
> > > > > > > optimization patch is because we may want to consider banning kernel
> > > > > > > internal inodes from fanotify and/or inotify altogether.
> > > > > > >
> > > > > > > The tricky point in banning anonymous pipes from inotify, which
> > > > > > > could have existing users (?), but maybe not, so maybe this is
> > > > > > > something that we need to try out.
> > > > > > >
> > > > > > > I think we can easily get away with banning anonymous pipes from
> > > > > > > fanotify altogeter, but I would not like to get to into a situation
> > > > > > > where new applications will be written to rely on inotify for
> > > > > > > functionaly that fanotify is never going to have.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > > Am I over thinking this?
> > > > > > >
> > > > > > > Amir.
> > > > > > >
> > > > > > >  fs/notify/fanotify/fanotify_user.c | 14 ++++++++++++++
> > > > > > >  1 file changed, 14 insertions(+)
> > > > > > >
> > > > > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > > > > index 95d7d8790bc3..8240a3fdbef0 100644
> > > > > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > > > > @@ -1622,6 +1622,20 @@ static int fanotify_events_supported(struct fsnotify_group *group,
> > > > > > >           path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
> > > > > > >               return -EINVAL;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * mount and sb marks are not allowed on kernel internal pseudo fs,
> > > > > > > +      * like pipe_mnt, because that would subscribe to events on all the
> > > > > > > +      * anonynous pipes in the system.
> > > > > >
> > > > > > s/anonynous/anonymous/
> > > > > >
> > > > > > > +      *
> > > > > > > +      * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
> > > > > > > +      * are not exposed to user's mount namespace, but there are other
> > > > > > > +      * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
> > > > > > > +      * allowing sb and mount mark is questionable.
> > > > > > > +      */
> > > > > > > +     if (mark_type != FAN_MARK_INODE &&
> > > > > > > +         path->mnt->mnt_sb->s_flags & SB_NOUSER)
> > > > > > > +             return -EINVAL;
> > > > > >
> > > > >
> > > > > On second thought, I am not sure about  the EINVAL error code here.
> > > > > I used the same error code that Jan used for permission events on
> > > > > proc fs, but the problem is that applications do not have a decent way
> > > > > to differentiate between
> > > > > "sb mark not supported by kernel" (i.e. < v4.20) vs.
> > > > > "sb mark not supported by fs" (the case above)
> > > > >
> > > > > same for permission events:
> > > > > "kernel compiled without FANOTIFY_ACCESS_PERMISSIONS" vs.
> > > > > "permission events not supported by fs" (procfs)
> > > > >
> > > > > I have looked for other syscalls that react to SB_NOUSER and I've
> > > > > found that mount also returns EINVAL.
> > > >
> > > > We tend to return EINVAL both for invalid (combination of) flags as well as
> > > > for flags applied to invalid objects in various calls. In practice there is
> > > > rarely a difference.
> > > >
> > > > > So far, fanotify_mark() and fanotify_init() mostly return EINVAL
> > > > > for invalid flag combinations (also across the two syscalls),
> > > > > but not because of the type of object being marked, except for
> > > > > the special case of procfs and permission events.
> > > > >
> > > > > mount(2) syscall OTOH, has many documented EINVAL cases
> > > > > due to the type of source object (e.g. propagation type shared).
> > > > >
> > > > > I know there is no standard and EINVAL can mean many
> > > > > different things in syscalls, but I thought that maybe EACCES
> > > > > would convey more accurately the message:
> > > > > "The sb/mount of this fs is not accessible for placing a mark".
> > > > >
> > > > > WDYT? worth changing?
> > > > > worth changing procfs also?
> > > > > We don't have that EINVAL for procfs documented in man page btw.
> > > >
> > > > Well, EACCES translates to message "Permission denied" which as Christian
> > > > writes is justifiable but frankly I find it more confusing. Because when I
> > > > get "Permission denied", I go looking which permissions are wrong, perhaps
> > > > suspecting SELinux or other LSM and don't think that object type / location
> > > > is at fault.
> > > >
> > > > I agree that with EINVAL it is impossible to distinguish "unsupported on
> > > > this object only" vs "completely unknown flag" but it doesn't seem like a
> > > > huge problem for userspace to me as I can think of workarounds even if
> > > > userspace wants to do something else than "report error and bail".
> > >
> > > Userspace is pretty used to the flood of EINVAL from the vfs apis so
> > > they often have good workarounds. It doesn't mean it's something we
> > > should just discount ofc. I think having ways to surface more
> > > descriptive errors would overall be a good thing.
> >
> > Oh, I absolutely agree with that. I'm just not sure whether returning
> > EACCES in this particular case is going to cause more or less confusion.
>
> Yeah, I really meant it as a generic comment that also applies to my
> own comment about EINVAL being ubiquitous; so not directed at your
> comment about EACCES specifically.

No worries. Let it stay EINVAL.
Sorry for the noise :)

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 95d7d8790bc3..8240a3fdbef0 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1622,6 +1622,20 @@  static int fanotify_events_supported(struct fsnotify_group *group,
 	    path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
 		return -EINVAL;
 
+	/*
+	 * mount and sb marks are not allowed on kernel internal pseudo fs,
+	 * like pipe_mnt, because that would subscribe to events on all the
+	 * anonynous pipes in the system.
+	 *
+	 * XXX: SB_NOUSER covers all of the internal pseudo fs whose objects
+	 * are not exposed to user's mount namespace, but there are other
+	 * SB_KERNMOUNT fs, like nsfs, debugfs, for which the value of
+	 * allowing sb and mount mark is questionable.
+	 */
+	if (mark_type != FAN_MARK_INODE &&
+	    path->mnt->mnt_sb->s_flags & SB_NOUSER)
+		return -EINVAL;
+
 	/*
 	 * We shouldn't have allowed setting dirent events and the directory
 	 * flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode,