fsnotify: do not generate duplicate events for "fake" path
diff mbox series

Message ID 20190424100951.16678-1-amir73il@gmail.com
State New
Headers show
Series
  • fsnotify: do not generate duplicate events for "fake" path
Related show

Commit Message

Amir Goldstein April 24, 2019, 10:09 a.m. UTC
Overlayfs "fake" path is used for stacked file operations on
underlying files.  Operations on files with "fake" path must not
generate events on mount marks and on parent watches, because
those events have already been generated at overlayfs layer.

The reported event->fd for inode/sb marks will have the wrong path
(overlayfs path), but we have no choice but to report them anyway.

Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

This is a slightly simplified and cleaner version of the patch I posted
yesterday on the linked bug report thread.

I have posted an additional overlayfs test case to fanotify06 on LTP
list.

Thanks,
Amir.

 fs/notify/fsnotify.c     |  3 ++-
 include/linux/fsnotify.h | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

Comments

Jan Kara April 24, 2019, 3:57 p.m. UTC | #1
Hi Amir!

On Wed 24-04-19 13:09:51, Amir Goldstein wrote:
> Overlayfs "fake" path is used for stacked file operations on
> underlying files.  Operations on files with "fake" path must not
> generate events on mount marks and on parent watches, because
> those events have already been generated at overlayfs layer.
> 
> The reported event->fd for inode/sb marks will have the wrong path
> (overlayfs path), but we have no choice but to report them anyway.
> 
> Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/
> Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
> Fixes: d1d04ef8572b ("ovl: stack file ops")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

So honestly I don't quite like that fsnotify core has to care about
peculiarities of overlayfs. And I'm not sure I fully understand what the
problem actually is so let me try to summarize here:

With write to overlayfs, we generate event both for inode on underlying
filesystem (upper inode; this gets generated by
vfs_write()->ovl_write_iter()->do_iter_write()) and also for "virtual"
overlayfs inode (generated directly by vfs_write()). Now for overlayfs
inode the (mnt, dentry) pair is a valid one - the one used for opening the
file. For upper inode you say we don't use proper path (mnt, dentry) pair
but some made-up one - looks like the original overlayfs path if I read the
code right. So fsnotify will get two fsnotify_modify() calls, both with the
same file->f_path but with different file->f_inode. So no surprise mntpoint
/ path based stuff is getting confused by this.

I guess the first question that comes to my mind is: Is fsnotify on 'upper'
inodes actually useful? On overlayfs as a whole it makes sense but on
individual filesystems I'm not so sure. And here we can see that mountpoint
watches are going to have hard times, some events (e.g. open / close) do
not seem to be generated at all, not sure if directory events are
generated...

								Honza

> ---
> 
> Jan,
> 
> This is a slightly simplified and cleaner version of the patch I posted
> yesterday on the linked bug report thread.
> 
> I have posted an additional overlayfs test case to fanotify06 on LTP
> list.
> 
> Thanks,
> Amir.
> 
>  fs/notify/fsnotify.c     |  3 ++-
>  include/linux/fsnotify.h | 23 ++++++++++++++++++++---
>  2 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index df06f3da166c..6f752b13e3fd 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -334,7 +334,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
>  	int ret = 0;
>  	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
>  
> -	if (data_is == FSNOTIFY_EVENT_PATH) {
> +	if (data_is == FSNOTIFY_EVENT_PATH &&
> +	    !fsnotify_is_fake_path(to_tell, data)) {
>  		mnt = real_mount(((const struct path *)data)->mnt);
>  		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
>  	}
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 09587e2860b5..c51f57d3a025 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -40,6 +40,20 @@ static inline int fsnotify_parent(const struct path *path,
>  	return __fsnotify_parent(path, dentry, mask);
>  }
>  
> +/*
> + * Overlayfs "fake" path is used for stacked file operations on underlying
> + * files.  file->f_path is from overlayfs layer and file->f_inode is from
> + * underlying layer.  We must not generate events on mount and on parent
> + * based on fake path, because those events have already been generated at
> + * overlayfs layer.  The reported event->fd for inode/sb marks will have the
> + * wrong path (overlayfs path), but we have no choice but to report them as is.
> + */
> +static inline bool fsnotify_is_fake_path(struct inode *inode,
> +					 const struct path *path)
> +{
> +	return unlikely(inode->i_sb != path->dentry->d_sb);
> +}
> +
>  /*
>   * Simple wrapper to consolidate calls fsnotify_parent()/fsnotify() when
>   * an event is on a path.
> @@ -47,10 +61,13 @@ static inline int fsnotify_parent(const struct path *path,
>  static inline int fsnotify_path(struct inode *inode, const struct path *path,
>  				__u32 mask)
>  {
> -	int ret = fsnotify_parent(path, NULL, mask);
> +	if (!fsnotify_is_fake_path(inode, path)) {
> +		int ret = fsnotify_parent(path, NULL, mask);
> +
> +		if (ret)
> +			return ret;
> +	}
>  
> -	if (ret)
> -		return ret;
>  	return fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
>  }
>  
> -- 
> 2.17.1
>
Amir Goldstein April 24, 2019, 4:17 p.m. UTC | #2
On Wed, Apr 24, 2019 at 6:57 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
> On Wed 24-04-19 13:09:51, Amir Goldstein wrote:
> > Overlayfs "fake" path is used for stacked file operations on
> > underlying files.  Operations on files with "fake" path must not
> > generate events on mount marks and on parent watches, because
> > those events have already been generated at overlayfs layer.
> >
> > The reported event->fd for inode/sb marks will have the wrong path
> > (overlayfs path), but we have no choice but to report them anyway.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/
> > Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
> > Fixes: d1d04ef8572b ("ovl: stack file ops")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> So honestly I don't quite like that fsnotify core has to care about
> peculiarities of overlayfs. And I'm not sure I fully understand what the
> problem actually is so let me try to summarize here:
>
> With write to overlayfs, we generate event both for inode on underlying
> filesystem (upper inode; this gets generated by
> vfs_write()->ovl_write_iter()->do_iter_write()) and also for "virtual"
> overlayfs inode (generated directly by vfs_write()). Now for overlayfs
> inode the (mnt, dentry) pair is a valid one - the one used for opening the
> file. For upper inode you say we don't use proper path (mnt, dentry) pair
> but some made-up one - looks like the original overlayfs path if I read the
> code right. So fsnotify will get two fsnotify_modify() calls, both with the
> same file->f_path but with different file->f_inode. So no surprise mntpoint
> / path based stuff is getting confused by this.
>

That is correct.

> I guess the first question that comes to my mind is: Is fsnotify on 'upper'
> inodes actually useful? On overlayfs as a whole it makes sense but on
> individual filesystems I'm not so sure.

Not sure. I think the use case is FAN_MARK_FILESYSTEM watch from
host by Anti-Virus software to protect host from containers.
Yes, host can watch all container overlayfs mounts, but that is less scalable.

> And here we can see that mountpoint
> watches are going to have hard times, some events (e.g. open / close) do
> not seem to be generated at all, not sure if directory events are
> generated...
>

So it seems that before 573e1784817c Revert "fsnotify: support overlayfs"
no fanotify events were generated on underlying regular inodes.
Overlayfs uses private cloned mounts to access real inodes, so marking
underlying mounts was never going to produce any events.
Overlayfs does open underlying directory files, so events on underlying
directories would be generated (I think) even before v4.19.

inotify and new fanotify directory modification events would also be
generated from overlayfs, but we do not have a "fake" path issue with
those.

The bottom line is that it is probably best not to report any events
originating from overlayfs file with "fake" path and the simplest way to
achieve that would be to open the overlayfs realfile with
FMODE_NONOTIFY. At first I though doing that may cause regressions
from pre v4.19, but I was wrong. It would be the same behavior as pre v4.19.

I'll go write the patch...

Thanks,
Amir.

Patch
diff mbox series

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index df06f3da166c..6f752b13e3fd 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -334,7 +334,8 @@  int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	int ret = 0;
 	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
 
-	if (data_is == FSNOTIFY_EVENT_PATH) {
+	if (data_is == FSNOTIFY_EVENT_PATH &&
+	    !fsnotify_is_fake_path(to_tell, data)) {
 		mnt = real_mount(((const struct path *)data)->mnt);
 		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
 	}
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 09587e2860b5..c51f57d3a025 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -40,6 +40,20 @@  static inline int fsnotify_parent(const struct path *path,
 	return __fsnotify_parent(path, dentry, mask);
 }
 
+/*
+ * Overlayfs "fake" path is used for stacked file operations on underlying
+ * files.  file->f_path is from overlayfs layer and file->f_inode is from
+ * underlying layer.  We must not generate events on mount and on parent
+ * based on fake path, because those events have already been generated at
+ * overlayfs layer.  The reported event->fd for inode/sb marks will have the
+ * wrong path (overlayfs path), but we have no choice but to report them as is.
+ */
+static inline bool fsnotify_is_fake_path(struct inode *inode,
+					 const struct path *path)
+{
+	return unlikely(inode->i_sb != path->dentry->d_sb);
+}
+
 /*
  * Simple wrapper to consolidate calls fsnotify_parent()/fsnotify() when
  * an event is on a path.
@@ -47,10 +61,13 @@  static inline int fsnotify_parent(const struct path *path,
 static inline int fsnotify_path(struct inode *inode, const struct path *path,
 				__u32 mask)
 {
-	int ret = fsnotify_parent(path, NULL, mask);
+	if (!fsnotify_is_fake_path(inode, path)) {
+		int ret = fsnotify_parent(path, NULL, mask);
+
+		if (ret)
+			return ret;
+	}
 
-	if (ret)
-		return ret;
 	return fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
 }