
[v3] overlay: Implement volatile-specific fsync error behaviour

Message ID 20210106083546.4392-1-sargun@sargun.me (mailing list archive)
State New, archived

Commit Message

Sargun Dhillon Jan. 6, 2021, 8:35 a.m. UTC
Overlayfs's volatile option allows the user to bypass all forced sync calls
to the upperdir filesystem. This comes at the cost of safety. We can never
ensure that the user's data is intact, but we can make a best effort to
expose whether or not the data is likely to be in a bad state.

The best way to handle this for the time being is that if an overlayfs's
upperdir experiences an error after a volatile mount occurs, that error
will be returned on fsync, fdatasync, sync, and syncfs. This contradicts
the traditional VFS behaviour, which fails the call once and only raises
an error again if a subsequent fsync error has occurred and been raised
by the filesystem.

One awkward aspect of the patch is that we have to manually set the
superblock's errseq_t after the sync_fs callback as opposed to just
returning an error from syncfs. This is because the call chain looks
something like this:

sys_syncfs ->
	sync_filesystem ->
		__sync_filesystem ->
			/* The return value is ignored here */
			sb->s_op->sync_fs(sb)
			_sync_blockdev
		/* Where the VFS fetches the error to raise to userspace */
		errseq_check_and_advance

Because of this we call errseq_set every time the sync_fs callback occurs.
Due to the nature of this seen/unseen dichotomy, if the upperdir is in an
inconsistent state at initial mount time, overlayfs will refuse to mount,
as overlayfs cannot get a snapshot of the upperdir's errseq that will
increment on error until the user calls syncfs.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
 Documentation/filesystems/overlayfs.rst |  8 +++++++
 fs/overlayfs/file.c                     |  5 ++--
 fs/overlayfs/overlayfs.h                |  1 +
 fs/overlayfs/ovl_entry.h                |  2 ++
 fs/overlayfs/readdir.c                  |  5 ++--
 fs/overlayfs/super.c                    | 32 +++++++++++++++++++------
 fs/overlayfs/util.c                     | 27 +++++++++++++++++++++
 7 files changed, 69 insertions(+), 11 deletions(-)

Comments

Amir Goldstein Jan. 6, 2021, 9:34 a.m. UTC | #1
On Wed, Jan 6, 2021 at 10:35 AM Sargun Dhillon <sargun@sargun.me> wrote:
>
> Overlayfs's volatile option allows the user to bypass all forced sync calls
> to the upperdir filesystem. This comes at the cost of safety. We can never
> ensure that the user's data is intact, but we can make a best effort to
> expose whether or not the data is likely to be in a bad state.
>
> The best way to handle this in the time being is that if an overlayfs's
> upperdir experiences an error after a volatile mount occurs, that error
> will be returned on fsync, fdatasync, sync, and syncfs. This is
> contradictory to the traditional behaviour of VFS which fails the call
> once, and only raises an error if a subsequent fsync error has occurred,
> and been raised by the filesystem.
>
> One awkward aspect of the patch is that we have to manually set the
> superblock's errseq_t after the sync_fs callback as opposed to just
> returning an error from syncfs. This is because the call chain looks
> something like this:
>
> sys_syncfs ->
>         sync_filesystem ->
>                 __sync_filesystem ->
>                         /* The return value is ignored here
>                         sb->s_op->sync_fs(sb)
>                         _sync_blockdev
>                 /* Where the VFS fetches the error to raise to userspace */
>                 errseq_check_and_advance
>
> Because of this we call errseq_set every time the sync_fs callback occurs.
> Due to the nature of this seen / unseen dichotomy, if the upperdir is an
> inconsistent state at the initial mount time, overlayfs will refuse to
> mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
> will increment on error until the user calls syncfs.
>
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> Suggested-by: Amir Goldstein <amir73il@gmail.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: Jeff Layton <jlayton@redhat.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---

Looks good.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

>  Documentation/filesystems/overlayfs.rst |  8 +++++++
>  fs/overlayfs/file.c                     |  5 ++--
>  fs/overlayfs/overlayfs.h                |  1 +
>  fs/overlayfs/ovl_entry.h                |  2 ++
>  fs/overlayfs/readdir.c                  |  5 ++--
>  fs/overlayfs/super.c                    | 32 +++++++++++++++++++------
>  fs/overlayfs/util.c                     | 27 +++++++++++++++++++++
>  7 files changed, 69 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> index 580ab9a0fe31..3af569cea6a7 100644
> --- a/Documentation/filesystems/overlayfs.rst
> +++ b/Documentation/filesystems/overlayfs.rst
> @@ -575,6 +575,14 @@ without significant effort.
>  The advantage of mounting with the "volatile" option is that all forms of
>  sync calls to the upper filesystem are omitted.
>
> +In order to avoid a giving a false sense of safety, the syncfs (and fsync)
> +semantics of volatile mounts are slightly different than that of the rest of
> +VFS.  If any error occurs on the upperdir's filesystem after a volatile mount
> +takes place, all sync functions will return the last error observed on the
> +upperdir filesystem.  Once this condition is reached, the filesystem will not
> +recover, and every subsequent sync call will return an error, even if the
> +upperdir has not experience a new error since the last sync call.
> +
>  When overlay is mounted with "volatile" option, the directory
>  "$workdir/work/incompat/volatile" is created.  During next mount, overlay
>  checks for this directory and refuses to mount if present. This is a strong
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index a1f72ac053e5..5c5c3972ebd0 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -445,8 +445,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>         const struct cred *old_cred;
>         int ret;
>
> -       if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb)))
> -               return 0;
> +       ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb));
> +       if (ret <= 0)
> +               return ret;
>
>         ret = ovl_real_fdget_meta(file, &real, !datasync);
>         if (ret)
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index f8880aa2ba0e..9f7af98ae200 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -322,6 +322,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
>  bool ovl_is_metacopy_dentry(struct dentry *dentry);
>  char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
>                              int padding);
> +int ovl_sync_status(struct ovl_fs *ofs);
>
>  static inline bool ovl_is_impuredir(struct super_block *sb,
>                                     struct dentry *dentry)
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 1b5a2094df8e..b208eba5d0b6 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -79,6 +79,8 @@ struct ovl_fs {
>         atomic_long_t last_ino;
>         /* Whiteout dentry cache */
>         struct dentry *whiteout;
> +       /* r/o snapshot of upperdir sb's only taken on volatile mounts */
> +       errseq_t errseq;
>  };
>
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 01620ebae1bd..a273ef901e57 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -909,8 +909,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
>         struct file *realfile;
>         int err;
>
> -       if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb)))
> -               return 0;
> +       err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb));
> +       if (err <= 0)
> +               return err;
>
>         realfile = ovl_dir_real_file(file, true);
>         err = PTR_ERR_OR_ZERO(realfile);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 290983bcfbb3..b917b456bbb4 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
>         struct super_block *upper_sb;
>         int ret;
>
> -       if (!ovl_upper_mnt(ofs))
> -               return 0;
> +       ret = ovl_sync_status(ofs);
> +       /*
> +        * We have to always set the err, because the return value isn't
> +        * checked in syncfs, and instead indirectly return an error via
> +        * the sb's writeback errseq, which VFS inspects after this call.
> +        */
> +       if (ret < 0)
> +               errseq_set(&sb->s_wb_err, ret);
> +
> +       if (!ret)
> +               return ret;
>
> -       if (!ovl_should_sync(ofs))
> -               return 0;
>         /*
>          * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>          * All the super blocks will be iterated, including upper_sb.
> @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>         sb->s_op = &ovl_super_operations;
>
>         if (ofs->config.upperdir) {
> +               struct super_block *upper_sb;
> +
>                 if (!ofs->config.workdir) {
>                         pr_err("missing 'workdir'\n");
>                         goto out_err;
> @@ -1936,6 +1945,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                 if (err)
>                         goto out_err;
>
> +               upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
> +               if (!ovl_should_sync(ofs)) {
> +                       ofs->errseq = errseq_sample(&upper_sb->s_wb_err);
> +                       if (errseq_check(&upper_sb->s_wb_err, ofs->errseq)) {
> +                               err = -EIO;
> +                               pr_err("Cannot mount volatile when upperdir has an unseen error. Sync upperdir fs to clear state.\n");
> +                               goto out_err;
> +                       }
> +               }
> +
>                 err = ovl_get_workdir(sb, ofs, &upperpath);
>                 if (err)
>                         goto out_err;
> @@ -1943,9 +1962,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                 if (!ofs->workdir)
>                         sb->s_flags |= SB_RDONLY;
>
> -               sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> -               sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> +               sb->s_stack_depth = upper_sb->s_stack_depth;
> +               sb->s_time_gran = upper_sb->s_time_gran;
>         }
>         oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>         err = PTR_ERR(oe);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 23f475627d07..6e7b8c882045 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -950,3 +950,30 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
>         kfree(buf);
>         return ERR_PTR(res);
>  }
> +
> +/*
> + * ovl_sync_status() - Check fs sync status for volatile mounts
> + *
> + * Returns 1 if this is not a volatile mount and a real sync is required.
> + *
> + * Returns 0 if syncing can be skipped because mount is volatile, and no errors
> + * have occurred on the upperdir since the mount.
> + *
> + * Returns -errno if it is a volatile mount, and the error that occurred since
> + * the last mount. If the error code changes, it'll return the latest error
> + * code.
> + */
> +
> +int ovl_sync_status(struct ovl_fs *ofs)
> +{
> +       struct vfsmount *mnt;
> +
> +       if (ovl_should_sync(ofs))
> +               return 1;
> +
> +       mnt = ovl_upper_mnt(ofs);
> +       if (!mnt)
> +               return 0;
> +
> +       return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
> +}
> --
> 2.25.1
>
Vivek Goyal Jan. 6, 2021, 7:46 p.m. UTC | #2
On Wed, Jan 06, 2021 at 12:35:46AM -0800, Sargun Dhillon wrote:
> Overlayfs's volatile option allows the user to bypass all forced sync calls
> to the upperdir filesystem. This comes at the cost of safety. We can never
> ensure that the user's data is intact, but we can make a best effort to
> expose whether or not the data is likely to be in a bad state.
> 
> The best way to handle this in the time being is that if an overlayfs's
> upperdir experiences an error after a volatile mount occurs, that error
> will be returned on fsync, fdatasync, sync, and syncfs. This is
> contradictory to the traditional behaviour of VFS which fails the call
> once, and only raises an error if a subsequent fsync error has occurred,
> and been raised by the filesystem.
> 
> One awkward aspect of the patch is that we have to manually set the
> superblock's errseq_t after the sync_fs callback as opposed to just
> returning an error from syncfs. This is because the call chain looks
> something like this:
> 
> sys_syncfs ->
> 	sync_filesystem ->
> 		__sync_filesystem ->
> 			/* The return value is ignored here
> 			sb->s_op->sync_fs(sb)
> 			_sync_blockdev
> 		/* Where the VFS fetches the error to raise to userspace */
> 		errseq_check_and_advance
> 
> Because of this we call errseq_set every time the sync_fs callback occurs.

Why not start capturing the return code of ->sync_fs and then return the
error from ovl->sync_fs? Then you don't have to do errseq_set(ovl_sb).

I already posted a patch to capture the return code from ->sync_fs.

https://lore.kernel.org/linux-fsdevel/20201221195055.35295-2-vgoyal@redhat.com/


> Due to the nature of this seen / unseen dichotomy, if the upperdir is an
> inconsistent state at the initial mount time, overlayfs will refuse to
> mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
> will increment on error until the user calls syncfs.
> 
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> Suggested-by: Amir Goldstein <amir73il@gmail.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: Jeff Layton <jlayton@redhat.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
>  Documentation/filesystems/overlayfs.rst |  8 +++++++
>  fs/overlayfs/file.c                     |  5 ++--
>  fs/overlayfs/overlayfs.h                |  1 +
>  fs/overlayfs/ovl_entry.h                |  2 ++
>  fs/overlayfs/readdir.c                  |  5 ++--
>  fs/overlayfs/super.c                    | 32 +++++++++++++++++++------
>  fs/overlayfs/util.c                     | 27 +++++++++++++++++++++
>  7 files changed, 69 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> index 580ab9a0fe31..3af569cea6a7 100644
> --- a/Documentation/filesystems/overlayfs.rst
> +++ b/Documentation/filesystems/overlayfs.rst
> @@ -575,6 +575,14 @@ without significant effort.
>  The advantage of mounting with the "volatile" option is that all forms of
>  sync calls to the upper filesystem are omitted.
>  
> +In order to avoid a giving a false sense of safety, the syncfs (and fsync)
> +semantics of volatile mounts are slightly different than that of the rest of
> +VFS.  If any error occurs on the upperdir's filesystem after a volatile mount
                ^^^
should we say "If any writeback error occurs...."

> +takes place, all sync functions will return the last error observed on the
> +upperdir filesystem.  Once this condition is reached, the filesystem will not
> +recover, and every subsequent sync call will return an error, even if the
> +upperdir has not experience a new error since the last sync call.

Once the filesystem fails, do we want to continue to return the latest
error on the upper? Or do we just mark the filesystem failed internally
and, once failed, always return a fixed error, say -EIO? That way we
don't have to call errseq_check() on every filesystem call. I am assuming
that at some point we will extend this to other filesystem functions like
read()/write()/mmap() etc. The filesystem has failed at this point, and
the user is supposed to throw away the upper and restart.

> +
>  When overlay is mounted with "volatile" option, the directory
>  "$workdir/work/incompat/volatile" is created.  During next mount, overlay
>  checks for this directory and refuses to mount if present. This is a strong
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index a1f72ac053e5..5c5c3972ebd0 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -445,8 +445,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  	const struct cred *old_cred;
>  	int ret;
>  
> -	if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb)))
> -		return 0;
> +	ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb));
> +	if (ret <= 0)
> +		return ret;
>  
>  	ret = ovl_real_fdget_meta(file, &real, !datasync);
>  	if (ret)
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index f8880aa2ba0e..9f7af98ae200 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -322,6 +322,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
>  bool ovl_is_metacopy_dentry(struct dentry *dentry);
>  char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
>  			     int padding);
> +int ovl_sync_status(struct ovl_fs *ofs);
>  
>  static inline bool ovl_is_impuredir(struct super_block *sb,
>  				    struct dentry *dentry)
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 1b5a2094df8e..b208eba5d0b6 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -79,6 +79,8 @@ struct ovl_fs {
>  	atomic_long_t last_ino;
>  	/* Whiteout dentry cache */
>  	struct dentry *whiteout;
> +	/* r/o snapshot of upperdir sb's only taken on volatile mounts */
> +	errseq_t errseq;
>  };
>  
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 01620ebae1bd..a273ef901e57 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -909,8 +909,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
>  	struct file *realfile;
>  	int err;
>  
> -	if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb)))
> -		return 0;
> +	err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb));
> +	if (err <= 0)
> +		return err;
>  
>  	realfile = ovl_dir_real_file(file, true);
>  	err = PTR_ERR_OR_ZERO(realfile);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 290983bcfbb3..b917b456bbb4 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
>  	struct super_block *upper_sb;
>  	int ret;
>  
> -	if (!ovl_upper_mnt(ofs))
> -		return 0;
> +	ret = ovl_sync_status(ofs);
> +	/*
> +	 * We have to always set the err, because the return value isn't
> +	 * checked in syncfs, and instead indirectly return an error via
> +	 * the sb's writeback errseq, which VFS inspects after this call.
> +	 */
> +	if (ret < 0)
> +		errseq_set(&sb->s_wb_err, ret);

Again, I think we can simplify this. If we just capture the return code
of ->sync_fs in the VFS and return it to user space, we can simply return
an error instead of playing this game of setting errseq on the overlay
superblock.

Thanks
Vivek

> +
> +	if (!ret)
> +		return ret;
>  
> -	if (!ovl_should_sync(ofs))
> -		return 0;
>  	/*
>  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>  	 * All the super blocks will be iterated, including upper_sb.
> @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  	sb->s_op = &ovl_super_operations;
>  
>  	if (ofs->config.upperdir) {
> +		struct super_block *upper_sb;
> +
>  		if (!ofs->config.workdir) {
>  			pr_err("missing 'workdir'\n");
>  			goto out_err;
> @@ -1936,6 +1945,16 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  		if (err)
>  			goto out_err;
>  
> +		upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
> +		if (!ovl_should_sync(ofs)) {
> +			ofs->errseq = errseq_sample(&upper_sb->s_wb_err);
> +			if (errseq_check(&upper_sb->s_wb_err, ofs->errseq)) {
> +				err = -EIO;
> +				pr_err("Cannot mount volatile when upperdir has an unseen error. Sync upperdir fs to clear state.\n");
> +				goto out_err;
> +			}
> +		}
> +
>  		err = ovl_get_workdir(sb, ofs, &upperpath);
>  		if (err)
>  			goto out_err;
> @@ -1943,9 +1962,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  		if (!ofs->workdir)
>  			sb->s_flags |= SB_RDONLY;
>  
> -		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> -		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> +		sb->s_stack_depth = upper_sb->s_stack_depth;
> +		sb->s_time_gran = upper_sb->s_time_gran;
>  	}
>  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>  	err = PTR_ERR(oe);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 23f475627d07..6e7b8c882045 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -950,3 +950,30 @@ char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
>  	kfree(buf);
>  	return ERR_PTR(res);
>  }
> +
> +/*
> + * ovl_sync_status() - Check fs sync status for volatile mounts
> + *
> + * Returns 1 if this is not a volatile mount and a real sync is required.
> + *
> + * Returns 0 if syncing can be skipped because mount is volatile, and no errors
> + * have occurred on the upperdir since the mount.
> + *
> + * Returns -errno if it is a volatile mount, and the error that occurred since
> + * the last mount. If the error code changes, it'll return the latest error
> + * code.
> + */
> +
> +int ovl_sync_status(struct ovl_fs *ofs)
> +{
> +	struct vfsmount *mnt;
> +
> +	if (ovl_should_sync(ofs))
> +		return 1;
> +
> +	mnt = ovl_upper_mnt(ofs);
> +	if (!mnt)
> +		return 0;
> +
> +	return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);

> +}
> -- 
> 2.25.1
>
Sargun Dhillon Jan. 7, 2021, 3:51 a.m. UTC | #3
On Wed, Jan 06, 2021 at 02:46:58PM -0500, Vivek Goyal wrote:
> On Wed, Jan 06, 2021 at 12:35:46AM -0800, Sargun Dhillon wrote:
> > Overlayfs's volatile option allows the user to bypass all forced sync calls
> > to the upperdir filesystem. This comes at the cost of safety. We can never
> > ensure that the user's data is intact, but we can make a best effort to
> > expose whether or not the data is likely to be in a bad state.
> > 
> > The best way to handle this in the time being is that if an overlayfs's
> > upperdir experiences an error after a volatile mount occurs, that error
> > will be returned on fsync, fdatasync, sync, and syncfs. This is
> > contradictory to the traditional behaviour of VFS which fails the call
> > once, and only raises an error if a subsequent fsync error has occurred,
> > and been raised by the filesystem.
> > 
> > One awkward aspect of the patch is that we have to manually set the
> > superblock's errseq_t after the sync_fs callback as opposed to just
> > returning an error from syncfs. This is because the call chain looks
> > something like this:
> > 
> > sys_syncfs ->
> > 	sync_filesystem ->
> > 		__sync_filesystem ->
> > 			/* The return value is ignored here
> > 			sb->s_op->sync_fs(sb)
> > 			_sync_blockdev
> > 		/* Where the VFS fetches the error to raise to userspace */
> > 		errseq_check_and_advance
> > 
> > Because of this we call errseq_set every time the sync_fs callback occurs.
> 
> Why not start capturing return code of ->sync_fs and then return error
> from ovl->sync_fs. And then you don't have to do errseq_set(ovl_sb). 
> 
> I already posted a patch to capture retrun code from ->sync_fs.
> 
> https://lore.kernel.org/linux-fsdevel/20201221195055.35295-2-vgoyal@redhat.com/
> 

The idea is for this patch to go into stable as a minimal change that
prevents overlayfs volatile mounts from exhibiting unintended behaviour.
I think that your changes are still valid and can sit atop this [and you
can remove the errseq_set].

I believe the consensus was that changing the behaviour for all
filesystems presented undue risk for a patch destined for stable.

> 
> > Due to the nature of this seen / unseen dichotomy, if the upperdir is an
> > inconsistent state at the initial mount time, overlayfs will refuse to
> > mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
> > will increment on error until the user calls syncfs.
> > 
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > Suggested-by: Amir Goldstein <amir73il@gmail.com>
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: Jeff Layton <jlayton@redhat.com>
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > ---
> >  Documentation/filesystems/overlayfs.rst |  8 +++++++
> >  fs/overlayfs/file.c                     |  5 ++--
> >  fs/overlayfs/overlayfs.h                |  1 +
> >  fs/overlayfs/ovl_entry.h                |  2 ++
> >  fs/overlayfs/readdir.c                  |  5 ++--
> >  fs/overlayfs/super.c                    | 32 +++++++++++++++++++------
> >  fs/overlayfs/util.c                     | 27 +++++++++++++++++++++
> >  7 files changed, 69 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 580ab9a0fe31..3af569cea6a7 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -575,6 +575,14 @@ without significant effort.
> >  The advantage of mounting with the "volatile" option is that all forms of
> >  sync calls to the upper filesystem are omitted.
> >  
> > +In order to avoid a giving a false sense of safety, the syncfs (and fsync)
> > +semantics of volatile mounts are slightly different than that of the rest of
> > +VFS.  If any error occurs on the upperdir's filesystem after a volatile mount
>                 ^^^
> shoud we say "If any writeback error occurs...."
> 
Sure.

> > +takes place, all sync functions will return the last error observed on the
> > +upperdir filesystem.  Once this condition is reached, the filesystem will not
> > +recover, and every subsequent sync call will return an error, even if the
> > +upperdir has not experience a new error since the last sync call.
> 
> Once filesystem fails, do we want to continue to return latest error on
> upper? Or we just mark filesystem failed internally and once failed
> we always return a fixed error, say -EIO. That way we don't have to
> call errseq_check() on every filesystem call. I am assuming at some
> point of time we will extend this to other filesystem functions
> like read()/write()/mmap() etc. Filesystem has failed at this point 
> of time and user is supposed to throw away upper and restart.
> 
I think we talked about this on another thread -- adding filesystem
shutdown[1]. Once we land this, we can go a number of ways in -next and
add shutdown, direct error return, and volatile remount, but I'd rather
get something minimal into stable sooner rather than later.

> > +
> >  When overlay is mounted with "volatile" option, the directory
> >  "$workdir/work/incompat/volatile" is created.  During next mount, overlay
> >  checks for this directory and refuses to mount if present. This is a strong
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index a1f72ac053e5..5c5c3972ebd0 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -445,8 +445,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >  	const struct cred *old_cred;
> >  	int ret;
> >  
> > -	if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb)))
> > -		return 0;
> > +	ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb));
> > +	if (ret <= 0)
> > +		return ret;
> >  
> >  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> >  	if (ret)
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index f8880aa2ba0e..9f7af98ae200 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -322,6 +322,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
> >  bool ovl_is_metacopy_dentry(struct dentry *dentry);
> >  char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
> >  			     int padding);
> > +int ovl_sync_status(struct ovl_fs *ofs);
> >  
> >  static inline bool ovl_is_impuredir(struct super_block *sb,
> >  				    struct dentry *dentry)
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 1b5a2094df8e..b208eba5d0b6 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -79,6 +79,8 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	/* r/o snapshot of upperdir sb's only taken on volatile mounts */
> > +	errseq_t errseq;
> >  };
> >  
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 01620ebae1bd..a273ef901e57 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -909,8 +909,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
> >  	struct file *realfile;
> >  	int err;
> >  
> > -	if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb)))
> > -		return 0;
> > +	err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb));
> > +	if (err <= 0)
> > +		return err;
> >  
> >  	realfile = ovl_dir_real_file(file, true);
> >  	err = PTR_ERR_OR_ZERO(realfile);
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 290983bcfbb3..b917b456bbb4 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> >  	struct super_block *upper_sb;
> >  	int ret;
> >  
> > -	if (!ovl_upper_mnt(ofs))
> > -		return 0;
> > +	ret = ovl_sync_status(ofs);
> > +	/*
> > +	 * We have to always set the err, because the return value isn't
> > +	 * checked in syncfs, and instead indirectly return an error via
> > +	 * the sb's writeback errseq, which VFS inspects after this call.
> > +	 */
> > +	if (ret < 0)
> > +		errseq_set(&sb->s_wb_err, ret);
> 
> Again, I think we can simplify this. If we just capture return code of
> ->sync_fs in VFS and return to user space, we can simply return an
> error instead of trying to play this game of setting errseq on overlay
> superblock.
> 
> Thanks
> Vivek
> 
If you want to land that in stable, I'm fine with returning an error directly, 
but I'll leave that up to Al and Matthew.

[1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhra_RB98gJ7ovGhbUV1atCR1rMPnf63tT37WtrNC0asg@mail.gmail.com/T/#u
Amir Goldstein Jan. 7, 2021, 7:02 a.m. UTC | #4
On Wed, Jan 6, 2021 at 9:47 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Jan 06, 2021 at 12:35:46AM -0800, Sargun Dhillon wrote:
> > Overlayfs's volatile option allows the user to bypass all forced sync calls
> > to the upperdir filesystem. This comes at the cost of safety. We can never
> > ensure that the user's data is intact, but we can make a best effort to
> > expose whether or not the data is likely to be in a bad state.
> >
> > The best way to handle this in the time being is that if an overlayfs's
> > upperdir experiences an error after a volatile mount occurs, that error
> > will be returned on fsync, fdatasync, sync, and syncfs. This is
> > contradictory to the traditional behaviour of VFS which fails the call
> > once, and only raises an error if a subsequent fsync error has occurred,
> > and been raised by the filesystem.
> >
> > One awkward aspect of the patch is that we have to manually set the
> > superblock's errseq_t after the sync_fs callback as opposed to just
> > returning an error from syncfs. This is because the call chain looks
> > something like this:
> >
> > sys_syncfs ->
> >       sync_filesystem ->
> >               __sync_filesystem ->
> >                       /* The return value is ignored here */
> >                       sb->s_op->sync_fs(sb)
> >                       _sync_blockdev
> >               /* Where the VFS fetches the error to raise to userspace */
> >               errseq_check_and_advance
> >
> > Because of this we call errseq_set every time the sync_fs callback occurs.
>
> Why not start capturing the return code of ->sync_fs and then return the
> error from ovl->sync_fs? Then you don't have to do errseq_set(ovl_sb).
>
> I already posted a patch to capture the return code from ->sync_fs.
>
> https://lore.kernel.org/linux-fsdevel/20201221195055.35295-2-vgoyal@redhat.com/
>
>

Vivek,

IMO the more important question is "Why not?".

Your patches will undoubtedly get to mainline in the near future and they do
make the errseq_set(ovl_sb) in this patch a bit redundant, but I really see no
harm in it. It is very simple for you to remove this line in your patch.
I do see the big benefit of an independent patch that is easy to apply to fix
a fresh v5.10 feature.

I think it is easy for people to dismiss the importance of "syncfs on volatile"
which sounds like a contradiction, but it is not.
The fact that the current behavior is documented doesn't make it right either.
It just makes our review wrong.
The durability guarantee (that volatile does not provide) is very different
from the "reliability" guarantee that it CAN provide.
We do not want to have to explain to people that "volatile" provides different
guarantees depending on the kernel they are running.
Fixing syncfs/fsync of volatile is much more important IMO than erroring
on other fs ops post writeback error, because other fs ops are equally
unreliable on any filesystem in case the application did not do fsync.

Ignoring the factor of "backporting cost" when there is no engineering
justification to do so is just ignoring the pain of others.
Do you have an engineering argument for objecting to this patch being
applied before your fixes to the syncfs VFS API?

Sargun,

Please add Fixes/Stable #v5.10 tags.

Thanks,
Amir.
Sargun Dhillon Jan. 7, 2021, 8:02 a.m. UTC | #5
On Wed, Jan 6, 2021 at 11:02 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [...]
> Sargun,
>
> Please add Fixes/Stable #v5.10 tags.
>
> Thanks,
> Amir.

I was going to send the patch to stable once it was picked up in
the unionfs tree. I will resend / re-roll with a CC to stable.
Vivek Goyal Jan. 7, 2021, 1:44 p.m. UTC | #6
On Thu, Jan 07, 2021 at 09:02:00AM +0200, Amir Goldstein wrote:
> [...]
> 
> Vivek,
> 
> IMO the more important question is "Why not?".
> 
> Your patches will undoubtedly get to mainline in the near future and they do
> make the errseq_set(ovl_sb) in this patch a bit redundant,

I thought my patch of capturing ->sync_fs is really simple (just a few
lines), so backportability should not be an issue. That's why I
asked for it. 

> [...]
> Ignoring the factor of "backporting cost" when there is no engineering
> justification to do so is just ignoring the pain of others.
> Do you have an engineering argument for objecting to this patch being
> applied before your fixes to the syncfs VFS API?

Carrying ->sync_fs return code patch is definitely not a blocker. It
is just nice to have. Anyway, if you don't want to carry that ->sync_fs
return patch in stable, I am fine with this patch. I will follow up
on that fix separately.

Vivek

Amir Goldstein Jan. 7, 2021, 2:44 p.m. UTC | #7
On Thu, Jan 7, 2021 at 3:45 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Jan 07, 2021 at 09:02:00AM +0200, Amir Goldstein wrote:
> > [...]
>
> I thought my patch of capturing ->sync_fs is really simple (just few
> lines), so backportability should not be an issue. That's why I
> asked for it.
>

Apologies. I thought you meant your entire patch set.
I do agree to that. In fact, I think I suggested it myself at one
point or another.

> > [...]
>
> Carrying ->sync_fs return code patch is definitely not a blocker. It
> is just nice to have. Anyway, if you don't want to carry that ->sync_fs
> return patch in stable, I am fine with this patch. I will follow up
> on that fix separately.
>

Please collaborate with Sargun.
I think it is best if one of you will post those two patches in the same
series. I think you had a few minor comments to address, so maybe
send the final patch version to Sargun so he can test the two patches
together and post them?

Sorry for the confusion.
Too many "the syncfs patch" to juggle.

Thanks,
Amir.
Vivek Goyal Jan. 7, 2021, 2:57 p.m. UTC | #8
On Thu, Jan 07, 2021 at 04:44:19PM +0200, Amir Goldstein wrote:
> [...]
> 
> Please collaborate with Sargun.
> I think it is best if one of you will post those two patches in the same
> series. I think you had a few minor comments to address, so maybe
> send the final patch version to Sargun so he can test the two patches
> together and post them?

Hi Amir,

I was thinking more about that patch. That patch will start returning an
error on syncfs() in cases where it did not return errors in the past, and
somebody might complain. So it is probably safer to carry that patch in
mainline first and, once it gets good testing, push it to stable later.

So for now, I am fine with this patch as it is. I will follow up on the
->sync_fs error capture patch separately. And once that is upstream, I can post
another overlay patch to remove errseq_set().

> 
> Sorry for the confusion.
> Too many "the syncfs patch" to juggle.

No worries. I agree, too many mail threads on this topic.

Thanks
Vivek

Patch

diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 580ab9a0fe31..3af569cea6a7 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -575,6 +575,14 @@  without significant effort.
 The advantage of mounting with the "volatile" option is that all forms of
 sync calls to the upper filesystem are omitted.
 
+In order to avoid giving a false sense of safety, the syncfs (and fsync)
+semantics of volatile mounts differ slightly from those of the rest of the
+VFS.  If any error occurs on the upperdir's filesystem after a volatile
+mount takes place, all sync functions will return the last error observed
+on the upperdir filesystem.  Once this condition is reached, the filesystem
+will not recover, and every subsequent sync call will return an error, even
+if the upperdir has not experienced a new error since the last sync call.
+
 When overlay is mounted with "volatile" option, the directory
 "$workdir/work/incompat/volatile" is created.  During next mount, overlay
 checks for this directory and refuses to mount if present. This is a strong
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index a1f72ac053e5..5c5c3972ebd0 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -445,8 +445,9 @@  static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	const struct cred *old_cred;
 	int ret;
 
-	if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb)))
-		return 0;
+	ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb));
+	if (ret <= 0)
+		return ret;
 
 	ret = ovl_real_fdget_meta(file, &real, !datasync);
 	if (ret)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f8880aa2ba0e..9f7af98ae200 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -322,6 +322,7 @@  int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
 bool ovl_is_metacopy_dentry(struct dentry *dentry);
 char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 			     int padding);
+int ovl_sync_status(struct ovl_fs *ofs);
 
 static inline bool ovl_is_impuredir(struct super_block *sb,
 				    struct dentry *dentry)
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094df8e..b208eba5d0b6 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -79,6 +79,8 @@  struct ovl_fs {
 	atomic_long_t last_ino;
 	/* Whiteout dentry cache */
 	struct dentry *whiteout;
+	/* r/o snapshot of upperdir sb's only taken on volatile mounts */
+	errseq_t errseq;
 };
 
 static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..a273ef901e57 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -909,8 +909,9 @@  static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
 	struct file *realfile;
 	int err;
 
-	if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb)))
-		return 0;
+	err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb));
+	if (err <= 0)
+		return err;
 
 	realfile = ovl_dir_real_file(file, true);
 	err = PTR_ERR_OR_ZERO(realfile);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..b917b456bbb4 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -261,11 +261,18 @@  static int ovl_sync_fs(struct super_block *sb, int wait)
 	struct super_block *upper_sb;
 	int ret;
 
-	if (!ovl_upper_mnt(ofs))
-		return 0;
+	ret = ovl_sync_status(ofs);
+	/*
+	 * We always have to set the err, because syncfs ignores the
+	 * return value of ->sync_fs; the error is instead reported via
+	 * the sb's writeback errseq, which VFS inspects after this call.
+	 */
+	if (ret < 0)
+		errseq_set(&sb->s_wb_err, ret);
+
+	if (ret <= 0)
+		return ret;
 
-	if (!ovl_should_sync(ofs))
-		return 0;
 	/*
 	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 	 * All the super blocks will be iterated, including upper_sb.
@@ -1927,6 +1934,8 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_op = &ovl_super_operations;
 
 	if (ofs->config.upperdir) {
+		struct super_block *upper_sb;
+
 		if (!ofs->config.workdir) {
 			pr_err("missing 'workdir'\n");
 			goto out_err;
@@ -1936,6 +1945,16 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (err)
 			goto out_err;
 
+		upper_sb = ovl_upper_mnt(ofs)->mnt_sb;
+		if (!ovl_should_sync(ofs)) {
+			ofs->errseq = errseq_sample(&upper_sb->s_wb_err);
+			if (errseq_check(&upper_sb->s_wb_err, ofs->errseq)) {
+				err = -EIO;
+				pr_err("Cannot mount volatile when upperdir has an unseen error. Sync upperdir fs to clear state.\n");
+				goto out_err;
+			}
+		}
+
 		err = ovl_get_workdir(sb, ofs, &upperpath);
 		if (err)
 			goto out_err;
@@ -1943,9 +1962,8 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (!ofs->workdir)
 			sb->s_flags |= SB_RDONLY;
 
-		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
-		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
-
+		sb->s_stack_depth = upper_sb->s_stack_depth;
+		sb->s_time_gran = upper_sb->s_time_gran;
 	}
 	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
 	err = PTR_ERR(oe);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f475627d07..6e7b8c882045 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -950,3 +950,30 @@  char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
 	kfree(buf);
 	return ERR_PTR(res);
 }
+
+/*
+ * ovl_sync_status() - Check fs sync status for volatile mounts
+ *
+ * Returns 1 if this is not a volatile mount and a real sync is required.
+ *
+ * Returns 0 if syncing can be skipped because the mount is volatile, and
+ * no errors have occurred on the upperdir since the mount.
+ *
+ * Returns -errno if this is a volatile mount and an error has occurred on
+ * the upperdir since the mount. If the error code changes, the latest
+ * error code is returned.
+ */
+
+int ovl_sync_status(struct ovl_fs *ofs)
+{
+	struct vfsmount *mnt;
+
+	if (ovl_should_sync(ofs))
+		return 1;
+
+	mnt = ovl_upper_mnt(ofs);
+	if (!mnt)
+		return 0;
+
+	return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
+}