[V6,7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()

Message ID 20200407182958.568475-8-ira.weiny@intel.com
State New
Series
  • Enable per-file/per-directory DAX operations V6

Commit Message

Ira Weiny April 7, 2020, 6:29 p.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

We only support changing FS_XFLAG_DAX on directories.  Files get their
flag from the parent directory on creation only.  So no data
invalidation needs to happen.

Alter the xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check().

This also allows us to remove the join_flags logic.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v5:
	New patch
---
 fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
 1 file changed, 10 insertions(+), 81 deletions(-)

Comments

Dave Chinner April 8, 2020, 2:23 a.m. UTC | #1
On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> We only support changing FS_XFLAG_DAX on directories.  Files get their
> flag from the parent directory on creation only.  So no data
> invalidation needs to happen.

Which leads me to ask: how are users and/or admins supposed to
remove the flag from regular files once it is set in the filesystem?

Only being able to override the flag via the "dax=never" mount
option means that once the flag is set, nobody can ever remove it
and they can only globally turn off dax if it gets set incorrectly.
It also means a global interrupt because all apps on the filesystem
need to be stopped so the filesystem can be unmounted and mounted
again with dax=never. This is highly unfriendly to admins and users.

IOWs, we _must_ be able to clear this inode flag on regular inodes
in some way. I don't care if it doesn't change the current in-memory
state, but we must be able to clear the flags so that the next time
the inodes are instantiated DAX is not enabled for those files...

Cheers,

Dave.
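
[Editorial sketch] For reference, this is roughly the userspace operation Dave is asking to keep working on regular files: read the inode's extended flags, mask out FS_XFLAG_DAX, and write them back. FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR, struct fsxattr and FS_XFLAG_DAX come from <linux/fs.h>; the helper names xflags_without_dax() and clear_dax_flag() are hypothetical and appear nowhere in the series. With the check as posted in this patch, the set call would be expected to fail with -EINVAL on a regular file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR */

#ifndef FS_XFLAG_DAX
#define FS_XFLAG_DAX	0x00008000	/* fallback for older uapi headers */
#endif

/* Hypothetical helper: return @xflags with only the DAX bit cleared. */
static unsigned int xflags_without_dax(unsigned int xflags)
{
	return xflags & ~(unsigned int)FS_XFLAG_DAX;
}

/*
 * Hypothetical helper: clear the on-disk DAX flag of the file at @path.
 * Returns 0 on success or a negative errno. This only touches the
 * persistent flag; it makes no attempt to change the in-memory state
 * of an inode that is currently in use.
 */
static int clear_dax_flag(const char *path)
{
	struct fsxattr fa;
	int fd, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Read the current flags so every other bit is preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0) {
		ret = -errno;
		goto out;
	}

	fa.fsx_xflags = xflags_without_dax(fa.fsx_xflags);

	if (ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

The read-modify-write keeps unrelated xflags intact, which is why the sketch fetches the attributes first instead of writing a zeroed structure.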
Jan Kara April 8, 2020, 9:58 a.m. UTC | #2
On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > flag from the parent directory on creation only.  So no data
> > invalidation needs to happen.
> 
> Which leads me to ask: how are users and/or admins supposed to
> remove the flag from regular files once it is set in the filesystem?
> 
> Only being able to override the flag via the "dax=never" mount
> option means that once the flag is set, nobody can ever remove it
> and they can only globally turn off dax if it gets set incorrectly.
> It also means a global interrupt because all apps on the filesystem
> need to be stopped so the filesystem can be unmounted and mounted
> again with dax=never. This is highly unfriendly to admins and users.
> 
> IOWs, we _must_ be able to clear this inode flag on regular inodes
> in some way. I don't care if it doesn't change the current in-memory
> state, but we must be able to clear the flags so that the next time
> the inodes are instantiated DAX is not enabled for those files...

Well, there's one way to clear the flag: delete the file. If you still care
about the data, you can copy the data first. It isn't very convenient, I
agree, and effectively means restarting whatever application that is using
the file. But it seems like more understandable API than letting user clear
the on-disk flag but the inode will still use DAX until kernel decides to
evict the inode - because that often means you need to restart the
application using the file anyway for the flag change to have any effect.

								Honza
Darrick J. Wong April 8, 2020, 3:37 p.m. UTC | #3
On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> We only support changing FS_XFLAG_DAX on directories.  Files get their
> flag from the parent directory on creation only.  So no data
> invalidation needs to happen.
> 
> Alter the xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_dax_check().
> 
> This also allows us to remove the join_flags logic.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v5:
> 	New patch
> ---
>  fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
>  1 file changed, 10 insertions(+), 81 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index c6cd92ef4a05..5472faab7c4f 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1145,63 +1145,18 @@ xfs_ioctl_setattr_xflags(
>  }
>  
>  /*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> + * Only directories are allowed to change dax flags
>   */
>  static int
>  xfs_ioctl_setattr_dax_invalidate(
> -	struct xfs_inode	*ip,
> -	struct fsxattr		*fa,
> -	int			*join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct inode		*inode = VFS_I(ip);
> -	struct super_block	*sb = inode->i_sb;
> -	int			error;
> -
> -	*join_flags = 0;
> -
> -	/*
> -	 * It is only valid to set the DAX flag on regular files and
> -	 * directories on filesystems where the block size is equal to the page
> -	 * size. On directories it serves as an inherited hint so we don't
> -	 * have to check the device for dax support or flush pagecache.
> -	 */
> -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> -
> -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> -			return -EINVAL;
> -	}
> -
> -	/* If the DAX state is not changing, we have nothing to do here. */
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> -		return 0;
> -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> -		return 0;

Does the !S_ISDIR check below apply unconditionally even if we weren't
trying to change the DAX flag?

> -	if (S_ISDIR(inode->i_mode))
> -		return 0;
>  
> -	/* lock, flush and invalidate mapping in preparation for flag change */
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	error = filemap_write_and_wait(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -	error = invalidate_inode_pages2(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> +	if (!S_ISDIR(inode->i_mode))
> +		return -EINVAL;

If this entire function collapses to an S_ISDIR check then you might
as well just hoist this one piece to the caller.  Also, where is
xfs_ioctl_dax_check?

<confused>

--D

>  
> -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
>  	return 0;
> -
> -out_unlock:
> -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	return error;
> -
>  }
>  
>  /*
> @@ -1209,17 +1164,10 @@ xfs_ioctl_setattr_dax_invalidate(
>   * have permission to do so. On success, return a clean transaction and the
>   * inode locked exclusively ready for further operation specific checks. On
>   * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
>   */
>  static struct xfs_trans *
>  xfs_ioctl_setattr_get_trans(
> -	struct xfs_inode	*ip,
> -	int			join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
> @@ -1236,8 +1184,7 @@ xfs_ioctl_setattr_get_trans(
>  		goto out_unlock;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> -	join_flags = 0;
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
>  	/*
>  	 * CAP_FOWNER overrides the following restrictions:
> @@ -1258,8 +1205,6 @@ xfs_ioctl_setattr_get_trans(
>  out_cancel:
>  	xfs_trans_cancel(tp);
>  out_unlock:
> -	if (join_flags)
> -		xfs_iunlock(ip, join_flags);
>  	return ERR_PTR(error);
>  }
>  
> @@ -1386,7 +1331,6 @@ xfs_ioctl_setattr(
>  	struct xfs_dquot	*pdqp = NULL;
>  	struct xfs_dquot	*olddquot = NULL;
>  	int			code;
> -	int			join_flags = 0;
>  
>  	trace_xfs_ioctl_setattr(ip);
>  
> @@ -1410,18 +1354,11 @@ xfs_ioctl_setattr(
>  			return code;
>  	}
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> +	code = xfs_ioctl_setattr_dax_invalidate(ip);
>  	if (code)
>  		goto error_free_dquots;
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		code = PTR_ERR(tp);
>  		goto error_free_dquots;
> @@ -1552,7 +1489,6 @@ xfs_ioc_setxflags(
>  	struct fsxattr		fa;
>  	struct fsxattr		old_fa;
>  	unsigned int		flags;
> -	int			join_flags = 0;
>  	int			error;
>  
>  	if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1569,18 +1505,11 @@ xfs_ioc_setxflags(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> +	error = xfs_ioctl_setattr_dax_invalidate(ip);
>  	if (error)
>  		goto out_drop_write;
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		error = PTR_ERR(tp);
>  		goto out_drop_write;
> -- 
> 2.25.1
>
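
[Editorial sketch] Darrick's first question in this message is whether the !S_ISDIR test rejects every setattr on a non-directory, including calls that do not touch the DAX flag at all. A minimal userspace model of a check that fires only when the DAX bit actually changes might look like the following. This is not kernel code: model_dax_check() and MODEL_XFLAG_DAX are hypothetical names, and a kernel caller would presumably pass S_ISDIR(inode->i_mode) and the old and new fsx_xflags values.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_XFLAG_DAX	0x00008000u	/* same bit value as FS_XFLAG_DAX */

/*
 * Hypothetical model: allow any request that leaves the DAX bit alone,
 * and allow a DAX change only on directories.
 */
static int model_dax_check(bool is_dir, unsigned int old_xflags,
			   unsigned int new_xflags)
{
	bool changing = ((old_xflags ^ new_xflags) & MODEL_XFLAG_DAX) != 0;

	if (!changing)
		return 0;	/* nothing DAX-related to validate */
	if (!is_dir)
		return -EINVAL;	/* DAX flag changes only on directories */
	return 0;
}
```

Comparing old and new flags first is what keeps unrelated attribute changes on regular files working, which the unconditional check in the patch as posted would not.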
Ira Weiny April 8, 2020, 6:13 p.m. UTC | #4
On Wed, Apr 08, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > flag from the parent directory on creation only.  So no data
> > invalidation needs to happen.
> > 
> > Alter the xfs_ioctl_setattr_dax_invalidate() to be
> > xfs_ioctl_dax_check().
> > 
> > This also allows us to remove the join_flags logic.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v5:
> > 	New patch
> > ---
> >  fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
> >  1 file changed, 10 insertions(+), 81 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index c6cd92ef4a05..5472faab7c4f 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1145,63 +1145,18 @@ xfs_ioctl_setattr_xflags(
> >  }
> >  
> >  /*
> > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > - * cached objects in the address space are invalidated and removed. This
> > - * requires us to lock out other IO and page faults similar to a truncate
> > - * operation. The locks need to be held until the transaction has been committed
> > - * so that the cache invalidation is atomic with respect to the DAX flag
> > - * manipulation.
> > + * Only directories are allowed to change dax flags
> >   */
> >  static int
> >  xfs_ioctl_setattr_dax_invalidate(
> > -	struct xfs_inode	*ip,
> > -	struct fsxattr		*fa,
> > -	int			*join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct inode		*inode = VFS_I(ip);
> > -	struct super_block	*sb = inode->i_sb;
> > -	int			error;
> > -
> > -	*join_flags = 0;
> > -
> > -	/*
> > -	 * It is only valid to set the DAX flag on regular files and
> > -	 * directories on filesystems where the block size is equal to the page
> > -	 * size. On directories it serves as an inherited hint so we don't
> > -	 * have to check the device for dax support or flush pagecache.
> > -	 */
> > -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> > -
> > -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> > -			return -EINVAL;
> > -	}
> > -
> > -	/* If the DAX state is not changing, we have nothing to do here. */
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > -		return 0;
> > -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> > -		return 0;
> 
> Does the !S_ISDIR check below apply unconditionally even if we weren't
> trying to change the DAX flag?

:-/

shoot...  I really screwed this up...

> 
> > -	if (S_ISDIR(inode->i_mode))
> > -		return 0;
> >  
> > -	/* lock, flush and invalidate mapping in preparation for flag change */
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	error = filemap_write_and_wait(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -	error = invalidate_inode_pages2(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > +	if (!S_ISDIR(inode->i_mode))
> > +		return -EINVAL;
> 
> If this entire function collapses to an S_ISDIR check then you might
> as well just hoist this one piece to the caller.

Oops...  yea this is broken...  All broken.

> Also, where is
> xfs_ioctl_dax_check?

I forgot to change the name.

> 
> <confused>

Much less so than me...  :-/

I'll clean it up.

Thanks, this was really bad on my part...
Ira

> 
> --D
> 
> >  
> > -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> >  	return 0;
> > -
> > -out_unlock:
> > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	return error;
> > -
> >  }
> >  
> >  /*
> > @@ -1209,17 +1164,10 @@ xfs_ioctl_setattr_dax_invalidate(
> >   * have permission to do so. On success, return a clean transaction and the
> >   * inode locked exclusively ready for further operation specific checks. On
> >   * failure, return an error without modifying or locking the inode.
> > - *
> > - * The inode might already be IO locked on call. If this is the case, it is
> > - * indicated in @join_flags and we take full responsibility for ensuring they
> > - * are unlocked from now on. Hence if we have an error here, we still have to
> > - * unlock them. Otherwise, once they are joined to the transaction, they will
> > - * be unlocked on commit/cancel.
> >   */
> >  static struct xfs_trans *
> >  xfs_ioctl_setattr_get_trans(
> > -	struct xfs_inode	*ip,
> > -	int			join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_trans	*tp;
> > @@ -1236,8 +1184,7 @@ xfs_ioctl_setattr_get_trans(
> >  		goto out_unlock;
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> > -	join_flags = 0;
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  
> >  	/*
> >  	 * CAP_FOWNER overrides the following restrictions:
> > @@ -1258,8 +1205,6 @@ xfs_ioctl_setattr_get_trans(
> >  out_cancel:
> >  	xfs_trans_cancel(tp);
> >  out_unlock:
> > -	if (join_flags)
> > -		xfs_iunlock(ip, join_flags);
> >  	return ERR_PTR(error);
> >  }
> >  
> > @@ -1386,7 +1331,6 @@ xfs_ioctl_setattr(
> >  	struct xfs_dquot	*pdqp = NULL;
> >  	struct xfs_dquot	*olddquot = NULL;
> >  	int			code;
> > -	int			join_flags = 0;
> >  
> >  	trace_xfs_ioctl_setattr(ip);
> >  
> > @@ -1410,18 +1354,11 @@ xfs_ioctl_setattr(
> >  			return code;
> >  	}
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> > +	code = xfs_ioctl_setattr_dax_invalidate(ip);
> >  	if (code)
> >  		goto error_free_dquots;
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		code = PTR_ERR(tp);
> >  		goto error_free_dquots;
> > @@ -1552,7 +1489,6 @@ xfs_ioc_setxflags(
> >  	struct fsxattr		fa;
> >  	struct fsxattr		old_fa;
> >  	unsigned int		flags;
> > -	int			join_flags = 0;
> >  	int			error;
> >  
> >  	if (copy_from_user(&flags, arg, sizeof(flags)))
> > @@ -1569,18 +1505,11 @@ xfs_ioc_setxflags(
> >  	if (error)
> >  		return error;
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> > +	error = xfs_ioctl_setattr_dax_invalidate(ip);
> >  	if (error)
> >  		goto out_drop_write;
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		error = PTR_ERR(tp);
> >  		goto out_drop_write;
> > -- 
> > 2.25.1
> >
Dave Chinner April 8, 2020, 9:09 p.m. UTC | #5
On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> > On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > > flag from the parent directory on creation only.  So no data
> > > invalidation needs to happen.
> > 
> > Which leads me to ask: how are users and/or admins supposed to
> > remove the flag from regular files once it is set in the filesystem?
> > 
> > Only being able to override the flag via the "dax=never" mount
> > option means that once the flag is set, nobody can ever remove it
> > and they can only globally turn off dax if it gets set incorrectly.
> > It also means a global interrupt because all apps on the filesystem
> > need to be stopped so the filesystem can be unmounted and mounted
> > again with dax=never. This is highly unfriendly to admins and users.
> > 
> > IOWs, we _must_ be able to clear this inode flag on regular inodes
> > in some way. I don't care if it doesn't change the current in-memory
> > state, but we must be able to clear the flags so that the next time
> > the inodes are instantiated DAX is not enabled for those files...
> 
> Well, there's one way to clear the flag: delete the file. If you still care
> about the data, you can copy the data first. It isn't very convenient, I
> agree, and effectively means restarting whatever application that is using
> the file.

Restarting the application is fine. Having to backup/restore or copy
the entire data set just to turn off an inode flag? That's not a
viable management strategy. We could be talking about terabytes of
data here.

I explained how we can safely remove the flag in the other branch of
this thread...

> But it seems like more understandable API than letting user clear
> the on-disk flag but the inode will still use DAX until kernel decides to
> evict the inode

Certainly doesn't seem that way to me. "stop app, clear flags, drop
caches, restart app" is a pretty simple, easy thing to do for an
admin.

Especially compared to a process that is effectively "stop app, backup
data set, delete data set, clear flags, restore data set, restart
app"

> - because that often means you need to restart the
> application using the file anyway for the flag change to have any effect.

That's a trivial requirement compared to the downtime and resource
cost of a data set backup/restore just to clear inode flags....

Cheers,

Dave.
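
[Editorial sketch] The "drop caches" step of the procedure Dave describes ("stop app, clear flags, drop caches, restart app") can be sketched as below, assuming the standard /proc/sys/vm/drop_caches control file, where writing 2 asks the kernel to reclaim unused dentries and inodes. The helper names write_ctl_value() and drop_inode_caches() are hypothetical. The write needs root, and inodes that are still in use are not evicted, which is why the application has to be stopped first.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write @value to the control file at @path.
 * Returns 0 on success or a negative errno.
 */
static int write_ctl_value(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t written;
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, value, len);
	if (written < 0)
		ret = -errno;
	else if ((size_t)written != len)
		ret = -EIO;	/* short write to a control file */

	close(fd);
	return ret;
}

/*
 * Hypothetical helper: flush dirty data, then ask the kernel to drop
 * unused dentries and inodes so they are re-read from disk on next use.
 */
int drop_inode_caches(void)
{
	sync();
	return write_ctl_value("/proc/sys/vm/drop_caches", "2\n");
}
```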
Ira Weiny April 8, 2020, 10:26 p.m. UTC | #6
On Thu, Apr 09, 2020 at 07:09:50AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> > On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> > > On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 
> > > > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > > > flag from the parent directory on creation only.  So no data
> > > > invalidation needs to happen.
> > > 
> > > Which leads me to ask: how are users and/or admins supposed to
> > > remove the flag from regular files once it is set in the filesystem?
> > > 
> > > Only being able to override the flag via the "dax=never" mount
> > > option means that once the flag is set, nobody can ever remove it
> > > and they can only globally turn off dax if it gets set incorrectly.
> > > It also means a global interrupt because all apps on the filesystem
> > > need to be stopped so the filesystem can be unmounted and mounted
> > > again with dax=never. This is highly unfriendly to admins and users.
> > > 
> > > IOWs, we _must_ be able to clear this inode flag on regular inodes
> > > in some way. I don't care if it doesn't change the current in-memory
> > > state, but we must be able to clear the flags so that the next time
> > > the inodes are instantiated DAX is not enabled for those files...
> > 
> > Well, there's one way to clear the flag: delete the file. If you still care
> > about the data, you can copy the data first. It isn't very convenient, I
> > agree, and effectively means restarting whatever application that is using
> > the file.
> 
> Restarting the application is fine. Having to backup/restore or copy
> the entire data set just to turn off an inode flag? That's not a
> viable management strategy. We could be talking about terabytes of
> data here.
> 
> I explained how we can safely remove the flag in the other branch of
> this thread...
> 
> > But it seems like more understandable API than letting user clear
> > the on-disk flag but the inode will still use DAX until kernel decides to
> > evict the inode
> 
> Certainly doesn't seem that way to me. "stop app, clear flags, drop
> caches, restart app" is a pretty simple, easy thing to do for an
> admin.

I want to be clear here: I think this is reasonable.  However, I don't see
consensus for that interface.

Christoph in particular said that a 'lazy change' is: "... straight from
the playbook for arcane and confusing API designs."

	"But returning an error and doing a lazy change anyway is straight from
	the playbook for arcane and confusing API designs."

	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/

Did I somehow misunderstand this?

Again for this patch set, 5.8, let's leave that alone for now.  I think if we
disable setting this on files right now we can still allow it in the future as
another step forward.

> 
> Especially compared to a process that is effectively "stop app, backup
> data set, delete data set, clear flags, restore data set, restart
> app"
> 
> > - because that often means you need to restart the
> > application using the file anyway for the flag change to have any effect.
> 
> That's a trivial requirement compared to the downtime and resource
> cost of a data set backup/restore just to clear inode flags....
> 

I agree but others do not.  This still provides a baby step forward and some
granularity for those who plan out the creation of their files.

Ira
Dave Chinner April 8, 2020, 11:48 p.m. UTC | #7
On Wed, Apr 08, 2020 at 03:26:36PM -0700, Ira Weiny wrote:
> On Thu, Apr 09, 2020 at 07:09:50AM +1000, Dave Chinner wrote:
> > On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> > I explained how we can safely remove the flag in the other branch of
> > this thread...
> > 
> > > But it seems like more understandable API than letting user clear
> > > the on-disk flag but the inode will still use DAX until kernel decides to
> > > evict the inode
> > 
> > Certainly doesn't seem that way to me. "stop app, clear flags, drop
> > caches, restart app" is a pretty simple, easy thing to do for an
> > admin.
> 
> I want to be clear here: I think this is reasonable.  However, I don't see
> consensus for that interface.
> 
> Christoph in particular said that a 'lazy change' is: "... straight from
> the playbook for arcane and confusing API designs."
> 
> 	"But returning an error and doing a lazy change anyway is straight from
> 	the playbook for arcane and confusing API designs."
> 
> 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> 
> Did I somehow misunderstand this?

Yes. Clearing the on-disk flag successfully should not return an
error.

What is wrong is having it clear the flag successfully and returning
an error because the operation doesn't take immediate effect, then
having the change take effect later after telling the application
there was an error.

That's what Christoph was saying is "straight from the playbook for
arcane and confusing API designs."

There's absolutely nothing wrong with setting/clearing the on-disk
flag and having the change take effect some time later depending on
some external context. We've done this sort of thing for a -long
time- and it's not XFS specific at all.

e.g.  changing the on-disk APPEND flag doesn't change the write
behaviour of currently open files - it only affects the behaviour of
future file opens. IOWs, we can have the flag set on disk, but we
can still write randomly to the inode as long as we have a file
descriptor that was opened before the APPEND on disk flag was set.

That's exactly the same class of behaviour as we are talking about
here for the on-disk DAX flag.

> > Especially compared to a process that is effectively "stop app, backup
> > data set, delete data set, clear flags, restore data set, restart
> > app"
> > 
> > > - because that often means you need to restart the
> > > application using the file anyway for the flag change to have any effect.
> > 
> > That's a trivial requirement compared to the downtime and resource
> > cost of a data set backup/restore just to clear inode flags....
> 
> I agree but others do not.  This still provides a baby step forward and some

It's not a baby step forward. We can't expose a behaviour to
userspace and then decide to change it completely at some later
date.  We have to think through the entire admin model before
setting it in concrete.

If an admin operation can set an optional persistent feature flag
on a file, then there *must* be admin operations that can remove
that persistent feature flag from said files. This has *nothing to
do with DAX* - it's a fundamental principle of balanced system
design.

Cheers,

Dave.
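
[Editorial sketch] The semantics argued for in this message can be modelled in a few lines: changing the persistent flag succeeds and reports success, the in-memory state of an already instantiated inode is left alone, and the new value takes effect the next time the inode is instantiated. Everything here (struct model_inode and the model_* functions) is a hypothetical userspace model for illustration, not XFS code.

```c
#include <stdbool.h>

struct model_inode {
	bool disk_dax;	/* persistent flag, as stored on disk */
	bool mem_dax;	/* effective state of the in-memory inode */
	bool in_core;	/* true while the inode is instantiated */
};

/*
 * Change the persistent flag. The operation succeeds, so it returns 0;
 * it deliberately does not touch the in-memory state.
 */
static int model_set_disk_dax(struct model_inode *ip, bool enable)
{
	ip->disk_dax = enable;
	return 0;
}

/* Evict the in-memory inode, e.g. after the last user has gone away. */
static void model_evict(struct model_inode *ip)
{
	ip->in_core = false;
}

/* Instantiate the inode: the effective state is taken from disk. */
static void model_instantiate(struct model_inode *ip)
{
	ip->mem_dax = ip->disk_dax;
	ip->in_core = true;
}
```

The design being rejected in the thread would correspond to model_set_disk_dax() updating disk_dax and still returning an error; the model returns 0 exactly because the on-disk change did happen.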
Christoph Hellwig April 9, 2020, 12:28 p.m. UTC | #8
On Thu, Apr 09, 2020 at 09:48:17AM +1000, Dave Chinner wrote:
> > Christoph in particular said that a 'lazy change' is: "... straight from
> > the playbook for arcane and confusing API designs."
> > 
> > 	"But returning an error and doing a lazy change anyway is straight from
> > 	the playbook for arcane and confusing API designs."
> > 
> > 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> > 
> > Did I somehow misunderstand this?
> 
> Yes. Clearing the on-disk flag successfully should not return an
> error.
> 
> What is wrong is having it clear the flag successfully and returning
> an error because the operation doesn't take immediate effect, then
> having the change take effect later after telling the application
> there was an error.
> 
> That's what Christoph was saying is "straight from the playbook for
> arcane and confusing API designs."

Yes.

> There's absolutely nothing wrong with setting/clearing the on-disk
> flag and having the change take effect some time later depending on
> some external context. We've done this sort of thing for a -long
> time- and it's not XFS specific at all.
> 
> e.g.  changing the on-disk APPEND flag doesn't change the write
> behaviour of currently open files - it only affects the behaviour of
> future file opens. IOWs, we can have the flag set on disk, but we
> can still write randomly to the inode as long as we have a file
> descriptor that was opened before the APPEND on disk flag was set.
> 
> That's exactly the same class of behaviour as we are talking about
> here for the on-disk DAX flag.

Some people consider that a bug, though.  But I don't think we can
change that now.  In general I don't think APIs that don't take
immediate effect are all that great, but in some cases we can live
with them if they are properly documented.  But APIs that return
an error, but actually take effect later anyway are just crazy.

Patch

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index c6cd92ef4a05..5472faab7c4f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1145,63 +1145,18 @@  xfs_ioctl_setattr_xflags(
 }
 
 /*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
+ * Only directories are allowed to change dax flags
  */
 static int
 xfs_ioctl_setattr_dax_invalidate(
-	struct xfs_inode	*ip,
-	struct fsxattr		*fa,
-	int			*join_flags)
+	struct xfs_inode	*ip)
 {
 	struct inode		*inode = VFS_I(ip);
-	struct super_block	*sb = inode->i_sb;
-	int			error;
-
-	*join_flags = 0;
-
-	/*
-	 * It is only valid to set the DAX flag on regular files and
-	 * directories on filesystems where the block size is equal to the page
-	 * size. On directories it serves as an inherited hint so we don't
-	 * have to check the device for dax support or flush pagecache.
-	 */
-	if (fa->fsx_xflags & FS_XFLAG_DAX) {
-		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-
-		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
-			return -EINVAL;
-	}
-
-	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
-		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
-		return 0;
-
-	if (S_ISDIR(inode->i_mode))
-		return 0;
 
-	/* lock, flush and invalidate mapping in preparation for flag change */
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	error = filemap_write_and_wait(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-	error = invalidate_inode_pages2(inode->i_mapping);
-	if (error)
-		goto out_unlock;
+	if (!S_ISDIR(inode->i_mode))
+		return -EINVAL;
 
-	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
 	return 0;
-
-out_unlock:
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	return error;
-
 }
 
 /*
@@ -1209,17 +1164,10 @@  xfs_ioctl_setattr_dax_invalidate(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip,
-	int			join_flags)
+	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -1236,8 +1184,7 @@  xfs_ioctl_setattr_get_trans(
 		goto out_unlock;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
-	join_flags = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1258,8 +1205,6 @@  xfs_ioctl_setattr_get_trans(
 out_cancel:
 	xfs_trans_cancel(tp);
 out_unlock:
-	if (join_flags)
-		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1386,7 +1331,6 @@  xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
-	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1410,18 +1354,11 @@  xfs_ioctl_setattr(
 			return code;
 	}
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
+	code = xfs_ioctl_setattr_dax_invalidate(ip);
 	if (code)
 		goto error_free_dquots;
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1552,7 +1489,6 @@  xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	struct fsxattr		old_fa;
 	unsigned int		flags;
-	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1569,18 +1505,11 @@  xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
+	error = xfs_ioctl_setattr_dax_invalidate(ip);
 	if (error)
 		goto out_drop_write;
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;