[v3,06/12] fs/xfs: Check if the inode supports DAX under lock

Message ID: 20200208193445.27421-7-ira.weiny@intel.com
State: New, archived
Series: Enable per-file/directory DAX operations V3

Commit Message

Ira Weiny Feb. 8, 2020, 7:34 p.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

One of the checks for an inode supporting DAX is if the inode is
reflinked.  During a non-DAX to DAX state change we could race with
the file being reflinked and end up with a reflinked file being in DAX
state.

Prevent this race by checking for DAX support under the MMAP_LOCK.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 fs/xfs/xfs_ioctl.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Dave Chinner Feb. 11, 2020, 6:16 a.m. UTC | #1
On Sat, Feb 08, 2020 at 11:34:39AM -0800, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> One of the checks for an inode supporting DAX is if the inode is
> reflinked.  During a non-DAX to DAX state change we could race with
> the file being reflinked and end up with a reflinked file being in DAX
> state.
> 
> Prevent this race by checking for DAX support under the MMAP_LOCK.

The on disk inode flags are protected by the XFS_ILOCK, not the
MMAP_LOCK. i.e. the MMAPLOCK provides data access serialisation, not
metadata modification serialisation.

> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  fs/xfs/xfs_ioctl.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index da1eb2bdb386..4ff402fd6636 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1194,10 +1194,6 @@ xfs_ioctl_setattr_dax_invalidate(
>  
>  	*join_flags = 0;
>  
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> -	    !xfs_inode_supports_dax(ip))
> -		return -EINVAL;
> -
>  	/* If the DAX state is not changing, we have nothing to do here. */
>  	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
>  	    (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> @@ -1211,6 +1207,13 @@ xfs_ioctl_setattr_dax_invalidate(
>  
>  	/* lock, flush and invalidate mapping in preparation for flag change */
>  	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> +
> +	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> +	    !xfs_inode_supports_dax(ip)) {
> +		error = -EINVAL;
> +		goto out_unlock;
> +	}

Yes, you might be able to get away with reflink vs dax flag
serialisation on the inode flag modification, but it is not correct and
leaves a landmine for future inode flag modifications that are done
without holding either the MMAP or IOLOCK.

e.g. concurrent calls to xfs_ioctl_setattr() setting/clearing flags
other than the on disk DAX flag are all serialised by the ILOCK_EXCL
and will not be serialised against this DAX check. Hence if there are
other flags that we add in future that affect the result of
xfs_inode_supports_dax(), this code will not be correctly
serialised.

This raciness in checking the DAX flags is the reason that
xfs_ioctl_setattr_xflags() redoes all the reflink vs dax checks once
it's called under the XFS_ILOCK_EXCL during the actual change
transaction....
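
To illustrate the point above, here is a minimal userspace sketch, not
XFS code. It assumes a hypothetical model_inode whose ilock mutex stands
in for XFS_ILOCK_EXCL, plus hypothetical helpers model_dax_precheck(),
model_set_dax() and model_reflink(). The model also assumes that reflink
refuses an inode that already has the DAX flag. An early check can only
be advisory; the check that counts is repeated under the same lock that
guards the flag update.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; this is not the XFS structure. */
struct model_inode {
	pthread_mutex_t ilock;	/* plays the role of XFS_ILOCK_EXCL */
	bool reflinked;
	bool dax;
};

static void model_inode_init(struct model_inode *ip)
{
	pthread_mutex_init(&ip->ilock, NULL);
	ip->reflinked = false;
	ip->dax = false;
}

/* Caller must hold ip->ilock for the answer to be stable. */
static bool model_supports_dax(const struct model_inode *ip)
{
	return !ip->reflinked;
}

/* Advisory check: the result may be stale as soon as the lock is dropped. */
static int model_dax_precheck(struct model_inode *ip)
{
	int error;

	pthread_mutex_lock(&ip->ilock);
	error = model_supports_dax(ip) ? 0 : -EINVAL;
	pthread_mutex_unlock(&ip->ilock);
	return error;
}

/* Authoritative path: check and flag update happen under one lock hold. */
static int model_set_dax(struct model_inode *ip)
{
	int error = 0;

	pthread_mutex_lock(&ip->ilock);
	if (!model_supports_dax(ip))
		error = -EINVAL;
	else
		ip->dax = true;
	/* Invariant of this model: never both DAX and reflinked. */
	assert(!(ip->dax && ip->reflinked));
	pthread_mutex_unlock(&ip->ilock);
	return error;
}

/* Competing metadata change, serialised by the same lock. */
static int model_reflink(struct model_inode *ip)
{
	int error = 0;

	pthread_mutex_lock(&ip->ilock);
	if (ip->dax)
		error = -EINVAL;
	else
		ip->reflinked = true;
	pthread_mutex_unlock(&ip->ilock);
	return error;
}
```

In this model, a model_reflink() that slips in between
model_dax_precheck() and model_set_dax() is still caught, because
model_set_dax() repeats the check while holding ilock.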

Cheers,

Dave.
Ira Weiny Feb. 11, 2020, 5:55 p.m. UTC | #2
On Tue, Feb 11, 2020 at 05:16:39PM +1100, Dave Chinner wrote:
> On Sat, Feb 08, 2020 at 11:34:39AM -0800, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > One of the checks for an inode supporting DAX is if the inode is
> > reflinked.  During a non-DAX to DAX state change we could race with
> > the file being reflinked and end up with a reflinked file being in DAX
> > state.
> > 
> > Prevent this race by checking for DAX support under the MMAP_LOCK.
> 
> The on disk inode flags are protected by the XFS_ILOCK, not the
> MMAP_LOCK. i.e. the MMAPLOCK provides data access serialisation, not
> metadata modification serialisation.

Ah...

> 
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  fs/xfs/xfs_ioctl.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index da1eb2bdb386..4ff402fd6636 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1194,10 +1194,6 @@ xfs_ioctl_setattr_dax_invalidate(
> >  
> >  	*join_flags = 0;
> >  
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> > -	    !xfs_inode_supports_dax(ip))
> > -		return -EINVAL;
> > -
> >  	/* If the DAX state is not changing, we have nothing to do here. */
> >  	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
> >  	    (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> > @@ -1211,6 +1207,13 @@ xfs_ioctl_setattr_dax_invalidate(
> >  
> >  	/* lock, flush and invalidate mapping in preparation for flag change */
> >  	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > +
> > +	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> > +	    !xfs_inode_supports_dax(ip)) {
> > +		error = -EINVAL;
> > +		goto out_unlock;
> > +	}
> 
> Yes, you might be able to get away with reflink vs dax flag
> serialisation on the inode flag modification, but it is not correct and
> leaves a landmine for future inode flag modifications that are done
> without holding either the MMAP or IOLOCK.
> 
> e.g. concurrent calls to xfs_ioctl_setattr() setting/clearing flags
> other than the on disk DAX flag are all serialised by the ILOCK_EXCL
> > and will not be serialised against this DAX check. Hence if there are
> other flags that we add in future that affect the result of
> xfs_inode_supports_dax(), this code will not be correctly
> serialised.
> 
> This raciness in checking the DAX flags is the reason that
> xfs_ioctl_setattr_xflags() redoes all the reflink vs dax checks once
> it's called under the XFS_ILOCK_EXCL during the actual change
> transaction....

Ok I found this by trying to make sure that the xfs_inode_supports_dax() call
was always returning valid data.  So I don't have a specific test which was
failing.

Looking at the code again, it sounds like I was wrong about which locks protect
what and with your explanation above it looks like there is nothing to be done
here and I can drop the patch.

Would you agree?

Thanks for the review!
Ira

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Feb. 11, 2020, 8:42 p.m. UTC | #3
On Tue, Feb 11, 2020 at 09:55:09AM -0800, Ira Weiny wrote:
> On Tue, Feb 11, 2020 at 05:16:39PM +1100, Dave Chinner wrote:
> > On Sat, Feb 08, 2020 at 11:34:39AM -0800, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > One of the checks for an inode supporting DAX is if the inode is
> > > reflinked.  During a non-DAX to DAX state change we could race with
> > > the file being reflinked and end up with a reflinked file being in DAX
> > > state.
> > > 
> > > Prevent this race by checking for DAX support under the MMAP_LOCK.
> > 
> > The on disk inode flags are protected by the XFS_ILOCK, not the
> > MMAP_LOCK. i.e. the MMAPLOCK provides data access serialisation, not
> > metadata modification serialisation.
> 
> Ah...
> 
> > 
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > ---
> > >  fs/xfs/xfs_ioctl.c | 11 +++++++----
> > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index da1eb2bdb386..4ff402fd6636 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -1194,10 +1194,6 @@ xfs_ioctl_setattr_dax_invalidate(
> > >  
> > >  	*join_flags = 0;
> > >  
> > > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> > > -	    !xfs_inode_supports_dax(ip))
> > > -		return -EINVAL;
> > > -
> > >  	/* If the DAX state is not changing, we have nothing to do here. */
> > >  	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > >  	    (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> > > @@ -1211,6 +1207,13 @@ xfs_ioctl_setattr_dax_invalidate(
> > >  
> > >  	/* lock, flush and invalidate mapping in preparation for flag change */
> > >  	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > +
> > > +	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
> > > +	    !xfs_inode_supports_dax(ip)) {
> > > +		error = -EINVAL;
> > > +		goto out_unlock;
> > > +	}
> > 
> > Yes, you might be able to get away with reflink vs dax flag
> > serialisation on the inode flag modification, but it is not correct and
> > leaves a landmine for future inode flag modifications that are done
> > without holding either the MMAP or IOLOCK.
> > 
> > e.g. concurrent calls to xfs_ioctl_setattr() setting/clearing flags
> > other than the on disk DAX flag are all serialised by the ILOCK_EXCL
> > > and will not be serialised against this DAX check. Hence if there are
> > other flags that we add in future that affect the result of
> > xfs_inode_supports_dax(), this code will not be correctly
> > serialised.
> > 
> > This raciness in checking the DAX flags is the reason that
> > xfs_ioctl_setattr_xflags() redoes all the reflink vs dax checks once
> > it's called under the XFS_ILOCK_EXCL during the actual change
> > transaction....
> 
> Ok I found this by trying to make sure that the xfs_inode_supports_dax() call
> was always returning valid data.  So I don't have a specific test which was
> failing.
> 
> Looking at the code again, it sounds like I was wrong about which locks protect
> what and with your explanation above it looks like there is nothing to be done
> here and I can drop the patch.
> 
> Would you agree?

*nod*

Cheers,

Dave.
Ira Weiny Feb. 12, 2020, 4:10 p.m. UTC | #4
On Wed, Feb 12, 2020 at 07:42:20AM +1100, Dave Chinner wrote:
> On Tue, Feb 11, 2020 at 09:55:09AM -0800, Ira Weiny wrote:
> > On Tue, Feb 11, 2020 at 05:16:39PM +1100, Dave Chinner wrote:
> > > On Sat, Feb 08, 2020 at 11:34:39AM -0800, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 

[snip]

> > > 
> > > This raciness in checking the DAX flags is the reason that
> > > xfs_ioctl_setattr_xflags() redoes all the reflink vs dax checks once
> > > it's called under the XFS_ILOCK_EXCL during the actual change
> > > transaction....
> > 
> > Ok I found this by trying to make sure that the xfs_inode_supports_dax() call
> > was always returning valid data.  So I don't have a specific test which was
> > failing.
> > 
> > Looking at the code again, it sounds like I was wrong about which locks protect
> > what and with your explanation above it looks like there is nothing to be done
> > here and I can drop the patch.
> > 
> > Would you agree?
> 
> *nod*

Thanks! Done.
Ira

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Patch

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index da1eb2bdb386..4ff402fd6636 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1194,10 +1194,6 @@  xfs_ioctl_setattr_dax_invalidate(
 
 	*join_flags = 0;
 
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
-	    !xfs_inode_supports_dax(ip))
-		return -EINVAL;
-
 	/* If the DAX state is not changing, we have nothing to do here. */
 	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
 	    (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
@@ -1211,6 +1207,13 @@  xfs_ioctl_setattr_dax_invalidate(
 
 	/* lock, flush and invalidate mapping in preparation for flag change */
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+
+	if ((fa->fsx_xflags & FS_XFLAG_DAX) == FS_XFLAG_DAX &&
+	    !xfs_inode_supports_dax(ip)) {
+		error = -EINVAL;
+		goto out_unlock;
+	}
+
 	error = filemap_write_and_wait(inode->i_mapping);
 	if (error)
 		goto out_unlock;
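
The second hunk turns the early "return -EINVAL" into "goto out_unlock"
because the check now runs after xfs_ilock() has been called. A minimal
userspace sketch of that error-path pattern follows. It is not the
kernel function: model_file, model_file_init() and model_prepare_dax()
are hypothetical names, a pthread mutex stands in for the inode locks,
and the sketch releases the lock on every path, which may differ from
what the real function does on success.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the locked object; not an XFS structure. */
struct model_file {
	pthread_mutex_t lock;
	bool supports_dax;
	bool flushed;
	int held;		/* how many times the lock is currently held */
};

static void model_file_init(struct model_file *f, bool supports_dax)
{
	pthread_mutex_init(&f->lock, NULL);
	f->supports_dax = supports_dax;
	f->flushed = false;
	f->held = 0;
}

static int model_prepare_dax(struct model_file *f, bool want_dax)
{
	int error = 0;

	pthread_mutex_lock(&f->lock);
	f->held++;

	/*
	 * The lock is held here, so a plain "return -EINVAL" would leak it.
	 * Record the error and jump to the common unlock label instead.
	 */
	if (want_dax && !f->supports_dax) {
		error = -EINVAL;
		goto out_unlock;
	}

	/* Stand-in for the flush and invalidate work done under the lock. */
	f->flushed = true;

out_unlock:
	f->held--;
	pthread_mutex_unlock(&f->lock);
	return error;
}
```

The held counter exists only so a caller can confirm that the lock is
released on the failure path as well as on the success path.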