Message ID | 20170925231404.32723-8-ross.zwisler@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Sep 25, 2017 at 05:14:04PM -0600, Ross Zwisler wrote: > Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when > any mappings are present. > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > --- > fs/xfs/xfs_ioctl.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c > index 386b437..7a24dd5 100644 > --- a/fs/xfs/xfs_ioctl.c > +++ b/fs/xfs/xfs_ioctl.c > @@ -1012,12 +1012,10 @@ xfs_diflags_to_linux( > inode->i_flags |= S_NOATIME; > else > inode->i_flags &= ~S_NOATIME; > -#if 0 /* disabled until the flag switching races are sorted out */ > if ((xflags & FS_XFLAG_DAX) || (ip->i_mount->m_flags & XFS_MOUNT_DAX)) > inode->i_flags |= S_DAX; > else > inode->i_flags &= ~S_DAX; > -#endif > } > > static bool > @@ -1049,6 +1047,8 @@ xfs_ioctl_setattr_xflags( > { > struct xfs_mount *mp = ip->i_mount; > uint64_t di_flags2; > + struct address_space *mapping = VFS_I(ip)->i_mapping; > + bool dax_changing; > > /* Can't change realtime flag if any extents are allocated. */ > if ((ip->i_d.di_nextents || ip->i_delayed_blks) && > @@ -1084,10 +1084,23 @@ xfs_ioctl_setattr_xflags( > if (di_flags2 && ip->i_d.di_version < 3) > return -EINVAL; > > + dax_changing = xfs_is_dax_state_changing(fa->fsx_xflags, ip); > + if (dax_changing) { > + i_mmap_lock_read(mapping); > + if (mapping_mapped(mapping)) { > + i_mmap_unlock_read(mapping); > + return -EBUSY; > + } > + } > + > ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags); > ip->i_d.di_flags2 = di_flags2; > > xfs_diflags_to_linux(ip); > + > + if (dax_changing) > + i_mmap_unlock_read(mapping); Is this safe to be taking here under the ILOCK_EXCL? i.e. this is the lock order here: IOLOCK_EXCL -> MMAPLOCK_EXCL -> ILOCK_EXCL -> i_mmap_rwsem The truncate path must run outside the ILOCK context, and it does this order via unmap_mapping_range: IOLOCK_EXCL -> MMAPLOCK_EXCL -> i_mmap_rwsem On a page fault, we do: mmap_sem -> MMAPLOCK_EXCL -> page lock -> ILOCK_EXCL Which gives the order IOLOCK_EXCL -> mmap_sem -> MMAPLOCK_EXCL -> page lock -> ILOCK_EXCL -> i_mmap_rwsem What I'm not clear on is what the orders between page locks and pte locks and i_mapping_tree_lock and i_mmap_rwsem. If there's any locks that the filesystem can take above the ILOCK that are also taken under the i_mmap_rwsem, then we have a deadlock vector. Historically we've avoided any mm/ level interactions under the ILOCK_EXCL because of it's location in the page fault path locking order (e.g. lockdep will go nuts if we take a page fault with the ILOCK held). Hence I'm extremely wary of putting any other mm/ level locks under the ILOCK like this without a clear explanation of the locking orders and why it won't deadlock.... Cheers, Dave.
On Mon, Sep 25, 2017 at 05:14:04PM -0600, Ross Zwisler wrote: > Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when > any mappings are present. Before we re-enable it please come up with a coherent description of the per-inode DAX flag that makes sense to a user. We'll also need to find a good place to document it, e.g. a new ioctl_setflags man page.
On Tue, Sep 26, 2017 at 08:36:11AM +0200, Christoph Hellwig wrote: > On Mon, Sep 25, 2017 at 05:14:04PM -0600, Ross Zwisler wrote: > > Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when > > any mappings are present. > > Before we re-enable it please come up with a coherent description > of the per-inode DAX flag that makes sense to a user. We'll also need > to find a good place to document it, e.g. a new ioctl_setflags man > page. I agree that documentation is a great place to start, if we can just agree on what we want the functionality to be. :)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 386b437..7a24dd5 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1012,12 +1012,10 @@ xfs_diflags_to_linux( inode->i_flags |= S_NOATIME; else inode->i_flags &= ~S_NOATIME; -#if 0 /* disabled until the flag switching races are sorted out */ if ((xflags & FS_XFLAG_DAX) || (ip->i_mount->m_flags & XFS_MOUNT_DAX)) inode->i_flags |= S_DAX; else inode->i_flags &= ~S_DAX; -#endif } static bool @@ -1049,6 +1047,8 @@ xfs_ioctl_setattr_xflags( { struct xfs_mount *mp = ip->i_mount; uint64_t di_flags2; + struct address_space *mapping = VFS_I(ip)->i_mapping; + bool dax_changing; /* Can't change realtime flag if any extents are allocated. */ if ((ip->i_d.di_nextents || ip->i_delayed_blks) && @@ -1084,10 +1084,23 @@ xfs_ioctl_setattr_xflags( if (di_flags2 && ip->i_d.di_version < 3) return -EINVAL; + dax_changing = xfs_is_dax_state_changing(fa->fsx_xflags, ip); + if (dax_changing) { + i_mmap_lock_read(mapping); + if (mapping_mapped(mapping)) { + i_mmap_unlock_read(mapping); + return -EBUSY; + } + } + ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags); ip->i_d.di_flags2 = di_flags2; xfs_diflags_to_linux(ip); + + if (dax_changing) + i_mmap_unlock_read(mapping); + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg);
Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when any mappings are present. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> --- fs/xfs/xfs_ioctl.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)