Message ID | 163961698851.3129691.1262560189729839928.stgit@magnolia (mailing list archive) |
---|---|
State | Accepted, archived |
Series | xfs: random fixes for 5.17 |
On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Ian Kent reported that for inline symlinks, it's possible for
> vfs_readlink to hang on to the target buffer returned by
> xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
> This is a layering violation -- we should never expose XFS internals to
> the VFS.
>
> When the symlink has a remote target, we allocate a separate buffer,
> copy the internal information, and let the VFS manage the new buffer's
> lifetime.  Let's adapt the inline code paths to do this too.  It's
> less efficient, but fixes the layering violation and avoids the need to
> adapt the if_data lifetime to rcu rules.  Clearly I don't care about
> readlink benchmarks.
>
> As a side note, this fixes the minor locking violation where we can
> access the inode data fork without taking any locks; proper locking (and
> eliminating the possibility of having to switch inode_operations on a
> live inode) is essential to online repair coordinating repairs
> correctly.
>
> Reported-by: Ian Kent <raven@themaw.net>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks fine, nicely avoids all the nasty RCU interactions trying to
handle this after the fact.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
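The fix keeps a single ->get_link path that copies the symlink target into a heap buffer and hands ownership of that buffer to the VFS. For readers who want the shape of that pattern, here is a minimal sketch, assuming the usual VFS helpers (kmalloc(), set_delayed_call(), kfree_link()) and glossing over details of the real xfs_vn_get_link():

```c
/*
 * Sketch only: a ->get_link that gives the VFS a buffer it owns.
 * Assumes the usual XFS/VFS headers; not a verbatim copy of the tree.
 */
static const char *
sketch_get_link(
	struct dentry		*dentry,
	struct inode		*inode,
	struct delayed_call	*done)
{
	char			*link;
	int			error;

	/* RCU-walk passes a NULL dentry; we may block, so drop to ref-walk. */
	if (!dentry)
		return ERR_PTR(-ECHILD);

	link = kmalloc(MAXPATHLEN + 1, GFP_KERNEL);
	if (!link)
		return ERR_PTR(-ENOMEM);

	/* Copy the target under the ilock; covers local and remote formats. */
	error = xfs_readlink(XFS_I(d_inode(dentry)), link);
	if (error) {
		kfree(link);
		return ERR_PTR(error);
	}

	/* From here on the VFS owns the buffer and frees it via kfree_link(). */
	set_delayed_call(done, kfree_link, link);
	return link;
}
```

Because if_data never escapes XFS this way, inode reclaim can free the data fork without racing against a VFS caller that still holds the pointer, which is the hang Ian reported. The VFS calls ->get_link with a NULL dentry during RCU-walk, and returning -ECHILD there is the switch to ref-walk mode that Ian mentions in his ack below.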
On Thu, 2021-12-16 at 16:11 +1100, Dave Chinner wrote:
> On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Ian Kent reported that for inline symlinks, it's possible for
> > vfs_readlink to hang on to the target buffer returned by
> > xfs_vn_get_link_inline long after it's been freed by xfs inode
> > reclaim.
> > This is a layering violation -- we should never expose XFS internals
> > to the VFS.
> >
> > When the symlink has a remote target, we allocate a separate buffer,
> > copy the internal information, and let the VFS manage the new buffer's
> > lifetime.  Let's adapt the inline code paths to do this too.  It's
> > less efficient, but fixes the layering violation and avoids the need
> > to adapt the if_data lifetime to rcu rules.  Clearly I don't care
> > about readlink benchmarks.
> >
> > As a side note, this fixes the minor locking violation where we can
> > access the inode data fork without taking any locks; proper locking
> > (and eliminating the possibility of having to switch inode_operations
> > on a live inode) is essential to online repair coordinating repairs
> > correctly.
> >
> > Reported-by: Ian Kent <raven@themaw.net>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
> Looks fine, nicely avoids all the nasty RCU interactions trying to
> handle this after the fact.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Yes, I like it too, and the original rcu gymnastics were always due to
the unclear ownership and lifetime of the link path buffer.

And I don't think needing to switch to ref-walk mode (due to the memory
allocation possibly blocking) is as big a performance drawback as one
might think.

Acked-by: Ian Kent <raven@themaw.net>
On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Ian Kent reported that for inline symlinks, it's possible for
> vfs_readlink to hang on to the target buffer returned by
> xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
> This is a layering violation -- we should never expose XFS internals to
> the VFS.
>
> When the symlink has a remote target, we allocate a separate buffer,
> copy the internal information, and let the VFS manage the new buffer's
> lifetime.  Let's adapt the inline code paths to do this too.  It's
> less efficient, but fixes the layering violation and avoids the need to
> adapt the if_data lifetime to rcu rules.  Clearly I don't care about
> readlink benchmarks.
>
> As a side note, this fixes the minor locking violation where we can
> access the inode data fork without taking any locks; proper locking (and
> eliminating the possibility of having to switch inode_operations on a
> live inode) is essential to online repair coordinating repairs
> correctly.
>
> Reported-by: Ian Kent <raven@themaw.net>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

This should also allow us to avoid the magic overallocation in
xfs_init_local_fork.
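For context on that last remark: the "magic overallocation" is the extra byte reserved when a symlink body is copied into the local (incore) data fork, so that if_data is always NUL-terminated and could be handed to the VFS as a C string. The pattern looks roughly like the sketch below; the names (example_fork, example_init_local_symlink) are hypothetical stand-ins, not the real xfs_init_local_fork():

```c
/*
 * Illustrative sketch only: why a local-format symlink fork gets one
 * extra byte. Simplified, hypothetical types and names.
 */
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>

struct example_fork {
	char	*if_data;	/* in-memory copy of the inline data */
	int	if_bytes;	/* length of the symlink body, no NUL */
};

static int example_init_local_symlink(struct example_fork *ifp,
				      const char *target, int size)
{
	/*
	 * Overallocate by one byte so the in-memory copy is always
	 * NUL-terminated: the old inline ->get_link handed this buffer
	 * straight to the VFS, which expects a C string.
	 */
	ifp->if_data = kmalloc(size + 1, GFP_NOFS);
	if (!ifp->if_data)
		return -ENOMEM;
	memcpy(ifp->if_data, target, size);
	ifp->if_data[size] = '\0';
	ifp->if_bytes = size;
	return 0;
}
```

The posted patch still memcpy()s i_disk_size + 1 bytes in xfs_readlink(), so it continues to rely on that terminator for now; dropping the overallocation would presumably mean copying i_disk_size bytes and NUL-terminating the destination buffer instead.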
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4..72bdd7c79e93 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -511,27 +511,6 @@ xfs_vn_get_link(
 	return ERR_PTR(error);
 }
 
-STATIC const char *
-xfs_vn_get_link_inline(
-	struct dentry		*dentry,
-	struct inode		*inode,
-	struct delayed_call	*done)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	char			*link;
-
-	ASSERT(ip->i_df.if_format == XFS_DINODE_FMT_LOCAL);
-
-	/*
-	 * The VFS crashes on a NULL pointer, so return -EFSCORRUPTED if
-	 * if_data is junk.
-	 */
-	link = ip->i_df.if_u1.if_data;
-	if (XFS_IS_CORRUPT(ip->i_mount, !link))
-		return ERR_PTR(-EFSCORRUPTED);
-	return link;
-}
-
 static uint32_t
 xfs_stat_blksize(
 	struct xfs_inode	*ip)
@@ -1250,14 +1229,6 @@ static const struct inode_operations xfs_symlink_inode_operations = {
 	.update_time		= xfs_vn_update_time,
 };
 
-static const struct inode_operations xfs_inline_symlink_inode_operations = {
-	.get_link		= xfs_vn_get_link_inline,
-	.getattr		= xfs_vn_getattr,
-	.setattr		= xfs_vn_setattr,
-	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
-};
-
 /* Figure out if this file actually supports DAX. */
 static bool
 xfs_inode_supports_dax(
@@ -1408,10 +1379,7 @@ xfs_setup_iops(
 		inode->i_fop = &xfs_dir_file_operations;
 		break;
 	case S_IFLNK:
-		if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL)
-			inode->i_op = &xfs_inline_symlink_inode_operations;
-		else
-			inode->i_op = &xfs_symlink_inode_operations;
+		inode->i_op = &xfs_symlink_inode_operations;
 		break;
 	default:
 		inode->i_op = &xfs_inode_operations;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index fc2c6a404647..4868bd986f52 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -22,6 +22,7 @@
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 #include "xfs_ialloc.h"
+#include "xfs_error.h"
 
 /* ----- Kernel only functions below ----- */
 int
@@ -101,12 +102,10 @@ xfs_readlink(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fsize_t		pathlen;
-	int			error = 0;
+	int			error = -EFSCORRUPTED;
 
 	trace_xfs_readlink(ip);
 
-	ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL);
-
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
@@ -121,12 +120,22 @@ xfs_readlink(
 			 __func__, (unsigned long long) ip->i_ino,
 			 (long long) pathlen);
 		ASSERT(0);
-		error = -EFSCORRUPTED;
 		goto out;
 	}
 
+	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
+		/*
+		 * The VFS crashes on a NULL pointer, so return -EFSCORRUPTED if
+		 * if_data is junk.
+		 */
+		if (XFS_IS_CORRUPT(ip->i_mount, !ip->i_df.if_u1.if_data))
+			goto out;
 
-	error = xfs_readlink_bmap_ilocked(ip, link);
+		memcpy(link, ip->i_df.if_u1.if_data, ip->i_disk_size + 1);
+		error = 0;
+	} else {
+		error = xfs_readlink_bmap_ilocked(ip, link);
+	}
 
 out:
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);