
[6/7] xfs: don't expose internal symlink metadata buffers to the vfs

Message ID 163961698851.3129691.1262560189729839928.stgit@magnolia (mailing list archive)
State Accepted, archived
Series xfs: random fixes for 5.17

Commit Message

Darrick J. Wong Dec. 16, 2021, 1:09 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Ian Kent reported that for inline symlinks, it's possible for
vfs_readlink to hang on to the target buffer returned by
xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
This is a layering violation -- we should never expose XFS internals to
the VFS.

When the symlink has a remote target, we allocate a separate buffer,
copy the internal information, and let the VFS manage the new buffer's
lifetime.  Let's adapt the inline code paths to do this too.  It's
less efficient, but fixes the layering violation and avoids the need to
adapt the if_data lifetime to rcu rules.  Clearly I don't care about
readlink benchmarks.

As a side note, this fixes the minor locking violation where we can
access the inode data fork without taking any locks; proper locking (and
eliminating the possibility of having to switch inode_operations on a
live inode) is essential to online repair coordinating repairs
correctly.

Reported-by: Ian Kent <raven@themaw.net>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iops.c    |   34 +---------------------------------
 fs/xfs/xfs_symlink.c |   19 ++++++++++++++-----
 2 files changed, 15 insertions(+), 38 deletions(-)

Comments

Dave Chinner Dec. 16, 2021, 5:11 a.m. UTC | #1
On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Ian Kent reported that for inline symlinks, it's possible for
> vfs_readlink to hang on to the target buffer returned by
> xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
> This is a layering violation -- we should never expose XFS internals to
> the VFS.
> 
> When the symlink has a remote target, we allocate a separate buffer,
> copy the internal information, and let the VFS manage the new buffer's
> lifetime.  Let's adapt the inline code paths to do this too.  It's
> less efficient, but fixes the layering violation and avoids the need to
> adapt the if_data lifetime to rcu rules.  Clearly I don't care about
> readlink benchmarks.
> 
> As a side note, this fixes the minor locking violation where we can
> access the inode data fork without taking any locks; proper locking (and
> eliminating the possibility of having to switch inode_operations on a
> live inode) is essential to online repair coordinating repairs
> correctly.
> 
> Reported-by: Ian Kent <raven@themaw.net>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks fine, nicely avoids all the nasty RCU interactions trying to
handle this after the fact.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Ian Kent Dec. 17, 2021, 2:58 a.m. UTC | #2
On Thu, 2021-12-16 at 16:11 +1100, Dave Chinner wrote:
> On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Ian Kent reported that for inline symlinks, it's possible for
> > vfs_readlink to hang on to the target buffer returned by
> > xfs_vn_get_link_inline long after it's been freed by xfs inode
> > reclaim.
> > This is a layering violation -- we should never expose XFS
> > internals to
> > the VFS.
> > 
> > When the symlink has a remote target, we allocate a separate
> > buffer,
> > copy the internal information, and let the VFS manage the new
> > buffer's
> > lifetime.  Let's adapt the inline code paths to do this too.  It's
> > less efficient, but fixes the layering violation and avoids the
> > need to
> > adapt the if_data lifetime to rcu rules.  Clearly I don't care
> > about
> > readlink benchmarks.
> > 
> > As a side note, this fixes the minor locking violation where we can
> > access the inode data fork without taking any locks; proper locking
> > (and
> > eliminating the possibility of having to switch inode_operations on
> > a
> > live inode) is essential to online repair coordinating repairs
> > correctly.
> > 
> > Reported-by: Ian Kent <raven@themaw.net>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Looks fine, nicely avoids all the nasty RCU interactions trying to
> handle this after the fact.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 

Yes, I like it too; the original rcu gymnastics were always due
to the unclear ownership and lifetime of the link path buffer.

And I don't think needing to switch to ref-walk mode (because the
memory allocation may block) is as big a performance drawback as
one might expect.

Acked-by: Ian Kent <raven@themaw.net>
Christoph Hellwig Dec. 24, 2021, 7:22 a.m. UTC | #3
On Wed, Dec 15, 2021 at 05:09:48PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Ian Kent reported that for inline symlinks, it's possible for
> vfs_readlink to hang on to the target buffer returned by
> xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
> This is a layering violation -- we should never expose XFS internals to
> the VFS.
> 
> When the symlink has a remote target, we allocate a separate buffer,
> copy the internal information, and let the VFS manage the new buffer's
> lifetime.  Let's adapt the inline code paths to do this too.  It's
> less efficient, but fixes the layering violation and avoids the need to
> adapt the if_data lifetime to rcu rules.  Clearly I don't care about
> readlink benchmarks.
> 
> As a side note, this fixes the minor locking violation where we can
> access the inode data fork without taking any locks; proper locking (and
> eliminating the possibility of having to switch inode_operations on a
> live inode) is essential to online repair coordinating repairs
> correctly.
> 
> Reported-by: Ian Kent <raven@themaw.net>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

This should also allow us to avoid the magic overallocation in
xfs_init_local_fork.

Patch

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4..72bdd7c79e93 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -511,27 +511,6 @@  xfs_vn_get_link(
 	return ERR_PTR(error);
 }
 
-STATIC const char *
-xfs_vn_get_link_inline(
-	struct dentry		*dentry,
-	struct inode		*inode,
-	struct delayed_call	*done)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	char			*link;
-
-	ASSERT(ip->i_df.if_format == XFS_DINODE_FMT_LOCAL);
-
-	/*
-	 * The VFS crashes on a NULL pointer, so return -EFSCORRUPTED if
-	 * if_data is junk.
-	 */
-	link = ip->i_df.if_u1.if_data;
-	if (XFS_IS_CORRUPT(ip->i_mount, !link))
-		return ERR_PTR(-EFSCORRUPTED);
-	return link;
-}
-
 static uint32_t
 xfs_stat_blksize(
 	struct xfs_inode	*ip)
@@ -1250,14 +1229,6 @@  static const struct inode_operations xfs_symlink_inode_operations = {
 	.update_time		= xfs_vn_update_time,
 };
 
-static const struct inode_operations xfs_inline_symlink_inode_operations = {
-	.get_link		= xfs_vn_get_link_inline,
-	.getattr		= xfs_vn_getattr,
-	.setattr		= xfs_vn_setattr,
-	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
-};
-
 /* Figure out if this file actually supports DAX. */
 static bool
 xfs_inode_supports_dax(
@@ -1408,10 +1379,7 @@  xfs_setup_iops(
 		inode->i_fop = &xfs_dir_file_operations;
 		break;
 	case S_IFLNK:
-		if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL)
-			inode->i_op = &xfs_inline_symlink_inode_operations;
-		else
-			inode->i_op = &xfs_symlink_inode_operations;
+		inode->i_op = &xfs_symlink_inode_operations;
 		break;
 	default:
 		inode->i_op = &xfs_inode_operations;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index fc2c6a404647..4868bd986f52 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -22,6 +22,7 @@ 
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 #include "xfs_ialloc.h"
+#include "xfs_error.h"
 
 /* ----- Kernel only functions below ----- */
 int
@@ -101,12 +102,10 @@  xfs_readlink(
 {
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_fsize_t	pathlen;
-	int		error = 0;
+	int		error = -EFSCORRUPTED;
 
 	trace_xfs_readlink(ip);
 
-	ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL);
-
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
@@ -121,12 +120,22 @@  xfs_readlink(
 			 __func__, (unsigned long long) ip->i_ino,
 			 (long long) pathlen);
 		ASSERT(0);
-		error = -EFSCORRUPTED;
 		goto out;
 	}
 
+	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
+		/*
+		 * The VFS crashes on a NULL pointer, so return -EFSCORRUPTED if
+		 * if_data is junk.
+		 */
+		if (XFS_IS_CORRUPT(ip->i_mount, !ip->i_df.if_u1.if_data))
+			goto out;
 
-	error = xfs_readlink_bmap_ilocked(ip, link);
+		memcpy(link, ip->i_df.if_u1.if_data, ip->i_disk_size + 1);
+		error = 0;
+	} else {
+		error = xfs_readlink_bmap_ilocked(ip, link);
+	}
 
  out:
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);