[2/5] xfs: fix memory corruption during remote attr value buffer invalidation
diff mbox series

Message ID 157859549406.164065.17179006268680393660.stgit@magnolia
State Superseded
Headers show
Series
  • xfs: fix buf log item memory corruption on non-amd64
Related show

Commit Message

Darrick J. Wong Jan. 9, 2020, 6:44 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

While running generic/103, I observed what looks like memory corruption
and (with slub debugging turned on) a slub redzone warning on i386 when
inactivating an inode with a 64k remote attr value.

On a v5 filesystem, maximally sized remote attr values require one block
more than 64k worth of space to hold both the remote attribute value
header (64 bytes).  On a 4k block filesystem this results in a 68k
buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
that even though we'll never use more than 65,600 bytes of this buffer,
XFS_MAX_BLOCKSIZE is 64k.

This is a problem because the definition of struct xfs_buf_log_format
allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
the dirty map, writing right off the end of the log item and corrupting
memory.  We've gotten away with this on x86_64 for years because the
compiler inserts a u32 padding on the end of struct xfs_buf_log_format.

Fortunately for us, remote attribute values are written to disk with
xfs_bwrite(), which is to say that they are not logged.  Fix the problem
by removing all places where we could end up creating a buffer log item
for a remote attribute value and leave a note explaining why.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   21 +++++++++++++++++++--
 fs/xfs/xfs_attr_inactive.c      |   34 ++++++----------------------------
 2 files changed, 25 insertions(+), 30 deletions(-)

Comments

Christoph Hellwig Jan. 10, 2020, 11:57 a.m. UTC | #1
> While running generic/103, I observed what looks like memory corruption
> and (with slub debugging turned on) a slub redzone warning on i386 when
> inactivating an inode with a 64k remote attr value.
> 
> On a v5 filesystem, maximally sized remote attr values require one block
> more than 64k worth of space to hold both the remote attribute value
> header (64 bytes).  On a 4k block filesystem this results in a 68k
> buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
> that even though we'll never use more than 65,600 bytes of this buffer,
> XFS_MAX_BLOCKSIZE is 64k.
> 
> This is a problem because the definition of struct xfs_buf_log_format
> allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
> invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
> the dirty map, writing right off the end of the log item and corrupting
> memory.  We've gotten away with this on x86_64 for years because the
> compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
> 
> Fortunately for us, remote attribute values are written to disk with
> xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> by removing all places where we could end up creating a buffer log item
> for a remote attribute value and leave a note explaining why.

I think this changelog needs an explanation why using
xfs_attr_rmtval_stale which just trylock and checks if the buffers are
in core only in xfs_attr3_leaf_freextent is fine.  And while the incore
part looks sane to me, I think the trylock is wrong and we need to pass
the locking flag to xfs_attr_rmtval_stale.
Darrick J. Wong Jan. 14, 2020, 12:59 a.m. UTC | #2
On Fri, Jan 10, 2020 at 03:57:42AM -0800, Christoph Hellwig wrote:
> > While running generic/103, I observed what looks like memory corruption
> > and (with slub debugging turned on) a slub redzone warning on i386 when
> > inactivating an inode with a 64k remote attr value.
> > 
> > On a v5 filesystem, maximally sized remote attr values require one block
> > more than 64k worth of space to hold both the remote attribute value
> > header (64 bytes).  On a 4k block filesystem this results in a 68k
> > buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
> > that even though we'll never use more than 65,600 bytes of this buffer,
> > XFS_MAX_BLOCKSIZE is 64k.
> > 
> > This is a problem because the definition of struct xfs_buf_log_format
> > allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
> > invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
> > the dirty map, writing right off the end of the log item and corrupting
> > memory.  We've gotten away with this on x86_64 for years because the
> > compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
> > 
> > Fortunately for us, remote attribute values are written to disk with
> > xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> > by removing all places where we could end up creating a buffer log item
> > for a remote attribute value and leave a note explaining why.
> 
> I think this changelog needs an explanation why using
> xfs_attr_rmtval_stale which just trylock and checks if the buffers are
> in core only in xfs_attr3_leaf_freextent is fine.

Hmm, maybe the *function* needs an explanation for why it's valid to use
rmtval_stale here:

/*
 * Invalidate any incore buffers associated with this remote attribute
 * value extent.   We never log remote attribute value buffers, which
 * means that they won't be attached to a transaction and are therefore
 * safe to mark stale.  The actual bunmapi will be taken care of later.
 */
STATIC int
xfs_attr3_rmt_inactive(
	struct xfs_inode	*dp,
	struct xfs_attr_inactive_list *lp)

I say the function is also confusingly named and could have its
arguments list trimmed.

> And while the incore
> part looks sane to me, I think the trylock is wrong and we need to pass
> the locking flag to xfs_attr_rmtval_stale.

Hmm, yeah, it's definitely wrong for the inactivation case -- the
inactivation thread holds the only incore reference to this unlinked
inode, so the buffer had better not be locked by another thread.

I'm not even all that sure why XBF_TRYLOCK is necessary in the remove
case, since we hold ILOCK_EXCL and there shouldn't be anyone else
messing around inside the xattr data.

--D

Patch
diff mbox series

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 6babdaac6057..1765f5351f9e 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -25,6 +25,23 @@ 
 
 #define ATTR_RMTVALUE_MAPSIZE	1	/* # of map entries at once */
 
+/*
+ * Remote Attribute Values
+ * =======================
+ *
+ * Remote extended attribute values are conceptually simple -- they're written
+ * to data blocks mapped by an inode's attribute fork, and they have an upper
+ * size limit of 64k.  Setting a value does not involve the XFS log.
+ *
+ * However, on a v5 filesystem, maximally sized remote attr values require one
+ * block more than 64k worth of space to hold both the remote attribute value
+ * header (64 bytes).  On a 4k block filesystem this results in a 68k buffer;
+ * on a 64k block filesystem, this would be a 128k buffer.  Note that the log
+ * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k).
+ * Therefore, we /must/ ensure that remote attribute value buffers never touch
+ * the logging system and therefore never have a log item.
+ */
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -401,7 +418,7 @@  xfs_attr_rmtval_get(
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
-			error = xfs_trans_read_buf(mp, args->trans,
+			error = xfs_trans_read_buf(mp, NULL,
 						   mp->m_ddev_targp,
 						   dblkno, dblkcnt, 0, &bp,
 						   &xfs_attr3_rmt_buf_ops);
@@ -411,7 +428,7 @@  xfs_attr_rmtval_get(
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
-			xfs_trans_brelse(args->trans, bp);
+			xfs_buf_relse(bp);
 			if (error)
 				return error;
 
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 5ff49523d8ea..5c83dd3611af 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -30,17 +30,13 @@ 
  */
 STATIC int
 xfs_attr3_leaf_freextent(
-	struct xfs_trans	**trans,
 	struct xfs_inode	*dp,
 	xfs_dablk_t		blkno,
 	int			blkcnt)
 {
 	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
 	xfs_dablk_t		tblkno;
-	xfs_daddr_t		dblkno;
 	int			tblkcnt;
-	int			dblkcnt;
 	int			nmap;
 	int			error;
 
@@ -64,28 +60,12 @@  xfs_attr3_leaf_freextent(
 		ASSERT(map.br_startblock != DELAYSTARTBLOCK);
 
 		/*
-		 * If it's a hole, these are already unmapped
-		 * so there's nothing to invalidate.
+		 * Mark any incore buffers for the remote value as stale.  We
+		 * never log remote attr value buffers, so the buffer should be
+		 * easy to kill.
 		 */
-		if (map.br_startblock != HOLESTARTBLOCK) {
-
-			dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
-						  map.br_startblock);
-			dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
-						map.br_blockcount);
-			bp = xfs_trans_get_buf(*trans,
-					dp->i_mount->m_ddev_targp,
-					dblkno, dblkcnt, 0);
-			if (!bp)
-				return -ENOMEM;
-			xfs_trans_binval(*trans, bp);
-			/*
-			 * Roll to next transaction.
-			 */
-			error = xfs_trans_roll_inode(trans, dp);
-			if (error)
-				return error;
-		}
+		if (map.br_startblock != HOLESTARTBLOCK)
+			xfs_attr_rmtval_stale(dp, &map);
 
 		tblkno += map.br_blockcount;
 		tblkcnt -= map.br_blockcount;
@@ -174,9 +154,7 @@  xfs_attr3_leaf_inactive(
 	 */
 	error = 0;
 	for (lp = list, i = 0; i < count; i++, lp++) {
-		tmp = xfs_attr3_leaf_freextent(trans, dp,
-				lp->valueblk, lp->valuelen);
-
+		tmp = xfs_attr3_leaf_freextent(dp, lp->valueblk, lp->valuelen);
 		if (error == 0)
 			error = tmp;	/* save only the 1st errno */
 	}