[2/6] xfs: fix memory corruption during remote attr value buffer invalidation
diff mbox series

Message ID 157898350371.1566005.2641685060877851666.stgit@magnolia
State Superseded
Headers show
Series
  • xfs: fix buf log item memory corruption on non-amd64
Related show

Commit Message

Darrick J. Wong Jan. 14, 2020, 6:31 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

While running generic/103, I observed what looks like memory corruption
and (with slub debugging turned on) a slub redzone warning on i386 when
inactivating an inode with a 64k remote attr value.

On a v5 filesystem, maximally sized remote attr values require one block
more than 64k worth of space to hold both the remote attribute value
header (64 bytes).  On a 4k block filesystem this results in a 68k
buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
that even though we'll never use more than 65,600 bytes of this buffer,
XFS_MAX_BLOCKSIZE is 64k.

This is a problem because the definition of struct xfs_buf_log_format
allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
the dirty map, writing right off the end of the log item and corrupting
memory.  We've gotten away with this on x86_64 for years because the
compiler inserts a u32 padding on the end of struct xfs_buf_log_format.

Fortunately for us, remote attribute values are written to disk with
xfs_bwrite(), which is to say that they are not logged.  Fix the problem
by removing all places where we could end up creating a buffer log item
for a remote attribute value and leave a note explaining why.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   21 +++++++++++++-
 fs/xfs/xfs_attr_inactive.c      |   57 ++++++++++++---------------------------
 2 files changed, 37 insertions(+), 41 deletions(-)

Comments

Christoph Hellwig Jan. 14, 2020, 8:40 a.m. UTC | #1
> Fortunately for us, remote attribute values are written to disk with
> xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> by removing all places where we could end up creating a buffer log item
> for a remote attribute value and leave a note explaining why.

This is stil missing a comment that you are using a suitable helper
for marking the buffer stale, and why rmeoving the HOLEBLOCK check
is safe (which I now tink it is based on looking at the caller).

> -			error = xfs_trans_read_buf(mp, args->trans,
> +			error = xfs_trans_read_buf(mp, NULL,
>  						   mp->m_ddev_targp,
>  						   dblkno, dblkcnt, 0, &bp,
>  						   &xfs_attr3_rmt_buf_ops);
> @@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
>  			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
>  							&offset, &valuelen,
>  							&dst);
> -			xfs_trans_brelse(args->trans, bp);
> +			xfs_buf_relse(bp);

FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good
pattern.

> @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent(
>  	 * Roll through the "value", invalidating the attribute value's
>  	 * blocks.
>  	 */
> -	tblkno = blkno;
> -	tblkcnt = blkcnt;
> +	tblkno = lp->valueblk;
> +	tblkcnt = lp->valuelen;

Nit: these could be easily initialized on the declaration lines.  Or
even better if you keep the old calling conventions of passing the
blockno and count by value, in which case we don't need the extra local
variables at all.

> @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive(
>  	 */
>  	error = 0;
>  	for (lp = list, i = 0; i < count; i++, lp++) {
> -		tmp = xfs_attr3_leaf_freextent(trans, dp,
> -				lp->valueblk, lp->valuelen);
> -
> +		tmp = xfs_attr3_rmt_inactive(dp, lp);

So given that we don't touch the transaction I don't think we even
need the memory allocation to defer the marking stale of the buffer
until after the xfs_trans_brelse.  But that could be a separate
patch, especially if the block/count calling conventions are kept as-is.
Darrick J. Wong Jan. 14, 2020, 11:02 p.m. UTC | #2
On Tue, Jan 14, 2020 at 12:40:11AM -0800, Christoph Hellwig wrote:
> > Fortunately for us, remote attribute values are written to disk with
> > xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> > by removing all places where we could end up creating a buffer log item
> > for a remote attribute value and leave a note explaining why.
> 
> This is stil missing a comment that you are using a suitable helper
> for marking the buffer stale, and why rmeoving the HOLEBLOCK check
> is safe (which I now tink it is based on looking at the caller).

Oops, I forgot to update the changelog.


> > -			error = xfs_trans_read_buf(mp, args->trans,
> > +			error = xfs_trans_read_buf(mp, NULL,
> >  						   mp->m_ddev_targp,
> >  						   dblkno, dblkcnt, 0, &bp,
> >  						   &xfs_attr3_rmt_buf_ops);
> > @@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
> >  			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
> >  							&offset, &valuelen,
> >  							&dst);
> > -			xfs_trans_brelse(args->trans, bp);
> > +			xfs_buf_relse(bp);
> 
> FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good
> pattern.

Yeah, you're right.  I didn't want to go opencoding the !bp or
bp->b_error > 0 cases that happen in xfs_trans_buf_read to make this bug
fix an even bigger pile of patches, but maybe it's just time to clean up
xfs_buf_read() to return error values like most everywhere else.

> > @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent(
> >  	 * Roll through the "value", invalidating the attribute value's
> >  	 * blocks.
> >  	 */
> > -	tblkno = blkno;
> > -	tblkcnt = blkcnt;
> > +	tblkno = lp->valueblk;
> > +	tblkcnt = lp->valuelen;
> 
> Nit: these could be easily initialized on the declaration lines.  Or
> even better if you keep the old calling conventions of passing the
> blockno and count by value, in which case we don't need the extra local
> variables at all.
> 
> > @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive(
> >  	 */
> >  	error = 0;
> >  	for (lp = list, i = 0; i < count; i++, lp++) {
> > -		tmp = xfs_attr3_leaf_freextent(trans, dp,
> > -				lp->valueblk, lp->valuelen);
> > -
> > +		tmp = xfs_attr3_rmt_inactive(dp, lp);
> 
> So given that we don't touch the transaction I don't think we even
> need the memory allocation to defer the marking stale of the buffer
> until after the xfs_trans_brelse.  But that could be a separate
> patch, especially if the block/count calling conventions are kept as-is.

These last two I'll clean up in a followup patch that gets rid of the
pointless local variables in the first function and the pointless memory
allocation in the second function.

--D

Patch
diff mbox series

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index df1ab0569481..55e5b1d65449 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -25,6 +25,23 @@ 
 
 #define ATTR_RMTVALUE_MAPSIZE	1	/* # of map entries at once */
 
+/*
+ * Remote Attribute Values
+ * =======================
+ *
+ * Remote extended attribute values are conceptually simple -- they're written
+ * to data blocks mapped by an inode's attribute fork, and they have an upper
+ * size limit of 64k.  Setting a value does not involve the XFS log.
+ *
+ * However, on a v5 filesystem, maximally sized remote attr values require one
+ * block more than 64k worth of space to hold both the remote attribute value
+ * header (64 bytes).  On a 4k block filesystem this results in a 68k buffer;
+ * on a 64k block filesystem, this would be a 128k buffer.  Note that the log
+ * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k).
+ * Therefore, we /must/ ensure that remote attribute value buffers never touch
+ * the logging system and therefore never have a log item.
+ */
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -401,7 +418,7 @@  xfs_attr_rmtval_get(
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
-			error = xfs_trans_read_buf(mp, args->trans,
+			error = xfs_trans_read_buf(mp, NULL,
 						   mp->m_ddev_targp,
 						   dblkno, dblkcnt, 0, &bp,
 						   &xfs_attr3_rmt_buf_ops);
@@ -411,7 +428,7 @@  xfs_attr_rmtval_get(
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
-			xfs_trans_brelse(args->trans, bp);
+			xfs_buf_relse(bp);
 			if (error)
 				return error;
 
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 5ff49523d8ea..853a5a8defc9 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -25,22 +25,19 @@ 
 #include "xfs_error.h"
 
 /*
- * Look at all the extents for this logical region,
- * invalidate any buffers that are incore/in transactions.
+ * Invalidate any incore buffers associated with this remote attribute value
+ * extent.   We never log remote attribute value buffers, which means that they
+ * won't be attached to a transaction and are therefore safe to mark stale.
+ * The actual bunmapi will be taken care of later.
  */
 STATIC int
-xfs_attr3_leaf_freextent(
-	struct xfs_trans	**trans,
+xfs_attr3_rmt_inactive(
 	struct xfs_inode	*dp,
-	xfs_dablk_t		blkno,
-	int			blkcnt)
+	struct xfs_attr_inactive_list *lp)
 {
 	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
 	xfs_dablk_t		tblkno;
-	xfs_daddr_t		dblkno;
 	int			tblkcnt;
-	int			dblkcnt;
 	int			nmap;
 	int			error;
 
@@ -48,8 +45,8 @@  xfs_attr3_leaf_freextent(
 	 * Roll through the "value", invalidating the attribute value's
 	 * blocks.
 	 */
-	tblkno = blkno;
-	tblkcnt = blkcnt;
+	tblkno = lp->valueblk;
+	tblkcnt = lp->valuelen;
 	while (tblkcnt > 0) {
 		/*
 		 * Try to remember where we decided to put the value.
@@ -57,35 +54,19 @@  xfs_attr3_leaf_freextent(
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
 				       &map, &nmap, XFS_BMAPI_ATTRFORK);
-		if (error) {
+		if (error)
 			return error;
-		}
-		ASSERT(nmap == 1);
-		ASSERT(map.br_startblock != DELAYSTARTBLOCK);
+		if (XFS_IS_CORRUPT(dp->i_mount, nmap != 1))
+			return -EFSCORRUPTED;
 
 		/*
-		 * If it's a hole, these are already unmapped
-		 * so there's nothing to invalidate.
+		 * Mark any incore buffers for the remote value as stale.  We
+		 * never log remote attr value buffers, so the buffer should be
+		 * easy to kill.
 		 */
-		if (map.br_startblock != HOLESTARTBLOCK) {
-
-			dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
-						  map.br_startblock);
-			dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
-						map.br_blockcount);
-			bp = xfs_trans_get_buf(*trans,
-					dp->i_mount->m_ddev_targp,
-					dblkno, dblkcnt, 0);
-			if (!bp)
-				return -ENOMEM;
-			xfs_trans_binval(*trans, bp);
-			/*
-			 * Roll to next transaction.
-			 */
-			error = xfs_trans_roll_inode(trans, dp);
-			if (error)
-				return error;
-		}
+		error = xfs_attr_rmtval_stale(dp, &map, 0);
+		if (error)
+			return error;
 
 		tblkno += map.br_blockcount;
 		tblkcnt -= map.br_blockcount;
@@ -174,9 +155,7 @@  xfs_attr3_leaf_inactive(
 	 */
 	error = 0;
 	for (lp = list, i = 0; i < count; i++, lp++) {
-		tmp = xfs_attr3_leaf_freextent(trans, dp,
-				lp->valueblk, lp->valuelen);
-
+		tmp = xfs_attr3_rmt_inactive(dp, lp);
 		if (error == 0)
 			error = tmp;	/* save only the 1st errno */
 	}