Message ID | 157859549406.164065.17179006268680393660.stgit@magnolia (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: fix buf log item memory corruption on non-amd64 | expand |
> While running generic/103, I observed what looks like memory corruption > and (with slub debugging turned on) a slub redzone warning on i386 when > inactivating an inode with a 64k remote attr value. > > On a v5 filesystem, maximally sized remote attr values require one block > more than 64k worth of space to hold both the remote attribute value > header (64 bytes). On a 4k block filesystem this results in a 68k > buffer; on a 64k block filesystem, this would be a 128k buffer. Note > that even though we'll never use more than 65,600 bytes of this buffer, > XFS_MAX_BLOCKSIZE is 64k. > > This is a problem because the definition of struct xfs_buf_log_format > allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k). On i386 when we > invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of > the dirty map, writing right off the end of the log item and corrupting > memory. We've gotten away with this on x86_64 for years because the > compiler inserts a u32 padding on the end of struct xfs_buf_log_format. > > Fortunately for us, remote attribute values are written to disk with > xfs_bwrite(), which is to say that they are not logged. Fix the problem > by removing all places where we could end up creating a buffer log item > for a remote attribute value and leave a note explaining why. I think this changelog needs an explanation why using xfs_attr_rmtval_stale which just trylock and checks if the buffers are in core only in xfs_attr3_leaf_freextent is fine. And while the incore part looks sane to me, I think the trylock is wrong and we need to pass the locking flag to xfs_attr_rmtval_stale.
On Fri, Jan 10, 2020 at 03:57:42AM -0800, Christoph Hellwig wrote: > > While running generic/103, I observed what looks like memory corruption > > and (with slub debugging turned on) a slub redzone warning on i386 when > > inactivating an inode with a 64k remote attr value. > > > > On a v5 filesystem, maximally sized remote attr values require one block > > more than 64k worth of space to hold both the remote attribute value > > header (64 bytes). On a 4k block filesystem this results in a 68k > > buffer; on a 64k block filesystem, this would be a 128k buffer. Note > > that even though we'll never use more than 65,600 bytes of this buffer, > > XFS_MAX_BLOCKSIZE is 64k. > > > > This is a problem because the definition of struct xfs_buf_log_format > > allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k). On i386 when we > > invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of > > the dirty map, writing right off the end of the log item and corrupting > > memory. We've gotten away with this on x86_64 for years because the > > compiler inserts a u32 padding on the end of struct xfs_buf_log_format. > > > > Fortunately for us, remote attribute values are written to disk with > > xfs_bwrite(), which is to say that they are not logged. Fix the problem > > by removing all places where we could end up creating a buffer log item > > for a remote attribute value and leave a note explaining why. > > I think this changelog needs an explanation why using > xfs_attr_rmtval_stale which just trylock and checks if the buffers are > in core only in xfs_attr3_leaf_freextent is fine. Hmm, maybe the *function* needs an explanation for why it's valid to use rmtval_stale here: /* * Invalidate any incore buffers associated with this remote attribute * value extent. We never log remote attribute value buffers, which * means that they won't be attached to a transaction and are therefore * safe to mark stale. The actual bunmapi will be taken care of later. */ STATIC int xfs_attr3_rmt_inactive( struct xfs_inode *dp, struct xfs_attr_inactive_list *lp) I say the function is also confusingly named and could have its arguments list trimmed. > And while the incore > part looks sane to me, I think the trylock is wrong and we need to pass > the locking flag to xfs_attr_rmtval_stale. Hmm, yeah, it's definitely wrong for the inactivation case -- the inactivation thread holds the only incore reference to this unlinked inode, so the buffer had better not be locked by another thread. I'm not even all that sure why XBF_TRYLOCK is necessary in the remove case, since we hold ILOCK_EXCL and there shouldn't be anyone else messing around inside the xattr data. --D
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c index 6babdaac6057..1765f5351f9e 100644 --- a/fs/xfs/libxfs/xfs_attr_remote.c +++ b/fs/xfs/libxfs/xfs_attr_remote.c @@ -25,6 +25,23 @@ #define ATTR_RMTVALUE_MAPSIZE 1 /* # of map entries at once */ +/* + * Remote Attribute Values + * ======================= + * + * Remote extended attribute values are conceptually simple -- they're written + * to data blocks mapped by an inode's attribute fork, and they have an upper + * size limit of 64k. Setting a value does not involve the XFS log. + * + * However, on a v5 filesystem, maximally sized remote attr values require one + * block more than 64k worth of space to hold both the remote attribute value + * header (64 bytes). On a 4k block filesystem this results in a 68k buffer; + * on a 64k block filesystem, this would be a 128k buffer. Note that the log + * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k). + * Therefore, we /must/ ensure that remote attribute value buffers never touch + * the logging system and therefore never have a log item. + */ + /* * Each contiguous block has a header, so it is not just a simple attribute * length to FSB conversion. @@ -401,7 +418,7 @@ xfs_attr_rmtval_get( (map[i].br_startblock != HOLESTARTBLOCK)); dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock); dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount); - error = xfs_trans_read_buf(mp, args->trans, + error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dblkno, dblkcnt, 0, &bp, &xfs_attr3_rmt_buf_ops); @@ -411,7 +428,7 @@ xfs_attr_rmtval_get( error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino, &offset, &valuelen, &dst); - xfs_trans_brelse(args->trans, bp); + xfs_buf_relse(bp); if (error) return error; diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 5ff49523d8ea..5c83dd3611af 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -30,17 +30,13 @@ */ STATIC int xfs_attr3_leaf_freextent( - struct xfs_trans **trans, struct xfs_inode *dp, xfs_dablk_t blkno, int blkcnt) { struct xfs_bmbt_irec map; - struct xfs_buf *bp; xfs_dablk_t tblkno; - xfs_daddr_t dblkno; int tblkcnt; - int dblkcnt; int nmap; int error; @@ -64,28 +60,12 @@ xfs_attr3_leaf_freextent( ASSERT(map.br_startblock != DELAYSTARTBLOCK); /* - * If it's a hole, these are already unmapped - * so there's nothing to invalidate. + * Mark any incore buffers for the remote value as stale. We + * never log remote attr value buffers, so the buffer should be + * easy to kill. */ - if (map.br_startblock != HOLESTARTBLOCK) { - - dblkno = XFS_FSB_TO_DADDR(dp->i_mount, - map.br_startblock); - dblkcnt = XFS_FSB_TO_BB(dp->i_mount, - map.br_blockcount); - bp = xfs_trans_get_buf(*trans, - dp->i_mount->m_ddev_targp, - dblkno, dblkcnt, 0); - if (!bp) - return -ENOMEM; - xfs_trans_binval(*trans, bp); - /* - * Roll to next transaction. - */ - error = xfs_trans_roll_inode(trans, dp); - if (error) - return error; - } + if (map.br_startblock != HOLESTARTBLOCK) + xfs_attr_rmtval_stale(dp, &map); tblkno += map.br_blockcount; tblkcnt -= map.br_blockcount; @@ -174,9 +154,7 @@ xfs_attr3_leaf_inactive( */ error = 0; for (lp = list, i = 0; i < count; i++, lp++) { - tmp = xfs_attr3_leaf_freextent(trans, dp, - lp->valueblk, lp->valuelen); - + tmp = xfs_attr3_leaf_freextent(dp, lp->valueblk, lp->valuelen); if (error == 0) error = tmp; /* save only the 1st errno */ }