Message ID | 157898350371.1566005.2641685060877851666.stgit@magnolia (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: fix buf log item memory corruption on non-amd64 | expand |
> Fortunately for us, remote attribute values are written to disk with > xfs_bwrite(), which is to say that they are not logged. Fix the problem > by removing all places where we could end up creating a buffer log item > for a remote attribute value and leave a note explaining why. This is stil missing a comment that you are using a suitable helper for marking the buffer stale, and why rmeoving the HOLEBLOCK check is safe (which I now tink it is based on looking at the caller). > - error = xfs_trans_read_buf(mp, args->trans, > + error = xfs_trans_read_buf(mp, NULL, > mp->m_ddev_targp, > dblkno, dblkcnt, 0, &bp, > &xfs_attr3_rmt_buf_ops); > @@ -411,7 +428,7 @@ xfs_attr_rmtval_get( > error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino, > &offset, &valuelen, > &dst); > - xfs_trans_brelse(args->trans, bp); > + xfs_buf_relse(bp); FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good pattern. > @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent( > * Roll through the "value", invalidating the attribute value's > * blocks. > */ > - tblkno = blkno; > - tblkcnt = blkcnt; > + tblkno = lp->valueblk; > + tblkcnt = lp->valuelen; Nit: these could be easily initialized on the declaration lines. Or even better if you keep the old calling conventions of passing the blockno and count by value, in which case we don't need the extra local variables at all. > @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive( > */ > error = 0; > for (lp = list, i = 0; i < count; i++, lp++) { > - tmp = xfs_attr3_leaf_freextent(trans, dp, > - lp->valueblk, lp->valuelen); > - > + tmp = xfs_attr3_rmt_inactive(dp, lp); So given that we don't touch the transaction I don't think we even need the memory allocation to defer the marking stale of the buffer until after the xfs_trans_brelse. But that could be a separate patch, especially if the block/count calling conventions are kept as-is.
On Tue, Jan 14, 2020 at 12:40:11AM -0800, Christoph Hellwig wrote: > > Fortunately for us, remote attribute values are written to disk with > > xfs_bwrite(), which is to say that they are not logged. Fix the problem > > by removing all places where we could end up creating a buffer log item > > for a remote attribute value and leave a note explaining why. > > This is stil missing a comment that you are using a suitable helper > for marking the buffer stale, and why rmeoving the HOLEBLOCK check > is safe (which I now tink it is based on looking at the caller). Oops, I forgot to update the changelog. > > - error = xfs_trans_read_buf(mp, args->trans, > > + error = xfs_trans_read_buf(mp, NULL, > > mp->m_ddev_targp, > > dblkno, dblkcnt, 0, &bp, > > &xfs_attr3_rmt_buf_ops); > > @@ -411,7 +428,7 @@ xfs_attr_rmtval_get( > > error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino, > > &offset, &valuelen, > > &dst); > > - xfs_trans_brelse(args->trans, bp); > > + xfs_buf_relse(bp); > > FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good > pattern. Yeah, you're right. I didn't want to go opencoding the !bp or bp->b_error > 0 cases that happen in xfs_trans_buf_read to make this bug fix an even bigger pile of patches, but maybe it's just time to clean up xfs_buf_read() to return error values like most everywhere else. > > @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent( > > * Roll through the "value", invalidating the attribute value's > > * blocks. > > */ > > - tblkno = blkno; > > - tblkcnt = blkcnt; > > + tblkno = lp->valueblk; > > + tblkcnt = lp->valuelen; > > Nit: these could be easily initialized on the declaration lines. Or > even better if you keep the old calling conventions of passing the > blockno and count by value, in which case we don't need the extra local > variables at all. > > > @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive( > > */ > > error = 0; > > for (lp = list, i = 0; i < count; i++, lp++) { > > - tmp = xfs_attr3_leaf_freextent(trans, dp, > > - lp->valueblk, lp->valuelen); > > - > > + tmp = xfs_attr3_rmt_inactive(dp, lp); > > So given that we don't touch the transaction I don't think we even > need the memory allocation to defer the marking stale of the buffer > until after the xfs_trans_brelse. But that could be a separate > patch, especially if the block/count calling conventions are kept as-is. These last two I'll clean up in a followup patch that gets rid of the pointless local variables in the first function and the pointless memory allocation in the second function. --D
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c index df1ab0569481..55e5b1d65449 100644 --- a/fs/xfs/libxfs/xfs_attr_remote.c +++ b/fs/xfs/libxfs/xfs_attr_remote.c @@ -25,6 +25,23 @@ #define ATTR_RMTVALUE_MAPSIZE 1 /* # of map entries at once */ +/* + * Remote Attribute Values + * ======================= + * + * Remote extended attribute values are conceptually simple -- they're written + * to data blocks mapped by an inode's attribute fork, and they have an upper + * size limit of 64k. Setting a value does not involve the XFS log. + * + * However, on a v5 filesystem, maximally sized remote attr values require one + * block more than 64k worth of space to hold both the remote attribute value + * header (64 bytes). On a 4k block filesystem this results in a 68k buffer; + * on a 64k block filesystem, this would be a 128k buffer. Note that the log + * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k). + * Therefore, we /must/ ensure that remote attribute value buffers never touch + * the logging system and therefore never have a log item. + */ + /* * Each contiguous block has a header, so it is not just a simple attribute * length to FSB conversion. @@ -401,7 +418,7 @@ xfs_attr_rmtval_get( (map[i].br_startblock != HOLESTARTBLOCK)); dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock); dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount); - error = xfs_trans_read_buf(mp, args->trans, + error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dblkno, dblkcnt, 0, &bp, &xfs_attr3_rmt_buf_ops); @@ -411,7 +428,7 @@ xfs_attr_rmtval_get( error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino, &offset, &valuelen, &dst); - xfs_trans_brelse(args->trans, bp); + xfs_buf_relse(bp); if (error) return error; diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 5ff49523d8ea..853a5a8defc9 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -25,22 +25,19 @@ #include "xfs_error.h" /* - * Look at all the extents for this logical region, - * invalidate any buffers that are incore/in transactions. + * Invalidate any incore buffers associated with this remote attribute value + * extent. We never log remote attribute value buffers, which means that they + * won't be attached to a transaction and are therefore safe to mark stale. + * The actual bunmapi will be taken care of later. */ STATIC int -xfs_attr3_leaf_freextent( - struct xfs_trans **trans, +xfs_attr3_rmt_inactive( struct xfs_inode *dp, - xfs_dablk_t blkno, - int blkcnt) + struct xfs_attr_inactive_list *lp) { struct xfs_bmbt_irec map; - struct xfs_buf *bp; xfs_dablk_t tblkno; - xfs_daddr_t dblkno; int tblkcnt; - int dblkcnt; int nmap; int error; @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent( * Roll through the "value", invalidating the attribute value's * blocks. */ - tblkno = blkno; - tblkcnt = blkcnt; + tblkno = lp->valueblk; + tblkcnt = lp->valuelen; while (tblkcnt > 0) { /* * Try to remember where we decided to put the value. @@ -57,35 +54,19 @@ xfs_attr3_leaf_freextent( nmap = 1; error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt, &map, &nmap, XFS_BMAPI_ATTRFORK); - if (error) { + if (error) return error; - } - ASSERT(nmap == 1); - ASSERT(map.br_startblock != DELAYSTARTBLOCK); + if (XFS_IS_CORRUPT(dp->i_mount, nmap != 1)) + return -EFSCORRUPTED; /* - * If it's a hole, these are already unmapped - * so there's nothing to invalidate. + * Mark any incore buffers for the remote value as stale. We + * never log remote attr value buffers, so the buffer should be + * easy to kill. */ - if (map.br_startblock != HOLESTARTBLOCK) { - - dblkno = XFS_FSB_TO_DADDR(dp->i_mount, - map.br_startblock); - dblkcnt = XFS_FSB_TO_BB(dp->i_mount, - map.br_blockcount); - bp = xfs_trans_get_buf(*trans, - dp->i_mount->m_ddev_targp, - dblkno, dblkcnt, 0); - if (!bp) - return -ENOMEM; - xfs_trans_binval(*trans, bp); - /* - * Roll to next transaction. - */ - error = xfs_trans_roll_inode(trans, dp); - if (error) - return error; - } + error = xfs_attr_rmtval_stale(dp, &map, 0); + if (error) + return error; tblkno += map.br_blockcount; tblkcnt -= map.br_blockcount; @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive( */ error = 0; for (lp = list, i = 0; i < count; i++, lp++) { - tmp = xfs_attr3_leaf_freextent(trans, dp, - lp->valueblk, lp->valuelen); - + tmp = xfs_attr3_rmt_inactive(dp, lp); if (error == 0) error = tmp; /* save only the 1st errno */ }