diff mbox series

[4/6] xfs: fix buffer state management in xrep_findroot_block

Message ID 153400172200.27471.13133656951315541955.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs-4.19: various fixes | expand

Commit Message

Darrick J. Wong Aug. 11, 2018, 3:35 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

We don't quite handle buffer state properly in online repair's findroot
routine.  If the buffer is already in-core we don't want to trash its
b_ops and state, so first we should try _get_buf to grab the buffer.  If
the buffer is loaded, we only want to verify the structure of the buffer
since it could be dirty and the crc hasn't yet been recalculated.

Only if the buffer hasn't been read in should try _read_buf, and if we
were the ones who read the buffer then we must be careful to oneshot the
buffer so that a subsequent _read_buf won't find a buffer with no ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 13 deletions(-)

Comments

Allison Henderson Aug. 12, 2018, 7:53 a.m. UTC | #1
On 08/11/2018 08:35 AM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We don't quite handle buffer state properly in online repair's findroot
> routine.  If the buffer is already in-core we don't want to trash its
> b_ops and state, so first we should try _get_buf to grab the buffer.  If
> the buffer is loaded, we only want to verify the structure of the buffer
> since it could be dirty and the crc hasn't yet been recalculated.
> 
> Only if the buffer hasn't been read in should try _read_buf, and if we
> were the ones who read the buffer then we must be careful to oneshot the
> buffer so that a subsequent _read_buf won't find a buffer with no ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>   fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 54 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 97c3077fb005..fae50dced8bc 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -697,6 +697,7 @@ xrep_findroot_block(
>   	struct xfs_mount		*mp = ri->sc->mp;
>   	struct xfs_buf			*bp;
>   	struct xfs_btree_block		*btblock;
> +	xfs_failaddr_t			fa;
>   	xfs_daddr_t			daddr;
>   	int				block_level;
>   	int				error;
> @@ -718,28 +719,68 @@ xrep_findroot_block(
>   			return error;
>   	}
>   
> -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> -			mp->m_bsize, 0, &bp, NULL);
> -	if (error)
> -		return error;
> -
>   	/*
> -	 * Does this look like a block matching our fs and higher than any
> -	 * other block we've found so far?  If so, reattach buffer verifiers
> -	 * so the AIL won't complain if the buffer is also dirty.
> +	 * Try to grab the buffer, on the off chance it's already in memory.
> +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> +	 * b_ops.  (This could race with another read_buf.)  If we get the
> +	 * buffer back with NULL b_ops then we know that there weren't any
> +	 * other readers.  There's a risk we won't match the buffer with any
> +	 * of the findroot prototypes, so we want to encourage the buffer layer
> +	 * to drop the buffer as soon as possible.
>   	 */
> +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> +			mp->m_bsize, 0);
> +	if (!bp)
> +		return -ENOMEM;
> +	if (!(bp->b_flags & XBF_DONE)) {
> +		xfs_trans_brelse(ri->sc->tp, bp);
> +
> +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> +				daddr, mp->m_bsize, 0, &bp, NULL);
> +		if (error)
> +			return error;
> +		if (!bp->b_ops)
> +			xfs_buf_oneshot(bp);
> +	}
> +
> +	/* Does this look like a block matching our fs? */
>   	btblock = XFS_BUF_TO_BLOCK(bp);
>   	if (be32_to_cpu(btblock->bb_magic) != fab->magic)
>   		goto out;
>   	if (xfs_sb_version_hascrc(&mp->m_sb) &&
>   	    !uuid_equal(&btblock->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
>   		goto out;
> -	bp->b_ops = fab->buf_ops;
>   
> -	/* Make sure we pass the verifiers. */
> -	bp->b_ops->verify_read(bp);
> -	if (bp->b_error)
> -		goto out;
> +	/*
> +	 * We've matched this buffer by magic number to this findroot
> +	 * prototype.  If there are no buffer ops attached, attach the one
> +	 * specified by the prototype.  Otherwise, the buffer ops must match
> +	 * the prototype.   We don't want to disturb existing b_ops.
> +	 */
> +	if (bp->b_ops) {
> +		if (bp->b_ops != fab->buf_ops)
> +			goto out;
> +		/*
> +		 * If the buffer was already incore (on a v5 fs) then it should
> +		 * already have had b_ops assigned.  Call ->verify_struct to
> +		 * check the structure.  Avoid checking the CRC because we
> +		 * don't calculate CRCs until the buffer is written by the log.
> +		 */
> +		fa = bp->b_ops->verify_struct(bp);
> +		if (fa)
> +			goto out;
> +	} else {
> +		/*
> +		 * If we have to assign buffer ops, that means that nobody's
> +		 * checked the buffer structure or its CRC.  Do both now by
> +		 * calling ->verify_read.
> +		 */
> +		bp->b_ops = fab->buf_ops;
> +		bp->b_ops->verify_read(bp);
> +		if (bp->b_error)
> +			goto out;
> +	}
>   
>   	/* If we've recorded a root candidate... */
>   	block_level = xfs_btree_get_level(btblock);
> 
Ok, thanks for the commentary, it helps!
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Carlos Maiolino Aug. 13, 2018, 8:05 a.m. UTC | #2
On Sat, Aug 11, 2018 at 08:35:22AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We don't quite handle buffer state properly in online repair's findroot
> routine.  If the buffer is already in-core we don't want to trash its
> b_ops and state, so first we should try _get_buf to grab the buffer.  If
> the buffer is loaded, we only want to verify the structure of the buffer
> since it could be dirty and the crc hasn't yet been recalculated.
> 
> Only if the buffer hasn't been read in should try _read_buf, and if we
> were the ones who read the buffer then we must be careful to oneshot the
> buffer so that a subsequent _read_buf won't find a buffer with no ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

>  fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 54 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 97c3077fb005..fae50dced8bc 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -697,6 +697,7 @@ xrep_findroot_block(
>  	struct xfs_mount		*mp = ri->sc->mp;
>  	struct xfs_buf			*bp;
>  	struct xfs_btree_block		*btblock;
> +	xfs_failaddr_t			fa;
>  	xfs_daddr_t			daddr;
>  	int				block_level;
>  	int				error;
> @@ -718,28 +719,68 @@ xrep_findroot_block(
>  			return error;
>  	}
>  
> -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> -			mp->m_bsize, 0, &bp, NULL);
> -	if (error)
> -		return error;
> -
>  	/*
> -	 * Does this look like a block matching our fs and higher than any
> -	 * other block we've found so far?  If so, reattach buffer verifiers
> -	 * so the AIL won't complain if the buffer is also dirty.
> +	 * Try to grab the buffer, on the off chance it's already in memory.
> +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> +	 * b_ops.  (This could race with another read_buf.)  If we get the
> +	 * buffer back with NULL b_ops then we know that there weren't any
> +	 * other readers.  There's a risk we won't match the buffer with any
> +	 * of the findroot prototypes, so we want to encourage the buffer layer
> +	 * to drop the buffer as soon as possible.
>  	 */
> +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> +			mp->m_bsize, 0);
> +	if (!bp)
> +		return -ENOMEM;
> +	if (!(bp->b_flags & XBF_DONE)) {
> +		xfs_trans_brelse(ri->sc->tp, bp);
> +
> +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> +				daddr, mp->m_bsize, 0, &bp, NULL);
> +		if (error)
> +			return error;
> +		if (!bp->b_ops)
> +			xfs_buf_oneshot(bp);
> +	}
> +
> +	/* Does this look like a block matching our fs? */
>  	btblock = XFS_BUF_TO_BLOCK(bp);
>  	if (be32_to_cpu(btblock->bb_magic) != fab->magic)
>  		goto out;
>  	if (xfs_sb_version_hascrc(&mp->m_sb) &&
>  	    !uuid_equal(&btblock->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
>  		goto out;
> -	bp->b_ops = fab->buf_ops;
>  
> -	/* Make sure we pass the verifiers. */
> -	bp->b_ops->verify_read(bp);
> -	if (bp->b_error)
> -		goto out;
> +	/*
> +	 * We've matched this buffer by magic number to this findroot
> +	 * prototype.  If there are no buffer ops attached, attach the one
> +	 * specified by the prototype.  Otherwise, the buffer ops must match
> +	 * the prototype.   We don't want to disturb existing b_ops.
> +	 */
> +	if (bp->b_ops) {
> +		if (bp->b_ops != fab->buf_ops)
> +			goto out;
> +		/*
> +		 * If the buffer was already incore (on a v5 fs) then it should
> +		 * already have had b_ops assigned.  Call ->verify_struct to
> +		 * check the structure.  Avoid checking the CRC because we
> +		 * don't calculate CRCs until the buffer is written by the log.
> +		 */
> +		fa = bp->b_ops->verify_struct(bp);
> +		if (fa)
> +			goto out;
> +	} else {
> +		/*
> +		 * If we have to assign buffer ops, that means that nobody's
> +		 * checked the buffer structure or its CRC.  Do both now by
> +		 * calling ->verify_read.
> +		 */
> +		bp->b_ops = fab->buf_ops;
> +		bp->b_ops->verify_read(bp);
> +		if (bp->b_error)
> +			goto out;
> +	}
>  
>  	/* If we've recorded a root candidate... */
>  	block_level = xfs_btree_get_level(btblock);
>
Brian Foster Aug. 13, 2018, 4:56 p.m. UTC | #3
On Sat, Aug 11, 2018 at 08:35:22AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We don't quite handle buffer state properly in online repair's findroot
> routine.  If the buffer is already in-core we don't want to trash its
> b_ops and state, so first we should try _get_buf to grab the buffer.  If
> the buffer is loaded, we only want to verify the structure of the buffer
> since it could be dirty and the crc hasn't yet been recalculated.
> 
> Only if the buffer hasn't been read in should try _read_buf, and if we
> were the ones who read the buffer then we must be careful to oneshot the
> buffer so that a subsequent _read_buf won't find a buffer with no ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 54 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 97c3077fb005..fae50dced8bc 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -697,6 +697,7 @@ xrep_findroot_block(
>  	struct xfs_mount		*mp = ri->sc->mp;
>  	struct xfs_buf			*bp;
>  	struct xfs_btree_block		*btblock;
> +	xfs_failaddr_t			fa;
>  	xfs_daddr_t			daddr;
>  	int				block_level;
>  	int				error;
> @@ -718,28 +719,68 @@ xrep_findroot_block(
>  			return error;
>  	}
>  
> -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> -			mp->m_bsize, 0, &bp, NULL);
> -	if (error)
> -		return error;
> -
>  	/*
> -	 * Does this look like a block matching our fs and higher than any
> -	 * other block we've found so far?  If so, reattach buffer verifiers
> -	 * so the AIL won't complain if the buffer is also dirty.
> +	 * Try to grab the buffer, on the off chance it's already in memory.
> +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> +	 * b_ops.  (This could race with another read_buf.)  If we get the
> +	 * buffer back with NULL b_ops then we know that there weren't any
> +	 * other readers.  There's a risk we won't match the buffer with any
> +	 * of the findroot prototypes, so we want to encourage the buffer layer
> +	 * to drop the buffer as soon as possible.
>  	 */
> +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> +			mp->m_bsize, 0);
> +	if (!bp)
> +		return -ENOMEM;
> +	if (!(bp->b_flags & XBF_DONE)) {
> +		xfs_trans_brelse(ri->sc->tp, bp);
> +
> +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> +				daddr, mp->m_bsize, 0, &bp, NULL);

Maybe I'm missing something, but I thought buf_ops should only be
attached to the buffer if the call actually read the buffer from disk.
Doesn't that mean we could continue to call xfs_trans_read_buf() here
and do the oneshot thing below to cover the case where we read it (and
don't want to leave around a buf with ->b_ops == NULL)?

> +		if (error)
> +			return error;
> +		if (!bp->b_ops)
> +			xfs_buf_oneshot(bp);
> +	}
> +

It's also kind of painful that we have to re-read the same buffer
multiple times to compare it to each fab. It seems like this could use
some refactoring to read each buffer once, check it against each fab
until we get through the verifier, then run the root/level processing on
the fab that matched. Perhaps that's something for another patch though
(make it work vs. make it fast :P)...

Brian

> +	/* Does this look like a block matching our fs? */
>  	btblock = XFS_BUF_TO_BLOCK(bp);
>  	if (be32_to_cpu(btblock->bb_magic) != fab->magic)
>  		goto out;
>  	if (xfs_sb_version_hascrc(&mp->m_sb) &&
>  	    !uuid_equal(&btblock->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
>  		goto out;
> -	bp->b_ops = fab->buf_ops;
>  
> -	/* Make sure we pass the verifiers. */
> -	bp->b_ops->verify_read(bp);
> -	if (bp->b_error)
> -		goto out;
> +	/*
> +	 * We've matched this buffer by magic number to this findroot
> +	 * prototype.  If there are no buffer ops attached, attach the one
> +	 * specified by the prototype.  Otherwise, the buffer ops must match
> +	 * the prototype.   We don't want to disturb existing b_ops.
> +	 */
> +	if (bp->b_ops) {
> +		if (bp->b_ops != fab->buf_ops)
> +			goto out;
> +		/*
> +		 * If the buffer was already incore (on a v5 fs) then it should
> +		 * already have had b_ops assigned.  Call ->verify_struct to
> +		 * check the structure.  Avoid checking the CRC because we
> +		 * don't calculate CRCs until the buffer is written by the log.
> +		 */
> +		fa = bp->b_ops->verify_struct(bp);
> +		if (fa)
> +			goto out;
> +	} else {
> +		/*
> +		 * If we have to assign buffer ops, that means that nobody's
> +		 * checked the buffer structure or its CRC.  Do both now by
> +		 * calling ->verify_read.
> +		 */
> +		bp->b_ops = fab->buf_ops;
> +		bp->b_ops->verify_read(bp);
> +		if (bp->b_error)
> +			goto out;
> +	}
>  
>  	/* If we've recorded a root candidate... */
>  	block_level = xfs_btree_get_level(btblock);
>
Dave Chinner Aug. 13, 2018, 10:56 p.m. UTC | #4
On Sat, Aug 11, 2018 at 08:35:22AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We don't quite handle buffer state properly in online repair's findroot
> routine.  If the buffer is already in-core we don't want to trash its
> b_ops and state, so first we should try _get_buf to grab the buffer.  If
> the buffer is loaded, we only want to verify the structure of the buffer
> since it could be dirty and the crc hasn't yet been recalculated.
> 
> Only if the buffer hasn't been read in should try _read_buf, and if we
> were the ones who read the buffer then we must be careful to oneshot the
> buffer so that a subsequent _read_buf won't find a buffer with no ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

I don't know the history of how this came about, but IMO this isn't
a particularly nice solution.

> ---
>  fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 54 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 97c3077fb005..fae50dced8bc 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -697,6 +697,7 @@ xrep_findroot_block(
>  	struct xfs_mount		*mp = ri->sc->mp;
>  	struct xfs_buf			*bp;
>  	struct xfs_btree_block		*btblock;
> +	xfs_failaddr_t			fa;
>  	xfs_daddr_t			daddr;
>  	int				block_level;
>  	int				error;
> @@ -718,28 +719,68 @@ xrep_findroot_block(
>  			return error;
>  	}
>  
> -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> -			mp->m_bsize, 0, &bp, NULL);
> -	if (error)
> -		return error;
> -
>  	/*
> -	 * Does this look like a block matching our fs and higher than any
> -	 * other block we've found so far?  If so, reattach buffer verifiers
> -	 * so the AIL won't complain if the buffer is also dirty.
> +	 * Try to grab the buffer, on the off chance it's already in memory.
> +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> +	 * b_ops.  (This could race with another read_buf.)  If we get the
> +	 * buffer back with NULL b_ops then we know that there weren't any
> +	 * other readers.  There's a risk we won't match the buffer with any
> +	 * of the findroot prototypes, so we want to encourage the buffer layer
> +	 * to drop the buffer as soon as possible.
>  	 */
> +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> +			mp->m_bsize, 0);
> +	if (!bp)
> +		return -ENOMEM;
> +	if (!(bp->b_flags & XBF_DONE)) {
> +		xfs_trans_brelse(ri->sc->tp, bp);
> +
> +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> +				daddr, mp->m_bsize, 0, &bp, NULL);
> +		if (error)
> +			return error;
> +		if (!bp->b_ops)
> +			xfs_buf_oneshot(bp);
> +	}

Let's look a little closer. xfs_trans_read_buf() ends up in
xfs_buf_read_map(), which does:

....
        bp = xfs_buf_get_map(target, map, nmaps, flags);
        if (bp) {
                trace_xfs_buf_read(bp, flags, _RET_IP_);

                if (!(bp->b_flags & XBF_DONE)) {
                        XFS_STATS_INC(target->bt_mount, xb_get_read);
                        bp->b_ops = ops;
                        _xfs_buf_read(bp, flags);
		} else if (flags & XBF_ASYNC) {
.....

But what you are doing in the code above is trying to do is
determine if we needed to call _xfs_buf_read() on the buffer, and if
we do we use a different verify procedure on it.

So isn't there a simpler way to do this? e.g. pass a flag down to
xfs_buf_read_map() that says "use these ops for just this read".

	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
				daddr, mp->m_bsize, XBF_ONESHOT_OPS,
				&bp, NULL);

and change xfs_buf_read_map() to do:


....
        bp = xfs_buf_get_map(target, map, nmaps, flags);
        if (bp) {
                trace_xfs_buf_read(bp, flags, _RET_IP_);

                if (!(bp->b_flags & XBF_DONE)) {
                        XFS_STATS_INC(target->bt_mount, xb_get_read);
			if (flags & XBF_ONESHOT_OPS)
				orig_ops = bp->b_ops;
                        bp->b_ops = ops;
                        _xfs_buf_read(bp, flags);
			if (flags & XBF_ONESHOT_OPS)
				bp->b_ops = orig_ops;
		} else if (flags & XBF_ASYNC) {
			ASSERT(!(flags & XBF_ONESHOT_OPS));
....

Now you get back the buffer with it's original ops on it even if had
to be read from disk and you used a different verifier. Hence you
know how to treat it after this because the ops will be null if it
was not in core and had to be read from disk.

That also means you could supply fab->buf_ops to the
xfs_trans_read_buf() call, knowing that they'll be used on read and
you'll get a null bp->b_ops back despite the buffer already having
been fully verified. i.e. if it fails verification, you'll get an
error rather than having to having to run the verification yourself.
That means you only need to run the ->verify_struct() op if you get
back a non-null bp->b_ops, which further simplifies the function...

Cheers,

Dave.
Darrick J. Wong Sept. 28, 2018, 12:28 a.m. UTC | #5
On Tue, Aug 14, 2018 at 08:56:38AM +1000, Dave Chinner wrote:
> On Sat, Aug 11, 2018 at 08:35:22AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > We don't quite handle buffer state properly in online repair's findroot
> > routine.  If the buffer is already in-core we don't want to trash its
> > b_ops and state, so first we should try _get_buf to grab the buffer.  If
> > the buffer is loaded, we only want to verify the structure of the buffer
> > since it could be dirty and the crc hasn't yet been recalculated.
> > 
> > Only if the buffer hasn't been read in should try _read_buf, and if we
> > were the ones who read the buffer then we must be careful to oneshot the
> > buffer so that a subsequent _read_buf won't find a buffer with no ops.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> I don't know the history of how this came about, but IMO this isn't
> a particularly nice solution.

Ugh, yes.

> > ---
> >  fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 54 insertions(+), 13 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 97c3077fb005..fae50dced8bc 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -697,6 +697,7 @@ xrep_findroot_block(
> >  	struct xfs_mount		*mp = ri->sc->mp;
> >  	struct xfs_buf			*bp;
> >  	struct xfs_btree_block		*btblock;
> > +	xfs_failaddr_t			fa;
> >  	xfs_daddr_t			daddr;
> >  	int				block_level;
> >  	int				error;
> > @@ -718,28 +719,68 @@ xrep_findroot_block(
> >  			return error;
> >  	}
> >  
> > -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> > -			mp->m_bsize, 0, &bp, NULL);
> > -	if (error)
> > -		return error;
> > -
> >  	/*
> > -	 * Does this look like a block matching our fs and higher than any
> > -	 * other block we've found so far?  If so, reattach buffer verifiers
> > -	 * so the AIL won't complain if the buffer is also dirty.
> > +	 * Try to grab the buffer, on the off chance it's already in memory.
> > +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> > +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> > +	 * b_ops.  (This could race with another read_buf.)  If we get the
> > +	 * buffer back with NULL b_ops then we know that there weren't any
> > +	 * other readers.  There's a risk we won't match the buffer with any
> > +	 * of the findroot prototypes, so we want to encourage the buffer layer
> > +	 * to drop the buffer as soon as possible.
> >  	 */
> > +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> > +			mp->m_bsize, 0);
> > +	if (!bp)
> > +		return -ENOMEM;
> > +	if (!(bp->b_flags & XBF_DONE)) {
> > +		xfs_trans_brelse(ri->sc->tp, bp);
> > +
> > +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> > +				daddr, mp->m_bsize, 0, &bp, NULL);
> > +		if (error)
> > +			return error;
> > +		if (!bp->b_ops)
> > +			xfs_buf_oneshot(bp);
> > +	}
> 
> Let's look a little closer. xfs_trans_read_buf() ends up in
> xfs_buf_read_map(), which does:
> 
> ....
>         bp = xfs_buf_get_map(target, map, nmaps, flags);
>         if (bp) {
>                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> 
>                 if (!(bp->b_flags & XBF_DONE)) {
>                         XFS_STATS_INC(target->bt_mount, xb_get_read);
>                         bp->b_ops = ops;
>                         _xfs_buf_read(bp, flags);
> 		} else if (flags & XBF_ASYNC) {
> .....
> 
> But what you are doing in the code above is trying to do is
> determine if we needed to call _xfs_buf_read() on the buffer, and if
> we do we use a different verify procedure on it.
> 
> So isn't there a simpler way to do this? e.g. pass a flag down to
> xfs_buf_read_map() that says "use these ops for just this read".
> 
> 	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> 				daddr, mp->m_bsize, XBF_ONESHOT_OPS,
> 				&bp, NULL);
> 
> and change xfs_buf_read_map() to do:
> 
> 
> ....
>         bp = xfs_buf_get_map(target, map, nmaps, flags);
>         if (bp) {
>                 trace_xfs_buf_read(bp, flags, _RET_IP_);
> 
>                 if (!(bp->b_flags & XBF_DONE)) {
>                         XFS_STATS_INC(target->bt_mount, xb_get_read);
> 			if (flags & XBF_ONESHOT_OPS)
> 				orig_ops = bp->b_ops;
>                         bp->b_ops = ops;
>                         _xfs_buf_read(bp, flags);
> 			if (flags & XBF_ONESHOT_OPS)
> 				bp->b_ops = orig_ops;
> 		} else if (flags & XBF_ASYNC) {
> 			ASSERT(!(flags & XBF_ONESHOT_OPS));
> ....
> 
> Now you get back the buffer with it's original ops on it even if had
> to be read from disk and you used a different verifier. Hence you
> know how to treat it after this because the ops will be null if it
> was not in core and had to be read from disk.
> 
> That also means you could supply fab->buf_ops to the
> xfs_trans_read_buf() call, knowing that they'll be used on read and
> you'll get a null bp->b_ops back despite the buffer already having
> been fully verified. i.e. if it fails verification, you'll get an
> error rather than having to having to run the verification yourself.
> That means you only need to run the ->verify_struct() op if you get
> back a non-null bp->b_ops, which further simplifies the function...

Hmm, that seems like a much better solution.  I'll look into it.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Darrick J. Wong Sept. 28, 2018, 12:32 a.m. UTC | #6
On Mon, Aug 13, 2018 at 12:56:59PM -0400, Brian Foster wrote:
> On Sat, Aug 11, 2018 at 08:35:22AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > We don't quite handle buffer state properly in online repair's findroot
> > routine.  If the buffer is already in-core we don't want to trash its
> > b_ops and state, so first we should try _get_buf to grab the buffer.  If
> > the buffer is loaded, we only want to verify the structure of the buffer
> > since it could be dirty and the crc hasn't yet been recalculated.
> > 
> > Only if the buffer hasn't been read in should try _read_buf, and if we
> > were the ones who read the buffer then we must be careful to oneshot the
> > buffer so that a subsequent _read_buf won't find a buffer with no ops.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/repair.c |   67 +++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 54 insertions(+), 13 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 97c3077fb005..fae50dced8bc 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -697,6 +697,7 @@ xrep_findroot_block(
> >  	struct xfs_mount		*mp = ri->sc->mp;
> >  	struct xfs_buf			*bp;
> >  	struct xfs_btree_block		*btblock;
> > +	xfs_failaddr_t			fa;
> >  	xfs_daddr_t			daddr;
> >  	int				block_level;
> >  	int				error;
> > @@ -718,28 +719,68 @@ xrep_findroot_block(
> >  			return error;
> >  	}
> >  
> > -	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
> > -			mp->m_bsize, 0, &bp, NULL);
> > -	if (error)
> > -		return error;
> > -
> >  	/*
> > -	 * Does this look like a block matching our fs and higher than any
> > -	 * other block we've found so far?  If so, reattach buffer verifiers
> > -	 * so the AIL won't complain if the buffer is also dirty.
> > +	 * Try to grab the buffer, on the off chance it's already in memory.
> > +	 * If the buffer doesn't have the DONE flag set it hasn't been read
> > +	 * into memory yet.  Drop the buffer and read the buffer with NULL
> > +	 * b_ops.  (This could race with another read_buf.)  If we get the
> > +	 * buffer back with NULL b_ops then we know that there weren't any
> > +	 * other readers.  There's a risk we won't match the buffer with any
> > +	 * of the findroot prototypes, so we want to encourage the buffer layer
> > +	 * to drop the buffer as soon as possible.
> >  	 */
> > +	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
> > +			mp->m_bsize, 0);
> > +	if (!bp)
> > +		return -ENOMEM;
> > +	if (!(bp->b_flags & XBF_DONE)) {
> > +		xfs_trans_brelse(ri->sc->tp, bp);
> > +
> > +		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
> > +				daddr, mp->m_bsize, 0, &bp, NULL);
> 
> Maybe I'm missing something, but I thought buf_ops should only be
> attached to the buffer if the call actually read the buffer from disk.
> Doesn't that mean we could continue to call xfs_trans_read_buf() here
> and do the oneshot thing below to cover the case where we read it (and
> don't want to leave around a buf with ->b_ops == NULL)?
> 
> > +		if (error)
> > +			return error;
> > +		if (!bp->b_ops)
> > +			xfs_buf_oneshot(bp);
> > +	}
> > +
> 
> It's also kind of painful that we have to re-read the same buffer
> multiple times to compare it to each fab. It seems like this could use
> some refactoring to read each buffer once, check it against each fab
> until we get through the verifier, then run the root/level processing on
> the fab that matched. Perhaps that's something for another patch though
> (make it work vs. make it fast :P)...

FWIW I /think/ Dave's suggestion will greatly simplify the code while
reducing the amount of thrashing we do here.  I'll give it a try and see
what happens.

--D

> Brian
> 
> > +	/* Does this look like a block matching our fs? */
> >  	btblock = XFS_BUF_TO_BLOCK(bp);
> >  	if (be32_to_cpu(btblock->bb_magic) != fab->magic)
> >  		goto out;
> >  	if (xfs_sb_version_hascrc(&mp->m_sb) &&
> >  	    !uuid_equal(&btblock->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
> >  		goto out;
> > -	bp->b_ops = fab->buf_ops;
> >  
> > -	/* Make sure we pass the verifiers. */
> > -	bp->b_ops->verify_read(bp);
> > -	if (bp->b_error)
> > -		goto out;
> > +	/*
> > +	 * We've matched this buffer by magic number to this findroot
> > +	 * prototype.  If there are no buffer ops attached, attach the one
> > +	 * specified by the prototype.  Otherwise, the buffer ops must match
> > +	 * the prototype.   We don't want to disturb existing b_ops.
> > +	 */
> > +	if (bp->b_ops) {
> > +		if (bp->b_ops != fab->buf_ops)
> > +			goto out;
> > +		/*
> > +		 * If the buffer was already incore (on a v5 fs) then it should
> > +		 * already have had b_ops assigned.  Call ->verify_struct to
> > +		 * check the structure.  Avoid checking the CRC because we
> > +		 * don't calculate CRCs until the buffer is written by the log.
> > +		 */
> > +		fa = bp->b_ops->verify_struct(bp);
> > +		if (fa)
> > +			goto out;
> > +	} else {
> > +		/*
> > +		 * If we have to assign buffer ops, that means that nobody's
> > +		 * checked the buffer structure or its CRC.  Do both now by
> > +		 * calling ->verify_read.
> > +		 */
> > +		bp->b_ops = fab->buf_ops;
> > +		bp->b_ops->verify_read(bp);
> > +		if (bp->b_error)
> > +			goto out;
> > +	}
> >  
> >  	/* If we've recorded a root candidate... */
> >  	block_level = xfs_btree_get_level(btblock);
> >
diff mbox series

Patch

diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 97c3077fb005..fae50dced8bc 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -697,6 +697,7 @@  xrep_findroot_block(
 	struct xfs_mount		*mp = ri->sc->mp;
 	struct xfs_buf			*bp;
 	struct xfs_btree_block		*btblock;
+	xfs_failaddr_t			fa;
 	xfs_daddr_t			daddr;
 	int				block_level;
 	int				error;
@@ -718,28 +719,68 @@  xrep_findroot_block(
 			return error;
 	}
 
-	error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp, daddr,
-			mp->m_bsize, 0, &bp, NULL);
-	if (error)
-		return error;
-
 	/*
-	 * Does this look like a block matching our fs and higher than any
-	 * other block we've found so far?  If so, reattach buffer verifiers
-	 * so the AIL won't complain if the buffer is also dirty.
+	 * Try to grab the buffer, on the off chance it's already in memory.
+	 * If the buffer doesn't have the DONE flag set it hasn't been read
+	 * into memory yet.  Drop the buffer and read the buffer with NULL
+	 * b_ops.  (This could race with another read_buf.)  If we get the
+	 * buffer back with NULL b_ops then we know that there weren't any
+	 * other readers.  There's a risk we won't match the buffer with any
+	 * of the findroot prototypes, so we want to encourage the buffer layer
+	 * to drop the buffer as soon as possible.
 	 */
+	bp = xfs_trans_get_buf(ri->sc->tp, mp->m_ddev_targp, daddr,
+			mp->m_bsize, 0);
+	if (!bp)
+		return -ENOMEM;
+	if (!(bp->b_flags & XBF_DONE)) {
+		xfs_trans_brelse(ri->sc->tp, bp);
+
+		error = xfs_trans_read_buf(mp, ri->sc->tp, mp->m_ddev_targp,
+				daddr, mp->m_bsize, 0, &bp, NULL);
+		if (error)
+			return error;
+		if (!bp->b_ops)
+			xfs_buf_oneshot(bp);
+	}
+
+	/* Does this look like a block matching our fs? */
 	btblock = XFS_BUF_TO_BLOCK(bp);
 	if (be32_to_cpu(btblock->bb_magic) != fab->magic)
 		goto out;
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    !uuid_equal(&btblock->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
 		goto out;
-	bp->b_ops = fab->buf_ops;
 
-	/* Make sure we pass the verifiers. */
-	bp->b_ops->verify_read(bp);
-	if (bp->b_error)
-		goto out;
+	/*
+	 * We've matched this buffer by magic number to this findroot
+	 * prototype.  If there are no buffer ops attached, attach the one
+	 * specified by the prototype.  Otherwise, the buffer ops must match
+	 * the prototype.   We don't want to disturb existing b_ops.
+	 */
+	if (bp->b_ops) {
+		if (bp->b_ops != fab->buf_ops)
+			goto out;
+		/*
+		 * If the buffer was already incore (on a v5 fs) then it should
+		 * already have had b_ops assigned.  Call ->verify_struct to
+		 * check the structure.  Avoid checking the CRC because we
+		 * don't calculate CRCs until the buffer is written by the log.
+		 */
+		fa = bp->b_ops->verify_struct(bp);
+		if (fa)
+			goto out;
+	} else {
+		/*
+		 * If we have to assign buffer ops, that means that nobody's
+		 * checked the buffer structure or its CRC.  Do both now by
+		 * calling ->verify_read.
+		 */
+		bp->b_ops = fab->buf_ops;
+		bp->b_ops->verify_read(bp);
+		if (bp->b_error)
+			goto out;
+	}
 
 	/* If we've recorded a root candidate... */
 	block_level = xfs_btree_get_level(btblock);