diff mbox

[0.1/13] xfs: release new dquot buffer on defer_finish error

Message ID 20180503175343.GF4127@magnolia (mailing list archive)
State Superseded
Headers show

Commit Message

Darrick J. Wong May 3, 2018, 5:53 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
allocating a new dquot record", we allocate a new dquot block, grab a
buffer to initialize it, and return the locked initialized dquot buffer
to the caller for further in-core dquot initialization.  Unfortunately,
if the _bmap_finish errored out, _qm_dqalloc would also error out
without bothering to free the (locked) buffer.  Leaking a locked buffer
caused hangs in generic/388 when quotas are enabled.

Furthermore, the _bmap_finish -> _defer_finish conversion in
310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
xfs_defer_*") failed to observe that the buffer was held going into
_defer_finish and therefore failed to notice that the buffer lock is
/not/ maintained afterwards.  Now that we can bjoin a buffer to a
defer_ops, use this mechanism to ensure that the buffer stays locked
across the _defer_finish.  Release the holds and locks on the buffer as
appropriate if we have to error out.

There is a subtlety here for the caller in that the buffer emerges
locked and held to the transaction, so if the _trans_commit fails we
have to release the buffer explicitly.  This fixes the unmount hang
in generic/388 when quotas are enabled.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster May 4, 2018, 11:31 a.m. UTC | #1
On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> allocating a new dquot record", we allocate a new dquot block, grab a
> buffer to initialize it, and return the locked initialized dquot buffer
> to the caller for further in-core dquot initialization.  Unfortunately,
> if the _bmap_finish errored out, _qm_dqalloc would also error out
> without bothering to free the (locked) buffer.  Leaking a locked buffer
> caused hangs in generic/388 when quotas are enabled.
> 
> Furthermore, the _bmap_finish -> _defer_finish conversion in
> 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> xfs_defer_*") failed to observe that the buffer was held going into
> _defer_finish and therefore failed to notice that the buffer lock is
> /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> defer_ops, use this mechanism to ensure that the buffer stays locked
> across the _defer_finish.  Release the holds and locks on the buffer as
> appropriate if we have to error out.
> 
> There is a subtlety here for the caller in that the buffer emerges
> locked and held to the transaction, so if the _trans_commit fails we
> have to release the buffer explicitly.  This fixes the unmount hang
> in generic/388 when quotas are enabled.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
>  1 file changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index a7daef9e16bf..4c39d8632230 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
>  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
>  
>  	/*
> -	 * xfs_defer_finish() may commit the current transaction and
> -	 * start a second transaction if the freelist is not empty.
> +	 * Hold the buffer and join it to the dfops so that we'll still own
> +	 * the buffer when we return to the caller.  The buffer disposal on
> +	 * error must be paid attention to very carefully, as it has been
> +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> +	 * code when allocating a new dquot record" in 2005, and the later
> +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> +	 * the buffer locked across the _defer_finish call.  We can now do
> +	 * this correctly with xfs_defer_bjoin.
>  	 *
> -	 * Since we still want to modify this buffer, we need to
> -	 * ensure that the buffer is not released on commit of
> -	 * the first transaction and ensure the buffer is added to the
> -	 * second transaction.
> +	 * Above, we allocated a disk block for the dquot information and
> +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> +	 * transaction is gone but the new buffer is not joined or held to any
> +	 * transaction, so we must _buf_relse it.
>  	 *
> -	 * If there is only one transaction then don't stop the buffer
> -	 * from being released when it commits later on.
> +	 * If everything succeeds, the caller of this function is returned a
> +	 * buffer that is locked, held, and joined to the transaction.  If the
> +	 * transaction commit fails (in the caller) the caller must unlock the
> +	 * buffer manually.

If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
that the caller has to ultimately release it even after successful
transaction commit (assuming we don't roll the transaction again
somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
but it looks like that only clears the hold if the buffer isn't logged
in the tx. Hm?

Brian

>  	 */
> -
> -	xfs_trans_bhold(tp, bp);
> -
> +	xfs_trans_bhold(*tpp, bp);
> +	error = xfs_defer_bjoin(&dfops, bp);
> +	if (error) {
> +		xfs_trans_bhold_release(*tpp, bp);
> +		xfs_trans_brelse(*tpp, bp);
> +		goto error1;
> +	}
>  	error = xfs_defer_finish(tpp, &dfops);
> -	if (error)
> +	if (error) {
> +		xfs_buf_relse(bp);
>  		goto error1;
> -
> -	/* Transaction was committed? */
> -	if (*tpp != tp) {
> -		tp = *tpp;
> -		xfs_trans_bjoin(tp, bp);
> -	} else {
> -		xfs_trans_bhold_release(tp, bp);
>  	}
> -
> -	*O_bpp = bp;
>  	return 0;
>  
>  error1:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong May 4, 2018, 3:12 p.m. UTC | #2
On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > allocating a new dquot record", we allocate a new dquot block, grab a
> > buffer to initialize it, and return the locked initialized dquot buffer
> > to the caller for further in-core dquot initialization.  Unfortunately,
> > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > caused hangs in generic/388 when quotas are enabled.
> > 
> > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > xfs_defer_*") failed to observe that the buffer was held going into
> > _defer_finish and therefore failed to notice that the buffer lock is
> > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > defer_ops, use this mechanism to ensure that the buffer stays locked
> > across the _defer_finish.  Release the holds and locks on the buffer as
> > appropriate if we have to error out.
> > 
> > There is a subtlety here for the caller in that the buffer emerges
> > locked and held to the transaction, so if the _trans_commit fails we
> > have to release the buffer explicitly.  This fixes the unmount hang
> > in generic/388 when quotas are enabled.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> >  1 file changed, 27 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index a7daef9e16bf..4c39d8632230 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> >  
> >  	/*
> > -	 * xfs_defer_finish() may commit the current transaction and
> > -	 * start a second transaction if the freelist is not empty.
> > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > +	 * the buffer when we return to the caller.  The buffer disposal on
> > +	 * error must be paid attention to very carefully, as it has been
> > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > +	 * code when allocating a new dquot record" in 2005, and the later
> > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > +	 * the buffer locked across the _defer_finish call.  We can now do
> > +	 * this correctly with xfs_defer_bjoin.
> >  	 *
> > -	 * Since we still want to modify this buffer, we need to
> > -	 * ensure that the buffer is not released on commit of
> > -	 * the first transaction and ensure the buffer is added to the
> > -	 * second transaction.
> > +	 * Above, we allocated a disk block for the dquot information and
> > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > +	 * transaction is gone but the new buffer is not joined or held to any
> > +	 * transaction, so we must _buf_relse it.
> >  	 *
> > -	 * If there is only one transaction then don't stop the buffer
> > -	 * from being released when it commits later on.
> > +	 * If everything succeeds, the caller of this function is returned a
> > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > +	 * transaction commit fails (in the caller) the caller must unlock the
> > +	 * buffer manually.
> 
> If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> that the caller has to ultimately release it even after successful
> transaction commit (assuming we don't roll the transaction again
> somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> but it looks like that only clears the hold if the buffer isn't logged
> in the tx. Hm?

Correct.  The buffer is initialized in the same transaction as the dquot
block allocation and committed in xfs_defer_finish.  After
initialization (which is to say when we return to xfs_qm_dqtobp), the
buffer is held, joined, and not logged to the transaction, and nothing
else is supposed to dirty the buffer.  Both buffer and transaction are
then returned in this state to _dqread, which the in-core dquot state
out of the dquot buffer and _trans_brelse's the (still clean) buffer,
which breaks the hold and unlocks the buffer.

After the refactor we guarantee that the buffer is locked, clean, and
not attached to a transaction by the time we get to calling
xfs_dquot_from_disk rather than returning transaction and buffer up the
call stack and having to reason up the stack about what state they're in.

--D

> Brian
> 
> >  	 */
> > -
> > -	xfs_trans_bhold(tp, bp);
> > -
> > +	xfs_trans_bhold(*tpp, bp);
> > +	error = xfs_defer_bjoin(&dfops, bp);
> > +	if (error) {
> > +		xfs_trans_bhold_release(*tpp, bp);
> > +		xfs_trans_brelse(*tpp, bp);
> > +		goto error1;
> > +	}
> >  	error = xfs_defer_finish(tpp, &dfops);
> > -	if (error)
> > +	if (error) {
> > +		xfs_buf_relse(bp);
> >  		goto error1;
> > -
> > -	/* Transaction was committed? */
> > -	if (*tpp != tp) {
> > -		tp = *tpp;
> > -		xfs_trans_bjoin(tp, bp);
> > -	} else {
> > -		xfs_trans_bhold_release(tp, bp);
> >  	}
> > -
> > -	*O_bpp = bp;
> >  	return 0;
> >  
> >  error1:
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster May 4, 2018, 3:41 p.m. UTC | #3
On Fri, May 04, 2018 at 08:12:47AM -0700, Darrick J. Wong wrote:
> On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> > On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > > allocating a new dquot record", we allocate a new dquot block, grab a
> > > buffer to initialize it, and return the locked initialized dquot buffer
> > > to the caller for further in-core dquot initialization.  Unfortunately,
> > > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > > caused hangs in generic/388 when quotas are enabled.
> > > 
> > > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > > xfs_defer_*") failed to observe that the buffer was held going into
> > > _defer_finish and therefore failed to notice that the buffer lock is
> > > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > > defer_ops, use this mechanism to ensure that the buffer stays locked
> > > across the _defer_finish.  Release the holds and locks on the buffer as
> > > appropriate if we have to error out.
> > > 
> > > There is a subtlety here for the caller in that the buffer emerges
> > > locked and held to the transaction, so if the _trans_commit fails we
> > > have to release the buffer explicitly.  This fixes the unmount hang
> > > in generic/388 when quotas are enabled.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> > >  1 file changed, 27 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > index a7daef9e16bf..4c39d8632230 100644
> > > --- a/fs/xfs/xfs_dquot.c
> > > +++ b/fs/xfs/xfs_dquot.c
> > > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> > >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> > >  
> > >  	/*
> > > -	 * xfs_defer_finish() may commit the current transaction and
> > > -	 * start a second transaction if the freelist is not empty.
> > > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > > +	 * the buffer when we return to the caller.  The buffer disposal on
> > > +	 * error must be paid attention to very carefully, as it has been
> > > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > > +	 * code when allocating a new dquot record" in 2005, and the later
> > > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > > +	 * the buffer locked across the _defer_finish call.  We can now do
> > > +	 * this correctly with xfs_defer_bjoin.
> > >  	 *
> > > -	 * Since we still want to modify this buffer, we need to
> > > -	 * ensure that the buffer is not released on commit of
> > > -	 * the first transaction and ensure the buffer is added to the
> > > -	 * second transaction.
> > > +	 * Above, we allocated a disk block for the dquot information and
> > > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > > +	 * transaction is gone but the new buffer is not joined or held to any
> > > +	 * transaction, so we must _buf_relse it.
> > >  	 *
> > > -	 * If there is only one transaction then don't stop the buffer
> > > -	 * from being released when it commits later on.
> > > +	 * If everything succeeds, the caller of this function is returned a
> > > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > > +	 * transaction commit fails (in the caller) the caller must unlock the
> > > +	 * buffer manually.
> > 
> > If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> > that the caller has to ultimately release it even after successful
> > transaction commit (assuming we don't roll the transaction again
> > somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> > but it looks like that only clears the hold if the buffer isn't logged
> > in the tx. Hm?
> 
> Correct.  The buffer is initialized in the same transaction as the dquot
> block allocation and committed in xfs_defer_finish.  After
> initialization (which is to say when we return to xfs_qm_dqtobp), the
> buffer is held, joined, and not logged to the transaction, and nothing
> else is supposed to dirty the buffer.  Both buffer and transaction are
> then returned in this state to _dqread, which the in-core dquot state
> out of the dquot buffer and _trans_brelse's the (still clean) buffer,
> which breaks the hold and unlocks the buffer.
> 

Ok that makes sense, but doesn't that depend on having a deferred
operation? Is that always guaranteed here?

Brian

> After the refactor we guarantee that the buffer is locked, clean, and
> not attached to a transaction by the time we get to calling
> xfs_dquot_from_disk rather than returning transaction and buffer up the
> call stack and having to reason up the stack about what state they're in.
> 
> --D
> 
> > Brian
> > 
> > >  	 */
> > > -
> > > -	xfs_trans_bhold(tp, bp);
> > > -
> > > +	xfs_trans_bhold(*tpp, bp);
> > > +	error = xfs_defer_bjoin(&dfops, bp);
> > > +	if (error) {
> > > +		xfs_trans_bhold_release(*tpp, bp);
> > > +		xfs_trans_brelse(*tpp, bp);
> > > +		goto error1;
> > > +	}
> > >  	error = xfs_defer_finish(tpp, &dfops);
> > > -	if (error)
> > > +	if (error) {
> > > +		xfs_buf_relse(bp);
> > >  		goto error1;
> > > -
> > > -	/* Transaction was committed? */
> > > -	if (*tpp != tp) {
> > > -		tp = *tpp;
> > > -		xfs_trans_bjoin(tp, bp);
> > > -	} else {
> > > -		xfs_trans_bhold_release(tp, bp);
> > >  	}
> > > -
> > > -	*O_bpp = bp;
> > >  	return 0;
> > >  
> > >  error1:
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong May 4, 2018, 3:52 p.m. UTC | #4
On Fri, May 04, 2018 at 11:41:21AM -0400, Brian Foster wrote:
> On Fri, May 04, 2018 at 08:12:47AM -0700, Darrick J. Wong wrote:
> > On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> > > On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > > > allocating a new dquot record", we allocate a new dquot block, grab a
> > > > buffer to initialize it, and return the locked initialized dquot buffer
> > > > to the caller for further in-core dquot initialization.  Unfortunately,
> > > > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > > > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > > > caused hangs in generic/388 when quotas are enabled.
> > > > 
> > > > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > > > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > > > xfs_defer_*") failed to observe that the buffer was held going into
> > > > _defer_finish and therefore failed to notice that the buffer lock is
> > > > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > > > defer_ops, use this mechanism to ensure that the buffer stays locked
> > > > across the _defer_finish.  Release the holds and locks on the buffer as
> > > > appropriate if we have to error out.
> > > > 
> > > > There is a subtlety here for the caller in that the buffer emerges
> > > > locked and held to the transaction, so if the _trans_commit fails we
> > > > have to release the buffer explicitly.  This fixes the unmount hang
> > > > in generic/388 when quotas are enabled.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> > > >  1 file changed, 27 insertions(+), 21 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > > index a7daef9e16bf..4c39d8632230 100644
> > > > --- a/fs/xfs/xfs_dquot.c
> > > > +++ b/fs/xfs/xfs_dquot.c
> > > > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> > > >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> > > >  
> > > >  	/*
> > > > -	 * xfs_defer_finish() may commit the current transaction and
> > > > -	 * start a second transaction if the freelist is not empty.
> > > > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > > > +	 * the buffer when we return to the caller.  The buffer disposal on
> > > > +	 * error must be paid attention to very carefully, as it has been
> > > > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > > > +	 * code when allocating a new dquot record" in 2005, and the later
> > > > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > > > +	 * the buffer locked across the _defer_finish call.  We can now do
> > > > +	 * this correctly with xfs_defer_bjoin.
> > > >  	 *
> > > > -	 * Since we still want to modify this buffer, we need to
> > > > -	 * ensure that the buffer is not released on commit of
> > > > -	 * the first transaction and ensure the buffer is added to the
> > > > -	 * second transaction.
> > > > +	 * Above, we allocated a disk block for the dquot information and
> > > > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > > > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > > > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > > > +	 * transaction is gone but the new buffer is not joined or held to any
> > > > +	 * transaction, so we must _buf_relse it.
> > > >  	 *
> > > > -	 * If there is only one transaction then don't stop the buffer
> > > > -	 * from being released when it commits later on.
> > > > +	 * If everything succeeds, the caller of this function is returned a
> > > > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > > > +	 * transaction commit fails (in the caller) the caller must unlock the
> > > > +	 * buffer manually.
> > > 
> > > If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> > > that the caller has to ultimately release it even after successful
> > > transaction commit (assuming we don't roll the transaction again
> > > somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> > > but it looks like that only clears the hold if the buffer isn't logged
> > > in the tx. Hm?
> > 
> > Correct.  The buffer is initialized in the same transaction as the dquot
> > block allocation and committed in xfs_defer_finish.  After
> > initialization (which is to say when we return to xfs_qm_dqtobp), the
> > buffer is held, joined, and not logged to the transaction, and nothing
> > else is supposed to dirty the buffer.  Both buffer and transaction are
> > then returned in this state to _dqread, which the in-core dquot state
> > out of the dquot buffer and _trans_brelse's the (still clean) buffer,
> > which breaks the hold and unlocks the buffer.
> > 
> 
> Ok that makes sense, but doesn't that depend on having a deferred
> operation? Is that always guaranteed here?

Assuming you meant the case where we _trans_read_buf'd the dquot buffer
in from disk, we return a buffer that's clean, locked, and joined to the
transaction.  The only difference is that the buffer isn't held, but
_trans_brelse clears the hold unconditionally.

--D

> Brian
> 
> > After the refactor we guarantee that the buffer is locked, clean, and
> > not attached to a transaction by the time we get to calling
> > xfs_dquot_from_disk rather than returning transaction and buffer up the
> > call stack and having to reason up the stack about what state they're in.
> > 
> > --D
> > 
> > > Brian
> > > 
> > > >  	 */
> > > > -
> > > > -	xfs_trans_bhold(tp, bp);
> > > > -
> > > > +	xfs_trans_bhold(*tpp, bp);
> > > > +	error = xfs_defer_bjoin(&dfops, bp);
> > > > +	if (error) {
> > > > +		xfs_trans_bhold_release(*tpp, bp);
> > > > +		xfs_trans_brelse(*tpp, bp);
> > > > +		goto error1;
> > > > +	}
> > > >  	error = xfs_defer_finish(tpp, &dfops);
> > > > -	if (error)
> > > > +	if (error) {
> > > > +		xfs_buf_relse(bp);
> > > >  		goto error1;
> > > > -
> > > > -	/* Transaction was committed? */
> > > > -	if (*tpp != tp) {
> > > > -		tp = *tpp;
> > > > -		xfs_trans_bjoin(tp, bp);
> > > > -	} else {
> > > > -		xfs_trans_bhold_release(tp, bp);
> > > >  	}
> > > > -
> > > > -	*O_bpp = bp;
> > > >  	return 0;
> > > >  
> > > >  error1:
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster May 4, 2018, 4:03 p.m. UTC | #5
On Fri, May 04, 2018 at 08:52:36AM -0700, Darrick J. Wong wrote:
> On Fri, May 04, 2018 at 11:41:21AM -0400, Brian Foster wrote:
> > On Fri, May 04, 2018 at 08:12:47AM -0700, Darrick J. Wong wrote:
> > > On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> > > > On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > > > > allocating a new dquot record", we allocate a new dquot block, grab a
> > > > > buffer to initialize it, and return the locked initialized dquot buffer
> > > > > to the caller for further in-core dquot initialization.  Unfortunately,
> > > > > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > > > > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > > > > caused hangs in generic/388 when quotas are enabled.
> > > > > 
> > > > > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > > > > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > > > > xfs_defer_*") failed to observe that the buffer was held going into
> > > > > _defer_finish and therefore failed to notice that the buffer lock is
> > > > > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > > > > defer_ops, use this mechanism to ensure that the buffer stays locked
> > > > > across the _defer_finish.  Release the holds and locks on the buffer as
> > > > > appropriate if we have to error out.
> > > > > 
> > > > > There is a subtlety here for the caller in that the buffer emerges
> > > > > locked and held to the transaction, so if the _trans_commit fails we
> > > > > have to release the buffer explicitly.  This fixes the unmount hang
> > > > > in generic/388 when quotas are enabled.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> > > > >  1 file changed, 27 insertions(+), 21 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > > > index a7daef9e16bf..4c39d8632230 100644
> > > > > --- a/fs/xfs/xfs_dquot.c
> > > > > +++ b/fs/xfs/xfs_dquot.c
> > > > > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> > > > >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> > > > >  
> > > > >  	/*
> > > > > -	 * xfs_defer_finish() may commit the current transaction and
> > > > > -	 * start a second transaction if the freelist is not empty.
> > > > > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > > > > +	 * the buffer when we return to the caller.  The buffer disposal on
> > > > > +	 * error must be paid attention to very carefully, as it has been
> > > > > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > > > > +	 * code when allocating a new dquot record" in 2005, and the later
> > > > > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > > > > +	 * the buffer locked across the _defer_finish call.  We can now do
> > > > > +	 * this correctly with xfs_defer_bjoin.
> > > > >  	 *
> > > > > -	 * Since we still want to modify this buffer, we need to
> > > > > -	 * ensure that the buffer is not released on commit of
> > > > > -	 * the first transaction and ensure the buffer is added to the
> > > > > -	 * second transaction.
> > > > > +	 * Above, we allocated a disk block for the dquot information and
> > > > > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > > > > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > > > > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > > > > +	 * transaction is gone but the new buffer is not joined or held to any
> > > > > +	 * transaction, so we must _buf_relse it.
> > > > >  	 *
> > > > > -	 * If there is only one transaction then don't stop the buffer
> > > > > -	 * from being released when it commits later on.
> > > > > +	 * If everything succeeds, the caller of this function is returned a
> > > > > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > > > > +	 * transaction commit fails (in the caller) the caller must unlock the
> > > > > +	 * buffer manually.
> > > > 
> > > > If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> > > > that the caller has to ultimately release it even after successful
> > > > transaction commit (assuming we don't roll the transaction again
> > > > somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> > > > but it looks like that only clears the hold if the buffer isn't logged
> > > > in the tx. Hm?
> > > 
> > > Correct.  The buffer is initialized in the same transaction as the dquot
> > > block allocation and committed in xfs_defer_finish.  After
> > > initialization (which is to say when we return to xfs_qm_dqtobp), the
> > > buffer is held, joined, and not logged to the transaction, and nothing
> > > else is supposed to dirty the buffer.  Both buffer and transaction are
> > > then returned in this state to _dqread, which the in-core dquot state
> > > out of the dquot buffer and _trans_brelse's the (still clean) buffer,
> > > which breaks the hold and unlocks the buffer.
> > > 
> > 
> > Ok that makes sense, but doesn't that depend on having a deferred
> > operation? Is that always guaranteed here?
> 
> Assuming you meant the case where we _trans_read_buf'd the dquot buffer
> in from disk, we return a buffer that's clean, locked, and joined to the
> transaction.  The only difference is that the buffer isn't held, but
> _trans_brelse clears the hold unconditionally.
> 

I'm referring to the xfs_qm_dqalloc() case. It looks like we
xfs_bmapi_write(), get the buffer, call xfs_qm_init_dquot_blk() (logs
the buffer) then go into the defer/exit sequence modified by this
patch...

If the defer finish doesn't do anything, are we in the right state or is
the buffer still dirty+held? If the latter, doesn't that mean the buffer
remains held after the caller commits the current transaction?

Brian

> --D
> 
> > Brian
> > 
> > > After the refactor we guarantee that the buffer is locked, clean, and
> > > not attached to a transaction by the time we get to calling
> > > xfs_dquot_from_disk rather than returning transaction and buffer up the
> > > call stack and having to reason up the stack about what state they're in.
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > >  	 */
> > > > > -
> > > > > -	xfs_trans_bhold(tp, bp);
> > > > > -
> > > > > +	xfs_trans_bhold(*tpp, bp);
> > > > > +	error = xfs_defer_bjoin(&dfops, bp);
> > > > > +	if (error) {
> > > > > +		xfs_trans_bhold_release(*tpp, bp);
> > > > > +		xfs_trans_brelse(*tpp, bp);
> > > > > +		goto error1;
> > > > > +	}
> > > > >  	error = xfs_defer_finish(tpp, &dfops);
> > > > > -	if (error)
> > > > > +	if (error) {
> > > > > +		xfs_buf_relse(bp);
> > > > >  		goto error1;
> > > > > -
> > > > > -	/* Transaction was committed? */
> > > > > -	if (*tpp != tp) {
> > > > > -		tp = *tpp;
> > > > > -		xfs_trans_bjoin(tp, bp);
> > > > > -	} else {
> > > > > -		xfs_trans_bhold_release(tp, bp);
> > > > >  	}
> > > > > -
> > > > > -	*O_bpp = bp;
> > > > >  	return 0;
> > > > >  
> > > > >  error1:
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong May 4, 2018, 8:05 p.m. UTC | #6
On Fri, May 04, 2018 at 12:03:56PM -0400, Brian Foster wrote:
> On Fri, May 04, 2018 at 08:52:36AM -0700, Darrick J. Wong wrote:
> > On Fri, May 04, 2018 at 11:41:21AM -0400, Brian Foster wrote:
> > > On Fri, May 04, 2018 at 08:12:47AM -0700, Darrick J. Wong wrote:
> > > > On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> > > > > On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > > > > > allocating a new dquot record", we allocate a new dquot block, grab a
> > > > > > buffer to initialize it, and return the locked initialized dquot buffer
> > > > > > to the caller for further in-core dquot initialization.  Unfortunately,
> > > > > > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > > > > > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > > > > > caused hangs in generic/388 when quotas are enabled.
> > > > > > 
> > > > > > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > > > > > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > > > > > xfs_defer_*") failed to observe that the buffer was held going into
> > > > > > _defer_finish and therefore failed to notice that the buffer lock is
> > > > > > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > > > > > defer_ops, use this mechanism to ensure that the buffer stays locked
> > > > > > across the _defer_finish.  Release the holds and locks on the buffer as
> > > > > > appropriate if we have to error out.
> > > > > > 
> > > > > > There is a subtlety here for the caller in that the buffer emerges
> > > > > > locked and held to the transaction, so if the _trans_commit fails we
> > > > > > have to release the buffer explicitly.  This fixes the unmount hang
> > > > > > in generic/388 when quotas are enabled.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> > > > > >  1 file changed, 27 insertions(+), 21 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > > > > index a7daef9e16bf..4c39d8632230 100644
> > > > > > --- a/fs/xfs/xfs_dquot.c
> > > > > > +++ b/fs/xfs/xfs_dquot.c
> > > > > > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> > > > > >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> > > > > >  
> > > > > >  	/*
> > > > > > -	 * xfs_defer_finish() may commit the current transaction and
> > > > > > -	 * start a second transaction if the freelist is not empty.
> > > > > > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > > > > > +	 * the buffer when we return to the caller.  The buffer disposal on
> > > > > > +	 * error must be paid attention to very carefully, as it has been
> > > > > > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > > > > > +	 * code when allocating a new dquot record" in 2005, and the later
> > > > > > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > > > > > +	 * the buffer locked across the _defer_finish call.  We can now do
> > > > > > +	 * this correctly with xfs_defer_bjoin.
> > > > > >  	 *
> > > > > > -	 * Since we still want to modify this buffer, we need to
> > > > > > -	 * ensure that the buffer is not released on commit of
> > > > > > -	 * the first transaction and ensure the buffer is added to the
> > > > > > -	 * second transaction.
> > > > > > +	 * Above, we allocated a disk block for the dquot information and
> > > > > > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > > > > > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > > > > > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > > > > > +	 * transaction is gone but the new buffer is not joined or held to any
> > > > > > +	 * transaction, so we must _buf_relse it.
> > > > > >  	 *
> > > > > > -	 * If there is only one transaction then don't stop the buffer
> > > > > > -	 * from being released when it commits later on.
> > > > > > +	 * If everything succeeds, the caller of this function is returned a
> > > > > > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > > > > > +	 * transaction commit fails (in the caller) the caller must unlock the
> > > > > > +	 * buffer manually.
> > > > > 
> > > > > If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> > > > > that the caller has to ultimately release it even after successful
> > > > > transaction commit (assuming we don't roll the transaction again
> > > > > somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> > > > > but it looks like that only clears the hold if the buffer isn't logged
> > > > > in the tx. Hm?
> > > > 
> > > > Correct.  The buffer is initialized in the same transaction as the dquot
> > > > block allocation and committed in xfs_defer_finish.  After
> > > > initialization (which is to say when we return to xfs_qm_dqtobp), the
> > > > buffer is held, joined, and not logged to the transaction, and nothing
> > > > else is supposed to dirty the buffer.  Both buffer and transaction are
> > > > then returned in this state to _dqread, which the in-core dquot state
> > > > out of the dquot buffer and _trans_brelse's the (still clean) buffer,
> > > > which breaks the hold and unlocks the buffer.
> > > > 
> > > 
> > > Ok that makes sense, but doesn't that depend on having a deferred
> > > operation? Is that always guaranteed here?
> > 
> > Assuming you meant the case where we _trans_read_buf'd the dquot buffer
> > in from disk, we return a buffer that's clean, locked, and joined to the
> > transaction.  The only difference is that the buffer isn't held, but
> > _trans_brelse clears the hold unconditionally.
> > 
> 
> I'm referring to the xfs_qm_dqalloc() case. It looks like we
> xfs_bmapi_write(), get the buffer, call xfs_qm_init_dquot_blk() (logs
> the buffer) then go into the defer/exit sequence modified by this
> patch...
> 
> If the defer finish doesn't do anything, are we in the right state or is
> the buffer still dirty+held? If the latter, doesn't that mean the buffer
> remains held after the caller commits the current transaction?

Ah, right, now I see what you're getting at.

If the defer_finish does nothing then yes we do exit up to _dqread with
the buffer locked, dirty, joined, and held to the transaction, in which
case the xfs_trans_brelse does nothing.  The transaction commit right
after that will log the buffer and release the hold, but the buffer's
still locked.

So I guess we need to roll the transaction after the defer_finish to
guarantee that the buffer is no longer dirty.  Potentially we could
modify _defer_finish to always roll at least once if any of the
_defer_[bi]join items are dirty so that we always return with the
_defer_*join'd items clean?

For now it's probably fine to _bhold_release it after the _defer_finish
(as Brian just suggested on irc) and reconsider "_defer_finish with
defer_joined dirty stuff" separately.

The refactor fixes all this so that we always commit the transaction
before handing back a bjoin'd buffer, so the state is consistent.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > After the refactor we guarantee that the buffer is locked, clean, and
> > > > not attached to a transaction by the time we get to calling
> > > > xfs_dquot_from_disk rather than returning transaction and buffer up the
> > > > call stack and having to reason up the stack about what state they're in.
> > > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > > >  	 */
> > > > > > -
> > > > > > -	xfs_trans_bhold(tp, bp);
> > > > > > -
> > > > > > +	xfs_trans_bhold(*tpp, bp);
> > > > > > +	error = xfs_defer_bjoin(&dfops, bp);
> > > > > > +	if (error) {
> > > > > > +		xfs_trans_bhold_release(*tpp, bp);
> > > > > > +		xfs_trans_brelse(*tpp, bp);
> > > > > > +		goto error1;
> > > > > > +	}
> > > > > >  	error = xfs_defer_finish(tpp, &dfops);
> > > > > > -	if (error)
> > > > > > +	if (error) {
> > > > > > +		xfs_buf_relse(bp);
> > > > > >  		goto error1;
> > > > > > -
> > > > > > -	/* Transaction was committed? */
> > > > > > -	if (*tpp != tp) {
> > > > > > -		tp = *tpp;
> > > > > > -		xfs_trans_bjoin(tp, bp);
> > > > > > -	} else {
> > > > > > -		xfs_trans_bhold_release(tp, bp);
> > > > > >  	}
> > > > > > -
> > > > > > -	*O_bpp = bp;
> > > > > >  	return 0;
> > > > > >  
> > > > > >  error1:
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > the body of a message to majordomo@vger.kernel.org
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a7daef9e16bf..4c39d8632230 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -362,33 +362,39 @@  xfs_qm_dqalloc(
 			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
 
 	/*
-	 * xfs_defer_finish() may commit the current transaction and
-	 * start a second transaction if the freelist is not empty.
+	 * Hold the buffer and join it to the dfops so that we'll still own
+	 * the buffer when we return to the caller.  The buffer disposal on
+	 * error must be paid attention to very carefully, as it has been
+	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
+	 * code when allocating a new dquot record" in 2005, and the later
+	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
+	 * the buffer locked across the _defer_finish call.  We can now do
+	 * this correctly with xfs_defer_bjoin.
 	 *
-	 * Since we still want to modify this buffer, we need to
-	 * ensure that the buffer is not released on commit of
-	 * the first transaction and ensure the buffer is added to the
-	 * second transaction.
+	 * Above, we allocated a disk block for the dquot information and
+	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
+	 * the buffer is still locked to *tpp, so we must _bhold_release and
+	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
+	 * transaction is gone but the new buffer is not joined or held to any
+	 * transaction, so we must _buf_relse it.
 	 *
-	 * If there is only one transaction then don't stop the buffer
-	 * from being released when it commits later on.
+	 * If everything succeeds, the caller of this function is returned a
+	 * buffer that is locked, held, and joined to the transaction.  If the
+	 * transaction commit fails (in the caller) the caller must unlock the
+	 * buffer manually.
 	 */
-
-	xfs_trans_bhold(tp, bp);
-
+	xfs_trans_bhold(*tpp, bp);
+	error = xfs_defer_bjoin(&dfops, bp);
+	if (error) {
+		xfs_trans_bhold_release(*tpp, bp);
+		xfs_trans_brelse(*tpp, bp);
+		goto error1;
+	}
 	error = xfs_defer_finish(tpp, &dfops);
-	if (error)
+	if (error) {
+		xfs_buf_relse(bp);
 		goto error1;
-
-	/* Transaction was committed? */
-	if (*tpp != tp) {
-		tp = *tpp;
-		xfs_trans_bjoin(tp, bp);
-	} else {
-		xfs_trans_bhold_release(tp, bp);
 	}
-
-	*O_bpp = bp;
 	return 0;
 
 error1: