diff mbox

[2/2] xfs: fix double ijoin in xfs_reflink_cancel_cow_range

Message ID 20180307091020.6186-3-david@fromorbit.com (mailing list archive)
State Accepted
Headers show

Commit Message

Dave Chinner March 7, 2018, 9:10 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

AN inode is joined to teh same transaction twice in
xfs_reflink_cancel_cow_range() resulting in the following assert
failure:

[   30.180485] XFS: Assertion failed: !(lip->li_flags & XFS_LI_TRANS), file: fs/xfs/xfs_trans.c, line: 740
[   30.183435] ------------[ cut here ]------------
......
[   30.209264] Call Trace:
[   30.209935]  xfs_trans_add_item+0xcc/0xe0
[   30.210968]  xfs_reflink_cancel_cow_blocks+0xab/0x290
[   30.212249]  ? xfs_trans_reserve+0x1b4/0x2b0
[   30.213320]  ? kmem_zone_alloc+0x61/0xe0
[   30.214321]  xfs_reflink_cancel_cow_range+0xb2/0x1f0
[   30.215616]  xfs_fs_destroy_inode+0x1bd/0x280
[   30.216757]  dispose_list+0x35/0x40
[   30.217656]  evict_inodes+0x132/0x160
[   30.218620]  generic_shutdown_super+0x3a/0x110
[   30.219771]  kill_block_super+0x21/0x50
[   30.220762]  deactivate_locked_super+0x39/0x70
[   30.221909]  cleanup_mnt+0x3b/0x70
[   30.222819]  task_work_run+0x7f/0xa0
[   30.223762]  exit_to_usermode_loop+0x9b/0xa0
[   30.224884]  do_syscall_64+0x18f/0x1a0

Fix it and document that the callers of
xfs_reflink_cancel_cow_blocks() must have already joined the inode
to the permanent transaction passed in.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_reflink.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig March 7, 2018, 1:17 p.m. UTC | #1
On Wed, Mar 07, 2018 at 08:10:20PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> AN inode is joined to teh same transaction twice in
> xfs_reflink_cancel_cow_range() resulting in the following assert
> failure:

Needs some major spelling love :)

> 
> [   30.180485] XFS: Assertion failed: !(lip->li_flags & XFS_LI_TRANS), file: fs/xfs/xfs_trans.c, line: 740

That assertations seems like something that only exists locally in your
tree.  Any chance to send it out with this series?

The other option would be to make xfs_trans_ijoin a no-op if the inode
is already joined, except that this wouldn't work with the magic unlock
on commit.  Which is a feature I find horribly confusing, so we should
get rid of it, for which we'd need to get rid of the concept of
synchronous transactions in favour of leaving the log force to the
caller, which again would be more logical.

Guess I need to look into doing these cleanups as I don't want to force
them on anyone else.  Just need to finish all the other more important
bits on the todo list first :)
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner March 7, 2018, 9:07 p.m. UTC | #2
On Wed, Mar 07, 2018 at 05:17:25AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 07, 2018 at 08:10:20PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > AN inode is joined to teh same transaction twice in
> > xfs_reflink_cancel_cow_range() resulting in the following assert
> > failure:
> 
> Needs some major spelling love :)

Yeah, I sent it out as soon as I realised it was more than just an
isolated occurrence. Needs updating...

> > [   30.180485] XFS: Assertion failed: !(lip->li_flags & XFS_LI_TRANS), file: fs/xfs/xfs_trans.c, line: 740
> 
> That assertations seems like something that only exists locally in your
> tree.  Any chance to send it out with this series?

It's in the first patch. I've got to revise it, anyway, because I
missed the xfs_trans_brelease() case that removes the log item from
the transaction and that throws a false positive in the rmap
finishing code.

> The other option would be to make xfs_trans_ijoin a no-op if the inode
> is already joined, except that this wouldn't work with the magic unlock
> on commit. 

I'd much prefer we catch all the places where we are not handling
permanent transaction state correctly as we pass it between
different functions. Especially as we need to do that to get rid of
the log item descriptor abstraction...

> Which is a feature I find horribly confusing, so we should
> get rid of it, for which we'd need to get rid of the concept of
> synchronous transactions in favour of leaving the log force to the
> caller, which again would be more logical.
> 
> Guess I need to look into doing these cleanups as I don't want to force
> them on anyone else.  Just need to finish all the other more important
> bits on the todo list first :)

As always :P

Cheers,

Dave.
Christoph Hellwig March 8, 2018, 8:17 a.m. UTC | #3
On Thu, Mar 08, 2018 at 08:07:07AM +1100, Dave Chinner wrote:
> It's in the first patch. I've got to revise it, anyway, because I
> missed the xfs_trans_brelease() case that removes the log item from
> the transaction and that throws a false positive in the rmap
> finishing code.

Please split the infrastructure into a new patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 9, 2018, 6:43 p.m. UTC | #4
On Wed, Mar 07, 2018 at 08:10:20PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> AN inode is joined to teh same transaction twice in
> xfs_reflink_cancel_cow_range() resulting in the following assert
> failure:
> 
> [   30.180485] XFS: Assertion failed: !(lip->li_flags & XFS_LI_TRANS), file: fs/xfs/xfs_trans.c, line: 740
> [   30.183435] ------------[ cut here ]------------
> ......
> [   30.209264] Call Trace:
> [   30.209935]  xfs_trans_add_item+0xcc/0xe0
> [   30.210968]  xfs_reflink_cancel_cow_blocks+0xab/0x290
> [   30.212249]  ? xfs_trans_reserve+0x1b4/0x2b0
> [   30.213320]  ? kmem_zone_alloc+0x61/0xe0
> [   30.214321]  xfs_reflink_cancel_cow_range+0xb2/0x1f0
> [   30.215616]  xfs_fs_destroy_inode+0x1bd/0x280
> [   30.216757]  dispose_list+0x35/0x40
> [   30.217656]  evict_inodes+0x132/0x160
> [   30.218620]  generic_shutdown_super+0x3a/0x110
> [   30.219771]  kill_block_super+0x21/0x50
> [   30.220762]  deactivate_locked_super+0x39/0x70
> [   30.221909]  cleanup_mnt+0x3b/0x70
> [   30.222819]  task_work_run+0x7f/0xa0
> [   30.223762]  exit_to_usermode_loop+0x9b/0xa0
> [   30.224884]  do_syscall_64+0x18f/0x1a0
> 
> Fix it and document that the callers of
> xfs_reflink_cancel_cow_blocks() must have already joined the inode
> to the permanent transaction passed in.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_reflink.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 8c16177b33d4..6225d1ea3fdb 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -552,6 +552,9 @@ xfs_reflink_trim_irec_to_next_cow(
>   *
>   * If cancel_real is true this function cancels all COW fork extents for the
>   * inode; if cancel_real is false, real extents are not cleared.
> + *
> + * Caller must have already joined the inode to the current transaction. The
> + * inode will be joined to the transaction returned to the caller.
>   */
>  int
>  xfs_reflink_cancel_cow_blocks(
> @@ -592,7 +595,6 @@ xfs_reflink_cancel_cow_blocks(
>  			if (error)
>  				break;
>  		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
> -			xfs_trans_ijoin(*tpp, ip, 0);

Looks ok, but...

>  			xfs_defer_init(&dfops, &firstfsb);
>  
>  			/* Free the CoW orphan record. */
> @@ -1571,6 +1573,7 @@ xfs_reflink_clear_inode_flag(

Wait, what?  Why are we messing with xfs_reflink_clear_inode_flag here?

(Shame on me for looking at patch 3 before patch 2.)

The comment update in patch 3 is fine (caller must ijoin, function will
ijoin if returning new transaction) but ... didn't this function already
do all this before this churn below?

--D

>  	 * We didn't find any shared blocks so turn off the reflink flag.
>  	 * First, get rid of any leftover CoW mappings.
>  	 */
> +	xfs_trans_ijoin(*tpp, ip, 0);
>  	error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF, true);
>  	if (error)
>  		return error;
> @@ -1579,7 +1582,6 @@ xfs_reflink_clear_inode_flag(
>  	trace_xfs_reflink_unset_inode_flag(ip);
>  	ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
>  	xfs_inode_clear_cowblocks_tag(ip);
> -	xfs_trans_ijoin(*tpp, ip, 0);
>  	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
>  
>  	return error;
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner March 10, 2018, 12:48 a.m. UTC | #5
On Fri, Mar 09, 2018 at 10:43:35AM -0800, Darrick J. Wong wrote:
> On Wed, Mar 07, 2018 at 08:10:20PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > AN inode is joined to teh same transaction twice in
> > xfs_reflink_cancel_cow_range() resulting in the following assert
> > failure:
> > 
> > [   30.180485] XFS: Assertion failed: !(lip->li_flags & XFS_LI_TRANS), file: fs/xfs/xfs_trans.c, line: 740
> > [   30.183435] ------------[ cut here ]------------
> > ......
> > [   30.209264] Call Trace:
> > [   30.209935]  xfs_trans_add_item+0xcc/0xe0
> > [   30.210968]  xfs_reflink_cancel_cow_blocks+0xab/0x290
> > [   30.212249]  ? xfs_trans_reserve+0x1b4/0x2b0
> > [   30.213320]  ? kmem_zone_alloc+0x61/0xe0
> > [   30.214321]  xfs_reflink_cancel_cow_range+0xb2/0x1f0
> > [   30.215616]  xfs_fs_destroy_inode+0x1bd/0x280
> > [   30.216757]  dispose_list+0x35/0x40
> > [   30.217656]  evict_inodes+0x132/0x160
> > [   30.218620]  generic_shutdown_super+0x3a/0x110
> > [   30.219771]  kill_block_super+0x21/0x50
> > [   30.220762]  deactivate_locked_super+0x39/0x70
> > [   30.221909]  cleanup_mnt+0x3b/0x70
> > [   30.222819]  task_work_run+0x7f/0xa0
> > [   30.223762]  exit_to_usermode_loop+0x9b/0xa0
> > [   30.224884]  do_syscall_64+0x18f/0x1a0
> > 
> > Fix it and document that the callers of
> > xfs_reflink_cancel_cow_blocks() must have already joined the inode
> > to the permanent transaction passed in.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_reflink.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 8c16177b33d4..6225d1ea3fdb 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -552,6 +552,9 @@ xfs_reflink_trim_irec_to_next_cow(
> >   *
> >   * If cancel_real is true this function cancels all COW fork extents for the
> >   * inode; if cancel_real is false, real extents are not cleared.
> > + *
> > + * Caller must have already joined the inode to the current transaction. The
> > + * inode will be joined to the transaction returned to the caller.
> >   */
> >  int
> >  xfs_reflink_cancel_cow_blocks(
> > @@ -592,7 +595,6 @@ xfs_reflink_cancel_cow_blocks(
> >  			if (error)
> >  				break;
> >  		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
> > -			xfs_trans_ijoin(*tpp, ip, 0);
> 
> Looks ok, but...
> 
> >  			xfs_defer_init(&dfops, &firstfsb);
> >  
> >  			/* Free the CoW orphan record. */
> > @@ -1571,6 +1573,7 @@ xfs_reflink_clear_inode_flag(
> 
> Wait, what?  Why are we messing with xfs_reflink_clear_inode_flag here?
> 
> (Shame on me for looking at patch 3 before patch 2.)
> 
> The comment update in patch 3 is fine (caller must ijoin, function will
> ijoin if returning new transaction) but ... didn't this function already
> do all this before this churn below?

Peeling the onion from the inside out. First I fixed
xfs_reflink_cancel_cow_blocks(), then discovered that
xfs_reflink_clear_inode_flag() also joined the inode to the
transaction. Basically, patch 2 was a fix that triggered earlier in
a fstests run, patch three triggered later one after patch 2 was
done.

I've reworked the series - still testing because now I'm hitting
transaction block overruns again - and I'll make sure that it's all
sorted in that series.

Cheers,

Dave.
diff mbox

Patch

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 8c16177b33d4..6225d1ea3fdb 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -552,6 +552,9 @@  xfs_reflink_trim_irec_to_next_cow(
  *
  * If cancel_real is true this function cancels all COW fork extents for the
  * inode; if cancel_real is false, real extents are not cleared.
+ *
+ * Caller must have already joined the inode to the current transaction. The
+ * inode will be joined to the transaction returned to the caller.
  */
 int
 xfs_reflink_cancel_cow_blocks(
@@ -592,7 +595,6 @@  xfs_reflink_cancel_cow_blocks(
 			if (error)
 				break;
 		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
-			xfs_trans_ijoin(*tpp, ip, 0);
 			xfs_defer_init(&dfops, &firstfsb);
 
 			/* Free the CoW orphan record. */
@@ -1571,6 +1573,7 @@  xfs_reflink_clear_inode_flag(
 	 * We didn't find any shared blocks so turn off the reflink flag.
 	 * First, get rid of any leftover CoW mappings.
 	 */
+	xfs_trans_ijoin(*tpp, ip, 0);
 	error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF, true);
 	if (error)
 		return error;
@@ -1579,7 +1582,6 @@  xfs_reflink_clear_inode_flag(
 	trace_xfs_reflink_unset_inode_flag(ip);
 	ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
 	xfs_inode_clear_cowblocks_tag(ip);
-	xfs_trans_ijoin(*tpp, ip, 0);
 	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
 
 	return error;