diff mbox series

[2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance

Message ID 18102abe-0101-bd08-dc5b-2f288dc0d8d3@sandeen.net (mailing list archive)
State Superseded
Headers show
Series xfs_repair: two fixes | expand

Commit Message

Eric Sandeen Oct. 23, 2018, 4:08 a.m. UTC
After commit:

15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi

xfs_bunmapi can legitimately return before all work is done.
Sadly nobody told xfs_repair, so it fires an assert:

 phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed. 

Fix this by calling back in until all work is done, as we do
in the kernel.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Comments

Darrick J. Wong Oct. 23, 2018, 4:22 a.m. UTC | #1
On Mon, Oct 22, 2018 at 11:08:49PM -0500, Eric Sandeen wrote:
> After commit:
> 
> 15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi
> 
> xfs_bunmapi can legitimately return before all work is done.
> Sadly nobody told xfs_repair, so it fires an assert:
> 
>  phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed. 
> 
> Fix this by calling back in until all work is done, as we do
> in the kernel.

Looking at the rest of xfsprogs, I think the other directory-related
xfs_bunmapi callers probably need to be able to roll-and-continue, but
that seems like a topic for (a) the kernel and (b) separate patches.

> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
> Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/repair/phase6.c b/repair/phase6.c
> index e017326..b87c751 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
>  	xfs_fileoff_t		lastblock;
>  	xfs_inode_t		pip;
>  	dir_hash_ent_t		*p;
> -	int			done;
> +	int			done = 0;
>  
>  	/*
>  	 * trash directory completely and rebuild from scratch using the
> @@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
>  			error);
>  
>  	/* free all data, leaf, node and freespace blocks */
> -	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
> -				&done);
> -	if (error) {
> -		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> -		goto out_bmap_cancel;
> -	}
> +	while (!done) {
> +	       error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
> +			               0, &done);
> +	       if (error) {
> +		       do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> +		       goto out_bmap_cancel;
> +	       }
> +	       error = xfs_defer_finish(&tp);

error = -libxfs_defer_finish(...);

> +	       if (error) {
> +		       do_warn(("defer_finish failed -- error - %d\n"), error);
> +		       goto out_bmap_cancel;
> +	       }
> +	       /*
> +		* Close out trans and start the next one in the chain.
> +		*/
> +	       error = xfs_trans_roll_inode(&tp, ip);

error = -libxfs_trans_roll_inode(...);

> +	       if (error)
> +			goto out_bmap_cancel;
> +        }
>  
>  	ASSERT(done);

This assert can go away since !done is the loop test condition.

--D

>  
>
diff mbox series

Patch

diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..b87c751 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@  longform_dir2_rebuild(
 	xfs_fileoff_t		lastblock;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
-	int			done;
+	int			done = 0;
 
 	/*
 	 * trash directory completely and rebuild from scratch using the
@@ -1352,12 +1352,25 @@  longform_dir2_rebuild(
 			error);
 
 	/* free all data, leaf, node and freespace blocks */
-	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
-				&done);
-	if (error) {
-		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
-		goto out_bmap_cancel;
-	}
+	while (!done) {
+	       error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+			               0, &done);
+	       if (error) {
+		       do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+		       goto out_bmap_cancel;
+	       }
+	       error = xfs_defer_finish(&tp);
+	       if (error) {
+		       do_warn(("defer_finish failed -- error - %d\n"), error);
+		       goto out_bmap_cancel;
+	       }
+	       /*
+		* Close out trans and start the next one in the chain.
+		*/
+	       error = xfs_trans_roll_inode(&tp, ip);
+	       if (error)
+			goto out_bmap_cancel;
+        }
 
 	ASSERT(done);