diff mbox

Btrfs: fix unexpected balance crash due to BUG_ON

Message ID 1462230062-8053-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Superseded
Headers show

Commit Message

Liu Bo May 2, 2016, 11:01 p.m. UTC
Mounting a btrfs can resume previous balance operations asynchronously.
An user got a crash when one drive has some corrupt sectors.

Since balance can cancel itself in case of any error, we can gracefully
return errors to upper layers and let balance do the cancel job.

Reported-by: sash <master.b.at.raven@chefmail.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/volumes.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Holger Hoffstätte May 3, 2016, 11:14 p.m. UTC | #1
On Tue, May 3, 2016 at 1:01 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> Mounting a btrfs can resume previous balance operations asynchronously.
> An user got a crash when one drive has some corrupt sectors.
>
> Since balance can cancel itself in case of any error, we can gracefully
> return errors to upper layers and let balance do the cancel job.
>
> Reported-by: sash <master.b.at.raven@chefmail.de>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/volumes.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bd0f45f..5aed2e2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3418,13 +3418,25 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>                 ret = btrfs_shrink_device(device, old_size - size_to_free);
>                 if (ret == -ENOSPC)
>                         break;
> -               BUG_ON(ret);
> +               if (ret) {
> +                       /* btrfs_shrink_device never returns ret > 0 */
> +                       WARN_ON_ONCE(ret > 0);
> +                       goto error;
> +               }
>
>                 trans = btrfs_start_transaction(dev_root, 0);
> -               BUG_ON(IS_ERR(trans));
> +               if (IS_ERR(trans)) {
> +                       ret = PTR_ERR(trans);
> +                       goto error;
> +               }
>
>                 ret = btrfs_grow_device(trans, device, old_size);
> -               BUG_ON(ret);
> +               if (ret) {
> +                       btrfs_end_transaction(trans, dev_root);
> +                       /* btrfs_grow_device never returns ret > 0 */
> +                       WARN_ON_ONCE(ret > 0);
> +                       goto error;
> +               }
>
>                 btrfs_end_transaction(trans, dev_root);
>         }

Just a heads up that this seems to introduce a valid warning, since it now
can goto error before the first initializing use of path:

fs/btrfs/volumes.c: In function 'btrfs_balance':
fs/btrfs/volumes.c:3601:2: warning: 'path' may be used uninitialized
in this function [-Wmaybe-uninitialized]
  btrfs_free_path(path);
  ^
fs/btrfs/volumes.c:3385:21: note: 'path' was declared here
  struct btrfs_path *path;
                     ^
(it's really in __btrfs_balance which got inlined, so gcc thinks it's
at the call site).

Simply setting path = NULL at the beginning of __btrfs_balance fixes it, since
btrfs_free_path allows NULL values.

cheers
Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo May 3, 2016, 11:30 p.m. UTC | #2
On Wed, May 04, 2016 at 01:14:27AM +0200, Holger Hoffstätte wrote:
> On Tue, May 3, 2016 at 1:01 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > Mounting a btrfs can resume previous balance operations asynchronously.
> > An user got a crash when one drive has some corrupt sectors.
> >
> > Since balance can cancel itself in case of any error, we can gracefully
> > return errors to upper layers and let balance do the cancel job.
> >
> > Reported-by: sash <master.b.at.raven@chefmail.de>
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/volumes.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index bd0f45f..5aed2e2 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3418,13 +3418,25 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
> >                 ret = btrfs_shrink_device(device, old_size - size_to_free);
> >                 if (ret == -ENOSPC)
> >                         break;
> > -               BUG_ON(ret);
> > +               if (ret) {
> > +                       /* btrfs_shrink_device never returns ret > 0 */
> > +                       WARN_ON_ONCE(ret > 0);
> > +                       goto error;
> > +               }
> >
> >                 trans = btrfs_start_transaction(dev_root, 0);
> > -               BUG_ON(IS_ERR(trans));
> > +               if (IS_ERR(trans)) {
> > +                       ret = PTR_ERR(trans);
> > +                       goto error;
> > +               }
> >
> >                 ret = btrfs_grow_device(trans, device, old_size);
> > -               BUG_ON(ret);
> > +               if (ret) {
> > +                       btrfs_end_transaction(trans, dev_root);
> > +                       /* btrfs_grow_device never returns ret > 0 */
> > +                       WARN_ON_ONCE(ret > 0);
> > +                       goto error;
> > +               }
> >
> >                 btrfs_end_transaction(trans, dev_root);
> >         }
> 
> Just a heads up that this seems to introduce a valid warning, since it now
> can goto error before the first initializing use of path:
> 
> fs/btrfs/volumes.c: In function 'btrfs_balance':
> fs/btrfs/volumes.c:3601:2: warning: 'path' may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>   btrfs_free_path(path);
>   ^
> fs/btrfs/volumes.c:3385:21: note: 'path' was declared here
>   struct btrfs_path *path;
>                      ^
> (it's really in __btrfs_balance which got inlined, so gcc thinks it's
> at the call site).
> 
> Simply setting path = NULL at the beginning of __btrfs_balance fixes it, since
> btrfs_free_path allows NULL values.

That's right, it's weird that I didn't get this warning while testing it.

Thanks for catching it, Holger.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba May 4, 2016, 2:59 p.m. UTC | #3
On Mon, May 02, 2016 at 04:01:02PM -0700, Liu Bo wrote:
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3418,13 +3418,25 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>  		ret = btrfs_shrink_device(device, old_size - size_to_free);
>  		if (ret == -ENOSPC)
>  			break;
> -		BUG_ON(ret);
> +		if (ret) {
> +			/* btrfs_shrink_device never returns ret > 0 */
> +			WARN_ON_ONCE(ret > 0);
> +			goto error;
> +		}
>  
>  		trans = btrfs_start_transaction(dev_root, 0);
> -		BUG_ON(IS_ERR(trans));
> +		if (IS_ERR(trans)) {
> +			ret = PTR_ERR(trans);
> +			goto error;
> +		}
>  
>  		ret = btrfs_grow_device(trans, device, old_size);
> -		BUG_ON(ret);
> +		if (ret) {
> +			btrfs_end_transaction(trans, dev_root);
> +			/* btrfs_grow_device never returns ret > 0 */
> +			WARN_ON_ONCE(ret > 0);
> +			goto error;
> +		}

The "shrink then grow" trick seems to be necessary to make the workspace
for balance. I'm thinking what could be the intermediate result when it
succeeds only partially that we should worry about. (This also means
partial success on several devices only.)

If just shrink succeeds, then there is a smaller device that the user
did not ask for. This is effectively no change from the current
behaviour, now we fail more gracefully.

My idea is to print the at least original size of the device if
transaction start or grow phase fails.

Otherwise patch looks ok.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba July 8, 2016, 4:05 p.m. UTC | #4
On Tue, May 03, 2016 at 04:30:54PM -0700, Liu Bo wrote:
> > Just a heads up that this seems to introduce a valid warning, since it now
> > can goto error before the first initializing use of path:
> > 
> > fs/btrfs/volumes.c: In function 'btrfs_balance':
> > fs/btrfs/volumes.c:3601:2: warning: 'path' may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> >   btrfs_free_path(path);
> >   ^
> > fs/btrfs/volumes.c:3385:21: note: 'path' was declared here
> >   struct btrfs_path *path;
> >                      ^
> > (it's really in __btrfs_balance which got inlined, so gcc thinks it's
> > at the call site).
> > 
> > Simply setting path = NULL at the beginning of __btrfs_balance fixes it, since
> > btrfs_free_path allows NULL values.
> 
> That's right, it's weird that I didn't get this warning while testing it.
> 
> Thanks for catching it, Holger.

Please send a v2, the patch is desiable.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo July 8, 2016, 9:26 p.m. UTC | #5
On Fri, Jul 08, 2016 at 06:05:16PM +0200, David Sterba wrote:
> On Tue, May 03, 2016 at 04:30:54PM -0700, Liu Bo wrote:
> > > Just a heads up that this seems to introduce a valid warning, since it now
> > > can goto error before the first initializing use of path:
> > > 
> > > fs/btrfs/volumes.c: In function 'btrfs_balance':
> > > fs/btrfs/volumes.c:3601:2: warning: 'path' may be used uninitialized
> > > in this function [-Wmaybe-uninitialized]
> > >   btrfs_free_path(path);
> > >   ^
> > > fs/btrfs/volumes.c:3385:21: note: 'path' was declared here
> > >   struct btrfs_path *path;
> > >                      ^
> > > (it's really in __btrfs_balance which got inlined, so gcc thinks it's
> > > at the call site).
> > > 
> > > Simply setting path = NULL at the beginning of __btrfs_balance fixes it, since
> > > btrfs_free_path allows NULL values.
> > 
> > That's right, it's weird that I didn't get this warning while testing it.
> > 
> > Thanks for catching it, Holger.
> 
> Please send a v2, the patch is desiable.

Oh, I almost forgot this one, thanks for the reminder.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bd0f45f..5aed2e2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3418,13 +3418,25 @@  static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 		ret = btrfs_shrink_device(device, old_size - size_to_free);
 		if (ret == -ENOSPC)
 			break;
-		BUG_ON(ret);
+		if (ret) {
+			/* btrfs_shrink_device never returns ret > 0 */
+			WARN_ON_ONCE(ret > 0);
+			goto error;
+		}
 
 		trans = btrfs_start_transaction(dev_root, 0);
-		BUG_ON(IS_ERR(trans));
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			goto error;
+		}
 
 		ret = btrfs_grow_device(trans, device, old_size);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_end_transaction(trans, dev_root);
+			/* btrfs_grow_device never returns ret > 0 */
+			WARN_ON_ONCE(ret > 0);
+			goto error;
+		}
 
 		btrfs_end_transaction(trans, dev_root);
 	}